Advanced Database Systems Handout

Chapter One
Concepts for Object-Oriented Databases
Introduction
Database technology has traditionally concentrated on the static aspects of information storage. With the arrival of the third generation of Database Management Systems, namely Object-Oriented Database Management Systems (OODBMSs) and Object-Relational Database Management Systems (ORDBMSs), the two disciplines have been combined to allow the concurrent modelling of both data and the processes acting upon the data.

In database systems, we have seen the widespread acceptance of RDBMSs for traditional
business applications, such as order processing, inventory control, banking, and airline
reservations. However, existing RDBMSs have proven inadequate for applications whose needs
are quite different from those of traditional business database applications. These applications
include: computer-aided design (CAD), computer-aided manufacturing (CAM), computer-aided
software engineering (CASE), Network management systems, digital publishing, geographic
information systems (GIS), office information systems (OIS) and multimedia systems, interactive
and dynamic Web sites.

Strengths of the relational model are its simplicity, its suitability for Online Transaction Processing (OLTP), its symmetric structure, and its support for data independence. However, the relational data model and relational DBMSs possess significant weaknesses: poor representation of real-world entities, semantic overloading, poor support for integrity and general constraints, a homogeneous data structure, limited operations, difficulty handling recursive queries, and impedance mismatch.

Furthermore, in RDBMSs, transactions in business processing are generally short-lived, and concurrency control primitives and protocols such as two-phase locking are not particularly suited for long-duration transactions; schema changes are difficult; and RDBMSs were designed for content-based associative access (that is, declarative statements with selection based on one or more predicates) and are poor at navigational access (that is, access based on movement between individual records).

These limitations of early data models, the need for more complex applications, the need for additional data-modeling features, and the increased use of object-oriented programming languages resulted in the creation of object-oriented databases.

DBMS Database Models


A database model defines the logical design and structure of a database, including how data will be stored, accessed, and updated in a database management system. The different database models are: Hierarchical Model, Network Model, Entity-relationship Model, and Relational Model.
• Hierarchical Model: This database model organises data into a tree-like structure, with a single root to which all the other data is linked. The hierarchy starts from the root data and expands like a tree, adding child nodes to the parent nodes.
• Network Model: This is an extension of the Hierarchical model. In this model data is organised more like a graph, records are allowed to have more than one parent node, and it was commonly used to map many-to-many data relationships.
• Entity-relationship Model: In this database model, relationships are created by dividing
object of interest into entity and its characteristics into attributes. Different entities are
related using relationships.
• Relational Model: In this data model, data is organised in two-dimensional tables and
the relationship is maintained by storing a common field.

1.1 Overview of Object-Oriented Concepts


Definitions:
• An Object is a uniquely identifiable entity that contains both the attributes that describe
the state of a ‘real world’ object and the actions that are associated with it.
• OODM: A (logical) data model that captures the semantics of objects supported in
object-oriented programming.
• OODB: A persistent and sharable collection of objects defined by an OODM.

• OODBMS: The manager of an OODB
• A data model: A particular way of describing data, relationships between data, and
constraints on the data.
• Data persistence: The ability of data to outlive the execution of the program that created it.

The Object-Oriented (OO) Data Model: In this data model, both data and their relationships are contained in a single structure known as an object.

OO databases try to maintain a direct correspondence between real-world and database objects so that objects do not lose their integrity and identity and can easily be identified and operated upon. An Object is a uniquely identifiable entity that contains both the attributes that describe the state of a ‘real world’ object and the actions that are associated with it (Simula, 1960s). An object, similar to a program variable in a programming language, is composed of two components: state (value) and behaviour (operations), except that it will typically have a complex data structure as well as specific operations defined by the programmer.
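
To make the state/behaviour split concrete, here is a minimal Python sketch; the class and attribute names are invented for this illustration and are not part of the handout:

class Account:
    def __init__(self, owner, balance):
        # State: attribute values that define the object's current value
        self.owner = owner
        self.balance = balance

    # Behaviour: operations that read or change the state
    def deposit(self, amount):
        self.balance += amount

    def get_balance(self):
        return self.balance

acct = Account("Alice", 100)
acct.deposit(50)           # changes the object's state
print(acct.get_balance())  # 150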

In OO databases, objects may have an object structure of arbitrary complexity in order to contain
all of the necessary information that describes the object. In contrast, in traditional database
systems, information about a complex object is often scattered over many relations or records,
leading to loss of direct correspondence between a real-world object and its database
representation.

The internal structure of an object in OOPLs includes the specification of instance variables, which hold the values that define the internal state of the object. An instance variable is similar to the concept of an attribute, except that instance variables may be encapsulated within the object and thus are not necessarily visible to external users. Some OO models insist that all operations a user can apply to an object must be predefined.

This forces a complete encapsulation of objects. To encourage encapsulation, an operation is
defined in two parts:
• Signature or interface of the operation, specifies the operation name and arguments (or
parameters).
• Method or body, specifies the implementation of the operation.

The object type definition includes an operation signature for each operation that specifies the
name of the operation, the names and types of each argument, the names of any exceptions that
can be raised, and the types of the values returned, if any. Operations can be invoked by passing
a message to an object, which includes the operation name and the parameters. The object then
executes the method for that operation. This encapsulation permits modification of the internal structure of an object, as well as the implementation of its operations, without the need to disturb the external programs that invoke these operations.

Some OO systems provide capabilities for dealing with multiple versions of the same object (a
feature that is essential in design and engineering applications). For example, an old version of
an object that represents a tested and verified design should be retained until the new version
is tested and verified which is very crucial for designs in manufacturing process control,
architecture and software systems.

Operator polymorphism/operator overloading: This refers to an operation’s ability to be applied to different types of objects; in such a situation, an operation name may refer to several distinct implementations, depending on the type of objects it is applied to.
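
A brief Python sketch of this idea (the employee classes are invented for illustration): the single operation name pay is bound to a different implementation depending on the type of the object it is applied to.

class SalariedEmployee:
    def __init__(self, monthly_salary):
        self.monthly_salary = monthly_salary
    def pay(self):                      # one implementation of 'pay'
        return self.monthly_salary

class HourlyEmployee:
    def __init__(self, rate, hours):
        self.rate, self.hours = rate, hours
    def pay(self):                      # a distinct implementation, same name
        return self.rate * self.hours

# The same message 'pay' is dispatched to a different method at run time.
for e in [SalariedEmployee(3000), HourlyEmployee(20, 160)]:
    print(e.pay())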

1.2 Object Identity, Object Structure, and Type Constructors


An object is described by four characteristics: structure, identifier, name, and lifetime.
Object Identity
An OO database system provides a unique identity to each independent object stored in the
database. This unique identity is typically implemented via a unique, system-generated object

identifier, or OID. In an object-oriented system, each object is assigned an Object Identifier
(OID) when it is created that is:
 system-generated;
 unique to that object;
 invariant/immutable, in the sense that it cannot be altered during its lifetime. Once the object is created, this OID will not be reused for any other object, even after the object has been deleted;
 independent of the values of its attributes (that is, its state). Two objects could have the same state but would have different identities;
 invisible to the user (ideally).

Thus, object identity ensures that an object can always be uniquely identified, thereby
automatically providing entity integrity. There are several advantages to using OIDs as
the mechanism for object identity:
 They are efficient: OIDs require minimal storage within a complex object. Typically, they are smaller than textual names, foreign keys, or other semantic-based references.
 They are fast: OIDs point to an actual address or to a location within a table that gives the address of the referenced object. This means that objects can be located quickly whether they are currently stored in local memory or on disk.
 They are independent of content: OIDs do not depend upon the data contained in the object in any way. This allows the value of every attribute of an object to change, but for the object to remain the same object with the same OID.
 They cannot be modified by the user: if the OIDs are system-generated and kept invisible, or at least read-only, the system can ensure entity and referential integrity more easily. Further, this avoids the user having to maintain integrity.
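
As a rough illustration of how a system-generated, immutable, state-independent OID might behave, the following Python sketch uses uuid4; this is a simplification, and real OODBMSs use their own identifier schemes, often tied to storage locations:

import uuid

class DBObject:
    def __init__(self, state):
        # System-generated at creation, never reused, never changed.
        self._oid = uuid.uuid4()
        self.state = state

    @property
    def oid(self):            # read-only: the user cannot modify it
        return self._oid

a = DBObject({"name": "X"})
b = DBObject({"name": "X"})
print(a.oid != b.oid)         # True: same state, different identities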

Object Structure
An object structure describes how the state of an object is organized. In OODBs, the state (current value) of a complex object may be constructed from other objects (or other values) using type constructors.

Type Constructors
A type constructor is a feature of a typed formal language that builds new types from old ones.

Basic types are considered to be built using nullary type constructors. Complex objects are
built from simpler ones by applying constructors to them. Attributes can be classified as simple
or complex. A simple attribute can be a primitive type such as integer, string, real, and so on,
which takes on literal values. The simplest objects are objects such as integers, characters,
byte strings of any length, Booleans, and floats. Basic type constructors are atom, tuple, and set; other constructors include list, bag, and array.
 The atom constructor is used to represent all basic atomic values, such as integers, real numbers, character strings, Booleans, and any other basic data types that the system supports directly.
 Sets are critical because they are a natural way of representing collections from the real world.
 Tuples are critical because they are a natural way of representing properties of an entity.
 Lists or arrays are important because they capture order.

Figure: Specifying the object types EMPLOYEE, DATE, and DEPARTMENT using type constructors.
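
A rough Python analogue of the figure above, assuming the usual EMPLOYEE/DEPARTMENT attributes (the attribute names and literal values here are illustrative, not taken from the figure itself):

# tuple constructor: an EMPLOYEE as a record of (attribute, value) pairs
employee = {
    "Fname": "John",                       # atom (string)
    "Salary": 30000,                       # atom (integer)
    "BirthDate": (1965, 1, 9),             # tuple constructor: DATE(Year, Month, Day)
}

# set and list constructors inside a DEPARTMENT object
department = {
    "Dname": "Research",
    "Employees": frozenset(["e1", "e2"]),  # set of employee OIDs (hypothetical)
    "Locations": ["Bellaire", "Houston"],  # list constructor: order is preserved
}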

1.3 Encapsulation of Operations, Methods, and Persistence


Encapsulation is one of the main characteristics of OO languages and systems, and is related to the concepts of abstract data types and information hiding in programming languages. An
Abstract Data Type (ADT) is an abstract concept defined by axioms which represent some data
and operations on that data. An abstract data type defines not only a data representation for
objects of the type but also the set of operations that can be performed on objects of the type.
Furthermore, the abstract data type can protect the data representation from direct access by
other parts of the program. Abstract Data Types are focused on what, not how (they're framed
declaratively, and do not specify algorithms or data structures). Common examples include
lists, stacks, sets, etc.

The concept of encapsulation means that an object contains both a data structure and the set of
operations that can be used to manipulate it. The concept of information hiding means that the
external aspects of an object are separated from its internal details, which are hidden from the
outside world. In this way the internal details of an object can be changed without affecting the
applications that use it, provided the external details remain the same. This prevents an
application from becoming so interdependent that a small change has enormous ripple effects. In other words, information hiding provides a form of data independence. These concepts simplify
the construction and maintenance of applications through modularization. An object is a ‘black
box’ that can be constructed and modified independently of the rest of the system, provided the
external interface is not changed.

There are two views of encapsulation: the object-oriented programming language (OOPL) view
and the database adaptation of that view. In some OOPLs encapsulation is achieved through
Abstract Data Types (ADTs). In this view an object has an interface part and an implementation
part. The interface provides a specification of the operations that can be performed on the
object; the implementation part consists of the data structure for the ADT and the functions that realize the interface. Only the interface part is visible to other objects or users. In the
database view, proper encapsulation is achieved by ensuring that programmers have access
only to the interface part. In this way encapsulation provides a form of logical data
independence: we can change the internal implementation of an ADT without changing any of
the applications using that ADT.
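
A minimal sketch of this two-part view in Python (the Stack ADT and its names are invented for this example): the public methods are the interface part, and the underscore-prefixed attribute is the hidden implementation part, which can be replaced without touching callers.

class Stack:                       # an ADT: what it does, not how
    def __init__(self):
        self._items = []           # implementation part: hidden data structure

    # interface part: the only operations visible to clients
    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

# Callers use only push/pop; swapping the internal list for a linked
# list would not change any program that uses Stack.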
Specifying Object Behaviour via Class Operations (methods)
The main idea is to define the behaviour of a type of object based on the operations that can be externally applied to objects of that type. Methods/operations define the behaviour of the object. They can be used to change the object’s state by modifying its attribute values or to query the value of selected attributes. In general, the implementation of an operation can be specified in a general-purpose programming language that provides flexibility and power in defining the operations.

A method consists of a name and a body that performs the behaviour associated with the method name. In an object-oriented language, the body consists of a block of code that carries out the required functionality. For example, the next code represents the method to update a member of staff’s salary. The name of the method is updateSalary, with an input parameter increment, which is added to the instance variable salary to produce a new salary.
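
The code referred to above is not reproduced in this handout; a plausible reconstruction in Python, with the class name assumed from the surrounding text, is:

class Staff:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary        # instance variable holding the state

    def updateSalary(self, increment):
        # add the input parameter to the instance variable salary
        self.salary = self.salary + increment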

Messages are the means by which objects communicate. A message is simply a request from
one object (the sender) to another object (the receiver) asking the second object to execute one
of its methods. The sender and receiver may be the same object.
Classes are blueprints for defining a set of similar objects. Thus, objects that have the same
attributes and respond to the same messages can be grouped together to form a class.

The attributes and associated methods are defined once for the class rather than separately for
each object. A class is also an object and has its own attributes and methods, referred to as class
attributes and class methods, respectively. Class attributes describe the general characteristics of
the class, such as totals or averages. For database applications, the requirement that all
objects be completely encapsulated is too stringent. One way of relaxing this requirement is to
divide the structure of an object into visible and hidden attributes (instance variables).

Figure: Adding operations to the definitions of EMPLOYEE and DEPARTMENT

Specifying Object Persistence via Naming and Reachability


 Naming Mechanism: Assign an object a unique persistent name through which it can be retrieved by this and other programs.
 Reachability Mechanism: Make the object reachable from some persistent object. An object B is said to be reachable from an object A if a sequence of references in the object graph leads from object A to object B.
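
A small sketch of the reachability rule in Python (the attribute name references is hypothetical): starting from a named persistent root, every object reachable through references is itself persistent.

def reachable_from(root):
    """Collect the ids of all objects reachable from 'root' via references."""
    seen, stack = set(), [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        stack.extend(getattr(obj, "references", []))  # follow the object graph
    return seen

# If DepartmentsRoot were the named persistent object, every Department
# (and every Employee referenced by a Department) would become persistent.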

1.4 Type Hierarchies and Inheritance


An entity type represents a set of entities of the same type such as Staff, Branch, and
PropertyForRent. We can also form entity types into a hierarchy containing superclasses and
subclasses.
 Superclass: An entity type that includes one or more distinct subgroupings of its occurrences, which require to be represented in a data model.
 Subclass: A distinct subgrouping of occurrences of an entity type, which requires to be represented in a data model.

Entity types that have distinct subclasses are called superclasses. For example, the entities that are members of the Staff entity type may be classified as Manager, SalesPersonnel, and Secretary; in other words, the Staff entity is referred to as the superclass of the Manager, SalesPersonnel, and Secretary subclasses. The relationship between a superclass and any one of its subclasses is called a superclass/subclass relationship. For example, Staff/Manager has a superclass/subclass relationship.

Inheritance allows one class to be defined as a special case of a more general class. These special cases are known as subclasses and the more general cases are known as superclasses. The process of forming a superclass is referred to as generalization; forming a subclass is specialization. Generalization is the process of minimizing the differences between entities by identifying their common characteristics. It is a bottom-up approach, which results in the identification of a generalized superclass from the original entity types.

Specialization is the process of maximizing the differences between members of an entity by identifying their distinguishing characteristics. Specialization is a top-down approach to defining
a set of superclasses and their related subclasses. The set of subclasses is defined on the basis of
some distinguishing characteristics of the entities in the superclass.

A subclass inherits all the properties of its superclass and additionally defines its own unique

properties (attributes and methods). All instances of the subclass are also instances of the
superclass. The principle of substitutability states that an instance of the subclass can be used
whenever a method or a construct expects an instance of the superclass.

Type (class) Hierarchy


A type in its simplest form can be defined by giving it a type name and then listing the names
of its visible (public) functions. When specifying a type in this section, we use the following
format:
TYPE_NAME: function, function, . . . , function
Example: PERSON: Name, Address, Birthdate, Age, SSN
Subtype/Subclass: Created when the designer or user must define a new type that is similar but not identical to an already defined type.
Supertype/Superclass: The already defined, more general type; the subtype inherits all the functions of the supertype.
Example (1): creating subtypes from PERSON supertype
PERSON: Name, Address, Birthdate, Age, SSN
EMPLOYEE: Name, Address, Birthdate, Age, SSN, Salary, HireDate, Seniority
STUDENT: Name, Address, Birthdate, Age, SSN, Major, GPA

– EMPLOYEE subtype-of PERSON: Salary, HireDate, Seniority
– STUDENT subtype-of PERSON: Major, GPA
Example (2):
– Consider a type that describes objects in plane geometry, which may be defined as
follows:
– GEOMETRY_OBJECT: Shape, Area, ReferencePoint
– Now suppose that we want to define a number of subtypes for the
GEOMETRY_OBJECT type, as follows:

– RECTANGLE subtype-of GEOMETRY_OBJECT: Width, Height
– TRIANGLE subtype-of GEOMETRY_OBJECT: Side1, Side2, Angle
– CIRCLE subtype-of GEOMETRY_OBJECT: Radius
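
Expressed in Python, the GEOMETRY_OBJECT hierarchy above might look like the following sketch (the area methods are added here for concreteness and are not part of the type listing above):

import math

class GeometryObject:
    def __init__(self, shape, reference_point):
        self.shape = shape
        self.reference_point = reference_point  # inherited by every subtype

class Rectangle(GeometryObject):
    def __init__(self, reference_point, width, height):
        super().__init__("rectangle", reference_point)
        self.width, self.height = width, height
    def area(self):
        return self.width * self.height

class Circle(GeometryObject):
    def __init__(self, reference_point, radius):
        super().__init__("circle", reference_point)
        self.radius = radius
    def area(self):
        return math.pi * self.radius ** 2

# Substitutability: any function expecting a GeometryObject also accepts
# a Rectangle or a Circle, since every instance of a subtype is also an
# instance of the supertype.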
Superclass/Subclass Relationships
Each member of a subclass is also a member of the superclass. In other words, the entity in the
subclass is the same entity in the superclass, but has a distinct role. The relationship between a
superclass and a subclass is one-to-one (1:1) and is called a superclass/subclass relationship.

Chapter Two
Query Processing and Optimization
A query means a request for information. A query is based on a set of predefined code so that your database understands the instruction; we refer to this code as the query language. The standard query language for database management is Structured Query Language (SQL).

Query Processing is the process of choosing a suitable execution strategy for processing a query. The activities involved in query processing are parsing, validating, optimizing, and executing the query.
optimizing, and executing a query. The aims of query processing are to transform a query written
in a high-level language, typically SQL, into a correct and efficient execution strategy expressed
in a low-level language (procedural). A low-level programming language is a programming
language that provides little or no abstraction from a computer's instruction set architecture—
commands or functions in the language map closely to processor instructions.

When the relational model was first launched commercially, one of the major criticisms often cited was inadequate performance of queries. Since then, a significant amount of research has been devoted to developing highly efficient algorithms for processing queries. There are many ways in which a complex query can be performed, and one of the aims of query processing is to determine which one is the most cost-effective.


Query processing can be divided into four main phases:
 Query decomposition (consisting of parsing and validation),
 Query optimization,
 Code generation, and
 Runtime query execution.

Figure 2.1: Query Processing


Query decomposition is the first phase of query processing, which involves scanning, parsing, and validating a given query. The aims of query decomposition are to transform a high-level query into a relational algebra query and to check that it is syntactically and semantically correct.
A relational algebra query: relational algebra is a procedural query language, which takes instances of relations as input and yields instances of relations as output. It uses operators (unary or binary) to perform queries; the operators accept relations as their input and yield relations as their output.
The relational algebra is very important for several reasons:
 It provides a formal foundation for relational model operations.
 It is used as a basis for implementing and optimizing queries in the query processing and optimization modules that are integral parts of relational database management systems (RDBMSs).
 Some of its concepts are incorporated into the SQL standard query language for RDBMSs.

The fundamental operations of relational algebra are: Select, Project, Union, Set difference,
Cartesian product, and Rename.
Select Operation (σ): It selects tuples that satisfy the given predicate from a relation.
Notation − σp(r), where σ stands for the selection predicate and r stands for the relation. p is a propositional logic formula which may use connectives like and, or, and not. These terms may use relational operators such as =, ≠, ≥, <, >, ≤.
Example: σsubject = "database"(Books)
Project Operation (∏): It projects the listed column(s) from a relation.
Notation − ∏A1, A2, ..., An (r)
Where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as relation is a set.
Example: ∏subject, author (Books)
Union Operation (∪): It performs binary union between two given relations and is defined as
r ∪ s = { t | t ∈ r or t ∈ s}.
Notation − r U s
Where r and s are either database relations or relation result set (temporary relation).
Example: ∏ author (Books) ∪ ∏ author (Articles)
For a union operation to be valid, the following conditions must hold:
 r and s must have the same number of attributes.
 Attribute domains must be compatible.
 Duplicate tuples are automatically eliminated.

Set Difference (−): The result of set difference operation is tuples, which are present in one
relation but are not in the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.

Example: ∏ author (Books) − ∏ author (Articles)
Cartesian Product (Χ): Combines information of two different relations into one.
Notation − r Χ s
Where r and s are relations and their output will be defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
Example: σauthor = ‘Jonathan’(Books Χ Articles)
Rename Operation (ρ)(rho): The results of relational algebra are also relations but without
any name. The rename operation allows us to rename the output relation.
Notation − ρ x (E)
Where the result of expression E is saved with name of x.
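
Before moving to the additional operations, here is a rough Python illustration of the fundamental operators (not how an RDBMS implements them): relations are modelled as lists of dicts, with duplicates removed to respect set semantics.

def select(relation, predicate):
    """sigma_predicate(relation): keep tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """pi_attributes(relation): keep only the named columns, drop duplicates."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

def union(r, s):
    """r UNION s: assumes both relations have the same attributes."""
    result = list(r)
    for t in s:
        if t not in result:
            result.append(t)
    return result

books = [{"subject": "database", "author": "Date"},
         {"subject": "networks", "author": "Tanenbaum"}]
print(select(books, lambda t: t["subject"] == "database"))
print(project(books, ["author"]))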
Additional operations are: Set intersection, Assignment and Join Operations.
 Join Operations: A Join operation combines related tuples from different relations,
if and only if a given join condition is satisfied. It is denoted by ⋈.
 A NATURAL JOIN is a JOIN operation that creates an implicit join clause for you based on the common columns in the two tables being joined. Common columns are columns that have the same name in both tables. A NATURAL JOIN can be an INNER join, a LEFT OUTER join, or a RIGHT OUTER join.
 Set intersection (∩): The intersection operator gives the common tuples between the two relations being intersected. The two relations must be union-compatible for the intersection operator to work. Intersection also removes all duplicates before displaying the result.
 Assignment (←): Stores the result of a relational algebra expression in a temporary relation variable.

Aggregate Functions and Grouping


A type of request that cannot be expressed in the basic relational algebra is to specify
mathematical aggregate functions on collections of values from the database. Examples of such
functions include retrieving the average or total salary of all employees or the total number of
employee tuples. Common functions applied to collections of numeric values include SUM,
AVERAGE, MAXIMUM, and MINIMUM. These functions are used in simple statistical queries that summarize information from the database tuples.
o The COUNT function is used for counting tuples or values.
o Use of the functional operator ℱ:
 ℱ MAX Salary (Employee) retrieves the maximum salary value from the Employee relation.
 Other examples: ℱ MIN Salary (Employee), ℱ SUM Salary (Employee), ℱ COUNT SSN, AVERAGE Salary (Employee).
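
In the same Python modelling used earlier (relations as lists of dicts), the ℱ operator's aggregates reduce to ordinary folds; a small sketch with invented data:

employees = [{"ssn": "1", "salary": 30000},
             {"ssn": "2", "salary": 40000}]

salaries = [e["salary"] for e in employees]
print(max(salaries))                  # F MAX Salary (Employee)
print(min(salaries))                  # F MIN Salary (Employee)
print(sum(salaries))                  # F SUM Salary (Employee)
print(len(employees))                 # F COUNT SSN (Employee)
print(sum(salaries) / len(salaries))  # F AVERAGE Salary (Employee)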
Algorithms for Executing Query Operations
 Translating SQL Queries into Relation Algebra
 Algorithms for External Sorting
 Algorithms for SELECT and JOIN Operations
 Algorithms for PROJECT and SET Operations
 Implementing Aggregate Operations and OUTER JOINS
 Combining Operations Using Pipelining
 Using Heuristics in Query Optimization

Using Selectivity and Cost Estimates in Query Optimization


Translating SQL Queries into Relational Algebra
An SQL query is first translated into an equivalent extended relational algebra expression (represented as a query tree) that is then optimized.
 Query block: The basic unit that can be translated into the algebraic
operators and optimized.
 A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and HAVING clause if these are part
of the block.
 Nested queries within a query are identified as separate query blocks.
 Aggregate operators in SQL must be included in the extended algebra.

Example: the following is an example of translating a given nested SQL query into an
equivalent relational algebra expression.

Query Optimization
In first generation network and hierarchical database systems, the low-level procedural query language is generally embedded in a high-level programming language such as COBOL, and it is the programmer’s responsibility to select the most appropriate execution strategy. In contrast, with declarative languages such as SQL, the user specifies what data is required rather than how it is to be retrieved. This relieves the user of the responsibility of determining, or even knowing, what constitutes a good execution strategy and makes the language more universally usable. Additionally, giving the DBMS the responsibility for selecting the best strategy prevents users from choosing strategies that are known to be inefficient and gives the DBMS more control over system performance.
An important aspect of query processing is query optimization. As there are many equivalent
transformations of the same high-level query, the aim of query optimization is to choose the
one that minimizes resource usage. Generally, we try to reduce the total execution time of the
query, which is the sum of the execution times of all individual operations that make up the
query. However, resource usage may also be viewed as the response time of the query, in which
case we concentrate on maximizing the number of parallel operations. Since the problem is
computationally intractable with a large number of relations, the strategy adopted is generally reduced to finding a near-optimal solution.
Query optimization is the process of choosing a suitable execution strategy for processing a query. An internal representation (query tree or query graph) of the query is created after scanning, parsing, and validating. The aim of query optimization is to choose the strategy that minimizes resource usage; generally, we try to reduce the total execution time of the query, which is the sum of the execution times of all individual operations that make up the query.

Dynamic versus static optimization


There are two choices for when the first three phases of query processing can be carried out.
 One option is to dynamically carry out decomposition and optimization every time
the query is run.
 The alternative option is static query optimization, where the query is parsed,
validated, and optimized once.

Advantages and disadvantages of dynamic and static query optimization


Dynamic
 The advantage of dynamic query optimization arises from the fact that all
information required to select an optimum strategy is up to date.
 The disadvantage is that the performance of the query is affected because the query has to be parsed, validated, and optimized before it can be executed.

Static
 The advantages of static optimization are that the runtime overhead is removed,
and there may be more time available to evaluate a larger number of execution
strategies, thereby increasing the chances of finding a more optimum strategy.

The disadvantages arise from the fact that the execution strategy that is chosen as being optimal
when the query is compiled may no longer be optimal when the query is run.

However, a hybrid approach could be used to overcome this disadvantage, where the query is re-
optimized if the system detects that the database statistics have changed significantly since the
query was last compiled.

Figure 2.2: Typical steps in processing a high level query


Techniques of query optimization
There are two main techniques for query optimization, although the two strategies are usually combined in practice. The first technique uses heuristic rules that order the operations in a query. The other technique systematically estimates the cost of different execution strategies and chooses the one with the lowest cost estimate.

Using Heuristics in Query Optimization


The heuristic approach to query optimization uses transformation rules to convert one relational algebra expression into an equivalent form that is known to be more efficient.

Process for heuristics optimization


 The parser of a high-level query generates an initial internal representation.
 Heuristic rules are applied to optimize the internal representation.
 A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.

The main heuristic is to apply first the operations that reduce the size of intermediate results.
• E.g., apply SELECT and PROJECT operations before applying the JOIN or other binary operations (see the sketch below).
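
A toy Python illustration of why this heuristic helps (the relation sizes are invented): filtering before the Cartesian product shrinks the intermediate result dramatically.

# Unoptimized: product first, then select -- intermediate size |R| * |S|.
R = [{"a": i} for i in range(300)]
S = [{"b": j} for j in range(300)]

big = [(r, s) for r in R for s in S]             # 90,000 pairs materialized
late = [(r, s) for (r, s) in big if r["a"] == 0 and s["b"] == 0]

# Optimized: select first -- the product sees 1 x 1 tuples instead.
small_R = [r for r in R if r["a"] == 0]          # 1 tuple
small_S = [s for s in S if s["b"] == 0]          # 1 tuple
early = [(r, s) for r in small_R for s in small_S]

assert late == early                             # same answer, far less work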

Query tree
A query tree is a tree data structure that corresponds to a relational algebra expression. It
represents the input relations of the query as leaf nodes of the tree, and represents the relational
algebra operations as internal nodes. An execution of the query tree consists of executing an
internal node operation whenever its operands are available and then replacing that internal node
by the relation that results from executing the operation.
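
A minimal sketch of a query tree in Python (the node classes are invented for illustration): leaves hold input relations, internal nodes hold operations, and evaluation replaces an internal node by the relation its operation produces.

class Leaf:
    def __init__(self, relation):
        self.relation = relation
    def evaluate(self):
        return self.relation

class OpNode:
    def __init__(self, operation, *children):
        self.operation = operation      # e.g., a select or join function
        self.children = children
    def evaluate(self):
        # execute children first (the operands), then apply this node's op
        inputs = [c.evaluate() for c in self.children]
        return self.operation(*inputs)

# sigma_{a=1}(R): a one-operator tree over a single leaf relation
tree = OpNode(lambda r: [t for t in r if t["a"] == 1],
              Leaf([{"a": 1}, {"a": 2}]))
print(tree.evaluate())                  # [{'a': 1}]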

Query graph
A query graph is a graph data structure that corresponds to a relational calculus expression. It does not indicate an order in which the operations are to be performed. There is only a single graph corresponding to each query.

Example:
For every project located in ‘Stafford’, retrieve the project number, the controlling department number, and the department manager’s last name, address, and birthdate.

SQL query:

Relational algebra:
∏PNUMBER, DNUM, LNAME, ADDRESS, BDATE (((σPLOCATION=‘Stafford’ (PROJECT)) ⋈DNUM=DNUMBER (DEPARTMENT)) ⋈MGRSSN=SSN (EMPLOYEE))

Figure 2.3: (a) Query tree for the relational algebra, (b) Query tree for SQL query

Heuristic Optimization of Query Trees:


 The same query could correspond to many different relational algebra expressions and
hence many different query trees.
 The task of heuristic optimization of query trees is to find a final query tree that is efficient
to execute.

Heuristic Transformation Rules for the Relational Algebra Operations


1. Cascade of σ: A conjunctive selection condition can be broken up into a cascade (sequence) of individual σ operations:
σc1 AND c2 AND ... AND cn (R) ≡ σc1 (σc2 (... (σcn (R)) ...))

2. Commutativity of σ: The σ operation is commutative:
σc1 (σc2 (R)) ≡ σc2 (σc1 (R))

3. Cascade of ∏: In a cascade (sequence) of ∏ operations, all but the last one can be ignored:
∏List1 (∏List2 (... (∏Listn (R)) ...)) ≡ ∏List1 (R)

4. Commuting σ with ∏: If the selection condition c involves only the attributes A1, ..., An in the projection list, the two operations can be commuted:
∏A1, ..., An (σc (R)) ≡ σc (∏A1, ..., An (R))

5. Commutativity of ⋈ (and ×): The ⋈ operation is commutative, as is the × operation:
R ⋈c S ≡ S ⋈c R;  R × S ≡ S × R

6. Commuting σ with ⋈ (or ×): If all the attributes in the selection condition c involve only the attributes of one of the relations being joined—say, R—the two operations can be commuted as follows:
σc (R ⋈ S) ≡ (σc (R)) ⋈ S

7. Commuting ∏ with ⋈ (or ×): Suppose that the projection list is L = {A1, ..., An, B1, ..., Bm}, where A1, ..., An are attributes of R and B1, ..., Bm are attributes of S. If the join condition c involves only attributes in L, the two operations can be commuted as follows:
∏L (R ⋈c S) ≡ (∏A1, ..., An (R)) ⋈c (∏B1, ..., Bm (S))
If the join condition c contains additional attributes not in L, these must be added to the projection list, and a final ∏ operation is needed.

8. Commutativity of set operations: The set operations ∪ and ∩ are commutative, but − is not.

9. Associativity of ⋈, ×, ∪, and ∩: These four operations are individually associative; that is, if θ stands for any one of these four operations (throughout the expression), we have:
(R θ S) θ T ≡ R θ (S θ T)

10. Commuting σ with set operations: The σ operation commutes with ∪, ∩, and −. If θ stands for any one of these three operations, we have:
σc (R θ S) ≡ (σc (R)) θ (σc (S))

11. The ∏ operation commutes with ∪:
∏L (R ∪ S) ≡ (∏L (R)) ∪ (∏L (S))

12. Converting a (σ, ×) sequence into ⋈: If the condition c of a σ that follows a × corresponds to a join condition, convert the (σ, ×) sequence into a ⋈ as follows:
σc (R × S) ≡ R ⋈c S

Outline of a Heuristic Algebraic Optimization Algorithm:


1. Using rule 1, break up any select operations with conjunctive conditions into a cascade of select
operations.

2. Using rules 2, 4, 6, and 10 concerning the commutativity of select with other operations,
move each select operation as far down the query tree as is permitted by the attributes involved in
the select condition.

3. Using rule 9 concerning associativity of binary operations, rearrange the leaf nodes of the tree
so that the leaf node relations with the most restrictive select operations are executed first in the
query tree representation.

4. Using Rule 12, combine a Cartesian product operation with a subsequent select operation
in the tree into a join operation.

5. Using rules 3, 4, 7, and 11 concerning the cascading of project and the commuting of project
with other operations, break down and move lists of projection attributes down the tree as far as
possible by creating new project operations as needed.

6. Identify subtrees that represent groups of operations that can be executed by a single algorithm.

Summary of Heuristics for Algebraic Optimization:


1. The main heuristic is to apply first the operations that reduce the size of intermediate results.

2. Perform select operations as early as possible to reduce the number of tuples and perform
project operations as early as possible to reduce the number of attributes. (This is done by moving
select and project operations as far down the tree as possible.)

3. The select and join operations that are most restrictive should be executed before other similar
operations. (This is done by reordering the leaf nodes of the tree among themselves and
adjusting the rest of the tree appropriately.)

Query Execution Plans


Execution plans can tell you how a query will be executed, or how a query was executed. An
execution plan for a relational algebra query consists of a combination of the relational algebra
query tree and information about the access methods to be used for each relation as well as the
methods to be used in computing the relational operators stored in the tree.

Using Selectivity and Cost Estimates in Query Optimization


Cost-based query optimization
Cost-based query optimization compares different strategies based on their relative costs (the amount of time the query needs to run) and selects and executes the one that minimizes the cost. The cost of a strategy is just an estimate based on how many CPU and I/O resources the query will use. The optimizer estimates and compares the costs of executing a query using different execution strategies and chooses the strategy with the lowest cost estimate.

Cost Components for Query Execution


1. Access cost to secondary storage

2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
NB: Different database systems may focus on different cost components.
Catalog Information Used in Cost Functions
• Information about the size of a file
 number of records (tuples) (r),
 record size (R),
 number of blocks (b)
 blocking factor (bfr)

• Information about indexes and indexing attributes of a file


 Number of levels (x) of each multilevel index
 Number of first-level index blocks (bI1)
 Number of distinct values (d) of an attribute
 Selectivity (sl) of an attribute
 Selection cardinality (s) of an attribute. (s = sl * r)
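
A back-of-the-envelope Python sketch of how these catalog statistics combine (the formulas are simplified; real optimizers use more detailed models, and the example numbers are invented):

def selection_cardinality(r, d):
    """s = sl * r, assuming a uniform distribution over d distinct values."""
    sl = 1 / d              # selectivity of an equality condition
    return sl * r

def blocks(r, bfr):
    """b = ceil(r / bfr): blocks needed for r records, bfr records per block."""
    return -(-r // bfr)

# e.g., 10,000 records, 50 distinct department numbers, 20 records/block
print(selection_cardinality(10000, 50))   # expect ~200 matching tuples
print(blocks(10000, 20))                  # 500 blocks for a full scan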

Semantic Query Optimization


Semantic query optimization is the process of transforming a query issued by a user into a
different query which, because of the semantics of the application, is guaranteed to yield the correct
answer for all states of the database. Semantic Query Optimization uses constraints specified
on the database schema in order to modify one query into another query that is more efficient to
execute.

Consider the following SQL query,

Explanation:
 Suppose that we had a constraint on the database schema that stated that no employee
can earn more than his or her direct supervisor. If the semantic query optimizer checks for
the existence of this constraint, it need not execute the query at all because it knows
that the result of the query will be empty.

Techniques known as theorem proving can be used for this purpose.
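
A toy Python sketch of the idea (the constraint and condition strings are invented; from the explanation above, the elided query evidently asks for employees who earn more than their supervisors):

# Constraint declared on the schema, as stated in the explanation above.
constraints = {"employee.salary <= supervisor.salary"}

def semantically_optimize(condition):
    # A toy 'theorem prover': the query condition directly contradicts a
    # schema constraint, so the result is provably empty.
    if (condition == "employee.salary > supervisor.salary"
            and "employee.salary <= supervisor.salary" in constraints):
        return []              # answer without touching the database
    return None                # otherwise, process the query normally

print(semantically_optimize("employee.salary > supervisor.salary"))  # []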

Chapter Three
Transaction Processing Concepts
Introduction
A database management system (DBMS) is a software package/system that facilitates the creation and maintenance of databases. DBMSs support different types of databases. Databases can be classified according to the number of users, the database location(s), and the expected type and extent of use. Depending on the number of users, databases are classified into single-user databases and multiuser databases.
 Single user databases: At most one user at a time can use the system
 Multiuser databases: Many users can access the system concurrently.

Transaction and System Concepts


Concurrency means multiple computations are happening at the same time.
Concurrent processing describes two tasks occurring asynchronously, meaning the order in
which the tasks are executed is not predetermined. This could be done in two ways:
Interleaved processing: Concurrent execution of processes is interleaved in a single CPU and
Parallel processing: Processes are concurrently executed in multiple CPUs. Many DBMSs allow
users to undertake simultaneous operations on the database. If these operations are not controlled,
the accesses may interfere with one another and the database can become inconsistent.
Among the various DBMS functions, three closely related functions are intended to ensure that the database is reliable and remains in a consistent state:
 transaction support,
 concurrency control services, and
 recovery services.

Transaction Support
Transaction: An action, or series of actions, carried out by a single user or application
program, which reads or updates the contents of the database. A transaction is a logical unit of
work on the database. It may be an entire program, a part of a program, or a single command (for
example, the SQL command INSERT or UPDATE), and it may involve any number of operations
on the database. In the database context, the execution of an application program can be thought of as one or more transactions with non-database processing taking place in between. A
transaction should always transform the database from one consistent state to another.

Basic operations are read and write:


 read_item(X): Reads a database item named X into a program variable. To simplify our
notation, we assume that the program variable is also named X.
 write_item(X): Writes the value of program variable X into the database item named X.

Read and Write Operations


 The basic unit of data transfer from the disk to the computer main memory is one block. In general, a data item (that is read or written) will be the field of some record in the database, although it may be a larger unit such as a record or even a whole block.
 The read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the buffer to the program variable named X.
 The write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the program variable named X into its correct location in the buffer.
4. Store the updated block from the buffer back to disk (either immediately or at some later point in time).
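
The steps above can be sketched in Python with a toy buffer pool (disk and buffer are plain dicts here; a real DBMS manages fixed-size blocks, replacement policies, and deferred flushing):

disk = {"block7": {"X": 100}}        # block address -> block contents
buffer = {}                          # main-memory buffer pool

def read_item(name, block):
    if block not in buffer:          # copy the block into a buffer if absent
        buffer[block] = dict(disk[block])
    return buffer[block][name]       # copy the item into the program variable

def write_item(name, value, block):
    if block not in buffer:
        buffer[block] = dict(disk[block])
    buffer[block][name] = value      # update the item in the buffer
    disk[block] = dict(buffer[block])  # store the block back to disk (eagerly here)

X = read_item("X", "block7")
write_item("X", X + 50, "block7")
print(disk["block7"]["X"])           # 150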

Example:
Two sample transactions illustrate the concept of a transaction:
a. Transaction T1
b. Transaction T2

A transaction should always transform the database from one consistent state to another,
although we accept that consistency may be violated while the transaction is in progress. A
transaction can have one of two outcomes. If it completes successfully, the transaction is said to
have committed and the database reaches a new consistent state. On the other hand, if the
transaction does not execute successfully, the transaction is aborted. If a transaction is aborted, the
database must be restored to the consistent state it was in before the transaction started.

Such a transaction is rolled back or undone. A committed transaction cannot be aborted; if we decide that the committed transaction was a mistake, we must perform another compensating transaction to reverse its effects. However, an aborted transaction that is rolled back can be restarted later and, depending on the cause of the failure, may successfully execute and commit at that time.

ii. Concurrency Control


Many DBMSs allow users to undertake simultaneous operations on the database. If these
operations are not controlled, the accesses may interfere with one another and the database can
become inconsistent. To overcome this, the DBMS implements a concurrency control protocol that
prevents database accesses from interfering with one another. Concurrency control is the process
of managing simultaneous operations on the database without having them interfere with one
another.

The Need for Concurrency Control


A major objective in developing a database is to enable many users to access shared data
concurrently. Concurrent access is relatively easy if all users are only reading data, as there is no
way that they can interfere with one another. However, when two or more users are accessing the database simultaneously and at least one is updating data, there may be
interference that can result in inconsistencies. Therefore, we examine three potential problems
caused by concurrency: the lost update problem, the uncommitted dependency problem, and the
inconsistent analysis problem.

The Lost Update Problem: An apparently successfully completed update operation by one user
can be overridden by another user. This occurs when two transactions that access the same database
items have their operations interleaved in a way that makes the value of some database item
incorrect.

Figure 3.1: The lost update problem
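
The interleaving in Figure 3.1 can be replayed step by step in Python (the balances and amounts are invented): both transactions read the same old value of X, so T1's update is lost.

X = 100                 # shared database item

# T1 wants to add 50; T2 wants to subtract 30. Interleaved as in the figure:
t1_local = X            # T1: read_item(X)
t2_local = X            # T2: read_item(X) -- reads the same old value
t1_local += 50          # T1: X := X + 50
X = t1_local            # T1: write_item(X) -> 150
t2_local -= 30          # T2: X := X - 30  -- based on the stale read of 100
X = t2_local            # T2: write_item(X) -> 70, T1's update is lost

print(X)                # 70, although the correct serial result is 120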


The uncommitted dependency problem: also known as the Temporary Update (or Dirty Read) Problem, this occurs when one transaction is allowed to see the intermediate results of another transaction before it has committed. That is, one transaction updates a database item and then fails for some reason, but the updated item is accessed by another transaction before it is changed back to its original value.

Figure 3.2: The Temporary Update (or Dirty Read) Problem

The inconsistent analysis problem: also known as The Incorrect Summary Problem occurs when a
transaction reads several values from the database but a second transaction updates some of them
during the execution of the first. For example, a transaction that is summarizing data in a database
(for example, totalling balances) will obtain inaccurate results if, while it is executing, other
transactions are updating the database. Another example is if one transaction is calculating an
aggregate summary function on a number of records while other transactions are updating some
of these records, the aggregate function may calculate some values before they are updated and
others after they are updated.

Figure 3.3: The Incorrect Summary Problem

iii. Recovery Services
Database recovery is the process of restoring the database to a correct state following a failure.

The failure may be the result of a system crash due to hardware or software errors, a media failure,
such as a head crash, or a software error in the application, such as a logical error in the program
that is accessing the database. It may also be the result of unintentional or intentional corruption
or destruction of data or facilities by system administrators or users. Whatever the underlying cause
of the failure, the DBMS must be able to recover from the failure and restore the database to a
consistent state.

What causes a Transaction to fail (Why recovery is needed?)


1. A computer failure (system crash): A hardware or software error occurs in the computer system during transaction execution. If the hardware crashes, the contents of the computer’s internal memory may be lost.

2. A transaction or system error: Some operation in the transaction may cause it to fail, such as
integer overflow or division by zero. Transaction failure may also occur because of erroneous
parameter values or because of a logical programming error.

In addition, the user may interrupt the transaction during its execution.

3. Local errors or exception conditions detected by the transaction: Certain conditions necessitate
cancellation of the transaction. For example, data for the transaction may not be found. A
condition, such as insufficient account balance in a banking database, may cause a transaction,
such as a fund withdrawal from that account, to be cancelled. A programmed abort in the
transaction causes it to fail.

4. Concurrency control enforcement: The concurrency control method may decide to abort the transaction, to be restarted later, because it violates serializability or because several transactions are in a state of deadlock.

5. Disk failure: Some disk blocks may lose their data because of a read or write malfunction
or because of a disk read/write head crash. This may happen during a read or a write operation of
the transaction.

6. Physical problems and catastrophes: This refers to an endless list of problems that includes
power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and
mounting of a wrong tape by the operator.

A transaction is an atomic unit of work that is either completed in its entirety or not done at all.
For recovery purposes, the system needs to keep track of when the transaction starts,
terminates, and commits or aborts.

Transaction States in DBMS


 Active State: When the instructions of the transaction are running then the transaction is in
active state. If all the ‘read and write’ operations are performed without any error then it goes
to the “partially committed state”; if any instruction fails, it goes to the “failed state”.
 Partially Committed State: After completion of all the read and write operations, the changes are made in main memory or the local buffer. If the changes are made permanent on the database then the state changes to the “committed state”, and in case of failure it goes to the “failed state”.
 Committed State: It is the state when the changes are made permanent on the database, and the transaction is complete and therefore terminated in the “terminated state”.
 Failed State: When any instruction of the transaction fails, or a failure occurs in making the change of data permanent on the database, the transaction goes to the “failed state”.
 Aborted State: After any type of failure, the transaction goes from the “failed state” to the “aborted state”, and since in the previous states the changes were only made to the local buffer or main memory, these changes are deleted or rolled back.
 Terminated State: If there isn’t any roll-back or the transaction comes from the “committed state”, then the system is consistent and ready for a new transaction, and the old transaction is terminated.

Figure 3.4: State transition diagram illustrating the states for transaction execution

Recovery Manager
If a failure occurs during the transaction, then the database could be inconsistent. It is the task of
the recovery manager to ensure that the database is restored to the state it was in before the start
of the transaction, and therefore a consistent state. Recovery manager keeps track of the following
operations for recovery purposes:

 begin_transaction: This marks the beginning of transaction execution.
 read or write: These specify read or write operations on the database items that are executed as part of a transaction.
 end_transaction: This specifies that read and write transaction operations have ended and marks the end limit of transaction execution.
 commit_transaction: This signals a successful end of the transaction so that any changes (updates) executed by the transaction can be safely committed to the database and will not be undone.
 rollback (or abort): This signals that the transaction has ended unsuccessfully, so that any changes or effects that the transaction may have applied to the database must be undone.

Operators used by recovery manager:

 undo: Similar to rollback, except that it applies to a single operation rather than to a whole transaction.
 redo: This specifies that certain transaction operations must be redone to ensure that all the operations of a committed transaction have been applied successfully to the database.

The System Log


Log or Journal: The log keeps track of all transaction operations that affect the values of
database items. This information may be needed to permit recovery from transaction failures.

The log is kept on disk, so it is not affected by any type of failure except for disk or catastrophic
failure. In addition, the log is periodically backed up to archival storage (tape) to guard against
such catastrophic failures.

Example: Sample system log record


 T in the following discussion refers to a unique transaction-id that is generated
automatically by the system and is used to identify each transaction:
 Types of log record:
 [start_transaction,T]: Records that transaction T has started execution.
 [write_item,T,X,old_value,new_value]: Records that transaction T has
changed the value of database item X from old_value to new_value.
 [read_item,T,X]: Records that transaction T has read the value of
database item X.
 [commit,T]: Records that transaction T has completed successfully, and
affirms that its effect can be committed (recorded permanently) to the
database.
 [abort,T]: Records that transaction T has been aborted.

Recovery using log records


If the system crashes, we can recover to a consistent database state by examining the log:

1. Because the log contains a record of every write operation that changes the value of some database item, it is possible to undo the effect of these write operations of a transaction T by tracing backward through the log and resetting all items changed by a write operation of T to their old_values.

2. We can also redo the effect of the write operations of a transaction T by tracing forward through the log and setting all items changed by a write operation of T (that did not get done permanently) to their new_values.
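
A compact Python sketch of this log-based undo/redo (the record formats follow the log entry types listed earlier; the database and committed set are simplified to in-memory dicts, and the log contents are invented):

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 100, 150),   # (op, T, X, old_value, new_value)
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 20, 25),
    ("commit", "T1"),
    # crash: T2 never committed
]

db = {"X": 150, "Y": 25}                    # state on disk after the crash
committed = {t for (op, t, *_) in log if op == "commit"}

# UNDO: trace backward, reset items written by uncommitted transactions.
for entry in reversed(log):
    if entry[0] == "write_item" and entry[1] not in committed:
        _, _, item, old, _ = entry
        db[item] = old

# REDO: trace forward, reapply the writes of committed transactions.
for entry in log:
    if entry[0] == "write_item" and entry[1] in committed:
        _, _, item, _, new = entry
        db[item] = new

print(db)                                   # {'X': 150, 'Y': 20}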

Commit Point of a Transaction


• A transaction T reaches its commit point when all its operations that access the database have
been executed successfully and the effect of all the transaction operations on the database has been
recorded in the log. Beyond the commit point, the transaction is said to be committed, and its effect
is assumed to be permanently recorded in the database.
The transaction then writes an entry [commit,T] into the log.
Rollback of transactions
• Rollback is needed for transactions that have a [start_transaction,T] entry in the log but no commit entry [commit,T] in the log.
Redoing transactions
• Transactions that have written their commit entry in the log must also have recorded all their
write operations in the log; otherwise they would not be committed, so their effect on the database
can be redone from the log entries. (Notice that the log file must be kept on disk. At the time of a
system crash, only the log entries that have been written back to disk are considered in the recovery
process because the contents of main memory may be lost.)

Force writing a log


• Before a transaction reaches its commit point, any portion of the log that has not been written to
the disk yet must now be written to the disk. This process is called force-writing the log file before
committing a transaction.
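
In code terms, force-writing amounts to flushing every buffered log record to stable storage before the commit record is considered durable. A minimal Python sketch, assuming an append-only log file (the file name and record format are illustrative):

import os

def force_write(log_file, record):
    # Append the record, then push it through to disk before the
    # transaction is allowed to commit.
    log_file.write(record + "\n")
    log_file.flush()                 # empty the user-space buffer
    os.fsync(log_file.fileno())      # ask the OS to write to disk

with open("system.log", "a") as log_file:
    force_write(log_file, "[commit,T1]")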
Desirable Properties of Transactions
Transactions should possess several properties, often called the ACID properties; they should be
enforced by the concurrency control and recovery methods of the DBMS. The following are the
ACID properties:

• Atomicity: A transaction is an atomic unit of processing; it should either be performed in its entirety or not performed at all.

• Consistency preservation: A transaction should be consistency preserving, meaning that if it is completely executed from beginning to end without interference from other transactions, it should take the database from one consistent state to another.

• Isolation: A transaction should appear as though it is being executed in isolation from other transactions, even though many transactions are executing concurrently. That is, the execution of a transaction should not be interfered with by any other transactions executing concurrently.

• Durability or permanency: The changes applied to the database by a committed transaction must persist in the database. These changes must not be lost because of any failure.

Schedules and Recoverability


The objective of a concurrency control protocol is to schedule transactions in such a way as to
avoid any interference between them, and hence prevent the types of problem described in the
previous section. One obvious solution is to allow only one transaction to execute at a time: one
transaction is committed before the next transaction is allowed to begin.

However, the aim of a multi-user DBMS is also to maximize the degree of concurrency or
parallelism in the system, so that transactions that can execute without interfering with one another
can run in parallel. For example, transactions that access different parts of the database can be
scheduled together without interference.

A schedule is a sequence of the operations performed by a set of concurrent transactions that preserves the order of the operations in each of the individual transactions. A transaction comprises a sequence of operations consisting of read and/or write actions on the database, followed by a commit or abort action. A schedule S consists of a sequence of the operations from a set of n transactions T1, T2, . . . , Tn, subject to the constraint that the order of operations for each transaction is preserved in the schedule. Thus, for each transaction Ti in schedule S, the order of the operations in Ti must be the same in schedule S.

Types of Schedules Based on Recoverability in DBMS

1. Recoverable Schedule: A schedule in which no committed transaction ever needs to be rolled back. A schedule S is recoverable if no transaction T in S commits until all transactions T' that have written an item that T reads have committed; that is, for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then the commit operation of Ti precedes the commit operation of Tj.

2. Cascadeless Schedule: A schedule in which every transaction reads only items that were written by already committed transactions. Because no transaction ever reads uncommitted data, the failure of one transaction can never force the rollback of another.

3. Schedules Requiring Cascaded Rollback (Cascaded Abort): A schedule in which uncommitted transactions that have read an item written by a failed transaction must themselves be rolled back. For example, if T2 reads a data item written by T1 and T1 aborts before committing, T2 must also be rolled back; this is a cascading rollback.

4. Strict Schedule: A schedule in which a transaction can neither read nor write an item X until the last transaction that wrote X has committed (or aborted). Strict schedules simplify recovery, because undoing an aborted transaction only requires restoring the old values of the items it wrote.
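
The four types can be illustrated with short example schedules, writing ri(X)/wi(X) for a read/write of item X by transaction Ti and ci/ai for its commit/abort (the examples are illustrative, not from the original text):

 Not recoverable: r1(X); w1(X); r2(X); c2; a1. T2 commits after reading uncommitted data from T1, which later aborts, so T2's commit cannot be undone safely.
 Recoverable, but with cascading rollback: r1(X); w1(X); r2(X); a1. T2 has not yet committed, but it must be rolled back when T1 aborts.
 Cascadeless: r1(X); w1(X); c1; r2(X); c2. T2 reads X only after T1 has committed.
 Strict: as in the cascadeless case, with T2 additionally delaying any write of X until after c1.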

Characterizing Schedules based on Serializability


Serial schedule: a schedule where the operations of each transaction are executed consecutively, without any interleaved operations from other transactions. The transactions are performed in serial order: for two transactions T1 and T2, the serial order would be T1 followed by T2, or T2 followed by T1. A non-serial schedule is a schedule where the operations from a set of concurrent transactions are interleaved.

Serializability
A schedule S is serializable if it is equivalent to some serial schedule of the same n transactions.

The objective of serializability is to find non-serial schedules that allow transactions to execute
concurrently without interfering with one another, and thereby produce a database state that could
be produced by a serial execution. In serializability, the ordering of read and write operations
is important:

 If two transactions only read a data item, they do not conflict and their order is not important.
 If two transactions either read or write completely separate data items, they do not conflict and their order is not important.
 If one transaction writes a data item and another either reads or writes the same data item, the order of execution is important.

Equivalence of Schedules


1. Result equivalent: Two schedules are called result equivalent if they produce the same final
state of the database.

2. Conflict equivalent: Two schedules are said to be conflict equivalent if the order of any two
conflicting operations is the same in both schedules. Two operations in a schedule are said to
conflict if they belong to different transactions, access the same database item, and either both are
write_item operations or one is a write_item and the other a read_item. If two conflicting
operations are applied in different orders in two schedules, the effect can be different on the
database or on the transactions in the schedule, and hence the schedules are not conflict equivalent.

3. Conflict serializable: Using the notion of conflict equivalence, we define a schedule S to be conflict serializable if it is conflict equivalent to some serial schedule S'. A conflict-serializable schedule orders any conflicting operations in the same way as some serial execution.

Being serializable is not the same as being serial. Being serializable implies that the schedule is correct:

 It will leave the database in a consistent state.
 The interleaving is appropriate and will result in a state as if the transactions were executed serially, yet it achieves efficiency due to concurrent execution.

Testing for Conflict Serializability of a Schedule


There is a simple algorithm for determining whether a particular schedule is conflict serializable. Most concurrency control methods do not actually test for serializability; rather, protocols (rules) are developed that guarantee that any schedule following them will be serializable. The standard test builds a precedence graph from the schedule's conflicting operations and checks it for cycles, as sketched below.
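
A minimal sketch of this test in Python, assuming a schedule is represented as a list of (transaction, action, item) tuples with action 'r' or 'w' (an illustration of the textbook technique, not code from this handout):

def conflict_serializable(schedule):
    # Add an edge Ti -> Tj whenever an operation of Ti precedes and
    # conflicts with an operation of Tj: different transactions, the
    # same item, and at least one of the two operations is a write.
    graph = {}
    for i, (ti, ai, xi) in enumerate(schedule):
        for (tj, aj, xj) in schedule[i + 1:]:
            if ti != tj and xi == xj and 'w' in (ai, aj):
                graph.setdefault(ti, set()).add(tj)

    # The schedule is conflict serializable iff the precedence graph
    # is acyclic; detect cycles with a depth-first search.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for (t, _, _) in schedule}

    def cyclic(u):
        colour[u] = GREY                 # u is on the current DFS path
        for v in graph.get(u, ()):
            if colour[v] == GREY or (colour[v] == WHITE and cyclic(v)):
                return True
        colour[u] = BLACK
        return False

    return not any(colour[t] == WHITE and cyclic(t) for t in list(colour))

# r1(X) w2(X) w1(X): edges T1->T2 and T2->T1 form a cycle.
print(conflict_serializable([(1, 'r', 'X'), (2, 'w', 'X'), (1, 'w', 'X')]))   # False
# r1(X) w1(X) w2(X): only T1->T2, so the schedule is conflict serializable.
print(conflict_serializable([(1, 'r', 'X'), (1, 'w', 'X'), (2, 'w', 'X')]))   # True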

View Equivalence and View Serializability
Another, less restrictive, definition of equivalence of schedules is called view equivalence, which leads to a definition of serializability called view serializability. Two schedules S and S' are said to be view equivalent if the following three conditions hold:

1. The same set of transactions participates in S and S’, and S and S’ include the same operations
of those transactions.

2. For any operation Ri(X) of Ti in S, if the value of X read by the operation has been written by
an operation Wj(X) of Tj (or if it is the original value of X before the schedule started), the same
condition must hold for the value of X read by operation Ri(X) of Ti in S’.

3. If the operation Wk(Y) of Tk is the last operation to write item Y in S, then Wk(Y) of Tk must
also be the last operation to write item Y in S’.

The idea behind view equivalence is that, as long as each read operation of a transaction reads the result of the same write operation in both schedules, the write operations of each transaction must produce the same results. The read operations are hence said to see the same view in both schedules. A schedule S is said to be view serializable if it is view equivalent to a serial schedule.
Relationship between view and conflict equivalence
The definitions of conflict serializability and view serializability are similar if a condition known
as the constrained write assumption (or no blind writes) holds on all transactions in the schedule.
This condition states that any write operation wi(X) in Ti is preceded by a ri(X) in Ti and that the
value written by wi(X) in Ti depends only on the value of X read by ri(X).

This assumes that the computation of the new value of X is a function f(X) based on the old value of X read from the database. A blind write is a write operation in a transaction T on an item X that is not dependent on the old value of X, so it is not preceded by a read of X in the transaction T.

Conflict serializability is stricter than view serializability. With unconstrained write (or blind
write), a schedule that is view serializable is not necessarily conflict serializable. Any conflict
serializable schedule is also view serializable, but not vice versa.
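
A classic example: the schedule S: r1(X); w2(X); w1(X); w3(X); c1; c2; c3 is view equivalent to the serial schedule T1, T2, T3 (r1(X) reads the initial value of X in both, and w3(X) is the final write of X in both), so S is view serializable. It is not conflict serializable, however, because the conflicting pairs r1(X)/w2(X) and w2(X)/w1(X) put both the edge T1 → T2 and the edge T2 → T1 into the precedence graph. The blind writes w2(X) and w3(X) are exactly what the constrained write assumption forbids.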

Other Types of Equivalence of Schedules


Serializability of schedules is sometimes considered to be too restrictive as a condition for
ensuring the correctness of concurrent executions. Some applications can produce schedules that
are correct by satisfying conditions less stringent than either conflict serializability or view
serializability. An example is the type of transactions known as debit-credit transactions—for
example, those that apply deposits and withdrawals to a data item whose value is the current
balance of a bank account.

The semantics of debit-credit operations is that they update the value of a data item X by either
subtracting from or adding to the value of the data item. Because addition and subtraction
operations are commutative—that is, they can be applied in any order—it is possible to produce
correct schedules that are not serializable.
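
For instance, if the balance X is initially 300, T1 credits 100, and T2 debits 50, then applying the two atomic updates in either order leaves X = 300 + 100 − 50 = 350; an interleaving of such operations can therefore be correct even when it is not serializable.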

Transaction Support in SQL


The basic definition of an SQL transaction is similar to our already defined concept of a
transaction. That is, it is a logical unit of work and is guaranteed to be atomic. A single SQL
statement is always considered to be atomic—either it completes execution without an error or it
fails and leaves the database unchanged.

With SQL, there is no explicit Begin_Transaction statement. Transaction initiation is done implicitly when particular SQL statements are encountered. However, every transaction must have an explicit end statement, which is either a COMMIT or a ROLLBACK. Every transaction has certain characteristics attributed to it, which are specified by a SET TRANSACTION statement in SQL. The characteristics are the access mode, the diagnostic area size, and the isolation level.
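
For instance, a transaction's characteristics might be declared with a statement such as SET TRANSACTION READ WRITE, ISOLATION LEVEL SERIALIZABLE, DIAGNOSTICS SIZE 5 (this follows the SQL standard's syntax; the exact clauses supported vary between DBMS products).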

The access mode can be specified as READ ONLY or READ WRITE. The default is READ WRITE, unless the isolation level READ UNCOMMITTED is specified, in which case READ ONLY is assumed. A mode of READ WRITE allows select, update, insert, delete, and create commands to be executed. A mode of READ ONLY, as the name implies, is simply for data retrieval.

This transaction consists of first inserting a new row into the EMPLOYEE table and then updating the salary of all employees who work in department 2. If an error occurs on any of the SQL statements, the entire transaction is rolled back: any salary updated by this transaction would be restored to its previous value, and the newly inserted row would be removed.
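
The original code listing for this transaction is not reproduced here; the following is an illustrative sketch only, written against SQLite from Python, in which the table and column names (EMPLOYEE, Fname, Dno, Salary) and the sample row are assumptions:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Fname TEXT, Dno INTEGER, Salary REAL)")
conn.commit()

try:
    # The transaction starts implicitly with the first data-changing
    # statement, mirroring SQL's implicit transaction initiation.
    conn.execute(
        "INSERT INTO EMPLOYEE (Fname, Dno, Salary) VALUES (?, ?, ?)",
        ("Robert", 2, 35000),
    )
    # Raise the salary of everyone who works in department 2 by 10%.
    conn.execute("UPDATE EMPLOYEE SET Salary = Salary * 1.1 WHERE Dno = 2")
    conn.commit()       # explicit end: COMMIT makes both changes permanent
except sqlite3.Error:
    conn.rollback()     # explicit end: ROLLBACK undoes both statements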
