Module4 PartA
Module 4: QUERY PROCESSING AND TRANSACTION PROCESSING
Translating SQL Queries into Relational Algebra – heuristic query optimization – Introduction to Transaction Processing – Transaction and System concepts – Desirable properties of Transactions – Characterizing schedules based on recoverability – Characterizing schedules based on serializability
Outline
Relational Algebra
• Unary Relational Operations
• Relational Algebra Operations From Set Theory
• Binary Relational Operations
• Additional Relational Operations
• Examples of Queries in Relational Algebra
Relational Algebra
Unary Relational Operations
SELECT (symbol: σ)
PROJECT (symbol: π)
RENAME (symbol: ρ)
Example Schema for illustration
SELECT (σ)
Select Operation - Example
Projection (π)
$\pi_{LNAME,\,FNAME,\,SALARY}(EMPLOYEE)$
Illustration – Select and Project Operations for Company Schema
Rename (ρ)
Rename (ρ)
(contd.)
Example:
$\pi_{FNAME,\,LNAME,\,SALARY}(\sigma_{DNO=5}(EMPLOYEE))$
Or we can explicitly show the sequence of operations, giving a name to each intermediate relation:
$DEP5\_EMPS \leftarrow \sigma_{DNO=5}(EMPLOYEE)$
$RESULT \leftarrow \pi_{FNAME,\,LNAME,\,SALARY}(DEP5\_EMPS)$
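A minimal Python sketch of the three unary operations, assuming a relation is modeled as a list of dicts (one dict per tuple). The EMPLOYEE rows and the new attribute names FIRST_NAME and LAST_NAME are hypothetical. It checks that the nested expression and the step-by-step version with the named intermediate relation DEP5_EMPS produce the same result.

```python
from typing import Callable

Row = dict  # one tuple of a relation: attribute name -> value

def select(rel: list[Row], pred: Callable[[Row], bool]) -> list[Row]:
    """σ: keep only the tuples that satisfy the selection condition."""
    return [t for t in rel if pred(t)]

def project(rel: list[Row], attrs: list[str]) -> list[Row]:
    """π: keep only the listed attributes; duplicate tuples are removed (set semantics)."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def rename(rel: list[Row], mapping: dict[str, str]) -> list[Row]:
    """ρ: rename attributes (old name -> new name); other attributes are unchanged."""
    return [{mapping.get(a, a): v for a, v in t.items()} for t in rel]

# Hypothetical sample data
EMPLOYEE = [
    {"FNAME": "John", "LNAME": "Smith", "SALARY": 30000, "DNO": 5},
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SALARY": 25000, "DNO": 4},
    {"FNAME": "Ramesh", "LNAME": "Narayan", "SALARY": 38000, "DNO": 5},
]

# One nested expression: π_{FNAME, LNAME, SALARY}(σ_{DNO=5}(EMPLOYEE))
nested = project(select(EMPLOYEE, lambda t: t["DNO"] == 5), ["FNAME", "LNAME", "SALARY"])

# The same query as a sequence of named intermediate relations
DEP5_EMPS = select(EMPLOYEE, lambda t: t["DNO"] == 5)
RESULT = project(DEP5_EMPS, ["FNAME", "LNAME", "SALARY"])

# ρ applied to the result (hypothetical new attribute names)
RENAMED = rename(RESULT, {"FNAME": "FIRST_NAME", "LNAME": "LAST_NAME"})

print(nested == RESULT)  # True
```

Naming intermediate relations changes only readability, not the result.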
Relational Algebra Operations From Set Theory
UNION (∪)
INTERSECTION (∩)
DIFFERENCE (-)
CARTESIAN PRODUCT (×)
Union Operation (∪)
UNION is denoted by the symbol ∪.
It includes all tuples that are in relation A, in relation B, or in both.
It also eliminates duplicate tuples.
A UNION B is expressed as:
$RESULT \leftarrow A \cup B$
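A small Python sketch of the set operations, assuming each relation is a Python set of tuples so that duplicate elimination comes for free. The SSN values are hypothetical sample data for the department-5 example, and the compatibility check is simplified to comparing tuple degrees (a full check would also compare attribute domains).

```python
def check_compatible(r: set, s: set) -> None:
    """Simplified union-compatibility test: all tuples must have the same degree.
    (A full check would also compare attribute domains.)"""
    degrees = {len(t) for t in r | s}
    if len(degrees) > 1:
        raise ValueError("relations are not union compatible")

def union(r: set, s: set) -> set:
    """R ∪ S: tuples in R, in S, or in both; duplicates disappear automatically."""
    check_compatible(r, s)
    return r | s

def intersection(r: set, s: set) -> set:
    """R ∩ S: tuples that appear in both R and S."""
    check_compatible(r, s)
    return r & s

def difference(r: set, s: set) -> set:
    """R - S: tuples in R that are not in S."""
    check_compatible(r, s)
    return r - s

# Hypothetical one-attribute relations (SSN)
RESULT1 = {("123456789",), ("333445555",), ("666884444",), ("453453453",)}  # work in dept 5
RESULT2 = {("333445555",), ("888665555",)}  # directly supervise someone in dept 5

RESULT = union(RESULT1, RESULT2)
print(len(RESULT))  # 5: "333445555" appears in both inputs but is kept once
```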
Union Operation (∪) (contd.)
Example
To retrieve the social security
numbers of all employees who
work in department 5 (Result 1
below) or directly supervise an
employee who works in
department 5 (Result 2 below)
Set Difference (-)
Intersection
Relational Algebra Operations From Set Theory (cont.)
Cartesian Product (×)
$FEMALE\_EMPS \leftarrow \sigma_{SEX='F'}(EMPLOYEE)$
$EMPNAMES \leftarrow \pi_{FNAME,\,LNAME,\,SSN}(FEMALE\_EMPS)$
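A Python sketch of the Cartesian product, assuming relations are lists of dicts with disjoint attribute names. The EMPLOYEE and DEPENDENT rows and the result names EMP_DEPENDENTS and ACTUAL_DEPENDENTS are hypothetical. It continues from EMPNAMES: the product pairs every female employee with every dependent, and a selection on SSN = ESSN then keeps only the meaningful combinations.

```python
from itertools import product

def cartesian(r, s):
    """R × S: every tuple of R combined with every tuple of S
    (assumes R and S have no attribute names in common)."""
    return [{**t, **u} for t, u in product(r, s)]

# Hypothetical sample data
EMPLOYEE = [
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SSN": "999887777", "SEX": "F"},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "987654321", "SEX": "F"},
    {"FNAME": "John", "LNAME": "Smith", "SSN": "123456789", "SEX": "M"},
]
DEPENDENT = [
    {"ESSN": "987654321", "DEPENDENT_NAME": "Abner"},
    {"ESSN": "123456789", "DEPENDENT_NAME": "Alice"},
]

FEMALE_EMPS = [t for t in EMPLOYEE if t["SEX"] == "F"]                          # σ
EMPNAMES = [{a: t[a] for a in ("FNAME", "LNAME", "SSN")} for t in FEMALE_EMPS]  # π (SSN is a key, no duplicates)
EMP_DEPENDENTS = cartesian(EMPNAMES, DEPENDENT)   # 2 × 2 = 4 combined tuples
ACTUAL_DEPENDENTS = [t for t in EMP_DEPENDENTS if t["SSN"] == t["ESSN"]]
print(len(EMP_DEPENDENTS), len(ACTUAL_DEPENDENTS))  # 4 1
```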
Join Operations
Types of Joins
The various forms of the join operation are:
• Inner joins
  – Theta join
  – Equi join
  – Natural join
• Outer joins
  – Left outer join
  – Right outer join
  – Full outer join
SQL Joins
• A SQL join fetches data from two or more tables and combines it into a single result set.
• It combines columns from two or more tables by using values common to both tables.
• Types of Join
  – Inner
  – Outer (Left, Right, Full)
  – Cross
  – Natural
  – Cartesian
Different Types of SQL JOINs
Inner Join
• The INNER JOIN keyword selects records that have matching values in
both tables.
Explicit Inner Join and Cartesian
• The INNER JOIN keyword selects records that have matching values in
both tables.
Implicit Inner Join and Cartesian
• The INNER JOIN keyword selects records that have matching values in
both tables.
2 | BBB | 1 | CHENNAI
2 | BBB | 2 | MUMBAI
4 | DDD | 1 | CHENNAI
2 | BBB | 3 | DELHI
4 | DDD | 2 | MUMBAI
4 | DDD | 3 | DELHI
Cross Join (with Equality)
• The INNER JOIN keyword selects records that have matching values in
both tables.
2 | BBB | 2 | MUMBAI
Left Outer Join
• LEFT (OUTER) JOIN: Returns all records from the left table, and the
matched records from the right table
2 | BBB | MUMBAI
4 | DDD | NULL
Right Outer Join
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the
matched records from the left table
2 | BBB | MUMBAI
3 | NULL | DELHI
Full Outer Join
Inner Join
Inner Join Type: Theta Join
It is denoted by $\bowtie_{\theta}$, where θ is the join condition.
Example: $A \bowtie_{\theta} B$
Join Operation on Department and Employee Relations
Inner Join Type: Equi Join
Natural Join (⋈)
Natural join - Example
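The three inner joins differ only in their condition, which a short Python sketch makes concrete. It assumes relations are lists of dicts; the DEPARTMENT, EMPLOYEE, PROJECT and DEPT rows are hypothetical (DEPT stands for a department relation whose number attribute is already named DNUM), and the natural join reads the schema from the first tuple, a simplification.

```python
from itertools import product

def theta_join(r, s, theta):
    """R ⋈_θ S, computed as σ_θ(R × S). Assumes R and S share no attribute names."""
    return [{**t, **u} for t, u in product(r, s) if theta(t, u)]

def equi_join(r, s, a, b):
    """Equi join: a theta join whose only comparison operator is '=' (R.a = S.b)."""
    return theta_join(r, s, lambda t, u: t[a] == u[b])

def natural_join(r, s):
    """Natural join: equality on every attribute name R and S have in common;
    each common attribute appears only once in the result."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])  # schema taken from the first tuple (sketch only)
    return [{**t, **u} for t, u in product(r, s)
            if all(t[c] == u[c] for c in common)]

# Hypothetical sample relations
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333445555"},
    {"DNAME": "Headquarters", "DNUMBER": 1, "MGRSSN": "888665555"},
]
EMPLOYEE = [
    {"FNAME": "Franklin", "SSN": "333445555", "SALARY": 40000},
    {"FNAME": "James", "SSN": "888665555", "SALARY": 55000},
    {"FNAME": "John", "SSN": "123456789", "SALARY": 30000},
]
PROJECT = [{"PNAME": "ProductX", "DNUM": 5}, {"PNAME": "Reorganization", "DNUM": 1},
           {"PNAME": "Newbenefits", "DNUM": 4}]
DEPT = [{"DNAME": "Research", "DNUM": 5}, {"DNAME": "Headquarters", "DNUM": 1}]

DEPT_MGR = equi_join(DEPARTMENT, EMPLOYEE, "MGRSSN", "SSN")
# A theta condition may combine several comparisons with AND
RICH_MGR = theta_join(DEPARTMENT, EMPLOYEE,
                      lambda d, e: d["MGRSSN"] == e["SSN"] and e["SALARY"] > 45000)
PROJ_DEPT = natural_join(PROJECT, DEPT)  # joins on the shared attribute DNUM
```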
OUTER JOIN
Left Outer Join
Left Outer Join - Example
Right Outer Join
Full Outer Join
Full Outer Join - Example
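A Python sketch of the three outer joins, assuming relations are lists of dicts with disjoint attribute names, None standing in for NULL, and no NULLs in the join attributes. The helper names and the EMPLOYEE/DEPARTMENT rows (which employee manages which department) are hypothetical.

```python
def _nulls(attrs):
    """A padding tuple: None plays the role of NULL."""
    return {a: None for a in attrs}

def left_outer_join(r, s, a, b, s_attrs):
    """R ⟕ S on R.a = S.b: every R tuple appears; unmatched ones are padded with NULLs."""
    out = []
    for t in r:
        matches = [u for u in s if t[a] == u[b]]
        if matches:
            out.extend({**t, **u} for u in matches)
        else:
            out.append({**t, **_nulls(s_attrs)})
    return out

def right_outer_join(r, s, a, b, r_attrs):
    """R ⟖ S on R.a = S.b: every S tuple appears; unmatched ones are padded with NULLs."""
    out = []
    for u in s:
        matches = [t for t in r if t[a] == u[b]]
        if matches:
            out.extend({**t, **u} for t in matches)
        else:
            out.append({**_nulls(r_attrs), **u})
    return out

def full_outer_join(r, s, a, b, r_attrs, s_attrs):
    """R ⟗ S: the left outer join plus the S tuples that matched no R tuple."""
    out = left_outer_join(r, s, a, b, s_attrs)
    r_values = {t[a] for t in r}
    out.extend({**_nulls(r_attrs), **u} for u in s if u[b] not in r_values)
    return out

# Hypothetical data
EMPLOYEE = [{"FNAME": "John", "SSN": "123456789"},
            {"FNAME": "Franklin", "SSN": "333445555"}]
DEPARTMENT = [{"DNAME": "Research", "MGRSSN": "333445555"},
              {"DNAME": "Administration", "MGRSSN": "987654321"}]

LEFT = left_outer_join(EMPLOYEE, DEPARTMENT, "SSN", "MGRSSN", ["DNAME", "MGRSSN"])
RIGHT = right_outer_join(EMPLOYEE, DEPARTMENT, "SSN", "MGRSSN", ["FNAME", "SSN"])
FULL = full_outer_join(EMPLOYEE, DEPARTMENT, "SSN", "MGRSSN",
                       ["FNAME", "SSN"], ["DNAME", "MGRSSN"])
```

John manages nothing, so he survives only in the left and full outer joins; the Administration department survives only in the right and full outer joins.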
Additional Relational Operations
Additional Relational Operations (cont.)
Use of the Functional Operator ℱ
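The functional operator ℱ groups tuples and applies aggregate functions to each group. A minimal Python sketch, assuming relations are lists of dicts; the function `aggregate`, the output attribute names COUNT_SSN and AVERAGE_SALARY, and the EMPLOYEE rows are hypothetical. It mirrors an expression of the form DNO ℱ COUNT SSN, AVERAGE SALARY (EMPLOYEE), plus the ungrouped case.

```python
from collections import defaultdict

def aggregate(rel, group_by, functions):
    """<grouping attributes> ℱ <function list> (R): one result tuple per group.
    `functions` maps an output attribute name to (aggregate function, input attribute)."""
    groups = defaultdict(list)
    for t in rel:
        groups[tuple(t[g] for g in group_by)].append(t)
    result = []
    for key, rows in groups.items():
        out = dict(zip(group_by, key))
        for name, (fn, attr) in functions.items():
            out[name] = fn([t[attr] for t in rows])
        result.append(out)
    return result

def average(values):
    return sum(values) / len(values)

# Hypothetical EMPLOYEE rows
EMPLOYEE = [
    {"SSN": "123456789", "DNO": 5, "SALARY": 30000},
    {"SSN": "333445555", "DNO": 5, "SALARY": 40000},
    {"SSN": "999887777", "DNO": 4, "SALARY": 25000},
    {"SSN": "987654321", "DNO": 4, "SALARY": 43000},
    {"SSN": "888665555", "DNO": 1, "SALARY": 55000},
]

BY_DEPT = aggregate(EMPLOYEE, ["DNO"],
                    {"COUNT_SSN": (len, "SSN"), "AVERAGE_SALARY": (average, "SALARY")})
OVERALL = aggregate(EMPLOYEE, [], {"COUNT_SSN": (len, "SSN")})  # no grouping: one tuple
print(BY_DEPT[0])  # {'DNO': 5, 'COUNT_SSN': 2, 'AVERAGE_SALARY': 35000.0}
```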
Introduction to Query Processing
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Basic Steps in Query Processing (Cont.)
Process for heuristic optimization
Query Representation
Query tree:
A tree data structure that corresponds to a relational
algebra expression.
It represents the input relations of the query as leaf nodes
of the tree and the relational algebra operations as
internal nodes.
An execution of the query tree consists of executing an
internal node operation whenever its operands are
available and then replacing that internal node by the
relation that results from executing the operation.
Query graph:
A graph data structure that corresponds to a relational
calculus expression.
It does not indicate an order in which to perform the operations. There is only a single graph corresponding to each query.
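A minimal Python sketch of query-tree execution, assuming relations are lists of dicts and each internal node stores its operation as a function; the `Node` class and `execute` function are hypothetical names. Children are evaluated first, so an operation runs only once its operands exist, and its result then takes the node's place.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Node:
    """Query-tree node: a leaf holds an input relation, an internal node holds an operation."""
    name: str
    op: Optional[Callable[..., list]] = None   # None for a leaf
    children: list = field(default_factory=list)
    relation: Optional[list] = None            # only set for leaves

def execute(node: Node, trace: list) -> list:
    """Run an internal node's operation once its operands are available (children first);
    the returned relation then stands in for that node."""
    if node.op is None:
        return node.relation
    operands = [execute(child, trace) for child in node.children]
    trace.append(node.name)  # record the order in which operations run
    return node.op(*operands)

# Hypothetical tree for π_{LNAME}(σ_{DNO=5}(EMPLOYEE)); this projection keeps duplicates
EMPLOYEE = [{"LNAME": "Smith", "DNO": 5}, {"LNAME": "Wong", "DNO": 5}, {"LNAME": "Zelaya", "DNO": 4}]
leaf = Node("EMPLOYEE", relation=EMPLOYEE)
sel = Node("σ DNO=5", op=lambda r: [t for t in r if t["DNO"] == 5], children=[leaf])
root = Node("π LNAME", op=lambda r: [{"LNAME": t["LNAME"]} for t in r], children=[sel])

trace: list = []
result = execute(root, trace)
print(trace)   # ['σ DNO=5', 'π LNAME']
```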
1. Translating SQL Queries into
Relational Algebra
Query block: the basic unit that can be
translated into the algebraic operators
and optimized.
A query block contains a single SELECT-
FROM-WHERE expression, as well as
GROUP BY and HAVING clause if these
are part of the block.
Nested queries within a query are
identified as separate query blocks.
Aggregate operators in SQL must be
included in the extended algebra.
Translating SQL Queries into Relational Algebra – Example
Example:
For every project located in ‘Stafford’,
retrieve the project number, the
controlling department number and
the department manager’s last name,
address and birthdate.
Relational algebra:
$\pi_{PNUMBER,\,DNUM,\,LNAME,\,ADDRESS,\,BDATE}(((\sigma_{PLOCATION='STAFFORD'}(PROJECT)) \bowtie_{DNUM=DNUMBER} (DEPARTMENT)) \bowtie_{MGRSSN=SSN} (EMPLOYEE))$
SQL query:
Q2: SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN AND P.PLOCATION = 'STAFFORD';
Query tree
Query graph
2. Heuristic Optimization
Heuristic Optimization of Query Trees:
The same query could correspond to many different relational algebra expressions, and hence to many different query trees.
Steps in optimizing a Query tree
1. Moving SELECT operations down the query tree
2. Applying the more restrictive SELECT operation
first
3. Replacing CARTESIAN PRODUCT and
SELECT with JOIN operations
4. Moving the PROJECT operations down the tree
Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND PNUMBER = PNO AND ESSN = SSN AND BDATE > '1957-12-31';
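To see why these heuristic steps pay off, the Python sketch below runs query Q both ways on tiny hypothetical tables (three EMPLOYEE rows, four WORKS_ON rows, two PROJECT rows). The initial tree forms the full Cartesian product before selecting; the optimized tree applies the selections at the leaves and replaces product plus selection with joins. Both return the same LNAME set, but the largest intermediate relation shrinks. Dates are compared as ISO-format strings, an assumption of this sketch.

```python
from itertools import product

# Hypothetical miniature tables
EMPLOYEE = [
    {"LNAME": "Smith", "SSN": "1", "BDATE": "1965-01-09"},
    {"LNAME": "Wong", "SSN": "2", "BDATE": "1955-12-08"},
    {"LNAME": "English", "SSN": "3", "BDATE": "1972-07-31"},
]
WORKS_ON = [{"ESSN": "1", "PNO": 1}, {"ESSN": "2", "PNO": 1},
            {"ESSN": "3", "PNO": 2}, {"ESSN": "1", "PNO": 2}]
PROJECT = [{"PNAME": "AQUARIUS", "PNUMBER": 1}, {"PNAME": "PRODUCTY", "PNUMBER": 2}]

def merge(*tuples):
    """Concatenate tuples with disjoint attribute names."""
    out = {}
    for t in tuples:
        out.update(t)
    return out

def initial_tree():
    """Cartesian products first, then one SELECT with all conditions, then PROJECT."""
    product_rel = [merge(e, w, p) for e, w, p in product(EMPLOYEE, WORKS_ON, PROJECT)]
    selected = [t for t in product_rel
                if t["PNAME"] == "AQUARIUS" and t["PNUMBER"] == t["PNO"]
                and t["ESSN"] == t["SSN"] and t["BDATE"] > "1957-12-31"]
    return sorted({t["LNAME"] for t in selected}), len(product_rel)

def optimized_tree():
    """SELECTs moved down to the leaves, and (× followed by SELECT) replaced by joins."""
    proj = [p for p in PROJECT if p["PNAME"] == "AQUARIUS"]
    emp = [e for e in EMPLOYEE if e["BDATE"] > "1957-12-31"]
    proj_works = [merge(p, w) for p in proj for w in WORKS_ON if p["PNUMBER"] == w["PNO"]]
    joined = [merge(x, e) for x in proj_works for e in emp if x["ESSN"] == e["SSN"]]
    largest = max(len(proj), len(emp), len(proj_works), len(joined))
    return sorted({t["LNAME"] for t in joined}), largest

# Each call returns (answer, size of the largest intermediate relation)
print(initial_tree())    # (['Smith'], 24)
print(optimized_tree())  # (['Smith'], 2)
```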
Steps in optimizing a Query tree – Illustration
Steps in optimizing a Query tree (Contd.)
Steps in optimizing a Query tree (Contd.)
Using Heuristics in Query Optimization – Transformation rules
Transformation rules (Cont.)
Transformation rules (Cont.)
Transformation rules (Cont.)
Transaction Processing
Transaction - Definition
Transaction Properties – ACID Properties
Transaction Properties -
Example of Fund Transfer
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Durability requirement:
Once the user has been notified that the transaction
has been completed (i.e., the transfer of the $50 has
taken place), the updates to the database by the
transaction must persist even if there are software or
hardware failures.
Transaction Properties -
Example of Fund Transfer
Consistency requirement:
The sum of A and B is unchanged by the execution
of the transaction
In general, consistency requirements include
Explicitly specified integrity constraints such as
primary keys and foreign keys
Implicit integrity constraints
e.g. sum of balances of all accounts, minus sum
of loan amounts must equal value of cash-in-
hand
A transaction must see a consistent database.
During transaction execution the database may be
temporarily inconsistent.
When the transaction completes successfully the
database must be consistent
Erroneous transaction logic can lead to
inconsistency
Transaction Properties -
Example of Fund Transfer
Isolation requirement — if between steps 3 and 6, another
transaction T2 is allowed to access the partially updated
database, it will see an inconsistent database (the sum A + B
will be less than it should be).
T1                        T2
1. read(A)
2. A := A - 50
3. write(A)
                          read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
Isolation can be ensured trivially by running transactions serially, that is, one after the other.
However, executing multiple transactions concurrently has
significant benefits.
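The interleaving in the T1/T2 table can be replayed step by step. A minimal Python sketch, assuming hypothetical initial balances A = 1000 and B = 2000 held in a dict: `interleaved` follows the schedule on the slide, so T2 sees A already debited but B not yet credited, while `serial` runs T1 to completion before T2 reads.

```python
def interleaved(db: dict) -> int:
    """T1 is interrupted after write(A) and T2 reads in between.
    Returns the sum that T2 prints."""
    a = db["A"]                    # T1 1. read(A)
    a = a - 50                     # T1 2. A := A - 50
    db["A"] = a                    # T1 3. write(A)
    printed = db["A"] + db["B"]    # T2: read(A), read(B), print(A+B)
    b = db["B"]                    # T1 4. read(B)
    b = b + 50                     # T1 5. B := B + 50
    db["B"] = b                    # T1 6. write(B)
    return printed

def serial(db: dict) -> int:
    """T1 runs to completion, then T2: T2 never sees the partial update."""
    a = db["A"]
    db["A"] = a - 50               # T1 steps 1-3
    b = db["B"]
    db["B"] = b + 50               # T1 steps 4-6
    return db["A"] + db["B"]       # T2: read(A), read(B), print(A+B)

print(interleaved({"A": 1000, "B": 2000}))  # 2950: T2 misses $50 that is "in flight"
print(serial({"A": 1000, "B": 2000}))       # 3000
```

In both cases the final database is consistent (A + B = 3000); only T2's view differs.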
Transaction States
Active – The initial state; the transaction stays in this
state while it is executing
Partially committed – After the final statement has
been executed.
Failed – After the discovery that normal execution can no longer proceed.
Aborted – After the transaction has been rolled back
and the database restored to its state prior to the start of
the transaction. Two options after it has been aborted:
Restart the transaction
Can be done only if no internal logical error
Kill the transaction
Committed – after successful completion.
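The legal state changes can be written down as a transition table. A minimal Python sketch; the `State` enum, the `TRANSITIONS` table and the `Transaction` class are hypothetical names. The PARTIALLY_COMMITTED -> FAILED edge follows the usual textbook state diagram (a failure can still occur before the commit is made durable) and is an assumption here, since the diagram itself is not reproduced.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"

# Legal moves between the states listed above
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: set(),    # terminal: a restart runs as a new transaction
    State.COMMITTED: set(),  # terminal
}

class Transaction:
    def __init__(self) -> None:
        self.state = State.ACTIVE  # the initial state
        self.history = [self.state]

    def move_to(self, new_state: State) -> None:
        """Change state, rejecting moves that the transition table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self.history.append(new_state)

ok = Transaction()
ok.move_to(State.PARTIALLY_COMMITTED)  # final statement executed
ok.move_to(State.COMMITTED)            # successful completion

bad = Transaction()
bad.move_to(State.FAILED)              # normal execution cannot proceed
bad.move_to(State.ABORTED)             # rolled back, database restored
```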
Transaction States
Schedules
Schedule – A sequence of instructions that specify the
chronological order in which instructions of concurrent
transactions are executed
A schedule for a set of transactions must consist of all
instructions of those transactions
Must preserve the order in which the instructions
appear in each individual transaction.
A transaction that successfully completes its execution will have a commit instruction as its last statement.
By default, a transaction is assumed to execute a commit instruction as its last step.
A transaction that fails to successfully complete its execution will have an abort instruction as its last statement.
Schedule - 1
Let T1 transfer $50 from A to B, and T2 transfer 10%
of the balance from A to B.
A serial schedule in which T1 is followed by T2:
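Both serial orders can be checked in code. A minimal Python sketch, assuming hypothetical starting balances A = 1000 and B = 2000 and whole-dollar integer arithmetic for the 10% (`a // 10`); the functions `t1` and `t2` simply follow each transaction's read, compute and write steps.

```python
def t1(db: dict) -> None:
    """T1: transfer $50 from A to B."""
    a = db["A"]
    a = a - 50
    db["A"] = a          # read(A), A := A - 50, write(A)
    b = db["B"]
    b = b + 50
    db["B"] = b          # read(B), B := B + 50, write(B)

def t2(db: dict) -> None:
    """T2: transfer 10% of A's balance from A to B (whole dollars assumed)."""
    a = db["A"]
    temp = a // 10
    a = a - temp
    db["A"] = a
    b = db["B"]
    b = b + temp
    db["B"] = b

db1 = {"A": 1000, "B": 2000}
t1(db1)
t2(db1)                  # serial schedule: T1 followed by T2
db2 = {"A": 1000, "B": 2000}
t2(db2)
t1(db2)                  # serial schedule: T2 followed by T1
print(db1, db2)          # {'A': 855, 'B': 2145} {'A': 850, 'B': 2150}
```

The two serial orders end in different states, yet both keep A + B = 3000, so each preserves consistency.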
Conflicting Instructions
Instructions li and lj of transactions Ti and Tj
respectively, conflict if and only if there exists some
item Q accessed by both li and lj, and at least one of
these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.
Intuitively, a conflict between li and lj forces a (logical)
temporal order between them.
If li and lj are consecutive in a schedule and they
do not conflict, their results would remain the
same even if they had been interchanged in the
schedule.
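The conflict rule and the swap it permits can be written in a few lines of Python. A minimal sketch; the `Instr` tuple and the helpers `conflict` and `swap_if_allowed` are hypothetical names. Swapping is refused for two instructions of the same transaction, because a schedule must preserve each transaction's own order.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    txn: str     # e.g. "T1"
    action: str  # "read" or "write"
    item: str    # e.g. "Q"

def conflict(li: Instr, lj: Instr) -> bool:
    """Instructions of two different transactions conflict iff they access the same
    item and at least one of them is a write."""
    return li.txn != lj.txn and li.item == lj.item and "write" in (li.action, lj.action)

def swap_if_allowed(schedule: list, k: int) -> bool:
    """Swap the consecutive instructions at positions k and k+1 when they don't conflict.
    Instructions of the same transaction are never swapped."""
    a, b = schedule[k], schedule[k + 1]
    if a.txn == b.txn or conflict(a, b):
        return False
    schedule[k], schedule[k + 1] = b, a
    return True

s = [Instr("T1", "write", "A"), Instr("T2", "read", "B")]
print(swap_if_allowed(s, 0), s[0])  # True Instr(txn='T2', action='read', item='B')
```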
Conflict Serializability
If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Conflict Serializability
Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions.
Therefore Schedule 3 is conflict serializable.
Schedule 3 Schedule 6
Image Source: Database System Concepts by Abraham Silberschatz, Henry F. Korth and S. Sudarshan, Tata McGraw Hill, 2011
Conflict Serializability
Example of a schedule that is not conflict serializable:
Schedule 3 Schedule 6
View Serializability
Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met for each data item Q:
1. If in schedule S transaction Ti reads the initial value of Q, then in schedule S´ transaction Ti must also read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S´ transaction Ti must also read the value of Q that was produced by the same write(Q) operation of transaction Tj.
View Serializability
3. The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S´.
Like conflict equivalence, view equivalence is based purely on reads and writes.
A schedule S is view serializable if it is view equivalent to
a serial schedule.
Every conflict serializable schedule is also view
serializable.
Below is a schedule which is view-serializable but not
conflict serializable.
Every view serializable schedule that is not conflict
serializable has blind writes.
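View equivalence can be checked directly from its three conditions. A brute-force Python sketch, with hypothetical helper names `view_signature`, `serial_schedule` and `view_serializable`, where a schedule is a list of (transaction, action, item) triples; it tries every serial order of the transactions. The example schedule is hypothetical, chosen to contain blind writes, and is not taken from the slide's figure.

```python
from itertools import permutations

def view_signature(schedule):
    """Facts compared by view equivalence: which write each read sees (conditions 1 and 2)
    and which transaction writes each item last (condition 3)."""
    n_reads, n_writes = {}, {}
    last_write = {}          # item -> (txn, k): the k-th write of that item by txn
    reads_from = set()
    for txn, action, item in schedule:
        key = (txn, item)
        if action == "read":
            k = n_reads.get(key, 0)
            n_reads[key] = k + 1
            # None means the read sees the initial value of the item
            reads_from.add((txn, item, k, last_write.get(item)))
        elif action == "write":
            k = n_writes.get(key, 0)
            n_writes[key] = k + 1
            last_write[item] = (txn, k)
    final_writer = {item: w[0] for item, w in last_write.items()}
    return reads_from, final_writer

def serial_schedule(schedule, order):
    """Run whole transactions in the given order, keeping each one's internal order."""
    return [op for txn in order for op in schedule if op[0] == txn]

def view_serializable(schedule):
    """Return a serial order that is view equivalent to the schedule, or None."""
    txns = list(dict.fromkeys(op[0] for op in schedule))
    target = view_signature(schedule)
    for order in permutations(txns):
        if view_signature(serial_schedule(schedule, order)) == target:
            return list(order)
    return None

# Hypothetical schedule; T28 and T29 write Q blindly (without reading it).
# Conflict edges T27 -> T28 (read then write of Q) and T28 -> T27 (write then write of Q)
# form a cycle, so it is not conflict serializable.
S = [("T27", "read", "Q"), ("T28", "write", "Q"), ("T27", "write", "Q"), ("T29", "write", "Q")]
print(view_serializable(S))  # ['T27', 'T28', 'T29']
```

Trying all n! orders only works for a handful of transactions; testing view serializability is NP-complete in general, which is one reason practical protocols aim for conflict serializability.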
Recoverable Schedules
We need to address the effect of transaction failures on concurrently running transactions.
Recoverable schedule — if a transaction Tj reads a data
item previously written by a transaction Ti , then the commit
operation of Ti appears before the commit operation of Tj.
The following schedule (Schedule 11) is not recoverable if
T9 commits immediately after the read.
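A Python sketch of the recoverability test, assuming schedules contain only read, write and commit steps (aborts are not modeled) and that a read sees the most recent write of the item. The operation lists are hypothetical: the first mirrors the situation described, in which T9 reads a value written by T8 and commits before T8 does.

```python
def is_recoverable(schedule):
    """schedule: list of (txn, action, item) with action "read", "write" or "commit"
    (item is None for commit). Recoverable iff, whenever Tj reads an item last written
    by another transaction Ti, Ti commits before Tj commits."""
    last_writer = {}
    reads_from = set()      # (reader, writer) pairs
    commit_pos = {}
    for pos, (txn, action, item) in enumerate(schedule):
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.add((txn, writer))
        elif action == "commit":
            commit_pos[txn] = pos
    for reader, writer in reads_from:
        if reader in commit_pos and (writer not in commit_pos
                                     or commit_pos[writer] > commit_pos[reader]):
            return False
    return True

# T9 reads A written by T8 and commits while T8 is still active
S11 = [("T8", "read", "A"), ("T8", "write", "A"), ("T9", "read", "A"),
       ("T9", "commit", None), ("T8", "read", "B")]
# Same reads and writes, but T8 commits before T9
S_OK = [("T8", "read", "A"), ("T8", "write", "A"), ("T9", "read", "A"),
        ("T8", "read", "B"), ("T8", "commit", None), ("T9", "commit", None)]
print(is_recoverable(S11), is_recoverable(S_OK))  # False True
```

If T8 failed after T9 committed in S11, T9 could not be rolled back even though it read a value that never officially existed.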
Deadlocks
A deadlock occurs when two transactions each wait for a data item held by the other, for example:
T1 accesses data item X and then requests Y
T2 accesses data item Y and then requests X
Concurrency Control
A database must provide a mechanism that will ensure that
all possible schedules are
either conflict or view serializable, and
are recoverable and preferably cascadeless
A policy in which only one transaction can execute at a time
generates serial schedules, but provides a poor degree of
concurrency
Are serial schedules recoverable/cascadeless?
Testing a schedule for serializability after it has executed is
a little too late!
Goal – to develop concurrency control protocols that will
assure serializability.
Concurrency Control vs.
Serializability Tests
Concurrency-control protocols allow concurrent schedules, but ensure that the schedules are conflict/view serializable, and are recoverable and cascadeless.
Concurrency control protocols generally do not examine the precedence graph as it is being created.
Instead, a protocol imposes a discipline that avoids nonserializable schedules.
Different concurrency control protocols provide different
tradeoffs between the amount of concurrency they allow
and the amount of overhead that they incur.
Tests for serializability help us understand why a
concurrency control protocol is correct.
Weak Levels of Consistency
Some applications are willing to live with weak
levels of consistency, allowing schedules that
are not serializable
E.g. a read-only transaction that wants to get an
approximate total balance of all accounts
E.g. database statistics computed for query
optimization can be approximate (why?)
Such transactions need not be serializable with
respect to other transactions
Trade off accuracy for performance
Levels of Consistency
Serializable: the default
Repeatable read: only committed records may be read, and repeated reads of the same record must return the same value.
However, a transaction may not be serializable: it may find some records inserted by a transaction but not find others.
Read committed: only committed records can be read, but successive reads of a record may return different (but committed) values.
Read uncommitted: even uncommitted records may be read.
• Lower degrees of consistency useful for gathering approximate
information about the database
• Warning: some database systems do not ensure serializable
schedules by default
– E.g. Oracle and PostgreSQL by default support a level of
consistency called snapshot isolation (not part of the SQL
standard)
Transaction Definition in SQL
Data manipulation language must include a construct for
specifying the set of actions that comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends by:
Commit work commits current transaction and begins a
new one.
Rollback work causes current transaction to abort.
In almost all database systems, by default, every SQL
statement also commits implicitly if it executes successfully
Implicit commit can be turned off by a database directive
E.g. in JDBC, connection.setAutoCommit(false);
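The COMMIT/ROLLBACK behaviour can also be tried with Python's built-in sqlite3 module instead of JDBC. A minimal sketch, assuming an in-memory database with a hypothetical account(name, balance) table, a hypothetical `transfer` helper and a hypothetical no-negative-balance rule. In its default (legacy) transaction mode, sqlite3 implicitly begins a transaction before an UPDATE, so nothing becomes permanent until commit() is called, and rollback() undoes a partial transfer, much like turning off auto-commit in JDBC.

```python
import sqlite3

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst as one transaction: both updates happen or neither does."""
    try:
        # In sqlite3's default mode, this UPDATE implicitly starts a transaction.
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM account WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # hypothetical consistency rule
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.commit()    # plays the role of COMMIT WORK
    except Exception:
        conn.rollback()  # plays the role of ROLLBACK WORK: undoes the partial update
        raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
conn.commit()

transfer(conn, "A", "B", 50)        # commits
try:
    transfer(conn, "A", "B", 5000)  # fails midway and is rolled back
except ValueError:
    pass
print(dict(conn.execute("SELECT name, balance FROM account")))  # {'A': 950, 'B': 2050}
```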
Implementation of Isolation
Schedules must be conflict or view serializable, and
recoverable, for the sake of database consistency, and
preferably cascadeless.
A policy in which only one transaction can execute at a time
generates serial schedules, but provides a poor degree of
concurrency.
Concurrency-control schemes trade off between the amount of concurrency they allow and the amount of overhead that they incur.
Some schemes allow only conflict-serializable schedules to
be generated, while others allow view-serializable
schedules that are not conflict-serializable.
Implementation of Isolation
Test for Conflict Serializability
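A common test builds a precedence graph (an edge Ti -> Tj when an instruction of Ti precedes a conflicting instruction of Tj) and checks it for cycles. A minimal Python sketch; `precedence_graph` and `serializability_order` are hypothetical names. A topological sort yields an equivalent serial order, or None when a cycle makes the schedule not conflict serializable. The first example assumes the kind of interleaving described for Schedule 3, since its figure is not reproduced; the second is a hypothetical cyclic schedule.

```python
from collections import defaultdict, deque

def precedence_graph(schedule):
    """Edge Ti -> Tj whenever an instruction of Ti precedes a conflicting instruction of Tj."""
    edges = defaultdict(set)
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "write" in (ai, aj):
                edges[ti].add(tj)
    return edges

def serializability_order(schedule):
    """Topologically sort the precedence graph (Kahn's algorithm). Returns an equivalent
    serial order, or None if the graph has a cycle (not conflict serializable)."""
    txns = list(dict.fromkeys(t for t, _, _ in schedule))
    edges = precedence_graph(schedule)
    indegree = {t: 0 for t in txns}
    for t in txns:
        for u in edges[t]:
            indegree[u] += 1
    ready = deque(t for t in txns if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for u in sorted(edges[t]):
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return order if len(order) == len(txns) else None

S3 = [("T1", "read", "A"), ("T1", "write", "A"), ("T2", "read", "A"), ("T2", "write", "A"),
      ("T1", "read", "B"), ("T1", "write", "B"), ("T2", "read", "B"), ("T2", "write", "B")]
S_CYCLE = [("T3", "read", "Q"), ("T4", "write", "Q"), ("T3", "write", "Q")]
print(serializability_order(S3))       # ['T1', 'T2']
print(serializability_order(S_CYCLE))  # None
```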