Module 4 (1)
Module 4
Module:4 QUERY PROCESSING AND
TRANSACTION PROCESSING
Translating SQL Queries into Relational Algebra –
heuristic query optimization – Introduction to
Transaction Processing – Transaction and System
concepts - Desirable properties of Transactions –
Characterizing schedules based on recoverability –
Characterizing schedules based on serializability
Today’s outline
Relational Algebra
Unary Relational Operations
SELECT (symbol: σ)
PROJECT (symbol: π)
RENAME (symbol: ρ)
Example Schema for illustration
SELECT (σ)
Select Operation - Example
Projection (π)
π<LNAME, FNAME, SALARY>(EMPLOYEE)
Illustration – Select and Project
Operations for Company Schema
Rename (ρ)
Example:
π<FNAME, LNAME, SALARY>(σ<DNO=5>(EMPLOYEE))
OR we can explicitly show the sequence of operations, giving a name to each intermediate relation:
DEP5_EMPS ← σ<DNO=5>(EMPLOYEE)
RESULT ← π<FNAME, LNAME, SALARY>(DEP5_EMPS)
Relational Algebra Operations From
Set Theory
UNION (∪)
INTERSECTION (∩)
DIFFERENCE (−)
CARTESIAN PRODUCT (×)
Union operation (∪)
UNION is denoted by the symbol ∪.
The result includes all tuples that are in A, in B, or in both.
Duplicate tuples are eliminated.
The union of set A and set B is expressed as:
RESULT ← A ∪ B
Union operation (∪) contd.
Example
To retrieve the social security
numbers of all employees who
work in department 5 (Result 1
below) or directly supervise an
employee who works in
department 5 (Result 2 below)
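A minimal Python sketch of this union query. The EMPLOYEE tuples are hypothetical (they are not from the slides), and the names RESULT1 and RESULT2 simply mirror "Result 1" and "Result 2" above. Python sets are used because, like ∪, they eliminate duplicates.

```python
# Hypothetical EMPLOYEE tuples: (SSN, SUPERSSN, DNO). Values are illustrative only.
EMPLOYEE = [
    ("111", "333", 5),
    ("222", "333", 5),
    ("333", "888", 5),
    ("444", "888", 4),
    ("888", None, 1),
]

# DEP5_EMPS <- select DNO=5 (EMPLOYEE)
dep5_emps = [t for t in EMPLOYEE if t[2] == 5]

# RESULT1 <- project SSN (DEP5_EMPS): employees who work in department 5
result1 = {ssn for ssn, _, _ in dep5_emps}

# RESULT2 <- project SUPERSSN (DEP5_EMPS): direct supervisors of those employees
result2 = {sup for _, sup, _ in dep5_emps if sup is not None}

# RESULT <- RESULT1 UNION RESULT2; "333" is in both inputs but appears once
result = result1 | result2
print(sorted(result))
```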
Set Difference (−)
Intersection (∩)
Relational Algebra Operations From
Set Theory (cont.)
Cartesian product (×)
FEMALE_EMPS ← σ<SEX='F'>(EMPLOYEE)
EMPNAMES ← π<FNAME, LNAME, SSN>(FEMALE_EMPS)
Join Operations
Types of joins
Various forms of the join operation are:
Inner joins:
  Theta join
  Equi join
  Natural join
Outer joins:
  Left outer join
  Right outer join
  Full outer join
SQL Joins
• An SQL join fetches data from two or more tables and presents it as a single result set.
• It combines columns from two or more tables by using values common to both tables.
• Types of join
  • Inner
  • Outer (Left, Right, Full)
  • Cross (Cartesian product)
  • Natural
Different Types of SQL JOINs
Equi Join
• An equi join is an inner join whose condition uses only equality. The INNER JOIN keyword selects records that have matching values in both tables.
Natural Join
• NATURAL JOIN is an equi join on all columns that have the same name in both tables; each such column appears only once in the result.
Left Outer Join
• LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table. Left records without a match are padded with NULLs.
Right Outer Join
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table. Right records without a match are padded with NULLs.
Full Outer Join
Join operation on Department and Employee relations
Employee
eno  ename  add
1    Ram    Delhi
2    Varun  Chennai
3    Ravi   Chennai
4    Amrit  Delhi
5    Nitin  Noida
Department
dno  dname     eno
D1   HR        1
D2   IT        2
D3   Accounts  4
D4   Finance   5
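The joins on these two relations can be sketched in Python as follows. This is an illustration under assumptions: the helper names natural_join and left_outer_join are hypothetical, relations are modelled as non-empty lists of dicts, and None stands in for NULL.

```python
# Employee and Department tuples as read from the slide.
employee = [
    {"eno": 1, "ename": "Ram", "add": "Delhi"},
    {"eno": 2, "ename": "Varun", "add": "Chennai"},
    {"eno": 3, "ename": "Ravi", "add": "Chennai"},
    {"eno": 4, "ename": "Amrit", "add": "Delhi"},
    {"eno": 5, "ename": "Nitin", "add": "Noida"},
]
department = [
    {"dno": "D1", "dname": "HR", "eno": 1},
    {"dno": "D2", "dname": "IT", "eno": 2},
    {"dno": "D3", "dname": "Accounts", "eno": 4},
    {"dno": "D4", "dname": "Finance", "eno": 5},
]

def natural_join(r, s):
    """Join on every attribute name the relations share; keep one copy of each."""
    common = set(r[0]) & set(s[0])  # assumes both relations are non-empty
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]

def left_outer_join(r, s):
    """Like natural_join, but unmatched tuples of r are kept, padded with None."""
    common = set(r[0]) & set(s[0])
    s_only = [c for c in s[0] if c not in common]
    out = []
    for a in r:
        matches = [b for b in s if all(a[c] == b[c] for c in common)]
        if matches:
            out.extend({**a, **b} for b in matches)
        else:
            out.append({**a, **{c: None for c in s_only}})
    return out

inner = natural_join(employee, department)
left = left_outer_join(employee, department)
print(len(inner), len(left))
```

Ravi (eno 3) has no department, so he is absent from the natural join but present in the left outer join with dno and dname set to None.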
Natural join (⋈)
Natural join - Example
OUTER JOIN
Left Outer Join
Left Outer Join - Example
Right Outer Join
Right Outer Join - Example
Full Outer Join
In a full outer join, all tuples from both relations are included in the result, whether or not they satisfy the join condition; tuples without a match are padded with NULL values.
Full Outer Join - Example
Additional Relational Operations
Aggregate Functions and Grouping
Aggregate functions are commonly applied to collections of numeric values.
They include SUM, AVERAGE, MAXIMUM, and MINIMUM.
The COUNT function is used for counting tuples or values.
Additional Relational Operations (cont.)
Use of the Functional operator ℱ
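As an illustration of grouping with aggregates, the sketch below groups a hypothetical EMPLOYEE relation by DNO and computes COUNT and AVERAGE of SALARY per group, roughly what an expression such as DNO ℱ COUNT SSN, AVERAGE SALARY (EMPLOYEE) denotes. The tuples are invented for the example.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE tuples: (SSN, DNO, SALARY)
EMPLOYEE = [
    ("111", 5, 30000),
    ("222", 5, 40000),
    ("333", 4, 25000),
    ("444", 4, 43000),
    ("555", 1, 55000),
]

# Grouping step: collect the salaries of each department.
groups = defaultdict(list)
for ssn, dno, salary in EMPLOYEE:
    groups[dno].append(salary)

# Aggregate step: one result tuple (COUNT, AVERAGE) per group.
result = {dno: (len(s), sum(s) / len(s)) for dno, s in groups.items()}
print(result)
```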
QUERY OPTIMIZATION
Introduction to Query Processing
Basic Steps in Query Processing
Basic Steps in Query Processing (Cont.)
Process for heuristics optimization
Query Representation
Query tree:
A tree data structure that corresponds to a relational algebra
expression.
It represents the input relations of the query as leaf nodes of
the tree and the relational algebra operations as internal
nodes.
An execution of the query tree consists of executing an
internal node operation whenever its operands are available
and then replacing that internal node by the relation that
results from executing the operation.
Query graph:
A graph data structure that corresponds to a relational
calculus expression.
It does not indicate the order in which the operations are performed.
There is only a single graph corresponding to each query.
1. Translating SQL Queries into
Relational Algebra
Query block: the basic unit that can be
translated into the algebraic operators
and optimized.
A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and HAVING clauses
if these are part of the block.
Nested queries within a query are
identified as separate query blocks.
Aggregate operators in SQL must be
included in the extended algebra.
Translating SQL Queries into Relational Algebra - Example
Example:
For every project located in ‘Stafford’, retrieve the project number, the controlling department number, and the department manager’s last name, address and birthdate.
Relational algebra:
π<PNUMBER, DNUM, LNAME, ADDRESS, BDATE>(((σ<PLOCATION='STAFFORD'>(PROJECT)) ⋈<DNUM=DNUMBER> (DEPARTMENT)) ⋈<MGRSSN=SSN> (EMPLOYEE))
SQL query:
Q2: SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
    FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
    WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN
      AND P.PLOCATION = 'STAFFORD';
Query tree
Query graph
2. Heuristic Optimization
Heuristic Optimization of Query Trees:
The same query could correspond to
many different relational algebra
expressions — and hence many
different query trees.
Steps in optimizing a Query tree
1. Moving SELECT operations down the query tree
2. Applying the more restrictive SELECT operation first
3. Replacing CARTESIAN PRODUCT and SELECT with JOIN operations
4. Moving the PROJECT operations down the tree
Example:
Q: SELECT LNAME
   FROM EMPLOYEE, WORKS_ON, PROJECT
   WHERE PNAME = 'AQUARIUS' AND PNUMBER = PNO
     AND ESSN = SSN AND BDATE > '1957-12-31';
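The effect of these heuristics on query Q can be sketched in Python. This is an illustration only: the tuples are hypothetical, and the functions canonical and optimized are invented names that evaluate the initial tree and a heuristically rearranged tree by hand. Both return the same answer, but the rearranged plan never builds a large intermediate relation.

```python
from itertools import product

# Hypothetical tuples; dates are ISO strings, so string comparison orders them.
EMPLOYEE = [
    {"SSN": "1", "LNAME": "Smith", "BDATE": "1965-01-09"},
    {"SSN": "2", "LNAME": "Wong", "BDATE": "1955-12-08"},
    {"SSN": "3", "LNAME": "English", "BDATE": "1972-07-31"},
]
WORKS_ON = [
    {"ESSN": "1", "PNO": 1}, {"ESSN": "2", "PNO": 1},
    {"ESSN": "3", "PNO": 2}, {"ESSN": "1", "PNO": 2},
]
PROJECT = [
    {"PNUMBER": 1, "PNAME": "AQUARIUS"},
    {"PNUMBER": 2, "PNAME": "PRODUCTX"},
]

def canonical():
    """Initial tree: Cartesian product of all relations, then one big SELECT."""
    rows = [{**e, **w, **p} for e, w, p in product(EMPLOYEE, WORKS_ON, PROJECT)]
    picked = [r for r in rows
              if r["PNAME"] == "AQUARIUS" and r["PNUMBER"] == r["PNO"]
              and r["ESSN"] == r["SSN"] and r["BDATE"] > "1957-12-31"]
    # Return the answer and the size of the intermediate product.
    return sorted(r["LNAME"] for r in picked), len(rows)

def optimized():
    """SELECTs moved to the leaves; product + SELECT pairs replaced by joins."""
    proj = [p for p in PROJECT if p["PNAME"] == "AQUARIUS"]  # most restrictive
    emp = [e for e in EMPLOYEE if e["BDATE"] > "1957-12-31"]
    pw = [{**p, **w} for p in proj for w in WORKS_ON if p["PNUMBER"] == w["PNO"]]
    pwe = [{**r, **e} for r in pw for e in emp if r["ESSN"] == e["SSN"]]
    # Return the answer and the size of the largest intermediate relation.
    largest = max(len(proj), len(emp), len(pw), len(pwe))
    return sorted(r["LNAME"] for r in pwe), largest

print(canonical())
print(optimized())
```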
Steps in optimizing a Query tree - Illustration
Steps in optimizing a Query tree (Contd.)
Using Heuristics in Query Optimization – Transformation rules
Transformation rules (Cont.)
Transaction Processing
Transaction - Definition
A transaction may be:
– stand-alone, specified in a high-level language such as SQL and submitted interactively, or
– a set of database operations embedded within a program (most transactions)
• Basic operations on an item X:
– read_item(X): Reads a database item
named X into a program variable. To
simplify our notation, we assume that the
program variable is also named X.
READ OPERATIONS:
2. read or write:
3. end_transaction:
4. commit_transaction:
5. abort or rollback_transaction:
SYSTEM OPERATIONS DURING RECOVERY
Transaction - Example
Transaction Properties –
ACID Properties
Transaction Properties - Example of Fund Transfer
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Durability requirement:
Once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist even if there are software or hardware failures.
Consistency requirement:
The sum of A and B is unchanged by the execution of the transaction.
Isolation requirement: if, between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).
T1                      T2
1. read(A)
2. A := A - 50
3. write(A)
                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
Isolation can be ensured trivially by running transactions serially, that is, one after the other.
However, executing multiple transactions concurrently has significant benefits.
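The anomaly in this interleaving can be reproduced with a few lines of Python. The starting balances (A = 1000, B = 2000) are assumed for illustration; the dictionary db stands in for the database and t1 for the local variables of T1.

```python
db = {"A": 1000, "B": 2000}  # assumed starting balances
t1 = {}                      # local program variables of T1

# T1, steps 1-3
t1["A"] = db["A"]        # 1. read(A)
t1["A"] -= 50            # 2. A := A - 50
db["A"] = t1["A"]        # 3. write(A)

# T2 runs between steps 3 and 6 and sees the partially updated database
seen_by_t2 = db["A"] + db["B"]

# T1, steps 4-6
t1["B"] = db["B"]        # 4. read(B)
t1["B"] += 50            # 5. B := B + 50
db["B"] = t1["B"]        # 6. write(B)

final_sum = db["A"] + db["B"]
print(seen_by_t2, final_sum)
```

T2 observes a total that is $50 short, even though the total is correct again once T1 finishes.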
Transaction States
Active – the initial state; the transaction stays in this state while it is executing.
Partially committed – after the final statement has been executed.
Failed – after the discovery that normal execution can no longer proceed.
Aborted – after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
  Restart the transaction (can be done only if there was no internal logical error)
  Kill the transaction
Committed – after successful completion.
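These states can be modelled as a small state machine. The sketch below is an assumption-laden illustration: the allowed transitions follow the usual textbook state diagram (active to partially committed or failed, partially committed to committed or failed, failed to aborted), and the names State, TRANSITIONS and advance are hypothetical.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"

# Allowed transitions, assumed from the standard transaction state diagram.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: set(),     # terminal: restart or kill creates a new transaction
    State.COMMITTED: set(),   # terminal
}

def advance(current, target):
    """Return target if the move is legal, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target

state = State.ACTIVE
state = advance(state, State.PARTIALLY_COMMITTED)
state = advance(state, State.COMMITTED)
print(state.value)
```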
Schedules
Schedule – a sequence of instructions that specifies the chronological order in which instructions of concurrent transactions are executed.
A schedule for a set of transactions must consist of all instructions of those transactions.
It must preserve the order in which the instructions appear in each individual transaction.
A transaction that successfully completes its execution will have a commit instruction as its last statement.
By default, a transaction is assumed to execute a commit instruction as its last step.
A transaction that fails to complete its execution successfully will have an abort instruction as its last statement.
Schedule - 1
Let T1 transfer $50 from A to B, and T2 transfer 10%
of the balance from A to B.
A serial schedule in which T1 is followed by T2 :
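A serial schedule of these two transactions can be simulated as follows. The starting balances (A = 1000, B = 2000) are assumed, integer division is used for the 10% so the arithmetic stays exact, and run_serial is a hypothetical helper name.

```python
def t1(db):
    """T1: transfer $50 from A to B."""
    a = db["A"]           # read(A)
    db["A"] = a - 50      # write(A)
    b = db["B"]           # read(B)
    db["B"] = b + 50      # write(B)

def t2(db):
    """T2: transfer 10% of the balance of A to B."""
    a = db["A"]           # read(A)
    temp = a // 10        # 10% of A, kept as an integer
    db["A"] = a - temp    # write(A)
    b = db["B"]           # read(B)
    db["B"] = b + temp    # write(B)

def run_serial(order, initial):
    """Run whole transactions one after the other on a copy of the database."""
    db = dict(initial)
    for txn in order:
        txn(db)
    return db

initial = {"A": 1000, "B": 2000}   # assumed starting balances
after_t1_t2 = run_serial([t1, t2], initial)
after_t2_t1 = run_serial([t2, t1], initial)
print(after_t1_t2, after_t2_t1)
```

The two serial orders give different final balances, but both preserve the sum A + B, which is the consistency constraint here.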
Serializability
Basic assumption – each transaction preserves database consistency.
Thus, serial execution of a set of transactions preserves database consistency.
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
Simplified view of transactions
We ignore operations other than read and write instructions.
We assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes.
Our simplified schedules consist of only read and write instructions.
Conflicting Instructions
Instructions li and lj of transactions Ti and Tj
respectively, conflict if and only if there exists some
item Q accessed by both li and lj, and at least one of
these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
Intuitively, a conflict between li and lj forces a (logical)
temporal order between them.
If li and lj are consecutive in a schedule and they
do not conflict, their results would remain the
same even if they had been interchanged in the
schedule.
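The four cases above reduce to a one-line predicate. In this sketch the Op tuple and the conflicts function are hypothetical names chosen for the example.

```python
from typing import NamedTuple

class Op(NamedTuple):
    txn: str      # transaction name, e.g. "Ti"
    action: str   # "read" or "write"
    item: str     # data item, e.g. "Q"

def conflicts(a: Op, b: Op) -> bool:
    """Two instructions of different transactions conflict if they access
    the same item and at least one of them is a write."""
    return (a.txn != b.txn
            and a.item == b.item
            and "write" in (a.action, b.action))

print(conflicts(Op("Ti", "read", "Q"), Op("Tj", "read", "Q")))    # case 1
print(conflicts(Op("Ti", "read", "Q"), Op("Tj", "write", "Q")))   # case 2
```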
Conflict Serializability
If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions.
Therefore Schedule 3 is conflict serializable.
Figure: Schedule 3 and Schedule 6
Image Source: Database System Concepts by Abraham Silberschatz, Henry F. Korth and S. Sudarshan, Tata McGraw Hill, 2011
Conflict Serializability
Example of a schedule that is not conflict serializable:
View Serializability
Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met, for each data item Q:
1. If in schedule S, transaction Ti reads the initial value of Q, then in schedule S´ also transaction Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S´ also transaction Ti must read the value of Q that was produced by the same write(Q) operation of transaction Tj.
3. The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S´.
As with conflict equivalence, view equivalence is based purely on reads and writes.
A schedule S is view serializable if it is view equivalent to
a serial schedule.
Every conflict serializable schedule is also view
serializable.
Below is a schedule which is view-serializable but not
conflict serializable.
Every view serializable schedule that is not conflict
serializable has blind writes.
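The three view-equivalence conditions can be checked mechanically. The sketch below is a brute-force illustration that tries every serial order, so it is only practical for a handful of transactions. The function names are hypothetical, and the example schedule (a read of Q by T1, then writes of Q by T2, T1 and T3) is an assumed stand-in for the figure, chosen because it contains blind writes.

```python
from itertools import permutations

def view_profile(schedule):
    """Summarise a schedule of (txn, action, item) triples, action 'r' or 'w'.

    Returns (reads_from, final_writer). reads_from maps each read to the write
    it reads from, or None for the initial value (conditions 1 and 2);
    final_writer maps each item to its last write (condition 3)."""
    last_writer = {}   # item -> (txn, k): the k-th write of item by txn
    counts = {}        # (txn, action, item) -> occurrences seen so far
    reads_from = {}
    for txn, action, item in schedule:
        k = counts[(txn, action, item)] = counts.get((txn, action, item), 0) + 1
        if action == "r":
            reads_from[(txn, item, k)] = last_writer.get(item)
        else:
            last_writer[item] = (txn, k)
    return reads_from, last_writer

def is_view_serializable(schedule):
    """True if some serial order of the transactions is view equivalent."""
    target = view_profile(schedule)
    txns = sorted({t for t, _, _ in schedule})
    for order in permutations(txns):
        serial = [op for t in order for op in schedule if op[0] == t]
        if view_profile(serial) == target:
            return True
    return False

# Assumed example: T2 and T3 write Q without reading it (blind writes).
blind = [("T1", "r", "Q"), ("T2", "w", "Q"), ("T1", "w", "Q"), ("T3", "w", "Q")]
# Without T3's final write, no serial order matches.
no_final = [("T1", "r", "Q"), ("T2", "w", "Q"), ("T1", "w", "Q")]
print(is_view_serializable(blind), is_view_serializable(no_final))
```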
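Deadlocks
A deadlock can occur when two transactions access the same data items in opposite orders:
T1 accesses data items X and then Y
T2 accesses data items Y and then X
Each transaction may then wait indefinitely for an item held by the other.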
Concurrency Control
A database must provide a mechanism that will ensure that
all possible schedules are
either conflict or view serializable, and
are recoverable and preferably cascadeless
A policy in which only one transaction can execute at a time
generates serial schedules, but provides a poor degree of
concurrency
Are serial schedules recoverable/cascadeless?
Testing a schedule for serializability after it has executed is
a little too late!
Goal – to develop concurrency control protocols that will
assure serializability.
Concurrency Control vs. Serializability Tests
Concurrency-control protocols allow concurrent schedules, but ensure that the schedules are conflict/view serializable, and are recoverable and cascadeless.
Concurrency-control protocols generally do not examine the precedence graph as it is being created.
Instead, a protocol imposes a discipline that avoids nonserializable schedules.
Different concurrency-control protocols provide different tradeoffs between the amount of concurrency they allow and the amount of overhead that they incur.
Tests for serializability help us understand why a concurrency-control protocol is correct.
Weak Levels of Consistency
Some applications are willing to live with weak
levels of consistency, allowing schedules that
are not serializable
E.g. a read-only transaction that wants to get an
approximate total balance of all accounts
E.g. database statistics computed for query
optimization can be approximate (why?)
Such transactions need not be serializable with
respect to other transactions
Trade off accuracy for performance
Levels of Consistency
Serializable: the default.
Repeatable read: only committed records may be read, and repeated reads of the same record must return the same value. However, a transaction may not be serializable: it may find some records inserted by a transaction but not find others.
Read committed: only committed records can be read, but successive reads of a record may return different (but committed) values.
Read uncommitted: even uncommitted records may be read.
• Lower degrees of consistency are useful for gathering approximate information about the database.
• Warning: some database systems do not ensure serializable schedules by default.
– E.g. Oracle and PostgreSQL use read committed by default; in Oracle (and in PostgreSQL before version 9.1), requesting the serializable level actually provides snapshot isolation, which is not part of the SQL standard.
Transaction Definition in SQL
Data manipulation language must include a construct for
specifying the set of actions that comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends by:
Commit work commits current transaction and begins a
new one.
Rollback work causes current transaction to abort.
In almost all database systems, by default, every SQL
statement also commits implicitly if it executes successfully
Implicit commit can be turned off by a database directive
E.g. in JDBC, connection.setAutoCommit(false);
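The same commit and rollback pattern can be sketched with Python's built-in sqlite3 module in place of JDBC. This is an illustration under assumptions: the account table, its CHECK constraint and the transfer helper are invented for the example, and the module's default transaction handling is relied on to begin a transaction implicitly before the first UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; roll back on failure."""
    try:
        # With the module's default settings, the first UPDATE implicitly
        # begins a transaction.
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        conn.commit()      # both updates become permanent together
        return True
    except sqlite3.IntegrityError:
        conn.rollback()    # undo the credit that was already applied
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer(conn, "A", "B", 50)         # succeeds
rejected = transfer(conn, "A", "B", 500)  # would overdraw A, so it is rolled back
print(ok, rejected, balances(conn))
```

The second transfer credits B before the debit on A violates the CHECK constraint; the rollback removes that credit, so the database is left as if the transfer had never started.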
Implementation of Isolation
Schedules must be conflict or view serializable, and recoverable, for the sake of database consistency, and preferably cascadeless.
A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.
Concurrency-control schemes trade off the amount of concurrency they allow against the amount of overhead that they incur.
Some schemes allow only conflict-serializable schedules to be generated, while others allow view-serializable schedules that are not conflict-serializable.
Test for Conflict Serializability
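A common test builds a precedence graph with an edge Ti → Tj whenever an instruction of Ti conflicts with a later instruction of Tj; the schedule is conflict serializable exactly when this graph has no cycle. The sketch below is a minimal illustration: the function names are hypothetical, and the two example schedules are assumed stand-ins for the figures (the first interleaves T1 and T2 on items A and B, the second interleaves a read and two writes on Q).

```python
def precedence_edges(schedule):
    """Edges (Ti, Tj) for each pair of conflicting instructions, Ti's first.
    The schedule is a list of (txn, action, item) with action 'r' or 'w'."""
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "w" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(nodes, edges):
    """Repeatedly remove nodes with no incoming edge; a cycle remains
    if at some point every remaining node still has an incoming edge."""
    remaining, edges = set(nodes), set(edges)
    while remaining:
        sources = {n for n in remaining if not any(v == n for _, v in edges)}
        if not sources:
            return True
        remaining -= sources
        edges = {(u, v) for u, v in edges if u not in sources}
    return False

def is_conflict_serializable(schedule):
    nodes = {t for t, _, _ in schedule}
    return not has_cycle(nodes, precedence_edges(schedule))

# Assumed example 1: T1 and T2 interleaved on A and B; only edge is T1 -> T2.
s_ok = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A"),
        ("T1", "r", "B"), ("T1", "w", "B"), ("T2", "r", "B"), ("T2", "w", "B")]
# Assumed example 2: T3 reads Q, T4 writes Q, T3 writes Q; edges form a cycle.
s_bad = [("T3", "r", "Q"), ("T4", "w", "Q"), ("T3", "w", "Q")]
print(is_conflict_serializable(s_ok), is_conflict_serializable(s_bad))
```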