2. Advanced Database System
BY: ZIGIJU N
be applied on relations on a database based on
the requirement.
Cont…
1. Selection (σ): Selects a subset of rows from
a relation.
2. Projection (π): Deletes unwanted columns
from a relation.
3. Renaming (ρ): assigning intermediate relation for a
single operation
4. Cross-Product (x): Allows us to combine
two relations.
5. Set-Difference (-): Tuples in relation1, but
not in relation2.
6. Union (∪): Tuples in relation1 or in
relation2.
7. Intersection (∩): Tuples in relation1 and in
relation2
8. Join (⋈): Tuples joined from two relations based
on a condition
Selection
Selects subset of tuples/rows in a relation that
satisfy selection condition.
It is a unary operator (it is applied to a single
relation)
The Selection operation is applied to each tuple
individually
The degree of the resulting relation is the same as
the original relation but the cardinality (no. of
tuples) is less than or equal to the original relation.
This operator is commutative.
Set of conditions can be combined using Boolean
operations ∧ (AND), ∨ (OR), and ~ (NOT)
No duplicates in result!
Schema of result identical to schema of (only)
input relation.
Result relation can be the input for another
relational algebra operation! (Operator
composition.)
Has a syntax: σ<Selection Condition>(<Relation Name>)
Example: Find all Employees with skill type of
Database. This could be written as
σ<SkillType = "Database">(Employee) in
relational algebra.
This could be written in SQL as
SELECT * FROM EMPLOYEE WHERE
SkillType = 'Database';
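As a rough illustration of the selection operation, the sketch below models a relation as a Python list of dictionaries. The `select` helper, the sample `employee` rows and the attribute names other than SkillType are assumptions made for this example only.

```python
# Hypothetical sample relation; rows and attribute names are illustrative.
employee = [
    {"EmpID": 1, "Name": "Abebe", "SkillType": "Database"},
    {"EmpID": 2, "Name": "Sara", "SkillType": "Networking"},
    {"EmpID": 3, "Name": "Lemma", "SkillType": "Database"},
]

def select(relation, condition):
    """Return the tuples of one relation that satisfy the condition."""
    return [row for row in relation if condition(row)]

# Selection with the condition SkillType = "Database"
db_staff = select(employee, lambda r: r["SkillType"] == "Database")
```

The result keeps every attribute of the input (same degree) and can never contain more tuples than the input (cardinality less than or equal).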
Projection
Selects certain attributes while discarding the
other from the base relation.
The PROJECT creates a vertical partitioning – one
with the needed columns (attributes) containing
results of the operation and other containing the
discarded Columns.
Deletes attributes that are not in projection
list.
Schema of result contains exactly the fields in the
projection list, with the same names that they had in
the (only) input relation.
Projection operator has to eliminate duplicates!
Note: real systems typically don’t do duplicate elimination
unless the user explicitly asks for it.
If the Primary Key is in the projection list, then
duplication will not occur
Duplication removal is necessary to ensure that the
resulting table is also a relation.
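A minimal sketch of projection with duplicate elimination, again treating a relation as a list of dictionaries. The `project` helper and the sample data are hypothetical; they only illustrate why duplicates appear when the primary key is left out of the projection list.

```python
def project(relation, attributes):
    """Keep only the listed attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:          # duplicate elimination
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

# Hypothetical relation in which EmpID plays the role of the primary key.
employee = [
    {"EmpID": 1, "Name": "Abebe", "Skill": "SQL"},
    {"EmpID": 2, "Name": "Sara", "Skill": "SQL"},
    {"EmpID": 3, "Name": "Lemma", "Skill": "Java"},
]

skills = project(employee, ["Skill"])             # duplicates removed
with_key = project(employee, ["EmpID", "Skill"])  # key present, nothing to remove
```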
Has syntax: π<Selected Attributes>(<Relation Name>)
Example: To display Name, Skill, and Skill Level of
an employee, the query and the resulting relation
will be:
In RA:-
Set Difference (or MINUS) Operation or complements
Denoted by R - S, the result is a relation that
includes all tuples that are in R but not in S.
The two operands must be "type compatible"
E.g.: Employees who attend the Database
course but didn’t take any course at AAU
RelationOne - RelationTwo
RelationOne: Employees who attend the Database course
i.e. select * from Employee where skillType = 'Database';
RelationTwo: Employees who attend a course in AAU
i.e. in SQL
select * from Employee where school = 'AAU';
CARTESIAN Operation (Cross Product-X)
This operation is used to combine tuples from two
relations in a combinatorial fashion.
That means, every tuple in Relation1 (R) will be
related with every tuple in Relation2 (S).
In general, the result of R(A1, A2, . . ., An) x
S(B1, B2, . . ., Bm) is a relation Q with degree n +
m attributes Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in
that order, where R has n attributes and S has m
attributes.
The resulting relation Q has one tuple for each
combination of tuples: one from R and one from S.
Hence, if R has nR tuples and S has nS tuples, then
| R x S | will have nR * nS tuples.
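The degree and cardinality of a cross product can be checked on a small example. This is only a sketch: the `cross` helper and the relations `R` and `S` are made up, and attribute names are assumed to be distinct.

```python
from itertools import product

def cross(r, s):
    """Pair every tuple of r with every tuple of s."""
    return [{**t_r, **t_s} for t_r, t_s in product(r, s)]

R = [{"A1": 1, "A2": "x"}, {"A1": 2, "A2": "y"}, {"A1": 3, "A2": "z"}]  # 3 tuples, degree 2
S = [{"B1": 10}, {"B1": 20}]                                            # 2 tuples, degree 1
Q = cross(R, S)
```

Here Q has degree 2 + 1 = 3 and 3 * 2 = 6 tuples, with the attributes of R listed before those of S.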
Example:
To extract employee information about managers of the
departments, the algebra query using the JOIN
operation will be:
EQUIJOIN Operation
The most common use of join involves a condition
with equality comparisons only ( = ).
Such a join, where the only comparison operator
used is =, is called an EQUIJOIN.
In the result of an EQUIJOIN we always have one or
more pairs of attributes (whose names need not be
identical) that have identical values in every tuple
since we used the equality logical operator.
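A small sketch of an EQUIJOIN, using the department manager example as inspiration. The `equijoin` helper, the relation contents and the attribute names MgrID and EmpID are assumptions for illustration.

```python
def equijoin(r, s, r_attr, s_attr):
    """Combine tuples of r and s whose join attributes have equal values."""
    return [{**t_r, **t_s} for t_r in r for t_s in s if t_r[r_attr] == t_s[s_attr]]

department = [{"DName": "Research", "MgrID": 2}, {"DName": "Sales", "MgrID": 3}]
employee = [
    {"EmpID": 1, "Name": "Abebe"},
    {"EmpID": 2, "Name": "Sara"},
    {"EmpID": 3, "Name": "Lemma"},
]

# Department joined with Employee on MgrID = EmpID
dept_mgr = equijoin(department, employee, "MgrID", "EmpID")
```

Every result tuple carries the pair MgrID and EmpID with identical values, even though the two attribute names differ.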
NATURAL JOIN Operation
The standard definition of natural join
requires that the two join attributes, or each
pair of corresponding join attributes, have
the same name in both relations.
If this is not the case, a renaming operation
on the attributes is applied first.
OUTER JOIN Operation
OUTER JOIN is another version of the JOIN
operation where non matching tuples from a
relation are also included in the result with
NULL values for attributes in the other
relation.
There are two major types of OUTER JOIN.
RIGHT OUTER JOIN: where non matching
tuples from the second (Right)
relation are included in the result with
NULL value for attributes of the first
(Left) relation.
LEFT OUTER JOIN: where non matching
tuples from the first (Left) relation are
included in the result with NULL value for
attributes of the second (Right) relation.
When two relations are joined by a JOIN
operator, there could be some tuples in the
first relation not having a matching tuple
from the second relation, and the query is
interested to display these non matching
tuples from the first or second relation.
Such query is represented by the OUTER
JOIN.
Notation for Left Outer Join:
R ⟕<Join Condition> S
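The behaviour of a left outer join can be sketched as follows, with Python's `None` standing in for NULL. The `left_outer_join` helper and the sample relations are hypothetical and assume an equality join condition.

```python
def left_outer_join(r, s, r_attr, s_attr):
    """Equijoin r with s, keeping unmatched r tuples padded with None (NULL)."""
    s_attrs = list(s[0].keys()) if s else []
    result = []
    for t_r in r:
        matches = [t_s for t_s in s if t_r[r_attr] == t_s[s_attr]]
        if matches:
            result.extend({**t_r, **t_s} for t_s in matches)
        else:
            # No partner in s: keep the left tuple, NULL for s's attributes.
            result.append({**t_r, **{a: None for a in s_attrs}})
    return result

employee = [{"EmpID": 1, "Name": "Abebe"}, {"EmpID": 2, "Name": "Sara"}]
department = [{"DName": "Research", "MgrID": 2}]
rows = left_outer_join(employee, department, "EmpID", "MgrID")
```

A right outer join can be obtained from the same helper by swapping the two operands.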
SEMIJOIN Operation
SEMI JOIN is another version of the JOIN
operation where the resulting Relation will
contain those attributes of only one of the
Relations that are related with tuples in the
other Relation.
The following notation depicts the inclusion
of only the attributes from the first relation
(R) in the result which are actually
participating in the relationship.
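A semijoin can be sketched in a few lines. The `semijoin` helper and the data are assumptions for illustration; the point is that the result has only the attributes of the first relation and only its tuples that find a partner in the second.

```python
def semijoin(r, s, r_attr, s_attr):
    """Return the tuples of r that match at least one tuple of s."""
    s_values = {t_s[s_attr] for t_s in s}
    return [t_r for t_r in r if t_r[r_attr] in s_values]

employee = [{"EmpID": 1, "Name": "Abebe"}, {"EmpID": 2, "Name": "Sara"}]
department = [{"DName": "Research", "MgrID": 2}]

# Employees who manage some department; no Department attributes in the result.
managers = semijoin(employee, department, "EmpID", "MgrID")
```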
Relational Calculus
A relational calculus expression creates a
new relation, which is specified in terms of
variables that range over rows of the stored
database relations (in tuple calculus) or
over columns of the stored relations (in
domain calculus).
In a calculus expression, there is no order
of operations to specify how to retrieve the
query result.
A calculus expression specifies only what
information the result should contain rather
than how to retrieve it.
The difference of relational calculus from
relational algebra is:
In relational calculus, there is no
description of how to evaluate a query;
it focuses on what information the
query should return.
Relational calculus is considered to be
a nonprocedural language, but the
relational algebra uses a procedural
approach.
In general, query Processing can be divided
into four main phases:
Decomposition,
Optimization,
Code generation,
and
Execution
Query Decomposition
Query decomposition is the process of
transforming a high level query (SQL) into a
relational algebra query (low level query),
and checking that the query is syntactically
and semantically correct by
parsing and validating it.
Its input is the high level query and its output is a relational algebra query.
Typical stages in query decomposition are:
Analysis: lexical and syntactical analysis of the
query (correctness).
Normalization: convert the query into a
normalized form.
Semantic Analysis: to reject normalized queries
that are not correctly formulated or are contradictory.
Simplification: to detect redundant qualifications,
eliminate common sub-expressions, and transform
the query to a semantically equivalent but more
easily and effectively computed form.
Query Restructuring: rearranging nodes so that
the most restrictive condition will be executed
first.
[Figure: Query Processing Steps]
Part Two
Query Optimization
Approaches to Query Optimization/ Algorithms/
Semantic Query Optimizations
Query Optimization
Query optimization is of great importance for
the performance of a relational database,
especially for the execution of complex SQL
statements.
A query optimizer decides the best methods for
implementing each query.
A single query can be executed through
different algorithms or re-written in different
forms and structures.
Hence, the question of query optimization
comes into the picture – Which of these forms
or pathways is the most optimal?
The query optimizer attempts to determine the
most efficient way to execute a given query by
considering the possible query plans.
Purposes of Query Optimization
The goal of query optimization is to reduce the
system resources required to fulfill a query, and
ultimately provide the user with the correct
result set faster.
First, it provides the user with faster
results, which makes the application seem
faster to the user.
Secondly, it allows the system to service
more queries in the same amount of time,
because each request takes less time
than unoptimized queries.
Thirdly, query optimization ultimately
reduces the amount of wear on the
hardware (e.g. disk drives), and allows the …
For optimizing the execution of a query the
programmer must know:
File organization
Record access mechanism and primary or
secondary key.
Data location on disk.
Data access limitations.
To write correct and efficient code,
application programmers need to know
how data is organized physically (e.g.,
which indexes exist) and worry about
data/workload characteristics
Query optimization is used:
To make query evaluation faster.
To reduce the response time of the query
processor.
To allow users to write queries without being
aware of the physical access mechanisms and
without asking them to explicitly dictate to the
system how the queries should be evaluated.
Approaches to Query Optimization/
Algorithms/
A. Heuristics Approach
The heuristic approach uses the knowledge
of the characteristics of the relational
algebra operations and the relationship
between the operators to optimize the
query.
This method is also known as rule based
optimization.
This is based on the equivalence rules on
relational expressions; hence the number of
combinations of queries is reduced here.
Hence the cost of the query is reduced too.
Thus the heuristic approach of optimization uses:
Properties of individual operators
Association between operators
Query Tree
Query Tree: a graphical representation of the
operators, relations, attributes and predicates
and processing sequence during query
processing.
The Query tree is composed of three main parts:
Leaves: the base relations used for
processing the query/ extracting the
required information
The Root: the final result/relation as
an output based on the operation on
the relations used for query processing
Nodes: intermediate results or
relations before reaching the final
result.
Sequence of execution of operations in a query
tree will start from the leaves, continue to
the intermediate nodes and end at the root.
Query graph: a graph data structure that
corresponds to a relational calculus expression.
It does not indicate an order on which
operations to perform first.
There is only a single graph corresponding to
each query.
The properties of each operation and the
association between operators are analyzed
using a set of rules called TRANSFORMATION
RULES, which are used to transform the query into a
relatively good execution strategy.
Example:
After the SQL query is parsed and it is
syntactically correct, then it is mapped onto
Relational Algebra (RA) expression.
Usually shown as a query tree (bottom up).
Consider the following SQL query on the
Reserves and Sailors tables.
SELECT S.Sname FROM Reserves R, Sailors S
WHERE R.SID = S.SID AND R.BID = 100 AND
S.Rating > 5
The same query in RA:
π Sname (σ BID=100 AND Rating>5 (Reserves ⋈ SID=SID Sailors))
Query tree for this expression (read bottom up):
π Sname
σ BID=100 AND Rating>5
⋈ SID=SID
Reserves    Sailors
Equivalence Rules/Transformation Rules/ for
Relational Algebra
1. Cascade (flow) of SELECTION: conjunctive
SELECTION operations can cascade into individual
SELECTION operations and vice versa
σ c1∧c2∧c3 (R) = σc1(σc2(σc3(R))) where ci is a
predicate
3. Cascade of PROJECTION:
in a sequence of PROJECTION operations, only
the last one applied (the outermost) is required
πL1(πL2(πL3(πL4(R)))) = πL1(R), where L1 ⊆ L2 ⊆ L3 ⊆ L4
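The two cascade rules can be checked on a toy relation. The sketch below is not an optimizer; the helpers `select` and `project`, the relation `R` and the predicates `c1`, `c2`, `c3` are all made up for the check. It confirms that a conjunctive selection equals the cascade of individual selections, and that in nested projections only the outermost (last applied) projection list determines the result.

```python
def select(rel, pred):
    return [row for row in rel if pred(row)]

def project(rel, attrs):
    out = []
    for row in rel:
        t = {a: row[a] for a in attrs}
        if t not in out:              # duplicate elimination
            out.append(t)
    return out

R = [{"a": i, "b": i % 3, "c": i % 2} for i in range(12)]
c1 = lambda r: r["a"] > 3
c2 = lambda r: r["b"] == 1
c3 = lambda r: r["c"] == 0

conjunctive = select(R, lambda r: c1(r) and c2(r) and c3(r))
cascaded = select(select(select(R, c3), c2), c1)

nested_proj = project(project(R, ["a", "b"]), ["b"])  # inner list {a, b}, outer list {b}
outer_only = project(R, ["b"])
```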
4. Commutativity of SELECTION with PROJECTION
and vice versa
If the predicate c1 involves only the attributes in
the projection list (L1), then the selection and
projection operations commute
π<a1,a2,...,an>(σc1(R)) = σc1(π<a1,a2,...,an>(R)),
where the attributes of c1 ⊆ {a1, a2, ..., an}
5. Commutativity of THETA JOIN/Cartesian
Product
R x S is equivalent to S x R, although the order of
attributes in the result differs
Also holds for Equi-Join and Natural-Join
B. Cost Estimation Approach
This is based on the cost of the query.
The query can use different paths based on
indexes, constraints, sorting methods etc.
This method mainly uses the statistics like
record size, number of records, number of
records per block, number of blocks, table
size, whether whole table fits in a block,
organization of tables, uniqueness of column
values, size of columns etc.
The main idea is to minimize the cost of
processing a query.
The cost function is comprised of:
I/O cost + CPU processing cost + …
The DBMS will use information stored in the
system catalogue for the purpose of
estimating cost.
The main target of this query optimization
is to minimize the size of the intermediate
relation.
The size (storage cost) will have an effect on the
cost of:
Disk Access
Data Transportation
Storage space in the Primary
Memory
Writing on Disk
The following are also used for cost estimation:
Computation Cost
Query is composed of many operations.
The operations could be database
operations like reading and writing to a disk,
or mathematical and other operations like:
Searching
Sorting
Merging
Computation on
field values
Pipelining
Pipelining is another method used for query
optimization.
It is sometimes referred to as on-the-fly
processing of queries, also known as
stream-based processing.
As query optimization tries to reduce the size of
the intermediate result, pipelining reduces it
further by applying several
conditions to a single intermediate result
in one continuous pass.
Thus the technique is said to reduce the number
of intermediate relations in query execution.
Pipelining performs multiple operations on a
single relation in a pipeline.
Example: Let’s say we have a relation on
employee with the following schema
Employee(ID, FName, LName, DoB, Salary,
Position, Dept)
If a query would like to extract supervisors
with salary greater than 2000, the relational
algebra representation of the query will be:
σ(Salary>2000) ∧ (Position=Supervisor) (Employee)
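Pipelining of the supervisor query can be sketched with Python generators, where each tuple flows through both conditions without an intermediate relation ever being stored. The `scan` and `select` helpers and the sample rows are hypothetical, and only a subset of the Employee attributes is used.

```python
def scan(relation):
    """Yield tuples one at a time, as a table scan would."""
    for row in relation:
        yield row

def select(rows, pred):
    """Filter a stream of tuples without materializing the intermediate result."""
    for row in rows:
        if pred(row):
            yield row

employee = [
    {"ID": 1, "Position": "Supervisor", "Salary": 2500},
    {"ID": 2, "Position": "Clerk", "Salary": 3000},
    {"ID": 3, "Position": "Supervisor", "Salary": 1800},
]

# Both conditions are applied to each tuple in one pass over Employee.
pipeline = select(
    select(scan(employee), lambda r: r["Position"] == "Supervisor"),
    lambda r: r["Salary"] > 2000,
)
result = list(pipeline)
```

Nothing is computed until `list(pipeline)` pulls tuples through the chain, which is the on-the-fly behaviour described above.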
Semantic Query Optimizations
Semantic Query Optimization Uses constraints
specified on the database schema in order to
modify one query into another query that is
more efficient to execute.
Consider the following SQL query,
SELECT E.LNAME, M.LNAME FROM EMPLOYEE E, EMPLOYEE M
WHERE E.SUPERSSN=M.SSN AND E.SALARY>M.SALARY
Suppose that we had a
constraint on the database schema that stated
that no employee can earn more than his or her
direct supervisor.
If the semantic query optimizer checks for the
existence of this constraint, it need not execute
the query at all because it knows that the result
of the query will be empty.
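The idea can be sketched as a function that consults a constraint flag before doing any work. This is an assumption-laden illustration, not how a real optimizer is built: `run_query`, the flag `constraint_salary_le_supervisor` and the sample rows are all hypothetical.

```python
def run_query(employees, constraint_salary_le_supervisor):
    """Return (E.LNAME, M.LNAME) pairs where E earns more than supervisor M."""
    if constraint_salary_le_supervisor:
        # The schema constraint guarantees an empty answer, so skip the join.
        return []
    by_ssn = {m["SSN"]: m for m in employees}
    return [
        (e["LNAME"], by_ssn[e["SUPERSSN"]]["LNAME"])
        for e in employees
        if e["SUPERSSN"] in by_ssn
        and e["SALARY"] > by_ssn[e["SUPERSSN"]]["SALARY"]
    ]

employees = [
    {"SSN": 1, "LNAME": "Borg", "SUPERSSN": None, "SALARY": 5500},
    {"SSN": 2, "LNAME": "Wong", "SUPERSSN": 1, "SALARY": 4000},
]
```

When the constraint holds, both paths return the same empty result; the semantic path simply avoids scanning and joining the table.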
Execution
An execution plan for a relational algebra query
consists of a combination of the relational
algebra query tree and information about the
access methods to be used for each relation as
well as the methods to be used in computing
the relational operators stored in the tree.
Chapter two
Database Security
and Authorization
Part One
What is DB Security?
Security Issues
Security Levels
Computer based security measures
Authentication and Authorization
Role of DBA in Database Security
DB security techniques
Discretionary security mechanism
Mandatory access control
Statistical Database Security
What is DB Security?
Security:
is protection from, or resilience against,
potential harm caused by others, by
restraining the freedom of others to act.
Database Security:
refers to the range of tools, controls, and
measures designed to establish and
preserve database confidentiality,
integrity, and availability.
includes a variety of measures used to
secure database management systems
from malicious cyber-attacks and
illegitimate use.
Database security must address and
protect the following:
The data in the database
The database management system
(DBMS)
Any associated applications
The physical database server and/or
the virtual database server and the
underlying hardware
The computing and/or network
infrastructure used to access the
database
Database security is a complex and
challenging endeavor that involves all
aspects of information security technologies
and practices.
It’s also naturally at odds with database
usability.
The more accessible and usable the
database, the more vulnerable it is to
security threats; the more invulnerable the
database is to threats, the more difficult it is
to access and use. (This paradox is
sometimes referred to as Anderson’s Rule).
Related terms for Security
Privacy: Ethical and legal rights that
individuals have with regard to control over
the dissemination and use of their personal
information.
Database security: Protection of
information contained in the database
against unauthorized access, modification
or destruction.
Database integrity: Mechanism that is
applied to ensure that the data in the
database is correct and consistent
Features of a good Security Management
System
Data Independence
Minimal redundancy
Data consistency
Data integrity
Privacy
Integrity
Availability
Copyright
Validity
Database security and integrity deal
with protecting the database from
becoming inconsistent or being disrupted,
which is called database misuse.
Database misuse could be Intentional
or Accidental, where accidental misuse
is easier to cope with than intentional
misuse.
Accidental inconsistency could occur
due to:
System crash during transaction
processing
Anomalies due to concurrent
access
Anomalies due to redundancy
Logical errors
Intentional misuse could be:
Unauthorized reading of data
Unauthorized modification of
data or
Unauthorized destruction of
data
Database Integrity:
Database Integrity constraints contribute
to maintaining a secure database system by
preventing data from becoming invalid and
hence giving misleading or incorrect results.
There are different types:-
Domain Integrity
Entity Integrity
Referential Integrity
Key constraints
Enterprise Constraint
Domain Integrity means that each column in any
table will have set of allowed values and cannot
assume any value other than the one specified in
the domain.
Entity Integrity means that in each table the
primary key (which may be composite) satisfies
both of two conditions:
o The primary key is unique within the table and
o The primary key column(s) contains no null values.
Referential Integrity means that in the database
as a whole, things are set up in such a way that if a
column exists in two or more tables in the
database (typically as a primary key in one table
and as a foreign key in one or more other tables),
then any change to a value in that column in one table
is reflected in the corresponding values in the other tables.
Key constraints: in a relational database,
there should be some collection of
attributes with a special feature used to
maintain the integrity of the database.
These attributes will be named as Primary
Key, Candidate Key, Foreign Key, etc.
Enterprise Constraint means some
business rules set by the enterprise on how
to use, manage and control the database.
Database Security:
Database Security - the mechanisms that
protect the database against intentional or
accidental threats.
Database security encompasses hardware,
software, people and data.
Database Management Systems supporting
multi-user database system must provide a
database security and authorization
subsystem to enforce limits on individual
and group access rights and privileges.
Levels of Security Measures
Security measures can be implemented
at several levels and for different
components of the system.
Physical Level: concerned with
securing the site containing the
computer system.
Human Level: concerned with
authorization of database users to
access the content at different levels and
privileges.
Operating System Level: concerned
with the weakness and strength of the
operating system security on data files.
Database System Level: concerned
with data access limit enforced by the
database system.
Access limits such as passwords, isolated
transactions, etc.
Application Level: Since almost all
database systems allow remote access
through terminals or networks, software-
level security with the network software is
as important as physical security, both on
the Internet and networks private to an
enterprise.
Security Issues and general
considerations
Legal, ethical and social issues: regarding
the right to access information.
Physical control issues: regarding how to
keep the database physically secured.
Policy issues: regarding privacy of
individual level at enterprise and national
level.
Operational consideration: on the
techniques used (password, etc) to access
and manipulate the database.
System level security: including operating
system and hardware control.
The designer and the administrator of a database
should first identify the possible threat that
might be faced by the system in order to take
counter measures.
Threat:-
It may be any situation or event, whether
intentional or accidental, that may adversely
affect a system and consequently the
organization
It may be caused by a situation or event involving
a person, action, or circumstance that is likely to
bring harm to an organization, where the harm to
an organization may be tangible or intangible.
Tangible: loss of hardware,
software, or data
Intangible: loss of credibility or
client confidence
Counter measures: Computer Based
Controls
The types of counter measure to threats on
computer systems range from physical
controls to administrative procedures.
The following are computer-based security
controls for a multi-user environment:
Authorization
gives a user legitimate access to a
system/object
governs not only what system or object a
specified user can access, but also what
the user may do with it
sometimes referred to as access
controls
Views
is the dynamic result of one or more
relational operations
is a virtual relation that does not
actually exist in the database, but is produced on request
Backup and recovery
Backup is the process of periodically taking a
copy of the database and log file
Recovery is the process of restoring the
database to a correct state in the event of a
failure
Integrity
preventing data from becoming invalid and
giving misleading or incorrect results.
Encryption
Authorization may not be sufficient to
protect data in database systems,
especially when there is a situation where
data should be moved from one location to
the other using network facilities.
Encryption is used to protect information
stored at a particular site or transmitted
between sites from being accessed by
unauthorized users.
Encryption is the encoding of the data by a
special algorithm.
Once encrypted, the data is unreadable by any program
without the decryption key.
To transmit data securely over insecure
networks requires the use of a
Cryptosystem, which includes:
User authorization on the
database schema
Index Authorization: deals with
permission to create as well as delete an
index table for relation.
Resource Authorization: deals with
permission to add/create a new relation in
the database.
Alteration Authorization: deals with
permission to add as well as delete
attribute.
Drop Authorization: deals with permission
to delete an existing relation.
Role of DBA in Database Security
The database administrator is responsible
to make the database to be as secure as
possible.
The major responsibilities of DBA in relation
to authorization of users are:
Account Creation
Security Level Assignment
Privilege Grant
Privilege Revocation
Account Deletion
DB security techniques
There are two types of DB security
techniques: Discretionary
security mechanisms and Mandatory access
control.
The mechanisms used to grant and revoke
privileges in relational database systems
and in SQL are referred to as Discretionary
access control.
On the other hand, the mechanisms for
enforcing multiple levels of security, which
is a more recent concern in database
system security, are known as
Mandatory access control.
Discretionary security mechanisms
Grant different privileges to different users
and user groups on various data objects to
access different data objects.
The mode of the privilege could be:- Read,
Insert, Delete, Update files, records or
fields.
It is more flexible
The typical method of enforcing
discretionary access control in DBS is based
on granting and revoking of privileges.
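Granting and revoking can be pictured with a toy privilege table. The `AccessControl` class, the user names and the object name are hypothetical; a real DBMS enforces this through its own authorization subsystem rather than application code.

```python
from collections import defaultdict

class AccessControl:
    """Toy discretionary access control: (user, object) maps to a set of modes."""

    def __init__(self):
        self._privs = defaultdict(set)

    def grant(self, user, obj, mode):
        self._privs[(user, obj)].add(mode)

    def revoke(self, user, obj, mode):
        self._privs[(user, obj)].discard(mode)

    def allowed(self, user, obj, mode):
        return mode in self._privs.get((user, obj), set())

acl = AccessControl()
acl.grant("abebe", "Employee", "Read")
acl.grant("abebe", "Employee", "Update")
acl.revoke("abebe", "Employee", "Update")
```

After these calls, abebe may still read Employee but may no longer update it, and a user who was never granted anything has no access at all.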
Mandatory Access Control
Enforce multilevel security
Classifying data and users into various
security classes (or levels) and implementing
the appropriate security policy of the
organization.
Each data object will have certain
classification level
Each user is given certain clearance level
Only users who can pass the clearance level
can access the data object
Is comparatively inflexible/rigid
If a user can access A but not B, then B is
accessible only to users with a higher clearance, and
no user can access B but not A
It has the following security classes:
Top Secret (TS).
Secret (S).
Confidential (C).
Unclassified (U).
TS is the highest level and U the lowest
level TS > S > C > U.
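The ordering of the security classes can be turned into a simple read check. This sketch assumes the rule that a user may read an object only when the user's clearance is at least the object's classification; the `LEVELS` table and `can_read` function are names invented for the example.

```python
# Numeric ranks for the ordering TS > S > C > U.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def can_read(clearance, classification):
    """True if the user's clearance is at least the object's classification."""
    return LEVELS[clearance] >= LEVELS[classification]
```

For instance, a user cleared for S can read C data, while a user cleared for C cannot read S data.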
Statistical Database Security
Statistical databases contain information
about individuals, which may not be permitted
to be seen by others as individual records.
Such databases may contain information
about various populations.
Statistical databases should have additional
security techniques which will protect the
retrieval of individual records.
Only queries with statistical aggregate
functions like Average, Sum, Min, Max,
Standard Deviation, Mid, Count, etc should be
executed.
Queries retrieving confidential attributes
should be prohibited.
Chapter Three
Transaction
Processing
Part One
What is transaction?
State of transaction
Ways of executing
transaction
Serializability
What is Transaction?
A Transaction is a mechanism for applying
the desired modifications/operations to
a database.
A transaction could be a whole program,
part/module of a program or a single
command.
Action, or series of actions, carried out by a
single user or application program, which
accesses or changes contents of database.
Changes made in real time to a database
are called transactions.
A transaction could be composed of one or
more database and non-database
operations.
A database transaction is a unit of
interaction with database management
system or similar system that is treated in a
coherent and reliable way independent of
other transactions.
Transaction processing system
A system that manages transactions and
controls their access to a DBMS is called a TP
monitor.
A transaction processing system (TPS)
generally consists of a TP monitor, one or
more DBMSs, and a set of application
programs containing transaction.
In a database field, a transaction is a group
of logical operations that must all succeed
or fail as a group.
Systems dedicated to supporting such
operations are known as transaction
processing systems.
Transactions can be started, attempted,
then committed or aborted via data
manipulation commands of SQL.
Can have one of the two outcomes for any
transaction:
Success - transaction commits and
database reaches a new consistent state
Committed transaction cannot be aborted or rolled back.
How do you discard a committed transaction?
Never Started - If a transaction fails
during execution then all its modifications
must be undone to bring back the database
to the last consistent state, i.e., remove the
effect of failed transaction.
No state between Done and Never Started
Consistency
If the transaction code is correct then
a transaction, at the end of its
execution, must leave the database
consistent.
A transaction should transform a database
from one consistent state to
another consistent state.
Isolation
A transaction must execute without
interference from other concurrent
transactions and its intermediate or partial
modifications to data must not be visible to
other transactions.
Durability
The effect of a completed transaction
must persist in the database, i.e., its
updates must be available to other
transactions immediately after the end of its
execution, and it should not be affected by
failures after the completion of the
transaction.
State of a Transaction
A transaction is an atomic operation from the
users’ perspective.
But it has a collection of operations and it can
have a number of states during its execution.
A transaction can end in three possible states.
Successful Termination: when a transaction
completes the execution of all operations in it
and reaches the COMMIT command.
Suicidal Termination: when the transaction
detects an error during its processing and
decides to abort itself before the end of the
transaction and performs a ROLL BACK
Murderous Termination: when the DBMS or
the system forces the execution to abort for any
reason. And hence, it is rolled back.
Ways of Transaction Execution
In a database system many transactions are
executed.
Basically there are two ways of executing a set
of transactions:
Serial Execution:
In a serial execution transactions are
executed strictly serially.
Thus, transaction Ti completes and writes its
results to the database, and only then is the next
transaction Tj scheduled for execution.
This means that at any one time only one
transaction is being executed in the system.
The data is not shared between transactions at
one specific time.
Concurrent Execution:
is the reverse of serial execution;
in this scheme the individual
operations of transactions, i.e., reads and
writes, are interleaved in some order.
Problems Associated with Concurrent
Transaction Processing
Although two transactions may be correct in
themselves, interleaving of operations may
produce an incorrect result which needs
control over access.
Having a concurrent transaction
processing, one can enhance the
throughput of the system.
As reading and writing is performed from
and on secondary storage, the system
will not be idle during these operations if
there is a concurrent processing.
The three potential problems caused by
concurrency are:
Lost Update Problem
Successfully completed update on a data set
by one transaction is overridden by another
transaction/user.
Uncommitted Dependency Problem
Occurs when one transaction can see
intermediate results of another transaction
before it is committed.
Inconsistent Analysis Problem
Occurs when a transaction reads several values
but a second transaction updates some of them
during execution and before the completion
of the first.
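The lost update problem listed above can be reproduced by hand with a single shared value. The sketch below is a deterministic simulation, not real concurrent code: the data item `x`, the amounts and the two transactions T1 (withdraw 10) and T2 (deposit 100) are made up.

```python
balance = {"x": 100}              # shared data item

# Interleaved schedule: both transactions read before either writes.
t1_read = balance["x"]            # T1 reads 100
t2_read = balance["x"]            # T2 reads 100
balance["x"] = t1_read - 10       # T1 writes 90
balance["x"] = t2_read + 100      # T2 writes 200 and overwrites T1's update
interleaved = balance["x"]

# Serial schedule: T1 finishes before T2 starts.
balance["x"] = 100
balance["x"] = balance["x"] - 10   # T1
balance["x"] = balance["x"] + 100  # T2
serial = balance["x"]
```

The interleaved run ends with 200 because the update made by T1 is lost, whereas the serial run ends with 190.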
Serializability
The objective of Concurrency Control
Protocol is to schedule transactions in such
a way as to avoid any interference between
them.
This demands a new principle in transaction
processing, which is serializability of the
schedule of execution of multiple
transactions.
In any transaction processing system, if
concurrent processing is implemented, there
will be a concept called schedule having or
determining the execution sequence of
operations in different transactions.
Schedule: time-ordered sequence of the
important actions taken by one or more
transactions.
Schedule represents the order in which
instructions are executed in the system
in chronological ordering.
The scheduler component of a DBMS must
ensure that the individual steps of different
transactions preserve consistency.
Serial Schedule: a schedule where the
operations of each transaction are
executed consecutively without any
interleaved operations from other
transactions.
No guarantee that results of all serial
executions of a given set of transactions will
be identical.
Non-serial Schedule: Schedule where
operations from a set of concurrent
transactions are interleaved.
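The difference between a serial and a non-serial schedule can be sketched in code. The representation of a schedule as a list of (transaction, operation, item) tuples and the function name `is_serial` are assumptions made for this illustration.

```python
def is_serial(schedule):
    """Return True if every transaction's operations are consecutive."""
    finished = set()   # transactions whose block of operations has ended
    current = None     # transaction whose operations are running now
    for txn, _op, _item in schedule:
        if txn != current:
            if txn in finished:
                # txn resumes after another transaction ran: interleaved
                return False
            if current is not None:
                finished.add(current)
            current = txn
    return True


# Hypothetical schedules over data items A and B.
serial = [("T1", "R", "A"), ("T1", "W", "A"),
          ("T2", "R", "B"), ("T2", "W", "B")]
non_serial = [("T1", "R", "A"), ("T2", "R", "B"),
              ("T1", "W", "A"), ("T2", "W", "B")]

print(is_serial(serial), is_serial(non_serial))
```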
Cont…
The objective of serializability is to find non-
serial schedules that allow transactions to
execute concurrently without interfering
with one another.
In other words, the goal is to find non-
serial schedules that are equivalent to
some serial schedule. Such a schedule is
called serializable.
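One textbook way to decide whether a non-serial schedule behaves like some serial one is the conflict-serializability test with a precedence graph. The slides do not spell this test out, so the sketch below is an illustration under assumptions: schedules are lists of (transaction, "R" or "W", item) tuples, and the function name is hypothetical.

```python
def conflict_serializable(schedule):
    """Build a precedence graph and report whether it is acyclic."""
    edges = {}
    for i, (ti, op_i, item_i) in enumerate(schedule):
        edges.setdefault(ti, set())
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two operations conflict if they belong to different
            # transactions, touch the same item, and one is a write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges[ti].add(tj)   # ti must come before tj

    state = {}  # node -> 1 while on the DFS stack, 2 when fully explored

    def has_cycle(node):
        state[node] = 1
        for nxt in edges.get(node, ()):
            if state.get(nxt) == 1:
                return True
            if state.get(nxt) is None and has_cycle(nxt):
                return True
        state[node] = 2
        return False

    return not any(state.get(n) is None and has_cycle(n) for n in edges)


# Lost-update style interleaving: both read A, then both write A.
bad = [("T1", "R", "A"), ("T2", "R", "A"),
       ("T1", "W", "A"), ("T2", "W", "A")]
# Interleaved, but the transactions touch different items.
good = [("T1", "R", "A"), ("T2", "R", "B"),
        ("T1", "W", "A"), ("T2", "W", "B")]

print(conflict_serializable(bad), conflict_serializable(good))
```

A cycle in the graph means no serial order satisfies all the ordering constraints, so the schedule is rejected.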
Cont…
In serializability:
Chapter Four
Concurrency
Control
Techniques
Part One
What is concurrency control?
Concurrency controlling techniques
What is concurrency control?
Concurrency Control is the process of
managing simultaneous operations on the
database without having them interfere
with one another.
Prevents interference when two or more
users are accessing database
simultaneously and at least one is updating
data.
Although two transactions may be correct in
themselves, interleaving of operations may
produce an incorrect result.
Concurrency controlling techniques
Three basic concurrency control techniques:
Locking methods
Time stamping
Optimistic
Both Locking and Time stamping are
conservative approaches: delay
transactions in case they conflict with other
transactions.
The optimistic approach allows us to
proceed and check conflicts at the end.
Locking Method
The locking method is a mechanism for
preventing simultaneous access on a
shared resource for a critical operation
A LOCK is a mechanism for enforcing
limits on access to a resource in an
environment where there are many threads
of execution.
Locks are one way of enforcing concurrency
control policies.
Transaction uses locks to deny access to
other transactions and so prevent incorrect
updates.
Cont…
Lock prevents another transaction from
modifying item or even reading it, in the
case of a write lock.
Lock (X): If a transaction T1 applies Lock
on data item X, then X is locked and it is not
available to any other transaction.
Unlock (X): T1 Unlocks X. X is available to
other transactions.
Types of Locks
Shared lock: A Read operation does not change
the value of a data item.
Hence a data item can be read by two different
transactions simultaneously under shared lock
mode.
So, to only read a data item, T1 will do: Shared
lock (X), then Read (X), and finally Unlock (X).
Exclusive lock: A write operation changes the
value of the data item.
Hence two write operations from two different
transactions or a write from T1 and a read from T2
are not allowed.
A data item can be modified only under Exclusive
lock.
To modify a data item T1 will do: Exclusive lock (X),
then Write (X) and finally Unlock (X).
Lock: Basic rules
If transaction has a shared lock on an item,
it can read but not update the item.
If a transaction has an exclusive lock on an
item, it can both read and update the item.
Reads cannot conflict, so more than one
transaction can hold shared locks
simultaneously on same item.
Exclusive lock gives transaction exclusive
access to that item.
Some systems allow transaction to upgrade
a shared lock to an exclusive lock, or vice-
versa.
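The basic rules for shared and exclusive locks can be sketched as a small lock table. This is a minimal illustration, not the lock manager of any real DBMS: the class name `LockManager`, the mode codes "S" and "X", and the choice to return False instead of making the caller wait are all assumptions.

```python
class LockManager:
    """Toy lock table: item -> (mode, set of holding transactions)."""

    def __init__(self):
        self.locks = {}

    def acquire(self, txn, item, mode):
        """Request mode "S" (shared) or "X" (exclusive).

        Returns True if the lock is granted, False if txn would have to wait.
        """
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # Sole holder: keep the current lock, or upgrade S to X.
            if mode == "X" and held_mode == "S":
                self.locks[item] = ("X", holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)   # reads do not conflict
            return True
        return False           # any combination involving X conflicts

    def release(self, txn, item):
        held = self.locks.get(item)
        if held is None:
            return
        _mode, holders = held
        holders.discard(txn)
        if not holders:
            del self.locks[item]


lm = LockManager()
shared_1 = lm.acquire("T1", "A", "S")    # granted
shared_2 = lm.acquire("T2", "A", "S")    # granted, shared with T1
excl_3 = lm.acquire("T3", "A", "X")      # refused, A is being read
lm.release("T2", "A")
upgrade_1 = lm.acquire("T1", "A", "X")   # T1 is sole holder, upgrade granted
shared_after = lm.acquire("T2", "A", "S")  # refused, T1 holds X
```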
Locking Method: Problems
Deadlock:
An impasse that may result when two (or
more) transactions are each waiting for
locks held by the other to be released.
Only one way to break deadlock: abort
one or more of the transactions in the
deadlock.
Deadlock should be transparent to user, so
DBMS should restart transaction(s).
Two general techniques for handling
deadlock:
Deadlock prevention, and
Deadlock detection and recovery.
Cont…
Timeout
The deadlock detection could be done using
the technique of TIMEOUT.
Every transaction will be given a time to
wait in case of deadlock.
If a transaction waits for the predefined
period of time in idle mode, the DBMS will
assume that deadlock occurred and it will
abort and restart the transaction.
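The timeout idea can be sketched with Python's standard `threading.Lock`, whose `acquire` method accepts a `timeout` argument. The helper name `run_with_timeout`, the 0.05 second limit and the return strings are assumptions for this illustration; a real DBMS would also roll back the transaction's changes and restart it.

```python
import threading


def run_with_timeout(lock, work, timeout_s):
    """Wait at most timeout_s seconds for the lock, then give up."""
    if not lock.acquire(timeout=timeout_s):
        # Waited for the whole predefined period: assume deadlock.
        return "aborted"
    try:
        work()
        return "committed"
    finally:
        lock.release()


item_lock = threading.Lock()

# Simulate another transaction that holds the lock and never releases it.
item_lock.acquire()
outcome_blocked = run_with_timeout(item_lock, lambda: None, 0.05)

# Once the lock is free, the same request succeeds.
item_lock.release()
outcome_free = run_with_timeout(item_lock, lambda: None, 0.05)

print(outcome_blocked, outcome_free)
```

A timeout cannot tell a true deadlock from a long wait, so a transaction may occasionally be aborted even though it would have obtained the lock eventually.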
Time-stamping Method
A timestamp is a unique identifier created
by the DBMS that indicates the relative
starting time of a transaction.
Can be generated by:
using the system clock at the time the
transaction started, or
incrementing a logical counter every time a
new transaction starts.
Time-stamping is also a concurrency control
protocol that orders transactions in such a
way that older transactions, transactions
with smaller timestamps, get priority in
the event of conflict.
Cont…
In time-stamping:
Cont…
Rules for permitting execution of
operations in Time-stamping Method
Suppose that Transaction Ti issues
Read(A)
If TS(Ti) < WTS(A): this implies that Ti needs
to read a value of A which was already
overwritten. Hence the read operation must
be rejected and Ti is rolled back.
If TS(Ti) >= WTS(A): then the read is
executed and RTS(A) is set to the maximum
of RTS(A) and TS(Ti).
Cont…
Suppose that Transaction Ti issues
Write(A)
If TS(Ti) < RTS(A): then this implies that the
value of A that Ti is producing was
previously needed and it was assumed that
it would never be produced. Hence, the
Write operation must be rejected and Ti is
rolled back.
If TS(Ti) < WTS(A): then this implies that Ti
is attempting to Write an obsolete value of A.
Hence, this write operation can be ignored.
Cont…
Otherwise the Write operation is executed
and WTS(A) is set to the maximum of
WTS(A) and TS(Ti).
N.B: A transaction that is rolled back due to
conflict will be restarted and be given a new
timestamp.
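The read and write rules above can be sketched as two small functions. The `Item` class, the function names and the status strings are assumptions made for illustration; RTS and WTS start at 0 here, and restarting a rolled-back transaction with a new timestamp is left to the caller.

```python
class Item:
    """A data item A with its read and write timestamps."""

    def __init__(self, value=None):
        self.value = value
        self.rts = 0   # RTS(A): largest timestamp that has read A
        self.wts = 0   # WTS(A): largest timestamp that has written A


def read(ts, item):
    """Transaction with timestamp ts issues Read(A)."""
    if ts < item.wts:
        return "rollback", None        # value was already overwritten
    item.rts = max(item.rts, ts)
    return "ok", item.value


def write(ts, item, value):
    """Transaction with timestamp ts issues Write(A)."""
    if ts < item.rts:
        return "rollback"              # a younger transaction already read A
    if ts < item.wts:
        return "ignored"               # obsolete write, safely skipped
    item.value = value
    item.wts = max(item.wts, ts)
    return "ok"


a = Item(10)
r1 = write(5, a, 20)     # executed, WTS(A) becomes 5
r2 = read(3, a)          # rejected, 3 < WTS(A)
r3 = read(7, a)          # executed, RTS(A) becomes 7
r4 = write(6, a, 30)     # rejected, 6 < RTS(A)

b = Item(1)
r5 = write(9, b, 2)      # executed, WTS(B) becomes 9
r6 = write(4, b, 3)      # ignored, 4 < WTS(B) and nobody read B
```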
Optimistic Technique
Locking and assigning and checking
timestamp values may be unnecessary for
some transactions
Assumes that conflict is rare.
When transaction reaches the level of
executing commit, a check is performed to
determine whether conflict has occurred.
If there is a conflict, transaction is rolled
back and restarted.
Based on the assumption that conflict is
rare, so it is more efficient to let
transactions proceed without the delays
needed to ensure serializability.
Cont…
At commit, check is made to determine
whether conflict has occurred.
If there is a conflict, transaction must be
rolled back and restarted.
Potentially allows greater concurrency than
traditional protocols.
Cont…
Three phases:
Read
Validation
Write
Cont…
Optimistic Techniques - Validation Phase
Follows the read phase.
For read-only transaction, checks that data
read are still current values. If no
interference, transaction is committed, else
aborted and restarted.
For update transaction, checks transaction
leaves database in a consistent state, with
serializability maintained.
Optimistic Techniques - Write Phase
Follows successful validation phase for
update transactions.
Updates made to local copy are applied to
the database.
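The read, validation and write phases can be sketched as follows. Checking that data read are still current is done here with a per-item version number, which is one possible approach and an assumption of this sketch; the class name `OptimisticTxn` and the (value, version) layout of the database are also hypothetical.

```python
class OptimisticTxn:
    """Toy optimistic transaction over db: item -> (value, version)."""

    def __init__(self, db):
        self.db = db
        self.read_versions = {}   # item -> version seen in the read phase
        self.local_writes = {}    # updates kept in a local copy

    def read(self, item):
        # Read phase: read from the database, remember the version seen.
        if item in self.local_writes:
            return self.local_writes[item]
        value, version = self.db[item]
        self.read_versions.setdefault(item, version)
        return value

    def write(self, item, value):
        # Read phase: updates go to the local copy only.
        self.local_writes[item] = value

    def commit(self):
        # Validation phase: every item read must still be current.
        for item, seen in self.read_versions.items():
            if self.db[item][1] != seen:
                return False      # conflict: caller rolls back and restarts
        # Write phase: apply the local copy to the database.
        for item, value in self.local_writes.items():
            _old, version = self.db.get(item, (None, 0))
            self.db[item] = (value, version + 1)
        return True


db = {"A": (100, 0)}
t1 = OptimisticTxn(db)
t2 = OptimisticTxn(db)
t1.write("A", t1.read("A") + 50)
t2.write("A", t2.read("A") - 30)
t1_ok = t1.commit()   # validates, A becomes 150
t2_ok = t2.commit()   # fails validation, A changed since T2 read it
```

Neither transaction waits during its read phase; the cost of a conflict is paid only at commit, by the transaction that fails validation.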
Chapter 5
Database Recovery
Techniques
Contents
1.What is database recovery?
2.Database recovery terms
3.Purpose of Database Recovery
4.Types of Failure
5.Transaction Log
6.Data Updates
7.Data Caching
8.Transaction Roll-back (Undo) and Roll-Forward
• In this example a full backup of a database (copies of its data files and
control file) is taken at SCN 100. Redo logs generated during the
operation of the database capture all changes that occur between SCN
100 and SCN 500. Along the way, some logs fill and are archived. At SCN
500, the data files of the database are lost due to a media failure. The
database is then returned to its transaction-consistent state at SCN 500,
by restoring the data files from the backup taken at SCN 100, then
applying the transactions captured in the archived and online redo logs
and undoing the uncommitted transactions.
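The restore, roll-forward and undo steps in this example can be sketched roughly as follows. The log record layout (SCN, transaction, item, old value, new value), the `recover` function and the sample data are assumptions made for illustration. This is not how any particular DBMS stores its redo logs, and the undo step assumes that no item written by an uncommitted transaction was later overwritten by a committed one.

```python
def recover(backup, redo_log, committed):
    """Rebuild the database state after a media failure.

    backup:    dict of item -> value as of the backup SCN
    redo_log:  list of (scn, txn, item, old_value, new_value), ordered by SCN
    committed: set of transaction ids that committed before the failure
    """
    db = dict(backup)                      # 1. restore data files from backup
    for _scn, _txn, item, _old, new in redo_log:
        db[item] = new                     # 2. roll forward: reapply every change
    for _scn, txn, item, old, _new in reversed(redo_log):
        if txn not in committed:
            db[item] = old                 # 3. undo uncommitted changes
    return db


# Hypothetical data: backup taken at SCN 100, failure at SCN 500.
backup_at_100 = {"A": 10, "B": 20}
log_101_to_500 = [
    (101, "T1", "A", 10, 15),   # T1 committed before the failure
    (250, "T2", "B", 20, 99),   # T2 was still running at SCN 500
]
recovered = recover(backup_at_100, log_101_to_500, committed={"T1"})
print(recovered)
```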
Purpose of Database Recovery
To bring the database into the last consistent state,
which existed prior to the failure.
To preserve transaction properties (Atomicity,
Consistency, Isolation and Durability).
Example:
If the system crashes before a fund transfer
transaction completes its execution, then either one or
both accounts may have incorrect value.
Thus, the database must be restored to the state
before the transaction modified any of the accounts.
The database may become unavailable for use due to:
System failure: System may fail because of
addressing error, application error, operating system
fault, RAM failure, etc.
Transaction failure: Transactions may fail
because of incorrect input, deadlock, incorrect
synchronization.
Media failure: Disk head crash, power disruption,
etc.
bits.
Pin-Unpin: Instructs the operating system not to flush the
data item.
Modified: Indicates the AFIM (after image) of the data item.
ADVANTAGES                             DISADVANTAGES
Reflects organizational structure      Greater potential for bugs
Many existing systems                  Increased processing overhead
Data sharing and distributed control   Complexity
Improved sharing and local autonomy    Cost; increased complexity
Improved reliability and availability  Security
Improved performance                   Integrity control more difficult
Economics                              Lack of standards
Expansion (scalability)                Lack of experience
Integration                            Database design more complex
Remaining competitive