
2.advanced Database System

The document discusses query processing and optimization in relational database management systems, detailing the phases of query processing, including parsing, optimization, and evaluation. It emphasizes the importance of query optimization for efficient execution of SQL queries, aiming to reduce resource usage and improve response times. Additionally, it covers the translation of SQL queries into relational algebra and various operations involved in query processing, such as selection, projection, and join operations.

Uploaded by

wendesennati

CHAPTER ONE

QUERY PROCESSING AND OPTIMIZATION

BY: ZIGIJU N

February 2014 E.C


Part One
 Query Processing
 Translating SQL Queries into Relational Algebra
 Query Processing Phases
Introduction
Query Processing
Query: a request for information from a database.
Query plan: an ordered set of steps used to access data in a SQL relational database management system.
 Query processing refers to the process of answering a query to a database or an information system. It usually involves:
 interpreting the query,
 searching through the space storing data, and
 retrieving the results satisfying the query.
Query Optimization
Query optimization is the process of selecting an
efficient execution plan for evaluating the query.
A single query can be executed through different
algorithms or re-written in different forms and
structures.
Hence, the question of query optimization comes into
the picture – Which of these forms or pathways is the
most optimal?
The query optimizer attempts to determine the
most efficient way to execute a given query by
considering the possible query plans.
Importance: The goal of query optimization is to
reduce the system resources required to fulfill a query,
and ultimately provide the user with the correct result
set faster.
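To make the idea of a query plan concrete, the sketch below asks a DBMS which plan its optimizer chose. It assumes SQLite through Python's built-in sqlite3 module; the slides do not name a DBMS, and the table and index names are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical employee table and an index on skill.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, skill TEXT, salary REAL)"
)
conn.execute("CREATE INDEX idx_skill ON employee(skill)")

# EXPLAIN QUERY PLAN returns the steps SQLite intends to use for the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE skill = 'SQL'"
).fetchall()

for step in plan:
    print(step)
```

The exact wording of each plan row depends on the SQLite version, so the output is not reproduced here.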
Query Processing
Its main aim is to find information from one or more databases and deliver it to the user quickly and efficiently.
A query language allows manipulation and retrieval of data from a database.
Query processing phases include:
 Parsing and translation
 Optimization
 Evaluation
Query languages are not programming languages.
QLs are not intended to be used for complex calculations.
QLs support easy, efficient access to large data sets.
There are varieties of query languages used by relational DBMSs for manipulating relations.

Cont…
 For procedural QL, user tells the system exactly what and
how to manipulate the data. E.g. relational algebra
 For non-procedural QL, user states what data is needed
rather than how it is to be retrieved. E.g. relational calculus
Two mathematical Query Languages form the basis for
Relational languages
 Relational Algebra
 Relational Calculus

We may describe relational algebra as a procedural language: it can be used to tell the DBMS how to build a new relation from one or more relations in the database.
We may describe relational calculus as a non-procedural language: it can be used to formulate the definition of a relation in terms of one or more database relations.
Translating SQL Queries into Relational Algebra
 Why translate SQL queries into relational algebra?
A sequence of relational algebra operations forms a relational algebra expression, whose result will also be a relation that represents the result of a database query (or retrieval request).
Relational algebra is a theoretical language with operations that work on one or more relations to define another relation without changing the original relation.
The output from one operation can become the input to another operation (nesting is possible).
It is used to build up sophisticated queries.
Cont…
Relational algebra is a procedural query
language, which takes instances of relations
as input and yields instances of relations as
output.
It uses operators to perform queries.
An operator can be either unary or binary.

The following operations can be applied on relations in a database based on the requirement.
Cont…
1. Selection (σ): Selects a subset of rows from a relation.
2. Projection (π): Deletes unwanted columns from a relation.
3. Renaming: assigning intermediate relation for a single operation.
4. Cross-Product (×): Allows us to combine two relations.
5. Set-Difference (−): Tuples in relation1, but not in relation2.
6. Union (∪): Tuples in relation1 or in relation2.
7. Intersection (∩): Tuples in relation1 and in relation2.
8. Join (⋈): Tuples joined from two relations based on a condition.
Cont…
Selection
Selects subset of tuples/rows in a relation that
satisfy selection condition.
It is a unary operator (it is applied to a single
relation)
The Selection operation is applied to each tuple
individually
The degree of the resulting relation is the same as
the original relation but the cardinality (no. of
tuples) is less than or equal to the original relation.
This operator is commutative.
A set of conditions can be combined using Boolean operations: ∧ (AND), ∨ (OR), and ~ (NOT).
No duplicates in result!
Cont…
Schema of result identical to schema of (only)
input relation.
Result relation can be the input for another
relational algebra operation! (Operator
composition.)
Has the syntax: σ<Selection Condition>(<Relation Name>)
Example: Find all Employees with skill type of Database. In relational algebra this could be written as:
σ<SkillType="Database">(Employee)
This could be written in SQL as:
SELECT * FROM EMPLOYEE WHERE SkillType = 'Database';
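The selection operator can be sketched in a few lines of Python. This is only an illustration: it assumes a relation is modelled as a list of dicts, and the `select` helper and the sample rows are hypothetical rather than taken from the slides.

```python
# A relation is modelled here as a list of dicts, one dict per tuple.
employee = [
    {"FName": "Abel", "SkillType": "Database", "Salary": 1800},
    {"FName": "Sara", "SkillType": "Networking", "Salary": 2100},
    {"FName": "Dawit", "SkillType": "Database", "Salary": 1200},
]

def select(relation, predicate):
    """sigma<predicate>(relation): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

# sigma<SkillType="Database">(Employee)
db_staff = select(employee, lambda t: t["SkillType"] == "Database")

# Selection is commutative: applying two selections in either order
# gives the same tuples.
a = select(select(employee, lambda t: t["SkillType"] == "Database"),
           lambda t: t["Salary"] > 1500)
b = select(select(employee, lambda t: t["Salary"] > 1500),
           lambda t: t["SkillType"] == "Database")
```

The result keeps every attribute of the input (same degree) and can only have the same number of tuples or fewer.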
Cont…
Projection
Selects certain attributes while discarding the
other from the base relation.
The PROJECT creates a vertical partitioning – one
with the needed columns (attributes) containing
results of the operation and other containing the
discarded Columns.
Deletes attributes that are not in projection
list.
Schema of result contains exactly the fields in the
projection list, with the same names that they had in
the (only) input relation.
Projection operator has to eliminate duplicates!
Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it.
Cont…
If the Primary Key is in the projection list, then
duplication will not occur
Duplicate removal is necessary to ensure that the resulting table is also a relation.
Has the syntax: π<Selected Attributes>(<Relation Name>)
Example: To display the Name, Skill, and Skill Level of an employee, the query will be:
 In RA:
π<FName, LName, Skill, Skill_Level>(Employee)
 The equivalent SQL is:
SELECT FName, LName, Skill, Skill_Level FROM EMPLOYEE;
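A minimal sketch of projection with duplicate elimination, again assuming relations are lists of dicts. The `project` helper and the sample data are hypothetical.

```python
def project(relation, attributes):
    """pi<attributes>(relation): keep the listed columns, drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # duplicate elimination keeps the result a relation
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

employee = [
    {"EmpID": 1, "FName": "Abel", "Skill": "SQL"},
    {"EmpID": 2, "FName": "Sara", "Skill": "SQL"},
    {"EmpID": 3, "FName": "Abel", "Skill": "SQL"},
]

# Two employees share FName and Skill, so one duplicate is removed.
names = project(employee, ["FName", "Skill"])

# With the primary key in the projection list no duplicates can occur.
with_key = project(employee, ["EmpID", "FName"])
```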
Cont…
Rename Operation
We may want to apply several relational
algebra operations one after the other.
The query could be written in two different
forms:
Write the operations as a single
relational algebra expression by nesting
the operations.
Apply one operation at a time and
create intermediate result relations.
In the latter case, we must give names to the relations that hold the intermediate results. This is done with the Rename operation.
Cont…
If we want to have the FName, LName, Skill, and Skill Level of an employee with salary greater than 1500 and working for department 5, we can write the expression for this query using the two alternatives:
A single algebraic expression, nesting the selection inside the projection:
π<FName, LName, Skill, Skill_Level>(σ<DeptNo=5 ∧ Salary>1500>(Employee))
Using intermediate relations named by the Rename operation:
Step 1: Result1 ← σ<DeptNo=5 ∧ Salary>1500>(Employee)
Step 2: Result ← π<FName, LName, Skill, Skill_Level>(Result1)
Then Result will be equivalent to the relation we get using the first alternative.
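The two ways of writing the query can be compared with a small sketch. The helpers `select` and `project` and the sample rows are hypothetical, and relations are assumed to be lists of dicts.

```python
def select(relation, predicate):
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:  # drop duplicate tuples
            result.append(row)
    return result

employee = [
    {"FName": "Abel", "LName": "Kebede", "Skill": "SQL", "Skill_Level": 6,
     "DeptNo": 5, "Salary": 1800},
    {"FName": "Sara", "LName": "Tesfaye", "Skill": "Java", "Skill_Level": 4,
     "DeptNo": 5, "Salary": 1400},
    {"FName": "Dawit", "LName": "Alemu", "Skill": "SQL", "Skill_Level": 7,
     "DeptNo": 3, "Salary": 2500},
]
attrs = ["FName", "LName", "Skill", "Skill_Level"]
condition = lambda t: t["DeptNo"] == 5 and t["Salary"] > 1500

# Alternative 1: one nested expression.
nested = project(select(employee, condition), attrs)

# Alternative 2: named intermediate relations.
result1 = select(employee, condition)
result = project(result1, attrs)
```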
Cont…
Set Operations
The three main set operations are Union, Intersection and Set Difference.
The properties of these set operations are similar to the concepts we have in mathematical set theory.
The difference is that, in the database context, the elements of each set, which is a relation in the database, will be tuples.
The set operations are binary operations which require the two operand relations to be type compatible.
Cont…
Set Difference (also called MINUS or complement) Operation
Denoted by R − S, it is a relation that includes all tuples that are in R but not in S.
The two operands must be "type compatible".
E.g. Employees who attend the Database course but didn’t take any course at AAU:
RelationOne − RelationTwo
RelationOne: Employees who attend the Database course, i.e. in SQL
SELECT * FROM Employee WHERE SkillType = 'Database';
RelationTwo: Employees who attend a course at AAU, i.e. in SQL
SELECT * FROM Employee WHERE School = 'AAU';
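Set difference can be sketched as follows, assuming tuples are plain Python tuples with the same attribute order in both relations (type compatibility). The data and the `difference` helper are hypothetical.

```python
def difference(r, s):
    """R - S: tuples that are in R but not in S."""
    s_set = set(s)
    return [t for t in r if t not in s_set]

# (EmpID, FName) of employees who attended the Database course.
relation_one = [(1, "Abel"), (2, "Sara"), (3, "Dawit")]
# (EmpID, FName) of employees who attended a course at AAU.
relation_two = [(2, "Sara"), (4, "Hana")]

only_database = difference(relation_one, relation_two)
reverse = difference(relation_two, relation_one)
```

Unlike union and intersection, set difference is not commutative, so `only_database` and `reverse` differ.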
Cont…
CARTESIAN Operation (Cross Product-X)
This operation is used to combine tuples from two
relations in a combinatorial fashion.
That means every tuple in Relation1 (R) will be related with every tuple in Relation2 (S).
In general, the result of R(A1, A2, . . ., An) × S(B1, B2, . . ., Bm) is a relation Q with degree n + m attributes Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order, where R has n attributes and S has m attributes.
The resulting relation Q has one tuple for each combination of tuples: one from R and one from S.
Hence, if R has |R| tuples and S has |S| tuples, then R × S will have |R| * |S| tuples.
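The degree and cardinality of a Cartesian product can be checked with a short sketch. The relations `R` and `S` below are hypothetical and modelled as lists of tuples.

```python
from itertools import product

def cross(r, s):
    """R x S: concatenate every tuple of R with every tuple of S."""
    return [tr + ts for tr, ts in product(r, s)]

R = [(1, "a"), (2, "b"), (3, "c")]   # 3 tuples, 2 attributes
S = [("x",), ("y",)]                 # 2 tuples, 1 attribute

Q = cross(R, S)
degree = len(Q[0])      # 2 + 1 attributes
cardinality = len(Q)    # 3 * 2 tuples
```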
Cont…
Example:
 To extract employee information about managers of the
departments, the algebra query using the JOIN
operation will be:

EQUIJOIN Operation
The most common use of join involves a condition with equality comparisons only ( = ).
Such a join, where the only comparison operator used is =, is called an EQUIJOIN.
In the result of an EQUIJOIN we always have one or more pairs of attributes (whose names need not be identical) that have identical values in every tuple, since we used the equality operator.
Cont…
NATURAL JOIN Operation
The standard definition of natural join
requires that the two join attributes, or each
pair of corresponding join attributes, have
the same name in both relations.
If this is not the case, a renaming operation
on the attributes is applied first.

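The difference between an equijoin and a natural join can be sketched as below. The relations, attribute names and the `equijoin` helper are hypothetical; relations are lists of dicts, so two attributes with the same name collapse into one column.

```python
def equijoin(r, s, r_attr, s_attr):
    """Join tuples of r and s whose r_attr and s_attr values are equal."""
    return [{**tr, **ts} for tr in r for ts in s if tr[r_attr] == ts[s_attr]]

department = [{"DNo": 5, "DName": "Research", "MgrID": 2}]
employee = [{"EmpID": 1, "FName": "Abel"}, {"EmpID": 2, "FName": "Sara"}]

# EQUIJOIN: the join attributes have different names, so both appear
# in the result with identical values.
managers = equijoin(department, employee, "MgrID", "EmpID")

# NATURAL JOIN: rename MgrID to EmpID first, then join on the common name;
# the join attribute appears only once.
renamed = [{("EmpID" if k == "MgrID" else k): v for k, v in d.items()}
           for d in department]
natural = equijoin(renamed, employee, "EmpID", "EmpID")
```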
Cont…
OUTER JOIN Operation
OUTER JOIN is another version of the JOIN
operation where non matching tuples from a
relation are also included in the result with
NULL values for attributes in the other
relation.
There are two major types of OUTER JOIN.
RIGHT OUTER JOIN: where non matching tuples from the second (Right) relation are included in the result with NULL value for attributes of the first (Left) relation.
LEFT OUTER JOIN: where non matching tuples from the first (Left) relation are included in the result with NULL value for attributes of the second (Right) relation.
Cont…
When two relations are joined by a JOIN
operator, there could be some tuples in the
first relation not having a matching tuple
from the second relation, and the query is
interested to display these non matching
tuples from the first or second relation.
Such query is represented by the OUTER
JOIN.
Notation for Left Outer Join: R ⟕<Join Condition> S
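A left outer join can be sketched as follows, with Python's `None` standing in for NULL. The helper name, its `s_attrs` parameter and the sample relations are hypothetical.

```python
def left_outer_join(r, s, r_attr, s_attr, s_attrs):
    """Keep every tuple of r; pad with None when no tuple of s matches."""
    result = []
    for tr in r:
        matches = [ts for ts in s if tr[r_attr] == ts[s_attr]]
        if matches:
            result.extend({**tr, **ts} for ts in matches)
        else:
            # Non-matching left tuple: attributes of s become NULL (None).
            result.append({**tr, **{a: None for a in s_attrs}})
    return result

employee = [{"EmpID": 1, "FName": "Abel"}, {"EmpID": 2, "FName": "Sara"}]
department = [{"DName": "Research", "MgrID": 2}]

rows = left_outer_join(employee, department, "EmpID", "MgrID",
                       ["DName", "MgrID"])
```

A right outer join is the same operation with the roles of the two relations swapped.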
Cont…
SEMIJOIN Operation
SEMI JOIN is another version of the JOIN
operation where the resulting Relation will
contain those attributes of only one of the
Relations that are related with tuples in the
other Relation.
The following notation depicts the inclusion of only the attributes from the first relation (R) in the result which are actually participating in the relationship:
R ⋉<Join Condition> S

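A semijoin can be sketched as a filter on the first relation. The `semijoin` helper and the data are hypothetical.

```python
def semijoin(r, s, r_attr, s_attr):
    """Tuples of r (with r's attributes only) that match some tuple of s."""
    s_values = {ts[s_attr] for ts in s}
    return [tr for tr in r if tr[r_attr] in s_values]

employee = [{"EmpID": 1, "FName": "Abel"}, {"EmpID": 2, "FName": "Sara"}]
department = [{"DName": "Research", "MgrID": 2}]

# Employees who manage some department; no department attributes are kept.
managers = semijoin(employee, department, "EmpID", "MgrID")
```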
Relational Calculus
A relational calculus expression creates a
new relation, which is specified in terms of
variables that range over rows of the stored
database relations (in tuple calculus) or
over columns of the stored relations (in
domain calculus).
 In a calculus expression, there is no order
of operations to specify how to retrieve the
query result.
A calculus expression specifies only what
information the result should contain rather
than how to retrieve it.
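As a loose analogy only, a Python set comprehension reads much like a tuple calculus expression: it states which tuples qualify without prescribing an order of operations. The relation and attribute names below are hypothetical.

```python
employee = [
    {"FName": "Abel", "Salary": 1800},
    {"FName": "Sara", "Salary": 2600},
    {"FName": "Dawit", "Salary": 3100},
]

# Tuple calculus: { t.FName | Employee(t) AND t.Salary > 2000 }
high_paid = {t["FName"] for t in employee if t["Salary"] > 2000}
```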
Cont…
The difference of relational calculus from relational algebra is:
In relational calculus, there is no description of how to evaluate a query; it focuses on what information the query should return.
Relational calculus is considered to be a nonprocedural language, but relational algebra uses a procedural approach.
Cont…
In general, query processing can be divided into four main phases:
 Decomposition,
 Optimization,
 Code generation, and
 Execution
Cont…
Query Decomposition
Query decomposition is the process of transforming a high level query (SQL) into a relational algebra query (low level query), and checking that the query is syntactically and semantically correct by parsing and validating it.
Its output is a relational algebra query.
Cont…
Typical stages in query decomposition are:
Analysis: lexical and syntactical analysis of the
query (correctness).
Normalization: convert the query into a
normalized form.
Semantic Analysis: to reject normalized queries that are not correctly formulated or are contradictory.
Simplification: to detect redundant qualifications, eliminate common sub-expressions, and transform the query to a semantically equivalent but more easily and effectively computed form.
Query Restructuring: rearranging nodes so that the most restrictive condition will be executed first.
Cont…
Query Processing Steps

Part Two
 Query Optimization
 Approaches to Query Optimization/ Algorithms/
 Semantic Query Optimizations
Query Optimization
 Query optimization is of great importance for
the performance of a relational database,
especially for the execution of complex SQL
statements.
 A query optimizer decides the best methods for
implementing each query.
 A single query can be executed through
different algorithms or re-written in different
forms and structures.
 Hence, the question of query optimization
comes into the picture – Which of these forms
or pathways is the most optimal?
 The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans.
Cont…
Purposes of Query Optimization
The goal of query optimization is to reduce the system resources required to fulfill a query, and ultimately provide the user with the correct result set faster.
First, it provides the user with faster results, which makes the application seem faster to the user.
Secondly, it allows the system to service more queries in the same amount of time, because each request takes less time than unoptimized queries.
Thirdly, query optimization ultimately reduces the amount of wear on the hardware (e.g. disk drives).
Cont…
For optimizing the execution of a query the
programmer must know:
File organization
Record access mechanism and primary or
secondary key.
Data location on disk.
Data access limitations.

Cont…
To write correct and efficient code,
application programmers need to know
how data is organized physically (e.g.,
which indexes exist) and worry about
data/workload characteristics
Query optimization is used:
 To make query evaluation faster.
 To reduce the response time of the query processor.
 To allow the user to write queries without being aware of the physical access mechanisms and without having to tell the system explicitly how the queries should be evaluated.
Approaches to Query Optimization/
Algorithms/
A. Heuristics Approach
The heuristic approach uses the knowledge
of the characteristics of the relational
algebra operations and the relationship
between the operators to optimize the
query.
This method is also known as rule based
optimization.
This is based on the equivalence rules on relational expressions; hence the number of query combinations to consider is reduced.
Hence the cost of the query is reduced too.
Thus the heuristic approach of optimization makes use of:
 Properties of individual operators
 Association between operators
 Query Tree
Cont…
 Query Tree: a graphical representation of the operators, relations, attributes and predicates and processing sequence during query processing.
 Query tree is composed of three main parts:
 The Leafs: the base relations used for processing the query/ extracting the required information
 The Root: the final result/relation as an output based on the operation on the relations used for query processing
 Nodes: intermediate results or relations before reaching the final result.
 Sequence of execution of operation in a query tree will start from the leaves and continues to the intermediate nodes and ends at the root.
Cont…
Query graph: a graph data structure that
corresponds to a relational calculus expression.
It does not indicate an order on which
operations to perform first.
There is only a single graph corresponding to
each query.
The properties of each operation and the association between operators are analyzed using a set of rules called TRANSFORMATION RULES, which are used to transform the query into a relatively good execution strategy.
Cont…
Example:
After the SQL query is parsed and it is
syntactically correct, then it is mapped onto
Relational Algebra (RA) expression.
Usually shown as a query tree (bottom up).
Consider the following SQL query on the Reserves and Sailors tables:
SELECT S.Sname FROM Reserves R, Sailors S
WHERE R.SID = S.SID AND R.BID = 100 AND S.Rating > 5
The same query in RA:
π<Sname>(σ<BID=100 ∧ Rating>5>(Reserves ⋈<SID=SID> Sailors))
Cont…
The corresponding query tree, read bottom up:

          π sname
             |
 σ bid = 100 and rating > 5
             |
        ⋈ sid = sid
          /      \
    Reserves    Sailors
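The bottom-up evaluation of this query tree can be sketched as a small recursive evaluator. The tuple-based tree encoding, the `evaluate` function and the sample rows are hypothetical; a real DBMS represents and executes plans very differently.

```python
reserves = [{"SID": 1, "BID": 100}, {"SID": 2, "BID": 101}, {"SID": 3, "BID": 100}]
sailors = [
    {"SID": 1, "Sname": "Dustin", "Rating": 7},
    {"SID": 2, "Sname": "Lubber", "Rating": 8},
    {"SID": 3, "Sname": "Bob", "Rating": 3},
]

# Root = projection, inner nodes = selection and join, leaves = base relations.
tree = ("project", ["Sname"],
        ("select", lambda t: t["BID"] == 100 and t["Rating"] > 5,
         ("join", "SID", ("relation", reserves), ("relation", sailors))))

def evaluate(node):
    """Evaluate children first (leaves), then the operator at this node."""
    op = node[0]
    if op == "relation":
        return node[1]
    if op == "join":
        _, attr, left, right = node
        l_rows, r_rows = evaluate(left), evaluate(right)
        return [{**a, **b} for a in l_rows for b in r_rows if a[attr] == b[attr]]
    if op == "select":
        _, predicate, child = node
        return [t for t in evaluate(child) if predicate(t)]
    if op == "project":
        _, attrs, child = node
        return [{a: t[a] for a in attrs} for t in evaluate(child)]
    raise ValueError(f"unknown operator: {op}")

result = evaluate(tree)
```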
Equivalence Rules/Transformation Rules/ for
Relational Algebra
1. Cascade (flow) of SELECTION: conjunctive SELECTION operations can cascade into individual SELECTION operations and vice versa
σ<c1 ∧ c2 ∧ c3>(R) = σ<c1>(σ<c2>(σ<c3>(R))), where ci is a predicate

2. Commutativity of SELECTION operations
σ<c1>(σ<c2>(R)) = σ<c2>(σ<c1>(R)), where ci is a predicate

3. Cascade of PROJECTION: in a sequence of PROJECTION operations, only the last one applied (the outermost) is required
π<L1>(π<L2>(π<L3>(π<L4>(R)))) = π<L1>(R), where L1 ⊆ L2 ⊆ L3 ⊆ L4
Cont…
4. Commutativity of SELECTION with PROJECTION and vice versa
If the predicate c1 involves only the attributes in the projection list (L1), then the selection and projection operations commute
π<a1, a2, ..., an>(σ<c1>(R)) = σ<c1>(π<a1, a2, ..., an>(R)), where the attributes used in c1 ⊆ {a1, a2, ..., an}
5. Commutativity of THETA JOIN/Cartesian Product
R × S is equivalent to S × R, and R ⋈<c> S is equivalent to S ⋈<c> R. The two sides list their attributes in a different order, so they are not identical as ordered tuples, but they hold the same information.
Also holds for Equi-Join and Natural-Join
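Rules 1, 2 and 5 can be checked on sample data with a short sketch. The helpers and the rows are hypothetical; relations are lists of dicts, and `as_set` compares two relations while ignoring row and attribute order.

```python
def select(relation, predicate):
    return [t for t in relation if predicate(t)]

def join(left, right, attr):
    return [{**a, **b} for a in left for b in right if a[attr] == b[attr]]

def as_set(relation):
    """View a relation as a set of tuples, ignoring row and column order."""
    return {frozenset(t.items()) for t in relation}

reserves = [{"SID": 1, "BID": 100}, {"SID": 2, "BID": 101}, {"SID": 3, "BID": 100}]
sailors = [
    {"SID": 1, "Sname": "Dustin", "Rating": 7},
    {"SID": 2, "Sname": "Lubber", "Rating": 8},
    {"SID": 3, "Sname": "Bob", "Rating": 3},
]
joined = join(reserves, sailors, "SID")
c1 = lambda t: t["BID"] == 100
c2 = lambda t: t["Rating"] > 5

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = select(joined, lambda t: c1(t) and c2(t)) == select(select(joined, c2), c1)
# Rule 2: selections commute.
rule2 = select(select(joined, c1), c2) == select(select(joined, c2), c1)
# Rule 5: join is commutative up to attribute order.
rule5 = as_set(join(reserves, sailors, "SID")) == as_set(join(sailors, reserves, "SID"))
```

A check on one data set does not prove the rules; it only illustrates what each equivalence claims.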
B. Cost Estimation Approach
This is based on the cost of the query.
The query can use different paths based on
indexes, constraints, sorting methods etc.
This method mainly uses the statistics like
record size, number of records, number of
records per block, number of blocks, table
size, whether whole table fits in a block,
organization of tables, uniqueness of column
values, size of columns etc.
The main idea is to minimize the cost of
processing a query.
The cost function is comprised of:
I/O cost + CPU processing cost + …
Cont…
The DBMS will use information stored in the system catalogue for the purpose of estimating cost.
The main target of this query optimization is to minimize the size of the intermediate relation.
The size (storage cost) will have an effect on the cost of:
Disk Access
Data Transportation
Storage space in the Primary Memory
Writing on Disk
Cont…
The following are also used for cost estimation:
 Cardinality of a relation: the number of tuples currently contained in a relation (r)
 Degree of a relation: number of attributes of a relation
 Number of tuples of a relation that can be stored in one block of memory
 Total number of blocks used by a relation
 Number of distinct values of an attribute (d)
 Selection Cardinality of an attribute (S): the average number of records that will satisfy an equality selection condition on that attribute
Cont…
Information about the size of a file
 number of records (tuples) (r),
 record size (R),
 number of blocks (b)
 blocking factor (bfr)
Information about indexes and indexing attributes of a file
 Number of levels (x) of each multilevel index
 Number of first-level index blocks (bI1)
 Number of distinct values (d) of an attribute
 Selectivity (sl) of an attribute
 Selection cardinality (s) of an attribute.
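The catalogue statistics above are related by a few textbook formulas, sketched below. The block size `B` is an assumption (it is not listed in the slides), values are assumed to be uniformly distributed, and the numbers are made up for illustration.

```python
import math

def file_stats(r, R, B, d):
    """Derive cost-estimation statistics from basic file parameters.

    r: number of records, R: record size in bytes,
    B: block size in bytes (assumed), d: distinct values of an attribute.
    """
    bfr = B // R                 # blocking factor: whole records per block
    b = math.ceil(r / bfr)       # number of blocks needed for the file
    sl = 1 / d                   # selectivity of an equality condition (uniform)
    s = sl * r                   # selection cardinality: expected matching records
    return {"bfr": bfr, "b": b, "sl": sl, "s": s}

stats = file_stats(r=10_000, R=100, B=4096, d=50)
```

With these numbers a block holds 40 records, the file occupies 250 blocks, and an equality selection is expected to return about 200 records.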
Cost Components for Query Optimization
The costs of query execution can be calculated for the
following major process we have during processing.
Access Cost of Secondary Storage
Data is going to be accessed from secondary storage, as a query will need some part of the data stored in the database.
The disk access cost can again be analyzed in terms of:
Searching,
Reading, and
Writing data blocks used to store some portion of a relation.
The disk access cost will vary depending on the file organization used and the access method implemented for the file organization.
In addition to the file organization, the data allocation scheme, whether the data is stored contiguously or in a scattered manner, will affect the disk access cost.
Cont…
Storage Cost
While processing a query, as any query would
be composed of many database operations,
there could be one or more intermediate results
before reaching the final output.
These intermediate results should be stored in
primary memory for further processing.
The bigger the intermediate relation, the larger
the memory requirement, which will have
impact on the limited available space.
This will be considered as a cost of storage.

Cont…
Computation Cost
Query is composed of many operations.
The operations could be database
operations like reading and writing to a disk,
or mathematical and other operations like:
Searching
Sorting
Merging
Computation on
field values

Pipelining
Pipelining is another method used for query
optimization.
It is sometimes referred to as on-the-fly processing of queries, or as stream-based processing.
As query optimization tries to reduce the size of the intermediate result, pipelining offers a better way of reducing that size by applying different conditions to a single intermediate result continuously.
Thus the technique is said to reduce the number of intermediate relations in query execution.
Pipelining performs multiple operations on a single relation in a pipeline.
Cont…
Example: Let's say we have a relation on employee with the following schema:
Employee(ID, FName, LName, DoB, Salary, Position, Dept)
If a query would like to extract supervisors with salary greater than 2000, the relational algebra representation of the query will be:
σ<(Salary>2000) ∧ (Position="Supervisor")>(Employee)
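Pipelining maps naturally onto Python generators, as in the sketch below: each tuple flows through both conditions before the next one is read, so no intermediate relation is stored. The `scan` and `where` helpers and the sample rows are hypothetical.

```python
def scan(relation):
    """Produce the tuples of a base relation one at a time."""
    for row in relation:
        yield row

def where(rows, predicate):
    """Pass along only the tuples that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row

employee = [
    {"ID": 1, "FName": "Abel", "Salary": 2500, "Position": "Supervisor"},
    {"ID": 2, "FName": "Sara", "Salary": 1800, "Position": "Supervisor"},
    {"ID": 3, "FName": "Dawit", "Salary": 3000, "Position": "Clerk"},
]

# Both selections are chained; nothing is materialized until list() pulls rows.
pipeline = where(
    where(scan(employee), lambda t: t["Salary"] > 2000),
    lambda t: t["Position"] == "Supervisor",
)
supervisors = list(pipeline)
```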
Semantic Query Optimizations
Semantic query optimization uses constraints specified on the database schema in order to modify one query into another query that is more efficient to execute.
Consider the following SQL query:
SELECT E.LNAME, M.LNAME FROM EMPLOYEE E, EMPLOYEE M
WHERE E.SUPERSSN = M.SSN AND E.SALARY > M.SALARY
Suppose that we had a constraint on the database schema that stated that no employee can earn more than his or her direct supervisor.
If the semantic query optimizer checks for the existence of this constraint, it need not execute the query at all because it knows that the result of the query will be empty.
Execution
An execution plan for a relational algebra query
consists of a combination of the relational
algebra query tree and information about the
access methods to be used for each relation as
well as the methods to be used in computing
the relational operators stored in the tree.

Chapter two

Database Security and Authorization

Part One
 What is DB Security?
 Security Issues
 Security Levels
 Computer based security measures
 Authentication and Authorization
 Role of DBA in Database Security
 DB security techniques
 Discretionary security mechanism
 Mandatory access control
 Statistical database security
What is DB Security?
Security:
is protection from, or resilience against,
potential harm caused by others, by
restraining the freedom of others to act.
Database Security:
refers to the range of tools, controls, and
measures designed to establish and
preserve database confidentiality,
integrity, and availability.
includes a variety of measures used to secure database management systems from malicious cyber-attacks and illegitimate use.
Cont…
Database security must address and
protect the following:
The data in the database
The database management system
(DBMS)
Any associated applications
The physical database server and/or
the virtual database server and the
underlying hardware
The computing and/or network
infrastructure used to access the
database
Cont…
Database security is a complex and
challenging endeavor that involves all
aspects of information security technologies
and practices.
It’s also naturally at odds with database
usability.
The more accessible and usable the
database, the more vulnerable it is to
security threats; the more invulnerable the
database is to threats, the more difficult it is
to access and use. (This paradox is
sometimes referred to as Anderson’s Rule).

Related terms for Security
Privacy :– Ethical and legal rights that individuals have with regard to control over the dissemination and use of their personal information.
Database security :– Protection of
information contained in the database
against unauthorized access, modification
or destruction.
Database integrity: – Mechanism that is
applied to ensure that the data in the
database is correct and consistent

Cont…
Features of a good Security Management
System
Data Independence
Minimal redundancy
Data consistency
Data integrity
Privacy
Integrity
Availability
Copyright
Validity
Cont…
Database security and integrity deal with protecting the database from becoming inconsistent and from being disrupted; we call this database misuse.
Database misuse could be intentional or accidental, where accidental misuse is easier to cope with than intentional misuse.
Accidental inconsistency could occur due to:
 System crash during transaction processing
 Anomalies due to concurrent access
 Anomalies due to redundancy
 Logical errors
Cont…
Intentional misuse could be:
Unauthorized reading of data
Unauthorized modification of
data or
Unauthorized destruction of
data

Cont…
Database Integrity:
Database Integrity constraints contribute
to maintaining a secure database system by
preventing data from becoming invalid and
hence giving misleading or incorrect results.
There are different types:-

 Domain Integrity
 Entity Integrity
 Referential Integrity
 Key constraints
 Enterprise Constraint

Cont…
 Domain Integrity means that each column in any
table will have set of allowed values and cannot
assume any value other than the one specified in
the domain.
 Entity Integrity means that in each table the
primary key (which may be composite) satisfies
both of two conditions:
o The primary key is unique within the table and
o The primary key column(s) contains no null values.
 Referential Integrity means that in the database as a whole, things are set up in such a way that if a column exists in two or more tables in the database (typically as a primary key in one table and as a foreign key in one or more other tables), its values stay consistent across those tables: every foreign key value must match an existing primary key value or be null.
Cont…
Key constraints: in a relational database, there should be some collection of attributes with a special feature used to maintain the integrity of the database. These attributes will be named as Primary Key, Candidate Key, Foreign Key, etc.
Enterprise Constraint means some business rules set by the enterprise on how to use, manage and control the database.
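Domain, entity and referential integrity can be seen in action with a small sketch. It assumes SQLite through Python's built-in sqlite3 module; the table and column names are hypothetical, and the `violates` helper exists only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE department (dno INTEGER PRIMARY KEY NOT NULL, dname TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY NOT NULL,        -- entity integrity
        salary REAL CHECK (salary >= 0),            -- domain integrity
        dno    INTEGER REFERENCES department(dno)   -- referential integrity
    )
""")
conn.execute("INSERT INTO department VALUES (5, 'Research')")
conn.execute("INSERT INTO employee VALUES (1, 1500, 5)")

def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

domain_violation = violates("INSERT INTO employee VALUES (2, -10, 5)")       # negative salary
entity_violation = violates("INSERT INTO employee VALUES (1, 900, 5)")       # duplicate key
referential_violation = violates("INSERT INTO employee VALUES (3, 900, 99)") # no such department
```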
Cont…
Database Security:
 Database Security - the mechanisms that
protect the database against intentional or
accidental threats.
 Database security encompasses hardware,
software, people and data.
 Database Management Systems supporting
multi-user database system must provide a
database security and authorization
subsystem to enforce limits on individual
and group access rights and privileges.

Levels of Security Measures
Security measures can be implemented
at several levels and for different
components of the system.
Physical Level: concerned with
securing the site containing the
computer system.
Human Level: concerned with authorization of database users to access the content at different levels and privileges.
Operating System Level: concerned with the weakness and strength of the operating system security on data files.
Cont…
Database System Level: concerned
with data access limit enforced by the
database system.
Access limit like password, isolated
transaction and etc.
Application Level: Since almost all
database systems allow remote access
through terminals or networks, software-
level security with the network software is
as important as physical security, both on
the Internet and networks private to an
enterprise.
Security Issues and general
considerations
 Legal, ethical and social issues: regarding
the right to access information.
 Physical control issues: regarding how to
keep the database physically secured.
 Policy issues: regarding privacy of
individual level at enterprise and national
level.
 Operational consideration: on the
techniques used (password, etc) to access
and manipulate the database.
 System level security: including operating
system and hardware control.

Cont…
 The designer and the administrator of a database
should first identify the possible threat that
might be faced by the system in order to take
counter measures.
Threat:-
 It may be any situation or event, whether
intentional or accidental, that may adversely
affect a system and consequently the
organization
 It may be caused by a situation or event involving
a person, action, or circumstance that is likely to
bring harm to an organization, where the harm to
an organization may be tangible or intangible.
 Tangible – loss of hardware, software, or data
 Intangible – loss of credibility or client confidence
Counter measures: Computer Based
Controls
The types of counter measure to threats on
computer systems range from physical
controls to administrative procedures.

Cont…
The following are computer-based security
controls for a multi-user environment:
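Authorization
the granting of a right or privilege that enables a subject to have legitimate access to a system/object
governs not only what system or object a specified user can access, but also what the user may do with it
sometimes referred to as access controls
Views
a view is the dynamic result of one or more relational operations operating on the base relations to produce another relation
a view is a virtual relation that does not actually exist in the database, but is produced upon request by a particular user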
Cont…
Backup and recovery
Backup is the process of periodically taking a
copy of the database and log file
Recovery is the process of restoring the
database to a correct state in the event of a
failure
Integrity
preventing data from becoming invalid and
giving misleading or incorrect results.

Cont…
Encryption
 Authorization may not be sufficient to
protect data in database systems,
especially when there is a situation where
data should be moved from one location to
the other using network facilities.
 Encryption is used to protect information
stored at a particular site or transmitted
between sites from being accessed by
unauthorized users.
 Encryption is the encoding of the data by a
special algorithm.
 Once encrypted, the data is unreadable by any
program without the decryption key.
Cont…
To transmit data securely over insecure
networks requires the use of a
Cryptosystem, which includes:
An encryption key to encrypt the data
(plaintext)
An encryption algorithm that, with the
encryption key, transforms the plaintext
into ciphertext
A decryption key to decrypt the ciphertext
A decryption algorithm that, with the
decryption key, transforms the ciphertext
back into plaintext
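The four parts of a cryptosystem listed above can be illustrated with a toy sketch. It assumes a repeating-key XOR as the "algorithm", which is NOT secure and is used only because it is short; the key value and the function name xor_bytes are hypothetical. Because the toy scheme is symmetric, the decryption key equals the encryption key.

```python
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy algorithm: XOR each data byte with the repeating key (not secure)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

encryption_key = b"demo-key"      # hypothetical key, for illustration only
decryption_key = encryption_key   # symmetric toy scheme: same key both ways

plaintext = b"salary=5000"
ciphertext = xor_bytes(plaintext, encryption_key)   # encryption algorithm + key
recovered = xor_bytes(ciphertext, decryption_key)   # decryption algorithm + key
```

A real system would rely on a vetted cipher from a cryptographic library instead of this toy function.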
Authentication and Authorization
Authentication is the process of verifying
who someone is, whereas authorization is
the process of verifying what specific
applications, files, and data a user has
access to.
Authentication means confirming a user's
identity, while authorization means granting
that user access to the system.
Authorization in system security is the
process of giving the user permission to
access a specific resource or function.
Cont…
User authorization on the
database schema
 Index Authorization: deals with
permission to create as well as delete an
index on a relation.
 Resource Authorization: deals with
permission to add/create a new relation in
the database.
 Alteration Authorization: deals with
permission to add as well as delete
attributes of a relation.
 Drop Authorization: deals with permission
to delete an existing relation.
Role of DBA in Database Security
The database administrator is responsible
for making the database as secure as
possible.
The major responsibilities of DBA in relation
to authorization of users are:
 Account Creation
 Security Level Assignment
 Privilege Grant
 Privilege Revocation
 Account Deletion
DB security techniques
 There are two types of DB security
techniques: discretionary access control
and mandatory access control.
 The mechanisms used to grant and revoke
privileges in relational database systems
and in SQL are referred to as discretionary
access control.
 On the other hand, the mechanisms for
enforcing multiple levels of security, a
more recent concern in database system
security, are known as mandatory access
control.
Discretionary security mechanisms
Grant different privileges to different users
and user groups on various data objects.
The mode of the privilege could be:- Read,
Insert, Delete, Update files, records or
fields.
It is more flexible
The typical method of enforcing
discretionary access control in DBS is based
on granting and revoking of privileges.
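Granting and revoking of privileges can be sketched as a small in-memory privilege table. This is only an illustration under stated assumptions: the names privileges, grant, revoke and is_allowed, the user "alice" and the object "EMPLOYEE" are hypothetical and do not come from any particular DBMS.

```python
from collections import defaultdict

# privileges[user][data_object] -> set of modes such as "read" or "update"
privileges = defaultdict(lambda: defaultdict(set))

def grant(user, obj, mode):
    """Give `user` the privilege `mode` on data object `obj`."""
    privileges[user][obj].add(mode)

def revoke(user, obj, mode):
    """Take the privilege away again (no error if it was never granted)."""
    privileges[user][obj].discard(mode)

def is_allowed(user, obj, mode):
    """Check a request against the privileges granted so far."""
    return mode in privileges[user][obj]

grant("alice", "EMPLOYEE", "read")
grant("alice", "EMPLOYEE", "update")
revoke("alice", "EMPLOYEE", "update")
```

After these calls, alice may still read EMPLOYEE but may no longer update it, and a user who was never granted anything is refused.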
Mandatory Access Control
Enforce multilevel security
Classifying data and users into various
security classes (or levels) and implementing
the appropriate security policy of the
organization.
Each data object will have certain
classification level
Each user is given certain clearance level
Only users whose clearance level is at least
the classification level of a data object can
access that object
Is comparatively rigid (less flexible)
If a user can access A but not B, then B
requires a higher clearance than A, so no
user can have access to B but not A
Cont…
It has the following security classes:
Top Secret (TS).
Secret (S).
Confidential (C).
Unclassified (U).
TS is the highest level and U the lowest
level: TS > S > C > U.
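The ordering TS > S > C > U can be turned into a simple clearance check. The sketch below assumes the rule stated earlier (a user may access an object only if the user's clearance is at least the object's classification); the names LEVELS and can_read are hypothetical.

```python
# Numeric rank for each security class: U lowest, TS highest.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def can_read(user_clearance: str, object_classification: str) -> bool:
    """Allow access only when the clearance dominates the classification."""
    return LEVELS[user_clearance] >= LEVELS[object_classification]
```

For example, a user cleared for S can read a C object, while a user cleared for C cannot read an S object.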
Statistical Database Security
Statistical databases contain information
about individuals, which may not be permitted
to be seen by others as individual records.
Such databases may contain information
about various populations.
Statistical databases should have additional
security techniques which will protect the
retrieval of individual records.
Only queries with statistical aggregate
functions like Average, Sum, Min, Max,
Standard Deviation, Mid, Count, etc should be
executed.
Queries retrieving confidential attributes
should be prohibited.
Chapter Three
Transaction Processing
Part One
 What is a transaction?
 State of transaction
 Ways of executing transaction
 Serializability
What is Transaction?
A Transaction is a mechanism for applying
the desired modifications/operations to
a database.
A transaction could be a whole program,
part/module of a program or a single
command.
Action, or series of actions, carried out by a
single user or application program, which
accesses or changes contents of database.
Changes made in real time to a database
are called transactions.
Cont…
A transaction could be composed of one or
more database and non-database
operations.
A database transaction is a unit of
interaction with database management
system or similar system that is treated in a
coherent and reliable way independent of
other transactions.
Transaction processing system
A system that manages transactions and
controls their access to a DBMS is called a TP
monitor.
A transaction processing system (TPS)
generally consists of a TP monitor, one or
more DBMSs, and a set of application
programs containing transactions.
In the database field, a transaction is a group
of logical operations that must all succeed
or fail as a group.
Systems dedicated to supporting such
operations are known as transaction
processing systems.
Cont…
 Transactions can be started, attempted,
then committed or aborted via the transaction
control commands of SQL (e.g. COMMIT and ROLLBACK).
 A transaction can have one of two outcomes:
Success - the transaction commits and the
database reaches a new consistent state.
 A committed transaction cannot be aborted or rolled back.
 How do you discard a committed transaction?
Failure - the transaction aborts, and the database
must be restored to the consistent state it was in
before the transaction started.
 Such a transaction is rolled back or undone.
 An aborted transaction that is rolled back can be restarted
later.
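The two outcomes can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not a production pattern: the account table, the transfer function and the amounts are hypothetical, and the check for a negative balance stands in for any error a transaction might detect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; commit on success, roll back on failure."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        (balance,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                                  (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()        # success: new consistent state
        return True
    except ValueError:
        conn.rollback()      # failure: restore the state before the transaction
        return False

ok = transfer(conn, "A", "B", 30)        # commits
failed = transfer(conn, "A", "B", 500)   # aborts and is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the second call is rolled back, the balances are the ones left by the first, committed transfer.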
Cont…
 A transaction is expected to exhibit some
basic features or properties to be
considered as a valid transaction.
 These features are:
A: Atomicity
C: Consistency
I: Isolation
D: Durability
 These are referred to as the ACID properties
of a transaction.
 Without the ACID properties, the integrity of
the database cannot be guaranteed.
Cont…
Atomicity
 It is the all-or-none property.
 Every transaction should be considered
as an atomic process which cannot be
subdivided into smaller tasks.
 Due to this property, just like an atom which
exists or does not exist, a transaction has
only two states.
Done or Never Started.
 Done - a transaction must complete
successfully and its effect should be visible
in the database.
Cont…
 Never Started - If a transaction fails
during execution then all its modifications
must be undone to bring back the database
to the last consistent state, i.e., remove the
effect of failed transaction.
 No state between Done and Never Started
Consistency
 If the transaction code is correct then
a transaction, at the end of its
execution, must leave the database
consistent.
 A transaction should transform a database
from one consistent state to
another consistent state.
Cont…
Isolation
 A transaction must execute without
interference from other concurrent
transactions and its intermediate or partial
modifications to data must not be visible to
other transactions.
Durability
 The effect of a completed transaction
must persist in the database, i.e., its
updates must be available to other
transactions immediately after the end of its
execution, and it should not be affected by
failures after the completion of the
transaction.
State of a Transaction
 A transaction is an atomic operation from the
users’ perspective.
 But it has a collection of operations and it can
have a number of states during its execution.
 A transaction can end in one of three possible ways.
 Successful Termination: when a transaction
completes the execution of all its operations
and reaches the COMMIT command.
 Suicidal Termination: when the transaction
detects an error during its processing and
decides to abort itself before the end of the
transaction and perform a ROLLBACK.
 Murderous Termination: when the DBMS or
the system forces the execution to abort for any
reason, and hence the transaction is rolled back.
Ways of Transaction Execution
 In a database system many transactions are
executed.
 Basically there are two ways of executing a set
of transactions:
Serial Execution:
 In a serial execution transactions are
executed strictly serially.
 Thus, transaction Ti completes and writes its
results to the database, and only then is the next
transaction Tj scheduled for execution.
 This means that at any one time only one
transaction is being executed in the system.
 The data is not shared between transactions at
one specific time.
Cont…
In serial transaction execution, the
transaction being executed does not
interfere with the execution of any other
transaction.
Cont…
Good things about serial execution
Correct execution, i.e., if the input is
correct then output will be correct.
Fast execution, since all the resources are
available to the active transaction.
The worst thing about serial execution is
very inefficient resource utilization, i.e.
reduced parallelism.
Cont…
Concurrent Execution:
is the opposite of serial execution; in this
scheme the individual operations of
transactions, i.e., reads and writes, are
interleaved in some order.
Problems Associated with Concurrent
Transaction Processing
Although two transactions may be correct in
themselves, interleaving of operations may
produce an incorrect result, so access must
be controlled.
Cont…
With concurrent transaction processing,
one can enhance the throughput of the
system.
As reading and writing are performed from
and on secondary storage, the system
will not be idle during these operations if
there is concurrent processing.
Cont…
 The three potential problems caused by
concurrency are:
Lost Update Problem
 A successfully completed update on a data item
by one transaction is overwritten by another
transaction/user.
Uncommitted Dependency Problem
 Occurs when one transaction can see
intermediate results of another transaction
before it is committed.
Inconsistent Analysis Problem
 Occurs when a transaction reads several values
but a second transaction updates some of them
during execution and before the completion
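The lost update problem can be reproduced with a deterministic interleaving. The sketch below is only an illustration: the two "transactions" are simulated as plain assignments on a hypothetical balance X, with no real DBMS involved.

```python
def run_interleaved(x):
    """T1 adds 100 and T2 subtracts 10, with reads and writes interleaved."""
    t1_local = x          # T1: read(X)
    t2_local = x          # T2: read(X), same starting value as T1
    t1_local += 100       # T1: X = X + 100
    t2_local -= 10        # T2: X = X - 10
    x = t1_local          # T1: write(X)
    x = t2_local          # T2: write(X) overwrites T1's update
    return x

def run_serial(x):
    """The same two transactions executed one after the other."""
    x += 100              # T1 runs to completion
    x -= 10               # then T2 runs
    return x

lost = run_interleaved(100)   # T1's update is lost
serial = run_serial(100)      # both updates survive
```

Starting from 100, the serial run ends with 190, while the interleaved run ends with 90 because T1's update is overwritten.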
Serializability
The objective of Concurrency Control
Protocol is to schedule transactions in such
a way as to avoid any interference between
them.
This demands a new principle in transaction
processing, which is serializability of the
schedule of execution of multiple
transactions.
Cont…
 In any transaction processing system, if
concurrent processing is implemented, there
will be a schedule that determines the
execution sequence of operations in
different transactions.
 Schedule: time-ordered sequence of the
important actions taken by one or more
transactions.
 A schedule represents the order in which
instructions are executed in the system,
in chronological order.
 The scheduler component of a DBMS must
ensure that the individual steps of different
transactions preserve consistency.
Cont…
Serial Schedule: a schedule where the
operations of each transaction are
executed consecutively without any
interleaved operations from other
transactions.
No guarantee that results of all serial
executions of a given set of transactions will
be identical.
Non-serial Schedule: Schedule where
operations from a set of concurrent
transactions are interleaved.
Cont…
The objective of serializability is to find non-
serial schedules that allow transactions to
execute concurrently without interfering
with one another.
In other words, the aim is to find non-serial
schedules that are equivalent to some
serial schedule.
Cont…
In serializability:
If two transactions only read data,
order is not important.
If two transactions either read or
write completely separate data
items, they do not conflict and
order is not important.
If one transaction writes a data
item and another reads or writes
the same data item, order of
execution is important
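The three rules above amount to a small conflict test on pairs of operations. In this sketch an operation is a hypothetical tuple (transaction, action, item), with action "R" for read and "W" for write; the function name conflicts is also an assumption.

```python
def conflicts(op1, op2):
    """Return True when the relative order of the two operations matters."""
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    # Different transactions, same data item, and at least one write.
    return t1 != t2 and x1 == x2 and "W" in (a1, a2)
```

Two reads of the same item, or any two operations on different items, never conflict; a write paired with a read or write of the same item by another transaction does.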
Chapter Four
Concurrency Control Techniques
Part One
 What is concurrency control?
 Concurrency controlling techniques
What is concurrency control?
Concurrency Control is the process of
managing simultaneous operations on the
database without having them interfere
with one another.
Prevents interference when two or more
users are accessing database
simultaneously and at least one is updating
data.
Although two transactions may be correct in
themselves, interleaving of operations may
produce an incorrect result.
Concurrency controlling techniques
Three basic concurrency control techniques:
Locking methods
Time stamping
Optimistic
Both Locking and Time stamping are
conservative approaches: they delay
transactions in case they conflict with other
transactions.
The optimistic approach allows transactions to
proceed and checks for conflicts at the end.
Locking Method
 The locking method is a mechanism for
preventing simultaneous access on a
shared resource for a critical operation
A LOCK is a mechanism for enforcing
limits on access to a resource in an
environment where there are many threads
of execution.
 Locks are one way of enforcing concurrency
control policies.
 Transaction uses locks to deny access to
other transactions and so prevent incorrect
updates.
Cont…
 Lock prevents another transaction from
modifying item or even reading it, in the
case of a write lock.
 Lock (X): If a transaction T1 applies Lock
on data item X, then X is locked and it is not
available to any other transaction.
Unlock (X): T1 Unlocks X. X is available to
other transactions.
Types of Locks
 Shared lock: A Read operation does not change
the value of a data item.
 Hence a data item can be read by two different
transactions simultaneously under shared lock
mode.
 So, just to read a data item, T1 will do: Shared lock
(X), then Read (X), and finally Unlock (X).
 Exclusive lock: A write operation changes the
value of the data item.
 Hence two write operations from two different
transactions or a write from T1 and a read from T2
are not allowed.
 A data item can be modified only under Exclusive
lock.
 To modify a data item T1 will do: Exclusive lock (X),
then Write (X) and finally Unlock (X).
Lock: Basic rules
If a transaction has a shared lock on an item,
it can read but not update the item.
If a transaction has an exclusive lock on an
item, it can both read and update the item.
Reads cannot conflict, so more than one
transaction can hold shared locks
simultaneously on same item.
Exclusive lock gives transaction exclusive
access to that item.
Some systems allow a transaction to upgrade
a shared lock to an exclusive lock, or to
downgrade an exclusive lock to a shared lock.
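The basic rules can be sketched as a tiny lock table. This is a simplified illustration under stated assumptions: the class LockTable and its methods are hypothetical, a refused request simply returns False instead of queuing the transaction, and an upgrade is allowed only when the requester is the sole holder.

```python
class LockTable:
    def __init__(self):
        self.locks = {}   # item -> (mode, set of holding transactions)

    def acquire(self, txn, item, mode):
        """mode is "S" (shared) or "X" (exclusive). True means granted."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)              # reads cannot conflict
            return True
        if holders == {txn}:
            if mode == "X":               # sole holder upgrades S -> X
                self.locks[item] = ("X", holders)
            return True
        return False                      # conflicting request must wait

    def release(self, txn, item):
        _mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

lt = LockTable()
r1 = lt.acquire("T1", "A", "S")   # granted
r2 = lt.acquire("T2", "A", "S")   # granted: shared with T1
r3 = lt.acquire("T3", "A", "X")   # refused: item is share-locked by others
lt.release("T2", "A")
r4 = lt.acquire("T1", "A", "X")   # granted: T1 upgrades its shared lock
r5 = lt.acquire("T2", "A", "S")   # refused: T1 now holds an exclusive lock
```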
Locking Method: Problems
Deadlock:
 An impasse that may result when two (or
more) transactions are each waiting for
locks held by the other to be released.
 Only one way to break deadlock: abort
one or more of the transactions in the
deadlock.
 Deadlock should be transparent to user, so
DBMS should restart transaction(s).
 Two general techniques for handling
deadlock:
Deadlock prevention, and
Deadlock detection and recovery.
Cont…
Timeout
Deadlock detection can be done using
the technique of TIMEOUT.
Every transaction is given a maximum time to
wait for a lock.
If a transaction waits idle for the predefined
period of time, the DBMS will
assume that deadlock has occurred and it will
abort and restart the transaction.
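The timeout idea can be illustrated with Python's threading.Lock, whose acquire method accepts a timeout. This is only a sketch: item_lock and try_lock are hypothetical names, the 0.05 second limit is arbitrary, and the lock holder is simulated in the same thread.

```python
import threading

item_lock = threading.Lock()   # stands in for the lock on one data item

def try_lock(lock, timeout_seconds):
    """Return True if the lock was obtained, False if the wait timed out."""
    return lock.acquire(timeout=timeout_seconds)

item_lock.acquire()                                 # another transaction holds the lock
granted = try_lock(item_lock, 0.05)                 # wait expires: assume deadlock
item_lock.release()                                 # holder finishes
granted_after_release = try_lock(item_lock, 0.05)   # now the lock is free
item_lock.release()
```

When the wait expires, a DBMS using this technique would abort and restart the waiting transaction rather than let it wait forever.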
Time-stamping Method
A timestamp is a unique identifier created by the
DBMS that indicates the relative starting time of a
transaction.
It can be generated by:
using the system clock at the time the
transaction started, or
incrementing a logical counter
every time a new transaction starts.
Time-stamping is also a concurrency control
protocol that orders transactions in such a way
that older transactions, transactions with smaller
timestamps, get priority in the event
of conflict.
Cont…
In time-stamping:
 Transactions ordered globally based
on their timestamp so that older
transactions, transactions with earlier
timestamps, get priority in the event
of conflict.
 Conflict is resolved by rolling back and
restarting transaction.
 Since there is no need to use lock
there will be No Deadlock.
 The schedule is equivalent to the
particular serial order that
Cont…
 If Ti started processing before Tj, then the TS of
Tj will be larger than the TS of Ti.
 Again each data item will have a timestamp
for Read and Write.
 WTS(A) which denotes the largest
timestamp of any transaction that
successfully executed Write(A)
 RTS(A) which denotes the largest
timestamp of any transaction that
successfully executed Read(A)
 These timestamps are updated whenever a
new Read (A) or Write (A) instruction is
executed.
Cont…
Read/write proceeds only if last update on
that data item was carried out by an older
transaction.
Otherwise, transaction requesting
read/write is restarted and given a new
timestamp.
The timestamp ordering protocol
ensures that any conflicting read and
write operations are executed in the
timestamp order.
Cont…
Rules for permitting execution of
operations in Time-stamping Method
Suppose that Transaction Ti issues
Read(A)
If TS(Ti) < WTS(A): this implies that Ti needs
to read a value of A which was already
overwritten. Hence the read operation must
be rejected and Ti is rolled back.
If TS(Ti) >= WTS(A): then the read is
executed and RTS(A) is set to the maximum
of RTS(A) and TS(Ti).
Cont…
Suppose that Transaction Ti issues
Write(A)
If TS(Ti) < RTS(A): then this implies that the
value of A that Ti is producing was
previously needed and it was assumed that
it would never be produced. Hence, the
Write operation must be rejected and Ti is
rolled back.
If TS(Ti) < WTS(A): then this implies that Ti
is attempting to Write an obsolete value of A.
Hence, this write operation can be ignored.
Cont…
Otherwise the Write operation is executed
and WTS(A) is set to the maximum of
WTS(A) and TS(Ti).
 N.B: A transaction that is rolled back due to
conflict will be restarted and be given a new
timestamp.
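The read and write rules of the time-stamping method can be collected into two small functions. This is a sketch under stated assumptions: the class Item, the functions read and write, and the string results "ok", "rollback" and "ignored" are hypothetical names, and timestamps are plain integers.

```python
class Item:
    """A data item A with its read and write timestamps."""
    def __init__(self):
        self.rts = 0   # RTS(A): largest timestamp that read the item
        self.wts = 0   # WTS(A): largest timestamp that wrote the item

def read(ts, item):
    """Transaction with timestamp ts issues Read(A)."""
    if ts < item.wts:
        return "rollback"              # value was already overwritten
    item.rts = max(item.rts, ts)
    return "ok"

def write(ts, item):
    """Transaction with timestamp ts issues Write(A)."""
    if ts < item.rts:
        return "rollback"              # a younger transaction already read A
    if ts < item.wts:
        return "ignored"               # obsolete write, skipped
    item.wts = max(item.wts, ts)
    return "ok"
```

A caller receiving "rollback" would restart the transaction with a new, larger timestamp, as the note above describes.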
Optimistic Technique
 Locking and assigning and checking
timestamp values may be unnecessary for
some transactions
 Assumes that conflict is rare.
 When transaction reaches the level of
executing commit, a check is performed to
determine whether conflict has occurred.
 If there is a conflict, transaction is rolled
back and restarted.
 Because conflict is assumed to be rare, it is
more efficient to let transactions proceed
without the delays otherwise needed to
ensure serializability.
Cont…
Potentially allows greater concurrency than
traditional protocols.
Cont…
Three phases:
Read
Validation
Write

Optimistic Techniques - Read Phase
Extends from start until immediately before
commit.
Transaction reads values from database and
stores them in local variables. Updates are
applied to a local copy of the data.
Cont…
Optimistic Techniques - Validation Phase
 Follows the read phase.
 For read-only transaction, checks that data
read are still current values. If no
interference, transaction is committed, else
aborted and restarted.
 For update transaction, checks transaction
leaves database in a consistent state, with
serializability maintained.
Optimistic Techniques - Write Phase
 Follows successful validation phase for
update transactions.
 Updates made to local copy are applied to
the database.
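The three phases can be sketched with a per-item version counter. This is a simplification and an assumption of the sketch: validation here only checks that every item the transaction read still has the version it saw, which is one possible test and not necessarily the one a given DBMS uses. The class names Database and OptimisticTxn are hypothetical.

```python
class Database:
    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}   # bumped on every write

class OptimisticTxn:
    def __init__(self, db):
        self.db = db
        self.read_versions = {}   # item -> version seen in the read phase
        self.local = {}           # local copy holding the updates

    def read(self, key):          # read phase
        if key in self.local:
            return self.local[key]
        self.read_versions.setdefault(key, self.db.version[key])
        return self.db.data[key]

    def write(self, key, value):  # read phase: update the local copy only
        self.local[key] = value

    def commit(self):
        # Validation phase: has anything we read changed in the meantime?
        for key, seen in self.read_versions.items():
            if self.db.version[key] != seen:
                return False      # conflict: caller rolls back and restarts
        # Write phase: apply the local copy to the database.
        for key, value in self.local.items():
            self.db.data[key] = value
            self.db.version[key] = self.db.version.get(key, 0) + 1
        return True

db = Database({"X": 100})
t1, t2 = OptimisticTxn(db), OptimisticTxn(db)
t1.write("X", t1.read("X") + 10)
t2.write("X", t2.read("X") + 5)
first = t1.commit()    # validates and writes
second = t2.commit()   # fails validation: X changed after t2 read it
```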
Chapter 5
Database Recovery
Techniques
Contents
1.What is database recovery?
2.Database recovery terms
3.Purpose of Database Recovery
4.Types of Failure
5.Transaction Log
6.Data Updates
7.Data Caching
8.Transaction Roll-back (Undo) and Roll-Forward (Redo)


Database Recovery
• Database recovery is the process of restoring
database to a correct state in the event of a failure.
• A database recovery is the process of eliminating the
effects of a failure from the database.
• In database systems terminology, recovery means
restoring the last consistent state of the data items.
• In other words, it is the process of restoring the
database to the most recent consistent state that existed
shortly before the time of system failure.


Cont…

• The failure may be the result of a system crash due to
hardware or software errors, a media failure such as a
head crash, transaction errors, viruses, catastrophic
failure, incorrect command execution or a software
error in the application such as a logical error in the
program that is accessing the database.
• Recovery restores a database from a given state,
usually inconsistent, to a previously consistent state.


Cont…

• Recovery should protect the database and
associated users from unnecessary problems and
avoid or reduce the possibility of having to duplicate
work manually.


Cont…

• Recovery techniques are heavily dependent upon the
existence of a special file known as a system log.
• It contains information about the start and end of each
transaction and any updates which occur in
the transaction.
• The log keeps track of all transaction operations that
affect the values of database items.


Cont…
The log is kept on disk
• start_transaction(T): This log entry records that
transaction T starts the execution.
• read_item(T, X): This log entry records that
transaction T reads the value of database item X.
• write_item(T, X, old_value, new_value): This log
entry records that transaction T changes the value of
the database item X from old_value to new_value.
• The old value is sometimes known as the before
image of X, and the new value is known as the
after image of X.
Cont…

• commit(T): This log entry records that transaction T
has completed all accesses to the database successfully
and its effect can be committed (recorded
permanently) to the database.
• abort(T): This records that transaction T has been
aborted.
• checkpoint: Checkpoint is a mechanism where all the
previous logs are removed from the system and stored
permanently in a storage disk.
• Checkpoint declares a point before which the DBMS
was in consistent state, and all the transactions were
committed.
Cont…

• A transaction T reaches its commit point when all its
operations that access the database have been executed
successfully i.e. the transaction has reached the point
at which it will not abort (terminate without
completing).
• Once committed, the transaction is permanently
recorded in the database.
• Commitment always involves writing a commit entry
to the log and writing the log to disk.


Database Recovery Terms
Undoing:
• If a transaction crashes, then the recovery manager
may undo transactions i.e. reverse the operations of a
transaction.
• This involves examining the log for the entries
write_item(T, x, old_value, new_value) of the transaction
and setting the value of item x in the database to old_value.
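Undoing can be sketched as a backward scan over the log. The tuple layout of the log entries mirrors the write_item(T, x, old_value, new_value) form used above, but the Python representation, the function name undo and the sample values are assumptions made for illustration.

```python
def undo(log, db, txn):
    """Scan the log backwards and restore the old values written by txn."""
    for entry in reversed(log):
        if entry[0] == "write_item" and entry[1] == txn:
            _, _, item, old_value, _new_value = entry
            db[item] = old_value

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 100, 200),
    ("write_item", "T1", "X", 200, 300),
    ("write_item", "T1", "Y", 50, 75),
]
db = {"X": 300, "Y": 75}   # state left behind by the crashed transaction
undo(log, db, "T1")
```

Scanning in reverse order matters when an item was written more than once: the last restore applied is the oldest before image.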



Cont…
Deferred update:
• This technique does not physically update the database
on disk until a transaction has reached its commit
point.
• Before reaching commit, all transaction updates are
recorded in the local transaction workspace.
• If a transaction fails before reaching its commit point,
it will not have changed the database in any way so
UNDO is not needed.



Cont…

• It may be necessary to REDO the effect of the
operations that are recorded in the local transaction
workspace, because their effect may not yet have been
written in the database.
• Hence, a deferred update is also known as the
No-undo/redo algorithm
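Recovery under deferred update can be sketched as a forward scan that redoes only the writes of committed transactions. The log layout is an assumption of this sketch: since no undo is needed, each hypothetical write_item entry carries only the new value.

```python
def redo_committed(log, db):
    """Reapply, in log order, the writes of transactions that committed."""
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    for entry in log:
        if entry[0] == "write_item" and entry[1] in committed:
            _, _, item, new_value = entry
            db[item] = new_value

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 10),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 20),
    ("commit", "T1"),
    # crash happens here: T2 never reached its commit point
]
db = {"X": 1, "Y": 2}      # disk state: deferred updates not yet applied
redo_committed(log, db)
```

T2's write is simply skipped, because under deferred update it never reached the database in the first place.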



Cont…
Immediate update:
• In the immediate update, the database may be
updated by some operations of a transaction before
the transaction reaches its commit point.
• However, these operations are recorded in a log on
disk before they are applied to the database, making
recovery still possible.
• If a transaction fails to reach its commit point, the
effect of its operation must be undone i.e. the
transaction must be rolled back hence we require both
undo and redo.
• This technique is known as undo/redo algorithm.
Cont…
Shadow paging:
• It provides atomicity and durability.
• A directory with n entries is constructed, where the ith
entry points to the ith database page on disk.
• When a transaction begins executing, the current
directory is copied into a shadow directory.
• When a page is to be modified, a shadow page is
allocated in which changes are made and when it is
ready to become durable, all pages that refer to the
original are updated to refer to the new replacement page.


Cont…
Caching/Buffering:
• In this technique, one or more disk pages that include the
data items to be updated are cached into main memory
buffers and then updated in memory before being written
back to disk.


Cont…
Backup:
Some of the backup techniques are as follows :
• Full database backup: The full database, including the
data and the meta information needed to restore the
whole database (including full-text catalogs), is
backed up on a predefined schedule.
• Differential backup: It stores only the data changes
that have occurred since the last full database backup.
• When the same data has changed many times since the last
full database backup, a differential backup stores only the
most recent version of the changed data.
Cont…

• Transaction log backup: All events that have
occurred in the database, such as a record of every single
statement executed, are backed up.
• It is the backup of transaction log entries and contains
all transactions that have happened to the database.
• Through this, the database can be recovered to a
specific point in time.


Cont…

• In this example a full backup of a database (copies of its data files and
control file) is taken at SCN 100. Redo logs generated during the
operation of the database capture all changes that occur between SCN
100 and SCN 500. Along the way, some logs fill and are archived. At SCN
500, the data files of the database are lost due to a media failure. The
database is then returned to its transaction-consistent state at SCN 500,
by restoring the data files from the backup taken at SCN 100, then
applying the transactions captured in the archived and online redo logs
and undoing the uncommitted transactions.
Purpose of Database Recovery

To bring the database into the last consistent state,
which existed prior to the failure.

To preserve transaction properties (Atomicity,
Consistency, Isolation and Durability).
Example:
 If the system crashes before a fund transfer
transaction completes its execution, then either one or
both accounts may have incorrect value.
 Thus, the database must be restored to the state
before the transaction modified any of the accounts.



Types of Failure


The database may become unavailable for use due to:
 System failure: System may fail because of
addressing error, application error, operating system
fault, RAM failure, etc.
 Transaction failure: Transactions may fail
because of incorrect input, deadlock, incorrect
synchronization.
 Media failure: Disk head crash, power disruption,
etc.



Transaction Log
 For recovery from any type of failure, the data values prior to modification
(BFIM - BeFore Image) and the new values after modification (AFIM -
AFter Image) are required.
 These values and other information are stored in a sequential file called
the transaction log. A sample log is given below.
 Back P and Next P point to the previous and next log records of
the same transaction.
T ID   Back P   Next P   Operation   Data item   BFIM      AFIM
T1     0        1        Begin
T1     1        4        Write       X           X = 100   X = 200
T2     0        8        Begin
T1     2        5        W           Y           Y = 50    Y = 100
T1     4        7        R           M           M = 200   M = 200
T3     0        9        R           N           N = 400   N = 400
T1     5        nil      End
Data Update
 Immediate Update: As soon as a data item is
modified in cache, the disk copy is updated.
 Deferred Update: All modified data items in the cache are
written to disk either after a transaction ends its execution or
after a fixed number of transactions have completed their
execution.
 Shadow update: The modified version of a data item
does not overwrite its disk copy but is written at a
separate disk location.
 In-place update: The disk version of the data item
is overwritten by the cache version.
Data Caching
■ Data items to be modified are first stored into database
cache by the Cache Manager (CM) and after
modification they are flushed (written) to the disk.
■ The flushing is controlled by Modified and Pin-Unpin bits.
 Pin-Unpin: Instructs the operating system not to flush the
data item.
 Modified: Indicates the AFIM of the data item.



Transaction Roll-back (Undo) and Roll-Forward (Redo)
■ To maintain atomicity, a transaction’s operations are
redone or undone.
 Undo: Restore all BFIMs on to disk (Remove all AFIMs).
 Redo: Restore all AFIMs on to disk.
■ Database recovery is achieved either by performing only
Undos or only Redos or by a combination of the two.
■ These operations are recorded in the log as they happen.



Cont…
Example: Cascading Roll Back



Steal/No-Steal and Force/No-Force
Possible ways for flushing database cache to database
disk:
 Steal: Cache can be flushed before transaction
commits.
 No-Steal: Cache cannot be flushed before transaction
commit.
 Force: Cache is immediately flushed (forced) to disk.
 No-Force: Cache flushing is deferred until transaction commits.
These give rise to four different ways for handling
recovery:
 Steal/No-Force (Undo/Redo)
 Steal/Force (Undo/No-redo)
 No-Steal/No-Force (Redo/No-undo)
 No-Steal/Force (No-undo/No-redo)
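The four combinations follow a simple pattern: steal decides whether undo is needed, and force decides whether redo is needed. A minimal sketch, with the hypothetical function name recovery_scheme (its output always puts the undo part first, so No-Steal/No-Force is printed as "No-undo/Redo"):

```python
def recovery_scheme(steal: bool, force: bool) -> str:
    """Map a buffer policy to the recovery work it requires."""
    # Steal lets uncommitted changes reach disk, so they may need undoing.
    undo = "Undo" if steal else "No-undo"
    # Force puts committed changes on disk at commit, so no redo is needed.
    redo = "No-redo" if force else "Redo"
    return f"{undo}/{redo}"
```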



Chapter 6

Distributed Database System


Contents
1. DD and DDB Management System(DDBMS)
2. Data allocation techniques in DDB
3. Types of Distributed Database
4. Advantages and disadvantages of DDBS



Distributed Database
• A distributed database is basically a database that is not
limited to one system, it is spread over different sites, i.e,
on multiple computers or over a network of computers.
• A distributed database is a collection of multiple
interconnected databases, which are spread physically
across various locations that communicate via a computer
network.
• A distributed database system is located on various sites
that don’t share physical components.
• This may be required when a particular database needs to
be accessed by various users globally.
• It needs to be managed such that for the users it looks like
Features of DDB
 Some general features of distributed databases are:
 Location independency:
• Data is physically stored at multiple sites and managed by
an independent DDBMS.
 Distributed query processing:
• Distributed databases answer queries in a distributed
environment that manages data at multiple sites.
• High-level queries are transformed into a query execution
plan for simpler management.
 Distributed transaction management:
• Provides a consistent distributed database through commit
protocols, distributed concurrency control techniques, and
distributed recovery methods in case of many transactions
and failures.
Cont…
 Seamless integration:
• Databases in a collection usually represent a single logical
database, and they are interconnected.
 Network linking:
• All databases in a collection are linked by a network and
communicate with each other.
 Transaction processing:
• Distributed databases incorporate transaction processing,
where a transaction is a program unit comprising one or more
database operations.
• Each transaction is atomic: it is either executed entirely
or not at all.
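
The slides mention commit protocols without naming one. A common choice is two-phase commit (2PC), sketched below as an assumption rather than as the method this material prescribes. The class `Participant` and the function `two_phase_commit` are hypothetical names; real protocols also log each step and handle timeouts, which this sketch omits.

```python
class Participant:
    """A site taking part in a distributed transaction (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): ask every site to prepare and collect all votes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit everywhere only if every site voted yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

Because a single "no" vote aborts every site, the global transaction is either applied at all sites or at none, which is the atomicity property described above.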



Distributed Database Management System
• A distributed database management system (DDBMS)
is a centralized software system that manages a
distributed database as if it were all stored
in a single location.
• DDBMS is the software system that permits the
management of a Distributed DB and makes the
distribution transparent to the user.
• DDBMS synchronizes all data operations among
databases and ensures that updates made in one database
are automatically reflected in the databases at other sites.



Features of DDBMS
 It is used to create, retrieve, update and delete distributed
databases.
 It synchronizes the database periodically and provides access
mechanisms by virtue of which the distribution becomes
transparent to the users.
 It ensures that the data modified at any site is universally
updated.
 It is used in application areas where large volumes of data are
processed and accessed by numerous users simultaneously.
 It is designed for heterogeneous database platforms.
 It maintains confidentiality and data integrity of the databases.



Distributed Database Types
• There are two types of distributed databases:
 Homogeneous
• A homogeneous distributed database is a network
of identical databases stored on multiple sites.
• The sites have the same operating system, DDBMS,
and data structure, making them easily manageable.
• Homogeneous databases allow users to access data from
each of the databases seamlessly.



Cont…
The following diagram shows an example of a
homogeneous distributed database:



Cont…
 Heterogeneous
• A heterogeneous distributed database
uses different schemas, operating systems, DDBMSs,
and data models at its sites.
• In a heterogeneous distributed database, a
particular site can be completely unaware of other sites,
which limits cooperation in processing user
requests.
• Because of this limitation, translations are required to
establish communication between sites.



Cont…
• The following diagram shows an example of a
heterogeneous distributed database:
Cont…
• A distributed database system consists of a collection
of sites, each of which maintains a local database
system (Local DBMS) but each local DBMS also
participates in at least one global transaction where
different databases are integrated together.
• Local Transaction: a transaction that accesses data only
at a single site.
• Global Transaction: a transaction that accesses data at
several sites.
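
The distinction between local and global transactions can be sketched as a small check on the set of sites a transaction touches. The function name `classify_transaction` and the site labels are hypothetical, chosen only for illustration.

```python
def classify_transaction(sites_accessed):
    """Return 'local' if all accessed data is at one site, else 'global'."""
    sites = set(sites_accessed)
    if not sites:
        raise ValueError("a transaction must access at least one site")
    return "local" if len(sites) == 1 else "global"


print(classify_transaction(["site_A", "site_A"]))
print(classify_transaction(["site_A", "site_B"]))
```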



Data allocation in DDB
• Data allocation is the process of deciding where to
store a particular data item.
• There are four alternative strategies regarding the
placement/allocation of data in DDB:
 Centralized,
 Fragmented,
 Complete replication, and
 Selective replication.



Cont…
Centralized:
• This strategy consists of a single database and DBMS
stored at one site with users distributed across the
network (we referred to this previously as distributed
processing).
• Locality of reference is at its lowest as all sites, except
the central site, have to use the network for all data
accesses.
• This also means that communication costs are high.
• Reliability and availability are low, as a failure of the
central site results in the loss of the entire database
system.

Cont…
Fragmented/Partitioned:
• This strategy partitions the database into disjoint
fragments, with each fragment assigned to one site.
• If data items are located at the site where they are used
most frequently, locality of reference is high.
• As there is no replication, storage costs are low;
similarly, reliability and availability are low, although
they are higher than in the centralized case, as the
failure of a site results in the loss of only that site’s
data.
• Performance should be good and communication costs
low if the distribution is designed properly.

Cont…
• A relation may be divided into a number of
subrelations, called fragments, which are then distributed.
• There are two main types of fragmentation: horizontal
and vertical.
• Horizontal fragments are subsets of tuples and vertical
fragments are subsets of attributes.
• Allocation. Each fragment is stored at the site with
“optimal” distribution.
• Replication. The DDBMS may maintain a copy of a
fragment at several different sites.
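
As an illustration of the two fragmentation types, the sketch below treats a relation as a list of dictionaries. The `employees` data and the helper names `horizontal_fragment` and `vertical_fragment` are hypothetical and not taken from the slides.

```python
# A small example relation: each dict is one tuple (row).
employees = [
    {"id": 1, "name": "Abebe", "dept": "Sales", "salary": 5000},
    {"id": 2, "name": "Sara", "dept": "HR", "salary": 6000},
    {"id": 3, "name": "Kebede", "dept": "Sales", "salary": 5500},
]


def horizontal_fragment(rows, predicate):
    """Keep a subset of tuples: the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, attributes):
    """Keep a subset of attributes: the named columns of every row."""
    return [{a: row[a] for a in attributes} for row in rows]


# Horizontal fragment: only the Sales tuples, e.g. stored at the Sales site.
sales = horizontal_fragment(employees, lambda r: r["dept"] == "Sales")

# Vertical fragment: only id and salary; the key "id" is kept so the
# relation can later be rebuilt by joining fragments on it.
payroll = vertical_fragment(employees, ["id", "salary"])
```

Here `sales` contains whole rows for a subset of employees, while `payroll` contains every employee but only two columns.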
Cont…
• The definition and allocation of fragments are carried
out strategically to achieve the following objectives:
 Locality of reference.
 Improved reliability and availability
 Acceptable performance
 Balanced storage capacities and costs
 Minimal communication costs.



Cont…
• A fragmentation is correct if it satisfies the following three
rules:
 Completeness. If a relation instance R is decomposed into fragments R1,
R2, . . ., Rn, each data item that can be found in R must appear in at least
one fragment. This rule ensures that no data is lost
during fragmentation.
 Reconstruction. It must be possible to define a relational operation that
will reconstruct the relation R from the fragments. This rule ensures that
functional dependencies are preserved.
 Disjointness. If a data item di appears in fragment Ri, then it should not
appear in any other fragment. Vertical fragmentation is the exception to
this rule, because primary key attributes must be repeated to allow
reconstruction. This rule ensures minimal data redundancy.
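
For horizontal fragmentation, the three rules can be checked mechanically. The sketch below assumes a relation is modelled as a Python set of tuples and that union is the reconstruction operation; the function name `check_horizontal_fragmentation` is hypothetical. Vertical fragmentation would need a join-based check instead, which is not shown.

```python
def check_horizontal_fragmentation(relation, fragments):
    """Check the three correctness rules for horizontal fragments.

    relation: set of tuples; fragments: list of sets of tuples.
    """
    union = set().union(*fragments)
    return {
        # Completeness: every tuple of R appears in at least one fragment.
        "completeness": relation <= union,
        # Reconstruction: the union of the fragments gives back exactly R.
        "reconstruction": union == relation,
        # Disjointness: no tuple is stored in more than one fragment.
        "disjointness": sum(len(f) for f in fragments) == len(union),
    }


R = {(1, "a"), (2, "b"), (3, "c")}
good = [{(1, "a")}, {(2, "b"), (3, "c")}]
overlapping = [{(1, "a"), (2, "b")}, {(2, "b"), (3, "c")}]

print(check_horizontal_fragmentation(R, good))
print(check_horizontal_fragmentation(R, overlapping))
```

The second call reports a disjointness violation because the tuple `(2, "b")` is stored in both fragments.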



Cont…
Complete replication:
• This strategy consists of maintaining a complete copy of
the database at each site.
• Therefore, locality of reference, reliability and availability,
and performance are maximized.
• However, storage costs and communication costs for
updates are the highest of all the strategies.
• To overcome some of these problems, snapshots are
sometimes used.
• A snapshot is a copy of the data at a given time. The copies
are updated periodically.
• Snapshots are also sometimes used to implement views in a
distributed database.

Cont…
Selective replication:
• This strategy is a combination of fragmentation,
replication, and centralization.
• Some data items are fragmented to achieve high
locality of reference, and others that are used at many
sites and are not frequently updated are replicated;
otherwise, the data items are centralized.
• The objective of this strategy is to have all the
advantages of the other approaches but none of the
disadvantages.
• This is the most commonly used strategy, because of
its flexibility.

Comparison of strategies for data allocation
Type of allocation    | Locality of reference | Reliability and availability  | Performance    | Storage costs | Communication costs
Centralized           | Lowest                | Lowest                        | Unsatisfactory | Lowest        | Highest
Fragmented            | High                  | Low for item; high for system | Satisfactory   | Lowest        | Low
Complete replication  | Highest               | Highest                       | Best for read  | Highest       | High for update; low for read
Selective replication | High                  | Low for item; high for system | Satisfactory   | Average       | Low



Cont…
Applications of Distributed Database
 It is used in corporate management information
systems.
 It is used in multimedia applications.
 It is used in military control systems, hotel chains, etc.
 It is also used in manufacturing control systems.



Advantages and Disadvantages of DDBMSs

ADVANTAGES                           | DISADVANTAGES
Reflects organizational structure    | Greater potential for bugs
Many existing systems                | Increased processing overhead
Data sharing and distributed control | Complexity
Improved sharing and local autonomy  | Cost; increased complexity
Improved reliability and availability | Security
Improved performance                 | Integrity control more difficult
Economics                            | Lack of standards
Expansion (scalability)              | Lack of experience
Integration                          | Database design more complex
Remaining competitive                |
