Advanced Database Modules
COLLEGE OF ENGINEERING AND TECHNOLOGY
Developed By:
1. Girma Asefa (MSc, Information Technology, Lecturer)
2. Abduljebar Kedir (MSc, Information Technology, Lecturer)
3. Petros Haile (MSc, Information Technology, Lecturer)
Reviewed By:
1. Eleni Shiferaw (BSc, Information Technology, Assistant Lecturer)
2. Abebe W/Senbat (MSc, Information Technology, Lecturer)
ETHIOPIA, HOSAINA 2022
Module: Preface
This course covers file organizations, storage management, query optimization, transaction management, recovery and concurrency control, and database authorization and security. Additional topics, such as distributed databases, mobile databases, and data integration, may also be covered. A major component of the course is a database implementation project using current database languages and systems.
This module also introduces the fundamental concepts necessary for designing, using, and implementing database systems and database applications. Our presentation stresses the fundamentals of database modeling and design, the languages and models provided by database management systems, and database system implementation techniques. The module is meant to be used as a text for a one-semester course in database systems. Our goal is to provide an in-depth and up-to-date presentation of the most important aspects of database systems and applications, and related technologies. We assume that readers are familiar with elementary programming and data structuring concepts and that they have had some exposure to the basics of computer organization.
Acknowledgments
Contents
Chapter One: Query Processing and Optimization ................................................................................... 8
Chapter 3: Transaction Processing Concepts........................................................................................ 66
7.1.6. Valid Time ........................................................................................................... 139
7.2. Mobility ....................................................................................................................... 144
References .....................................................................................................................................................................149
Chapter One: Query Processing and Optimization
CHAPTER OUTCOME
It is designed to facilitate reporting and analysis.
Field
A field is a character or group of characters that have a specific meaning.
It is also called a data item. It is represented in the database by a value.
For example, customer id, name, society, and city are all fields for customer data.
Record
A record is a collection of logically related fields.
For example, the collection of fields (id, name, address, and city) forms a record for a customer.
2. Database Management System (DBMS)
A database is a collection of related data which represents some aspect of the real world that
is designed for a certain task. A database management system (DBMS) is a software
package designed to define, store, manipulate, retrieve and manage users’ data in a database
by considering appropriate security measures. It consists of a group of programs which
manipulate the database. The DBMS accepts the request (instruction) for data from an
application and instructs the operating system to provide the specific data. It provides an
interface between the data and the software application.
Some other DBMS examples include:
Microsoft Access
MySQL
Oracle
PostgreSQL
dBASE
FoxPro
MongoDB
SQLite
IBM DB2
LibreOffice Base
MariaDB
Microsoft SQL Server, etc.
1.1. The DBMS manages three important things
Data accessibility: the DBMS provides a centralized view of data that can be accessed by multiple users, from multiple locations, in a controlled manner. It can limit what data the end user sees, as well as how that end user can view the data, providing many views of a single database schema. End users and software programs are freed from having to understand where the data is physically located or on what type of storage media it resides, because the DBMS handles all requests.
Concurrency: the DBMS manages locking and modification of data.
Schema definition: the DBMS defines the database schema (the database's logical structure).
Generally, the above three functions provide concurrency, security, data integrity, and uniform data administration procedures. Typical database administration tasks supported by the DBMS include change management, performance monitoring and tuning, security, and backup and recovery. Many database management systems are also responsible for automated rollbacks and restarts as well as the logging and auditing of activity in databases.
In a relational database management system (RDBMS), the most widely used type of DBMS
API is SQL, a standard programming language for defining, managing, protecting and
accessing data in an RDBMS. SQL commands are used for communicating with the database.
Everything ranging from creating a table and adding data to modifying a table and setting user
permissions is accomplished using SQL commands.
SQL Commands
There are basically four types of statements that can be executed in SQL Server for different purposes: Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language (DCL), and Transaction Control Language (TCL).
ALTER
This command is used for altering the structure of a database. Typically, the ALTER command is used either to add a new attribute or to modify the characteristics of some existing attribute. For adding new columns to the table:
ALTER TABLE Student ADD (Address varchar2(20), Age number(2));
For modifying an existing column in the table:
ALTER TABLE Student MODIFY (Name varchar2 (20));
For dropping a column from the table:
ALTER TABLE Student DROP COLUMN Age;
DROP
Used for deleting an entire table from the database and all the data stored in it.
Example: DROP TABLE Student;
TRUNCATE
Used for deleting all rows from a table and freeing the space containing the table.
Example: TRUNCATE TABLE Student;
RENAME: Used for renaming a table.
Example: RENAME Student TO StudentDetails;
UPDATE
Used to modify or update the value of a column in a table. It can update all rows or some selective rows in the table.
UPDATE Student SET Name = 'Naa''ol' WHERE Id = 22;
Related commands: ALTER, INSERT, DELETE, SELECT, INDEX, and UPDATE.
GRANT
Used for granting user access privileges to a database.
GRANT SELECT, UPDATE ON Student TO ABC;
This will allow the user to run only SELECT and UPDATE operations on the Student table.
GRANT ALL ON Student TO ABC WITH GRANT OPTION;
Allows the user to run all commands on the table as well as grant access privileges to other
users.
REVOKE:
Used for taking back permission given to a user.
REVOKE UPDATE ON Student FROM ABC;
Note: - A user who is not the owner of a table but has been given the privilege to grant
permissions to other users can also revoke permissions.
TCL – Transaction Control Language
Transaction Control Language commands can only be used with DML commands. Because DDL operations such as creating or dropping tables are auto-committed in the database, TCL commands cannot be used with them. Transaction control statements are used to make changes permanent in the database.
COMMIT
Used for saving all transactions made to a database. Ends the current transaction and makes
all changes permanent that were made during the transaction. Releases all transaction
locks acquired on tables.
Example: DELETE FROM Student WHERE Age=25;
COMMIT;
ROLLBACK
Used to undo transactions that aren’t yet saved in the database. Ends the current transaction and undoes all changes made during the transaction. Releases all transaction locks acquired on tables.
Example: DELETE FROM Student WHERE Age=25;
ROLLBACK;
SAVEPOINT
Used for rolling back to a certain state known as the savepoint. Savepoints need to be created first so that they can be used to partially roll back transactions.
SAVEPOINT savepoint_name;
Note: An active savepoint is one that has been specified since the last COMMIT or ROLLBACK command.
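A minimal sketch of how these TCL commands combine, assuming the Student table from the earlier examples (the inserted row and savepoint name are hypothetical, and exact savepoint syntax varies slightly by DBMS):
INSERT INTO Student (Id, Name, Age) VALUES (23, 'Abel', 21);
SAVEPOINT after_insert;             -- mark a partial rollback point
DELETE FROM Student WHERE Age = 25; -- experimental change
ROLLBACK TO after_insert;           -- undoes the DELETE, keeps the INSERT
COMMIT;                             -- makes the INSERT permanent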
Summary of the types of SQL commands:
Figure 1.1 Types of SQL Commands
2. Relational Algebra in DBMS
Relational algebra is one of the two formal query languages associated with the relational model. Queries in algebra are composed using a collection of operators. A fundamental property is that every operator in the algebra accepts (one or two) relation instances as arguments and returns a relation instance as the result. This property makes it easy to compose operators to form a complex query: a relational algebra expression is recursively defined to be a relation, a unary algebra operator applied to a single expression, or a binary algebra operator applied to two expressions. We describe the basic operators of the algebra (selection, projection, union, cross-product, and difference), as well as some additional operators that can be defined in terms of the basic operators but arise frequently enough to warrant special attention, in the following sections. Each relational query describes a step-by-step procedure for computing the desired answer, based on the order in which operators are applied in the query. The procedural nature of the algebra allows us to think of an algebra expression as a recipe, or a plan, for evaluating a query, and relational systems in fact use algebra expressions to represent query evaluation plans.
Every database management system must define a query language to allow users to access the data stored in the database. Relational algebra is a procedural query language, which takes instances of relations as input and yields instances of relations as output; it uses operators to perform queries, and an operator can be either unary or binary. In relational algebra, the input is a relation (the table from which data has to be accessed) and the output is also a relation (a temporary table holding the data asked for by the user).
Relational algebra works on the whole table at once, so we do not have to use loops, etc., to iterate over all the rows (tuples) of data one by one. All we have to do is specify the table name from which we need the data, and in a single line of command, relational algebra will traverse the entire given table to fetch the data for you. Relational database systems are expected to be equipped with a query language that can assist their users in querying the database instances. There are two kinds of query languages: relational algebra and relational calculus.
The fundamental operations of relational algebra are as follows –
I. Unary Relational Operations
A. SELECT (symbol: σ (Sigma))
This is used to fetch rows (tuples) from a table (relation) which satisfy a given condition (predicate).
Syntax: σp(r)
Where σ represents the select predicate, r is the name of the relation (the table in which you want to look for data), and p is the propositional logic, where we specify the conditions that must be satisfied by the data. In propositional logic, one can use unary and binary operators like =, <, >, etc., to specify the conditions.
Let's take an example of the Student table we specified above in the introduction of relational algebra, and fetch data for students with age greater than 17.
σage > 17 (Student)
This will fetch the tuples (rows) from table Student, for which age will be greater than 17.
You can also use, and, or etc operators, to specify two conditions, for example,
σage > 17 and gender = 'Male' (Student)
This will return tuples (rows) from table Student with information about male students of age more than 17. (Consider that the Student table has an attribute Gender too.)
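For comparison, a sketch of the same selection in SQL, assuming a Student table with Age and Gender columns as above:
SELECT * FROM Student WHERE Age > 17 AND Gender = 'Male';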
σ topic = "Database" (COURSE)
Output - Selects tuples from COURSE where the topic is 'Database'.
σ sales > 50000 (Customers)
Output - Selects tuples from Customers where sales is greater than 50000
B. PROJECT (symbol: π (pi))
Project operation is used to project only a certain set of attributes of a relation. In simple words, if you want to see only the names of all the students in the Student table, then you can use the Project operation.
It will only project or show the columns or attributes asked for, and will also remove duplicate
data from the columns.
Syntax: ∏A1, A2...(r)
Where A1, A2 etc are attribute names (column names).
For example: ∏Name, Age(Student)
The above statement will show us only the Name and Age columns for all the rows of data in the Student table.
It eliminates all attributes of the input relation but those mentioned in the projection list.
The projection method defines a relation that contains a vertical subset of Relation.
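For comparison, a sketch of the equivalent SQL; note that, unlike the relational algebra project operation, SQL needs DISTINCT to remove duplicate rows:
SELECT DISTINCT Name, Age FROM Student;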
II. Relational Algebra Operations from Set Theory
A. UNION (∪): This operation is used to fetch data from two relations (tables) or temporary relations (results of other operations). For this operation to work, the relations (tables) specified should have the same number of attributes (columns) and the same attribute domains. Also, duplicate tuples are automatically eliminated from the result.
Syntax: A ∪ B
Where A and B are relations.
For example, if we have two tables RegularClass and ExtraClass, both of which have a column Student to save the name of a student, then
∏Student (RegularClass) ∪ ∏Student (ExtraClass)
The above operation will give us the names of students who are attending regular classes or extra classes (or both), eliminating repetition.
UNION is symbolized by ∪ symbol. It includes all tuples that are in tables A or in B.
It also eliminates duplicate tuples.
For a union operation to be valid, the following conditions must hold:
R and S must have the same number of attributes.
Attribute domains need to be compatible.
Duplicate tuples are automatically removed.
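A sketch of the same union in SQL, assuming the RegularClass and ExtraClass tables described above:
SELECT Student FROM RegularClass
UNION
SELECT Student FROM ExtraClass;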
B. INTERSECTION (∩)
Defines a relation consisting of the set of all tuples that are in both A and B. However, A and B
must be union-compatible.
C. DIFFERENCE (-)
This operation is used to find data present in one relation but not present in the second relation. This operation is also applicable to two relations, just like the Union operation.
Syntax: A – B where A and B are relations.
For example, if we want to find name of students who attend the regular class but not the
extra class, then, we can use the below operation:
∏Student(RegularClass) - ∏Student(ExtraClass)
The result of A - B is a relation which includes all tuples that are in A but not in B. The attribute names of A have to match the attribute names in B, and the two operand relations A and B must be union-compatible. The result is defined as a relation consisting of the tuples that are in relation A but not in B.
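A sketch of the same difference in SQL (EXCEPT is the standard keyword; some systems, such as Oracle, use MINUS instead):
SELECT Student FROM RegularClass
EXCEPT
SELECT Student FROM ExtraClass;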
D. CARTESIAN PRODUCT (×)
This is used to combine data from two different relations (tables) into one and fetch data from the combined relation.
Syntax: A X B
For example, if we want to find the information for RegularClass and ExtraClass sessions which are conducted during the morning, then we can use the following operation:
σtime = 'morning' (RegularClass X ExtraClass)
For the above query to work, both RegularClass and ExtraClass should have the attribute
time.
This type of operation is helpful to merge columns from two relations. Generally, a Cartesian
product is never a meaningful operation when it performs alone. However, it becomes
meaningful when it is followed by other operations.
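A sketch of the morning-class example in SQL, assuming both tables have a Time column (qualified here to avoid ambiguity):
SELECT * FROM RegularClass CROSS JOIN ExtraClass
WHERE RegularClass.Time = 'morning';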
III. Binary Relational Operations
A. JOIN
The join operation is essentially a Cartesian product followed by a selection criterion. The join operation is denoted by ⋈. The JOIN operation also allows joining variously related tuples from different relations.
Types of Join
Various forms of join operation are:
1. Inner Joins
In an inner join, only those tuples that satisfy the matching criteria are included, while the
rest are excluded. Let's study various types of Inner Joins:
Theta join: The general case of the JOIN operation is called a theta join. It is denoted by the symbol θ.
EQUI join: When a theta join uses only the equivalence condition, it becomes an equi join. It is one of the most difficult operations to implement efficiently in an RDBMS, and one reason why RDBMSs can have inherent performance problems.
Natural join (⋈): A natural join can only be performed if there is a common attribute (column) between the relations. The name and type of the attribute must be the same.
2. Outer Joins
In an outer join, along with tuples that satisfy the matching criteria, we also include some or all tuples that do not match the criteria.
Left Outer Join (A ⟕ B): The left outer join operation keeps all tuples in the left relation. If no matching tuple is found in the right relation, then the attributes of the right relation in the join result are filled with null values.
Right Outer Join (A ⟖ B): The right outer join operation keeps all tuples in the right relation. If no matching tuple is found in the left relation, then the attributes of the left relation in the join result are filled with null values.
Full Outer Join: In a full outer join, all tuples from both relations are included
in the result, irrespective of the matching condition
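A sketch of these join types in SQL, using hypothetical tables A and B with a common column id:
SELECT * FROM A INNER JOIN B ON A.id = B.id;       -- only matching rows
SELECT * FROM A LEFT OUTER JOIN B ON A.id = B.id;  -- all rows of A; nulls for unmatched B
SELECT * FROM A RIGHT OUTER JOIN B ON A.id = B.id; -- all rows of B; nulls for unmatched A
SELECT * FROM A FULL OUTER JOIN B ON A.id = B.id;  -- all rows of both
SELECT * FROM A NATURAL JOIN B;                    -- joins on all common column names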
Database performance tuning often requires a deeper understanding of how queries are processed and optimized within the database management system. In this note we provide a general overview of how query processing works (how rule-based and cost-based query optimizers operate) and then provide some specific examples of query optimization in commercial DBMSs.
In this session we discuss the techniques used by a DBMS to process, optimize, and execute high-level queries. A query expressed in a high-level query language such as SQL must first be scanned, parsed, and validated. The scanner identifies the language tokens such as keywords, attribute names, and relation names, whereas the parser checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language.
The query must then be validated by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried. An internal representation of the query is then created as a tree data structure called a query tree. It is also possible to represent the query using a graph data structure called a query graph. The DBMS must then devise an execution strategy for retrieving the result of the query from the database. A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization.
4. Query-processing
The system executes the query using the optimal strategy generated. In query optimization, a SQL query is first translated into an equivalent relational algebra expression, using a query tree data structure, before being optimized. We will therefore set out below how to pass from a SQL query to an expression in relational algebra. A query is processed in two phases: the query-optimization phase and the query-processing phase. To facilitate understanding, we will add the query-compilation phase before the two previous phases, because queries are viewed by the user as Data Manipulation Language (DML) scripts.
3.1. Query-compilation: DML processor translates DML statements into low-level
instructions (compiled query) that the query optimizer can understand.
Query Processing would mean the entire process or activity which involves query
translation into low level instructions, query optimization to save resources, cost
estimation or evaluation of query, and extraction of data from the database.
Goal: To find an efficient Query Execution Plan for a given SQL query which would
minimize the cost considerably, especially time.
Cost Factors: Disk accesses [which typically consume time] and read/write operations [which typically need resources such as memory/RAM].
The programmer writes code to perform the queries in a higher-level database query language such as SQL, and a special component of the DBMS called the Query Processor takes care of arranging the underlying access routines to satisfy a given query.
A query is processed in the following four general steps:
1. Scanning and Parsing
When a query is first submitted (via an applications program), it must be scanned and
parsed to determine if the query consists of appropriate syntax. Scanning is the process
of converting the query text into a tokenized representation. The tokenized representation
is more compact and is suitable for processing by the parser.
This representation may be in a tree form. The Parser checks the tokenized
representation for correct syntax. In this stage, checks are made to determine if columns
and tables identified in the query exist in the database and if the query has been formed
correctly with the appropriate keywords and structure. If the query passes the parsing
checks, then it is passed on to the Query Optimizer.
2. Query Optimization or planning the execution strategy
For any given query, there may be a number of different ways to execute it. Each
operation in the query (SELECT, JOIN, etc.) can be implemented using one or more
different Access Routines.
For example, an access routine that employs an index to retrieve some rows would be more efficient than an access routine that performs a full table scan.
The goal of the query optimizer is to find a reasonably efficient strategy for executing the query (not quite what the name implies) using the access routines. Optimization typically takes one of two forms:
Heuristic Optimization or Cost Based Optimization
In Heuristic Optimization, the query execution is refined based on heuristic rules for
reordering the individual operations. With Cost Based Optimization, the overall cost of
executing the query is systematically reduced by estimating the costs of executing several
different execution plans.
3. Query Code Generator (interpreted or compiled)
Once the query optimizer has determined the execution plan (the specific ordering of
access routines), the code generator writes out the actual access routines to be executed.
With an interactive session, the query code is interpreted and passed directly to the
runtime database processor for execution. It is also possible to compile the access
routines and store them for later execution.
4. Execution in the runtime database processor
At this point, the query has been scanned, parsed, planned and (possibly) compiled. The
runtime database processor then executes the access routines against the database. The
results are returned to the application that made the query in the first place. Any runtime
errors are also returned.
The major steps involved in query processing are depicted in the figure below.
Let us discuss the whole process with an example. Let us consider the following two relations
as the example tables for our discussion;
Employee (Eno, Ename, Phone)
Proj_Assigned (Eno, Proj_No, Role, DOP)
Where,
Eno is the employee number, Ename is the employee name, Proj_No is the number of the project to which an employee is assigned, Role is the role of an employee in a project, and DOP is the duration of the project in months.
With this information, let us write a query to find the list of all employees who are working
in a project which is more than 10 months old.
SELECT Ename FROM Employee, Proj_Assigned WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10;
Input:
A query written in SQL is given as input to the query processor. For our case, let us consider
the SQL query written above.
Step 1: Parsing
In this step, the parser of the query processor module checks the syntax of the query, the user’s privileges to execute the query, the table names and attribute names, etc. The correct table names, attribute names, and the privileges of the user can be taken from the system catalog (data dictionary).
Step 2: Translation
If we have written a valid query, then it is converted from the high-level language SQL to a low-level instruction in relational algebra.
For example, our SQL query can be converted into a relational algebra equivalent as follows:
πEname(σDOP>10 ∧ Employee.Eno=Proj_Assigned.Eno(Employee × Proj_Assigned))
Step 3: Optimizer
The optimizer uses the statistical data stored as part of the data dictionary. The statistical data are information about the size of the table, the length of records, the indexes created on the table, etc. The optimizer also checks the conditions and conditional attributes which are parts of the query.
Step 4: Execution Plan
A query can be expressed in many ways. The query processor module, at this stage, uses the information collected in Step 3 to find different relational algebra expressions that are equivalent to, and return the same result as, the one which we have written already.
For our example, the query written in Relational algebra can also be written as the one given
below;
πEname (Employee ⋈Eno (σDOP>10 (Proj_Assigned)))
So far, we have got two execution plans. The only condition is that both plans should give the same result.
Step 5: Evaluation
Though we have many execution plans constructed using the statistical data, and though they return the same result, they differ in terms of the time needed to execute the query and the space required to execute it. Hence, it is mandatory to choose the one plan which consumes the least cost.
At this stage, we choose one execution plan from the several we have developed. This execution plan accesses data from the database to give the final result.
In our example, the second plan may be better. In the first plan, we join two relations (a costly operation) and then apply the condition (conditions are considered as filters) on the joined relation. This consumes more time as well as space. In the second plan, we filter one of the tables (Proj_Assigned) and the result is joined with the Employee table. This join may need to compare fewer records. Hence, the second plan is the better one (with the information known, though not always).
Example: See the following schema
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: date, rname: string)
Reserves: each tuple is 40 bytes long, 100 tuples per page, 1,000 pages.
Sailors: each tuple is 50 bytes long, 80 tuples per page, 500 pages.
Consider the following SQL query:
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100
AND S.rating > 5
This query can be expressed in relational algebra (RA) as follows:
πsname(σbid=100 ∧ rating>5(Reserves ⋈ Sailors))
Query Expressed as a Relational Algebra Tree
EXAMPLE
SELECT V.Vno, Vname, count(*), sum(Amount)
FROM Vendor V, Transaction T
WHERE V.Vno = T.Vno AND V.Vno BETWEEN 1000 AND 2000
GROUP BY V.Vno, Vname
HAVING sum(Amount) > 100
➢ Scan the Vendor table, select all tuples where Vno is in [1000, 2000], eliminate attributes other than Vno and Vname, and place the result in a temporary relation R1.
➢ Join the tables R1 and Transaction, eliminate attributes other than Vno, Vname, and Amount, and place the result in a temporary relation R2. This may involve:
✓ sorting R1 on Vno
✓ sorting Transaction on Vno
✓ merging the two sorted relations to produce R2
➢ Perform grouping on R2, and place the result in a temporary relation R3. This may involve:
✓ sorting R2 on Vno and Vname
✓ grouping tuples with identical values of Vno and Vname
✓ counting the number of tuples in each group, and adding their Amounts
➢ Scan R3, select all tuples with sum(Amount) > 100 to produce the result.
Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years.
SELECT ENAME FROM PROJ P, ASG G, EMP E WHERE G.ENO=E.ENO AND
EQUIVALENT QUERY
Example: Tree for a Query Using the relations Bars(name, addr) and Sells(bar, beer, price),
find the names of all the bars that are either on Maple St. or sell Bud for less than $3.
Expression Trees: Example.
MovieStar(Name,Address,Gender,Birthdate) StarIn(Title,Year,StarName)
Query: “Find the birthdate and the movie title for those female stars who appeared in movies in 1996.”
SELECT Title, Birthdate FROM MovieStar, StarIn WHERE Year=1996 AND Gender='F' AND Name=StarName;
Chapter 1: SQL MCQ (Multiple Choice Questions)
1. What is the full form of SQL?
A. Structured Query List
B. Structured Query Language
C. Sample Query Language
D. None of these.
2. Which of the following are TCL commands?
A. COMMIT and ROLLBACK
B. UPDATE and TRUNCATE
C. SELECT and INSERT
D. GRANT and REVOKE
3. Which of the following is also called an INNER JOIN?
A. SELF JOIN
B. EQUI JOIN
C. NON-EQUI JOIN
D. None of the above
4. Find the city names with the condition and temperature from the table 'weather' where condition = sunny or cloudy but temperature >= 60.
A. SELECT city, temperature, condition FROM weather WHERE condition = 'cloudy' AND
condition = 'sunny' OR temperature >= 60
B. SELECT city, temperature, condition FROM weather WHERE condition = 'cloudy' OR condition
= 'sunny' OR temperature >= 60
C. SELECT city, temperature, condition FROM weather WHERE condition = 'sunny' OR condition
= 'cloudy' AND temperature >= 60
D. SELECT city, temperature, condition FROM weather WHERE condition = 'sunny' AND condition
= 'cloudy' AND temperature >= 60
5. Which of the following statements is correct to display all the cities with the condition, temperature, and humidity whose humidity is in the range of 60 to 75, from the 'weather' table?
A. SELECT * FROM weather WHERE humidity IN (60 to 75)
B. SELECT * FROM weather WHERE humidity BETWEEN 60 AND 75
C. SELECT * FROM weather WHERE humidity NOT IN (60 AND 75)
D. SELECT * FROM weather WHERE humidity NOT BETWEEN 60 AND 75
6. Reducing the complexity of complex queries by similarly handling sub-queries is known as ___
A. Complex query handling
B. Multi query optimization
C. Complex query optimization
D. Parametric query optimization
Chapter 2: Database Security and Authorization
CHAPTER OUTCOME
2.2. Security Issues
What are we trying to protect by ensuring database security?
What levels of information need to be safeguarded and how?
What are the types of problems and threats that deserve special attention?
Can we distinguish between threats from outside and internal threats?
Do these require different types of protection mechanisms?
What are the solution options?
How is protection of privacy related to database security?
Let us address these broad questions before getting into specific access control techniques.
Many organizations are opening up their database systems for access over the Internet. This
openness results in great advantages but, at the same time, makes the database system
vulnerable to threats from a much wider area. Web security demands special attention.
Figure 2.1 Database security system
Protection of privacy. Protect the privacy of individuals and institutions about whom data reside in the database.
Identification of users. Be able to positively identify authorized users.
Authorization of users. Guarantee access to authorized users.
Scope of authorization. Be able to authorize individual users for specific portions of the database
as needed.
Levels of authorization. Provide individual users with particular authorization levels to read,
update, add, or delete data.
Monitoring of usage. Be able to monitor access by authorized users to keep audit trails for tracing
actions.
Minimize the probability of the problem happening. Establish enough protection rings to
enclose the database system. Take all the necessary protective measures and institute
strong deterrents.
Diminish the damage if it happens. If an intruder manages to penetrate the outer layer of
protection, make it progressively difficult to cut through the inner layers. Guard the most
sensitive portions of the database with the most stringent security measures.
Devise precise recovery schemes. If a vandal manages to destroy some parts of the
database, have a tested method to recover from the damage. If a fire destroys your
database, plan to be able to restore from a copy stored off-site.
When you examine the types of threats, you will notice that most of the recovery solutions must be a combination of general control procedures and computer-based techniques. Let us explore the nature of these two types of solution methods.
Computer-Based Techniques. Now let us turn our attention to the types of
countermeasures that are executed through the use of the computer system including the
DBMS. Here is a list of the major techniques:
Authorization of users. Includes authentication of authorized users and granting of access
privileges to them.
Tailoring authorization through views. Defining user views to have the ability to
authorize users for specific portions of the database.
Backup and recovery. Creation of backup copies of the database at regular intervals and
also testing and implementing recovery procedures.
Protection of sensitive data. Use of encryption technology to protect sensitive data. All
DBMSs have security systems to guarantee database access to authorized users.
Commonly, these security mechanisms are referred to as discretionary and mandatory
security mechanisms. Let us define the scope of this division:
Discretionary security mechanisms. Used for granting and revoking data access
privileges to users for accessing specific parts of a database in any of the access modes of
read, update, add, and delete.
Mandatory security mechanisms. Used for establishing security at multiple levels by
classifying users into distinct groups and grouping data into distinct segments and,
thereafter, assigning access privileges for particular user groups to data segments.
From our discussions so far, you must have concluded that database security is critical but also
difficult. You must look toward enforcing database security at different levels. Security
mechanisms must exist at several layers such as within the database system itself, at the level of the operating system, the network, the application, the hardware, and so on. Figure 4 clearly
illustrates the layers of control for database security.
Data privacy fits into data security in an unorthodox manner. Data security is generally thought
of as the protection of a company’s data from unauthorized access. Who authorizes access,
and who decides on how and to whom access must
be granted? Of course, the company does this because it is deemed that the company owns the
data in the database. In the same way, data privacy may be thought of as protecting information about employees, customers, suppliers, and distributors from unauthorized access. Who
decides on this authorization? Naturally, the owners must make the decision. Who are the
owners—the company or those about whom information is collected and stored? Privacy
issues are becoming more and more sensitive in North America, as they have been in Europe
for some time. Legislation about privacy and confidentiality of information varies from region
to region. Some basic rights are available to those about whom data is retained in corporate
databases.
Individuals and institutions may inquire about what information about them is stored and may
demand to correct any information about them. Privacy concerns escalate with the widespread
use of the Internet. Although formal regulations may not be adequate, organizations are
ethically obliged to prevent misuse of the information they collect about individuals and third-party institutions.
2.8. Access Control
Essentially, database security rests on controlling access to the database system. Controlling
physical access forms one part of database security. The other major part consists of
controlling access through the DBMS. Let us consider two primary dimensions of access
control. One dimension of access control deals with levels of data access. A single user or a
category of users may be granted access privileges to database objects at various levels of
detail.
Another dimension of access control refers to the modes or types of access granted to a single
user or to a category of users. How do you grant access privileges to a single user or user
category? This leads to the two basic approaches to access control.
As noted above, the DBMS provides two basic approaches to access control: discretionary control and mandatory control. Discretionary access control refers to the granting of privileges or rights to individual users. Although discretionary access control is fairly effective, it is possible for an unauthorized user to gain privileges through an unsuspecting authorized user. Mandatory access control is more effective in overcoming the defects of discretionary access control.
We will first discuss how data access control pertains to levels of database objects and access
types or modes. Data levels and access types form a grid, and access privileges may be granted
at the intersections of data levels and access types. Our discussion will continue on the
mechanisms for granting access privileges under the discretionary or mandatory access control
approaches. In this section, you will also study the two important topics of authentication and
authorization of users.
Examine the following list of possible ways of granting access privileges to a specific user:
User has unlimited access privileges to the entire WORKER relation.
User has no access privileges of any kind to any part of the WORKER relation.
User may only read any part of WORKER relation but cannot make any changes at all.
User may read only his or her row in the relation but cannot change any columns in that row.
User may read only his or her row in the relation but can change only the Name and
Address columns.
User may read only the WorkerId, Name, Address, and SuperId columns of any record but
can change only the Name and Address columns.
User may read only the WorkerId and WageRate columns of any record but can modify
the WageRate column only if the value is less than 5.00.
User may read all columns of any record but can modify the WageRate only if the SuperId
column value is the value of WorkerId of that user.
The above list is in no way exhaustive. Yet you can readily observe that a general method of security enforcement must possess a great range and flexibility. A flexible security system in a DBMS must be able to grant privileges at the following data levels:
The whole database
Individual relation; all rows and all columns
All rows but only specific columns of a relation
All columns but only specific rows of a relation
Specific rows and specific columns of a relation
Now let us move on to the consideration of modes or types of data access. You are familiar
with access types or modes of create, read, update, and delete (sometimes indicated by the
acronym CRUD). Let us expand the list of access types to include all types:
Insert or Create. Add data to a file without destroying any data.
Read. User may read and copy data from the database into the user’s environment through an
application program or a database query.
Update. Write updated values.
Delete. Delete and destroy specific data objects.
Move. Move data objects without the privilege of reading the contents.
Execute. Run a program or procedure with the implied privileges needed for the execution.
Verify Existence. Verify whether a specific database object exists in the database.
You have noted the various access types and also the levels of data eligibility based on which access privileges may be granted. What is your observation from this discussion? What are the implications? You can easily realize the immense flexibility needed for granting access privileges. Although numerous variations are possible, most commonly access privileges are granted on single relations in the CRUD modes.
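As a sketch, granting CRUD-mode privileges on the WORKER relation discussed above could look like this in SQL (the user name is hypothetical):
GRANT SELECT, INSERT, UPDATE, DELETE ON WORKER TO clerk01;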
2.8.2. Discretionary Control
As mentioned above, in this approach, individual users are granted privileges or rights to access specific data items in one or more designated modes. On the basis of the specification of privileges, a user is given the discretion to access a data item in the read, update, insert, or delete modes. A user who created a database object automatically derives all privileges to access the object, including the passing on of privileges to other users with regard to that object.
We introduced the SQL commands for granting and revoking access privileges. This is how
SQL supports discretionary access control. Now we will explore the fundamental concepts of
discretionary access control and go over a few more examples.
Basic Levels. There are two basic components or levels for granting or revoking access privileges:
Database objects: a data item or data element, generally a base table or view.
Users: a single user or a group of users identifiable with some authorization identifier.
With these two components, access privileges may be granted as shown in the following general command:
GRANT privileges ON database object TO users
We can also set up an authorization matrix for the purpose of granting access privileges. Set the users as columns and the database objects as rows. Then in the cells formed by the intersection of these columns and rows we can specify the type of privilege granted.
Table presents an example of a type of authorization matrix. Note how this type of presentation
makes it easy to review the access privileges in a database environment.
Owner Account. Each database table or relation has an owner. This user account that created
the table possesses all access privileges on that table. The DBA can assign an owner to an
entire schema and grant the appropriate access privileges.
The owner of a database object can grant privileges to another user. This second user can then
pass along the privileges to a third user and so on. The DBMS keeps track of the cycle of
granting of privileges.
The authorization graph illustrates this cycle of privileges. Note how the privileges are passed along and how the revoking of privileges with the cascade option works.
REFERENCES Option. The REFERENCES privilege is not the same as the SELECT privilege. Let us take an example. Suppose Nash is the owner of the DEPARTMENT table as indicated below:
Nash can authorize Miller to create another table EMPLOYEE with a foreign key in that table
to refer to the DeptNo column in the DEPARTMENT table. Nash can do this by granting
Miller the REFERENCES privilege with respect to the DeptNo column. Note the
EMPLOYEE table shown below:
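A sketch of how Nash could grant this privilege (column-level REFERENCES is standard SQL, though support varies by DBMS):
GRANT REFERENCES (DeptNo) ON DEPARTMENT TO Miller;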
If Miller loses the REFERENCES privilege with respect to the DeptNo column in the DEPARTMENT table, the foreign key constraint in the EMPLOYEE table will be dropped. The EMPLOYEE table itself, however, will not be dropped. Now
suppose Miller has the SELECT privilege on the DeptNo column of the DEPARTMENT table,
not the REFERENCES privilege. In this case, Miller will not be allowed to create the
EMPLOYEE table with a foreign key column referring to DeptNo in the DEPARTMENT
table.
Why not grant Miller the SELECT privilege and allow him to create the EMPLOYEE table
with a foreign key column referring to the DeptNo column in the DEPARTMENT table? If
this is done, assume that Miller creates the table with a foreign key constraint as follows:
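The table definition itself is not reproduced here; the following is only a sketch of the kind of constraint meant, with all column names other than DeptNo assumed for illustration:
CREATE TABLE EMPLOYEE (
    EmployeeNo INTEGER PRIMARY KEY,  -- assumed key column
    Name       VARCHAR(30),          -- assumed column
    DeptNo     INTEGER,
    FOREIGN KEY (DeptNo) REFERENCES DEPARTMENT (DeptNo) ON DELETE NO ACTION
);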
With the NO ACTION option in the foreign key specification, Nash is prevented from deleting
rows from the DEPARTMENT table even though he is the owner. For this reason, whenever
such a restrictive privilege needs to be authorized, the more stringent privilege REFERENCES
is applied. The SELECT privilege is therefore intended as permission just to read the values.
2.8.3. Use of Views
Earlier we had discussions on user views. A user view is like a personalized model of the
database tailored for individual groups of users. If a user group, say, in the marketing
department, needs to access only some columns of the DEPARTMENT and EMPLOYEE
tables, then you can satisfy their information requirements by creating a view comprising just
those columns.
This view hides the unnecessary parts of the database from the marketing group and shows them only those columns they require.
Views are not like tables in the sense that they do not store actual data. You know that views
are just like windows into the database tables that store the data. Views are virtual tables.
When a user accesses data through a view, he or she is getting the data from the base tables,
but only from the columns defined in the view.
Views are intended to present to the user exactly what is needed from the database and to keep the rest of the data content hidden from the user.
However, views offer a flexible and simple method for granting access privileges in a
personalized manner. Views are powerful security tools. When you grant access privileges to
a user for a specific view, the privileges apply only to those data items defined in the views
and not to the complete base tables themselves.
Let us review an example of a view and see how it may be used to grant access privileges. For
a user to create a view from multiple tables, the user must have access privileges on those base
tables. The view is dropped automatically if the access privileges are dropped. Note the
following example granting access privilege to Miller for reading EmployeeNo, FirstName,
LastName, Address, and Phone information of employees in the department where Miller
works.
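The example itself is not reproduced here; the following is a sketch of such a view and grant (the view name is hypothetical, and the subquery locating Miller's department is an assumption):
CREATE VIEW MILLER_DEPT_EMPS AS
    SELECT EmployeeNo, FirstName, LastName, Address, Phone
    FROM EMPLOYEE
    WHERE DeptNo = (SELECT DeptNo FROM EMPLOYEE WHERE LastName = 'Miller');
GRANT SELECT ON MILLER_DEPT_EMPS TO Miller;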
SQL Examples
Earlier we considered a few SQL examples on granting and revoking access privileges. Now we will study a few more examples. These examples are intended to reinforce your
understanding of discretionary access control. We will use the DEPARTMENT and
EMPLOYEE tables shown above for our SQL examples.
DBA gives privileges to Miller to create the schema:
GRANT CREATETAB TO Miller;
professional can drill holes into the protection mechanism and gain unauthorized
access.
Note the actions of user Shady indicated in the last few statements of the previous
subsection. Shady has created a private table MYTABLE of which he is the owner. He
has all privileges on this table. All he has to do is somehow get sensitive data into
MYTABLE. Being a clever professional, Shady may temporarily alter one of Miller’s
programs to take data from the EMPLOYEE data and move the data into MYTABLE.
For this purpose, Shady has already given privileges to Miller for inserting rows into the
MYTABLE table.
This scenario may appear too unlikely and contrived. Nevertheless, it makes the point that discretionary access control has its limitations.
Mandatory access control overcomes the shortcomings of discretionary access control.
In the mandatory access control approach, access privileges cannot be granted or passed on by one user to another in an uncontrolled manner. A well-defined security policy dictates which classes of data may be accessed by users at which clearance levels.
The most popular method is known as the Bell–LaPadula model. Many of the
commercial relational DBMSs do not currently provide for mandatory access control.
However, government agencies, defense departments, financial institutions, and
intelligence agencies do require security mechanisms based on the mandatory control
technique.
Look at the first property, which is fairly intuitive. This property allows a subject to read an
object only if the subject’s clearance level is higher than or equal to that of the object.
Try to understand what the second property is meant to prevent. The second property prohibits
a subject from writing to an object in a security class lower than the clearance level of the
subject. Otherwise, information may flow from a higher class to a lower class. Consider a user
with S clearance. Without the enforcement of the star property, this user can copy an object in
S class and rewrite it as a new object with U classification so that everyone will be able to see
the object.
Get back to the case of Shady trying to access data from the EMPLOYEE table by tricking
Miller. The mandatory access control method would spoil Shady’s plan as follows:
Classify EMPLOYEE table as S.
Give Miller clearance for S.
Give Shady lower clearance for C.
Shady can therefore create objects of C or lower classification. MYTABLE will be in class C or lower. Miller’s program will not be allowed to copy EMPLOYEE data into MYTABLE, because the star property prevents a subject with S clearance from writing to an object in a lower security class.
who is a user. User Samantha Jenkins is eligible to have access to the human resources
database. So first, Jenkins must be assigned an account or user identification.
The DBMS maintains a user profile for each user account. The profile for Jenkins includes all
the database objects such as tables, views, rows, and columns that she is authorized to access.
In the user profile, you will also find the types of access privileges such as read, update, insert,
and delete granted to Jenkins.
Alternatively, the DBMS may maintain an object profile for each database object. An object
profile is another way of keeping track of the authorizations. For example, in the object profile
for the EMPLOYEE table, you will find all the user accounts that are authorized to access the
table. Just like a user profile, an object profile also indicates the types of access privileges.
Authorization Rules. The user profile or the object profile stipulates which user can access which database object and in what way. These are the authorization rules. By examining these rules, the DBMS determines whether a specific user may be permitted to perform the operations of read, update, insert, or delete on a particular database object. You have already looked at an example of an authorization matrix in Table 2. This matrix tends to be exhaustive and complex in a large database environment.
Table 2 presents both options for implementing authorization rules. Note how you can derive
the same rule authorizing Samantha Jenkins to access the EMPLOYEE table with read and
insert privileges.
Enforcing Authorization Rules. We have authorization rules in an authorization matrix or in the form of authorization tables for users or objects. How are the rules enforced by the DBMS? A highly protected privilege module with unconstrained access to the entire database exists to enforce the authorization rules.
This is the arbiter or security enforcer module, although it might not go by those names in
every DBMS. The primary function of the arbiter is to interrupt and examine every
database operation, check against the authorization matrix, and either allow or deny the
operation.
Suppose after going through the interrogation sequence, the arbiter has to deny a database
operation. What are the possible courses of action? Naturally, the particular course of action
to be adopted depends on a number of factors and the circumstances. Here are some basic
options provided in DBMSs:
If the sensitivity of the attempted violation is high, terminate the transaction and lock the
workstation.
For lesser violations, send an appropriate message to the user.
Record attempted security breaches in the log file.
Now when she signs on with her user-id and declares that she is Samantha Jenkins, how does
the system know that she is really who she says she is? How can the system be sure that it is
really Samantha Jenkins and not someone else signing on with her user-id? How can the
system authenticate her identity? Authentication is the determination of whether the user is
who he or she claims to be or declares he or she is through the user-id.
It is crucial that the authentication mechanism be effective and failsafe. Otherwise, all the
effort and sophistication of the authorization rules will be an utter waste. How can you ensure
proper authentication? Let us examine a few of the common techniques for authenticatio n.
Passwords. Passwords, still the most common method, can be effective if properly
administered. Passwords must be changed fairly often to deter password thefts. They must
be stored in encrypted formats and be masked while being entered. Password formats need
to be standardized to avoid easily detectable combinations. A database environment with
highly sensitive data may require one-time-use passwords.
Personal information. The user may be prompted with questions for which the user alone
would know the answers such as mother’s maiden name, last four digits of social security
number, first three letters of the place of birth, and so on.
Biometric verification. Verification through fingerprints, voiceprints, retina images, and so on. Smartcards recorded with such biometric data may be used.
Special procedures. Run a special authentication program and converse with the user. The system sends a random number m to the user. The user performs a simple set of operations on the random number and types in the result n. The system verifies n by performing the same algorithm on m. Of course, m and n will be different each time, and it will be hard for a perpetrator to guess the algorithm.
Hang-up and call-back. After input of user-id, the system terminates the input and
reinitiates input at the workstation normally associated with that user. If the user is there
at that customary workstation and answers stored questions for the user-id, then the system
allows the user to continue with the transaction.
System analysis
Programming
Network
Database
System design
Responsibilities of a database administrator:
Software installation and maintenance
Data extraction, transformation, and loading
Specialized data handling
Database backup and recovery
Authentication
Troubleshooting
Same sample. Reject series of queries to the same sample set of records.
Query types. Allow only those queries that contain statistical or mathematical functions.
Number of queries. Allow only a certain number of queries per user per unit time.
Query thresholds. Reject queries that produce result sets containing fewer than n records,
where n is the query threshold.
Query combinations. The result set of two queries may have a number of common records
referred to as the intersection of the two queries. Impose a restriction saying that no two
queries may have an intersection larger than a certain threshold number.
Data pollution. Adopt data swapping. In the case of the bank database, swap balances
between accounts. Even if a user manages to read a single customer’s record, the balances
may have been swapped with balances in another customer’s record.
Introduce noise. Deliberately introduce slight noise or inaccuracies. Randomly add
records to the result set. This is likely to show erroneous individual records, but statistical
samples produce approximate responses quite adequate for statistical analysis.
Log queries. Maintain a log of all queries. Maintain a history of query results and reject
queries that use a high number of records identical to those used in previous queries.
2.9. ENCRYPTION
We have discussed the standard security control mechanisms in detail. You studied the
discretionary access control method whereby unauthorized persons are kept away from the
database and authorized users are guaranteed access through access privileges. You have also
understood the mandatory access control method, which addresses some of the weaknesses of
the discretionary access control scheme. Now you are confident that these two standard
schemes provide adequate protection and that potential intruders cannot invade the database.
However, the assumption is that an infiltrator or intruder tries to break into the system through
normal channels by procuring user-ids and passwords through illegal means. What if the
intruder bypasses the system to get access to the information content of the database? What if
the infiltrator steals the database by physically removing the disks or backup tapes? What if
the intruder taps into the communication lines carrying data to genuine users? What if a clever
infiltrator runs a program to retrieve the data by breaking the defenses of the operating system?
The normal security system breaks down in such cases. Standard security techniques fall short
of expectations to protect data from assaults bypassing the system. If your database contains
sensitive financial data about your customers, then you need to augment your security system
with additional safeguards. In today’s environment of electronic commerce on the Internet,
the need for dependable security techniques is all the more essential. Encryption techniques
offer added protection.
What is Encryption?
Simply stated, encryption is a method of coding data to make them unintelligible to an intruder
and then decoding the data back to their original format for use by an authorized user. Some
commercial DBMSs include encryption modules; a few others provide program exits for users
to code their own encryption routines.
Currently, encryption techniques are widely used in applications such as electronic fund
transfers (EFT) and electronic commerce. An encryption scheme needs a cryptosystem
containing the following components and concepts:
An encryption key to code data (called plaintext)
An encryption algorithm to change plaintext into coded text (called ciphertext)
A decryption key to decode ciphertext
A decryption algorithm to change ciphertext back into original plaintext
Figure above shows the elements of encryption. Note the use of the keys and where encryption
and decryption take place.
The underlying idea in encryption dictates the application of an encryption algorithm to
plaintext, where the encryption algorithm may be accessible to the intruder. The scheme includes
an encryption key, specified by the DBA, that has to be kept secret. Also included is a
decryption algorithm to do the reverse process of transforming ciphertext back into plaintext.
A good encryption technique, therefore, must have the following features:
Fairly simple for providers of data to encrypt
Easy for authorized users to decrypt
Does not depend on the secrecy of the encryption algorithm
Relies on keeping the encryption key a secret from an intruder
Extremely difficult for an intruder to deduce the encryption key
Just to get a feel for an encryption scheme, let us consider a simple example before proceeding
further into more details about encryption. First, we will use a simple substitution method.
Second, we will use a simple encryption key. Let us say that the plaintext we want to encrypt
is the following:
ADMINISTRATOR
Simple Substitution. Use simple substitution by shifting each letter in the plaintext three
spaces to the right in the alphabetic sequence. A becomes D, D becomes G, and so on. The
resulting ciphertext is as follows:
DGPLQLVWUDWRU
If the intruder sees a number of samples of the ciphertext, he or she is likely to deduce the
encryption algorithm.
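The simple substitution scheme can be sketched in a few lines of Python (assuming uppercase letters only, wrapping Z around to A):

def shift_cipher(plaintext, shift=3):
    # Shift each letter three places to the right in the alphabet
    result = []
    for ch in plaintext:
        pos = ord(ch) - ord('A')               # A = 0 ... Z = 25
        result.append(chr((pos + shift) % 26 + ord('A')))
    return ''.join(result)

print(shift_cipher('ADMINISTRATOR'))           # -> DGPLQLVWUDWRU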
Use of Encryption Key. This is a slight improvement over the simple substitution method.
Here, let us use a simple encryption key stipulated as "SAFE." Apply the key to each four-
character segment of the plaintext as shown below:
ADMINISTRATOR
SAFESAFESAFES
The encryption algorithm to translate each character of the plaintext is as follows:
Give each character in the plaintext and the key its position number in the alphabetic scheme.
The letter "A" gets 1, the letter "Z" gets 26, and a blank in the plaintext gets 27. Add the position
number of each letter of the plaintext to the position number of the corresponding letter of the key.
Then apply division modulus 27 to the sum; that is, divide the sum by 27 and take the remainder.
Use the number resulting from the division modulus 27 to find the letter to be substituted in the
ciphertext.
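A Python sketch of this keyed scheme follows; since the text does not say what to do when the remainder is zero, mapping a zero remainder to 27 (a blank) is an assumption made here:

def position(ch):
    return 27 if ch == ' ' else ord(ch) - ord('A') + 1   # A = 1 ... Z = 26, blank = 27

def to_letter(pos):
    return ' ' if pos == 27 else chr(pos - 1 + ord('A'))

def keyed_encrypt(plaintext, key):
    out = []
    for i, ch in enumerate(plaintext):
        k = key[i % len(key)]                        # repeat the key under the plaintext
        r = (position(ch) + position(k)) % 27        # division modulus 27
        out.append(to_letter(r if r != 0 else 27))   # assumed: remainder 0 maps to 27
    return ''.join(out)

print(keyed_encrypt('ADMINISTRATOR', 'SAFE'))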
Now compare the ciphertexts produced by the two methods and note how even a simple key
and fairly unsophisticated algorithm could improve the encryption scheme.
Encryption Methods
Three basic methods are available for encryption:
Encoding. The most simple and inexpensive method. Here, the values of important fields
are encoded. For example, instead of storing the names of bank branches, store codes to
represent each name.
Substitution. Substitute, letter for letter, in the plaintext to produce the ciphertext.
Transposition. Rearrange characters in the plaintext using a specific algorithm. Usually a
combination of substitution and transposition works well. However, techniques without
encryption keys do not provide adequate protection. The strength of a technique depends
on the key and the algorithm used for encryption and decryption. With plain
substitution or transposition, if an intruder reviews a sufficient number of encoded texts, he
or she is likely to decipher them.
On the basis of the use and disposition of encryption keys, encryption techniques fall into
two categories.
1. Symmetric Encryption. This technique uses the same encryption key for both
encryption and decryption. The key must be kept a secret from possible intruders. The
technique relies on safe communication to exchange the key between the provider of
data and an authorized user. If the key is to be really secure, you need a key as long as
the message itself. Because this is not efficient, most keys are shorter. The Data
Encryption Standard (DES) is an example of this technique.
2. Asymmetric Encryption. This technique utilizes different keys for encryption and
decryption. One is a public key known openly, and the other is a private key known
only to the authorized user. The encryption algorithm may also be known publicly. The
RSA model is an asymmetric encryption method.
Figure 2.6 DES: single-key encryption.
Weaknesses. Despite the complexity of the encryption algorithm and the sophistication of key
selection, DES is not universally accepted as absolutely secure. Critics point out the following deficiencies:
56-bit keys are inadequate. With the powerful and special hardware available now, they
are breakable. Even such expensive hardware is within the reach of organized crime and
hostile governments. However, 128-bit keys are expected to be unbreakable within the
foreseeable future. A better technique known as PGP (Pretty Good Privacy) uses 128-bit
keys. Another possible remedy is double application of the algorithm at each step.
Users must be given the key for decryption. Authorized users must receive the key
through secure means. It is very difficult to maintain this secrecy. This is a major
weakness.
Public Key Encryption
This technique overcomes some of the problems associated with the DES technique. In
DES you have to keep the encryption key a secret, and this is not an easy thing to
accomplish. Public key encryption addresses this problem. The public key as well as the
encryption algorithm need not be kept secret. Is this like locking the door and making the
key available to any potential intruder? Let us examine the concept.
The widely used public key encryption technique was proposed by Rivest, Shamir, and
Adleman. It is known by the acronym RSA. The RSA model is based on the following
concepts:
Two encryption keys are used—one public and the other private.
Each user has a public key and a private key.
The public keys are all published and known openly.
The encryption algorithm is also made freely available.
Only an individual user knows his or her private key.
The encryption and decryption algorithms are inverses of each other.
Multiplying two large prime numbers is computationally easy, but factoring the product is
extremely hard: if a machine can multiply two very large prime numbers in a fraction of a
second, it would take about 40 quadrillion years on the same machine to determine the
prime factors of the product!
The public key encryption technique treats data as a collection of integers. The public and
private keys are reckoned for an authorized user as follows:
Choose two large, random prime numbers n1 and n2.
Compute the product of n1 and n2. This product is also known as the limit L, assumed to
be larger than the largest integer ever needed to be encoded.
Choose a prime number larger than both n1 and n2 as the public key P.
Choose the private key R in a special way based on n1, n2, and P.
[If you are interested, R is calculated such that R * P = 1, modulo (n1–1) * (n2–1).] The limit
L and the public key P are made known publicly. Note that the private key R may be computed
easily if the public key P and the prime numbers n1 and n2 are given. However, it is extremely
difficult to compute the private key R if just the public key P and the limit L are known. This
is because finding the prime factors of L is almost impossible if L is fairly large.
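A toy Python sketch of this key setup follows; the primes are deliberately tiny for illustration (real keys use primes hundreds of digits long), and the modular inverse is computed with pow, which supports this from Python 3.8 onward:

n1, n2 = 11, 13            # two random prime numbers (tiny here for illustration)
L = n1 * n2                # the limit L = 143, made public
phi = (n1 - 1) * (n2 - 1)  # = 120, kept secret

P = 17                     # public key: a prime larger than both n1 and n2
R = pow(P, -1, phi)        # private key: R * P = 1, modulo (n1-1)*(n2-1); here R = 113

def encrypt(m):
    return pow(m, P, L)    # ciphertext = m^P mod L

def decrypt(c):
    return pow(c, R, L)    # plaintext = c^R mod L

m = 7
assert decrypt(encrypt(m)) == m   # the two algorithms are inverses of each other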
Data Exchange Example. Let us consider the use of public key encryption in a banking
application. Here are the assumptions:
Online requests for fund transfers may be made to a bank called ABCD. The bank’s
customer known as Good places a request to transfer $1 million.
The bank must be able to understand and acknowledge the request.
The bank must be able to verify that the fund transfer request was in fact made by customer
Good and not anyone else.
Also, customer Good must not be able to allege that the request was made up by the bank
to siphon funds from Good’s account.
The figure below (Figure 2.8) illustrates the use of the public key encryption technique for this
banking transaction. Note how each transfer is coded and decoded.
Figure 2.8 Public key encryption: data exchange.
Assessment 2.1
List the major goals and objectives of a database security system.
Which ones are important?
What are the types of general access control procedures in a database environment?
What is data privacy? What are some of the privacy issues to be addressed?
What is discretionary access control?
What are the types of access privileges available to users?
Describe an authorization graph with an example.
DBMS Security and Authorization: Part II. Choose the correct answer.
1. Which type of command is GRANT?
A. Transaction Control Language (TCL) command
B. Data Query Language (DQL) command
C. Data Control language (DCL) command
D. Data Definition Language (DDL) command
E. None of these
2. In Oracle, which of the following is a group of privileges that are collected together and
granted to users?
A. Revoke
B. Grant
C. Role
D. Synonym
E. View
3. Create, Revoke, Grant and Drop commands are parts of _____________ language in
Oracle.
A. DML
B. DDL
C. Object-Oriented language
D. Procedural language
E. Assembly language
4. What is the SQL statement to grant select, update privileges to the user RAKESH for the
DEPT relation?
A. GRANT DML on DEPT to RAKESH;
B. GRANT select, update to RAKESH on DEPT;
C. GRANT select, update on DEPT to RAKESH;
D. GRANT on DEPT to RAKESH for select, update;
5. Data security threat includes
A. Privacy invasion
B. Hardware failure
C. Fraudulent manipulation of data
D. All of the above
6. Locking can be used for
A. Deadlock
B. Lost update
C. Uncommitted dependency
D. Inconsistent data
7. In an Oracle distributed database system, which of the following is the one in which each
server participating in a distributed database is administered independently from all other
databases?
A. Authentication through database links
B. Distributed database security
C. Auditing database links
D. Administration tools
E. Site autonomy
8. _____ limits who gains access to the database, while _____ limits what a user can access
within the database.
A. Access authentication, user definition
B. Access authentication, view definition
C. Data access, user monitoring
D. Access control, database security
9. _____ is the process of transforming data into an unreadable form to anyone who does
not know the key.
A. Data authentication
B. Data security
C. Data encryption
D. Database security management
Chapter 3: Transaction Processing Concepts
CHAPTER OUTCOME
3.2.1. Atomicity
Either all operations of the transaction are properly reflected in the database, or none are.
That is, either all the operations of a transaction are executed, or not a single one is.
For example, consider the transaction below, which transfers Rs. 50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
7. COMMIT
In the above transaction, if Rs. 50 is deducted from account A, then it must be added to
account B.
3.2.2. Consistency
Execution of a transaction in isolation preserves the consistency of the database. That is, the
database must remain in a consistent state after the execution of any transaction. In the above
example, the total of A and B must remain the same before and after the execution of the transaction.
3.2.3. Isolation
Although multiple transactions may execute concurrently, each transaction must be
unaware of other concurrently executing transactions.
Intermediate transaction results must be hidden from other concurrently executed
transactions.
In the above example, once the transaction starts at step 1, its intermediate results should not
be accessible to any other transaction until the last step (step 7) is completed.
3.2.4. Durability
After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.
Once the transaction has completed through step 7, its results must be stored permanently. They
should not be lost even if the system fails.
Transactions, Database items, Read and Write operations and DBMS buffers
A transaction is an executing program that forms a logical unit of database processing.
A Txn includes one or more database operations.
A Txn can be embedded in an application program (or it can be a command-line query).
Txn boundary: begin Txn ... end Txn.
A single application program can contain many Txns.
If a Txn only retrieves data and makes no updates, it is called a read-only Txn; otherwise it is read-write.
Data item can be a record or an entire block (granularity) or could be a single attribute
Each data item has a unique name
The basic database operations that a Txn can include are read_item(x) and write_item(x), where x is a
program variable
The basic access is one disk block from disk to memory
Read_item(x):
1. Find the address of the disk block that contains x
2. Copy the disk block to memory buffer (if not in memory)
3. Copy the item from the buffer to program variable x
Write_item(x):
1. Find the address of the disk block that contains x
2. Copy the disk block to memory buffer (if not in memory)
3. Copy the program variable x into buffer
4. Store the updated buffer to disk
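The two operations can be sketched in Python, with one dictionary standing in for disk blocks and another for the DBMS buffer pool; both names are illustrative only:

disk = {'X': 100}    # persistent storage, keyed by item name for simplicity
buffers = {}         # main-memory buffer pool

def read_item(name):
    if name not in buffers:        # copy the block into a buffer if not in memory
        buffers[name] = disk[name]
    return buffers[name]           # copy from the buffer into a program variable

def write_item(name, value):
    if name not in buffers:        # bring the block in first, as for a read
        buffers[name] = disk[name]
    buffers[name] = value          # copy the program variable into the buffer
    disk[name] = buffers[name]     # store the updated buffer back to disk

x = read_item('X')
write_item('X', x - 50)            # e.g., the debit step of the transfer Txn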
COMMIT_TRANSACTION. This signals a successful end of the transaction, so that any
changes (updates) executed by the transaction can be safely committed to the database and
will not be undone.
ROLLBACK (or ABORT). This signals that the transaction has ended unsuccessfully, so
that any changes or effects that the transaction may have applied to the database must be
undone.
Active
This is the initial state. The transaction stays in this state while it is executing.
Partially Committed
This is the state after the final statement of the transaction is executed.
At this point failure is still possible, since the changes may have been made only in main memory;
a hardware failure could still occur.
The DBMS needs to write out enough information to disk so that, in case of a failure, the system could
re-create the updates performed by the transaction once the system is brought back up. After it
has written out all the necessary information, it is committed.
Failed
After the discovery that normal execution can no longer proceed.
Once a transaction cannot be completed, any changes that it made must be undone by rolling it
back.
Aborted
The state after the transaction has been rolled back and the database has been restored to its state prior to
the start of the transaction.
Committed
The transaction enters this state after successful completion of the transaction.
We cannot abort or rollback a committed transaction.
3.4.1. Schedule
A schedule is the chronological (sequential) order in which instructions are executed in the system.
A schedule for a set of transactions must consist of all the instructions of those transactions and
must preserve the order in which the instructions appear in each individual transaction.
Example of schedule (Schedule 1)
3.4.1.2. Interleaved schedule
Schedules that interleave the actions of different transactions.
That is, transaction 2 may start before all instructions of transaction 1 are completed.
Schedules of this type are called interleaved schedules.
A schedule that is equivalent (in its outcome) to a serial schedule has the serializability property.
Two schedules are equivalent if the effect (that is, the output) of executing the first schedule is
identical to the effect of executing the second schedule.
Example of serializable schedule
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Example
Schedule S can be transformed into Schedule S' by swapping a series of non-conflicting instructions.
Therefore Schedule S is conflict serializable.
Instruction Ii of transaction T1 and instruction Ij of transaction T2 conflict if both instructions access
the same data item A and at least one of them performs a write operation on that data item (A).
In the above example, the write(A) instruction of transaction T1 conflicts with the read(A) instruction of
transaction T2, because both instructions access the same data item A. But the write(A) instruction of
transaction T2 does not conflict with the read(B) instruction of transaction T1, because the two instructions
access different data items: transaction T2 writes A, while transaction T1 reads B.
So in the above example, in schedule S, the two instructions read(A) and write(A) of transaction T2 and the two
instructions read(B) and write(B) of transaction T1 are interchanged, and we get schedule S'.
Therefore Schedule S is conflict serializable.
We are unable to swap instructions in the above schedule S'' to obtain either the serial schedule <T3, T4>
or the serial schedule <T4, T3>.
So the above schedule S'' is not conflict serializable.
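The standard test for this property builds a precedence graph from the conflicts just described and checks it for cycles. A Python sketch follows; the schedule format, a list of (transaction, operation, item) triples, is an assumption made for illustration:

def conflict_serializable(schedule):
    # schedule: list of (txn, op, item) triples in execution order, op in {'R', 'W'}
    txns = {t for t, _, _ in schedule}
    edges = set()
    for i in range(len(schedule)):
        t1, op1, x1 = schedule[i]
        for t2, op2, x2 in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write
            if t1 != t2 and x1 == x2 and 'W' in (op1, op2):
                edges.add((t1, t2))          # edge t1 -> t2 in the precedence graph
    state = {}
    def cyclic(node):                        # depth-first search for a cycle
        state[node] = 'active'
        for a, b in edges:
            if a == node:
                if state.get(b) == 'active' or (b not in state and cyclic(b)):
                    return True
        state[node] = 'done'
        return False
    # The schedule is conflict serializable iff the precedence graph is acyclic
    return not any(cyclic(t) for t in txns if t not in state)

s = [('T1', 'R', 'A'), ('T1', 'W', 'A'), ('T2', 'R', 'A'), ('T2', 'W', 'A')]
print(conflict_serializable(s))              # True: equivalent to the serial order T1, T2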
If, in schedule S, transaction Ti executes read(Q) and that value was produced by transaction Tj (if any),
then in schedule S' transaction Ti must also read the value of Q that was produced by the same write(Q)
operation of transaction Tj.
If transaction Ti (if any) performs the final write(Q) operation in schedule S, then in schedule S'
the final write(Q) operation must also be performed by Ti.
A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable, but not every view serializable schedule
is conflict serializable.
Below is a schedule which is view serializable but not conflict serializable
The above schedule is view serializable but not conflict serializable, because all the transactions access
the same data item (Q) and every pair of operations conflicts, since at least one operation in each pair is a
write on Q; therefore no non-conflicting operations can be interchanged.
This section lists some SQL TCL commands that may be useful. Each command's description is
taken and adapted from the SQL*Plus help, and may be only partially described. For more detail,
or for other commands, use HELP in SQL*Plus directly.
Transaction Control:
There are following commands used to control transactions:
COMMIT: to save the changes.
ROLLBACK: to rollback the changes.
SAVEPOINT: creates points within groups of transactions to which you can later ROLLBACK.
SET TRANSACTION: Places a name on a transaction.
Transactional control commands are used only with the DML commands INSERT, UPDATE, and DELETE.
They cannot be used while creating or dropping tables, because those operations are automatically
committed in the database.
The COMMIT Command:
The COMMIT command is the transactional command used to save changes invoked by a transaction
to the database.
The COMMIT command saves to the database all changes made since the last COMMIT or ROLLBACK
command (source: SQL tutorial, Tutorials Point).
The syntax for COMMIT command is as follows:
COMMIT;
Following is an example that would delete the records having AGE = 25 from the table and then
COMMIT the changes in the database.
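The statements themselves are not reproduced in this module; presumably they were along these lines, assuming the table is CUSTOMERS and the column is AGE:
DELETE FROM CUSTOMERS WHERE AGE = 25;
COMMIT;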
As a result, two rows would be deleted from the table, and a subsequent SELECT statement would no
longer return them.
The ROLLBACK Command:
The ROLLBACK command is the transactional command used to undo transactions that have not already been
saved to the database.
The ROLLBACK command can only be used to undo transactions since the last COMMIT or
ROLLBACK command was issued.
The syntax for the ROLLBACK command is as follows:
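ROLLBACK;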
Example:
Consider the CUSTOMERS table having the following records:
Following is an example that would delete the records having AGE = 25 from the table and then
ROLLBACK the changes in the database.
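Again assuming the AGE column, the missing statements would be:
DELETE FROM CUSTOMERS WHERE AGE = 25;
ROLLBACK;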
As a result, the delete operation would not affect the table, and a SELECT statement would still
return all the rows.
The SAVEPOINT Command:
A SAVEPOINT is a point in a transaction to which you can later roll back. This command serves only
to create a SAVEPOINT among transactional statements; the ROLLBACK command is then used to undo
the group of statements executed after that savepoint.
The syntax for rolling back to a SAVEPOINT is as follows:
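ROLLBACK TO savepoint_name;
The savepoint itself is created earlier in the transaction with:
SAVEPOINT savepoint_name;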
Following is an example where you plan to delete three different records from the CUSTOMERS table. You
want to create a SAVEPOINT before each delete, so that you can ROLLBACK to any SAVEPOINT at any time
to return the appropriate data to its original state:
Example:
Consider the CUSTOMERS table having the following records:
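The deletions, each preceded by a savepoint, might look as follows (the ID values are assumptions, since the original table is not reproduced here):
SAVEPOINT SP1;
DELETE FROM CUSTOMERS WHERE ID = 1;
SAVEPOINT SP2;
DELETE FROM CUSTOMERS WHERE ID = 2;
SAVEPOINT SP3;
DELETE FROM CUSTOMERS WHERE ID = 3;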
Now that the three deletions have taken place, say you have changed your mind and decided to ROLLBACK to
the SAVEPOINT that you identified as SP2. Because SP2 was created after the first deletion, the last two
deletions are undone:
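ROLLBACK TO SP2;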
Notice that only the first deletion remains in effect, since you rolled back to SP2.
Once a SAVEPOINT has been released, you can no longer use the ROLLBACK command to undo transactions
performed since the SAVEPOINT.
Assessment 3.1
1. Discuss what is meant by a transaction. Why is it important? Describe the possible database
operations.
2. Differentiate the read operation from the write operation, giving appropriate examples.
3. Compare and contrast the transaction properties Atomicity, Consistency, Isolation, and Durability, giving an
appropriate example from a real-world scenario.
4. Compare and contrast the transaction states (active, partially committed, failed, aborted, and
committed), giving an appropriate example from a real-world scenario.
5. Discuss concurrency problems in a DBMS using real-world examples: the dirty read (uncommitted
read) problem, the unrepeatable read problem, and the lost update problem.
6. Compare and contrast the recoverable schedules, cascading and cascadeless, using an
example for each.
1. What type of schedule occurs if, whenever a transaction Tj reads a data item previously written by a transaction Ti,
the commit operation of Ti appears before the commit operation of Tj?
A. Recoverable schedule
B. Non-recoverable schedule
C. Cascading schedule
D. Cascadeless schedule
E. All of the above
2. The state in which a transaction stays while it is executing is termed
A. Active
B. Partially committed
C. Initial
D. Both A and B
3. Serializability of schedules can be ensured through a mechanism called
A. Concurrency control policy
B. Evaluation control policy
C. Execution control policy
D. Cascading control policy
4. A _________ consists of a sequence of query and/or update statements.
A. Transaction
B. Commit
C. Rollback
D. Flashback
5. Which of the following makes the transaction permanent in the database?
A. View
B. Commit
C. Rollback
D. Flashback
6. Which one of the following is a part of the ACID properties of database transactions?
A. Atomicity, consistency, isolation, database
B. Atomicity, consistency, isolation, durability
C. Atomicity, consistency, integrity, durability
D. Atomicity, consistency, integrity, database
7. Which concept describes a situation in which the operation of one user's transaction is overridden by the operation
of another user's transaction, producing an incorrect result?
A. Lost update problem
B. Uncommitted dependency
C. Dirty read problem
D. Incorrect summary problem
E. B and C
8. Consider the following transactions with data items P and Q initialized to zero. Any non-serial
interleaving of T1 and T2 for concurrent execution leads to:
T1: read(P);
read(Q);
if P = 0 then Q := Q + 1;
write(Q);
T2: read(Q);
read(P);
if Q = 0 then P := P + 1;
write(P);
A. A serializable schedule
B. A schedule that is not conflict serializable
C. A conflict serializable schedule
D. A schedule for which a precedence graph cannot be drawn
E. All of the above
9. Consider three data items D1, D2, and D3 and the following execution schedule of transactions T1, T2, and T3. In the
diagram, R(D) and W(D) denote the actions of reading and writing the data item D, respectively.
Chapter 4: DBMS Concurrency Control
CHAPTER OUTCOME
Semi-optimistic: block operations in some situations, if they may cause a violation of
some rules, and do not block in other situations, delaying rule checking (if needed) to the
transaction's end, as is done in the optimistic approach.
Concurrency Problems in DBMS
When multiple transactions execute concurrently in an uncontrolled or unrestricted manner, several
problems can arise; such problems are called concurrency problems. The first is the lost update
problem:
Here,
T1 reads the value of A (say A = 10).
T1 updates the value of A to 15 in its buffer.
T2 does a blind write A = 25 (a write without a read) in its buffer.
T2 commits.
When T1 commits, it writes A = 15 in the database.
In this example, T1 overwrites the value that T2 wrote to the database.
Thus, the update from T2 gets lost.
The temporary update (or dirty read) problem: this occurs when one transaction updates a
database item and then the transaction fails for some reason. The updated item is accessed by
another transaction before it is changed back to its original value.
The incorrect summary problem: if one transaction is calculating an aggregate function on a
number of records while another transaction is updating some of these records, the aggregate
function may calculate some values before they are updated and others after they are updated.
Whenever a transaction is submitted to a DBMS for execution, the system must ensure that either:
All the operations in the transaction are completed successfully and their effect is recorded
permanently in the database; or
The transaction has no effect whatsoever on the database or on any other transaction, in the
case that it fails after executing some of its operations but before executing all of
them.
4.5. Locking
In order to execute transactions in an interleaved manner it is necessary to have some form
of concurrency control.
This enables a more efficient use of computer resources.
One method of avoiding problems is with the use of locks.
When a transaction requires a database object it must obtain a lock.
Locking is necessary in a concurrent environment to ensure that one process does not retrieve
or update a record that is being updated by another process. Failure to use such controls
(locking) would result in inconsistent and corrupt data.
Locks enable a multi-user DBMS to maintain the integrity of transactions by isolating a
transaction from others executing concurrently.
Locks are particularly critical in write-intensive and mixed-workload (read/write)
environments, because they can prevent the inadvertent loss of data or consistency problems
with reads. In addition to record locking, a DBMS implements several other locking mechanisms to
ensure the integrity of other data structures that provide shared I/O, communication among
different processes in a cluster, and automatic recovery in the event of a process or cluster
failure.
Aside from their integrity implications, locks can have a significant impact on performance. While
it may benefit a given application to lock a large amount of data (perhaps one or more tables) and
hold these locks for a long period of time, doing so inhibits concurrency and increases the likelihood
that other applications will have to wait for locked resources.
A transaction holding an S-lock (shared lock) may only issue a read request on the data item.
A transaction holding an X-lock (exclusive lock) may issue both read and write requests; the
exclusive lock ensures that multiple transactions do not modify the same data item simultaneously.
Growing phase: in the growing phase, the transaction may acquire new locks, but may not
release any lock.
Shrinking phase: in the shrinking phase, existing locks held by the transaction may be
released, but no new locks can be acquired.
Figure 4.1 The two-phase locking technique
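A minimal Python sketch of the two-phase rule follows; the class and names are illustrative, and a real lock manager would also queue waiting transactions and detect deadlocks:

class TwoPhaseTxn:
    def __init__(self, lock_table):
        self.lock_table = lock_table   # shared map: item -> owning transaction
        self.shrinking = False         # becomes True after the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError('2PL violation: no new locks in the shrinking phase')
        owner = self.lock_table.get(item)
        if owner is not None and owner is not self:
            # A real DBMS would block here until the lock is released
            raise RuntimeError(item + ' is held by another transaction')
        self.lock_table[item] = self

    def unlock(self, item):
        self.shrinking = True          # entering the shrinking phase
        if self.lock_table.get(item) is self:
            del self.lock_table[item]

table = {}
t1 = TwoPhaseTxn(table)
t1.lock('A'); t1.lock('B')   # growing phase: acquire all needed locks
t1.unlock('A')               # first release: shrinking phase begins
# t1.lock('C') would now raise, because 2PL forbids acquiring after releasing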
4.5.4. Validation Based Protocol
It assumes that multiple transactions can frequently complete without interfering with
each other.
Before committing, each transaction verifies that no other transaction has modified the data
it has read.
If the check reveals conflicting modifications, the committing transaction rolls back and
can be restarted.
The validation-based protocol requires that each transaction Ti execute in two or
three different phases in its lifetime.
Read phase.
During this phase, it reads the values of the various data items and stores them in variables
local to Ti.
It performs all write operations on temporary local variables, without updating the actual
database.
Validation phase.
Transaction Ti performs a validation test to determine whether it can copy to the database
the temporary local variables that hold the results of write operations, without causing a
violation of serializability.
Write phase.
If transaction Ti succeeds in validation (step 2), then the system applies the actual updates
to the database. Otherwise, the system rolls back Ti.
Each transaction Ti is associated with three timestamps:
Start(Ti), the time when Ti started its execution.
Validation(Ti), the time when Ti finished its read phase and started its validation phase.
Finish(Ti), the time when Ti finished its write phase.
Chapter 5:- Database Recovery
Recovery Concepts
Recovery Concepts Based on Deferred Update
Recovery Concepts Based on Immediate Update
Shadow Paging
The ARIES Recovery Algorithm
Recovery in Multidatabase Systems
In case of media failure, a database administrator (DBA) must initiate a recovery
operation.
Recovering a backup involves two distinct operations: rolling the backup forward to a
more recent time by applying redo data and rolling back all changes made in uncommitted
transactions to their original state.
In general, recovery refers to the various operations involved in restoring, rolling forward,
and rolling back a backup.
Backup and recovery refers to the various strategies and operations involved in
protecting the database against data loss and reconstructing the database.
All such files are maintained by the DBMS itself. Normally these are sequential files.
Recovery has two factors: Rollback (Undo) and Roll forward (Redo).
When transaction Ti starts, it registers itself by writing a <Ti start> log record.
Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the
value of X before the write and V2 is the value to be written to X.
The log record notes that Ti has performed a write on data item X;
X had value V1 before the write and will have value V2 after the write.
When Ti finishes its last statement, the log record <Ti commit> is written.
Two approaches are used in log based recovery
1. Deferred database modification
2. Immediate database modification
Transaction T starts by writing <T start> to the log.
Any update is recorded as <T, X, V>, where V indicates the new value for data item X. Here, there is no
need to preserve the old value of the changed data item. Also, V is not written to X in the database
immediately; the write is deferred.
Transaction T commits by writing <T commit> to the log. Once this is entered in the log,
the actual updates are recorded to the database.
If a transaction T aborts, its log records are ignored, and no updates are recorded
to the database.
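A sketch of deferred-update recovery in Python: only transactions whose <T commit> record appears in the log are redone, and records of uncommitted transactions are simply ignored (the log format used here is an assumption for illustration):

def replay_deferred(log, db):
    # log records: ('start', T), ('write', T, X, V) or ('commit', T)
    committed = {rec[1] for rec in log if rec[0] == 'commit'}
    for rec in log:
        if rec[0] == 'write' and rec[1] in committed:
            _, txn, item, new_value = rec
            db[item] = new_value        # REDO: deferred writes are applied only now
    return db

db = {'A': 500, 'B': 600, 'C': 700}
log = [('start', 'T0'), ('write', 'T0', 'A', 400), ('write', 'T0', 'B', 700),
       ('commit', 'T0'), ('start', 'T1'), ('write', 'T1', 'C', 500)]
print(replay_deferred(log, db))         # {'A': 400, 'B': 700, 'C': 700}; T1 is ignored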
Example
Consider the following two transactions, T0 and T1, given in the figure, where T0 executes
before T1. Also assume that the initial values for A, B, and C are 500, 600, and 700,
respectively.
The following figure shows the transaction log for the above two transactions at three
different instances of time.
If a failure occurs, then in each of the three cases:
1. No REDO actions are required.
2. As transaction T0 has already committed, it must be redone.
3. As transactions T0 and T1 have already committed, they must be redone.
The following figure shows the transaction log for the above two transactions at three
different instances of time. Note that here the transaction log also contains the original values
along with the new updated values for the data items.
If a failure occurs, then in each of the three cases:
Undo transaction T0, as it has not committed, and restore A and B to 500 and 600,
respectively.
Undo transaction T1, restoring C to 700; and redo transaction T0, setting A and B to 400
and 700, respectively.
Redo transactions T0 and T1: set A and B to 400 and 700, respectively, and set C to 500.
Find the nearest checkpoint.
If a transaction committed before this checkpoint, ignore it.
If a transaction is active at or after this checkpoint and has committed before the failure, redo
that transaction.
If a transaction is active at or after this checkpoint and has not committed, undo that
transaction.
Example
Consider the transactions given in the following figure. Here, Tc indicates the checkpoint, while Tf
indicates the failure time.
Here, at the failure time:
1. Ignore transaction T1, as it had already committed before the checkpoint.
2. Redo transactions T2 and T3, as they are active at/after the checkpoint but committed
before the failure.
3. Undo transaction T4, as it is active after the checkpoint and has not committed.
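These rules can be sketched in Python over a simplified log of start/commit/checkpoint records; the log format is assumed for illustration:

def recover(log):
    # log records: ('start', T), ('commit', T) or ('checkpoint',), in order
    committed, active = set(), set()
    ignorable = set()                       # committed before the checkpoint
    for record in log:
        if record[0] == 'checkpoint':
            ignorable = set(committed)
        elif record[0] == 'start':
            active.add(record[1])
        elif record[0] == 'commit':
            active.discard(record[1])
            committed.add(record[1])
    redo = committed - ignorable            # committed at/after the checkpoint
    undo = active                           # still uncommitted at failure time
    return redo, undo

log = [('start', 'T1'), ('commit', 'T1'), ('checkpoint',),
       ('start', 'T2'), ('start', 'T3'), ('commit', 'T2'),
       ('commit', 'T3'), ('start', 'T4')]
print(recover(log))                         # ({'T2', 'T3'}, {'T4'})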
Figure 5.1 The shadow paging technique
Advantages
No overhead of maintaining a transaction log.
Recovery is quite fast, as no redo or undo operations are required.
Disadvantages
Copying the entire page table is very expensive.
Data become scattered or fragmented.
After each transaction, free pages need to be collected by a garbage collector.
It is difficult to extend this technique to allow concurrent transactions.
Review Questions
1. In which database update strategy are a transaction's updates applied to the physical database
only after the transaction reaches its commit point?
A. Immediate update
B. Deferred update
C. Summary problem update
D. All of the above
2. Transaction Ti reads item X and finally writes item X, but before Ti is permanently
committed, transaction Tj starts and reads the value written by Ti, which is not yet
committed. Which concurrency problem does this describe?
A. Lost update problem
B. Uncommitted dependency
C. Dirty read problem
D. Incorrect summary
3. Which of the following is true about the locking technique?
A. More than one transaction can use item X at the same time
B. When one transaction is using item X, other transactions must wait until the lock is
acquired
C. When one transaction is using item X, other transactions must wait until the lock is
released
4. Which of the following best describes the shared and exclusive lock types?
A. If a transaction only needs to write item X, it uses a shared lock
B. If a transaction only needs to read item Y, it uses an exclusive lock
C. If a transaction needs to read and write item Z, it uses an exclusive lock
D. If a transaction needs to read and write item Z, it uses a shared lock
E. All of the above
5. Which of the following is false about two-phase locking?
A. Locking and unlocking are done in two different phases
B. The transaction acquires locks in one phase and releases them in the other phase
C. The lock point is the point at which the transaction stops acquiring locks and may begin unlocking
D. In the growing phase, both locking and releasing can occur
E. None of the above
F. All of the above
6. Which of the following best describes multiple granularity locking?
A. It allows data items of various sizes and defines a hierarchy of the data structure
B. When a table is locked, the whole database is locked
C. In this mechanism, when a table is locked, all its ascendants are locked
D. In this mechanism, when a table is locked, all its descendant records are locked
E. All of the above
7. Which of the following is not a recovery technique?
A. Deferred update
B. Immediate update
C. Two-phase commit
D. Recovery management
8. _____ deals with soft errors, such as power failures.
A. System recovery
B. Media recovery
C. Database recovery
D. Failure recovery
9. Rollback of transactions is normally used to:
A. Recover from transaction failure
B. Update the transaction
C. Retrieve old records
D. Repeat a transaction
10. A transaction operates at the Read Uncommitted isolation level if:
A. A dirty read occurs
B. Non-repeatable read occurs
C. Phantom reads occurs
D. All of the above
Chapter 6: Distributed Database System
Outline
Introduction
Distributed Database Concepts
What Constitutes a DDB
Transparency
Availability and Reliability
Scalability and Partition Tolerance
Advantages of Distributed Databases
Data Fragmentation, Replication, and Allocation Techniques for Distributed Database
Design
Data Fragmentation and Sharding
Data Replication and Allocation
Types of Distributed Database Systems
Distributed Database Architectures
Parallel versus Distributed Architecture
General Architecture of Pure Distributed Database
Federated Database Schema Architecture
An Overview of Three-Tier Client/Server Architecture
A distributed database is a database in which storage devices are not all attached to a common
processing unit such as the CPU. It may be stored in multiple computers, located in the same
physical location; or may be dispersed over a network of interconnected computers. A distributed
database system consists of loosely coupled sites that share no physical components. In centralized
systems, the data, process, and interface components of an information system are central;
end users work on the system through terminals or terminal emulators. In a distributed
system, the data, process, and interface components of an information system are distributed to
multiple locations in a computer network.
Accordingly, the processing workload is distributed across the network. Distributed systems are
motivated by functional distribution, inherent distribution in the application domain, economics,
better performance, and increased reliability.
6.2. Distributed Database Concepts
We can define a distributed database (DDB) as a collection of multiple logically interrelated
databases distributed over a computer network, and a distributed database management system
(DDBMS) as a software system that manages a distributed database while making the distribution
transparent to the user.
Distributed databases are different from Internet Web files. Web pages are basically a very large
collection of files stored on different nodes of a network (the Internet), with interrelationships among
the files represented via hyperlinks. The common functions of database management, including
uniform query processing and transaction processing, do not yet apply to this scenario.
Differences between DDB and Multiprocessor Systems
We need to distinguish distributed databases from multiprocessor systems that use shared storage
(primary memory or disk). For a database to be called distributed, the following minimum
conditions should be satisfied:
Connection of database nodes over a computer network. There are multiple computers, called
sites or nodes. These sites must be connected by an underlying communication network to
transmit data and commands among sites.
Logical interrelation of the connected databases. It is essential that the information in the
databases be logically related.
Absence of homogeneity constraint among connected nodes. It is not necessary that all nodes
be identical in terms of data, hardware, and software.
The sites may all be located in physical proximity, say, within the same building or a group of
adjacent buildings, and connected via a local area network; or they may be geographically distributed
over large distances and connected via a long-haul or wide area network. Local area networks
typically use wireless hubs or cables, whereas long-haul networks use telephone lines or satellites.
It is also possible to use a combination of networks.
Networks may have different topologies that define the direct communication paths among sites.
The type and topology of the network used may have a significant impact on the performance and
hence on the strategies for distributed query processing and distributed database design. For high-
level architectural issues, however, it does not matter what type of network is used; what matters is
that each site be able to communicate, directly or indirectly, with every other site. For the remainder
of this chapter, we assume that some type of communication network exists among sites, regardless
of any particular topology.
We will not address any network-specific issues, although it is important to understand that for the
efficient operation of a distributed database system (DDBS), network design and performance
issues are critical and form an integral part of the overall solution. The details of the underlying
communication network are invisible to the end user.
Transparency
The concept of transparency extends the general idea of hiding implementation details from end
users. A highly transparent system offers a lot of flexibility to the end user/application developer,
since it requires little or no awareness of underlying details on their part. In the case of a traditional
centralized database, transparency simply pertains to logical and physical data independence for
application developers.
However, in a DDB scenario, the data and software are distributed over multiple sites
connected by a computer network, so additional types of transparencies are introduced.
Consider the company database in Figure 3.5 that we have been discussing throughout the
book. The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally
and stored with possible replication as shown in the figure below. The following types of
transparencies are possible:
Data organization transparency (also known as distribution or network transparency).
This refers to freedom for the user from the operational details of the network and the placement of
the data in the distributed system. It may be divided into location transparency and naming
transparency.
Location transparency refers to the fact that the command used to perform a task is
independent of the location of the data and the location of the node where the command was
issued. Naming transparency implies that once a name is associated with an object, the named
objects can be accessed unambiguously without additional specification as to where the data is
located.
Replication transparency. As shown in the figure below, copies of the same data objects may
be stored at multiple sites for better availability, performance, and reliability. Replication
transparency makes the user unaware of the existence of these copies.
6.4. Fragmentation transparency
Two types of fragmentation are possible. Horizontal fragmentation distributes a relation (table)
into subrelations that are subsets of the tuples (rows) in the original relation. Vertical fragmentation
distributes a relation into subrelations where each subrelation is defined by a subset of the columns
of the original relation. A global query by the user must be transformed into several fragment queries.
Fragmentation transparency makes the user unaware of the existence of fragments.
Other transparencies include design transparency and execution transparency—referring to
freedom from knowing how the distributed database is designed and where a transaction executes.
Autonomy
Autonomy determines the extent to which individual nodes or DBs in a connected DDB can
operate independently. A high degree of autonomy is desirable for increased flexibility and
customized maintenance of an individual node. Autonomy can be applied to design,
communication, and execution. Design autonomy refers to independence of data model usage and
transaction management techniques among nodes. Communication autonomy determines the
extent to which each node can decide on sharing of information with other nodes. Execution
autonomy refers to the independence of users to act as they please.
Reliability and Availability
Reliability and availability are two of the most common potential advantages cited for distributed
databases. Reliability is broadly defined as the probability that a system is running (not down) at
a certain time point, whereas availability is the probability that the system is continuously
available during a time interval. We can directly relate the reliability and availability of the database
to the faults, errors, and failures associated with it. A failure can be described as a deviation of a
system's behavior from that which is specified in order to ensure the correct execution of operations.
Errors constitute that subset of system states that causes the failure. A fault is the cause of an error.
To construct a system that is reliable, we can adopt several approaches. One common approach
stresses fault tolerance; it recognizes that faults will occur, and designs mechanisms that can detect
and remove faults before they can result in a system failure. Another more stringent approach
attempts to ensure that the final system does not contain any faults. This is done through an
exhaustive design process followed by extensive quality control and testing.
A reliable DDBMS tolerates failures of underlying components and processes user requests so
long as database consistency is not violated. A DDBMS recovery manager has to deal with
failures arising from transactions, hardware, and communication networks.
Hardware failures can either be those that result in loss of main memory contents or loss of
secondary storage contents. Communication failures occur due to errors associated with messages
and line failures. Message errors can include their loss, corruption, or out-of-order arrival at
destination.
Advantages of Distributed Databases
Organizations resort to distributed database management for various reasons. Some important
advantages are listed below.
Improved ease and flexibility of application development. Developing and maintaining
applications at geographically distributed sites of an organization is facilitated owing to
transparency of data distribution and control.
Increased reliability and availability. This is achieved by the isolation of faults to their site of
origin without affecting the other databases connected to the network. When the data and
DDBMS software are distributed over several sites, one site may fail while other sites continue
to operate. Only the data and software that exist at the failed site cannot be accessed. This
improves both reliability and availability. Further improvement is achieved by judiciously
replicating data and software at more than one site. In a centralized system, failure at a single
site makes the whole system unavailable to all users. In a distributed database, some of the data
may be unreachable, but users may still be able to access other parts of the database. If the data
in the failed site had been replicated at another site prior to the failure, then the user will not be
affected at all.
Improved performance. A distributed DBMS fragments the database by keeping the data closer
to where it is needed most. Data localization reduces the contention for CPU and I/O services
and simultaneously reduces access delays involved in wide area networks. When a large
database is distributed over multiple sites, smaller databases exist at each site. As a result, local
queries and transactions accessing data at a single site have better performance because of the
smaller local databases.
In addition, each site has a smaller number of transactions executing than if all transactions were
submitted to a single centralized database. Moreover, interquery and intraquery parallelism
can be achieved by executing multiple queries at different sites, or by breaking up a query into
a number of subqueries that execute in parallel. This contributes to improved performance.
Easier expansion. In a distributed environment, expansion of the system in terms of adding
more data, increasing database sizes, or adding more processors is much easier.
Distributed database recovery. The ability to recover from individual site crashes and from
new types of failures, such as the failure of communication links.
Security. Distributed transactions must be executed with the proper management of the
security of the data and the authorization/access privileges of users.
Distributed directory (catalog) management. A directory contains information (metadata)
about data in the database. The directory may be global for the entire DDB, or local for each
site. The placement and distribution of the directory are design and policy issues.
These functions themselves increase the complexity of a DDBMS over a centralized DBMS. Before
we can realize the full potential advantages of distribution, we must find satisfactory solutions to
these design issues and problems. Including all this additional functionality is hard to accomplish,
and finding optimal solutions is a step beyond that.
system is obtained through a site that is part of the DDBMS which means that no local autonomy
exists.
Along the autonomy axis we encounter two types of DDBMSs, called federated database systems
(Point C) and multidatabase systems (Point D). In such systems, each server is an independent and
autonomous centralized DBMS that has its own local users, local transactions, and DBA, and hence
has a very high degree of local autonomy.
Shared memory (tightly coupled) architecture. Multiple processors share secondary (disk)
storage and also share primary memory.
Shared disk (loosely coupled) architecture. Multiple processors share secondary (disk)
storage, but each has its own primary memory.
These architectures enable processors to communicate without the overhead of exchanging
messages over a network.
Database management systems developed using the above types of architectures are termed parallel
database management systems rather than DDBMSs, since they utilize parallel processor
technology.
Figure 6.6 Parallel versus distributed architectures (Ramez Elmasri, 2011)
All the problems related to query processing, transaction processing, and directory and metadata
management and recovery apply to FDBSs with additional considerations.
pages. The latter are employed when the interaction involves database access. When a Web
interface is used, this layer typically communicates with the application layer via the HTTP
protocol.
2. Application layer (business logic). This layer programs the application logic. For example,
queries can be formulated based on user input from the client, or query results can be formatted
and sent to the client for presentation. Additional application functionality can be handled at
this layer, such as security checks, identity verification, and other functions. The application
layer can interact with one or more databases or data sources as needed by connecting to the
database using ODBC, JDBC, SQL/CLI, or other database access techniques.
3. Database server. This layer handles query and update requests from the application layer,
processes the requests, and sends the results. Usually SQL is used to access the database if it is
relational or object-relational, and stored database procedures may also be invoked. Query
results (and queries) may be formatted into XML when transmitted between the application
server and the database server.
Exactly how to divide the DBMS functionality between the client, application server, and database
server may vary. The common approach is to include the functionality of a centralized DBMS at
the database server level. A number of relational DBMS products have taken this approach, where
an SQL server is provided. The application server must then formulate the appropriate SQL queries
and connect to the database server when needed. The client provides the processing for user
interface interactions.
Since SQL is a relational standard, various SQL servers, possibly provided by different vendors,
can accept SQL commands through standards such as ODBC, JDBC, and SQL/CLI.
In this architecture, the application server may also refer to a data dictionary that include s
information on the distribution of data among the various SQL servers, as well as modules for
decomposing a global query into a number of local queries that can be executed at the various sites.
Interaction between an application server and a database server might proceed as follows during the
processing of an SQL query:
1. The application server formulates a user query based on input from the client layer and
decomposes it into a number of independent site queries. Each site query is sent to the
appropriate database server site.
2. Each database server processes the local query and sends the results to the application server
site. Increasingly, XML is being touted as the standard for data exchange, so the database server
may format the query result into XML before sending it to the application server.
3. The application server combines the results of the subqueries to produce the result of the
originally required query, formats it into HTML or some other form accepted by the client, and
sends it to the client site for display.
The application server is responsible for generating a distributed execution plan for a multisite query
or transaction and for supervising distributed execution by sending commands to servers. These
commands include local queries and transactions to be executed, as well as commands to transmit
data to other clients or servers.
Another function controlled by the application server (or coordinator) is ensuring the consistency
of replicated copies of a data item by employing distributed (or global) concurrency control
techniques. The application server must also ensure the atomicity of global transactions by
performing global recovery when certain sites fail.
If the DDBMS has the capability to hide the details of data distribution from the application server,
then it enables the application server to execute global queries and transactions as though the
database were centralized, without having to specify the sites at which the data referenced in the
query or transaction resides.
This property is called distribution transparency. Some DDBMSs do not provide distribution
transparency, instead requiring that applications be aware of the details of data distribution.
6.11. Data Fragmentation
In a DDB, decisions must be made regarding which site should be used to store which portions of
the database. For now, we will assume that there is no replication; that is, each relation or portion
of a relation is stored at one site only. We discuss replication and its effects later in this section. We
also use the terminology of relational databases, but similar concepts apply to other data models.
We assume that we are starting with a relational database schema and must decide on how to
distribute the relations over the various sites. To illustrate our discussion, we use the relational
database schema in Table 6.1, as follows.
Horizontal Fragmentation. A horizontal fragment of a relation is a subset of the tuples in that
relation. The tuples that belong to the horizontal fragment are specified by a condition on one or
more attributes of the relation. Often, only a single attribute is involved. For example, we may
define three horizontal fragments on the EMPLOYEE relation in Figure 3.6 with the following
conditions: (Dno = 5), (Dno = 4), and (Dno = 1); each fragment contains the EMPLOYEE tuples
working for a particular department. Similarly, we may define three horizontal fragments for the
PROJECT relation, with the conditions (Dnum = 5), (Dnum = 4), and (Dnum = 1); each fragment
contains the PROJECT tuples controlled by a particular department. Horizontal fragmentation
divides a relation horizontally by grouping rows to create subsets of tuples, where each subset has
a certain logical meaning. These fragments can then be assigned to different sites in the distributed
system. Derived horizontal fragmentation applies the partitioning of a primary relation
(DEPARTMENT in our example) to other secondary relations (EMPLOYEE and PROJECT in our
example), which are related to the primary via a foreign key.
This way, related data between the primary and the secondary relations gets fragmented in the same
way.
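As a minimal sketch, the fragments just described can be written as SQL view definitions over the COMPANY relations; the fragment names (DEPT_D5, EMP_D5, PROJ_D5) are hypothetical.

-- Primary horizontal fragment of DEPARTMENT for department 5:
CREATE VIEW DEPT_D5 AS
  SELECT * FROM DEPARTMENT WHERE Dnumber = 5;

-- Horizontal fragment of EMPLOYEE for department 5:
CREATE VIEW EMP_D5 AS
  SELECT * FROM EMPLOYEE WHERE Dno = 5;

-- Derived horizontal fragment of PROJECT, following the partitioning
-- of the primary relation DEPARTMENT via the foreign key Dnum:
CREATE VIEW PROJ_D5 AS
  SELECT P.* FROM PROJECT P
  WHERE P.Dnum IN (SELECT D.Dnumber FROM DEPT_D5 D);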
Vertical Fragmentation. Each site may not need all the attributes of a relation, which would
indicate the need for a different type of fragmentation. Vertical fragmentation divides a relation
"vertically" by columns. A vertical fragment of a relation keeps only certain attributes of the
relation. For example, we may want to fragment the EMPLOYEE relation into two vertical
fragments. The first fragment includes personal information (Name, Bdate, Address, and Sex), and
the second includes work-related information (Ssn, Salary, Super_ssn, and Dno). This vertical
fragmentation is not quite proper, because if the two fragments are stored separately, we cannot put
the original employee tuples back together, since there is no common attribute between the two
fragments. It is necessary to include the primary key or some candidate key attribute in every
vertical fragment so that the full relation can be reconstructed from the fragments. Hence, we must
add the Ssn attribute to the personal information fragment.
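In SQL, the corrected pair of vertical fragments (with the key Ssn repeated in both) could be sketched as follows; the view names are hypothetical.

-- Personal-information fragment, carrying the key Ssn:
CREATE VIEW EMP_PERSONAL AS
  SELECT Ssn, Name, Bdate, Address, Sex FROM EMPLOYEE;

-- Work-related fragment, also keyed by Ssn:
CREATE VIEW EMP_WORK AS
  SELECT Ssn, Salary, Super_ssn, Dno FROM EMPLOYEE;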
Notice that each horizontal fragment on a relation R can be specified in the relational algebra by a
σCi(R) operation. A set of horizontal fragments whose conditions C1, C2, ..., Cn include all the
tuples in R—that is, every tuple in R satisfies (C1 OR C2 OR ... OR Cn)—is called a complete
horizontal fragmentation of R. In many cases a complete horizontal fragmentation is also disjoint;
that is, no tuple in R satisfies (Ci AND Cj) for any i ≠ j. Our two earlier examples of horizontal
fragmentation for the EMPLOYEE and PROJECT relations were both complete and disjoint. To
reconstruct the relation R from a complete horizontal fragmentation, we need to apply the UNION
operation to the fragments. A vertical fragment on a relation R can be specified by a πLi(R)
operation in the relational algebra. A set of vertical fragments whose projection lists L1, L2, ..., Ln
include all the attributes in R but share only the primary key attribute of R is called a complete
vertical fragmentation of R. In this case the projection lists satisfy the following two conditions:
■ L1 ∪ L2 ∪ ... ∪ Ln = ATTRS(R)
■ Li ∩ Lj = PK(R) for any i ≠ j, where ATTRS(R) is the set of attributes of R and PK(R) is the primary
key of R.
To reconstruct the relation R from a complete vertical fragmentation, we apply the OUTER UNION
operation to the vertical fragments (assuming no horizontal fragmentation is used). Notice that we
could also apply a FULL OUTER JOIN operation and get the same result for a complete vertical
fragmentation, even when some horizontal fragmentation may also have been applied. The two
vertical fragments of the EMPLOYEE relation with projection lists L1 = {Ssn, Name, Bdate,
Address, Sex} and L2 = {Ssn, Salary, Super_ssn, Dno} constitute a complete vertical fragmentation
of EMPLOYEE.
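Both reconstructions can be sketched in SQL, using hypothetical fragment names in the style of the sketches above (EMP_D5, EMP_D4 and EMP_D1 for the three horizontal fragments; EMP_PERSONAL and EMP_WORK for the vertical ones).

-- Reconstructing EMPLOYEE from a complete horizontal fragmentation:
SELECT * FROM EMP_D5
UNION
SELECT * FROM EMP_D4
UNION
SELECT * FROM EMP_D1;

-- Reconstructing EMPLOYEE from the complete vertical fragmentation;
-- because the fragments share the key Ssn, a join over Ssn recovers
-- the original tuples:
SELECT P.Ssn, P.Name, P.Bdate, P.Address, P.Sex,
       W.Salary, W.Super_ssn, W.Dno
FROM   EMP_PERSONAL P
JOIN   EMP_WORK W ON P.Ssn = W.Ssn;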
Two horizontal fragments that are neither complete nor disjoint are those defined on the
EMPLOYEE relation by the conditions (Salary > 50000) and (Dno = 4); they may not include all
EMPLOYEE tuples, and they may include common tuples. Two vertical fragments that are not
complete are those defined by the attribute lists L1 = {Name, Address} and L2 = {Ssn, Name,
Salary}; these lists violate both conditions of a complete vertical fragmentation.
Data Replication and Allocation
Replication is useful in improving the availability of data. The most extreme case is replication of
the whole database at every site in the distributed system, thus creating a fully replicated distributed
database.
This can improve availability remarkably because the system can continue to operate as long as at
least one site is up. It also improves performance of retrieval for global queries because the results
of such queries can be obtained locally from any one site; hence, a retrieval query can be processed
at the local site where it is submitted, if that site includes a server module.
The disadvantage of full replication is that it can slow down update operations drastically, since a
single logical update must be performed on every copy of the database to keep the copies consistent.
This is especially true if many copies of the database exist. Full replication makes the concurrency
control and recovery techniques more expensive than they would be if there were no replication.
The other extreme from full replication involves having no replication at all; that is, each fragment
is stored at exactly one site. In this case, all fragments must be disjoint, except for the repetition of
primary keys among vertical (or mixed) fragments. This is also called nonredundant allocation.
Between these two extremes, we have a wide spectrum of partial replication of the data; that is,
some fragments of the database may be replicated whereas others may not.
The number of copies of each fragment can range from one up to the total number of sites in the
distributed system. A special case of partial replication occurs heavily in applications where
mobile workers such as sales forces, financial planners, and claims adjusters carry partially
replicated databases with them on laptops and PDAs and synchronize them periodically with the
server database.
A description of the replication of fragments is sometimes called a replication schema.
Each fragment, or each copy of a fragment, must be assigned to a particular site in the distributed
system. This process is called data distribution (or data allocation). The choice of sites and the
degree of replication depend on the performance and availability goals of the system and on the
types and frequencies of transactions submitted at each site. For example, if high availability is
required, transactions can be submitted at any site, and most transactions are retrieval only, then a
fully replicated database is a good choice.
However, if certain transactions that access particular parts of the database are mostly submitted at
a particular site, the corresponding set of fragments can be allocated at that site only. Data that is
accessed at multiple sites can be replicated at those sites. If many updates are performed, it may be
useful to limit replication. Finding an optimal or even a good solution to distributed data allocation
is a complex optimization problem.
Table 6.2: Data replication and allocation (Ramez Elmasri, 2011)
Example of Fragmentation, Allocation, and Replication
Suppose that the company has three computer sites, one for each current department. Sites 2 and 3
are for departments 5 and 4, respectively. At each of these sites, we expect frequent access to the
EMPLOYEE and PROJECT information for the employees who work in that department and the
projects controlled by that department. Further, we assume that these sites mainly access the
Name, Ssn, Salary, and Super_ssn attributes of EMPLOYEE. Site 1 is used by company
headquarters and accesses all employee and project information regularly, in addition to keeping
track of DEPENDENT information for insurance purposes.
According to these requirements, the whole database can be stored at site 1. To determine the
fragments to be replicated at sites 2 and 3, first we can horizontally fragment DEPARTMENT by
its key Dnumber. Then we apply derived fragmentation to the EMPLOYEE, PROJECT, and
DEPT_LOCATIONS relations based on their foreign keys for department number, called
Dno, Dnum, and Dnumber, respectively. We can vertically fragment the resulting
EMPLOYEE fragments to include only the attributes {Name, Ssn, Salary, Super_ssn, Dno}. Table 6.3
shows the mixed fragments EMPD_5 and EMPD_4, which include the EMPLOYEE tuples
satisfying the conditions Dno = 5 and Dno = 4, respectively. The horizontal fragments of PROJECT,
DEPARTMENT, and DEPT_LOCATIONS are similarly fragmented by department number. All
these fragments, stored at sites 2 and 3, are replicated because they are also stored at the headquarters
site 1.
We must now fragment the WORKS_ON relation and decide which fragments of WORKS_ON to
store at sites 2 and 3. We are confronted with the problem that no attribute of WORKS_ON directly
indicates the department to which each tuple belongs. In fact, each tuple in WORKS_ON relates an
employee e to a project P.
We could fragment WORKS_ON based on the department D in which e works or based on the
department D' that controls P. Fragmentation becomes easy if we have a constraint stating that
D = D' for all WORKS_ON tuples; that is, if employees can work only on projects controlled by the
department they work for. However, there is no such constraint in our database in Figure 3.6. For
example, the WORKS_ON tuple <333445555, 10, 10.0> relates an employee who works for
department 5 with a project controlled by department 4. In this case, we could fragment
WORKS_ON based on the department in which the employee works (which is expressed by the
condition C) and then fragment further based on the department that controls the projects that
employee is working on.
In Table 6.3, the union of fragments G1, G2, and G3 gives all WORKS_ON tuples for employees who
work for department 5. Similarly, the union of fragments G4, G5, and G6 gives all WORKS_ON
tuples for employees who work for department 4. On the other hand, the union of fragments G1, G4,
and G7 gives all WORKS_ON tuples for projects controlled by department 5. The condition for
each of the fragments G1 through G9 is shown in Table 6.3 below. The relations that represent M:N
relationships, such as WORKS_ON, often have several possible logical fragmentations. In our
distribution, we choose to include all fragments that can be joined to
Table 6.3: Example of fragmentation, allocation, and replication (Ramez Elmasri, 2011)
either an EMPLOYEE tuple or a PROJECT tuple at sites 2 and 3. Hence, we place the union of
fragments G1, G2, G3, G4, and G7 at site 2 and the union of fragments G4, G5, G6, G2, and G8 at
site 3. Notice that fragments G2 and G4 are replicated at both sites. This allocation strategy permits
the join between the local EMPLOYEE or PROJECT fragments at site 2 or site 3 and the local
WORKS_ON fragment to be performed completely locally. This clearly demonstrates how
complex the problem of database fragmentation and allocation is for large databases. The sources
listed in the References at the end of this module discuss some of the work done in this area.
3. Global Query Optimization. A strategy is selected from a list of candidate execution plans;
time is the preferred unit for measuring cost. The total cost is a weighted combination of costs such as
CPU cost, I/O costs, and communication costs. Since DDBs are connected by a network, often the
communication costs over the network are the most significant. This is especially true when the
sites are connected through a wide area network (WAN).
4. Local Query Optimization. This stage is common to all sites in the DDB. The techniques are
similar to those used in centralized systems.
The first three stages discussed above are performed at a central control site, while the last stage is
performed locally.
To illustrate, consider a query Q: for each employee, retrieve the employee name and the name of
the department for which the employee works. Suppose the EMPLOYEE relation (10,000 records of
100 bytes each, 1,000,000 bytes in total) is stored at site 1, and the DEPARTMENT relation (100
records of 35 bytes each, 3,500 bytes in total) is stored at site 2, as shown in Table 6.4. The result of
this query will include 10,000 records, assuming that every employee is related to a department.
Suppose that each record in the query result is 40 bytes long.
Table 6.4: Data transfer costs of distributed query processing (Ramez Elmasri, 2011)
The query is submitted at a distinct site 3, which is called the result site because the query result is
needed there. Neither the EMPLOYEE nor the DEPARTMENT relations reside at site 3. There are
three simple strategies for executing this distributed query:
1. Transfer both the EMPLOYEE and the DEPARTMENT relations to the result site, and perform
the join at site 3. In this case, a total of 1,000,000 + 3,500 = 1,003,500 bytes must be transferred.
2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send the result to site
3. The size of the query result is 40 * 10,000 = 400,000 bytes, so 400,000 + 1,000,000 =
1,400,000 bytes must be transferred.
3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and send the result to
site 3. In this case, 400,000 + 3,500 = 403,500 bytes must be transferred.
If minimizing the amount of data transfer is our optimization criterion, we should choose strategy
3. Now consider another query Q': for each department, retrieve the department name and the name
of the department manager. This can be stated as follows in the relational algebra:
Q': π Dname, Name (DEPARTMENT ⋈ Mgr_ssn=Ssn EMPLOYEE)
Again, suppose that the query is submitted at site 3. The same three strategies for executing query
Q apply to Q', except that the result of Q' includes only 100 records, assuming that each department
has a manager:
1. Transfer both the EMPLOYEE and the DEPARTMENT relations to the result site, and perform
the join at site 3. In this case, a total of 1,000,000 + 3,500 = 1,003,500 bytes must be transferred.
2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send the result to site
3. The size of the query result is 40 * 100 = 4,000 bytes, so 4,000 + 1,000,000 = 1,004,000 bytes
must be transferred.
3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and send the result to
site 3. In this case, 4,000 + 3,500 = 7,500 bytes must be transferred.
Again, we would choose strategy 3, this time by an overwhelming margin over strategies 1 and 2.
The preceding three strategies are the most obvious ones for the case where the result site (site 3) is
different from all the sites that contain files involved in the query (sites 1 and 2). However, suppose
that the result site is site 2; then we have two simple strategies:
1. Transfer the EMPLOYEE relation to site 2, execute the query, and present the result to the user
at site 2. Here, the same number of bytes (1,000,000) must be transferred for both Q and Q'.
2. Transfer the DEPARTMENT relation to site 1, execute the query at site 1, and send the result
back to site 2. In this case, 400,000 + 3,500 = 403,500 bytes must be transferred for Q and
4,000 + 3,500 = 7,500 bytes for Q'.
Review questions
1. Which one of the following best describes a distributed database system?
A. A collection of multiple databases that are related to one another and exist at different sites
B. A collection of multiple databases found at different sites, where the data have no relationship
to each other
C. The distributed database is managed by DBMS software at the different sites
D. Decision making in a distributed database is more transparent to the users
E. All of the above
F. None of the above
2. Which technologies are most used in distributed database systems?
A. User technology and database technology
B. Programming technology and database technology
C. Database technology and telecommunication technology
D. Programming technology and mobile technology
E. Programming technology and telecommunication technology
F. All of the above
3. Which condition must be satisfied by a distributed database?
A. The database nodes are connected to one another through a computer programming language
B. The information in the databases at the different nodes need not be consistent with each
other
C. All sites must have identical data, hardware, and software
D. All of the above
E. None of the above
4. Which of the following is not an advantage of a distributed database system?
A. It improves the ease and flexibility of application development
B. It increases data reliability and availability
C. It makes all distributed database users use one centralized database
D. None of the above
E. All of the above
5. Distributed databases are classified as homogeneous or heterogeneous based on:
A. The physical location at which the distributed database is stored
B. The logical relationship among the distributed databases
C. The software needed to implement the distributed database
D. All of the above
E. None of the above
6. Among the layers involved in distributed query processing, the data fragmentation schema is
used in the ______ layer.
A. Query decomposition
B. Data localization
C. Global optimization
D. Distributed execution
7. Among the layers involved in distributed query processing, which one takes place at the local sites?
A. The query decomposition layer
B. The data localization layer
C. The global optimization layer
D. The distributed execution layer
8. Which one of the following is the process of assigning each fragment, or each copy of a fragment,
to a particular site?
A. Replication
B. Fragmentation
C. Duplication
D. Allocation
E. Data distribution
F. D and E
9. What is the main disadvantage of replication in a distributed database?
A. It slows down access to data over long distances
B. It slows down updates of data at the different sites
C. It increases users' waiting time when they access data
D. All of the above
Chapter 7: Spatial/Multimedia/Mobile Databases
At the end of this chapter the students will be able to understand:
Spatial data models and spatial queries
Multimedia data sources
Mobile databases and data processing
7.1. Introduction
Spatial data is data associated with geographic locations such as cities and towns. A spatial database
is optimized to store and query data representing objects that are defined in a geometric space.
There are still many challenges to multimedia databases, some of which are:
Modeling – Work in this area can draw on both database and information retrieval techniques;
documents in particular constitute a specialized area and deserve special consideration.
Design – The conceptual, logical and physical design of multimedia databases has not yet been
fully addressed, as performance and tuning issues at each level are far more complex; multimedia
data come in a variety of formats, such as JPEG, GIF, PNG and MPEG, which are not easy to
convert from one form to another.
Storage – Storing multimedia data on standard disks presents problems of representation,
compression, mapping to device hierarchies, archiving, and buffering during input/output
operations. In a DBMS, a "BLOB" (Binary Large Object) facility allows untyped bitmaps to be
stored and retrieved.
Performance – For applications involving video playback or audio-video synchronization,
physical limitations dominate. The use of parallel processing may alleviate some problems, but
such techniques are not yet fully developed. In addition, multimedia databases consume a lot
of processing time as well as bandwidth.
Queries and retrieval – For multimedia data such as images, video and audio, accessing data
through queries opens up many issues, such as efficient query formulation, query execution and
optimization, which still need to be worked out.
Typical application areas of multimedia databases include:
Documents and record management: industries and businesses that keep detailed records and a
variety of documents. Example: insurance claim records.
Education and training: computer-aided learning materials can be designed using multimedia
sources, which are nowadays very popular learning resources. Example: digital libraries.
Marketing, advertising, retailing, entertainment and travel. Example: virtual tours of cities.
Real-time control and monitoring: coupled with active database technology, multimedia
presentation of information can be a very effective means for monitoring and controlling complex
tasks. Example: manufacturing operation control.
Spatial data includes geographic data such as maps, and computer-aided design data such as
integrated-circuit or building designs. Spatial data were initially stored as files in a file system, but
as the complexity and volume of the data, and the number of users, have grown, such ad hoc
approaches to storing and retrieving data in a file system have proved inadequate.
Example
A road map is a visualization of geographic information. A road map is a 2-dimensional object which
contains points, lines, and polygons that can represent cities, roads, and political boundaries such as
states or provinces.
In general, spatial data can be of two types:
Vector data: this data is represented as discrete points, lines and polygons.
Raster data: this data is represented as two- or higher-dimensional bit maps or pixel maps, such as
satellite images.
Table 7.1: Spatial data
Spatial data in the form of points, lines, polygons, etc. is used by many different databases, as
shown above.
7.1.4. TEMPORAL DATABASES
A temporal database is a database with built-in support for handling data involving time.
Typically, databases model only one state, the current state of the real world, and do not store
information about past states. When the state of the real world changes, the database gets updated
and information about the old state gets lost. However, it is often also important to store and
retrieve information about past states.
Examples:
A patient database must store information about the medical history of each patient.
Judicial records.
Various kinds of sensor information.
So we define a temporal database as a database that stores the states of the real world across time.
Three notions of time are involved:
Valid Time
Transaction Time
Bi-temporal Data
We can store both valid time and transaction time in a database. A relation involving both
transaction time and valid time is known as a bi-temporal relation.
STORY
Story of Mr. X:
Born on 3rd April 1971 at Pulwama.
Father of Mr. X registers the date of birth on 4th April 1971 at Pulwama.
Mr. X completes his graduation in August 1990.
For job purposes Mr. X moves to Srinagar in August 1991, but forgets to register his
new address officially.
It was on December 25, 1991 that he registers the new address officially.
Unfortunately, Mr. X is hit by a speeding car on April 1, 2001.
The coroner records the date of death on the very same day.
A valid time is the time for which a fact is true in the real world. This time period may lie in the
past or span the current time. Since the official recording the birth does not know whether or when
Mr. X will later move, the end of each valid-time period is initially left open (recorded as "until
changed"). When Mr. X dies, the valid-time period of his current address record is closed with the
date of death.
Transaction time records the time period during which a database entry is accepted as correct.
Valid time alone does not capture events that were never made public; for auditing purposes,
transaction time records when each entry was made, so a late-reported fact can still be traced.
Suppose Mr. X moved to Dubai from 1 June 1997 to 1 June 2000, but to avoid increased taxation
he did not report it to the authorities. In February 2001 it was discovered that Mr. X had in fact
lived in Dubai for those years.
Transaction time allows capturing this changing information in the database: it records when each
entry was entered and when it was superseded.
SQL provides several data types for representing time:
Date: four digits for the year (1–9999), two digits for the month (1–12), and two digits for the
day (1–31).
Time: two digits for the hour, two digits for the minute, and two digits for the second, plus
optional fractional digits.
Timestamp: the fields of date and time, with six fractional digits for the seconds field.
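As a minimal sketch of a bi-temporal relation, consider the following hypothetical table for the address history in the story above; the table and column names are illustrative only, with valid_from/valid_to capturing valid time and tx_start/tx_end capturing transaction time.

-- Hypothetical bi-temporal address table:
CREATE TABLE ADDRESS_HISTORY (
  person_id  INTEGER     NOT NULL,
  city       VARCHAR(40) NOT NULL,
  valid_from DATE        NOT NULL,  -- when the fact became true in the real world
  valid_to   DATE        NOT NULL,  -- '9999-12-31' stands for 'until changed'
  tx_start   TIMESTAMP   NOT NULL,  -- when this row was entered
  tx_end     TIMESTAMP   NOT NULL   -- when this row was superseded
);

-- Current state: rows valid today that have not been superseded.
SELECT city
FROM ADDRESS_HISTORY
WHERE person_id = 1
  AND CURRENT_DATE BETWEEN valid_from AND valid_to
  AND tx_end = TIMESTAMP '9999-12-31 00:00:00';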
7.1.8. MULTIMEDIA DATABASES
Multimedia data are images, audio and video; they are the most popular and increasingly
preferred forms of data these days.
One approach is to store multimedia data in file systems. Because the size of this type of media is
large, in fact very large, multimedia objects were generally stored outside the database in file
systems. However, general database features, such as transactional updates and querying facilities,
are seriously limited when the multimedia objects are kept outside the database.
Second approach
Each media file has some descriptive attributes (file statistics), such as when a particular
object was created, who created it, and to which category it belongs. In this approach, a database is
used to store the descriptive attributes and to keep track of the files in which the multimedia
objects are stored.
Drawbacks
The object may be missing or corrupted while the database still points to its location as if the
multimedia data existed.
The better approach is to store the multimedia data in the database itself.
Issues that need to be addressed:
Databases must support large objects. Given that multimedia data objects can be gigabytes in
size, many databases do not support objects larger than a few gigabytes. Larger objects must then
be split into smaller pieces and stored in the database.
Alternatively, the multimedia object may be stored in a file system, with the database containing a
pointer to the object, the pointer being essentially a file name.
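The two storage options can be sketched as follows; the table and column names are hypothetical, and the exact large-object type varies by DBMS (BLOB here, but BYTEA or VARBINARY in some systems).

-- Option 1: store the media object itself in the database as a BLOB,
-- together with its descriptive attributes:
CREATE TABLE VIDEO_OBJECT (
  video_id   INTEGER PRIMARY KEY,
  title      VARCHAR(100),
  created_at TIMESTAMP,       -- when the object was created
  created_by VARCHAR(40),     -- who created it
  category   VARCHAR(40),     -- which category it belongs to
  content    BLOB             -- the multimedia data itself
);

-- Option 2: store only descriptive attributes plus a file-name pointer,
-- leaving the object itself in the file system:
CREATE TABLE VIDEO_FILE (
  video_id   INTEGER PRIMARY KEY,
  title      VARCHAR(100),
  category   VARCHAR(40),
  file_path  VARCHAR(255)     -- pointer (file name) into the file system
);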
SQL/MED (Management of External Data) is a standard that allows external data, such as files
stored outside the database, to be treated as if it were part of the database.
MPEG-1 stores a minute of 30-frames-per-second video and audio in approximately 12.5 MB,
compared to roughly 75 MB for the same length of data stored as JPEG frames. This huge
difference is due to the fact that successive frames are often nearly the same: JPEG compresses
each frame independently, whereas MPEG exploits the commonalities among sequences of frames.
MPEG-1 is lossy, so the quality of the video is not high; it is comparable to VHS videotape.
MPEG-4 provides techniques for further compression of video, with variable bandwidth to support
delivery of video data over networks.
The most important types of continuous-media data are video and audio data. Continuous-media
systems are characterized by their real-time information-delivery requirements:
Data must be delivered smoothly, i.e., with no gaps in the audio or video.
Data must be delivered at a rate that does not cause system buffers to overflow.
Synchronization among distinct data streams must be maintained, e.g., lip synchronization
of audio and video.
Usually data are fetched in periodic cycles, each cycle consisting of some time period, say
n seconds.
Data will be fetched from the database and stored in memory buffers. The stored data is then sent
to the consumer's display unit, and finally the display unit displays the content.
Time period
If the time period (periodic cycle) is small, more disk-arm movements are required, i.e., resources
are wasted.
If the time period (periodic cycle) is large, a large memory buffer is required and there is a large
initial delay.
Admission control: when a new request arrives, admission control comes into play; the system
checks whether the request can be satisfied with the available resources. If so, the request is
admitted; otherwise it is rejected.
Pictorial data:
Two pictures or images that are slightly different as represented in the database may be
considered the same by a user. For example, when a new trademark is to be registered, the system
may first need to identify all similar trademarks that were registered previously.
Audio data:
Speech-recognition interfaces have been developed that allow the user to give a command or
identify a data item by speaking. The input from the user must then be tested for similarity to the
commands stored in the system.
Handwritten data:
Signatures are used to authenticate a particular customer, for example for bank account
validation.
7.2. MOBILITY (Mobile Computing)
Location-dependent queries are a distinct class of queries in which the location of the user
(computer) is a parameter of the query. Examples are travel services such as MakeMyTrip and
Goibibo, which provide data on hotels and roadside services to users travelling to a particular
destination. Processing queries about services ahead on the current route is subject to certain
constraints, such as the direction of motion and the speed of the particular user.
In mobile computing, energy (battery lifetime) is a scarce resource. This has a great influence on
the system design architecture and an impact on the protocols used to communicate with mobile
devices.
Mobile hosts communicate with the wired network via computers known as mobile support
stations.
Cell: a particular geographical area covered and supported by a particular mobile support station.
Handoffs between cells are important for supporting mobility among users.
It is also possible for mobile hosts to communicate directly, without the intervention of a mobile
support station. This is possible only for short-range communication such as Bluetooth, with a
range of about 10 m and a speed of 721 Kbps.
Bluetooth, wireless LANs, and 2.5G and 3G cellular networks make it possible for a wide variety
of devices to communicate at low cost, although they complicate the accounting, monitoring and
management of data.
To improve the energy efficiency of mobile devices, it is preferable to use low-power flash
memories and to power down a component if it is not in use for a while.
WAP (Wireless Application Protocol) is a standard protocol for wireless Internet access that takes
into account the constraints of mobile and wireless Web browsing.
Spatial data support in databases is important for efficiently storing, indexing and querying data
on the basis of spatial location. Two major kinds of spatial data are:
Computer-aided design (CAD) data: includes spatial information about how objects, such as
buildings, cars or aircraft, are constructed. Other examples of computer-aided design databases
are integrated-circuit and electronic-device layouts.
Geographic data: such as road maps, land-usage maps, topographic elevation maps, political maps
showing boundaries, land-ownership maps, and so on. Geographic information systems are
special-purpose databases tailored for storing geographic data.
7.3.1. GEOGRAPHIC DATA AND APPLICATIONS
Geographic data are spatial in nature and take the form of maps and satellite images. Maps, in
particular, provide not only location information, such as boundaries, rivers and roads, but also
more detailed information such as locations, elevations, soil types, land usage and annual
rainfall.
Raster Data
Raster data consist of bit maps or pixel maps, in two or more dimensions. A good example is a
satellite image of an area, which includes not only the actual image but also information such as
the latitude and longitude of its corners.
Raster data are often represented as tiles, each covering a fixed-size area. A larger area can be
displayed by displaying all the tiles that overlap it.
To allow the display of data at different zoom levels, a separate set of tiles is created for each
zoom level. Once the zoom level is set by the user, the tiles at that zoom level which overlap the
area being displayed are retrieved and displayed.
Raster data can also be three-dimensional.
Vector Data
Vector data are constructed from basic geometric objects such as points, line segments,
polylines, triangles and other polygons in 2-D, and cylinders, spheres and cuboids in 3-D.
Map data are often represented in vector format, with roads represented as polylines. Geographic
features such as large lakes, states and countries are represented as complex polygons, while
rivers may be represented as complex curves.
Topographical information, i.e., information about the elevation of each point on a surface, can be
represented in raster form.
Nearness Queries
A nearness query requests objects that lie near a specified location. A query to find all ATMs that
lie within a given distance is an example of a nearness query.
The nearest-neighbor query requests the object that is nearest to a specified point.
Region Queries
Region queries deal with spatial regions. Such a query can ask for objects that lie partially or fully
inside a specified region. Example: a query to find all retail shops within the geographic
boundaries of a given town.
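As a hedged illustration, a nearness query and a region query might be written as follows in a spatial SQL dialect; the functions (ST_DWithin, ST_MakePoint, ST_Within) follow PostGIS-style conventions, and the table and column names are assumptions rather than part of this chapter.

-- Nearness query: all ATMs within 500 metres of a given point
-- (assumes an ATM table with a geography column named location):
SELECT name
FROM ATM
WHERE ST_DWithin(location,
                 ST_MakePoint(37.85, 7.56)::geography,  -- (longitude, latitude)
                 500);

-- Region query: all retail shops inside a town's boundary polygon:
SELECT s.name
FROM SHOP s, TOWN t
WHERE t.name = 'Hosaina'
  AND ST_Within(s.location::geometry, t.boundary);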
Review Questions
3. When structured as linked elements through which the user can navigate, interactive multimedia
becomes ______.
A. Hypermedia
B. Hypertext
C. Intermedia
D. Digital media
4. Moving Picture Experts Group (MPEG-2) was designed for high-quality DVD with a data rate of:
A. 3 to 6 Mbps
B. 4 to 6 Mbps
C. 5 to 6 Mbps
D. 6 to 6 Mbps
References
1. C. J. Date, A. Kannan and S. Swamynathan, An Introduction to Database Systems, Eighth Edition,
Pearson Education, 2009.
2. Abraham Silberschatz, Henry F. Korth and S. Sudarshan, Database System Concepts, Fifth Edition,
McGraw-Hill Education (Asia), 2006.
3. Shio Kumar Singh, Database Systems: Concepts, Design and Applications, Second Edition,
Pearson Education, 2011.
4. Peter Rob and Carlos Coronel, Database Systems: Design, Implementation and Management,
Seventh Edition, Thomson Learning-Course Technology, 2007.
5. Patrick O'Neil and Elizabeth O'Neil, Database: Principles, Programming and Performance, First
Edition, Harcourt Asia Pte. Ltd., 2001.
6. Atul Kahate, Introduction to Database Management Systems, Pearson Education.
7. Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems, Sixth Edition,
Pearson, 2011.