My Notes
Basic Definitions
➢ Database: A collection of related data.
➢ Data: Known facts that can be recorded and have an implicit meaning.
➢ Mini-world: Some part of the real world about which data is stored in a database. For example, student
grades and transcripts at a university.
➢ Database Management System (DBMS): A software package/system to facilitate the creation and
maintenance of a computerized database.
➢ Database System: The DBMS software together with the data itself. Sometimes, the applications are also
included.
4) Integrity problems
• Integrity constraints (e.g. account balance > 0) become “buried” in program code rather
than being stated explicitly
• Hard to add new constraints or change existing ones
5) Atomicity of updates
• Failures may leave database in an inconsistent state with partial updates carried out
• Example: Transfer of funds from one account to another should either complete or not
happen at all
6) Concurrent access by multiple users
● Insulation between programs and data: Called program-data independence. Allows changing data
storage structures and operations without having to change the DBMS access programs.
● Support of multiple views of the data: Each user may see a different view of the database, which
describes only the data of interest to that user.
● Sharing of data and multiuser transaction processing: allowing a set of concurrent users to retrieve and
update the database. Concurrency control within the DBMS guarantees that each transaction is
correctly executed or completely aborted. OLTP (Online Transaction Processing) is a major part of
database applications.
● Data Abstraction: A data model is used to hide storage details and present the users with a conceptual
view of the database.
Database Users
Users may be divided into those who actually use and control the content (called “Actors on the Scene”)
and those who enable the database to be developed and the DBMS software to be designed and
implemented (called “Workers Behind the Scene”).
Actors on the scene
– Database administrators: responsible for authorizing access to the database, for co-ordinating and
monitoring its use, acquiring software, and hardware resources, controlling its use and monitoring
efficiency of operations.
– Database Designers: responsible for defining the content, the structure, the constraints, and the functions
or transactions against the database. They must communicate with the end-users and understand
their needs.
– End-users: they use the data for queries, reports and some of them actually update the database
content.
Categories of End-users
Levels of Abstraction
• Logical level: describes data stored in database, and the relationships among the data.
type customer = record
customer_id : string;
customer_name : string;
customer_street : string;
customer_city : string;
end;
• View level: application programs hide details of data types. Views can also hide information (such
as an employee’s salary) for security purposes.
An architecture for a database system
– If the database system is not able to handle the complexity of data because of modeling limitations
– If the database users need special operations not supported by the DBMS.
● Conceptual (high-level, semantic) data models: Provide concepts that are close to the way many users
perceive data. (Also called entity-based or object-based data models.)
● Physical (low-level, internal) data models: Provide concepts that describe details of how data is stored in
the computer.
● Implementation (representational) data models: Provide concepts that fall between the above two,
balancing user views with some computer storage details.
Hierarchical Model
ADVANTAGES:
• Hierarchical Model is simple to construct and operate on
• Corresponds to a number of natural hierarchically organized domains - e.g., assemblies in
manufacturing, personnel organization in companies
• Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN
PARENT etc.
DISADVANTAGES:
• Navigational and procedural nature of processing
• Database is visualized as a linear arrangement of records
• Little scope for "query optimization"
Network Model
• ADVANTAGES:
• Network Model is able to model complex relationships and represents semantics of add/delete on
the relationships.
• Can handle most situations for modeling using record types and relationship types.
• Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT
within set, GET etc. Programmers can do optimal navigation through the database.
• DISADVANTAGES:
• Navigational and procedural nature of processing
• Database contains a complex array of pointers that thread through a set of records.
• Little scope for automated "query optimization"
DBMS Languages
Data Definition Language (DDL)
• DDL - Specification notation for defining the database schema.
Example:
create table account (
account-number char(10),
balance integer)
• DDL - Used by the DBA and database designers to specify the conceptual schema of a database. In many
DBMSs, the DDL is also used to define internal and external schemas (views). In some DBMSs, a separate
storage definition language (SDL) and view definition language (VDL) are used to define internal and
external schemas.
Basic Client/Server Architectures: The client/server architecture was developed to deal with computing
environments in which a large number of PCs, workstations, file servers, printers, database servers, Web servers,
and other equipment are connected via a network.
● The client machines provide the user with the appropriate interfaces to utilize
these servers, as well as with local processing power to run local applications.
● This concept can be carried over to software, with specialized software, such as a DBMS or a CAD
(computer-aided design) package, being stored on specific server machines and
being made accessible to multiple clients.
● The user interface programs and application programs can run on the client side. When DBMS access is
required, the program establishes a connection to the DBMS (which is on the server side); once the
connection is created, the client program can communicate with the DBMS.
● A standard called Open Database Connectivity (ODBC) provides an application programming interface
(API), which allows client-side programs to call the DBMS, as long as both client and server machines
have the necessary software installed.
● The architectures described here are called two-tier architectures because the software components are
distributed over two systems: client and server.
● The advantages of this architecture are its simplicity and seamless compatibility with existing systems.
● In a three-tier architecture, an intermediate server accepts requests from the client, processes the request
and sends database commands to the database server, and then acts as a conduit for passing (partially)
processed data from the database server to the clients, where it may be processed further and filtered to be
presented to users in GUI format.
ER Model
• The ER data model was developed to facilitate database design by allowing specification of an
enterprise schema that represents the overall logical structure of a database.
• The ER model is very useful in mapping the meanings and interactions of real-world enterprises
onto a conceptual schema. Because of this usefulness, many database-design tools draw on
concepts from the ER model.
• The ER data model employs three basic concepts:
I. entity sets,
II. relationship sets,
III. attributes.
Entity
• The ER model also has an associated diagrammatic representation, the ER diagram, which can express
the overall logical structure of a database graphically.
• An entity is an object that exists and is distinguishable from other objects.
• Example: specific person, company, event, plant
• An entity set is a set of entities of the same type that share the same properties.
• Example: set of all persons, companies, trees, holidays
• An entity is represented by a set of attributes; i.e., descriptive properties possessed by all members of
an entity set.
• Example:
• instructor = (ID, name, street, city, salary )
course= (course_id, title, credits)
• A subset of the attributes forms a primary key of the entity set; i.e., it uniquely identifies each member
of the set.
Mapping Cardinalities
• Express the number of entities to which another entity can be associated via a relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set the mapping cardinality must be one of the following types:
o One to one
o One to many
o Many to one
o Many to many
MODULE-2
Relational Algebra & Calculus
Preliminaries
A query language is a language in which a user requests information from the database. Query
languages are considered higher-level than general-purpose programming languages. Query languages are of
two types,
Procedural Language
Non-Procedural Language
1. In procedural language, the user has to describe the specific procedure to retrieve the information from
the database.
Example: The Relational Algebra is a procedural language.
2. In non-procedural language, the user retrieves the information from the database without describing the
specific procedure to retrieve it.
Example: The Tuple Relational Calculus and the Domain Relational Calculus are non-procedural
languages.
Relational Algebra
The relational algebra is a procedural query language. It consists of a set of operations that take one or two
relations (tables) as input and produce a new relation, on the request of the user to retrieve the specific
information, as the output.
The relational algebra contains the following operations,
The Selection is a relational algebra operation that uses a condition to select rows from a relation. A new
relation (output) is created from an existing relation by selecting only the rows requested by the user that satisfy
a specified condition. The lowercase Greek letter sigma (σ) is used to denote the selection operation.
General Syntax: σ selection_condition ( relation_name )
Example: Find the customer details who are living in Hyderabad city from customer relation.
σ city = ‘Hyderabad’ ( customer )
The selection operation uses the column names in specifying the selection condition. Selection conditions
are the same as the conditions used in the ‘if’ statement of any programming language; a selection condition uses the
relational operators < > <= >= != . It is possible to combine several conditions into a larger condition using the
logical connectives ‘and’, represented by ‘∧’, and ‘or’, represented by ‘∨’.
Example:
Find the customer details who are living in Hyderabad city and whose customer_id is greater than 1000 in
Customer relation.
σ city = ‘Hyderabad’ ∧ customer_id > 1000 ( customer )
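For comparison, the same selection can be written in SQL (assuming a customer table with city and customer_id columns):
SELECT * FROM customer
WHERE city = 'Hyderabad' AND customer_id > 1000;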
The projection is a relational algebra operation that creates a new relation by deleting columns from an
existing relation, i.e., a new relation (output) is created from an existing relation by keeping only those
columns requested by the user. Projection is denoted by the Greek letter pi (π).
The Selection operation eliminates unwanted rows whereas the projection operation eliminates unwanted
columns. The projection operation extracts specified columns from a table.
Example: Find the customer names (not all customer details) who are living in Hyderabad city from the customer
relation.
π customer_name ( σ city = ‘Hyderabad’ ( customer ) )
In the above example, the selection operation is performed first. Next, the projection of the resulting
relation on the customer_name column is carried out. Thus, instead of all details of customers living
in Hyderabad city, we display only the customer names of customers living in Hyderabad city.
The above example is also known as a relational algebra expression because we are combining two or
more relational algebra operations (i.e., selection and projection) at the same time.
Example: Find the customer names (not all customer details) from customer relation.
π customer_name ( customer )
The above stated query lists all customer names in the customer relation; this is not called a
relational algebra expression because it performs only one relational algebra operation.
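The projection examples above can be sketched in SQL as follows; note that, unlike the relational algebra, SQL keeps duplicate rows unless DISTINCT is added:
SELECT DISTINCT customer_name FROM customer WHERE city = 'Hyderabad';
SELECT DISTINCT customer_name FROM customer;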
The union, denoted by ‘∪’, is a relational algebra operation that creates a union or combination of two
relations. The result of this operation, denoted by d ∪ b, is a relation that includes all tuples that are either in d or
in b or in both d and b, where duplicate tuples are eliminated.
Example: Find the customer_id of all customers in the bank who have either an account or a loan or both.
π customer_id ( depositor ) ∪ π customer_id ( borrower )
To solve the above query, first find the customers with an account in the bank, that is, π customer_id
( depositor ). Then find all customers with a loan in the bank, π customer_id ( borrower ). Now, to
answer the above query, we need the union of these two sets, that is, all customer ids that appear in either or
both of the two relations: π customer_id ( depositor ) ∪ π customer_id ( borrower )
If some customers A, B and C are both depositors as well as borrowers, then in the resulting relation their
customer ids will occur only once, because duplicate values are eliminated. Therefore, for a union operation
d ∪ b to be valid, we require two conditions to be satisfied:
i) The relations depositor and borrower must have the same number of attributes / columns.
ii) The domains of the i-th attribute of the depositor relation and the i-th attribute of the borrower relation must be the
same, for all i.
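A hedged SQL equivalent of the union query above (assuming depositor and borrower tables that each have a customer_id column; SQL's UNION removes duplicates, matching the algebra):
SELECT customer_id FROM depositor
UNION
SELECT customer_id FROM borrower;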
• The Intersection ‘∩’ Operation:
The intersection operation, denoted by ‘∩’, is a relational algebra operation that finds tuples that are in
both relations. The result of this operation, denoted by d ∩ b, is a relation that includes all tuples common to both
the depositor and borrower relations.
Example: Find the customer_id of all customers in the bank who have both an account and a loan.
π customer_id ( depositor ) ∩ π customer_id ( borrower )
The resulting relation of this query lists all common customer ids of customers who have both an account
and a loan. Therefore, for an intersection operation d ∩ b to be valid, the same two conditions must be
satisfied as in the case of the union operation stated above.
iii) The Set-Difference ‘−’ Operation:
The set-difference operation, denoted by ‘−’, is a relational algebra operation that finds tuples that are
in one relation but are not in another.
Example:
π customer_id ( depositor ) − π customer_id ( borrower )
The resulting relation for this query lists the customer ids of all customers who have an account but not a
loan. Therefore, for a difference operation d − b to be valid, the same two conditions must be satisfied as in the case
of the union operation stated above.
The Cartesian-Product ‘X’ Operation:
The Cartesian-product operation, denoted by ‘X’, combines information from any two relations.
Example: Find the customer_id of all customers in the bank who have loan > 10,000.
π customer_id ( σ borrower.loan_no = loan.loan_no ∧ loan.loan_amount > 10000 ( borrower X loan ) )
That is, get customer_id from the borrower relation and loan_amount from the loan relation. First, find the Cartesian
product borrower X loan, so that the new relation contains every combination of customer_id and loan_amount.
Then select the rows with the required amount by σ loan_amount > 10000.
So, if a customer has taken a loan, then borrower.loan_no = loan.loan_no selects the tuples whose
loan_no matches in both relations.
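A sketch of the same query in SQL (the column names loan_no and loan_amount are assumptions):
SELECT customer_id
FROM borrower, loan
WHERE borrower.loan_no = loan.loan_no AND loan.loan_amount > 10000;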
4) The Renaming ‘ρ’ Operation:
The rename operation, denoted by rho (ρ), is a relational algebra operation which is used to give a
new name to a relational algebra expression. Thus, we can apply the rename operation to a relation ‘borrower’ to
get the same relation under a new name. Given a relation ‘customer’, the expression below returns the same relation
‘customer’ under the new name ‘x’.
ρ x ( customer )
After performing this operation, there are two relations: one with the name customer and a second with
the name ‘x’. The rename operation is useful when we want to compare the values within the same column of
a relation.
If we want to find the largest account balance in the bank, then we have to compare the values of the
same column (balance) with each other within the same account relation, which is not directly possible.
So, we rename the relation with a new name ‘d’. Now we have two copies of the account relation, one with
the name account and a second with the name ‘d’, and we can compare the balance attribute values with each
other across the two relations.
The join operation, denoted by ‘⋈’, is a relational algebra operation which is used to combine
(join) two relations like a Cartesian product, but it removes duplicate attributes and makes the subsequent operations
(selection, projection, ...) very simple. In simple words, we can say that join connects relations on columns
containing comparable information.
There are three types of joins,
i) Natural Join
ii) Outer Join
iii) Theta Join (or) Conditional Join
i) Natural Join:
The natural join is a binary operation that allows us to combine two different relations into one relation
and makes the same column in two different relations into only one-column in the resulting relation. Suppose
we have relations with the following schemas, which contain data on full-time employees:
employee ( emp_name, street, city ) and
an employee_works relation that records each employee's salary details.
Example: Find the employee names and city who have salary details.
π emp_name, salary, city ( employee ⋈ employee_works )
The join operation selects all employees with salary details, from which we can easily project the
employee names, cities and salaries. The natural join operation can result in some loss of information, because
tuples that have no match in the other relation are dropped.
c) Full Outer-join:
The full outer-join pads and adds tuples from the left relation that did not match any tuples from the right
relation, as well as tuples from the right relation that did not match any tuple from the left relation, adding
them to the result of the natural join.
iii) Theta Join (or Conditional Join):
The theta join is a binary operation that combines two relations into one relation with a selection condition (θ).
For example, the theta join expression
employee ⋈ salary < 19000 employee_works
combines the employee and employee_works relations, keeping only those combined tuples that satisfy the
condition.
The division operation, denoted by ‘÷’, is a relational algebra operation that creates a new relation by
selecting the rows in one relation that match every row in another relation.
Let relation A be (x1, x2, ..., xn, y1, y2, ..., ym) and relation B be (y1, y2, ..., ym),
where the attributes y1, y2, ..., ym are common to both relations A and B and must have the same domains.
Then A ÷ B = a new relation with attributes x1, x2, ..., xn. Relations A and B represent the dividend and
divisor respectively. A tuple ‘t’ is in A ÷ B if and only if two conditions are satisfied:
1. t is in π x1, ..., xn ( A )
2. for every tuple tb in B, there is a tuple ta in A satisfying both:
ta[y1, ..., ym] = tb[y1, ..., ym]
ta[x1, ..., xn] = t
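Division has no direct SQL operator; a common sketch uses a double NOT EXISTS over hypothetical tables A(x, y) and B(y) -- keep each x that is paired in A with every y of B:
SELECT DISTINCT a1.x
FROM A a1
WHERE NOT EXISTS
(SELECT * FROM B b
WHERE NOT EXISTS
(SELECT * FROM A a2
WHERE a2.x = a1.x AND a2.y = b.y));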
Relational Calculus
Relational calculus is an alternative to relational algebra. In contrast to the algebra, which is procedural,
the relational calculus is non-procedural or declarative.
It allows users to describe the set of answers without specifying how they should be
computed. Relational calculus has had a big influence on the design of commercial query languages such as SQL and
QBE.
Variables in TRC take tuples (rows) as values; TRC had a strong influence on SQL.
Variables in DRC take fields (attributes) as values; DRC had a strong influence on QBE.
i) Tuple Relational Calculus (TRC):
The tuple relational calculus is a non-procedural query language because it gives the
desired information without specifying how it should be computed.
A query in Tuple Relational Calculus (TRC) is expressed as { T | p(T) }
Where, T - tuple variable,
p(T) - ‘p’ is a condition or formula that is true for T.
In addition, we use:
T[A] - to denote the value of tuple T on attribute A, and
T ∈ r - to denote that tuple T is in relation r.
Examples:
1) Find all loan details in loan relation.
{ t | t ∈ loan }
This query gives all loan details such as loan_no, loan_date, loan_amt for all loan table in a bank.
2) Find all loan details for loan amount over 100000 in loan relation.
{ t | t ∈ loan ∧ t[loan_amt] > 100000 }
This query gives all loan details such as loan_no, loan_date, loan_amt for all loan over 100000 in a loan
table in a bank.
ii) Domain Relational Calculus (DRC):
A query in Domain Relational Calculus (DRC) is expressed as
{ < x1, x2, …., xn > | p( < x1, x2, …., xn > ) }
Where, each xi is either a domain variable or a constant and p(< x1, x2, …., xn >) denotes a DRC
formula.
A DRC formula is defined in a manner that is very similar to the definition of a TRC formula. The main
difference is that the variables are domain variables.
Examples:
1) Find all loan details in loan relation.
{ < N, D, A > | < N, D, A > ∈ loan }
This query gives all loan details such as loan_no, loan_date, loan_amt for the loan table in a bank. Each
column is represented by an initial: N - loan_no, D - loan_date, A - loan_amt. The condition < N, D, A > ∈ loan
ensures that the domain variables N, D, A are restricted to the corresponding column domains.
An expression Q is safe if, for any given instance I:
1) The set of answers for Q contains only values that are in dom(Q, I).
2) For each sub-expression of the form ∃R(p(R)) in Q, if a tuple r makes the formula true, then r contains
only constants in dom(Q, I).
3) For each sub-expression of the form ∀R(p(R)) in Q, if a tuple r contains a constant that is not in
dom(Q, I), then r must make the formula true.
The expressive power of relational algebra is often used as a metric of how powerful a relational database
query language is. If a query language can express all the queries that we can express in relational algebra, it is
said to be relationally complete. A practical query language is expected to be relationally complete. In addition,
commercial query languages typically support features that allow us to express some queries that cannot be
expressed in relational algebra.
When the domain relational calculus is restricted to safe expressions, it is equivalent in expressive power
to the tuple relational calculus restricted to safe expressions. All three of the following are equivalent:
• The relational algebra
• The tuple relational calculus restricted to safe expressions
• The domain relational calculus restricted to safe expressions
MODULE-3
THE DATABASE LANGUAGE SQL
Introduction to SQL:
What is SQL?
• SQL is Structured Query Language, which is a computer language for storing, manipulating
and retrieving data stored in relational database.
• SQL is the standard language for Relational Database Systems. All relational database
management systems like MySQL, MS Access, Oracle, Sybase, Informix, PostgreSQL and
SQL Server use SQL as the standard database language.
Why SQL?
• Allows users to access data in relational database management systems.
• Allows users to describe the data.
• Allows users to define the data in database and manipulate that data.
• Allows embedding within other languages using SQL modules, libraries & pre-compilers.
• Allows users to create and drop databases and tables.
• Allows users to create view, stored procedure, functions in a database.
• Allows users to set permissions on tables, procedures and views
History:
1970 -- Dr. E. F. "Ted" Codd of IBM is known as the father of relational databases. He described a
relational model for databases.
1974 -- Structured Query Language appeared.
1978 -- IBM worked to develop Codd's ideas and released a product named System R.
1986 -- SQL was standardized by ANSI. The first commercial relational database had been released
earlier by Relational Software, which later became Oracle.
SQL Process:
• When you execute an SQL command for any RDBMS, the system determines the best
way to carry out your request, and the SQL engine figures out how to interpret the task.
• There are various components included in the process. These components are Query
Dispatcher, Optimization Engines, Classic Query Engine and SQL Query Engine, etc.
The classic query engine handles all non-SQL queries, but the SQL query engine won't handle
logical files.
SQL Commands:
The standard SQL commands to interact with relational databases are CREATE, SELECT, INSERT,
UPDATE, DELETE and DROP. These commands can be classified into groups based on their
nature. They are:
DDL Commands:
CREATE - Creates a new table, a view of a table, or other object in the database.
DML Commands:
INSERT - Creates a record.
UPDATE - Modifies records.
DELETE - Deletes records.
DCL Commands:
GRANT - Gives a privilege to a user.
REVOKE - Takes back privileges granted from a user.
DRL / DQL Commands:
SELECT - Retrieves certain records from one or more tables.
TCL Commands:
COMMIT - Saves work done.
SAVEPOINT - Identifies a point in a transaction to which we can later roll back.
ROLLBACK - Restores the database to its original state since the last COMMIT.
What is Query?
• A query is a question.
• A query is formulated for a relation/table to retrieve some useful information from the table.
• Different query languages are used to frame queries.
Form of Basic SQL Query
• This SELECT command is used to retrieve the data from the database.
• For retrieving the data, every query must have a SELECT clause, which specifies what columns are to be
selected.
• And a FROM clause, which specifies the table names. The WHERE clause specifies the selection
condition.
• SELECT: The SELECT list is list of column names of tables named in the FROM list.
Column names can be prefixed by a range variable.
• FROM: The FROM list in the FROM clause is a list of table names. A Table name can be
followed by a range variable. A range variable is particularly useful when the same table name
appears more than once in the from-list.
• WHERE: The qualification in the WHERE clause is a Boolean combination (i.e., an expression
using the logical connectives AND, OR, and NOT) of conditions of the form expression op
expression, where op is one of the comparison operators {<, <=, =, <>, >=,>}.
• An expression is a column name, a constant, or an (arithmetic or string) expression.
• DISTINCT: The DISTINCT keyword is used to display unique tuples, i.e., to eliminate
duplicate tuples.
• This DISTINCT keyword is optional.
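Putting the clauses together, a minimal sketch over a hypothetical customers table:
SELECT DISTINCT name, city
FROM customers
WHERE salary > 2000 AND city = 'Delhi';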
DDL Commands:
• The following are the DDL commands. They are:
Create
Alter
Truncate
Drop
CREATE:
• The SQL CREATE TABLE statement is used to create a new table.
• Creating a basic table involves naming the table and defining its columns and each column's data
type.
Syntax:
• Basic syntax of CREATE TABLE statement is as follows:
CREATE TABLE table_name (column1 datatype (size), column2 datatype (size), column3
datatype (size) ... columnN datatype (size), PRIMARY KEY (one or more columns));
Example:
SQL> create table customers (id number (10) not null, name varchar2 (20) not null, age number
(5) not null, address char (25), salary decimal (8, 2), primary key (id));
ALTER:
• SQL ALTER TABLE command is used to add, delete or modify columns in an existing table
Syntax:
i) The basic syntax of ALTER TABLE to ADD a new column in an existing table is as follows:
ALTER TABLE table_name ADD column_name datatype;
EX: ALTER TABLE CUSTOMERS ADD phno number (12);
ii) The basic syntax of ALTER TABLE to DROP COLUMN in an existing table is as follows:
ALTER TABLE table_name DROP COLUMN column_name;
EX: ALTER TABLE CUSTOMERS DROP column phno;
iii) The basic syntax of ALTER TABLE to change the DATA TYPE of a column in a table is as
follows:
ALTER TABLE table_name MODIFY COLUMN column_name datatype;
Ex: ALTER TABLE customer MODIFY COLUMN phno number(12);
iv) The basic syntax of ALTER TABLE to add a NOT NULL constraint to a column in a table is as
follows:
ALTER TABLE table_name MODIFY column_name datatype NOT NULL;
TRUNCATE:
• SQL TRUNCATE TABLE command is used to delete complete data from an existing table.
Syntax:
The basic syntax of TRUNCATE TABLE is as follows:
TRUNCATE TABLE table_name;
EX:
TRUNCATE TABLE student;
SELECT * FROM student;
DROP:
SQL DROP TABLE statement is used to remove a table definition and all data, indexes, triggers,
constraints, and permission specifications for that table.
Syntax:
Basic syntax of DROP TABLE statement is as follows:
DROP TABLE table_name;
EX: DROP TABLE student;
DML Commands:
The following are the DML commands. They are:
• Insert
• Update
• Delete
INSERT:
SQL INSERT INTO Statement is used to add new rows of data to a table in the database.
Syntax1:
INSERT INTO TABLE_NAME [ (column1, column2, column3,...columnN) ] VALUES
(value1, value2, value3,...valueN);
• Here, column1, column2...columnN are the names of the columns in the table into which you
want to insert data.
EX:
insert into customers (id,name,age,address,salary) values (1, 'ramesh', 32, 'ahmedabad', 2000);
insert into customers (id,name,age,address,salary) values (2, 'khilan', 25, 'delhi', 1500.00 );
2 rows inserted.
Syntax2:
If you are adding values for all the columns of the table, in their defined order, the column names may be omitted:
INSERT INTO TABLE_NAME VALUES (value1, value2, value3,...valueN);
Ex:
insert into customers values (1, 'ramesh', 32, 'ahmedabad', 2000.00 );
UPDATE:
• SQL UPDATE Query is used to modify the existing records in a table.
• We can use WHERE clause with UPDATE query to update selected rows, otherwise all the rows
would be affected.
Syntax:
• The basic syntax of UPDATE query with WHERE clause is as follows:
UPDATE table_name SET column1 = value1, column2 = value2...., columnN = valueN
WHERE [condition];
EX:
• UPDATE CUSTOMERS SET ADDRESS = 'Pune' WHERE ID = 6;
• UPDATE CUSTOMERS SET ADDRESS = 'Pune', SALARY = 1000.00;
DELETE:
SQL DELETE Query is used to delete the existing records from a table.
You can use WHERE clause with DELETE query to delete selected rows, otherwise all the
records would be deleted.
Syntax:
The basic syntax of DELETE query with WHERE clause is as follows:
DELETE FROM table_name WHERE [condition];
Ex: DELETE FROM CUSTOMERS WHERE ID = 6;
If you want to DELETE all the records from CUSTOMERS table, you do not need to use WHERE
clause and DELETE query would be as follows:
DELETE FROM CUSTOMERS;
DRL/DQL Command:
The SELECT command comes under DRL/DQL.
SELECT:
SELECT Statement is used to fetch the data from a database table which returns data in the form
of result table. These result tables are called result-sets.
Syntax1:
The following syntax is used to retrieve specific attributes from a table:
SELECT column1, column2, ... columnN FROM table_name;
UNION:
SQL UNION clause/operator is used to combine the results of two or more SELECT statements
without returning any duplicate rows.
To use UNION, each SELECT must have the same number of columns selected, the same number
of column expressions, the same data type, and have them in the same order, but they do not have
to be the same length.
Syntax:
The basic syntax of UNION is as follows:
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
UNION
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
EX:
SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS LEFT JOIN ORDERS ON
CUSTOMERS.ID = ORDERS.CUSTOMER_ID
UNION
SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS RIGHT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
UNION ALL:
The UNION ALL operator is used to combine the results of two SELECT statements including
duplicate rows. The same rules that apply to UNION apply to the UNION ALL operator.
Syntax:
• The basic syntax of UNION ALL is as follows:
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
UNION ALL
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
INTERSECT:
• The SQL INTERSECT clause/operator is used to combine two SELECT statements, but returns
rows only from the first SELECT statement that are identical to a row in the second SELECT
statement.
• This means INTERSECT returns only common rows returned by the two SELECT statements.
• Just as with the UNION operator, the same rules apply when using the INTERSECT operator.
MySQL does not support INTERSECT operator
Syntax:
The basic syntax of INTERSECT is as follows:
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
INTERSECT
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
EXCEPT:
• The SQL EXCEPT clause/operator is used to combine two SELECT statements and returns rows
from the first SELECT statement that are not returned by the second SELECT statement.
• This means EXCEPT returns only rows, which are not available in second SELECT statement.
• Just as with the UNION operator, the same rules apply when using the EXCEPT operator.
Syntax:
The basic syntax of EXCEPT is as follows:
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
EXCEPT
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
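A hedged example, reusing the CUSTOMERS and ORDERS tables shown later in this module (note: Oracle spells this operator MINUS), which lists customers who have placed no orders:
SELECT ID FROM CUSTOMERS
EXCEPT
SELECT CUSTOMER_ID FROM ORDERS;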
SQL Operators
What is an Operator in SQL?
An operator is a reserved word or a character used primarily in an SQL statement's WHERE
clause to perform operation(s), such as comparisons and arithmetic operations.
Operators are used to specify conditions in an SQL statement and to serve as conjunctions for
multiple conditions in a statement.
1. Arithmetic operators
2. Comparison operators
3. Logical operators
SQL Arithmetic Operators:
Assume variable a holds 10 and variable b holds 20.
Operator - Description - Example
+ (Addition) - Adds values on either side of the operator - a + b will give 30
- (Subtraction) - Subtracts the right-hand operand from the left-hand operand - a - b will give -10
• The following examples illustrate the use of comparison operators on tables:
Ex:
• SELECT * FROM CUSTOMERS WHERE SALARY > 5000;
• SELECT * FROM CUSTOMERS WHERE SALARY = 2000;
• SELECT * FROM CUSTOMERS WHERE SALARY != 2000;
• SELECT * FROM CUSTOMERS WHERE SALARY >= 6500;
Operator - Description
AND - The AND operator allows the existence of multiple conditions in an SQL statement's WHERE clause.
OR - The OR operator is used to combine multiple conditions in an SQL statement's WHERE clause.
NOT - The NOT operator reverses the meaning of the logical operator with which it is used, e.g. NOT EXISTS, NOT BETWEEN, NOT IN, etc. It is a negation operator.
• SQL AND and OR operators are used to combine multiple conditions to narrow data in an
SQL statement. These two operators are called conjunctive operators.
• These operators provide a means to make multiple comparisons with different operators in
the same SQL statement.
AND Operator:
• The AND operator allows the existence of multiple conditions in an SQL statement's
WHERE clause.
Syntax:
• The basic syntax of AND operator with WHERE clause is as follows:
SELECT column1, column2, columnN FROM table_name WHERE [condition1]
AND [condition2]...AND [conditionN];
Ex:
SELECT * FROM CUSTOMERS WHERE AGE >= 25 AND SALARY >= 6500;
OR Operator:
• The OR operator is used to combine multiple conditions in an SQL statement's WHERE clause.
Syntax:
• The basic syntax of OR operator with WHERE clause is as follows:
SELECT column1, column2, columnN FROM table_name WHERE [condition1] OR
[condition2]...OR [conditionN];
Ex:
NOT Operator:
• The NOT operator reverses the meaning of the logical operator with which it is used. Eg: NOT EXISTS, NOT BETWEEN, NOT IN, etc.
Syntax:
SELECT column1, column2, ... columnN FROM table_name WHERE NOT [condition];
EX:
SELECT * FROM CUSTOMERS WHERE AGE IS NOT NULL;
Other operators include:
UNIQUE - Searches every row of a specified table for uniqueness (no duplicates).
BETWEEN - Used to search for values that are within a set of values, given the minimum and maximum values.
EXISTS - Used to search for the presence of a row in a specified table that meets certain criteria.
IN - Used to compare a value to a list of literal values that have been specified.
LIKE - Used to compare a value to similar values using wildcard operators.
IS NULL - Used to compare a value with a NULL value.
LIKE Operator:
SQL LIKE clause is used to compare a value to similar values using wildcard operators.
There are two wildcards used in conjunction with the LIKE operator:
1. The percent sign (%)
2. The underscore (_)
The percent sign represents zero, one, or multiple characters. The underscore represents a single
character.
The symbols can be used in combination.
Syntax:
The basic syntax of % and _ is as follows:
Statement - Description
WHERE SALARY LIKE 's%' - Finds any values that start with s
WHERE SALARY LIKE '%sad%' - Finds any values that contain sad in any position
WHERE SALARY LIKE '2_%_%' - Finds any values that start with 2 and are at least 3 characters in length
WHERE SALARY LIKE '%r' - Finds any values that end with r
WHERE SALARY LIKE '_2%3' - Finds any values that have a 2 in the second position and end with a 3
WHERE SALARY LIKE '2___3' - Finds any values in a five-digit number that start with 2 and end with 3
BETWEEN Operator:
The BETWEEN operator is used to search for values that are within a range, given the minimum and maximum values.
Syntax:
SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value1 AND value2;
EX: SELECT * FROM Products WHERE Price BETWEEN 10 AND 20;
NOT BETWEEN Operator:
SELECT * FROM Products WHERE Price NOT BETWEEN 10 AND 20;
IN Operator:
The IN operator allows you to specify multiple values in a WHERE clause.
Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1,value2,...);
Ex: SELECT * FROM Customers WHERE salary IN (5000, 10000);
SQL Joins:
• SQL Joins clause is used to combine records from two or more tables in a database.
• A JOIN is a means for combining fields from two tables by using values common to each.
• Consider the following two tables, CUSTOMERS and ORDERS tables are as follows:
CUSTOMERS TABLE
| ID | NAME     | AGE | ADDRESS   | SALARY  |
| 1  | Ramesh   | 32  | Ahmedabad | 2000.00 |
| 2  | Khilan   | 25  | Delhi     | 1500.00 |
| 3  | kaushik  | 23  | Kota      | 2000.00 |
| 4  | Chaitali | 25  | Mumbai    | 6500.00 |
| 5  | Hardik   | 27  | Bhopal    | 8500.00 |
| 6  | Komal    | 22  | MP        | 4500.00 |
ORDERS TABLE
| OID | DATE                | CUSTOMER_ID | AMOUNT |
| 102 | 2009-10-08 00:00:00 | 3           | 3000   |
| 100 | 2009-10-08 00:00:00 | 3           | 1500   |
| 101 | 2009-11-20 00:00:00 | 2           | 1560   |
| 103 | 2008-05-20 00:00:00 | 4           | 2060   |
Ex:
SELECT ID, NAME, AGE, AMOUNT FROM CUSTOMERS,
ORDERS WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result:
| ID | NAME    | AGE | AMOUNT |
| 3  | kaushik | 23  | 3000   |
| 3  | kaushik | 23  | 1500   |
| 2  | Khilan  | 25  | 1560   |
NOTE:
• Join is performed in the WHERE clause. Several operators can be used to join tables, such as
=, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT; they can all be used to join tables.
However, the most common operator is the equal symbol.
SQL Join Types:
• There are different types of joins available in SQL: They are:
• INNER JOIN
• OUTER JOIN
• SELF JOIN
• CARTESIAN JOIN
INNER JOIN:
The most frequently used and important of the joins is the INNER JOIN. It is also
referred to as an EQUIJOIN. The INNER JOIN creates a new result table by combining column
values of two tables based upon the join-predicate.
LEFT JOIN:
• The SQL LEFT JOIN returns all rows from the left table, even if there are no matches in the right table.
• This means that a left join returns all the values from the left table, plus matched values from
the right table, or NULL in case of no matching join predicate.
Syntax:
• The basic syntax of LEFT JOIN is as follows:
SELECT table1.column1, table2.column2... FROM table1 LEFT JOIN
table2 ON table1.common_field = table2.common_field;
RIGHT JOIN:
• The SQL RIGHT JOIN returns all rows from the right table, even if there are no matches in
the left table.
• This means that a right join returns all the values from the right table, plus matched values
from the left table or NULL in case of no matching join predicate.
Syntax:
• The basic syntax of RIGHT JOIN is as follows:
SELECT table1.column1, table2.column2... FROM table1 RIGHT JOIN
table2 ON table1.common_field = table2.common_field;
Ex: SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS
RIGHT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
FULL JOIN:
• The SQL FULL JOIN combines the results of both left and right outer joins.
• The joined table will contain all records from both tables, and fill in NULLs for missing
matches on either side.
Syntax:
• The basic syntax of FULL JOIN is as follows:
SELECT table1.column1, table2.column2... FROM table1 FULL JOIN
table2 ON table1.common_field = table2.common_field;
SELF JOIN:
• The SQL SELF JOIN is used to join a table to itself as if the table were two tables,
temporarily renaming at least one table in the SQL statement.
Syntax:
• The basic syntax of SELF JOIN is as follows:
SELECT a.column_name, b.column_name... FROM table1 a, table1 b
WHERE a.common_field = b.common_field;
Ex:
SELECT a.ID, b.NAME, a.SALARY FROM CUSTOMERS a,
CUSTOMERS b WHERE a.SALARY < b.SALARY;
CARTESIAN JOIN:
• The CARTESIAN JOIN or CROSS JOIN returns the cartesian product of the sets of records
from the two or more joined tables.
• Thus, it equates to an inner join where the join-condition always evaluates to True or where
the join-condition is absent from the statement.
Syntax:
• The basic syntax of CROSS JOIN is as follows:
SELECT table1.column1, table2.column2... FROM table1, table2 [,
table3]; Ex: SELECT ID, NAME, AMOUNT, DATE FROM
CUSTOMERS, ORDERS;
VIEWS IN SQL:
• A view is nothing more than a SQL statement that is stored in the database with an
associated name.
• A view is actually a composition of a table in the form of a predefined SQL query.
• A view can contain all rows of a table or select rows from a table.
• A view can be created from one or many tables which depends on the written SQL
query to create a view.
• Views, which are kind of virtual tables, allow users to do the following:
• Structure data in a way that users or classes of users find natural or intuitive.
• Restrict access to the data such that a user can see and (sometimes) modify exactly
what they need and no more.
• Summarize data from various tables which can be used to generate reports.
Advantages of views:
• Views provide data security
• Different users can view same data from different perspective in different ways at
the same time.
• Views can also be used to include extra/additional information
Creating Views:
• Database views are created using the CREATE VIEW statement. Views can be created
from a single table, multiple tables, or another view.
• To create a view, a user must have the appropriate system privilege according to the
specific implementation.
• The basic CREATE VIEW syntax is as follows:
CREATE VIEW view_name AS SELECT column1, column2..... FROM
table_name WHERE [condition];
Ex: CREATE VIEW CUSTOMERS_VIEW AS SELECT name, age FROM
CUSTOMERS;
You can query CUSTOMERS_VIEW in a similar way as you query an actual table. Following is an example:
SELECT * FROM CUSTOMERS_VIEW;
Updating a View:
A view can be updated under certain conditions:
• The SELECT clause may not contain the keyword DISTINCT.
• The SELECT clause may not contain summary functions.
• The SELECT clause may not contain set functions.
• The SELECT clause may not contain set operators.
• The SELECT clause may not contain an ORDER BY clause.
• The FROM clause may not contain multiple tables.
• The WHERE clause may not contain sub queries.
• The query may not contain GROUP BY or HAVING.
NOTE:
So if a view satisfies all the above-mentioned rules then you can update the view. Following is an
example to update the age of Ramesh (assuming the view exposes the AGE and NAME columns):
UPDATE CUSTOMERS_VIEW SET AGE = 35 WHERE NAME = 'Ramesh';
Dropping Views:
• Obviously, where you have a view, you need a way to drop the view if it is no longer needed.
• The syntax is very simple as given below:
DROP VIEW view_name;
Ex: DROP VIEW CUSTOMERS_VIEW;
GROUP BY Clause:
The SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange identical
data into groups.
Syntax:
The GROUP BY clause must follow the conditions in the WHERE clause and must
precede the ORDER BY clause if one is used.
SELECT column1, column2
FROM table_name
WHERE [ conditions ]
GROUP BY column1, column2
ORDER BY column1, column2;
ORDER BY Clause:
The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on one
or more columns.
Syntax:
The basic syntax of ORDER BY clause is as follows:
SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1, column2, .. columnN] [ASC | DESC];
Ex:
1. select * from customers order by name, salary;
2. select * from customers order by name desc;
HAVING Clause:
The HAVING clause enables you to specify conditions that filter which group results appear in
the final results.
The WHERE clause places conditions on the selected columns, whereas the HAVING clause
places conditions on groups created by the GROUP BY clause.
Syntax:
SELECT column1, column2
FROM table1, table2
WHERE [ conditions ]
GROUP BY column1, column2
HAVING [ conditions ]
ORDER BY column1, column2;
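A concrete sketch over the CUSTOMERS table used elsewhere in these notes, listing only those addresses shared by two or more customers:
SELECT ADDRESS, COUNT (ID)
FROM CUSTOMERS
GROUP BY ADDRESS
HAVING COUNT (ID) >= 2;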
AVG () Function
The AVG () function returns the average value of a numeric column.
AVG () Syntax
SELECT AVG (column_name) FROM
table_name; Ex:
SELECT AVG (Price) FROM Products;
COUNT () Function
COUNT aggregate function is used to count the number of rows in a database table.
COUNT () Syntax:
SELECT COUNT (column_name) FROM
table_name; Ex:
SELECT COUNT (Price) FROM Products;
MAX () Function
The SQL MAX aggregate function allows us to select the highest (maximum) value for a
certain column.
MAX () Syntax:
SELECT MAX (column_name) FROM
table_name; EX:
SELECT MAX (SALARY) FROM EMP;
MIN () Function
The SQL MIN aggregate function allows us to select the lowest (minimum) value for a certain column.
MIN () Syntax:
SELECT MIN (column_name) FROM table_name;
EX: SELECT MIN (SALARY) FROM EMP;
SUM () Function
The SQL SUM aggregate function allows us to select the total (sum) for a numeric column.
SUM () Syntax:
SELECT SUM (column_name) FROM table_name;
EX: SELECT SUM (SALARY) FROM EMP;
PRIMARY Key:
A primary key is a field in a table which uniquely identifies each row/record in a database table.
A primary key column must contain:
1) Unique values
2) NOT NULL values.
A table can have only one primary key, which may consist of single or multiple fields.
If a table has a primary key defined on any field(s), then you cannot have two records having
the same value of that field(s).
FOREIGN Key:
A foreign key is a key used to link two tables together.
This is sometimes called a referencing key.
Foreign Key is a column or a combination of columns whose values match a Primary Key
in a different table.
The relationship between two tables matches the Primary Key in one of the tables with a Foreign Key in the second table.
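A minimal sketch of declaring a foreign key, reusing the CUSTOMERS/ORDERS example (the exact column types are assumptions):
CREATE TABLE ORDERS (
OID number (10) PRIMARY KEY,
ODATE date,
CUSTOMER_ID number (10),
AMOUNT number (10),
FOREIGN KEY (CUSTOMER_ID) REFERENCES CUSTOMERS (ID));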
Subqueries (Nested Queries):
A nested query is a query that has another query embedded within it; the embedded
query is called a subquery.
When writing a query, we sometimes need to express a condition that refers to a table
that must itself be computed.
A subquery typically appears within the WHERE clause of a query. Subqueries
can sometimes appear in the FROM clause or the HAVING clause.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE
statements along with the operators like =, <, >, >=, <=, IN, BETWEEN etc.
There are a few rules that subqueries must follow. The basic syntax of a subquery in a WHERE clause
is as follows:
SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
(SELECT column_name [, column_name ]
FROM table1 [, table2 ]
[WHERE]);
Ex: select *from customers where id in (select id from customers where salary >4500);
Subqueries with the INSERT Statement:
The INSERT statement uses the data returned from the subquery to insert into another table.
The selected data in the subquery can be modified with any of the character,
date or number functions.
Syntax
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ *|column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ];
Ex:
insert into customers_bkp select * from customers where id in (select id from customers) ;
Subqueries with the UPDATE Statement:
The subquery can be used in conjunction with the UPDATE statement.
Syntax:
UPDATE table_name SET column_name = new_value
[ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME FROM TABLE_NAME)
[ WHERE ] ];
EX:
UPDATE CUSTOMERS SET SALARY = SALARY * 0.25 WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27);
Transactions:
A transaction is a unit of program execution that accesses and possibly updates various data
items.
(or)
A transaction is an execution of a user program and is seen by the DBMS as a series or list of
actions, i.e., the actions executed by a transaction include the reading and writing of
database objects.
Transaction Operations:
Access to the database is accomplished in a transaction by the following two operations,
read(X) : Performs the reading operation of data item X from the database.
write(X) : Performs the writing operation of data item X to the database.
Example:
Let T1 be a transaction that transfers $50 from account A to account B. This transaction
can be illustrated as follows,
T1 : read(A);
A := A – 50;
write(A);
read(B);
B := B + 50;
write(B);
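The same transfer can be sketched as SQL statements (an account table with account_no and balance columns is assumed; if either UPDATE fails, a ROLLBACK restores the state before the transaction):
UPDATE account SET balance = balance - 50 WHERE account_no = 'A';
UPDATE account SET balance = balance + 50 WHERE account_no = 'B';
COMMIT;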
Transaction Concept:
The concept of transaction is the foundation for concurrent execution of transaction in a DBMS
and recovery from system failure in a DBMS.
A user writes data access/updates programs in terms of the high-level query language supported
by the DBMS.
To understand how the DBMS handles such requests, with respect to concurrency control and
recovery, it is convenient to regard an execution of a user program or transaction, as a series of
reads and writes of database objects.
To read a database object, it is first brought in to main memory from disk and then its value is
copied into a program. This is done by read operation.
To write a database object, in-memory, copy of the object is first modified and then written to
disk. This is done by the write operation.
Properties of Transaction (ACID):
There are four important properties of transactions that a DBMS must ensure to maintain
data under concurrent access of the database and recovery from system failure:
1) Atomicity: Either all actions of a transaction are carried out, or none are.
2) Consistency: Each transaction, run by itself, preserves the consistency of the database.
3) Isolation: Transactions are protected from the effects of other concurrently scheduled transactions.
4) Durability: Once a transaction completes successfully, its effects persist even if the system crashes.
MODULE –5
Representing Data Elements & Index Structures
Data on External Storage:
Disks: Can retrieve random page at fixed cost
But reading several consecutive pages is much cheaper than reading them in random
order
Indexes are data structures that allow us to find the record ids of records with given
values in index search key fields
Architecture: Buffer manager stages pages from external storage to main memory buffer
pool. File and index layers make calls to the buffer manager.
Cost of retrieving data records through index varies greatly based on whether index is
clustered or not!
Clustered vs. Unclustered Index
Suppose that Alternative (2) is used for data entries, and that the data records are stored in
a Heap file.
To build clustered index, first sort the Heap file (with some free space on each page for
future inserts).
Overflow pages may be needed for inserts. (Thus, order of data recs is `close to’, but not
identical to, the sort order.)
Search key is not the same as key (minimal set of fields that uniquely identify a record in a
relation).
An index contains a collection of data entries, and supports efficient retrieval of all data
entries k* with a given key value k.
Given data entry k*, we can find record with key k in at most one disk I/O.
(Details soon …)
B+ Tree Indexes
Example B+ Tree
Hash-Based Indexing:
Hash-Based Indexes
Bucket = primary page plus zero or more overflow pages. Buckets contain data entries.
Hashing function h: h(r) = bucket in which (data entry for) record r belongs. h looks
at the search key fields of r.
– Typically, index contains auxiliary information that directs searches to the desired data
entries
Alternative 1:
– If this is used, index structure is a file organization for data records (instead of a Heap file or
sorted file).
– At most one index on a given collection of data records can use Alternative 1. (Otherwise,
data records are duplicated, leading to redundant storage and potential inconsistency.)
– If data records are very large, # of pages containing data entries is high.
Implies size of auxiliary information in the index is also large, typically.
– Measuring number of page I/O’s ignores gains of pre-fetching a sequence of pages; thus,
even I/O cost is only approximated.
Choice of Indexes
1. What indexes should we create?
– Which relations should have indexes? What field(s) should be the search key?
Should we build several indexes?
2. For each index, what kind of an index should it be?
– Clustered? Hash/tree?
3. One approach: Consider the most important queries in turn. Consider the best plan using
the current indexes, and see if a better plan is possible with an additional index.
If so, create it.
– Obviously, this implies that we must understand how a DBMS evaluates queries and creates
query evaluation plans!
Before creating an index, must also consider the impact on updates in the workload!
– Trade-off: Indexes can make queries go faster, updates slower. Require disk space, too.
Clustering is especially useful for range queries; can also help on equality queries if there are
many duplicates.
Multi-attribute search keys should be considered when a WHERE clause contains several
conditions.
– Order of attributes is important for range queries.
– Such indexes can sometimes enable index-only strategies for important queries.
For index-only strategies, clustering is not important!
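For instance, the workload analysis above might lead to index definitions such as the following (the table and index names are hypothetical); the attribute order (age, salary) serves WHERE clauses that test equality on age and a range on salary:
CREATE INDEX idx_emp_age_sal ON emp (age, salary);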
B+ Tree:
B+ Tree: Most Widely Used Index. Insert/delete at log_F N cost (F = fanout, N = # leaf pages);
the tree is kept height-balanced. Minimum 50% occupancy (except for root): each node contains
d <= m <= 2d entries. The parameter d is called the order of the tree. Supports equality and
range-searches efficiently.
Example B+ Tree
1. Search begins at root, and key comparisons direct it to a leaf (as in ISAM).
2. Search for 5*, 15*, all data entries >= 24* ...
B+ Trees in Practice
Typical order: 100. Typical fill-factor: 67%.
– average fanout = 133
Typical capacities:
– Height 4: 133^4 = 312,900,700 records
– Height 3: 133^3 = 2,352,637 records
Deleting a data entry: if the leaf L becomes under-full, try to re-distribute, borrowing from a sibling
(an adjacent node with the same parent as L); if re-distribution fails, merge L and the sibling.
If a merge occurred, must delete the entry (pointing to L or sibling) from the parent of L. A merge could
propagate to the root, decreasing the height.
Example Tree After (Inserting 8*, Then) Deleting 19* and 20* ...
Deleting 19* is easy.
Deleting 20* is done with re-distribution. Notice how middle key is copied up.... And
Then Deleting 24*
Must merge.
Observe `toss’ of index entry (on right), and `pull down’ of index entry (below).
Hash Function: A hash function h is a mapping function that maps the set of all search keys K to
the addresses where actual records are placed. It is a function from search keys to bucket addresses.
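As a tiny illustration, a mod-based hash that maps a numeric search key into one of 4 buckets can be sketched in SQL (table and column names are hypothetical):
SELECT emp_id, MOD (emp_id, 4) AS bucket FROM emp;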
MODULE-6
Coping with System Failures & Concurrency Control
Causes of failures:
Some failures might cause the database to go down, some others might be trivial. On the other
hand, if a data file has been lost, recovery requires additional steps. Some common causes of
failures include:
1) System crashes:
These can happen due to hardware or software errors resulting in loss of main memory.
2) User error:
This can happen due to a user inadvertently deleting a row or dropping a table.
3) Carelessness:
This can happen due to the destruction of data or facilities by operators/users because of
lack of concentration.
4) Sabotage:
This can happen due to the intentional corruption or destruction of data, hardware or
software facilities.
5) Statement failure:
This can happen due to the inability of the database to execute an SQL statement.
6) Application software errors:
These can happen due to logical errors in the program accessing the database, which cause
one or more transactions to fail.
7) Network failure:
This can happen due to a network failure / communication software failure / aborted
asynchronous connections.
8) Media failure:
This can happen due to a disk controller failure / disk head crash / loss of a disk. It is the most
dangerous type of failure.
9) Natural physical disasters:
These can happen due to natural disasters like fires, floods, earthquakes, power failures, etc.
Undo Logging
Logging is a way to assure that transactions are atomic. They appear to the database either to
have executed in their entirety or not to have executed at all. A log is a sequence of log records,
each telling something about what some transaction has done. The actions of several transactions
can "interleave," so that a step of one transaction may be executed and its effect logged, then the
same happens for a step of another transaction, then for a second step of the first transaction or a
step of a third transaction, and so on. This interleaving of transactions complicates logging; it is not
sufficient simply to log the entire story of a transaction after that transaction completes.
If there is a system crash, the log is consulted to reconstruct what transactions were doing
when the crash occurred. The log also may be used, in conjunction with an archive, if there is a
media failure of a disk that does not store the log. Generally, to repair the effect of the crash, some
transactions will have their work done again, and the new values they wrote into the database are
written again. Other transactions will have their work undone, and the database restored so that it
appears that they never executed.
The first style of logging, which is called undo logging, makes only repairs of the second type.
If it is not absolutely certain that the effects of a transaction have been completed and stored on
disk, then any database changes that the transaction may have made to the database are undone,
and the database state is restored to what existed prior to the transaction.
Log Records
The log is a file opened for appending only. As transactions execute, the log manager has the
job of recording in the log each important event. One block of the log at a time is filled with log
records, each representing one of these events. Log blocks are initially created in main memory
and are allocated by the buffer manager like any other blocks that the DBMS needs. The log blocks
are written to nonvolatile storage on disk as soon as is feasible.
There are several forms of log record that are used with each of the types of logging. These are:
1. <START T> : This record indicates that transaction T has begun.
2. <COMMIT T>: Transaction T has completed successfully and will make no more changes to
database elements. Any changes to the database made by T should appear on disk. If we insist that
the changes already be on disk, this requirement must be enforced by the log manager.
3. <ABORT T> : Transaction T could not complete successfully. If transaction T aborts, no
changes it made can have been copied to disk, and it is the job of the transaction manager to make
sure that such changes never appear on disk, or that their effect on disk is cancelled if they do.
4. <T, X, v> : The update record. Transaction T has changed database element X, and its former
value was v.
The Undo-Logging Rules
There are two rules that transactions must obey in order that an undo log allows us to
recover from a system failure. These rules affect what the buffer manager can do and also require
that certain actions be taken whenever a transaction commits.
U1 : If transaction T modifies database element X, then the log record of the form <T, X,v> must
be written to disk before the new value of X is written to disk.
U2 : If a transaction commits, then its COMMIT log record must be written to disk only after all
database elements changed by the transaction have been written to disk, but as soon thereafter as
possible.
To summarize rules U1 and U2, material associated with one transaction must be written to disk in
the following order:
a) The log records indicating changed database elements.
b) The changed database elements themselves.
c) The COMMIT log record.
In order to force log records to disk, the log manager needs a flush-log command that tells
the buffer manager to copy to disk any log blocks that have not previously been copied to disk or
that have been changed since they were last copied.
Example:
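As a simple illustration of rules U1 and U2, here is a minimal Python sketch of the required write ordering. All names (UndoLog, write_element, commit, flush) are illustrative assumptions, not part of any real DBMS.

# A minimal sketch of the undo-logging order forced by U1 and U2.
class UndoLog:
    def __init__(self):
        self.buffer = []          # log records not yet on disk
        self.on_disk = []         # log records already flushed

    def append(self, record):
        self.buffer.append(record)

    def flush(self):              # the "flush-log" command
        self.on_disk.extend(self.buffer)
        self.buffer.clear()

def write_element(log, disk, t, x, new_value):
    old_value = disk.get(x)
    log.append(("UPDATE", t, x, old_value))  # <T, X, v> holds the OLD value
    log.flush()                              # U1: log record reaches disk first...
    disk[x] = new_value                      # ...then the changed element

def commit(log, t):
    # U2: all changed elements are already on disk at this point.
    log.append(("COMMIT", t))
    log.flush()                              # the COMMIT record goes to disk last

disk = {"A": 8, "B": 8}
log = UndoLog()
log.append(("START", "T")); log.flush()
write_element(log, disk, "T", "A", 16)
write_element(log, disk, "T", "B", 16)
commit(log, "T")
print(log.on_disk)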
Redo Logging
While undo logging provides a natural and simple strategy for maintaining a log and
recovering from a system failure, it is not the only possible approach.
The requirement for immediate backup of database elements to disk can be avoided if we use
a logging mechanism called redo logging.
The principal differences between redo and undo logging are:
1. While undo logging cancels the effect of incomplete transactions and ignores committed ones
during recovery, redo logging ignores incomplete transactions and repeats the changes made by
committed transactions
2. While undo logging requires us to write changed database elements to disk before the COMMIT
log record reaches disk, redo logging requires that the COMMIT record appear on disk before any
changed values reach disk.
3. While the old values of changed database elements are exactly what we need to recover when
the undo rules U1 and U2 are followed, to recover using redo logging, we need the new values.
The redo rule R1 requires that before any database element X modified by a transaction T is
written to disk, both the update record <T,X,v> and the <COMMIT T> record must have appeared on
disk. An important consequence of rule R1 is that unless the log has a <COMMIT T> record,
we know that no changes to the database made by transaction T have been written to disk.
To recover using a redo log after a system crash, we do the following:
1. Identify the committed transactions, i.e., those with a <COMMIT T> record on the log.
2. Scan the log forward from the beginning. For each log record <T,X,v> encountered:
a) If T is not a committed transaction, do nothing.
b) If T is a committed transaction, write value v for database element X.
3. For each incomplete transaction T, write an <ABORT T> record to the log and flush the log.
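A compact sketch of this recovery scan, assuming log records are already parsed into Python tuples (the record layout used here is an assumption for illustration):

# Redo recovery sketch: repeat the work of committed transactions only.
def redo_recover(log, disk):
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in log:
        if rec[0] == "UPDATE":               # <T, X, v> holds the NEW value v
            t, x, v = rec[1], rec[2], rec[3]
            if t in committed:
                disk[x] = v                  # write the new value again
    incomplete = {rec[1] for rec in log if rec[0] == "START"} - committed
    return [("ABORT", t) for t in incomplete]  # abort records to append and flush

log = [("START", "T1"), ("UPDATE", "T1", "A", 16),
       ("START", "T2"), ("UPDATE", "T2", "B", 7), ("COMMIT", "T1")]
disk = {"A": 8, "B": 8}
print(redo_recover(log, disk), disk)   # T1 is redone; T2 is ignored and aborted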
The steps to be taken to perform a nonquiescent checkpoint of a redo log are as follows:
1. Write a log record <START CKPT (T1,..., Tk)>, where T1,...,Tk are all the active
(uncommitted) transactions, and flush the log.
2. Write to disk all database elements that were written to buffers but not yet to disk by
transactions that had already committed when the START CKPT record was written.
3. Write an <END CKPT> record to the log and flush the log.
The following log illustrates such a checkpoint:
<START T1>
<T1, A, 5>
<START T2>
<COMMIT T1>
<T2, B, 10>
<START CKPT (T2)>
<T2, C, 15>
<START T3>
<T3, D, 20>
<END CKPT>
<COMMIT T2>
<COMMIT T3>
A redo log with a nonquiescent checkpoint
To recover with a checkpointed redo log, two cases arise, depending on the last checkpoint record:
i) If the last checkpoint record on the log is <END CKPT>, every value written by transactions that
committed before the matching <START CKPT (T1,..., Tk)> is already on disk, so we need only redo
those committed transactions that are among the Ti's or that started after the START CKPT. Linking
backwards the log records for a given transaction helps us to find the necessary records, as it did
for undo logging.
ii) If the last checkpoint record on the log is a <START CKPT (T1,..., Tk)> record, we must search
back to the previous <END CKPT> record, find its matching <START CKPT (S1,..., Sm)> record, and
redo all those committed transactions that either started after that START CKPT or are among the Si's.
Undo/Redo Logging
We have two different approaches to logging, differentiated by whether the log holds old
values or new values when a database element is updated. Each has certain drawbacks:
i) Undo logging requires that data be written to disk immediately after a transaction finishes.
ii) Redo logging requires us to keep all modified blocks in buffers until the transaction commits
and the log records have been flushed.
iii) Both undo and redo logs may put contradictory requirements on how buffers are handled
during a checkpoint, unless the database elements are complete blocks or sets of blocks.
To overcome these drawbacks we have a kind of logging called undo/redo logging, that
provides increased flexibility to order actions, at the expense of maintaining more information on the
log.
The Undo/Redo Rules:
An undo/redo log has the same sorts of log records as the other kinds of log, with one exception.
The update log record that we write when a database element changes value has four components.
Record <T, X, v,w> means that transaction T changed the value of database element X; its former
value was v, and its new value is w. The constraints that an undo/redo logging system must follow are
summarized by the following rule:
UR1 : Before modifying any database element X on disk because of changes made by some
transaction T, it is necessary that the update record <T,X,v,w> appear on disk.
Rule UR1 for undo/redo logging thus enforces only the constraints enforced by both undo
logging and redo logging. In particular, the <COMMIT T> log record can precede or follow any of
the changes to the database elements on disk.
A possible sequence of actions and their log entries using undo/redo logging.
Recovery with Undo/Redo Logging:
When we need to recover using an undo/redo log, we have the information in the update
records either to undo a transaction T, by restoring the old values of the database elements that T
changed, or to redo T by repeating the changes it has made. The undo/redo recovery policy is:
1. Redo all the committed transactions in the order earliest-first, and
2. Undo all the incomplete transactions in the order latest-first.
It is necessary for us to do both. Because of the flexibility allowed by undo/redo logging
regarding the relative order in which COMMIT log records and the database changes themselves
are copied to disk, we could have either a committed transaction with some or all of its changes not on
disk, or an uncommitted transaction with some or all of its changes on disk.
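The policy can be sketched as follows; the record layout and function name are illustrative assumptions, with <T, X, v, w> represented as a tuple carrying both the old value v and the new value w.

# Undo/redo recovery sketch: update records are ("UPDATE", T, X, v, w).
def undo_redo_recover(log, disk):
    committed = {r[1] for r in log if r[0] == "COMMIT"}
    # 1. Redo committed transactions, earliest-first (forward scan).
    for r in log:
        if r[0] == "UPDATE" and r[1] in committed:
            disk[r[2]] = r[4]                # write the new value w
    # 2. Undo incomplete transactions, latest-first (backward scan).
    for r in reversed(log):
        if r[0] == "UPDATE" and r[1] not in committed:
            disk[r[2]] = r[3]                # restore the old value v

log = [("START", "T1"), ("UPDATE", "T1", "A", 4, 5), ("COMMIT", "T1"),
       ("START", "T3"), ("UPDATE", "T3", "D", 19, 20)]
disk = {"A": 4, "D": 20}
undo_redo_recover(log, disk)
print(disk)   # {'A': 5, 'D': 19}: T1 is redone, T3 is undone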
Check pointing an Undo/Redo Log:
A nonquiescent checkpoint is somewhat simpler for undo/redo logging than for the other
logging methods. We have only to do the following:
1. Write a <START CKPT (T1,...,Tk)> record to the log, where T1...,Tk are all the active
transactions, and flush the log.
2. Write to disk all the buffers that are dirty; i.e., they contain one or more changed database
elements. Unlike redo logging, we flush all buffers, not just those written by committed transactions.
3. Write an <END CKPT> record to the log, and flush the log.
<START T1>
<T1, A, 4, 5>
<START T2>
<COMMIT T1>
<T2, B, 9, 10>
<START CKPT (T2)>
<T2, C, 14, 15>
<START T3>
<T3, D, 19, 20>
<END CKPT>
<COMMIT T2>
<COMMIT T3>
An undo/redo log
Suppose the crash occurs just before the <COMMIT T3> record is written to disk. Then we
identify T2 as committed but T3 as incomplete. We redo T2 by setting C to 15 on disk; it is not
necessary to set B to 10 since we know that change reached disk before the <END CKPT>.
However, unlike the situation with a redo log, we also undo T3; that is, we set D to 19 on
disk. If T3 had been active at the start of the checkpoint, we would have had to look prior to the
START-CKPT record to find if there were more actions by T3 that may have reached disk and need to
be undone.
Nonquiescent Archiving:
The steps to make a nonquiescent archive (dump) are as follows:
1. Write a log record <START DUMP>.
2. Perform a checkpoint appropriate for whichever logging method is being used.
3. Perform a full or incremental dump of the data disk(s), making sure that the copy of the data
reaches the secure, remote site.
4. Make sure that enough of the log has been copied to the secure, remote site that at
least the prefix of the log up to and including the checkpoint in item (2) will survive a
media failure of the database.
5. Write a log record <END DUMP>.
At the completion of the dump, it is safe to throw away the log prior to the beginning of the
checkpoint performed in item (2) above.
<START DUMP>
<START CKPT (T1, T2)>
<T1, A, 1, 5>
<T2, C, 3, 6>
<COMMIT T2>
<T1, B, 2, 7>
<END CKPT>
Dump completes
<END DUMP>
Log taken during a dump
Note that we did not show T1 committing. It would be unusual that a transaction remained
active during the entire time a full dump was in progress, but that possibility doesn't affect the
correctness of the recovery method.
Recovery Using an Archive and Log:
Suppose that a media failure occurs, and we must reconstruct the database from the most
recent archive and whatever prefix of the log has reached the remote site and has not been lost in the
crash. We perform the following steps:
1. Restore the database from the archive.
(a) Find the most recent full dump and reconstruct the database from it (i.e., copy the
archive into the database).
(b) If there are later incremental dumps, modify the database according to each,
earliest first.
2. Modify the database using the surviving log. Use the method of recovery appropriate to the
logging method being used.
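A minimal sketch of these two steps, treating the database, the full dump, and the incremental dumps as Python dictionaries (an assumption for illustration); the log_replay argument stands in for whichever logging method's recovery procedure is appropriate.

# Media-failure recovery sketch: restore the archive, then replay the log.
def recover_from_media_failure(full_dump, incremental_dumps, log_replay, disk):
    disk.clear()
    disk.update(full_dump)              # 1(a) copy the full archive back
    for dump in incremental_dumps:      # 1(b) apply incremental dumps,
        disk.update(dump)               #      earliest first
    log_replay(disk)                    # 2.   method-appropriate log recovery

disk = {}
recover_from_media_failure({"A": 1, "B": 2}, [{"B": 7}],
                           lambda d: d.update({"A": 5}), disk)
print(disk)   # {'A': 5, 'B': 7}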
5) Logging Methods: The three principal methods for logging are undo, redo, and undo/redo, named
for the way(s) that they are allowed to fix the database during recovery.
6) Undo Logging : This method logs only the old value, each time a database element is
changed. With undo logging, a new value of a database element can only be written to disk after the
log record for the change has reached disk, but before the commit record for the transaction
performing the change reaches disk. Recovery is done by restoring the old value for every
uncommitted transaction.
7) Redo Logging : Here, only the new value of database elements is logged. With this form of
logging, values of a database element can only be written to disk after both the log record of its
change and the commit record for its transaction have reached disk. Recovery involves rewriting the
new value for every committed transaction.
8) Undo/Redo Logging: In this method, both old and new values are logged. Undo/redo logging is
more flexible than the other methods, since it requires only that the log record of a change appear on
the disk before the change itself does. There is no requirement about when the commit record
appears. Recovery is effected by redoing committed transactions and undoing the uncommitted
transactions.
9) Check pointing: Since all methods require, in principle, looking at the entire log from the dawn of
history when a recovery is necessary, the DBMS must occasionally checkpoint the log, to assure that
no log records prior to the checkpoint will be needed during a recovery. Thus, old log records can
eventually be thrown away and their disk space reused.
10) Nonquiescent Check pointing : To avoid shutting down the system while a checkpoint is
made, techniques associated with each logging method allow the checkpoint to be made while the
system is in operation and database changes are occurring. The only cost is that some log records
prior to the nonquiescent checkpoint may need to be examined during recovery.
11) Archiving: While logging protects against system failures involving only the loss of main
memory, archiving is necessary to protect against failures where the contents of disk are lost.
Archives are copies of the database stored in a safe place.
12) Recovery from Media Failures: When a disk is lost, it may be restored by starting with a full
backup of the database, modifying it according to any later incremental backups, and finally
recovering to a consistent database state by using an archived copy of the log.
13) Incremental Backups : Instead of copying the entire database to an archive periodically, a
single complete backup can be followed by several incremental backups, where only the changed
data is copied to the archive.
14) Nonquiescent Archiving : Techniques for making a backup of the data while the database is in
operation exist. They involve making log records of the beginning and end of the archiving, as well
as performing a checkpoint for the log during the archiving.
Serializability
Serializability is a widely accepted standard that ensures the consistency of a schedule. A
schedule is consistent if and only if it is serializable. A schedule is said to be serializable if the
interleaved transactions produce a result that is equivalent to the result produced by executing the
same transactions serially in some order.
i) Conflict Serializability:
Two instructions IA and IB of transactions TA and TB can be swapped in a schedule when they
access different data items; the result of swapping such instructions does not have any impact on the
remaining instructions in the schedule. If IA and IB refer to the same data item x, then the following
four cases must be considered,
Case 1 : IA = read(x), IB = read(x)
Case 2 : IA = read(x), IB = write(x)
Case 3 : IA = write(x), IB = read(x)
Case 4 : IA = write(x), IB = write(x)
Case 1 : Here, both IA and IB are read instructions. In this case, the execution order of the
instructions does not matter, since the same data item x is only read by both transactions TA and TB.
Case 2 : Here, IA and IB are read and write instructions respectively. If the execution order is
IA followed by IB, then transaction TA cannot read the value written by transaction TB in
instruction IB; but if the order is IB followed by IA, then TA can read the value written by TB.
Therefore, in this case the execution order of the instructions is important.
Case 3 : Here, IA and IB are write and read instructions respectively. If the execution order is
IA followed by IB, then transaction TB can read the value written by transaction TA; but if the order
is IB followed by IA, then TB cannot read the value written by TA. Therefore, in this case the
execution order of the instructions is important.
Case 4 : Here, both IA and IB are write instructions. The order does not affect the values read by
TA or TB themselves, but it determines the value of x left behind for the next instruction that reads
it; hence two writes on the same data item are still treated as conflicting.
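The four cases reduce to a single test, sketched below in Python (the function name conflicts is illustrative): two instructions conflict if and only if they access the same data item and at least one of them is a write.

# The four cases reduce to one rule: two instructions conflict iff they
# access the same data item and at least one of them is a write.
def conflicts(op_a, op_b):
    (kind_a, item_a), (kind_b, item_b) = op_a, op_b
    if item_a != item_b:
        return False                          # different items: freely swappable
    return kind_a == "write" or kind_b == "write"

print(conflicts(("read", "x"),  ("read", "x")))    # Case 1: False
print(conflicts(("read", "x"),  ("write", "x")))   # Case 2: True
print(conflicts(("write", "x"), ("read", "x")))    # Case 3: True
print(conflicts(("write", "x"), ("write", "x")))   # Case 4: True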
ii) View Serializability:
Two schedules S1 and S1’ consisting of some set of transactions are said to be view equivalent, if
the following conditions are satisfied,
1) If a transaction TA in schedule S1 performs the read operation on the initial value of data
item x, then the same transaction in schedule S1’ must also perform the read operation on the initial
value of x.
2) If a transaction TA in schedule S1 reads the value of x written by transaction TB,
then TA in schedule S1' must also read the value of x written by transaction TB.
3) If a transaction TA in schedule S1 performs the final write operation on data item x, then the
same transaction in schedule S1’ must also perform the final write operation on x.
Example:
Transaction T1              Transaction T2
read(x)
                            read(x)
x := x - 10
                            x := x * 10
write(x)
                            write(x)
read(y)
                            read(y)
y := y - 10
                            y := y / 10
write(y)
                            write(y)
View equivalence leads to another notion called view serializability. A schedule S
is said to be view serializable if it is view equivalent to some serial schedule.
Every conflict serializable schedule is view serializable, but not every view serializable
schedule is conflict serializable.
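The three view-equivalence conditions can be checked mechanically. The following Python sketch (all function names are illustrative) extracts the initial reads, the reads-from relation, and the final writes of a schedule and compares them; the sample schedule is the classic blind-write example, which is view serializable but not conflict serializable.

# View-equivalence sketch: a schedule is a list of (transaction, action, item)
# triples. The three conditions correspond to initial reads, the reads-from
# relation, and final writes.
def view_info(schedule):
    initial_reads, reads_from, final_write = set(), set(), {}
    last_writer = {}                       # item -> transaction that last wrote it
    for txn, action, item in schedule:
        if action == "read":
            writer = last_writer.get(item) # None means the initial value was read
            if writer is None:
                initial_reads.add((txn, item))
            else:
                reads_from.add((txn, item, writer))
        else:                              # "write"
            last_writer[item] = txn
            final_write[item] = txn
    return initial_reads, reads_from, final_write

def view_equivalent(s1, s2):
    return view_info(s1) == view_info(s2)

# Blind-write example: view serializable but not conflict serializable.
s      = [("T1","read","x"), ("T2","write","x"), ("T1","write","x"), ("T3","write","x")]
serial = [("T1","read","x"), ("T1","write","x"), ("T2","write","x"), ("T3","write","x")]
print(view_equivalent(s, serial))          # True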
Concurrency Control:
In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions.
Concurrency problems arise when two or more database transactions that access the same data or
data set are executed concurrently with time overlap. According to Wikipedia.org, if multiple
transactions are executed serially or sequentially, the data in the database remains consistent.
However, if concurrent transactions with interleaving operations are executed, unexpected data
values and inconsistent results may occur. Data interference is usually caused by a write operation
among transactions on the same set of data in the DBMS. For example, the lost update problem may
occur when a second transaction writes a second value of a data item on top of the first value
written by a first concurrent transaction. Other problems, such as the dirty read problem and the
incorrect summary problem, may also arise from such uncontrolled interleaving.
Concurrency Control Techniques:
The main concurrency control techniques are lock-based protocols, timestamp-based protocols, and
optimistic (validation-based) protocols; each is discussed in turn below.
Lock-Based Protocols:
A lock is nothing but a mechanism that tells the DBMS whether a particular data item is
being used by any transaction for read/write purposes.
Because the two types of operations, read and write, are different in their basic nature,
the locks for read and write operations may behave differently.
The simple rule for locking can be derived from here. If a transaction is reading the content of
a sharable data item, then any number of other processes can be allowed to read the content of
the same data item. But if any transaction is writing into a sharable data item, then no other
transaction will be allowed to read or write that same data item.
Depending upon the rules we have found, we can classify the locks into two types.
Shared Lock: A transaction may acquire shared lock on a data item in order to read its content.
The lock is shared in the sense that any other transaction can acquire the shared lock on that same
data item for reading purpose.
Exclusive Lock: A transaction may acquire an exclusive lock on a data item in order to both read
and write it. The lock is exclusive in the sense that no other transaction can acquire any kind of lock
(either shared or exclusive) on that same data item.
The relationship between Shared and Exclusive Lock can be represented by the following table,
                 Shared            Exclusive
Shared           Compatible        Not compatible
Exclusive        Not compatible    Not compatible
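A sketch of how a lock manager might consult this table before granting a request; the dictionary layout and function name are illustrative assumptions.

# Lock-compatibility sketch matching the table above: only S + S is compatible.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held_modes):
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant("S", ["S", "S"]))   # True : many readers may share an item
print(can_grant("X", ["S"]))        # False: a writer must wait for readers
print(can_grant("S", ["X"]))        # False: readers wait for the writer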
Two-Phase Locking (2PL) Protocol:
The two-phase locking protocol divides the execution of each transaction into two phases; a small
sketch of the rule appears after the two phases described below.
Growing Phase:
In this phase the transaction can only acquire locks, but cannot release any lock
The transaction enters the growing phase as soon as it acquires the first lock it wants.
It cannot release any lock at this phase even if it has finished working with a locked data item.
Ultimately the transaction reaches a point where all the locks it may need have been acquired.
This point is called the Lock Point.
Shrinking Phase:
After Lock Point has been reached, the transaction enters the shrinking phase. In this phase the
transaction can only release locks, but cannot acquire any new lock.
The transaction enters the shrinking phase as soon as it releases the first lock after crossing the
Lock Point.
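A minimal sketch of the two-phase rule, assuming an illustrative TwoPhaseTxn class that simply refuses lock requests once its shrinking phase has begun:

# Two-phase locking sketch: once a transaction releases any lock (its lock
# point has passed), it may never acquire another one.
class TwoPhaseTxn:
    def __init__(self, name):
        self.name, self.locks, self.shrinking = name, set(), False

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock in shrinking phase")
        self.locks.add(item)              # growing phase

    def release(self, item):
        self.shrinking = True             # first release starts the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTxn("T1")
t.acquire("A"); t.acquire("B")            # growing phase
t.release("A")                            # lock point has passed
try:
    t.acquire("C")                        # violates 2PL
except RuntimeError as e:
    print(e)

Under strict two-phase locking, discussed below, the release of exclusive locks would additionally be deferred until the transaction commits.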
A cascading schedule is a typical problem faced while creating concurrent schedules. Consider
the following schedule once again.
T1                          T2
Lock-X (A)
Read A;
A = A - 100;
Write A;
Unlock (A)
                            Lock-S (A)
                            Read A;
                            Temp = A * 0.1;
                            Unlock (A)
                            Lock-X (C)
                            Read C;
                            C = C + Temp;
                            Write C;
                            Unlock (C)
                            COMMIT
Lock-X (B)
Read B;
B = B + 100;
Write B;
Unlock (B)
The schedule is theoretically correct, but a very strange kind of problem may arise here.
T1 releases the exclusive lock on A, and immediately after that a context switch is made.
T2 acquires a shared lock on A to read its value, performs a calculation, updates the content of
account C, and then issues COMMIT. However, T1 is not finished yet. What if the remaining
portion of T1 encounters a problem (power failure, disk failure, etc.) and cannot be committed?
In that case T1 should be rolled back and the old BFIM (before image) value of A should be restored.
In such a case T2, which has read the updated (but not committed) value of A and calculated the value
of C based on this value, must also be rolled back.
We have to roll back T2 through no fault of its own, but because we proceeded with T2 depending on a
value that had not yet been committed. This phenomenon of rolling back a dependent (child) transaction
when the transaction it read from (the parent) is rolled back is called Cascading Rollback, and it
causes a tremendous loss of processing power and execution time.
Using Strict Two Phase Locking Protocol, Cascading Rollback can be prevented.
In Strict Two Phase Locking Protocol a transaction cannot release any of its acquired exclusive locks
until the transaction commits.
In such a case, T1 would not release the exclusive lock on A until it finally commits, which makes it
impossible for T2 to acquire a shared lock on A at a time when A's value has not been committed. This
makes cascading rollback impossible.
Timestamp-Based Protocol:
The timestamp ordering technique determines the serializability order of the different transactions
in a schedule. This order is determined from prior knowledge about the order in which the
transactions entered the system.
A timestamp, denoted TS(TA), is an identifier generated by the DBMS that specifies the start time of
a transaction and uniquely identifies it in a schedule. The timestamp of an older transaction TA is
less than the timestamp of a newly entered transaction TB, i.e., TS(TA) < TS(TB).
In the timestamp-based concurrency control method, transactions are executed based on priorities
that are assigned according to their age. If an instruction IA of transaction TA conflicts with an
instruction IB of transaction TB, then IA is executed before IB if and only if TS(TA) < TS(TB),
which implies that older transactions have higher priority in case of conflicts.
A timestamp can be generated in one of two ways,
i) System Clock : When a transaction enters the system, it is assigned a timestamp equal to the
time in the system clock.
ii) Logical Counter : When a transaction enters the system, it is assigned a timestamp equal to the
value of a counter that is incremented for each newly entered transaction.
Every individual data item x is associated with the following two timestamp values,
i) WTS(x) (W-Timestamp(x)) : It represents the highest timestamp value of the transaction that successfully
executed the write instruction on x.
ii) RTS(x) (R-Timestamp(x)) : It represents the highest timestamp value of the transaction that successfully
executed the read instruction on x.
1) If TA executes a read(x) instruction, then the following two cases must be considered,
i) TS(TA) < WTS(x) (Case 1)
ii) TS(TA) >= WTS(x) (Case 2)
Case 1 : If a transaction TA wants to read the initial value of some data item x that has been
overwritten by some younger transaction, then TA cannot perform the read operation. The read is
rejected, and TA must be rolled back and restarted with a new timestamp.
Case 2 : If a transaction TA wants to read the value of some data item x that has not been
overwritten by a younger transaction, then TA can execute the read operation. Once the value has
been read, the read timestamp RTS(x) is updated to the larger of RTS(x) and TS(TA).
2) If TA executes a write(x) instruction, then the following cases must be considered,
i) TS(TA) < RTS(x) (Case 1)
ii) TS(TA) < WTS(x) (Case 2)
iii) otherwise (Case 3)
Case 1 : If a transaction TA wants to write the value of some data item x on which a read operation
has been performed by some younger transaction, then TA cannot execute the write operation. This is
because the value of x being generated by TA was required previously, and the system assumed that
the value would never be generated. The write operation is therefore rejected, and TA must be rolled
back and restarted with a new timestamp value.
Case 2 : If a transaction TA wants to write a new value to some data item x that has already been
overwritten by some younger transaction, then TA cannot execute the write operation, as it may lead
to inconsistency of the data item. The write operation is therefore rejected, and TA must be rolled
back and restarted with a new timestamp value.
Case 3 : If a transaction TA wants to write a new value to some data item x that has been neither
read nor overwritten by a younger transaction, then TA can execute the write operation. Once the
value has been written, WTS(x) is set to TS(TA).
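The read and write rules above can be sketched as follows; RTS and WTS are kept in Python dictionaries, and the Rollback exception stands in for rolling the transaction back and restarting it (all names are illustrative).

# Timestamp-ordering sketch: RTS/WTS per item; conflicts roll the transaction back.
class Rollback(Exception): pass

def ts_read(ts, item, rts, wts):
    if ts < wts.get(item, 0):
        raise Rollback("read of an overwritten value")   # read rule, Case 1
    rts[item] = max(rts.get(item, 0), ts)                # read rule, Case 2

def ts_write(ts, item, rts, wts):
    if ts < rts.get(item, 0):
        raise Rollback("a younger transaction already read x")   # write Case 1
    if ts < wts.get(item, 0):
        raise Rollback("a younger transaction already wrote x")  # write Case 2
    wts[item] = ts                                               # write Case 3

rts, wts = {}, {}
ts_read(1, "x", rts, wts)      # T1 (TS = 1) reads x
ts_write(2, "x", rts, wts)     # T2 (TS = 2) writes x: allowed
try:
    ts_write(1, "x", rts, wts) # T1 writes x after T2 wrote it: rejected
except Rollback as e:
    print(e)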
Example:
T1                          T2
read(y)
                            read(y)
read(x)
                            y := y + 100
show(x+y)
                            write(y)
                            read(x)
                            x := x - 100
                            write(x)
The above schedule can be executed under the timestamp protocol when TS(T1) < TS(T2).
Optimistic Concurrency Control:
If transactions are executed concurrently without employing any concurrency control mechanism,
the generated results may leave the database in an inconsistent state.
However, if concurrency control schemes are used, the execution of transactions may be delayed and
overhead may result. To avoid both issues, the optimistic concurrency control mechanism is used,
which reduces this execution overhead.
But the problem with reducing the overhead is that prior knowledge about which transactions will
conflict is not available. Therefore, a mechanism to "monitor" the system is required to gain such
knowledge.
Let us consider that every transaction TA is executed in two or three-phases during its life-time. The phases
involved in optimistic concurrency control are,
1) Read Phase
2) Validation Phase and
3) Write Phase
1) Read Phase: In this phase, copies of the data items (their values) are stored in local variables
and all modifications are made to these local variables; the actual values of the data items are not
modified in this phase.
2) Validation Phase: This phase follows the read phase. Here it is checked, upon each update, that
serializability will not be violated. If conflicts with other transactions are detected, the
transaction is aborted and restarted; otherwise it is committed.
3) Write Phase : The successful completion of the validation phase leads to the write phase in which all the
changes are made to the original copy of data items. This phase is applicable only to the read-write transaction.
Each transaction is assigned three timestamps as follows,
i) I(T) : when its execution is initiated
ii) V(T) : at the start of its validation phase
iii) E(T) : at the end of its write phase
To maintain serializability, if the timestamp of transaction TA is less than the timestamp of
transaction TB, i.e., TS(TA) < TS(TB), then one of the following conditions must hold,
1) Before the start of transaction TB, transaction TA must complete its execution. i.e., E(TA) < I(TB)
2) The data items written by transaction TA must not be among those read by transaction TB, and
TA must complete its write phase before TB initiates its validation phase, i.e., I(TB) < E(TA) <
V(TB)
3) If transaction TA starts its execution before transaction TB completes its read phase, then the
write phase of TA must be finished before TB starts its validation phase.
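A minimal sketch of the three phases, using a version counter per data item as an illustrative stand-in for the timestamps I(T), V(T), and E(T); a real validation test would use those timestamps and read/write sets exactly as stated above.

# Optimistic (validation-based) sketch: read into local copies, validate
# against transactions that committed meanwhile, then write.
class DB:
    def __init__(self, data):
        self.data, self.version, self.tick = data, {}, 0

class OptimisticTxn:
    def __init__(self, db, start_tick):
        self.db, self.start = db, start_tick
        self.read_set, self.local = set(), {}

    def read(self, item):                       # read phase
        self.read_set.add(item)
        self.local.setdefault(item, self.db.data[item])
        return self.local[item]

    def write(self, item, value):               # buffered in local variables
        self.local[item] = value

    def commit(self):
        # Validation phase: fail if any item we read was overwritten by a
        # transaction that committed after we started.
        for item in self.read_set:
            if self.db.version.get(item, 0) > self.start:
                raise RuntimeError("validation failed: restart the transaction")
        # Write phase: install the local values into the database.
        self.db.tick += 1
        for item, value in self.local.items():
            self.db.data[item] = value
            self.db.version[item] = self.db.tick

db = DB({"x": 10})
t1 = OptimisticTxn(db, db.tick)
t2 = OptimisticTxn(db, db.tick)
stale = t2.read("x")                  # T2's read phase sees x = 10
t1.write("x", t1.read("x") + 1)
t1.commit()                           # T1 validates and writes first
try:
    t2.write("x", stale * 2)
    t2.commit()                       # T2 fails validation: x has changed
except RuntimeError as e:
    print(e)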
Advantages:
i) The efficiency of optimistic techniques lies in the scarcity of conflicts.
ii) It doesn't cause significant delays.
iii) Cascading rollbacks never occur.
Disadvantages:
i) Processing time is wasted when long transactions must be rolled back after failing validation.
ii) When one process is in its critical section (a portion of its code), no other process is
allowed to enter its critical section.