Full Copy Dbms
DATABASE
MANAGEMENT SYSTEMS
Authors
Dr.S.Sathappan
Associate Professor, Department of Computer Science and Engineering
St.Martins Engineering College Telangana 500100, India.
Mrs.M.Prasanna Lakshmi
Assistant Professor, Department of Computer Applications
Velagapudi Ramakrishna Siddhartha Engineering College,
Vijayawada, Andhra Pradesh, India.
Mr.B Srinivas
Assistant Professor, Department of Computer Applications
Velagapudi Ramakrishna Siddhartha Engineering College,
Vijayawada, Andhra Pradesh, India.
GCS PUBLISHERS
INDIA
Book Title: Database Management Systems
Authors: Dr.S.Sathappan
Mrs.M.Prasanna Lakshmi
Mr.B Srinivas
Mr.Janardhana Rao Alapati
Published by
GCS PUBLISHERS
INDIA.
ISBN: 978-93-94304-23-9
PREFACE
This book aims to provide broad coverage of database management
systems, whose importance is well known in various engineering
fields.
It provides a logical method of explaining various complicated concepts
and stepwise methods to explain essential topics. Each chapter is well
supported with the necessary illustrations. All the chapters in the book are
arranged in a proper sequence that permits each topic to build upon earlier
studies.
This book is original in style and method. No pains have been spared to
make it as compact, perfect, and reliable as possible. Every attempt has been
made to make the book a unique one.
Writing takes a great deal of energy and can quickly consume all of the
hours in a day. With that in mind, we have to thank the numerous editors
with whom we have worked on freelance projects while concurrently writing
this book. Without their understanding and flexibility, we could never have
written this book or any other.
CHAPTER 1
FUNDAMENTALS OF DBMS
1. Centralized Database
The information (data) is stored at a centralized location, and
users from different locations can access this data. This type of
database contains application procedures that help users access
the data even from a remote location.
Various authentication procedures are applied for the
verification and validation of end-users. Likewise, the
application procedures provide a registration number that keeps
track of and records data usage. The local area office handles
this.
2. Distributed Database
A distributed database is the opposite of the centralized
database concept. It combines contributions from the common
database with information captured by local computers. The data
is not in one place but is distributed across various sites of an
organization. These sites are connected with the help of
3. Personal Database
Data is collected and stored on personal computers, which
are small and easily manageable. The data is generally used by
4. End-User Database
The end-user is usually not concerned with the transactions
or operations at various levels and is only aware of the product,
which may be software or an application. An end-user database is
therefore a shared database designed specifically for end-users,
such as managers at different levels. A summary of the complete
information is collected in this database.
5. Commercial Database
These are paid versions of enormous databases designed for
users who want to access the information they contain. These
databases are subject-specific, and individual users cannot
afford to maintain such an enormous amount of information
themselves. Access to such databases is provided through
commercial links.
6. NoSQL Database
These are used for large sets of distributed data. Relational
databases run into performance issues when handling big data,
and NoSQL databases manage such issues more easily. They are
very efficient at analyzing large unstructured data sets stored on
multiple virtual servers in the cloud.
7. Operational Database
Information related to the operations of an enterprise is
stored inside this database. Functional lines like marketing,
employee relations, customer service, etc., require such kinds of
databases.
3. Three-tier architecture
2. Conceptual level
3. Internal level
1. External level
It is also known as the view level. At this level, different users
see only the data they need, which is retrieved from the database
through the mappings to the conceptual and internal levels. What
each user sees is called a "view."
The user is not required to understand database schema
specifics such as data structures, table definitions, etc. The
user only cares about the data that is retrieved from the
database and returned to the view level.
The external level is the top level of the three-level DBMS
architecture.
2. Conceptual level
It is also known as the logical level. This level describes the
whole database design, including data relationships,
schema, etc.
Database constraints and security are also implemented at this
level of the architecture. A database administrator (DBA)
maintains this level.
3. Internal level
This level is often referred to as the physical level. It
describes the actual storage of data on storage devices and is
also in charge of allocating data storage space. This is the
lowest level of the architecture.
View of Data in DBMS
Abstraction is a key aspect of database systems. Hiding
irrelevant details from users and presenting users with an
abstract view of data facilitates quick and effective
user-database interaction. In the last session, we reviewed the
DBMS Schema
Schema definition: the schema is the design of a database. There
are three sorts of schema: physical, logical, and view schema.
As an example, the schema in the diagram below shows the
relationship between three tables: course, student, and section.
The diagram simply depicts the database's design; it does not
depict the data contained in the tables. The diagram below
demonstrates that a schema is just a database's structural view
(design).
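The distinction between a schema and the data it holds can be sketched with Python's built-in sqlite3 module. The column names below are assumptions made for illustration; the text only names the three tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are hypothetical; only the table names come from the text.
conn.executescript("""
CREATE TABLE course  (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE section (
    section_id INTEGER PRIMARY KEY,
    course_id  TEXT REFERENCES course(course_id),
    student_id INTEGER REFERENCES student(student_id)
);
""")

# The schema (design) is recorded in the catalog even though no rows exist.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(tables)     # the structural view: table names only
print(row_count)  # number of data rows currently stored
```

The catalog lists the three tables while the row count stays at zero, which is the sense in which a schema is a design and not data.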
Adabas D                        Software AG
Adaptive Server Anywhere        Sybase
Adaptive Server Enterprise      Sybase
Advantage Database Server       Extended Systems
Datacom                         Computer Associates
DB2 Everyplace                  IBM
Filemaker                       FileMaker Inc.
IDMS                            Computer Associates
Ingres ii                       Computer Associates
Interbase                       Inprise (Borland)
MySQL                           Freeware
NonStop SQL                     Tandem
Pervasive.SQL 2000 (Btrieve)    Pervasive Software
Pervasive.SQL Workgroup         Pervasive Software
Progress                        Progress Software
Quadbase SQL Server             Quadbase Systems, Inc.
R:Base                          R:Base Technologies
Rdb                             Oracle
Red Brick                       Informix (Red Brick)
SQL Server                      Microsoft
SQLBase                         Centura Software
SUPRA                           Cincom
Teradata                        NCR
YARD-SQL                        YARD Software Ltd.
TimesTen                        TimesTen Performance Software
Adabas                          Software AG
Model 204                       Computer Corporation of America
CHAPTER 2
DATA BASE DESIGN AND DATA MODELS
2.1 Introduction
A database is a collection of bulk data stored in a structure
that makes it easy to find and explore related data. A
well-designed database provides reliable and up-to-date
information and makes data retrieval quick and easy. The value of
a database is clear to any organization that deals with large
amounts of data daily. However, that value depends on a database
design capable of handling all kinds of data quickly and
accurately.
Database Design
Database design refers to a set of steps that aid in designing,
developing, implementing, and maintaining a company's data
management systems. The primary goal of database design is to
create physical and conceptual representations of the proposed
database structure.
Designing a Good Database
Basic rules guide a successful database design process. The
first rule states that duplicate data should be avoided because it
consumes storage space and increases the likelihood of errors and
inconsistencies. The second rule is that the correctness and
completeness of information are critical. If a database contains
incorrect information, all reports that retrieve data from it will
also contain incorrect information. As a result, all conclusions
based on such reports would be incorrect, which emphasizes the
value of a database design that follows both of these rules.
So, how do you make sure the database design is up to par?
A well-designed database satisfies the following criteria:
• To reduce data complexity, divide the data into tables based on
particular subject areas.
2.5.1 Entity
Any object, class, person, or place may be considered an
entity. Rectangles are used to represent entities in the ER
diagram. Consider an organization as an example: a manager, a
product, an employee, a department, and so on may all be
considered separate entities.
a. Weak Entity
A weak entity depends on another entity. A weak entity does
not have a key attribute of its own. A double rectangle
represents a weak entity.
2.5.2 Attribute
An attribute describes a property of an entity. An ellipse is
the symbol for an attribute.
b. Composite Attribute
A composite attribute is an attribute that is made up of
several other attributes. An ellipse represents the composite
attribute, and the ellipses of its component attributes are
connected to it.
c. Multivalued Attribute
There can be several values for an attribute. A multivalued
attribute is a kind of attribute that has several values. A double
oval represents a multivalued attribute.
A student, for example, can have several phone numbers.
d. Derived Attribute
A derived attribute is an attribute that can be derived from
another attribute. A dashed ellipse represents it.
A person's age, for example, changes over time and can be
derived from another attribute such as their date of birth.
2.5.3 Relationship
The term "relationship" describes the association between
two or more entities. A diamond or rhombus represents the
relationship.
b. One-to-many relationship
A one-to-many relationship exists where only one instance of
the entity on the left and more than one instance of the entity
on the right are associated with the relationship.
A scientist, for example, may create a large number of
inventions, but each invention is created by a single scientist.
c. Many-to-one relationship
A many-to-one relationship exists where more than one
instance of the entity on the left and only one instance of the
entity on the right are associated with the relationship.
A student, for example, enrolls in only one course, but a course
may contain a large number of students.
d. Many-to-many relationship
A many-to-many relationship exists where more than one
instance of the entity on the left and more than one instance of
the entity on the right are associated with the relationship.
Employees, for example, can be assigned to a variety of projects,
and a project can involve a large number of employees.
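As a rough sketch of how a many-to-many relationship is commonly stored in tables, the example below uses a third "junction" table that holds one row per employee-project pairing. The table names, column names, and sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# All names and rows here are hypothetical illustration data.
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
-- Junction table: one row per (employee, project) assignment.
CREATE TABLE works_on (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);
INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Inventory');
INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
""")

# One employee can appear on many projects ...
projects_of_asha = [r[0] for r in conn.execute(
    "SELECT title FROM project JOIN works_on USING (proj_id) "
    "WHERE emp_id = 1 ORDER BY title")]
# ... and one project can involve many employees.
staff_on_payroll = [r[0] for r in conn.execute(
    "SELECT name FROM employee JOIN works_on USING (emp_id) "
    "WHERE proj_id = 10 ORDER BY name")]

print(projects_of_asha)
print(staff_on_payroll)
```

A one-to-many relationship needs no junction table: a single foreign key column on the "many" side is usually enough.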
The view level is the highest level of data abstraction. The
user's interaction with the database system is defined at this
level.
Let us pretend we are using a customer table to store customer
records. At the physical level, these records are stored as
blocks of storage (bytes, gigabytes, terabytes, and so on).
These details are usually hidden from programmers.
At the logical level, these records can be described as fields
and attributes along with their data types, and the
relationships among them can be defined logically. Since they
are familiar with database systems, programmers usually work at
this level.
At the view level, users simply interact with the system
through a graphical user interface (GUI) and enter data on the
screen; they have no idea how or where the data is stored since
such details are hidden from them.
CHAPTER 3
RELATIONAL MODEL
3.1 Introduction
Data is represented in a relational model by tables, also
called relations.
Relational Schema: A schema describes the structure of a
relation. For instance, the relational schema of the STUDENT
relation looks like this:
Student (Stud_No, Stud_Name, Stud_Phone, Stud_State,
Stud_Country, Stud_Age)
Relational Instance: Table 1 and Table 2 represent
relational instances, that is, the sets of values present in a
relation at a certain time.
Student_Course
Stu_No  Course_Id  Course_Name
1       C1         Dwdm
2       C2         Bda
1       C2         Bda
Example:
EID NAME PHONE
0010 GUPTHA 9492004956
Explanation:
Since Name is a composite attribute and Phone is a multivalued
attribute in the above relation, it violates the domain
constraint.
2. Key Constraints or Uniqueness Constraints :
These are known as uniqueness constraints because they
guarantee that each tuple in the relation is unique. A relation
may have several keys or candidate keys (minimal superkeys),
from which we choose one as the primary key. There are no
restrictions on which candidate key is selected as the primary
key. However, choosing the candidate key with the fewest
attributes is recommended.
Since null values are not permitted in the primary key, the
Not Null constraint is also part of the key constraint.
Example:
EID NAME PHONE
0010 GURU 9492004956
0112 RAJ 123456987
0113 NARESH 897456123
Explanation:
The primary key in the above table is EID, and the first and
last tuples have the same value in EID, i.e., 01, so the key
constraint is violated.
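A minimal sketch of how a database engine enforces the key constraint, using Python's sqlite3 module. The second insert deliberately reuses an EID value and the third supplies no EID at all; both rows are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (eid TEXT PRIMARY KEY NOT NULL, name TEXT, phone TEXT)")
conn.execute("INSERT INTO employee VALUES ('0010', 'GURU', '9492004956')")

# Hypothetical second tuple that repeats the key value '0010'.
try:
    conn.execute("INSERT INTO employee VALUES ('0010', 'NARESH', '897456123')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Hypothetical tuple with a NULL primary key value.
try:
    conn.execute("INSERT INTO employee VALUES (NULL, 'RAJ', '123456987')")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(duplicate_rejected, null_rejected)
```

NOT NULL is written out explicitly here because SQLite, unlike most engines, does not imply it for a non-integer primary key.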
Example:
EID  NAME   DNO
010  GURU   10
011  GAJA   10
012  RAJA   11
013  RANGA  11

DNO  PLACE
10   CHENNAI
11   HYDERABAD
Explanation:
DNO is the foreign key in the first relation and the primary
key in the second relation. A value of DNO = 22 in the foreign
key of the first table is not permitted because DNO = 22 is not
defined in the primary key of the second relation. Such a tuple
would violate the referential integrity constraint. Database
constraints should be used to enforce data integrity.
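The referential integrity rule can be tried out with a short sqlite3 sketch. The table names and the rejected row ('014', 'TEST', 22) are assumptions for illustration; the department rows follow the example tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dno INTEGER PRIMARY KEY, place TEXT);
CREATE TABLE employee (
    eid  TEXT PRIMARY KEY,
    name TEXT,
    dno  INTEGER REFERENCES department(dno)
);
INSERT INTO department VALUES (10, 'CHENNAI'), (11, 'HYDERABAD');
INSERT INTO employee VALUES ('010', 'GURU', 10);
""")

# DNO = 22 does not exist in department, so the insert should be refused.
try:
    conn.execute("INSERT INTO employee VALUES ('014', 'TEST', 22)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(fk_rejected)
```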
3.4 Data Integrity
Data integrity is the accuracy, consistency, and reliability of
data in a database. Data integrity is implemented within one or
more related databases by both database designers and database
developers. For example, in the Northwind Categories table, the
CategoryName must be unique regardless of how many records the
table contains. If this rule is not followed, the Seafood
category can be stored twice in the table, which breaks our
business rules.
3.4.1 Types of Data Integrity
There are four types of data integrity:
1. Row integrity
2. Column integrity
3. Referential integrity
4. User-defined integrity
Row integrity
Column integrity
Column integrity is the condition that all data contained in a
column follow the same format and definition. It covers the data
type, data length, default value, range of possible values,
whether duplicate values are permitted, and whether null
values are permitted.
For example, in the Employees table, LastName must be
varchar, no more than 20 characters long, default to an empty
string, and cannot be null.
Referential integrity
How can you tell who supplied Longlife Tofu in the products
table? Referential integrity ensures that the supplier exists.
You find the data row Longlife Tofu in the products table and
discover that the value in the SupplierID column is 4.
products table:
You then look in the suppliers table for the record with
SupplierID 4 and discover that the CompanyName is Tokyo Traders.
suppliers table:
This specifies the type of data, the data length, and a few
other attributes directly associated with the type of data in a
column.
Default constraint:
This specifies what value the column uses when no value is
explicitly specified when inserting a row into the table.
Nullability constraint:
This specifies whether a column is NOT NULL or allows NULL
values to be stored.
Primary key constraint:
This is the table's unique identifier. Each row must have its
own distinct value. The primary key may be either a sequentially
incremented integer or a natural piece of data reflecting
the real world (e.g., Social Security Number). NULL values are
not permitted in primary key values.
Unique constraint:
This specifies that the values in a column must be unique
and that no duplicates can be stored. Sometimes the data in a
column must be unique even though the column is not the table's
primary key. For example, the CategoryName column is unique in
the Categories table, but it is not the primary key.
Foreign key constraint:
This determines how referential integrity is applied between
two tables.
Check constraint:
This defines a validation rule for the data values in a
column, so it is a user-defined data integrity constraint. The user
defines this rule when designing the column in a table. Not
every database engine supports check constraints. Older MySQL
versions parse check constraints but do not enforce them
(enforcement was added in MySQL 8.0.16). In those versions, you
can use the enum data type or the set data type to achieve some
of the functionality that check constraints provide in other
Relational Database Management Systems (Oracle, SQL Server, etc.).
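The column-level constraints described above can be combined in one table definition. The sketch below uses sqlite3; the display_order column and the inserted rows are assumptions added only to give the CHECK constraint something to validate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE categories (
    category_id   INTEGER PRIMARY KEY,                  -- primary key constraint
    category_name VARCHAR(15) NOT NULL UNIQUE,          -- nullability + unique
    description   TEXT DEFAULT '',                      -- default constraint
    display_order INTEGER CHECK (display_order >= 0)    -- check (hypothetical column)
)""")
conn.execute(
    "INSERT INTO categories (category_name, display_order) VALUES ('Seafood', 1)")
default_description = conn.execute(
    "SELECT description FROM categories").fetchone()[0]

def rejected(sql):
    """Return True when the engine refuses the statement on a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_name = rejected(
    "INSERT INTO categories (category_name) VALUES ('Seafood')")
missing_name = rejected(
    "INSERT INTO categories (display_order) VALUES (2)")
negative_order = rejected(
    "INSERT INTO categories (category_name, display_order) VALUES ('Dairy', -1)")

print(repr(default_description), duplicate_name, missing_name, negative_order)
```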
1. Primary key
The primary key is the key chosen to identify one and only one
instance of an entity uniquely. An entity may have several keys,
as in the PERSON table. The most suitable of these keys is
designated as the primary key.
Since each employee's ID is unique, ID can be used as the
primary key in the EMPLOYEE table. We could also use License
Number or Passport Number as the primary key in the
EMPLOYEE table since they are unique as well.
The primary key for each entity is chosen depending on
the requirements and the developers.
2. Candidate key
A candidate key is an attribute or minimal set of attributes
that can uniquely identify a tuple.
Apart from the primary key, the remaining attributes that can
uniquely identify a tuple are called candidate keys. The
candidate keys are just as strong as the primary key.
For example, suppose the primary key of the EMPLOYEE table is
ID. The remaining unique attributes, such as SSN, Passport
Number, and License Number, are called candidate keys.
Super Key
A super key is a set of attributes that can be used to identify a
tuple uniquely. A super key is a superset of a candidate key.
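The relationship between super keys and candidate keys can be checked mechanically on a small table. The sketch below only tests the rows it is given (real keys are a property of the schema, not of one instance), and the sample rows and attribute names are assumptions.

```python
from itertools import combinations

# Hypothetical EMPLOYEE rows used only for illustration.
rows = [
    {"id": 1, "ssn": "111", "name": "GURU", "dept": 10},
    {"id": 2, "ssn": "222", "name": "RAJ", "dept": 10},
    {"id": 3, "ssn": "333", "name": "GURU", "dept": 11},
]

def is_super_key(attrs, rows):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)

def is_candidate_key(attrs, rows):
    """A candidate key is a super key with no smaller super key inside it."""
    if not is_super_key(attrs, rows):
        return False
    return not any(
        is_super_key(subset, rows)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_super_key(("id", "name"), rows))      # unique, but not minimal
print(is_candidate_key(("id", "name"), rows))  # contains the smaller key (id)
print(is_candidate_key(("ssn",), rows))
```

Adding attributes to a candidate key always yields a super key, which is why every candidate key is a super key but not the other way round.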
CHAPTER 4
RELATIONAL ALGEBRA AND
RELATIONAL CALCULUS
Select (σ)
Select (σ) is a unary relational operation. It retrieves the
horizontal subset (row subset) of the relation that meets the
given conditions. The conditions can include comparison operators
such as <, >, <=, >=, =, and != to filter tuples from the
relation. It may
Project (∏)
Project (∏) – This unary operator is similar to the select
operation described above in that it generates a subset of the
relation. However, it keeps only the chosen columns/attributes of
the relation, that is, a vertical subset of the relation. The
select operation, by contrast, produces a subset of the tuples
with all of their attributes. It is written as follows:
Union (U)
Union (U) – Union is a binary operator that combines the
tuples of two relations. It is represented by
R U S
where R and S are relations and U is the operator.
Design_Employee U Testing_Employee
It differs from Cartesian products in the following ways:
Set-difference (-)
Set-difference (-) – This is a binary operator. It generates a
new relation containing the tuples that are in one relation but
not in the other. The '-' symbol represents it.
R – S
where R and S denote the relations.
Assume we want to find staff working in the Design
department but not in research.
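Union and set-difference map directly onto Python's set operators, which gives a quick way to experiment with them. The employee names are made up, and the second relation is called testing_employee here to match the earlier Design_Employee U Testing_Employee expression.

```python
# Each relation is modelled as a set of tuples; the rows are hypothetical.
design_employee = {("Guru",), ("Raj",), ("Naresh",)}
testing_employee = {("Raj",), ("Venu",)}

# R U S: every tuple from either relation, with duplicates collapsed.
union = design_employee | testing_employee

# R - S: tuples that appear in R but not in S.
design_only = design_employee - testing_employee

print(sorted(union))
print(sorted(design_only))
```

Note that the difference is not symmetric: testing_employee - design_employee would return the staff who are only in Testing.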
Assignment
The assignment operator '←' is used to assign the result of
a relational operation to a temporary relation variable, as the
name implies. This is useful when a relational operation involves
several steps and is difficult to handle in a single
expression. Assigning the result to a temporary relation and
then using this temporary relation in the next operation
simplifies the job.
T ← S denotes that relation S is assigned to temporary relation T.
A relational operation ∏ a1, a2 (σ p (E)) with selection and
projection can be divided as below.
T ← σ p (E)
S ← ∏ a1, a2 (T)
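The two-step form can be mimicked in Python with one helper per operator and an ordinary variable for each temporary relation. This is only a sketch: the relation E, its attributes, and the predicate are assumptions chosen for illustration.

```python
# Hypothetical relation E, modelled as a list of dictionaries (one per tuple).
E = [
    {"emp_id": 1, "name": "Guru", "dept_id": 10},
    {"emp_id": 2, "name": "Raj", "dept_id": 20},
    {"emp_id": 3, "name": "Venu", "dept_id": 10},
]

def select(relation, predicate):
    """Selection: keep the tuples (rows) that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """Projection: keep only the listed attributes and drop duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

T = select(E, lambda t: t["dept_id"] == 10)  # T <- selection on E
S = project(T, ["name", "dept_id"])          # S <- projection on T

print(S)
```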
4.1.1 JOINS
Natural join – As previously said, the Cartesian product
essentially combines the attributes of two relations into one.
However, the resulting relation does not contain only valid
tuples; it contains every combination of tuples. We must
perform a selection on the Cartesian product result to get the
right tuples. This sequence of operations – Cartesian product
followed by selection – is merged into a single operation known
as natural join. It is denoted by
R ⋈ S
Assume we want to choose staff from Department 10. Then
we will do a Cartesian product on EMPLOYEE and DEPT and select
the tuples in which DEPT_ID matches in both relations and equals
10. Without a natural join, this is written as
σ EMPLOYEE.DEPT_ID = DEPT.DEPT_ID AND EMPLOYEE.DEPT_ID =
10 (EMPLOYEE X DEPT)
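The "Cartesian product followed by selection" reading of a natural join can be traced step by step in Python. The rows and the dept_name attribute are hypothetical; only EMPLOYEE, DEPT, and DEPT_ID come from the text.

```python
from itertools import product

# Hypothetical tuples for the two relations.
employee = [
    {"emp_id": 1, "name": "Guru", "dept_id": 10},
    {"emp_id": 2, "name": "Raj", "dept_id": 20},
]
dept = [
    {"dept_id": 10, "dept_name": "Design"},
    {"dept_id": 20, "dept_name": "Testing"},
]

# Cartesian product: every employee tuple paired with every department tuple.
pairs = list(product(employee, dept))

# Selection keeps the pairs that agree on dept_id; merging the two
# dictionaries leaves a single copy of the shared attribute.
natural_join = [{**e, **d} for e, d in pairs if e["dept_id"] == d["dept_id"]]

# A further selection restricts the result to Department 10.
dept_10 = [t for t in natural_join if t["dept_id"] == 10]

print(len(pairs), len(natural_join))
print(dept_10)
```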
Left outer join – This operation keeps all the tuples of the
left-hand side relation. The matching attributes from the
right-hand relation are displayed with their values. Those that
have no matching value are displayed as NULL.
4.2.1 TRC
Tuple relational calculus (TRC) is a simple subset of first-order
logic that filters tuples based on defined conditions. TRC
treats tuples as variables, and field references pick out the
components of a tuple. An expression consists of a tuple variable
such as T, followed by the pipe sign and the conditions, all
enclosed in curly braces.
Syntax of TRC:
{T | Conditions}
The TRC syntax allows you to denote relation names, define
tuple variables, and refer to column names. A column name is
qualified with the tuple variable using the '.' operator symbol.
A tuple variable name, such as T, is used to specify the
relation in TRC. TRC relation specification syntax:
Relation(T)
E.g., if the relation name is Product, it can be denoted as
Product(T). Similarly, TRC allows you to specify conditions.
A condition applies to a certain attribute or column.
For example, suppose data for a certain product id of value
10 must be retrieved. In that case, the condition can be denoted
as T.Product_id = 10, where T is the tuple variable representing
a row of the table. Let us assume the Product table in the
database as follows:
4.2.2 DRC
Domain relational calculus (DRC) is based on filtering by
domains and attributes. In DRC, variables range over the domain
elements, that is, the field values. Like TRC, it is a simple
subset of first-order logic. It is domain-dependent as opposed to
TRC, which is tuple-dependent. In DRC expressions, the variables
are stated explicitly. The domain variables are denoted as
c1, c2, ..., cn. The condition on these variables is denoted as
the formula F(c1, c2, ..., cn), which specifies which values are
fetched.
Syntax of DRC in DBMS
{c1, c2,...,cn| F(c1, c2,... ,cn)}
Let us assume the same Product table in the database as
follows:
Product_id  Product Category  Product Name  Unit Price
8           New               TV Unit 1     $100
10          New               TV Unit 2     $120
12          Existing          TV Cabinet    $77
The DRC expression for the product name and product id from the
Product table, where the product id is greater than 10, is
denoted as:
{<Product_Name, Product_id> | <Product_Name, Product_id> ∈ Product ∧ Product_id > 10}
The result of the domain relational calculus for the Product
table will be
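As a rough illustration only, Python comprehensions read much like calculus expressions and can be run against the three Product rows shown above. The field names are adapted from the table headings, and the conditions (product id equal to 10 for the tuple form, greater than 10 for the domain form) are taken from the expressions in the text.

```python
from collections import namedtuple

# Field names are adapted from the Product table headings.
Product = namedtuple("Product", "product_id category name unit_price")
product = [
    Product(8, "New", "TV Unit 1", 100),
    Product(10, "New", "TV Unit 2", 120),
    Product(12, "Existing", "TV Cabinet", 77),
]

# TRC style: the variable T ranges over whole tuples of the relation.
trc = [T for T in product if T.product_id == 10]

# DRC style: each variable ranges over the values of one field.
drc = [(name, pid) for (pid, category, name, price) in product if pid > 10]

print(trc)
print(drc)
```

The difference is only in what the variables stand for: whole rows in the first comprehension, individual column values in the second.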
CHAPTER 5
BASIC OF SQL
Creating Database
1. To build a database, enter the following command at the prompt:
CREATE DATABASE database_name;
For example, to create a database to store the tables:
CREATE DATABASE stud;
2. Enter the following command to select the database to work with:
USE database_name;
For example, to use the stud database created above, give the
command:
USE stud;
COUNT() Function
Count returns the number of rows in the table, either with or
without a condition.
Its general syntax is as follows:
FIRST () Function
The first function returns the first value of a selected
column.
Using FIRST () function
LAST () Function
The LAST function returns the last value of the chosen
column.
Syntax of the LAST function is,
Using LAST () function
last(salary)
8000
MAX() Function
The MAX function returns the highest value from a table
column.
SUM() Function
The SUM function returns the total of the numeric
values in a given column.
Syntax for SUM is,
SELECT SUM(column_name) from table-name;
Using SUM() function
Consider the following Emp table
SQL query to find sum of salaries will be,
SELECT SUM(salary) FROM emp;
Result of the above query is,
SUM(salary)
41000
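The aggregate functions can be tried out with Python's sqlite3 module. The Emp rows below are assumptions: the salaries were chosen so that they add up to the 41000 shown above, and the ids and names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (eid INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
# Hypothetical rows; only the salary total matches the result in the text.
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    (401, "Anu", 9000),
    (402, "Shane", 8000),
    (403, "Rohan", 6000),
    (404, "Scott", 10000),
    (405, "Tiger", 8000),
])

count_all, max_salary, sum_salary = conn.execute(
    "SELECT COUNT(*), MAX(salary), SUM(salary) FROM emp").fetchone()

# COUNT with a condition counts only the rows that satisfy it.
count_high = conn.execute(
    "SELECT COUNT(*) FROM emp WHERE salary >= 8000").fetchone()[0]

print(count_all, max_salary, sum_salary, count_high)
```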
Syntax of UCASE,
SELECT UCASE(column_name) from table-name;
Using UCASE() function
Consider the following Emp table
SQL query for using UCASE is,
LCASE() Function
The LCASE function translates the values of string columns
to lowercase characters.
MID() Function
The MID function derives substrings from string-style
column values in a table.
Syntax for MID function is,
SELECT MID(column_name, start, length) from table-name;
Using MID() function
Consider the following Emp table
SQL query will be,
SELECT MID(name,2,2) FROM emp;
Result will come out to be,
MID(name,2,2)
ROUND() Function
The ROUND function is used to round a numeric field to the
specified number of decimal places, or to the nearest integer
when no number of decimals is given. It is applied to decimal
point values.
Syntax of Round function is,
SELECT ROUND(column_name, decimals) from table-name;
Using ROUND() function
Consider the following Emp table
SQL query is,
SELECT ROUND(salary) from emp;
Result will be,
ROUND(salary)
9001
8001
6000
10000
8000
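The scalar functions can be sketched in SQLite as well, with one caveat: SQLite does not provide UCASE, LCASE, or MID under those names, so the example swaps in its equivalents UPPER, LOWER, and SUBSTR. The single Emp row is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary REAL)")
conn.execute("INSERT INTO emp VALUES ('Rohan', 6000.4)")  # hypothetical row

# UPPER / LOWER / SUBSTR stand in for UCASE / LCASE / MID.
upper, lower, middle, rounded = conn.execute(
    "SELECT UPPER(name), LOWER(name), SUBSTR(name, 2, 2), ROUND(salary) FROM emp"
).fetchone()

print(upper, lower, middle, rounded)
```

SUBSTR(name, 2, 2) starts at the second character and takes two characters, which mirrors the MID(name,2,2) call above.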
(i)Unique Constraint
This restriction ensures that no two rows have the same
value in the stated columns. For example, using the UNIQUE
constraint on the Admno of student table ensures that no two
students have the same admission number, and the constraint
can be used as follows:
question is from keyword. The table list will include all of the
tables that must be accessed during query execution. So far, the
table-list has only included one table since our queries have only
ever accessed one table. However, if you want to mention
employee numbers and names alongside department names, the
FROM clause would look like this:
However, listing both the EMP and DEPT tables after the
FROM keyword is insufficient to achieve the desired
result. We do not just want the tables to be accessed in
the query; we want the way they are accessed to be coordinated
in a specific way. We would like to link the display of a
department name to the display of the numbers and names of the
employees who work in that department. As a result, we need to
link employee records in the EMP table to department records in
the DEPT table. The relational operator JOIN is used in SQL to
accomplish this. The JOIN is a fundamental concept in
relational databases and, by extension, the SQL language.
Logically combining or relating data from different tables is
required in almost all applications, which is why it is such
a central concept. The ability to consistently connect data from
various tables has been a key factor in the widespread adoption
of relational database systems.
A peculiar feature of performing JOINs, or relating
information from different tables logically as required in the
above query, is that, although the method is universally referred
to as performing a JOIN, the way it is represented in SQL does
not always involve the use of the word JOIN. This can be
particularly perplexing for newcomers to JOINs. To satisfy the
query above, for example, we code the join condition in the
WHERE clause as follows:
SELECT EMPNO,ENAME,DNAME
FROM EMP,DEPT
WHERE EMP.DEPTNO = DEPT.DEPTNO;
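The effect of the join condition can be seen by running the query with and without it. In this sqlite3 sketch the DEPTNO join column and the sample rows are assumptions; only the table names and the selected columns come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; DEPTNO is assumed to be the column shared by both tables.
conn.executescript("""
CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE emp  (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH');
INSERT INTO emp  VALUES (7782, 'CLARK', 10), (7369, 'SMITH', 20);
""")

# Listing both tables alone pairs every employee with every department.
without_condition = conn.execute(
    "SELECT empno, ename, dname FROM emp, dept").fetchall()

# The WHERE clause keeps only the pairs whose department numbers match.
with_condition = conn.execute(
    "SELECT empno, ename, dname FROM emp, dept "
    "WHERE emp.deptno = dept.deptno ORDER BY empno").fetchall()

print(len(without_condition))
print(with_condition)
```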
Outer JOINs
In addition to the basic form of the JOIN, also known as a
NATURAL JOIN, which is used to connect matching rows in
different tables, we often need a little more syntax than we have
seen so far to get all of the details we need. Assume we want to
list all departments with their employee numbers and names,
including any departments that do not have any employees.
As a first attempt, we might code:
ORDER BY DEPT.DEPTNO;
However, the findings of this first attempt do not provide a
full response to the original question. Department 40, titled
Operations, has no staff assigned to it, but it does not appear in
the results.
The issue here is that the simple JOIN only extracts matching
instances of records from the joined tables. Something else is
required to force in any record instances that do not match a
record in the other table. To do this, we use a construct known
as an OUTER JOIN in situations where we want to force both the
rows that match and the rows that do not match a typical JOIN
condition into our result set. There are three types
of OUTER JOINS: LEFT, RIGHT, and FULL OUTER JOINS. The
following tables will be used to illustrate the OUTER JOINS.
Person table
The car table holds information about cars. The REG is the
primary key. A car can have an owner or not.
JOIN keyword in the question. For example, list all cars and
their owner's identification and name, including any cars that no
one owns.
SELECT REG,MODEL,ID,NAME
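One way such a query could be completed is sketched below with sqlite3. The OWNER column that links a car to a person, and all of the sample rows, are assumptions made for illustration; REG, MODEL, ID, and NAME come from the select list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# OWNER is a hypothetical column; a NULL owner means nobody owns the car.
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE car (
    reg   TEXT PRIMARY KEY,
    model TEXT,
    owner INTEGER REFERENCES person(id)
);
INSERT INTO person VALUES (1, 'Guru'), (2, 'Raj');
INSERT INTO car VALUES ('AP01', 'Sedan', 1), ('TS02', 'Hatchback', NULL);
""")

# A plain join drops the car that has no owner.
inner = conn.execute(
    "SELECT reg, model, id, name FROM car JOIN person "
    "ON car.owner = person.id ORDER BY reg").fetchall()

# LEFT OUTER JOIN keeps every car and fills the missing owner with NULLs.
left_outer = conn.execute(
    "SELECT reg, model, id, name FROM car LEFT OUTER JOIN person "
    "ON car.owner = person.id ORDER BY reg").fetchall()

print(inner)
print(left_outer)
```

Swapping the table order (or using RIGHT OUTER JOIN where the engine supports it) would instead keep every person, including those who own no car.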
Self Joins
To compare records from the same table, it is often
necessary to JOIN a table to itself. For example, suppose we
want to compare the salaries of individual employees with one
another.
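As a rough illustration, a self join gives the same table two aliases so that each row can be compared with every other row. The EMP columns and rows below are hypothetical, and the sketch uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, SAL INTEGER);
-- Hypothetical sample rows
INSERT INTO EMP VALUES (1, 'ALLEN', 1600), (2, 'WARD', 1250), (3, 'KING', 5000);
""")

# A and B are two aliases of the same table; each pair lists an employee
# together with a colleague who earns less.
pairs = conn.execute("""
    SELECT A.ENAME, B.ENAME
    FROM EMP A JOIN EMP B ON A.SAL > B.SAL
    ORDER BY A.ENAME, B.ENAME
""").fetchall()
print(pairs)
```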
Sample Tables:
Student Details
S_ID  NAME    ADDRESS
1     GURU    CHENNAI
2     KUMAR   GUNTUR
3     NARESH  NELLORE
4     VENU    GUDUR
Student Marks
S_ID  NAME    MARKS  AGE
1     GURU    86     19
2     KUMAR   91     20
3     NARESH  87     23
4     VENU    81     22
Creating Views
Using the CREATE VIEW statement, we can create a View. A
View can be built from a single table or from several tables.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
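The syntax above can be exercised with a small sketch using Python's sqlite3 module and the Student Details sample table. The view name FirstThreeView and its condition S_ID < 4 are hypothetical, chosen only to show that a view stores a query rather than data; the sketch also drops the view at the end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StudentDetails (S_ID INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT);
INSERT INTO StudentDetails VALUES
    (1, 'GURU', 'CHENNAI'), (2, 'KUMAR', 'GUNTUR'),
    (3, 'NARESH', 'NELLORE'), (4, 'VENU', 'GUDUR');
-- Hypothetical view over a single table
CREATE VIEW FirstThreeView AS
    SELECT NAME, ADDRESS FROM StudentDetails WHERE S_ID < 4;
""")

view_rows = conn.execute("SELECT * FROM FirstThreeView ORDER BY NAME").fetchall()
print(view_rows)

# Dropping the view removes only the stored query, not the base table.
conn.execute("DROP VIEW FirstThreeView")
remaining = conn.execute("SELECT COUNT(*) FROM StudentDetails").fetchone()[0]
print(remaining)
```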
Naresh Nellore 87
Venu Gudur 81
Deleting Views
We learned how to create a View, but what if a View we
created is no longer required? We will want to get rid of it.
SQL lets us remove an existing View with the DROP VIEW
statement.
Syntax:
DROP VIEW view_name;
Updating Views
Certain requirements must be met to update a view. If any
of these criteria are not met, we will be unable to update the
view.
1. The GROUP BY and ORDER BY clauses should not be included
in the SELECT statement used to build the view.
2. The DISTINCT keyword should not be included in the SELECT
statement.
3. The View should include all NOT NULL columns of the base
table.
4. Nested or complex queries should not be used to build the
view.
Syntax:
CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
For example, if we want to update the view MarksView and add
the field AGE to it from the StudentMarks table, we can do this
as:
CREATE OR REPLACE VIEW MarksView AS
SELECT StudentDetails.NAME, StudentDetails.ADDRESS,
StudentMarks.MARKS, StudentMarks.AGE
FROM StudentDetails, StudentMarks
WHERE StudentDetails.NAME = StudentMarks.NAME;
If we now fetch all the data from MarksView:
SELECT * FROM MarksView;
Output:
NAME ADDRESS MARKS AGE
GURU CHENNAI 86 19
KUMAR GUNTUR 91 20
NARESH NELLORE 87 23
VENU GUDUR 81 22
Syntax:
INSERT INTO view_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
view_name: Name of the View
Example:
In the example below, we insert a new row into the view
DetailsView that we created above in the example of "creating
views from a single table".
INSERT INTO DetailsView (NAME, ADDRESS)
VALUES ('Suresh', 'Gurgaon');
If we now fetch all the data from DetailsView,
SELECT * FROM DetailsView;
Output:
NAME ADDRESS
GURU CHENNAI
KUMAR GUNTUR
NARESH NELLORE
VENU GUDUR
Suresh Gurgaon
the row from the actual table, and the change is reflected in
the view.
DELETE FROM view_name
WHERE condition;
view_name: Name of the view from which we want to delete
rows
condition: Condition to select rows
Example:
In this example, we delete the last row from the view
DetailsView, which we added in the above example of inserting
rows.
DELETE FROM DetailsView
WHERE NAME = 'Suresh';
If we now fetch all the data from DetailsView,
SELECT * FROM DetailsView;
Output:
NAME ADDRESS
GURU CHENNAI
KUMAR GUNTUR
NARESH NELLORE
VENU GUDUR
Example:
In the example below, we create a View SampleView from the
StudentDetails table with the WITH CHECK OPTION clause.
CREATE VIEW SampleView AS
SELECT S_ID, NAME
FROM StudentDetails
WHERE NAME IS NOT NULL
WITH CHECK OPTION;
If we now try to insert a new row with a null value in the NAME
column through this view, it will give an error because the view
is created with the condition that NAME is NOT NULL. For
example, although the View is updatable, the following statement
is not valid for it:
INSERT INTO SampleView (S_ID)
VALUES (6);
NOTE: The default value of the NAME column is null.
Uses of a View:
A well-designed database should provide views for the
following reasons:
1. Limiting data access –
Views provide an extra layer of protection to a table by
limiting access to a predefined collection of rows and columns.
You can group several SQL queries and execute them all at
once as part of a transaction.
Transactional Properties
Transactions have the four standard properties mentioned
below, often referred to by the acronym ACID.
Atomicity guarantees that all activities within the work unit are
completed. Otherwise, the transaction is aborted at the point of
failure, and all prior operations are reverted to their previous
state.
Consistency ensures that the database switches states correctly
after a successfully committed transaction.
Isolation allows transactions to run independently and
transparently to one another.
Durability guarantees that the outcome or consequence of a
committed transaction is preserved in the event of a system
failure.
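Atomicity can be observed with a small sketch in Python's sqlite3 module. The ACCOUNT table, its CHECK constraint, and the transfer() helper are hypothetical; the point is only that when the second statement of a transaction fails, the first one is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ACCOUNT (NAME TEXT PRIMARY KEY, "
    "BALANCE INTEGER CHECK (BALANCE >= 0))"
)
conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)",
                 [("checking", 100), ("savings", 50)])
conn.commit()

def transfer(amount):
    """Move money from checking to savings as one unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE ACCOUNT SET BALANCE = BALANCE + ? "
                         "WHERE NAME = 'savings'", (amount,))
            conn.execute("UPDATE ACCOUNT SET BALANCE = BALANCE - ? "
                         "WHERE NAME = 'checking'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft

ok = transfer(30)       # both updates are committed together
failed = transfer(500)  # second update fails, so the first is rolled back
balances = dict(conn.execute("SELECT NAME, BALANCE FROM ACCOUNT"))
print(ok, failed, balances)
```

After the failed transfer, the savings balance does not keep the 500 that was added before the error.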
5.11 Nested Queries
Nested queries are one of SQL's most powerful features. A
nested query contains another query; the embedded query is
called a subquery. The embedded query can itself be a nested
query, allowing for queries with deeply nested structures. We
occasionally need to express a condition in a query that refers
to a table that must itself be computed. The query used to
compute this subsidiary table is a subquery and is included as
part of the main query. A subquery is typically found in a
query's WHERE clause. Subqueries can occasionally appear in the
FROM or HAVING clauses.
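As an illustration, the sketch below uses a subquery in the WHERE clause to find students who scored above the average. It runs on Python's sqlite3 module with the Student Marks sample rows from the previous chapter sections; the table name StudentMarks is written without a space as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StudentMarks (S_ID INTEGER PRIMARY KEY, NAME TEXT, MARKS INTEGER, AGE INTEGER);
INSERT INTO StudentMarks VALUES
    (1, 'GURU', 86, 19), (2, 'KUMAR', 91, 20),
    (3, 'NARESH', 87, 23), (4, 'VENU', 81, 22);
""")

# The inner query computes the average (86.25); the outer query
# compares each row against that computed value.
above_avg = conn.execute("""
    SELECT NAME FROM StudentMarks
    WHERE MARKS > (SELECT AVG(MARKS) FROM StudentMarks)
    ORDER BY NAME
""").fetchall()
print(above_avg)
```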
AND
condition1  condition2  Result
F           F           F
F           T           F
F           U           F
T           F           F
T           T           T
T           U           U
U           U           U
OR
condition1  condition2  Result
F           F           F
F           T           T
F           U           U
T           F           T
T           T           T
T           U           T
U           U           U
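The two truth tables can be reproduced with a short Python sketch. It is only a model of SQL's three-valued logic: Python's None stands in for the unknown value U, and the function names and3 and or3 are made up for this example.

```python
def and3(a, b):
    """Three-valued AND: False dominates, then unknown (None)."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """Three-valued OR: True dominates, then unknown (None)."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

# Print the rows of both tables, using F/T/U as in the text.
label = {False: "F", True: "T", None: "U"}
for a in (False, True, None):
    for b in (False, True, None):
        print(label[a], label[b], label[and3(a, b)], label[or3(a, b)])
```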
CHAPTER 6
PL/SQL AND ADVANCED SQL
6.1 Introduction to PL/SQL
PL/SQL is a combination of SQL and the procedural
features of programming languages. Oracle Corporation
developed it in the early 1990s to enhance the capabilities of
SQL. PL/SQL is one of three key programming languages embedded
in the Oracle Database, along with SQL itself and Java.
SQL's disadvantages include:
SQL does not provide programmers with condition checking,
looping, or branching techniques.
SQL statements are sent to the Oracle engine one at a time,
increasing traffic and decreasing speed.
SQL does not support error checking when manipulating data.
BEGIN
executable statements
EXCEPTION
exception handling statements
END;
var3 varchar2(20) ;
BEGIN
null;
END;
/
Output:
PL/SQL procedure successfully completed.
Explanation:
SET SERVEROUTPUT ON: It enables display of the buffer used by
dbms_output.
var1 INTEGER: It is the declaration of a variable named var1,
which is of integer type. Many other data types can be used,
such as float, int, real, smallint, and long. PL/SQL also
supports the data types used in SQL, such as NUMBER(prec,
scale), varchar, and varchar2.
PL/SQL procedure successfully completed.: It is displayed when
the code is compiled and executed successfully.
Slash (/) after END;: The slash (/) tells SQL*Plus to execute
the block.
INITIALISING VARIABLES:
The variables can also be initialised just like in other
programming languages. Let us see an example for the same:
null;
END;
/
Output:
PL/SQL procedure successfully completed.
Explanation:
Assignment operator (:=) : It assigns a value to a variable.
Displaying Output:
The outputs are displayed using DBMS_OUTPUT, a built-in
package that enables users to display output, debug
information, and send messages from PL/SQL blocks,
subprograms, packages, and triggers.
Let us see an example to see how to display a message using
PL/SQL :
Output:
I love GeeksForGeeks
2. Using Comments:
As in many other programming languages, comments can be
placed in PL/SQL code without affecting how the code runs.
There are two syntaxes for creating comments in PL/SQL:
Single Line Comment: The symbol -- is used to create a
single-line comment.
Multi Line Comment: To create comments that span several
lines, the symbols /* and */ are used.
Example to show how to create comments in PL/SQL :
Output:
I love GeeksForGeeks
Output:
Enter value for a: 24
old 2: a number := &a;
new 2: a number := 24;
Enter value for b: 'GeeksForGeeks'
old 3: b varchar2(30) := &b;
new 3: b varchar2(30) := 'GeeksForGeeks';
Sum of 2 and 3 is = 5
PL/SQL construct that allows the user to name the work area
and access the stored information.
Use of Cursor
The major function of a cursor is to retrieve data, one row at
a time, from a result set, unlike the SQL commands, which
operate on all the rows in the result set at one time.
Cursors are used when the user needs to update records in a
singleton fashion, that is, row by row, in a database table.
The data stored in the cursor is called the Active Data Set.
Oracle DBMS has a predefined area in main memory within which
the cursors are opened. Hence the size of the cursor is limited
by the size of this predefined area.
Cursor Actions
Declare Cursor: A cursor is declared by defining the SQL
statement that returns a result set.
Open: A Cursor is opened and populated by executing the SQL
statement defined by the cursor.
Fetch: When the cursor is opened, rows can be fetched from the
cursor one by one or in a block to perform data manipulation.
Close: After data manipulation, close the cursor explicitly.
Deallocate: Finally, delete the cursor definition and release all
the system resources associated with the cursor.
Syntax:
DECLARE variables;
records;
create a cursor;
BEGIN
OPEN cursor;
FETCH cursor;
process the records;
CLOSE cursor;
END;
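The same declare, open, fetch, process, close sequence can be followed in a rough analogue written with Python's sqlite3 module. This is not PL/SQL: the cursor here is a Python DB-API cursor, and the EMP table and its rows are hypothetical. It is meant only to show row-at-a-time processing of a result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, SAL INTEGER);
-- Hypothetical sample rows
INSERT INTO EMP VALUES (1, 'ALLEN', 1600), (2, 'WARD', 1250);
""")

cur = conn.cursor()                                       # declare the cursor
cur.execute("SELECT ENAME, SAL FROM EMP ORDER BY EMPNO")  # open: run the query

raised = []
while True:
    row = cur.fetchone()             # fetch one row at a time
    if row is None:                  # no more rows in the active set
        break
    name, sal = row
    raised.append((name, sal + 100)) # process the record

cur.close()                          # close the cursor explicitly
print(raised)
```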
Creating a Procedure
A procedure is created with the CREATE OR REPLACE
PROCEDURE statement. The simplified syntax for the CREATE
OR REPLACE PROCEDURE statement is as follows −
CREATE [OR REPLACE] PROCEDURE procedure_name
[(parameter_name [IN | OUT | IN OUT] type [, ...])]
{IS | AS}
BEGIN
< procedure_body >
END procedure_name;
In this case,
procedure_name specifies the name of the procedure.
The [OR REPLACE] option allows the modification of an existing
procedure.
The optional parameter list contains the names, modes, and
types of the parameters. IN represents a value passed in from
outside, and OUT represents a parameter used to return a value
outside of the procedure.
procedure_body contains the executable part.
The AS keyword is used instead of the IS keyword for creating a
standalone procedure.
Example
The example below shows how to write a basic procedure
that displays the string 'Hello World!' on the screen when
executed.
When the above code is run via the SQL prompt, it yields the
following result:
Executing a Standalone Procedure
A standalone procedure can be called in two ways −
Using the EXECUTE keyword
Calling the name of the procedure from a PL/SQL block
The above procedure named 'greetings' can be called with the
EXECUTE keyword as −
EXECUTE greetings;
It can also be called from a PL/SQL block −
BEGIN
  greetings;
END;
/
The above call will display −
Hello World
PL/SQL procedure successfully completed.
Deleting a Standalone Procedure
A standalone procedure is deleted with the DROP PROCEDURE
statement. The syntax for deleting a procedure is as follows:
DROP PROCEDURE procedure_name;
You can drop the greetings procedure by using the following
statement −
DROP PROCEDURE greetings;
PL/SQL Subprogram Parameter Modes
since quarter has four values and region has two, the resulting
multiset would have 4*2+4*1+1*2+1 or 15 tuples, as seen in
Table 1. NULL values have been added in the dimension columns
Quarter and Region to denote the aggregation. If required, they
can easily be replaced by the more meaningful 'ALL'. To do so,
we can add two CASE clauses as follows:
SELECT CASE WHEN GROUPING(QUARTER) = 1 THEN 'All'
            ELSE QUARTER END AS QUARTER,
       CASE WHEN GROUPING(REGION) = 1 THEN 'All'
            ELSE REGION END AS REGION,
       SUM(SALES)
FROM SALESTABLE
GROUP BY CUBE (QUARTER, REGION)
The GROUPING() function returns 1 if a NULL value was generated
during the aggregation; otherwise, it returns 0. This
distinguishes the generated NULLs from any actual NULLs present
in the data. We will not add this to the subsequent OLAP
queries, to avoid overcomplicating them.
Also, observe the NULL value for Sales in the fifth row. This
represents an attribute combination not present in the original
SALESTABLE since no products were sold in Q3 in Europe.
Note that besides SUM(), other SQL aggregate functions such as
MIN(), MAX(), COUNT(), and AVG() can be used in the SELECT
statement.
QUARTER  REGION   SALES
Q1       Europe   50
Q1       America  80
Q2       Europe   40
Q2       America  60
Q3       Europe   NULL
Q3       America  40
Q4       Europe   20
Q4       America  70
Q1       NULL     130
Q2       NULL     100
Q3       NULL     40
Q4       NULL     90
NULL     Europe   110
NULL     America  250
NULL     NULL     360
operator but not for the CUBE operator. Consider the following
query:
SELECT QUARTER, REGION, SUM(SALES)
FROM SALESTABLE
GROUP BY ROLLUP (QUARTER, REGION)
This query generates the union of three groupings {(quarter,
region), (quarter), ()}, where () again represents the full
aggregation. The resulting multiset will thus have 4*2+4+1 or 13
rows and is displayed in Table 2. You can see that the region
dimension is rolled up first, followed by the quarter dimension.
Note the two rows that have been left out compared to the result
of the CUBE operator in Table 1.
UNION ALL
SELECT NULL, REGION, SUM(SALES)
FROM SALESTABLE
GROUP BY REGION
The result is given in Table 3.
Table 3: Result from SQL query with GROUPING SETS operator
QUARTER  REGION   SALES
Q1       NULL     130
Q2       NULL     100
Q3       NULL     40
Q4       NULL     90
NULL     Europe   110
NULL     America  250
A single SQL query can contain several CUBE, ROLLUP,
and GROUPING SETS clauses. Different combinations of CUBE,
ROLLUP, and GROUPING SETS can generate equivalent result sets.
Consider the following query:
SELECT QUARTER, REGION, SUM (SALES)
FROM SALESTABLE
GROUP BY CUBE (QUARTER, REGION)
This query is equivalent to:
SELECT QUARTER, REGION, SUM(SALES)
FROM SALESTABLE
GROUP BY GROUPING SETS ((QUARTER, REGION),
(QUARTER), (REGION), ())
Likewise, the following query:
SELECT QUARTER, REGION, SUM(SALES)
FROM SALESTABLE
GROUP BY ROLLUP (QUARTER, REGION)
is identical to:
SELECT QUARTER, REGION, SUM(SALES)
FROM SALESTABLE
GROUP BY GROUPING SETS ((QUARTER, REGION),
(QUARTER),())
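The equivalences above can be checked with a small Python model that expresses CUBE and ROLLUP as lists of grouping sets. This is a sketch, not how a DBMS evaluates these operators. The sales figures are assumptions chosen so that the totals add up to 360, and a zero stands in for the missing Q3/Europe combination so that all eight combinations appear; the helper names grouping_sets, cube, and rollup are made up for this example.

```python
from itertools import combinations

DIMS = ("QUARTER", "REGION")

# Hypothetical (quarter, region, sales) rows; 0 stands in for the Q3/Europe gap.
SALES = [
    ("Q1", "Europe", 50), ("Q1", "America", 80),
    ("Q2", "Europe", 40), ("Q2", "America", 60),
    ("Q3", "Europe", 0),  ("Q3", "America", 40),
    ("Q4", "Europe", 20), ("Q4", "America", 70),
]

def grouping_sets(rows, sets):
    """Aggregate once per grouping set; None marks a rolled-up column."""
    result = []
    for keep in sets:
        totals = {}
        for quarter, region, sales in rows:
            values = {"QUARTER": quarter, "REGION": region}
            key = tuple(values[d] if d in keep else None for d in DIMS)
            totals[key] = totals.get(key, 0) + sales
        result.extend((*key, total) for key, total in totals.items())
    return result

def cube(dims):
    """All subsets of the dimensions, from the finest grouping down to ()."""
    return [set(c) for n in range(len(dims), -1, -1)
            for c in combinations(dims, n)]

def rollup(dims):
    """Prefixes of the dimension list: (a, b), (a), ()."""
    return [set(dims[:n]) for n in range(len(dims), -1, -1)]

cube_rows = grouping_sets(SALES, cube(DIMS))
rollup_rows = grouping_sets(SALES, rollup(DIMS))
print(len(cube_rows), len(rollup_rows))
```

With these rows the model yields 15 rows for CUBE and 13 for ROLLUP; the two rows missing from ROLLUP are the per-region totals.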
Given the volume of data to be aggregated and retrieved,
OLAP SQL queries can become extremely time-consuming.
Turning some of these OLAP queries into materialized views is
one way to improve performance. For example, a SQL query with a
CUBE operator can be used to precompute aggregations on a set
of dimensions, which can then be saved as a materialized view.
A downside of view materialization is that additional work is
required to refresh these materialized views periodically.
However, it should be noted that most businesses are happy
with a near-current version of the data, so that
synchronization can be done overnight or at set time
intervals.
EMP_NAME CHAR(20))
;
Of course, this is a simplified implementation, and a
production hierarchy would almost certainly need several
more columns. However, this simple table is sufficient for our
purposes of studying recursion. To make the data in this table
match the data in our diagram, we will load it as follows:
MGR_ID EMP_ID EMP_NAME
-1 1 BIG BOSS
1 2 LACKEY
1 3 LIL BOSS
1 4 BOOTLICKER
2 5 GRUNT
3 6 TEAM LEAD
6 7 LOW MAN
6 8 SCRUB
The MGR_ID for the top-most node is set to a value
indicating that this row has no parent, in this case, -1. Now
that we have loaded the data, we can write a query to traverse
the hierarchy using recursive SQL. If we need to report on the
whole organizational structure under LIL BOSS, the following
recursive SQL using a CTE would suffice:
WITH EXPL (MGR_ID, EMP_ID, EMP_NAME) AS
(
SELECT ROOT.MGR_ID, ROOT.EMP_ID,
ROOT.EMP_NAME
FROM ORG_CHART ROOT
WHERE ROOT.MGR_ID = 3
UNION ALL
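A complete recursive query of this shape can be run with Python's sqlite3 module, as in the sketch below. The recursive member (everything after UNION ALL) and the aliases PARENT and CHILD are assumptions, since the listing above is incomplete; SQLite also requires the keyword RECURSIVE, which some other systems omit. The data is the ORG_CHART sample shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ORG_CHART (MGR_ID INTEGER, EMP_ID INTEGER, EMP_NAME TEXT);
INSERT INTO ORG_CHART VALUES
    (-1, 1, 'BIG BOSS'), (1, 2, 'LACKEY'), (1, 3, 'LIL BOSS'),
    (1, 4, 'BOOTLICKER'), (2, 5, 'GRUNT'), (3, 6, 'TEAM LEAD'),
    (6, 7, 'LOW MAN'), (6, 8, 'SCRUB');
""")

reports = conn.execute("""
    WITH RECURSIVE EXPL (MGR_ID, EMP_ID, EMP_NAME) AS (
        -- Anchor member: direct reports of LIL BOSS (EMP_ID 3)
        SELECT ROOT.MGR_ID, ROOT.EMP_ID, ROOT.EMP_NAME
        FROM ORG_CHART ROOT
        WHERE ROOT.MGR_ID = 3
        UNION ALL
        -- Recursive member (assumed): reports of rows already found
        SELECT CHILD.MGR_ID, CHILD.EMP_ID, CHILD.EMP_NAME
        FROM EXPL PARENT JOIN ORG_CHART CHILD
             ON PARENT.EMP_ID = CHILD.MGR_ID
    )
    SELECT MGR_ID, EMP_ID, EMP_NAME FROM EXPL ORDER BY EMP_ID
""").fetchall()
print(reports)
```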
CHAPTER 7
QUERY PROCESSING
7.1 Introduction
The primary goal of creating a database is to store related
data in one location, allowing the user to access and manipulate
it as needed. Data access and manipulation should be efficient,
that is, easy and quick.
A database is a system, and its user can be another system,
an application, or a person. Users would like to request data in
a language they understand, but the DBMS has its own language,
SQL, and users must query the database in that language. SQL is
a high-level language designed to bridge the gap between the
user and the database management system. However, the
underlying systems in the DBMS do not comprehend SQL directly;
they need a lower-level representation. Typically, an SQL query
is converted into such a representation using relational
algebra. Users, however, are not expected to write relational
algebra queries directly, since doing so requires a thorough
understanding of it.
As a result, the DBMS asks its users to write queries in SQL.
It validates the user's code and then converts it to a low-level
form. It then chooses the best execution path, runs the query,
and retrieves the data from its internal storage. Together,
these steps are referred to as query processing.
SQL reads much like plain English, so all parties can read it.
The user will therefore write the request in SQL as follows:
SELECT STD_ID, STD_NAME, ADDRESS, DOB
FROM STUDENT s, CLASS c
WHERE s.CLASS_ID = c.CLASS_ID
AND c.CLASS_NAME = 'DESIGN_01';
When the user issues this query, the DBMS reads it and
transforms it into a format that the DBMS can use to process it
further. This is the parsing and translation step of query
processing. The query processor scans the submitted SQL query
and partitions it into individual meaningful tokens. The tokens
in our example are 'SELECT STD_ID, STD_NAME, ADDRESS, DOB FROM',
'STUDENT s', 'CLASS c', 'WHERE', 's.CLASS_ID = c.CLASS_ID',
'AND', and 'c.CLASS_NAME = 'DESIGN_01''. The processor uses this
tokenized form of the query for further processing. It queries
the data dictionary tables to determine whether the tables and
columns named in these tokens exist. If they are not in the data
dictionary, the submitted query fails at this stage. Otherwise,
it checks whether the syntax of the query is valid. Please keep
in mind that it does not check whether DESIGN_01 is present in
the table; rather, it verifies that 'SELECT', 'FROM', 'WHERE',
's.CLASS_ID = c.CLASS_ID', 'AND', and the other parts of the
query follow SQL-defined syntax. After validating the syntax, it
transforms the query into relational algebra, relational tree,
and graph representations. These are easy to process and are
handled by the optimizer in the next step. The query above can
be translated into either of the two relational algebra forms
below. The first form identifies the students in the DESIGN_01
class and then extracts only the requested columns. The second
form first extracts the desired columns from
A query spends most of its time accessing data from storage.
Several factors determine the cost of this access: disk I/O
time, CPU time, network access time, and so on. Disk access time
is the time the processor takes to search for and find a record
in secondary memory and return the result. It accounts for most
of the time spent processing a query; the other times can be
ignored in comparison.
When calculating the disk I/O time, only two factors are usually
considered: seek time and transfer time. Seek time is the time
the processor takes to locate a single record on disk and is
represented by tS. For example, to find the student ID of a
student 'John', the processor accesses the disk according to the
index and the file organization method. The time it takes to
reach the disk block and search for his ID is the seek time. The
time taken by the disk to return the fetched result to the
processor or user is called the transfer time and is represented
by tT.
Suppose a query needs S seeks to fetch its records, and B blocks
must be returned to the user. Then the disk I/O cost is
calculated as below:
(S * tS) + (B * tT)
It is the sum of the total time taken for S seeks and the total
time taken to transfer B blocks. Other costs, such as CPU and
RAM, are ignored because they are comparatively small; disk I/O
alone is considered the cost of a query. However, we have to
calculate the worst-case cost, that is, the maximum time taken
by the query in the worst case, such as when the buffer is full
or no buffers are available. The memory space or buffers
available depend on the number of queries executing in parallel.
All queries would be using the buffers and determining the
number of buffers / blocks
Disjunction: σθ1∨θ2∨…∨θn (r)
A disjunctive selection is satisfied by the union of all records
that satisfy the individual conditions θi.
Negation: The result of a selection σ¬θ (r) is the set of tuples
of the given relation r for which the selection condition θ
evaluates to false. In the absence of nulls, this set is just
the set of tuples of r that are not in σθ (r).
We can implement selection operations with the previously
mentioned selection predicates using the following algorithms:
Conjunctive selection using one index: In this implementation of
the selection operation, we first determine whether an access
path is available for an attribute in one of the simple
conditions. If one is found, one of the index-based selection
algorithms can retrieve the records that satisfy that condition.
The selection is completed by testing whether each retrieved
record satisfies the remaining simple conditions. The cost of
the chosen index algorithm gives the cost of this algorithm.
Conjunctive selection using a composite index: A composite index
is an index on several attributes. Such an index may be
available for some conjunctive selections. If the selection
specifies an equality condition on two or more attributes and a
composite index exists on these combined attribute fields, then
the index can be searched directly. The type of index determines
which index algorithm is appropriate.
Conjunctive selection by intersection of identifiers: This
implementation uses record pointers or record identifiers. It
requires indices with record pointers on the fields involved in
the individual conditions. Each index is searched for pointers
to tuples that satisfy an individual condition. The intersection
of all the retrieved pointers is then the set of pointers to the
tuples that satisfy the conjunctive condition. The
Selection Algorithms                         Cost                     Why So?
Linear Search                                tS + br * tT             It needs one initial seek with br block transfers.
Linear Search, Equality on Key               tS + (br/2) * tT         It is the average case where only one record satisfies the condition, so the scan terminates as soon as it is found.
Primary B+-tree index, Equality on Key       (hi + 1) * (tT + tS)     Each I/O operation needs one seek and one block transfer to fetch the record by traversing the tree's height.
Primary B+-tree index, Equality on Nonkey    hi * (tT + tS) + b * tT  It needs one seek for each level of the tree, and one seek for the first block.
Secondary B+-tree index, Equality on Key     (hi + 1) * (tT + tS)     Each I/O operation needs one seek and one block transfer to fetch the record by traversing the tree's height.
Secondary B+-tree index, Equality on Nonkey  (hi + n) * (tT + tS)     It requires one seek per record because each record may be on a different block.
Primary B+-tree index, Comparison            hi * (tT + tS) + b * tT  It needs one seek for each level of the tree, and one seek for the first block.
Secondary B+-tree index, Comparison          (hi + n) * (tT + tS)     It requires one seek per record because each record may be on a different block.
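A few of the cost formulas in the table can be evaluated with a short Python sketch built on the general disk I/O cost (S * tS) + (B * tT). The timing values (a 4000 µs seek and a 100 µs block transfer) and all function names are assumptions chosen for illustration; br is the number of blocks in the relation, hi the height of the index, and n the number of matching records.

```python
T_S = 4000  # assumed seek time in microseconds
T_T = 100   # assumed transfer time per block in microseconds

def disk_io_cost(seeks, blocks, t_seek=T_S, t_transfer=T_T):
    """General cost: (S * tS) + (B * tT)."""
    return seeks * t_seek + blocks * t_transfer

def linear_search(br):
    """One initial seek plus br block transfers."""
    return disk_io_cost(1, br)

def linear_search_on_key(br):
    """Average case: the scan stops after about half of the blocks."""
    return disk_io_cost(1, br / 2)

def btree_equality_on_key(hi):
    """(hi + 1) I/O operations, each one seek plus one transfer."""
    return (hi + 1) * (T_T + T_S)

def secondary_btree_equality_on_nonkey(hi, n):
    """Index traversal plus one seek and transfer per matching record."""
    return (hi + n) * (T_T + T_S)

print(linear_search(1000))                        # full scan of 1000 blocks
print(linear_search_on_key(1000))                 # average case on a key
print(btree_equality_on_key(3))                   # index of height 3
print(secondary_btree_equality_on_nonkey(3, 50))  # 50 matching records
```

Under these assumed timings, the secondary index on a nonkey attribute costs more than a full linear scan once enough records match, because every record may need its own seek.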
for i = 0 to nh do begin
    read Hsi and build an in-memory hash index on it;
    for each tuple tr in Hri do begin
        probe the hash index on Hsi to locate all tuples
        such that ts[JoinAttrs] = tr[JoinAttrs];
        for each matching tuple ts in Hsi do begin
            add tr ⋈ ts to the result;
        end
    end
end
The hash join algorithm above computes the natural join of two
given relations r and s. The following terms are used in the
algorithm:
tr ⋈ ts: The concatenation of the attributes of tuples tr and
ts, followed by projecting out the repeated attributes.
tr and ts: The tuples of relations r and s, respectively.
Let us understand the hash join algorithm with the following
steps:
Step 1: In the algorithm, firstly, we have partitioned both
relations r and s.
Step 2: After partitioning, we perform a separate indexed
nested-loop join on each partition pair i using for loop as i = 0 to
nh.
Step 3: For performing the nested-loop join, it initially creates a
hash index on each si and then probes with tuples from ri. In the
algorithm, relation r is the probe input, and relation s is
the build input.
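The three steps can be sketched in Python as follows. This is an in-memory model only: tuples are represented as dictionaries, a single join attribute is assumed, partitions are numbered 0 to nh - 1, and the function name hash_join and the sample data are made up for this example.

```python
from collections import defaultdict

def hash_join(r, s, attr, nh=4):
    """Natural join of r and s on one attribute, using nh partitions."""
    # Step 1: partition both relations with the same hash function.
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for tr in r:
        r_parts[hash(tr[attr]) % nh].append(tr)
    for ts in s:
        s_parts[hash(ts[attr]) % nh].append(ts)

    result = []
    # Step 2: join each pair of partitions separately.
    for i in range(nh):
        # Step 3: build an in-memory hash index on s_i (the build input) ...
        index = defaultdict(list)
        for ts in s_parts[i]:
            index[ts[attr]].append(ts)
        # ... and probe it with the tuples of r_i (the probe input).
        for tr in r_parts[i]:
            for ts in index.get(tr[attr], []):
                result.append({**tr, **ts})  # the join attribute appears once
    return result

emp = [{"deptno": 10, "ename": "CLARK"}, {"deptno": 20, "ename": "SMITH"},
       {"deptno": 30, "ename": "ALLEN"}]
dept = [{"deptno": 10, "dname": "ACCOUNTING"}, {"deptno": 20, "dname": "RESEARCH"},
        {"deptno": 40, "dname": "OPERATIONS"}]
joined = sorted(hash_join(emp, dept, "deptno"), key=lambda t: t["ename"])
print(joined)
```

Tuples with equal join values always hash to the same partition number, which is why each partition of r only needs to be compared with the matching partition of s.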
One benefit of the hash join algorithm is that the hash index
on si is built in memory, so for fetching the tuples, we do
1. Overflow Resolution
The overflow resolution method is applied when a hash index
overflow is detected during the build phase. The overflow
resolution works in the following way:
CHAPTER 8
QUERY OPTIMIZATION
8.1 Introduction
We have seen how a query can be processed based on
indexes and joins, and how they can be transformed into
relational expressions. The query optimizer uses these two
techniques to determine which process or expression to consider
for evaluating the query.
8.2 Types of Query Optimization
There are two methods of query optimization
8.2.1 Cost-based Optimization (Physical)
This is based on the cost of the query. The query can use
different paths based on indexes, constraints, sorting methods,
etc. This method mainly uses statistics such as record size,
number of records, number of records per block, number of
blocks, table size, whether the whole table fits in a block,
organization of tables, uniqueness of column values, size of
columns, etc.
Suppose we have a series of tables joined in a query:
T1 ⋈ T2 ⋈ T3 ⋈ T4 ⋈ T5 ⋈ T6
For the above query, we can have any order of evaluation. We can
start with any two tables in any order and start evaluating the
query. In general, a join of n tables can be ordered in
(2(n-1))! / (n-1)! ways. For example, if 5 tables are involved
in the join, then we can have 8! / 4! = 1680 combinations.
However, when the query optimizer runs, it does not evaluate all
of these. It uses dynamic programming to compute the least cost
join order for each combination of tables. This cost is
calculated and stored only once. This least cost for all the table
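The count of join orders quoted above can be checked with a few lines of Python. The function name join_orders is made up for this example; it simply evaluates (2(n-1))! / (n-1)!.

```python
from math import factorial

def join_orders(n):
    """Number of join orders for n tables: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (3, 5, 6):
    print(n, join_orders(n))
```

The count grows very quickly with the number of tables, which is why an optimizer avoids enumerating every order and reuses the best cost already computed for each subset of tables.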
Here both queries return the same result. But when
we observe them closely, we can see that the first query joins
the two tables and then applies the filters. That means it
traverses the whole of each table to perform the join, so the
number of records involved is larger. The second query, however,
applies the filters to each table first. This reduces the number
of records from each table (in the class table, the number of
records reduces to one in this case!). Then it joins these
intermediate tables. Hence the cost in this case is
comparatively lower.
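The effect of applying filters before the join can be seen in a small Python model that counts the rows each plan has to handle. The STUDENT and CLASS rows are hypothetical (100 students spread over 5 classes), and the class name DESIGN_01 is borrowed from the earlier query example.

```python
# Hypothetical data: (STD_ID, STD_NAME, CLASS_ID) and (CLASS_ID, CLASS_NAME)
students = [(i, f"student{i}", i % 5) for i in range(100)]
classes = [(c, f"DESIGN_0{c}") for c in range(5)]

# Plan 1: join the two tables first, then apply the filter.
joined = [(s, c) for s in students for c in classes if s[2] == c[0]]
plan1 = [s[1] for s, c in joined if c[1] == "DESIGN_01"]

# Plan 2: filter the class table first, then join the reduced table.
wanted = [c for c in classes if c[1] == "DESIGN_01"]
plan2 = [s[1] for s in students for c in wanted if s[2] == c[0]]

print(len(joined), len(wanted), len(plan1), len(plan2))
```

Both plans return the same 20 students, but the first builds a 100-row intermediate result while the second joins against a single class row.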
Instead of rewriting the query, the optimizer creates the
relational algebra expression and tree for the above case.
Have a look at the relational algebra and tree below for EMP and
DEPT.
∏ EMP_ID, DEPT_NAME (σ DEPT_ID = 10 AND
EMP_LAST_NAME = 'Joseph' (EMP) ⋈ DEPT)
Or
∏ EMP_ID, DEPT_NAME (σ DEPT_ID = 10 AND
EMP_LAST_NAME = 'Joseph' (EMP ⋈ DEPT))
Or
σ DEPT_ID = 10 AND EMP_LAST_NAME = 'Joseph' (∏
EMP_ID, DEPT_NAME, DEPT_ID (EMP ⋈ DEPT))
CHAPTER 9
SCHEMA REFINEMENT
would violate the FD; to see this violation, compare the first
tuple in the figure with the new tuple.
schema R:
result := A;
while (changes to result) do
    for each functional dependency B → C in F do
    begin
        if B ⊆ result then result := result ∪ C;
    end
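The closure algorithm translates almost line for line into Python. In this sketch, attribute sets are written as strings of single-letter attribute names, and the set of functional dependencies FDS is hypothetical, chosen so that (AB)+ = ABCDE and (ABF)+ = ABCDEF as in the closures discussed in this section.

```python
def closure(attrs, fds):
    """Attribute closure: repeat until no dependency adds anything new."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # if B ⊆ result then result := result ∪ C
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Hypothetical dependencies over R(A, B, C, D, E, F)
FDS = [("AB", "C"), ("C", "D"), ("D", "E"), ("CF", "B")]

print(sorted(closure("AB", FDS)))
print(sorted(closure("ABF", FDS)))
```

An attribute set whose closure contains every attribute of R, such as ABF here, determines the whole relation.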
Identifying (ABF)+
Then, what is the key for R? As noted previously, we can
experiment with any of the left-hand side attributes (because
they are the determinants) or any of their combinations. From
the preceding example, we might get the idea of using F as one
of the key attributes. So, let us see if we can find (ABF)+, the
closure of the attribute set ABF.
Initially, the result is ABF.
Using the preceding example, we can say (AB)+ = ABCDE.
Since we know C and F, we can apply CF → B; the result is
ABCDEF, which contains all of R's attributes.
As a result, ABF is one of the keys for R since (ABF)+ contains
all of R's attributes.
9.7.1 First Normal Form (1NF): As per the rule of the first
normal form, an attribute (column) of a table cannot hold
multiple values. It should hold only atomic values.
Example: Suppose a company wants to store its employees'
names and contact details. It creates a table that looks like this:
Rather, we must divide such data into several rows, so that the
value at each row and column intersection is atomic. Data
redundancy increases when using the First Normal Form, since
the same column values are repeated across several rows.
However, each row as a whole will be unique.
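The conversion to First Normal Form can be pictured with a short Python sketch. The employee records, phone numbers, and field names are hypothetical; the point is that each multi-valued entry becomes several rows, each holding one atomic value.

```python
# Hypothetical unnormalized data: one field holds several phone numbers.
unnormalized = [
    {"emp_id": 101, "emp_name": "Asha", "emp_mobile": ["9000000001", "9000000002"]},
    {"emp_id": 102, "emp_name": "Ravi", "emp_mobile": ["9000000003"]},
]

# 1NF: one row per (employee, phone number) pair.
first_nf = [
    (emp["emp_id"], emp["emp_name"], mobile)
    for emp in unnormalized
    for mobile in emp["emp_mobile"]
]
for row in first_nf:
    print(row)
```

The employee name is now repeated for every phone number, which is the added redundancy described above, yet no two rows are identical.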
For example
Since the lecturers and books associated with the course are
independent, this database design has a multivalued
dependency; if we added a new book to the AHA course, we
would have to add one record for each lecturer and vice versa.
There are two multivalued dependencies in this relation:
course ↠ book and, equivalently, course ↠ lecturer. Databases
with multivalued dependencies show redundancy as a result. In
database normalization, the fourth normal form requires that,
for every nontrivial multivalued dependency X ↠ Y, X is a
super key. A multivalued dependency X ↠ Y is trivial if Y is a
subset of X or if X ∪ Y is the entire set of the relation's
attributes.
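The redundancy can be seen in a small sketch. The book and lecturer names are hypothetical placeholders, and the helpers `project` and `join_on_course` are assumptions. Because books and lecturers are independent, the relation equals the join of its two projections, which is the decomposition that the fourth normal form leads to.

```python
def project(rows, i, j):
    """Project a set of (course, book, lecturer) tuples onto columns i and j."""
    return {(row[i], row[j]) for row in rows}


def join_on_course(course_books, course_lecturers):
    """Natural join of (course, book) and (course, lecturer) on course."""
    return {
        (c, b, l)
        for (c, b) in course_books
        for (c2, l) in course_lecturers
        if c == c2
    }


# Hypothetical contents for the AHA course: 2 books x 2 lecturers = 4 rows.
relation = {
    ("AHA", "Book1", "Lecturer1"),
    ("AHA", "Book1", "Lecturer2"),
    ("AHA", "Book2", "Lecturer1"),
    ("AHA", "Book2", "Lecturer2"),
}

course_books = project(relation, 0, 1)
course_lecturers = project(relation, 0, 2)
print(join_on_course(course_books, course_lecturers) == relation)

# Adding one book to the decomposed design is a single new row,
# but it corresponds to one new row per lecturer in the original relation.
course_books.add(("AHA", "Book3"))
print(len(join_on_course(course_books, course_lecturers)) - len(relation))
```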
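A decomposition of R into R1, R2, R3, …, Rn is dependency
preserving if
(F1 ∪ F2 ∪ F3 ∪ … ∪ Fn)+ = F+
where,
F - Set of functional dependencies of the original relation R.
F1, F2, F3, …, Fn - Sets of functional dependencies of relations
R1, R2, R3, …, Rn.
(F1 ∪ F2 ∪ F3 ∪ … ∪ Fn)+ - Closure of the union of all sets of
functional dependencies.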
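This condition can be checked mechanically. The sketch below assumes that each Fi is already known and follows from F, so it suffices to test that every dependency in F is implied by the union of the Fi. The relation R(A, B, C), its dependencies, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of (lhs, rhs) dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(f, parts):
    """True if every X -> Y in `f` follows from the union of the sets in `parts`."""
    union = [fd for part in parts for fd in part]
    return all(set(rhs) <= closure(lhs, union) for lhs, rhs in f)


# Hypothetical example: R(A, B, C) with F = {A -> B, B -> C}.
F = [("A", "B"), ("B", "C")]

# Decomposition into R1(A, B) and R2(B, C) keeps both dependencies.
print(preserves_dependencies(F, [[("A", "B")], [("B", "C")]]))

# Decomposition into R1(A, B) and R2(A, C) loses B -> C.
print(preserves_dependencies(F, [[("A", "B")], [("A", "C")]]))
```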
CHAPTER 10
TRANSACTION MANAGEMENT
10.1 Introduction
Often, a collection of several operations on the database
appears to be a single unit from the point of view of the
database user. For example, transferring funds from a checking
account to a savings account is a single operation from the
customer’s standpoint; however, it consists of several operations
within the database system.
Collections of operations that form a single logical unit of work are
called transactions. A database system must ensure proper
execution of transactions despite failures—either the entire
transaction executes, or none of it does. Furthermore, it must
manage concurrent execution of transactions to avoid the
introduction of inconsistency.
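The funds-transfer example can be sketched with Python's built-in `sqlite3` module. The table layout, account names, and the helpers `make_bank`, `balance`, and `transfer` are assumptions made for illustration. Using the connection as a context manager commits the transaction if the body completes and rolls it back if an exception is raised, so either both updates take effect or neither does.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("checking", 100), ("savings", 50)]
    )
    conn.commit()
    return conn


def balance(conn, name):
    row = conn.execute("SELECT balance FROM account WHERE name = ?", (name,)).fetchone()
    return row[0]


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one transaction."""
    with conn:  # commit on success, roll back on any exception
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if balance(conn, src) < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


conn = make_bank()
transfer(conn, "checking", "savings", 30)
print(balance(conn, "checking"), balance(conn, "savings"))
```

If the second call below fails part-way through, the debit already applied to the checking account is undone by the rollback.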
the identifier of the modified data item, and both the old value
(before modification) and the new value (after modification) of
the data item. Only then is the database itself modified.
Maintaining a log makes it possible to redo a modification,
which ensures atomicity and durability, and to undo a
modification, which ensures atomicity if a failure occurs
during transaction execution.
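The log-then-modify discipline can be sketched as follows. The class `LoggedStore`, the record layout, and the use of an in-memory list as the log are assumptions; a real system writes the log to stable storage.

```python
from dataclasses import dataclass


@dataclass
class UpdateRecord:
    txn: str    # transaction that made the change (assumed field)
    item: str   # identifier of the modified data item
    old: int    # value before modification
    new: int    # value after modification


class LoggedStore:
    def __init__(self, data):
        self.data = dict(data)
        self.log = []

    def write(self, txn, item, new_value):
        # The log record is appended first; only then is the data modified.
        self.log.append(UpdateRecord(txn, item, self.data[item], new_value))
        self.data[item] = new_value

    def undo(self, txn):
        # Restore old values, newest record first.
        for rec in reversed(self.log):
            if rec.txn == txn:
                self.data[rec.item] = rec.old

    def redo(self, txn):
        # Reapply new values, oldest record first.
        for rec in self.log:
            if rec.txn == txn:
                self.data[rec.item] = rec.new


store = LoggedStore({"A": 100, "B": 50})
store.write("T1", "A", 70)
store.write("T1", "B", 80)
store.undo("T1")
print(store.data)
```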
10.7 Serializability
Before we can consider how the concurrency-control
component of the database system can ensure serializability, we
consider how to determine when a schedule is serializable.
Certainly, serial schedules are serializable, but it is harder
to determine whether a schedule with interleaved steps is
serializable.
10.9.1 Locking
Instead of locking the entire database, a transaction could lock
only those data items it accesses. The two-phase locking
protocol is a simple, widely used technique that ensures
serializability. Stated simply, two-phase locking requires a
transaction to have two phases, one where it acquires locks but does
not release any, and a second phase where the transaction releases locks
but does not acquire any. (In practice, locks are usually released
only when the transaction completes its execution and has been
either committed or aborted.)
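The two-phase rule is easy to state as a check over the lock operations of a single transaction. The function name and the encoding of operations as `("lock" | "unlock", item)` pairs are assumptions made for this sketch.

```python
def obeys_two_phase_locking(operations):
    """Check that no lock is acquired after the first unlock.

    `operations` is the ordered list of ("lock" | "unlock", item) pairs
    issued by one transaction.
    """
    shrinking = False  # becomes True once the transaction releases a lock
    for action, _item in operations:
        if action == "unlock":
            shrinking = True
        elif action == "lock" and shrinking:
            return False  # acquiring in the shrinking phase violates the rule
    return True


two_phase = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
not_two_phase = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]

print(obeys_two_phase_locking(two_phase))
print(obeys_two_phase_locking(not_two_phase))
```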
10.9.2 Timestamps
Another category of techniques for implementing isolation
assigns each transaction a timestamp, typically when it begins.
For each data item, the system keeps two timestamps. The read
timestamp of a data item holds the largest (that is, the most
recent) timestamp of those transactions that read the data item.
The write timestamp of a data item holds the timestamp of the
transaction that wrote the current value of the data item.
Timestamps ensure that transactions access each data item in
timestamp order; otherwise, transactions are aborted and
restarted with a new timestamp.
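A minimal sketch of these checks, following the basic timestamp-ordering rules, is shown below. The class and function names are assumptions, and restarting the aborted transaction with a new timestamp is left to the caller.

```python
class Rollback(Exception):
    """Raised when an access would violate timestamp order."""


class Item:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0    # largest timestamp of any transaction that read the item
        self.write_ts = 0   # timestamp of the transaction that wrote the current value


def read(item, ts):
    if ts < item.write_ts:
        # The value this transaction should have seen was already overwritten.
        raise Rollback("read too late")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item, ts, value):
    if ts < item.read_ts or ts < item.write_ts:
        # A younger transaction has already read or written the item.
        raise Rollback("write too late")
    item.value = value
    item.write_ts = ts


q = Item(10)
print(read(q, 2))      # allowed: no later write exists
write(q, 3, 20)        # allowed: timestamp 3 is newer than every access so far
try:
    read(q, 2)         # rejected: the item was written at timestamp 3
except Rollback as exc:
    print("aborted:", exc)
```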
CHAPTER 11
CONCURRENCY CONTROL
11.2 Locks:
The two modes of locks are:
1. Shared. If a transaction Ti has obtained a shared-mode lock
(denoted by S) on item Q, Ti can read, but cannot write, Q.
2. Exclusive. If a transaction Ti has obtained an exclusive-mode
lock (denoted by X) on item Q, Ti can read and write Q.
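The two modes lead to the usual compatibility rule: shared locks can coexist with each other, while an exclusive lock is compatible with nothing. The table and the helper `can_grant` below are a sketch with assumed names.

```python
# (held mode, requested mode) -> may both be held on the same item at once?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_modes):
    """True if `requested` is compatible with every lock other transactions hold."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)


print(can_grant("S", ["S", "S"]))  # several readers may share the item
print(can_grant("X", ["S"]))       # a writer must wait for the readers
```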
11.3 Starvation
Suppose a transaction T2 has a shared-mode lock on a data item,
and another transaction T1 requests an exclusive-mode lock on
the data item. T1 has to wait for T2 to release the shared-mode
lock.
The lock manager uses this data structure: For each data item
currently locked, it maintains a linked list of records, one for
each request, in the order in which the requests arrived. It uses a
hash table, indexed on the name of a data item, to find the
linked list (if any) for a data item; this table is called the lock
table. Each record of the linked list for a data item notes which
transaction made the request and what lock mode it requested.
The record also notes if the request has currently been granted.
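The data structure described above can be sketched as follows. A Python `dict` plays the role of the hash table and a Python list stands in for the linked list; the class name, method names, and the grant policy (a request is granted only if every earlier request on the item is granted and compatible) are assumptions made for this sketch.

```python
from collections import defaultdict


class LockTable:
    def __init__(self):
        # Hash table: data item name -> requests in arrival order.
        self.table = defaultdict(list)

    @staticmethod
    def _grantable(mode, earlier):
        # Grant only behind granted shared locks, and only for a shared request.
        return all(r["granted"] and r["mode"] == "S" and mode == "S" for r in earlier)

    def request(self, txn, item, mode):
        queue = self.table[item]
        granted = self._grantable(mode, queue)
        queue.append({"txn": txn, "mode": mode, "granted": granted})
        return granted

    def release(self, txn, item):
        queue = [r for r in self.table[item] if r["txn"] != txn]
        self.table[item] = queue
        # Grant waiting requests in arrival order while they remain compatible.
        for i, rec in enumerate(queue):
            if rec["granted"]:
                continue
            if self._grantable(rec["mode"], queue[:i]):
                rec["granted"] = True
            else:
                break


locks = LockTable()
print(locks.request("T2", "Q", "S"))  # granted: nothing else holds Q
print(locks.request("T1", "Q", "X"))  # must wait behind the shared lock
locks.release("T2", "Q")
print(locks.table["Q"])               # T1's request is now granted
```

Because a later shared request is queued behind a waiting exclusive request instead of overtaking it, this policy also avoids starving the exclusive request.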
There are two principal methods for dealing with the deadlock
problem. We can use a deadlock prevention protocol to ensure
that the system never enters a deadlock state. Alternatively, we
can allow the system to enter a deadlock state and recover by
using a deadlock detection and deadlock recovery scheme.
11.7.1 Deadlock Prevention:
Various locking protocols do not guard against deadlocks.
One way to prevent deadlock is to use an ordering of data items
and request locks in a sequence consistent with the ordering.
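The ordering approach can be sketched with Python threads. The item names, the global `LOCKS` table, and the helper `lock_items` are assumptions; the ordering used here is simply alphabetical order of the item names.

```python
import threading
from contextlib import contextmanager

# Hypothetical data items, each protected by its own lock.
LOCKS = {"A": threading.Lock(), "B": threading.Lock()}


@contextmanager
def lock_items(*names):
    """Acquire the locks for `names` in one global order, release in reverse."""
    ordered = sorted(set(names))
    for name in ordered:
        LOCKS[name].acquire()
    try:
        yield ordered
    finally:
        for name in reversed(ordered):
            LOCKS[name].release()


def worker(first, second):
    for _ in range(200):
        # Both workers end up locking A before B, whatever order they ask in.
        with lock_items(first, second):
            pass


t1 = threading.Thread(target=worker, args=("A", "B"))
t2 = threading.Thread(target=worker, args=("B", "A"))
t1.start()
t2.start()
t1.join()
t2.join()
print("both workers finished")
```

Without the sort, one worker could hold A while waiting for B and the other hold B while waiting for A; with a single agreed order that circular wait cannot form.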
Fig. 11.2 Wait-for Graph with Cycle and Without Cycle
This implies that transactions T18, T19, and T20 are all
deadlocked.
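Deadlock detection amounts to finding a cycle in the wait-for graph. The figure is not reproduced here, so the edges below are assumed for illustration; an edge Ti -> Tj means that Ti is waiting for Tj. The function name `find_cycle` is also an assumption.

```python
def find_cycle(wait_for):
    """Return the transactions on some cycle of the wait-for graph, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(t):
        colour[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            state = colour.get(u, WHITE)
            if state == GREY:
                return path[path.index(u):]   # back edge closes a cycle
            if state == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        colour[t] = BLACK
        path.pop()
        return None

    for t in list(wait_for):
        if colour.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


# Assumed edges: no transaction waits in a circle.
without_cycle = {"T17": ["T18", "T19"], "T19": ["T18"], "T18": ["T20"], "T20": []}
# Adding T20 -> T19 closes the circle T18 -> T20 -> T19 -> T18.
with_cycle = {"T17": ["T18", "T19"], "T19": ["T18"], "T18": ["T20"], "T20": ["T19"]}

print(find_cycle(without_cycle))
print(find_cycle(with_cycle))
```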
1. Use the value of the system clock as the timestamp; that is, a
transaction’s timestamp is equal to the value of the clock when
the transaction enters the system.
2. Use a logical counter that is incremented after a new
timestamp has been assigned; a transaction’s timestamp is equal
to the counter's value when the transaction enters the system.
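The timestamps of the transactions determine the serializability
order. Thus, if TS(Ti) < TS(Tj), then the system must ensure that
the produced schedule is equivalent to a serial schedule in
which transaction Ti appears before transaction Tj.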
Snapshot Isolation
Snapshot isolation is a multi-version concurrency-control
protocol based on validation. Unlike multi-version two-phase
locking, it does not require transactions to be declared as
read-only or update transactions. Snapshot isolation does not
guarantee serializability but is supported by many database
systems.
CHAPTER 12
RECOVERY SYSTEM &
DATA ON EXTERNAL STORAGE
Block transfer between memory and disk storage can result in:
• Successful completion. The transferred information arrived
safely at its destination.
• Partial failure. A failure occurred in the midst of the
transfer, and the destination block has incorrect information.
• Total failure. The failure occurred sufficiently early during
the transfer that the destination block remains intact.
• Undo uses a log record to set the data item specified in the
log record to the old value.
• Redo uses a log record to set the data item specified in the
log record to the new value.
2. In the undo phase, the system rolls back all transactions in the
undo-list. It performs rollback by scanning the log backward
from the end.
a. Whenever it finds a log record belonging to a transaction in
the undo-list, it performs undo actions just as if the log record
had been found during the rollback of a failed transaction.
b. When the system finds a <Ti start> log record for a
transaction Ti in undo-list, it writes a <Ti abort> log record to
the log, and removes Ti from undo-list.
c. The undo phase terminates once undo-list becomes empty,
that is, the system has found <Ti start> log records for all
transactions that were initially in undo-list. After the undo
phase of recovery terminates, normal transaction processing can
resume.
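Steps a to c of the undo phase can be sketched as a backward scan over an in-memory log. The tuple layout of the log records and the function name are assumptions, and the sketch leaves out the additional log records that a full rollback writes while undoing updates.

```python
def undo_phase(log, db, undo_list):
    """Roll back every transaction in `undo_list` by scanning `log` backward.

    Assumed record shapes: ("start", T), ("commit", T), ("abort", T),
    and ("update", T, item, old_value, new_value).
    """
    pending = set(undo_list)
    for i in range(len(log) - 1, -1, -1):   # scan backward from the end
        if not pending:
            break                           # step c: undo-list is empty
        record = log[i]
        kind, txn = record[0], record[1]
        if txn not in pending:
            continue
        if kind == "update":                # step a: restore the old value
            _, _, item, old_value, _new_value = record
            db[item] = old_value
        elif kind == "start":               # step b: log the abort
            log.append(("abort", txn))
            pending.discard(txn)
    return db


log = [
    ("start", "T1"),
    ("update", "T1", "A", 100, 70),
    ("start", "T2"),
    ("update", "T2", "B", 50, 60),
    ("commit", "T2"),
    ("update", "T1", "A", 70, 40),
]
db = {"A": 40, "B": 60}       # state found after the crash
undo_phase(log, db, ["T1"])   # T1 never committed
print(db, log[-1])
```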
Database Buffering:
One might expect transactions to force-output all modified
blocks to disk when they commit. Such a policy is called the
force policy. The no-force policy alternative allows a transaction
to commit even if it has modified some blocks that have not yet
been written back to disk.
Similarly, one might expect that blocks modified by a still active
transaction should not be written to disk. This policy is called
the no-steal policy. The alternative, the steal policy, allows the
system to write modified blocks to disk even if the transactions
that made those modifications have not all committed. As long
as the write-ahead logging rule is followed, all the recovery
algorithms work correctly even with the steal policy.
When a block B1 is to be output to disk, all log records about
data in B1 must be output to stable storage before B1 is output.
No write to the block B1 may be in progress while the block is
being output, since such a write could violate the write-ahead
logging rule. We can ensure that there are no writes in progress
by using a special means of locking:
• Before a transaction performs a write on a data item, it
acquires an exclusive lock on the block in which the data item
resides. The lock is released immediately after the update has
been performed.
• The following sequence of actions is taken when a block is to
be output:
a) Obtain an exclusive lock on the block, to ensure that no
transaction performs a write on the block.
b) Output log records to stable storage until all log records about
block B1 have been output.
c) Output block B1 to disk.
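The output sequence a) to c) can be sketched as follows. The class `BufferManager`, its fields, and the tuple layout of log records are assumptions; stable storage and the disk are modelled as in-memory containers, and log records are flushed in order up to the last one that concerns the block.

```python
import threading
from collections import defaultdict


class BufferManager:
    def __init__(self):
        self.log_buffer = []   # log records still in memory: (lsn, block_id, info)
        self.stable_log = []   # log records already on stable storage
        self.disk = {}         # block_id -> contents written to disk
        self.block_locks = defaultdict(threading.Lock)

    def log_update(self, lsn, block_id, info):
        self.log_buffer.append((lsn, block_id, info))

    def output_block(self, block_id, contents):
        # a) Exclusive lock: no write to the block while it is being output.
        with self.block_locks[block_id]:
            # b) Flush the log, in order, through the last record about this block.
            last = max(
                (i for i, rec in enumerate(self.log_buffer) if rec[1] == block_id),
                default=-1,
            )
            self.stable_log.extend(self.log_buffer[: last + 1])
            self.log_buffer = self.log_buffer[last + 1 :]
            # c) Only now is the block itself written.
            self.disk[block_id] = contents


bm = BufferManager()
bm.log_update(1, "B1", "x")
bm.log_update(2, "B2", "y")
bm.log_update(3, "B1", "z")
bm.log_update(4, "B2", "w")
bm.output_block("B1", "block contents")
print([rec[0] for rec in bm.stable_log], [rec[0] for rec in bm.log_buffer])
```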
12.7 ARIES
Introduction: The ARIES recovery scheme is a state-of-the-art
scheme that supports several features to provide greater
concurrency, reduce logging overheads, and minimize recovery
time. It is also based on repeating history, and allows logical
undo operations. The scheme flushes pages continuously and does
not need to flush all pages at the time of a checkpoint. It uses
log sequence numbers (LSNs) to implement various optimisations
that reduce the time taken for recovery.
12.7.1 ARIES:
ARIES uses many techniques to reduce the time taken for
recovery and reduce checkpointing overhead. In particular,
ARIES can avoid redoing many logged operations that have
already been applied.
The data entries are located at the lowest level of the tree,
known as the leaf level. Suppose there were additional employee
records, with ages less than 22 and ages greater than 50 (the
lowest and highest age values that appear in Figure 12.6).
Additional records under the age of 22 would appear in leaf
pages to the left of page L1, and records over the age of 50
would appear in leaf pages to the right of the last leaf page.
The B+ tree is an index structure that ensures that all paths
from the root to a leaf in a given tree are of equal length, so
the structure is always balanced in height. Since each non-leaf
node can hold many node pointers, the height of the tree is
small in practice, and finding the correct leaf page is faster
than a binary search of the pages in a sorted file.
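The comparison with binary search can be made concrete with a small calculation. The page count and the fan-out of 100 pointers per non-leaf node are assumed figures, and both function names are hypothetical.

```python
def btree_levels(leaf_pages, fanout):
    """Non-leaf levels needed so that a single root can reach every leaf page."""
    levels, reachable = 0, 1
    while reachable < leaf_pages:
        reachable *= fanout   # each extra level multiplies the reach by the fan-out
        levels += 1
    return levels


def binary_search_probes(pages):
    """Worst-case number of pages examined by binary search over a sorted file."""
    return pages.bit_length()   # equals floor(log2(pages)) + 1 for pages >= 1


pages = 1_000_000   # assumed number of leaf pages (or pages in the sorted file)
fanout = 100        # assumed number of pointers per non-leaf node

# A B+ tree search reads one node per non-leaf level, then the leaf page itself.
print(btree_levels(pages, fanout) + 1)
print(binary_search_probes(pages))
```

Under these assumptions the tree search touches 4 pages, against up to 20 for the binary search.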
I/O. The cost equals the time required to seek the first page in
the block and transfer all pages in the block. Such blocked access
can be much cheaper than issuing one I/O request per page in
the block, especially if these requests do not follow
consecutively, since we would then incur an additional seek cost
for each page in the block.
B+ Tree Removal
Assume we want to remove 60 from the preceding example.
In this case, we must delete 60 from both the intermediate node
and the fourth leaf node. If we delete it from the intermediate
node, the tree will no longer satisfy the B+ tree rules. As a
result, we must restructure it to keep the tree balanced.
REFERENCES
Raghu Ramakrishnan, Johannes Gehrke, Database Management
Systems, 3rd edition, Tata McGraw Hill, New Delhi, India.
Elmasri, Navathe, Fundamentals of Database Systems, Pearson
Education, India.
Abraham Silberschatz, Henry F. Korth, S. Sudarshan (2005),
Database System Concepts, 5th edition, McGraw-Hill, New
Delhi, India.
Peter Rob, Carlos Coronel (2009), Database Systems Design,
Implementation and Management, 7th edition.