CH 2

Uploaded by

Abel Demelash
Chapter 2:

Advanced SQL
Learning Objectives
• In this chapter, the student will learn:
• A review of what SQL is,
• How to use the advanced SQL JOIN operator syntax,
• About the different types of subqueries and correlated
queries,
• How to use SQL functions to manipulate dates, strings, and
other data,
• About the relational set operators UNION, UNION
ALL, INTERSECT, and MINUS,
• How a DBMS organizes files,
• What an index is,
• What the different types of indexes are.
What is SQL?
• SQL stands for Structured Query Language, a computer
language for storing, manipulating and retrieving data
stored in a relational database.
• SQL is the standard language for relational database
systems.
• All relational database management systems
(RDBMS) such as MySQL, MS Access, Oracle, Sybase,
Informix, Postgres and SQL Server use SQL as their
standard database language. They also use
different dialects, such as:
• MS SQL Server uses T-SQL,
• Oracle uses PL/SQL,
• The MS Access version of SQL is called JET SQL (native
format), etc.
Why SQL?
SQL is widely popular because it offers the following
advantages:
•Allows users to access data in the relational database
management systems.
•Allows users to describe the data.
•Allows users to define the data in a database and
manipulate that data.
•Allows SQL to be embedded within other languages using SQL
modules, libraries & pre-compilers.
•Allows users to create and drop databases and tables.
•Allows users to create views, stored procedures and functions in
a database.
•Allows users to set permissions on tables, procedures and
views.
A Brief History of SQL
• 1970 – Dr. Edgar F. "Ted" Codd of IBM, known as the
father of relational databases, described a
relational model for databases.
• 1974 – Structured Query Language appeared.
• 1978 – IBM worked to develop Codd's ideas and
produced a system named System/R.
• 1986 – SQL was standardized by ANSI. By then IBM had
developed the first relational database prototype, and the
first commercial relational database had been released by Relational
Software, which later came to be known as Oracle.
What is RDBMS?
• RDBMS stands for Relational Database Management
System.
• RDBMS is the basis for SQL, and for all modern database
systems like MS SQL Server, IBM DB2, Oracle, MySQL, and
Microsoft Access.
• A Relational database management system (RDBMS) is a
database management system (DBMS) that is based on the
relational model as introduced by E. F. Codd.
What is a table?
• The data in an RDBMS is stored in database objects
called tables. A table is basically a collection of
related data entries, and it consists of columns
and rows.
• Remember, a table is the most common and simplest form
of data storage in a relational database. A typical
example is a CUSTOMERS table.
SQL Process
• When you execute an SQL command on any
RDBMS, the system determines the best way to carry
out your request, and the SQL engine figures out how to
interpret the task. Various components are
involved in this process:
• Query Dispatcher
• Optimization Engines
• Classic Query Engine
• SQL Query Engine, etc.
• A classic query engine handles all the non-SQL
queries, but a SQL query engine won't handle logical
files.
SQL Process
• An SQL query engine is a piece of software that
recognizes and interprets the SQL language and
implements data access, both reading and writing,
for a relational database, in a way that can be
controlled by a user's SQL queries.
• Optimization engine. Query optimization is a feature
of many relational database management systems.
The query optimizer attempts to determine the most
efficient way to execute a given query by considering
the possible query plans.
• Query dispatcher. The function of the dispatcher is to
route the query request to either the Classic Query Engine (CQE)
or the SQL Query Engine (SQE), depending on the attributes of the query.
All queries are processed by the dispatcher.
SQL Architecture:|
Assignment
SQL Commands
• The standard SQL commands to interact with
relational databases are CREATE, SELECT, INSERT,
UPDATE, DELETE and DROP.
• These commands can be classified into the following
groups based on their nature:
• DDL - Data Definition Language
Command   Description
CREATE    Creates a new table, a view of a table, or other object in the database.
ALTER     Modifies an existing database object, such as a table.
DROP      Deletes an entire table, a view of a table or other objects in the database.
SQL Commands
• DML - Data Manipulation Language
Command   Description
SELECT    Retrieves certain records from one or more tables.
INSERT    Creates a record.
UPDATE    Modifies records.
DELETE    Deletes records.

• DCL - Data Control Language


Command   Description
GRANT     Gives a privilege to a user.
REVOKE    Takes back privileges granted to a user.
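The command groups above can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient test bed; the CUSTOMERS rows are invented sample data, and SQLite has no GRANT or REVOKE, so the DCL commands are not shown.

```python
import sqlite3

# In-memory SQLite database; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: CREATE defines a new table.
cur.execute("""
    CREATE TABLE CUSTOMERS (
        ID      INTEGER PRIMARY KEY,
        NAME    TEXT NOT NULL,
        AGE     INTEGER,
        ADDRESS TEXT,
        SALARY  REAL
    )
""")

# DML: INSERT creates records.
cur.executemany(
    "INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Abebe", 32, "Addis Ababa", 2000.0),
        (2, "Sara", 25, "Adama", 1500.0),
        (3, "Kebede", 23, "Hawassa", 2000.0),
    ],
)

# DML: UPDATE modifies records, DELETE removes them.
cur.execute("UPDATE CUSTOMERS SET SALARY = 2500.0 WHERE ID = 2")
cur.execute("DELETE FROM CUSTOMERS WHERE ID = 3")

# DML: SELECT retrieves records.
rows = cur.execute(
    "SELECT ID, NAME, SALARY FROM CUSTOMERS ORDER BY ID"
).fetchall()
print(rows)

# DDL: DROP deletes the whole table.
cur.execute("DROP TABLE CUSTOMERS")
remaining_tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining_tables)
conn.close()
```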
What is a field?
• Every table is broken up into smaller entities called
fields. The fields in the CUSTOMERS table consist of ID,
NAME, AGE, ADDRESS and SALARY.
• A field is a column in a table that is designed to
maintain specific information about every record in the
table.
What is a Record or a Row?
• A record, also called a row of data, is each
individual entry that exists in a table. A record is a
horizontal entity in a table.
What is a column?
• A column is a vertical entity in a table that contains all
information associated with a specific field in a table.
What is a NULL value?
• A NULL value in a table is a value in a field that
appears to be blank; that is, a field with a
NULL value is a field with no value.
• It is very important to understand that a NULL
value is different from a zero value or a field that
contains spaces. A field with a NULL value is
one that has been left blank during record
creation.
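A short sketch of the difference between NULL, zero and an empty string, run through Python's sqlite3 module. The table contents are invented for illustration. Note that NULL must be tested with IS NULL, because a comparison such as SALARY = NULL never evaluates to true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, SALARY REAL)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [
        (1, "A", 0.0),     # salary is zero: a real value
        (2, "B", None),    # salary is NULL: no value at all
        (3, "", 1000.0),   # name is an empty string, which is not NULL
    ],
)

null_salary = conn.execute(
    "SELECT ID FROM CUSTOMERS WHERE SALARY IS NULL"
).fetchall()
zero_salary = conn.execute(
    "SELECT ID FROM CUSTOMERS WHERE SALARY = 0"
).fetchall()
# '= NULL' is never true, so this returns no rows.
equals_null = conn.execute(
    "SELECT ID FROM CUSTOMERS WHERE SALARY = NULL"
).fetchall()
null_name = conn.execute(
    "SELECT ID FROM CUSTOMERS WHERE NAME IS NULL"
).fetchall()
conn.close()

print(null_salary, zero_salary, equals_null, null_name)
```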
SQL Join Operators
• A relational join operation merges rows from two
tables and returns the rows that satisfy one of the following:
• Natural join - Have common values in common
columns
• Equality or inequality - Meet a given join condition
• Outer join - Have common values in common
columns or have no matching values
• Inner join - Only rows that meet a given criterion
are selected.
INNER JOIN
• Syntax
SELECT table1.column1, table1.column2, table2.column1, ...
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
• Example
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
FROM EMPLOYEE
INNER JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

EMP_NAME DEPARTMENT

Angelina Testing
Robert Development
Christian Designing
Kristen Development
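The inner join above can be reproduced with a small runnable sketch in Python's sqlite3 module. The EMP_ID values, the PROJECT_NO column and the fifth employee (who has no project) are assumptions added for illustration; only the names and departments come from the result table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT);
    CREATE TABLE PROJECT  (PROJECT_NO INTEGER PRIMARY KEY,
                           EMP_ID INTEGER, DEPARTMENT TEXT);

    INSERT INTO EMPLOYEE VALUES
        (1, 'Angelina'), (2, 'Robert'), (3, 'Christian'),
        (4, 'Kristen'),  (5, 'Russell');   -- Russell has no project row

    INSERT INTO PROJECT VALUES
        (101, 1, 'Testing'), (102, 2, 'Development'),
        (103, 3, 'Designing'), (104, 4, 'Development');
""")

# Only employees with a matching PROJECT row appear in the result.
joined = conn.execute("""
    SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT
    FROM EMPLOYEE
    INNER JOIN PROJECT
    ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID
    ORDER BY EMPLOYEE.EMP_ID
""").fetchall()
conn.close()

for name, department in joined:
    print(name, department)
```

The unmatched employee is dropped, which is the behaviour that distinguishes an inner join from an outer join.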
Sub queries and
Correlated Queries
• Subquery is a query inside another query
Sub query can return:
• One single value - One column and one row
• A list of values - One column and multiple rows
• A virtual table - Multicolumn, multirow set of values
• No value - Output of the outer query might be
an error or an empty (null) result set
WHERE Sub queries
• Uses inner SELECT subquery on the right side of a
WHERE comparison expression
• Value generated by the subquery must be of a
comparable data type
• If the subquery returns more than a single value, the
DBMS will generate an error
• Can be used in combination with joins
IN subqueries
• Used to compare a single attribute to a list of values
HAVING subqueries
• The HAVING clause is used to specify a search condition for a
group or an aggregate.
• HAVING is normally used together with a GROUP BY clause. If GROUP BY
is omitted, the whole result is treated as a single group, and some
DBMSs then let HAVING act like a WHERE clause.
Syntax
• HAVING subquery
SELECT column-names
FROM table-name
WHERE condition
GROUP BY column-names
HAVING condition
• IN subquery
SELECT column-names
FROM table-name1
WHERE value IN (SELECT column-name
FROM table-name2
WHERE condition)
Eg:
SELECT ProductName
FROM Product
WHERE Id IN (SELECT ProductId
FROM OrderItem
WHERE Quantity > 100)
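Both subquery forms can be checked with a small sketch in Python's sqlite3 module. The Product and OrderItem rows are invented, and the HAVING query (products whose total ordered quantity exceeds the average quantity of a single order item) is a hypothetical example rather than one taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product   (Id INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE OrderItem (ProductId INTEGER, Quantity INTEGER);

    INSERT INTO Product VALUES (1, 'Tea'), (2, 'Coffee'), (3, 'Sugar');
    INSERT INTO OrderItem VALUES
        (1, 150), (1, 20), (2, 30), (3, 200), (2, 10);
""")

# IN subquery: products that appear in at least one large order item.
in_result = conn.execute("""
    SELECT ProductName
    FROM Product
    WHERE Id IN (SELECT ProductId
                 FROM OrderItem
                 WHERE Quantity > 100)
    ORDER BY Id
""").fetchall()

# HAVING subquery: groups whose total exceeds the overall average quantity.
# The average of the five quantities is 410 / 5 = 82.
having_result = conn.execute("""
    SELECT ProductId, SUM(Quantity) AS Total
    FROM OrderItem
    GROUP BY ProductId
    HAVING SUM(Quantity) > (SELECT AVG(Quantity) FROM OrderItem)
    ORDER BY ProductId
""").fetchall()
conn.close()

print(in_result)
print(having_result)
```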
Multirow Subquery
Operators:
ANY and ALL
•ALL operator: Allows comparison of a single value with every value
in the list returned by the subquery, using a comparison
operator other than equals.
•ANY operator: Allows comparison of a single value to a list of
values and selects only the rows for which the value is greater
than or less than any value in the list.
FROM clause:
•Specifies the tables from which the data will be
drawn
•Can use SELECT subquery
Syntax
• ANY
SELECT column-names
FROM table-name
WHERE column-name operator ANY
(SELECT column-name
FROM table-name
WHERE condition)
• ALL
SELECT column-names
FROM table-name
WHERE column-name operator ALL
(SELECT column-name
FROM table-name
WHERE condition)
Attribute List
Subqueries
• SELECT statement uses attribute list to indicate what
columns to project in the resulting set:
• Inline subquery
• Subquery expression included in the attribute list that must
return one value
• Column alias cannot be used in attribute list computation if
alias is defined in the same attribute list
Correlated Subquery| Assignment
• Executes once for each row in the outer query
• Inner query references a column of the outer
query
• Can be used with the EXISTS special operator
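A minimal sketch of two correlated subqueries, using Python's sqlite3 module. The instructor and department tables follow the naming used elsewhere in this chapter, but the rows are invented. In both queries the inner SELECT refers to a column of the current outer row, so it is logically re-evaluated for each outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_name TEXT PRIMARY KEY);
    CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT,
                             dept_name TEXT, salary NUMERIC);

    INSERT INTO department VALUES ('Music'), ('Physics'), ('History');
    INSERT INTO instructor VALUES
        ('1', 'Ali', 'Music',   40000),
        ('2', 'Bo',  'Music',   60000),
        ('3', 'Cy',  'Physics', 90000),
        ('4', 'Di',  'Physics', 70000);
""")

# Instructors paid more than the average of their own department.
above_avg = conn.execute("""
    SELECT i.name
    FROM instructor AS i
    WHERE i.salary > (SELECT AVG(j.salary)
                      FROM instructor AS j
                      WHERE j.dept_name = i.dept_name)
    ORDER BY i.ID
""").fetchall()

# EXISTS: departments that have at least one instructor.
staffed = conn.execute("""
    SELECT d.dept_name
    FROM department AS d
    WHERE EXISTS (SELECT 1
                  FROM instructor AS i
                  WHERE i.dept_name = d.dept_name)
    ORDER BY d.dept_name
""").fetchall()
conn.close()

print(above_avg)
print(staffed)
```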
Syntax
• Subqueries with the SELECT Statement
SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
(SELECT column_name [, column_name ]
FROM table1 [, table2 ]
[WHERE])
• Subqueries with the INSERT Statement
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ *|column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ]
• Subqueries with the UPDATE Statement
UPDATE table_name
SET column_name = new_value
[ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME FROM TABLE_NAME
[ WHERE ]) ]
• Subqueries with the DELETE Statement
DELETE FROM TABLE_NAME
[ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME FROM TABLE_NAME
[ WHERE ]) ]
Functions and Procedures
• SQL:1999 supports functions and procedures
• Functions/procedures can be written in SQL itself, or in an
external programming language (e.g., C, Java).
• Functions written in an external language are particularly useful
with specialized data types such as images and geometric
objects.
• Example: functions to check if polygons overlap, or to compare images
for similarity.
• Some database systems support table-valued functions, which
can return a relation as a result.
• SQL:1999 also supports a rich set of imperative constructs,
including
• Loops, if-then-else, assignment
• Many databases have proprietary procedural extensions to
SQL that differ from SQL:1999.
SQL Functions
• A function is a predefined formula that takes one
or more arguments as input, processes the
arguments, and returns an output.
• Functions always use a numerical, date, or string
value
• Value may be part of a command or may be an
attribute located in a table
• Function may appear anywhere in an SQL
statement where a value or an attribute can be
used
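As a sketch of the points above, the following runs a few built-in string, numeric and date functions through Python's sqlite3 module. The function names are SQLite's; other DBMSs offer similar functions under different names, and the instructor rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Functions applied to literal values that are part of the command.
literal_row = conn.execute("""
    SELECT UPPER('sql'),
           LENGTH('advanced'),
           SUBSTR('database', 1, 4),
           ROUND(3.14159, 2),
           date('2024-01-31', '+1 day')
""").fetchone()

# Functions applied to an attribute stored in a table.
conn.execute("CREATE TABLE instructor (name TEXT)")
conn.executemany(
    "INSERT INTO instructor VALUES (?)", [("mozart",), ("einstein",)]
)
upper_names = [
    row[0]
    for row in conn.execute(
        "SELECT UPPER(name) FROM instructor ORDER BY name"
    )
]
conn.close()

print(literal_row)
print(upper_names)
```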
SQL Functions
• Define a function that, given the name of a
department, returns the count of the number of
instructors in that department.
create function dept_count (dept_name varchar(20))
returns integer
begin
    declare d_count integer;
    select count(*) into d_count
    from instructor
    where instructor.dept_name = dept_name;
    return d_count;
end
• The function dept_count can be used to find the
department names and budget of all departments
with more than 12 instructors.
select dept_name, budget
from department
where dept_count (dept_name) > 12
SQL functions

• Compound statement: begin … end
• May contain multiple SQL statements between
begin and end.
• returns -- indicates the variable type
that is returned (e.g., integer)
• return -- specifies the values that are to
be returned as the result of invoking the
function
• SQL functions are in fact parameterized
views that generalize the regular notion of
views by allowing parameters.
Table Functions
• SQL:2003 added functions that return a relation as a result
• Example: Return all instructors in a given department
create function instructor_of (dept_name char(20))
returns table (
ID varchar(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))
return table
(select ID, name, dept_name, salary
from instructor
where instructor.dept_name = instructor_of.dept_name)

• Usage
select *
from table (instructor_of ('Music'))
SQL Set Operation |
Assignment
• SQL set operations are used to combine the results of two
or more SQL SELECT statements.
Types of Set Operation
• Union
• Union All
• Intersect
• Minus
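The four set operations can be compared side by side with a small sketch in Python's sqlite3 module. The tables a and b are invented. SQLite, like standard SQL, spells the difference operator EXCEPT; MINUS is the Oracle keyword for the same operation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")


def run(operator):
    """Combine 'SELECT x FROM a' and 'SELECT x FROM b' with a set operator."""
    sql = f"SELECT x FROM a {operator} SELECT x FROM b ORDER BY x"
    return [row[0] for row in conn.execute(sql)]


union = run("UNION")          # duplicates removed
union_all = run("UNION ALL")  # duplicates kept
intersect = run("INTERSECT")  # rows present in both
minus = run("EXCEPT")         # rows in a but not in b (MINUS in Oracle)
conn.close()

print(union, union_all, intersect, minus)
```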
Procedural SQL
• PL/SQL is a combination of SQL along with the
procedural features of programming languages. It
was developed by Oracle Corporation in the early
90's to enhance the capabilities of SQL.
• Performs a conditional or looping operation by
isolating critical code and making all application
programs call the shared code
• Yields better maintenance and logic control
• Standard SQL statements
• PL/SQL is one of three key programming languages
embedded in the Oracle Database, along with SQL
itself and Java.
Features of PL/SQL
• PL/SQL is tightly integrated with SQL.
• It offers extensive error checking.
• It offers numerous data types.
• It offers a variety of programming structures.
• It supports structured programming through
functions and procedures.
• It supports object-oriented programming.
• It supports the development of web applications
and server pages.
Advantages of PL/SQL
• SQL is the standard database language and PL/SQL is strongly integrated with
SQL. PL/SQL supports both static and dynamic SQL.
• Static SQL supports DML operations and transaction control from a PL/SQL block.
Dynamic SQL allows DDL statements to be embedded in PL/SQL blocks.
• PL/SQL allows sending an entire block of statements to the database at one
time. This reduces network traffic and provides high performance for the
applications.
• PL/SQL gives high productivity to programmers as it can query, transform, and
update data in a database.
• PL/SQL saves time on design and debugging through strong features, such as
exception handling, encapsulation, data hiding, and object-oriented data types.
• Applications written in PL/SQL are fully portable.
• PL/SQL provides high security level.
• PL/SQL provides access to predefined SQL packages.
• PL/SQL provides support for Object-Oriented Programming.
• PL/SQL provides support for developing Web Applications and Server Pages.
PL/SQL Stored Functions
• Stored function: Named group of procedural and
SQL statements that returns a value
• As indicated by a RETURN statement in its program
code
• Can be invoked only from within stored procedures
or triggers
PL/SQL Basic Data Types
Triggers
• A database trigger is a statement or set of statements
that is executed when a modification is made to
a database.
• The modification can be an insertion, deletion, or
update of records.
• Trigger: A trigger is a stored procedure in a database
that is automatically invoked whenever a special
event in the database occurs. For example, a trigger
can be invoked when a row is inserted into a
specified table or when certain table columns are
being updated.
Triggers…
• A database trigger is a special stored procedure that
is run when specific actions occur within a
database. Most triggers are defined to run when
changes are made to a table’s data.
• Triggers help the database designer ensure certain
actions, such as maintaining an audit file, are
completed regardless of which program or user
makes changes to the data.
• The programs are called triggers since an event,
such as adding a record to a table, fires their
execution.
Purpose of Triggers
• To maintain database integrity.
• To safeguard a database from inconsistency,
especially in a large database.
• To prevent invalid transactions.
• Syntax:
create trigger [trigger_name]
[before | after]
{insert | update | delete}
on [table_name]
[for each row]
[trigger_body]
What do we need to have
a Trigger?
• Event-Condition-Action Model
We need to:
• Specify when the trigger should be executed.
• Specify what actions are to be taken when the trigger executes, i.e.,
the actions caused by the execution of the trigger.
• These requirements for a trigger are called the
Event-Condition-Action model.
• Event – The event that causes the trigger to be executed
• Condition – The condition that needs to be satisfied by the event to
trigger an action
• Action – The actual modification to be done on the database due to the
event and condition.

Triggers
• Procedural SQL code automatically invoked by
RDBMS when given data manipulation event occurs
• Parts of a trigger definition
• Triggering timing - Indicates when trigger’s PL/SQL
code executes
• Triggering event - Statement that causes the trigger
to execute
• Triggering level - Statement- and row-level
• Triggering action - PL/SQL code enclosed between
the BEGIN and END keywords
BEFORE and AFTER of
Trigger
• BEFORE triggers run the trigger action before the
triggering statement is run.
• AFTER triggers run the trigger action after the
triggering statement is run.
• Example:
Consider a Student Report database in which student
marks are recorded. In such a schema,
create a trigger so that the total and average of the
specified marks are automatically inserted whenever a
record is inserted.
• Here, since the trigger must be invoked before the record is inserted,
the BEFORE keyword can be used.
BEFORE and AFTER of
Trigger
• Suppose the database Schema –

create trigger stud_marks
before INSERT
on Student
for each row
set NEW.total = NEW.subj1 + NEW.subj2 + NEW.subj3,
    NEW.per = NEW.total * 60 / 100;
BEFORE and AFTER of
Trigger
• The above SQL statement creates a trigger in the
student database. Whenever subject
marks are entered, before the data is inserted into
the database, the trigger computes those two
values and inserts them along with the entered values.
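The trigger can be exercised with a minimal sketch in Python's sqlite3 module. This is an adaptation, not the same statement: SQLite triggers cannot assign to the new row in a BEFORE trigger, so the sketch uses an AFTER INSERT trigger that updates the row just inserted. The id column and the sample marks are assumptions; the formula for per is copied from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (
        id    INTEGER PRIMARY KEY,   -- assumed key column
        subj1 INTEGER,
        subj2 INTEGER,
        subj3 INTEGER,
        total INTEGER,
        per   REAL
    );

    -- SQLite variant: fill in the derived columns right after the insert.
    CREATE TRIGGER stud_marks
    AFTER INSERT ON Student
    FOR EACH ROW
    BEGIN
        UPDATE Student
        SET total = NEW.subj1 + NEW.subj2 + NEW.subj3,
            per   = (NEW.subj1 + NEW.subj2 + NEW.subj3) * 60.0 / 100
        WHERE id = NEW.id;
    END;
""")

# Only the three marks are supplied; the trigger computes the rest.
conn.execute(
    "INSERT INTO Student (id, subj1, subj2, subj3) VALUES (1, 20, 20, 20)"
)
student_row = conn.execute(
    "SELECT total, per FROM Student WHERE id = 1"
).fetchone()
conn.close()

print(student_row)
```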
Embedded SQL
• This is a method for combining the data manipulation
capabilities of SQL with the computing power of a
programming language. The embedded
statements are written inline with the program source
code of the host language.
• Standard syntax to identify embedded SQL code
within the host language
• Standard syntax to identify host variables
• Communication area used to exchange status and
error information between SQL and host language
Embedded SQL
• The EXEC SQL statement is used to identify an embedded SQL
request to the preprocessor
EXEC SQL <embedded SQL statement>;
• Note: this varies by language:
• In some languages, like COBOL, the semicolon is replaced
with END-EXEC
• In Java, embedding uses #sql { ... };
• Before executing any SQL statements, the program
must first connect to the database. This is done using:
EXEC SQL connect to server user user-name using
password;
• Here, server identifies the server to which a
connection is to be established.
Embedded SQL
• Variables of the host language can be used within
embedded SQL statements. They are preceded by a
colon (:) to distinguish from SQL variables
(e.g., :credit_amount )
• Variables used as above must be declared within
DECLARE section, as illustrated below. The syntax for
declaring the variables, however, follows the usual
host language syntax.
EXEC SQL BEGIN DECLARE SECTION;
int credit_amount;
EXEC SQL END DECLARE SECTION;
Embedded SQL
To write an embedded SQL query, we use the
declare c cursor for <SQL query>
statement. The variable c is used to identify the query.
Example:
From within a host language, find the ID and name of
students who have completed more than the number of
credits stored in variable credit_amount in the host language.
Specify the query in SQL as follows:
EXEC SQL
declare c cursor for
select ID, name
from student
where tot_cred > :credit_amount
END_EXEC
Embedded SQL
• The open statement for our example is as follows:
EXEC SQL open c ;
• This statement causes the database system to execute
the query and to save the results within a temporary
relation. The query uses the value of the host language
variable credit_amount at the time the open statement
is executed.
• The fetch statement causes the values of one tuple in
the query result to be placed in host language
variables.
EXEC SQL fetch c into :si, :sn END_EXEC
• Repeated calls to fetch get successive tuples in the
query result.
Embedded SQL
• The close statement causes the database system to
delete the temporary relation that holds the result of
the query.
EXEC SQL close c ;
Note: above details vary with language.
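The declare, open, fetch and close cycle has a close analogue in host-language database APIs. The sketch below uses Python's sqlite3 module rather than a real embedded SQL preprocessor, so it only illustrates the idea: a host variable is bound with the :credit_amount placeholder, and rows are fetched one tuple at a time into host variables si and sn. The student rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ID TEXT, name TEXT, tot_cred INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("001", "Hana", 90), ("002", "Yonas", 45), ("003", "Liya", 61)],
)

credit_amount = 60  # host-language variable

# "declare" and "open": the query runs with the current value of the
# host variable, bound through the named placeholder :credit_amount.
c = conn.cursor()
c.execute(
    "SELECT ID, name FROM student "
    "WHERE tot_cred > :credit_amount ORDER BY ID",
    {"credit_amount": credit_amount},
)

# "fetch": each call places one tuple into host variables si and sn.
fetched = []
while True:
    row = c.fetchone()
    if row is None:  # no more tuples in the result
        break
    si, sn = row
    fetched.append((si, sn))

# "close": release the cursor and its result.
c.close()
conn.close()

print(fetched)
```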
Embedded SQL
• Embedded SQL expressions for database modification
(update, insert, and delete)
• Can update tuples fetched by cursor by declaring that
the cursor is for update
EXEC SQL
declare c cursor for
select *
from instructor
where dept_name = 'Music'
for update
• We then iterate through the tuples by performing
fetch operations on the cursor (as illustrated earlier),
and after fetching each tuple we execute the following
code:
update instructor
set salary = salary + 1000
where current of c
Need for Embedded SQL in
DBMS
• When you embed SQL in another language, the
language in which SQL is embedded is known as the host
language, and the SQL standard that defines the
embedding of SQL is known as embedded SQL.
• The result of a query is made available to the
host program one tuple or record
at a time.
Systems that Support
Embedded SQL
• Altibase
• Microsoft SQL Server
• Oracle Database
• PostgreSQL
• Raima Database Manager (RDM)
• SAP Sybase
File Organization, Record
Organization and Storage Access
File Organization
• A database consists of a huge amount of data. In an RDBMS the data is grouped
into tables, and each table holds related records. A user
sees the data in the form of tables, but physically this
huge amount of data is stored on storage devices in the form of files.
• File – A file is a named collection of related information that is recorded
on secondary storage such as magnetic disks, magnetic tapes and
optical disks.
• What is File Organization?
File organization refers to the logical relationships among the various
records that constitute the file, particularly with respect to the means
of identification and access to any specific record.
• In simple terms, storing the records of a file in a certain order is called file
organization.
• File structure refers to the format of the label and data blocks and of
any logical control record.
File Organization
• The File is a collection of records. Using the primary key, we can access
the records. The type and frequency of access can be determined by the
type of file organization which was used for a given set of records.
• File organization is a logical relationship among various records. This
method defines how file records are mapped onto disk blocks.
• File organization is used to describe the way in which the records are
stored in terms of blocks, and the blocks are placed on the storage
medium.
• The first approach to mapping the database to files is to use several
files and store records of only one fixed length in any given file. An
alternative approach is to structure the files so that they can contain
records of multiple lengths.
• Files of fixed-length records are easier to implement than files of
variable-length records.
File Organization

• Record id (rid): is sufficient to physically locate the page containing
the record on disk
• Indexes are data structures that allow us to find the record ids of
records with given values in index search key fields
• NOTE: Several uses of “keys” in a database
• Primary/foreign/candidate/super keys
• Index search keys
Objective of file organization
• Records should be selected optimally, i.e.,
as fast as possible.
• Insert, delete and update transactions on
the records should be quick and easy.
• Duplicate records should not be introduced as a result of
insert, update or delete.
• Records should be stored efficiently, at minimal
storage cost.
Types of file organization:

• File organization contains various methods. These particular methods
have pros and cons on the basis of access or selection. In file
organization, the programmer decides the best-suited file
organization method according to his requirement.
1. Sequential File Organization –

• The easiest method of file organization is the sequential method. In this
method the records are stored one after another in a sequential manner.
There are two ways to implement this method:
• Pile File Method – This method is quite simple: we store the
records in a sequence, i.e., one after another in the order in which they
are inserted into the tables.
Pile File Method

• Insertion of new record –
Let R1, R3 and so on up to R5 and R4 be the records in the
sequence. Here, a record is simply a row in a table. Suppose
a new record R2 has to be inserted in the sequence; it is simply
placed at the end of the file.
Sorted File Method

• Sorted File Method – In this method, as the name itself suggests,
whenever a new record has to be inserted, it is always inserted in
sorted (ascending or descending) order.
• Sorting of records may be based on the primary key or any other key.
Sorted File Method

• Insertion of new record –
Let us assume that there is a preexisting sorted sequence of
records R1, R3, and so on up to R7 and R8. Suppose a new record R2
has to be inserted in the sequence; it is first placed at the end
of the file, and the sequence is then sorted again.
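The two insertion rules can be contrasted in a few lines of Python. This is only an in-memory sketch with lists standing in for files; the record names follow the examples above.

```python
# Pile file: records stay in arrival order, new records go to the end.
pile_file = ["R1", "R3", "R5", "R4"]
pile_file.append("R2")

# Sorted file: the new record is appended, then the file is re-sorted
# so that the key order is restored.
sorted_file = ["R1", "R3", "R7", "R8"]
sorted_file.append("R2")
sorted_file.sort()

print(pile_file)
print(sorted_file)
```

The extra sort step is the cost that the sorted file method pays on every insertion.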
Pros and Cons of Sequential File
Organization
Pros
• Fast and efficient method for huge amounts of data.
• Simple design.
• Files can be easily stored on magnetic tapes, i.e., a cheaper storage
mechanism.
Cons
• Time is wasted because we cannot jump directly to a particular record;
we have to move through the file sequentially until we reach
it.
• The sorted file method is inefficient, as sorting the records takes
time and space.
2. Heap file organization
• It is the simplest and most basic type of organization. It works
with data blocks.
• In heap file organization, the records are inserted at the file's
end. Inserting records does not require any
sorting or ordering of records.
• When the data block is full, the new record is stored in some
other block. This new data block need not be the very next
data block; the DBMS can select any data block in the memory to
store new records. The heap file is also known as an unordered
file.
• In the file, every record has a unique id, and every page in a file
is of the same size. It is the DBMS's responsibility to store and
manage the new records.
Insertion of new record –
Suppose we have five records in the heap, R1, R5, R6, R4 and R3, and a
new record R2 has to be inserted. Since the last data block, i.e., data
block 3, is full, R2 will be inserted in any data block selected by the DBMS, let's
say data block 1.
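A rough sketch of heap insertion and search, with Python lists standing in for data blocks. The block capacity of two records and the "first block with free space" policy are assumptions made for illustration; a real DBMS may choose the block differently.

```python
BLOCK_CAPACITY = 2  # assumed number of records per data block

# Data blocks 1..3; block 3 (the last one) is full.
blocks = [["R1"], ["R5", "R6"], ["R4", "R3"]]


def heap_insert(blocks, record, capacity=BLOCK_CAPACITY):
    """Place the record in the first block with free space.

    Returns the 0-based index of the block that received the record.
    """
    for index, block in enumerate(blocks):
        if len(block) < capacity:
            block.append(record)
            return index
    blocks.append([record])  # every block is full: allocate a new one
    return len(blocks) - 1


def heap_search(blocks, record):
    """Scan every block in turn, since a heap file has no ordering."""
    for index, block in enumerate(blocks):
        if record in block:
            return index
    return None


chosen_block = heap_insert(blocks, "R2")
found_in = heap_search(blocks, "R3")
print(chosen_block, found_in, blocks)
```

The linear scan in heap_search is why this organization becomes inefficient as the file grows.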
Pros and Cons of Heap File
Organization
Pros
• Fetching and retrieving records is faster than in a sequential
file, but only for small databases.
• When a huge amount of data needs to be loaded
into the database at one time, this method of file
organization is best suited.
Cons
• Problem of unused memory blocks.
• Inefficient for larger databases.
3. Hashing
• Hashing is an efficient technique to directly search the
location of desired data on the disk without using
index structure.
• Data is stored in data blocks whose addresses are
generated using a hash function. The memory
location where these records are stored is called a
data block or data bucket.
• Hash File Organization uses the computation of hash
function on some fields of the records. The hash
function's output determines the location of disk block
where the records are to be placed.
Hash File Organization :
• Data bucket – Data buckets are the memory locations where the
records are stored. These buckets are also considered the unit of
storage.
• Hash Function – A hash function is a mapping function that maps
the set of search keys to actual record addresses. Generally, the hash
function uses the primary key to generate the hash index, i.e., the address of the
data block. The hash function can be anything from a simple mathematical function to a
complex mathematical function.
• Hash Index – The prefix of an entire hash value is taken as a hash index.
Every hash index has a depth value to signify how many bits are used
for computing a hash function. With n bits, 2^n buckets can be addressed.
When all these bits are consumed, the depth value is increased
by one and twice as many buckets are allocated.
• There are 2 types of hashing: static and dynamic. | Assignment
Hashing
• When a record has to be retrieved using the hash key columns,
the address is generated and the whole record is
retrieved using that address. In the same way, when a new
record has to be inserted, the address is generated using
the hash key and the record is directly inserted. The same process
is applied in the case of delete and update.
• In this method, there is no effort for searching and sorting the
entire file. Each record is stored at an apparently random
location in the memory, determined by its hash value.
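A minimal sketch of static hash file organization in Python. The number of buckets, the modulo hash function and the record layout are all assumptions chosen to keep the example small; collisions are simply stored in the same bucket.

```python
NUM_BUCKETS = 4  # assumed fixed number of data buckets (static hashing)

# Each bucket holds the records whose key hashes to its address.
buckets = {address: [] for address in range(NUM_BUCKETS)}


def bucket_address(key):
    """Hash function: map an integer primary key to a bucket address."""
    return key % NUM_BUCKETS


def insert(record):
    """Compute the address from the key and store the record there."""
    buckets[bucket_address(record["id"])].append(record)


def lookup(key):
    """Go straight to one bucket instead of scanning the whole file."""
    for record in buckets[bucket_address(key)]:
        if record["id"] == key:
            return record
    return None


for rec in ({"id": 103, "name": "A"},
            {"id": 104, "name": "B"},
            {"id": 110, "name": "C"}):
    insert(rec)

print(bucket_address(103), bucket_address(104), bucket_address(110))
print(lookup(110))
```

Insert, lookup, update and delete all start with the same address computation, which is why no searching or sorting of the whole file is needed.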
4. B+ File Organization
• B+ tree file organization is an advanced form of the indexed
sequential access method. It uses a tree-like structure to store
records in a file.
• It uses the same concept of key-index where the primary key is
used to sort the records. For each primary key, the value of the
index is generated and mapped with the record.
• The B+ tree is similar to a binary search tree (BST), but it can have
more than two children. In this method, all the records are stored
only at the leaf node. Intermediate nodes act as a pointer to the
leaf nodes. They do not contain any records.
4. B+ File Organization
• There is one root node of the tree, i.e., 25.
• There is an intermediary layer of nodes. They do not store
the actual records. They have only pointers to the leaf nodes.
• The nodes to the left of the root node contain values smaller
than the root, and the nodes to the right contain values larger than the
root, i.e., 15 and 30 respectively.
• There is a single level of leaf nodes, which hold the actual values, i.e., 10, 12,
17, 20, 24, 27 and 29.
• Searching for any record is easier as all the leaf nodes are
balanced.
• In this method, any record can be reached by traversing
a single path from the root and accessed easily.
Pros of B+ tree file organization
• In this method, searching becomes very easy as all the
records are stored only in the leaf nodes and are sorted
in a sequential linked list.
• Traversing through the tree structure is easier and
faster.
• The size of the B+ tree has no restrictions, so the
number of records can increase or decrease and the
B+ tree structure can also grow or shrink.
• It is a balanced tree structure, and any
insert/update/delete does not affect the performance
of the tree.
Cons of B+ tree file organization
• This method is inefficient for static files, i.e., data that rarely changes.
5. Indexed sequential access
method (ISAM)
• ISAM is an advanced sequential file organization. In this method,
records are stored in the file in primary key order. An index value is
generated for each primary key and mapped to the record. This index
contains the address of the record in the file.
• If any record has to be retrieved based on its index value, then the
address of the data block is fetched and the record is retrieved from the
memory.
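The key-to-address mapping can be sketched as follows. This is a minimal illustration under stated assumptions: the "file" is an in-memory `io.BytesIO` buffer, records are fixed-width (`RECORD_SIZE` is an assumed value), and the helper names `build_file` and `fetch` are hypothetical.

```python
import io

RECORD_SIZE = 16  # assumed fixed record width: 4-digit key + 12-character name


def build_file(records):
    """Write records in primary-key order; return (file, index).

    The index maps each primary key to the byte offset of its record.
    """
    data = io.BytesIO()
    index = {}
    for key, name in sorted(records):
        index[key] = data.tell()
        data.write(f"{key:04d}{name:<12}".encode("ascii"))
    return data, index


def fetch(data, index, key):
    """Look up the address in the index, then read exactly one record."""
    offset = index.get(key)
    if offset is None:
        return None
    data.seek(offset)
    raw = data.read(RECORD_SIZE).decode("ascii")
    return int(raw[:4]), raw[4:].rstrip()


data, index = build_file([(3, "Carol"), (1, "Alice"), (2, "Bob")])
```

Here `fetch(data, index, 2)` jumps straight to the stored offset instead of scanning the file from the beginning.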
Pros of ISAM:
• Because each index entry holds the address of its data block, searching
for a record in a huge database is quick and easy.
• This method supports range retrieval and partial retrieval of
records. Since the index is based on the primary key values, we
can retrieve the data for the given range of value. In the same
way, the partial value can also be easily searched, i.e., the student
name starting with 'JA' can be easily searched.
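Range and partial retrieval both rely on the keys being kept in sorted order. A minimal sketch using Python's standard `bisect` module; the function names and the sample names and IDs are assumptions for illustration only.

```python
from bisect import bisect_left, bisect_right


def range_lookup(sorted_keys, low, high):
    """Return every key k with low <= k <= high."""
    return sorted_keys[bisect_left(sorted_keys, low):bisect_right(sorted_keys, high)]


def prefix_lookup(sorted_keys, prefix):
    """Return every string key that starts with the given prefix."""
    start = bisect_left(sorted_keys, prefix)   # first key >= prefix
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


student_ids = [101, 105, 110, 120, 131]
student_names = ["ADAM", "JACK", "JAMES", "JOHN", "MARY"]
```

Because matching keys are adjacent in a sorted index, both lookups locate the first match by binary search and then read a contiguous run.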
Cons of ISAM
• This method requires extra space in the disk to store the index
value.
• When new records are inserted, the file has to be
reorganized to maintain the sequence.
• When the record is deleted, then the space used by it needs to be
released. Otherwise, the performance of the database will slow
down.
6. Cluster file organization
• When records from two or more tables are stored in the same
file, it is known as a cluster. These files will have two or
more tables in the same data block, and the key attributes
which are used to map these tables together are stored
only once.
• This method reduces the cost of searching for various
records in different files.
• The cluster file organization is used when there is a
frequent need for joining the tables with the same
condition. These joins will give only a few records from
both tables. In the given example, we are retrieving the
records for only particular departments. This method can't
be used to retrieve the records of all departments.
• In this method, we can directly insert, update or delete any record. Data is sorted
based on the key with which searching is done. The cluster key is the key on
which the tables are joined.
Types of Cluster file organization:
1. Indexed Clusters:
• In an indexed cluster, records are grouped based on the
cluster key and stored together. The above
EMPLOYEE and DEPARTMENT relationship is an
example of an indexed cluster. Here, all the records
are grouped based on the cluster key DEP_ID.
2. Hash Clusters:
• It is similar to the indexed cluster. In a hash cluster,
instead of storing the records based on the cluster
key itself, we generate a hash key value from the
cluster key and store together the records with the
same hash key value.
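The difference between the two cluster types can be sketched as follows. This is an illustration only: the sample rows, the function names, and the use of `DEP_ID % buckets` as the hash function are assumptions, not how any particular DBMS implements clusters.

```python
from collections import defaultdict

employees = [
    {"EMP_ID": 1, "DEP_ID": 10},
    {"EMP_ID": 2, "DEP_ID": 14},
    {"EMP_ID": 3, "DEP_ID": 10},
    {"EMP_ID": 4, "DEP_ID": 11},
]


def indexed_cluster(rows):
    """Group rows by the cluster key itself, kept in key order."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["DEP_ID"]].append(row["EMP_ID"])
    return dict(sorted(clusters.items()))


def hash_cluster(rows, buckets=4):
    """Group rows by a hash of the cluster key (assumed: key modulo buckets)."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["DEP_ID"] % buckets].append(row["EMP_ID"])
    return dict(clusters)
```

In the hash version, departments 10 and 14 land in the same bucket because they share a hash value, so a lookup must still check the actual DEP_ID inside the bucket.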
Pros of Cluster file organization
• The cluster file organization is used when there is a
frequent request for joining the tables with same
joining condition.
• It provides the efficient result when there is a 1:M
mapping between the tables.
Cons of Cluster file organization
• This method has low performance for very large
databases.
• If there is any change in the joining condition, then this
method cannot be used. If we change the joining
condition, then traversing the file takes a lot of time.
• This method is not suitable for tables with a 1:1
mapping.
Guidelines for the use of Cluster tables
• Consider clustering tables when tables are often accessed in join statements.
• Do not cluster tables if they are joined only occasionally or their common column
values are modified frequently. (Modifying a row's cluster key value takes longer than
modifying the value in an unclustered table, because Oracle may have to migrate the
modified row to another block to maintain the cluster.)
• Do not cluster tables if a full search of one of the tables is often required. (A full
search of a clustered table can take longer than a full search of an unclustered table.
Oracle is likely to read more blocks because the tables are stored together.)
• Consider clustering tables involved in a one-to-many (1:M) relationship if a row is
often selected from the parent table and then the corresponding rows from the child
table. (Child rows are stored in the same data block(s) as the parent row, so they are
likely to be in memory when selected, requiring Oracle to perform less I/O.)
• Do not cluster tables if the data from all tables with the same cluster key value
exceeds one or two Oracle blocks. (To access a row in a clustered table,
Oracle reads all blocks containing rows with that value. If these rows occupy multiple
blocks, accessing a single row could require more reads than accessing the same row
in an unclustered table.)
Indexing
• Indexing is a way to optimize the performance of a database by
minimizing the number of disk accesses required when a query is
processed. It is a data structure technique which is used to quickly
locate and access the data in a database.
• An index is created using a few database columns and has two columns of its own.
• The first column is the Search key, which contains a copy of the primary
key or candidate key of the table. These values are stored in sorted
order so that the corresponding data can be accessed quickly.
Note: The data itself may or may not be stored in sorted order.
• The second column is the Data Reference or Pointer, which contains a
set of pointers holding the address of the disk block where that
particular key value can be found.
Index structure:
• Indexes can be created using some database columns. Each index
entry has two fields: search-key and pointer.
• The first field of the index is the search key, which
contains a copy of the primary key or candidate key of the
table. The values of the primary key are stored in sorted order
so that the corresponding data can be accessed easily.
• The second field of the index is the data reference. It
contains a set of pointers holding the address of the disk block
where the value of the particular key can be found.
Indexing Methods
1. Ordered indices
• The indices are usually sorted to make searching faster. The
indices which are sorted are known as ordered indices.
• Example: Suppose we have an employee table with thousands
of records, each of which is 10 bytes long. The IDs start
with 1, 2, 3, ... and so on, and we have to search for the
employee with ID 543.
• In the case of a database with no index, we have to scan the
disk blocks from the start until we reach 543. The DBMS will read
the record after reading 543*10 = 5430 bytes.
• In the case of an index, we search through the index entries instead,
and the DBMS will read the record after reading 542*2 = 1084 bytes
(taking each index entry to be 2 bytes long), which is far less than
in the previous case.
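The arithmetic in this example can be checked with a few lines of Python. The 10-byte record size comes from the example; the 2-byte index entry size is an assumption inferred from the 542*2 figure, and the function names are hypothetical.

```python
RECORD_SIZE = 10       # bytes per data record (from the example)
INDEX_ENTRY_SIZE = 2   # bytes per index entry (assumed, implied by 542*2)


def bytes_read_without_index(target_id):
    """Sequential scan: every record up to and including the target is read."""
    return target_id * RECORD_SIZE


def bytes_read_with_index(target_id):
    """Index scan: only the index entries before the target are read."""
    return (target_id - 1) * INDEX_ENTRY_SIZE
```

For ID 543 the two functions give 5430 and 1084 bytes respectively, roughly a fivefold reduction in the data read before the record is located.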
2. Primary Index
• If the index is created on the basis of the primary key
of the table, then it is known as primary indexing.
• These primary keys are unique to each record, so there is
a 1:1 relation between index entries and records.
• As primary keys are stored in sorted order, the
performance of the searching operation is quite
efficient.
• The primary index can be classified into two types:
Dense index and Sparse index.
2.1. Dense index
• The dense index contains an index record for every
search key value in the data file. It makes searching
faster.
• In this, the number of records in the index table is
the same as the number of records in the main table.
• It needs more space to store the index records themselves. Each
index record has the search key and a pointer to
the actual record on the disk.

2.2. Sparse index
• In a sparse index, an index record appears only for a few
items of the data file. Each item points to a block.
• In this, instead of pointing to each record in the main
table, the index points to records in the main
table at intervals, leaving gaps between them.
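The contrast between dense and sparse indexes can be sketched as follows. This is an illustration under assumptions: the block size, the sample keys, and the function names are made up for the sketch, and the sparse index keeps one entry per block (the first key in that block).

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # assumed number of records per disk block

# Data file: records sorted by search key, split into fixed-size blocks.
records = [(key, f"row-{key}") for key in (101, 102, 103, 104, 105, 106, 107)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per record -> (block number, slot within the block).
dense_index = {
    key: (b, s)
    for b, block in enumerate(blocks)
    for s, (key, _) in enumerate(block)
}

# Sparse index: one entry per block -> (first key in the block, block number).
sparse_index = [(block[0][0], b) for b, block in enumerate(blocks)]


def dense_lookup(key):
    """Go straight to the record through its own index entry."""
    position = dense_index.get(key)
    if position is None:
        return None
    b, s = position
    return blocks[b][s]


def sparse_lookup(key):
    """Find the last index entry <= key, then scan that one block."""
    anchors = [first for first, _ in sparse_index]
    i = bisect_right(anchors, key) - 1
    if i < 0:
        return None
    for record in blocks[sparse_index[i][1]]:
        if record[0] == key:
            return record
    return None
```

The dense index needs seven entries for seven records, while the sparse index needs only three, at the cost of a short scan inside the block.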
3. Clustering Index
• A clustered index can be defined as an index on an ordered data file.
Sometimes the index is created on non-primary key columns
which may not be unique for each record.
• In this case, to identify the record faster, we group two or
more columns to get a unique value and create an index out of
them. This method is called a clustering index.
• The records which have similar characteristics are grouped, and
indexes are created for these groups.
• Example: suppose a company has several employees in
each department. Suppose we use a clustering index, where all
employees who belong to the same Dept_ID are considered
to be within a single cluster, and index pointers point to the cluster as
a whole. Here Dept_ID is a non-unique key.
The previous schema is a little confusing because one disk block is shared by
records which belong to different clusters. Using a separate disk block for
each cluster is considered a better technique.
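The "separate disk block for separate clusters" variant can be sketched as follows. The sample rows and the function name are assumptions; each Python list stands in for one disk block.

```python
def build_clustering_index(employees):
    """Group (dept_id, name) rows by Dept_ID, one block per cluster.

    Returns (blocks, index), where index maps Dept_ID -> block number.
    """
    blocks, index = [], {}
    for dept_id, name in sorted(employees):
        if dept_id not in index:
            index[dept_id] = len(blocks)   # first row of a new cluster
            blocks.append([])
        blocks[index[dept_id]].append((dept_id, name))
    return blocks, index


employees = [(2, "Ravi"), (1, "Abel"), (2, "Sara"), (3, "Lily"), (1, "Omar")]
blocks, index = build_clustering_index(employees)
```

The index has one entry per Dept_ID rather than one per employee, and following a pointer returns the whole cluster at once.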
4. Secondary Index
• In sparse indexing, as the size of the table grows, the size of
the mapping also grows. These mappings are usually kept in
primary memory so that the address fetch is fast. The actual
data is then read from secondary memory using the
address obtained from the mapping. If the mapping size grows, then
fetching the address itself becomes slower. In this case, the
sparse index will not be efficient. To overcome this problem,
secondary indexing is introduced.
• In secondary indexing, to reduce the size of mapping, another
level of indexing is introduced. In this method, the huge range for
the columns is selected initially so that the mapping size of the
first level becomes small. Then each range is further divided into
smaller ranges. The mapping of the first level is stored in the
primary memory, so that address fetch is faster. The mapping of
the second level and actual data are stored in the secondary
memory (hard disk).
• For example:
• If you want to find the record of roll 111 in the diagram, then the
search looks for the highest entry which is smaller than or equal to
111 in the first level index. It will get 100 at this level.
• Then in the second index level, it again picks the highest entry
smaller than or equal to 111 and gets 110. Now using the address of
110, it goes to the data block and searches each record until it
gets 111.
• This is how a search is performed in this method. Inserting, updating
or deleting is also done in the same manner.
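The two-level search for roll 111 can be sketched as follows. The entries 100 and 110 come from the example; every other value, the dictionary layout, and the names `floor_entry` and `find` are assumptions made for the sketch.

```python
from bisect import bisect_right


def floor_entry(entries, key):
    """Return the (key, pointer) entry with the largest key <= the search key."""
    keys = [k for k, _ in entries]
    i = bisect_right(keys, key) - 1
    return entries[i] if i >= 0 else None


# First level (kept in primary memory): coarse ranges.
first_level = [(100, "L2-100"), (200, "L2-200")]

# Second level (on disk): each coarse range split into smaller ranges.
second_level = {
    "L2-100": [(100, "B100"), (110, "B110"), (120, "B120")],
    "L2-200": [(200, "B200"), (210, "B210")],
}

# Data blocks (on disk): ten consecutive roll numbers per block (assumed).
data_blocks = {
    "B100": list(range(100, 110)),
    "B110": list(range(110, 120)),
    "B120": list(range(120, 130)),
    "B200": list(range(200, 210)),
    "B210": list(range(210, 220)),
}


def find(roll):
    """Return (first-level key, second-level key, roll), or None if absent."""
    top = floor_entry(first_level, roll)
    if top is None:
        return None
    mid = floor_entry(second_level[top[1]], roll)
    if mid is None:
        return None
    for stored in data_blocks[mid[1]]:   # linear scan inside one block
        if stored == roll:
            return top[0], mid[0], stored
    return None
```

Calling `find(111)` passes through 100 at the first level and 110 at the second level before the block scan reaches 111, matching the steps listed above.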
End of Chapter 2
