CH 2
Advanced SQL
Learning Objectives
• In this chapter, the student will learn:
• A review of what SQL is,
• How to use the advanced SQL JOIN operator syntax,
• About the different types of subqueries and correlated
queries,
• How to use SQL functions to manipulate dates, strings, and
other data,
• About the relational set operators UNION, UNION
ALL, INTERSECT, and MINUS.
• How does a DBMS organize files?
• What is an index?
• What are different types of indexes?
What is SQL?
• SQL (Structured Query Language) is a computer
language for storing, manipulating and retrieving data
stored in a relational database.
• SQL is the standard language for relational database
systems.
• All relational database management systems
(RDBMS) such as MySQL, MS Access, Oracle, Sybase,
Informix, Postgres and SQL Server use SQL as their
standard database language. They also use
different dialects, such as:
• MS SQL Server uses T-SQL,
• Oracle uses PL/SQL,
• The MS Access version of SQL is called JET SQL (native
format), etc.
Why SQL?
SQL is widely popular because it offers the following
advantages:
•Allows users to access data in the relational database
management systems.
•Allows users to describe the data.
•Allows users to define the data in a database and
manipulate that data.
•Can be embedded within other languages using SQL
modules, libraries & pre-compilers.
•Allows users to create and drop databases and tables.
•Allows users to create views, stored procedures and
functions in a database.
•Allows users to set permissions on tables, procedures and
views.
A Brief History of SQL
• 1970 – Dr. Edgar F. "Ted" Codd of IBM, known as the
father of relational databases, described a
relational model for databases.
• 1974 – Structured Query Language appeared.
• 1978 – IBM worked to develop Codd's ideas and
released a product named System/R.
• 1986 – SQL was standardized by ANSI, following IBM's
first prototype of a relational database. The
first commercial relational database had been released
in 1979 by Relational Software, which later came to be
known as Oracle.
What is RDBMS?
• RDBMS stands for Relational Database Management
System.
• RDBMS is the basis for SQL, and for all modern database
systems like MS SQL Server, IBM DB2, Oracle, MySQL, and
Microsoft Access.
• A Relational database management system (RDBMS) is a
database management system (DBMS) that is based on the
relational model as introduced by E. F. Codd.
What is a table?
• The data in an RDBMS is stored in database objects
called tables. A table is a collection of
related data entries and consists of columns
and rows.
• Remember, a table is the most common and simplest form
of data storage in a relational database. A CUSTOMERS
table, with one row per customer, is a typical example.
SQL Process
• When you execute an SQL command for any
RDBMS, the system determines the best way to carry
out your request, and the SQL engine figures out how to
interpret the task. Various components are
involved in this process:
• Query Dispatcher
• Optimization Engines
• Classic Query Engine
• SQL Query Engine, etc.
• A classic query engine handles all the non-SQL
queries, but a SQL query engine won't handle logical
files.
SQL Process
• SQL query engine. A piece of software that recognizes
and interprets the SQL language and implements data
access, both reading and writing, for a relational
database, in a way that can be controlled by a user's
SQL queries.
• Optimization engine. Query optimization is a feature
of many relational database management systems.
The query optimizer attempts to determine the most
efficient way to execute a given query by considering
the possible query plans.
• Query dispatcher. The function of the dispatcher is to
route the query request to either the classic query
engine (CQE) or the SQL query engine (SQE),
depending on the attributes of the query.
All queries are processed by the dispatcher.
SQL Architecture | Assignment
SQL Commands
• The standard SQL commands to interact with
relational databases are CREATE, SELECT, INSERT,
UPDATE, DELETE and DROP.
• These commands can be classified into the following
groups based on their nature:
• DDL - Data Definition Language
Command   Description
CREATE    Creates a new table, a view of a table, or other
          object in the database.
ALTER     Modifies an existing database object, such as a
          table.
DROP      Deletes an entire table, a view of a table or other
          objects in the database.
SQL Commands
• DML - Data Manipulation Language
Command   Description
SELECT    Retrieves certain records from one or more tables.
INSERT    Creates a record.
UPDATE    Modifies records.
DELETE    Deletes records.
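The DDL and DML commands listed above can be tried end to end with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory database; the customers table, its columns and its rows are hypothetical and only serve as illustration.

```python
import sqlite3

# Hypothetical CUSTOMERS table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")

# DDL: CREATE defines a new table.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)
# DDL: ALTER modifies an existing object (here it adds a column).
conn.execute("ALTER TABLE customers ADD COLUMN city TEXT")

# DML: INSERT creates records.
conn.executemany(
    "INSERT INTO customers (id, name, age, city) VALUES (?, ?, ?, ?)",
    [(1, "Ana", 32, "Lima"), (2, "Ben", 25, "Oslo"), (3, "Chloe", 41, "Lima")],
)
# DML: UPDATE modifies records, DELETE removes them.
conn.execute("UPDATE customers SET age = 26 WHERE id = 2")
conn.execute("DELETE FROM customers WHERE id = 3")

# DML: SELECT retrieves records.
rows = conn.execute(
    "SELECT id, name, age, city FROM customers ORDER BY id"
).fetchall()
print(rows)

# DDL: DROP deletes the entire table.
conn.execute("DROP TABLE customers")
remaining = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master "
    "WHERE type = 'table' AND name = 'customers'"
).fetchone()[0]
print(remaining)
```

Other DBMSs accept the same statements with minor dialect differences, for example in data type names.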
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Subqueries and Correlated Queries
• A subquery is a query inside another query
A subquery can return:
• One single value - One column and one row
• A list of values - One column and multiple rows
• A virtual table - Multicolumn, multirow set of values
• No value - The outer query might then result in
an error or a null (empty) set
WHERE Subqueries
• Use an inner SELECT subquery on the right side of a
WHERE comparison expression
• Value generated by the subquery must be of a
comparable data type
• If the subquery returns more than a single value where
a single-value comparison (such as = or >) is used, the
DBMS will generate an error
• Can be used in combination with joins
IN subqueries
• Used to compare a single attribute to a list of values
HAVING subqueries
• The HAVING clause specifies a search condition for a
group or an aggregate.
• HAVING is used together with a GROUP BY clause. If there
is no GROUP BY clause, HAVING applies to the whole result as
a single group, and in some DBMSs it then behaves like a
WHERE clause.
Syntax
• HAVING subquery
SELECT column-names
FROM table-name
WHERE condition
GROUP BY column-names
HAVING condition
• IN subquery
SELECT column-names
FROM table-name1
WHERE value IN (SELECT column-name
                FROM table-name2
                WHERE condition)
Eg:
SELECT ProductName
FROM Product
WHERE Id IN (SELECT ProductId
             FROM OrderItem
             WHERE Quantity > 100)
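The IN and HAVING forms can be run against a tiny data set. This is a sketch using Python's sqlite3 module; the Product and OrderItem names come from the example above, while the rows and the HAVING query (groups whose total exceeds the average quantity) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (Id INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE OrderItem (ProductId INTEGER, Quantity INTEGER);
    INSERT INTO Product VALUES (1, 'Chai'), (2, 'Tofu'), (3, 'Salt');
    INSERT INTO OrderItem VALUES (1, 120), (1, 30), (2, 50), (3, 150), (3, 10);
""")

# IN subquery: products that appear in at least one order line above 100.
in_rows = conn.execute("""
    SELECT ProductName
    FROM Product
    WHERE Id IN (SELECT ProductId
                 FROM OrderItem
                 WHERE Quantity > 100)
    ORDER BY ProductName
""").fetchall()

# HAVING subquery: keep only groups whose total quantity exceeds the
# average quantity of a single order line (computed by the subquery).
having_rows = conn.execute("""
    SELECT ProductId, SUM(Quantity) AS total
    FROM OrderItem
    GROUP BY ProductId
    HAVING SUM(Quantity) > (SELECT AVG(Quantity) FROM OrderItem)
    ORDER BY ProductId
""").fetchall()

print(in_rows)
print(having_rows)
```

WHERE filters individual rows before grouping, whereas HAVING filters whole groups after the aggregates have been computed.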
Multirow Subquery Operators: ANY and ALL
•ALL operator: Allows comparison of a single value with a list
of values returned by the subquery, using a comparison
operator other than equals. The condition holds only if it is
true for every value in the list.
•ANY operator: Allows comparison of a single value to a list of
values and selects the rows for which the comparison (for
example, greater than or less than) holds for at least one
value in the list.
FROM clause:
•Specifies the tables from which the data will be
drawn
•Can use a SELECT subquery
Syntax
• ANY
SELECT column-names
FROM table-name
WHERE column-name operator ANY
   (SELECT column-name
    FROM table-name
    WHERE condition)
• ALL
SELECT column-names
FROM table-name
WHERE column-name operator ALL
   (SELECT column-name
    FROM table-name
    WHERE condition)
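SQLite, used here through Python's sqlite3 module, does not implement the ANY and ALL keywords, so the sketch below substitutes the usual rewrite: "> ALL (subquery)" becomes a comparison with the subquery's MAX, and "> ANY (subquery)" a comparison with its MIN. The rewrite assumes the subquery returns at least one row and no NULLs. The item table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (name TEXT, category TEXT, price INTEGER);
    INSERT INTO item VALUES
        ('a', 'basic', 10), ('b', 'basic', 20),
        ('c', 'premium', 15), ('d', 'premium', 25), ('e', 'premium', 5);
""")

# Equivalent of: price > ALL (SELECT price FROM item WHERE category = 'basic')
gt_all = [row[0] for row in conn.execute("""
    SELECT name FROM item
    WHERE price > (SELECT MAX(price) FROM item WHERE category = 'basic')
    ORDER BY name
""")]

# Equivalent of: price > ANY (SELECT price FROM item WHERE category = 'basic')
gt_any = [row[0] for row in conn.execute("""
    SELECT name FROM item
    WHERE price > (SELECT MIN(price) FROM item WHERE category = 'basic')
    ORDER BY name
""")]

print(gt_all)
print(gt_any)
```

On a DBMS that supports the keywords, such as Oracle or SQL Server, the ANY and ALL forms shown in the syntax can be written directly.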
Attribute List Subqueries
• SELECT statement uses attribute list to indicate what
columns to project in the resulting set:
• Inline subquery
• Subquery expression included in the attribute list that must
return one value
• Column alias cannot be used in attribute list computation if
alias is defined in the same attribute list
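An inline subquery in the attribute list might look like the following sketch (Python's sqlite3 module; the item table is hypothetical). The average is computed by a subquery that returns exactly one value, and the subquery is repeated in the diff column instead of reusing the avg_price alias, in line with the restriction stated above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (name TEXT, price INTEGER);
    INSERT INTO item VALUES ('a', 10), ('b', 20), ('c', 30);
""")

rows = conn.execute("""
    SELECT name,
           price,
           (SELECT AVG(price) FROM item) AS avg_price,
           -- the alias avg_price is not reused here; the subquery is repeated
           price - (SELECT AVG(price) FROM item) AS diff
    FROM item
    ORDER BY name
""").fetchall()

for row in rows:
    print(row)
```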
Correlated Subquery | Assignment
• Executes once for each row in the outer query
• Inner query references a column of the outer
query
• Can be used with the EXISTS special operator
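A correlated subquery with EXISTS can be sketched as follows, again with Python's sqlite3 module and hypothetical customer and orders tables. The inner query refers to c.id from the outer query, so conceptually it is evaluated once per customer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Chloe');
    INSERT INTO orders VALUES (1, 50), (3, 20), (3, 70);
""")

# Customers with at least one order: the subquery is correlated through c.id.
with_orders = [row[0] for row in conn.execute("""
    SELECT c.name
    FROM customer AS c
    WHERE EXISTS (SELECT 1
                  FROM orders AS o
                  WHERE o.customer_id = c.id)
    ORDER BY c.name
""")]

# NOT EXISTS gives the complement: customers without any order.
without_orders = [row[0] for row in conn.execute("""
    SELECT c.name
    FROM customer AS c
    WHERE NOT EXISTS (SELECT 1
                      FROM orders AS o
                      WHERE o.customer_id = c.id)
""")]

print(with_orders, without_orders)
```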
Syntax
• Subqueries with the SELECT Statement
SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
   (SELECT column_name [, column_name ]
    FROM table1 [, table2 ]
    [WHERE])
• Subqueries with the INSERT Statement
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ *|column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ]
• Subqueries with the UPDATE Statement
UPDATE table_name
SET column_name = new_value
[ WHERE OPERATOR [ VALUE ]
   (SELECT COLUMN_NAME
    FROM TABLE_NAME
    [ WHERE ]) ]
• Subqueries with the DELETE Statement
DELETE FROM TABLE_NAME
[ WHERE OPERATOR [ VALUE ]
   (SELECT COLUMN_NAME
    FROM TABLE_NAME
    [ WHERE ]) ]
Functions and Procedures
• SQL:1999 supports functions and procedures
• Functions/procedures can be written in SQL itself, or in an
external programming language (e.g., C, Java).
• Functions written in an external language are particularly useful
with specialized data types such as images and geometric
objects.
• Example: functions to check if polygons overlap, or to compare images
for similarity.
• Some database systems support table-valued functions, which
can return a relation as a result.
• SQL:1999 also supports a rich set of imperative constructs,
including
• Loops, if-then-else, assignment
• Many databases have proprietary procedural extensions to
SQL that differ from SQL:1999.
SQL Functions
• A function is a predefined formula that takes one
or more arguments as input, processes the
arguments, and returns an output.
• Functions always use a numerical, date, or string
value
• Value may be part of a command or may be an
attribute located in a table
• Function may appear anywhere in an SQL
statement where a value or an attribute can be
used
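A few built-in functions for strings, numbers and dates are sketched below using SQLite through Python's sqlite3 module. Function names differ between DBMSs (for example, date handling is dialect specific), so the calls shown are SQLite's; the student table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Functions applied to literal values that are part of the command.
scalars = conn.execute("""
    SELECT UPPER('sql'),                   -- string function
           LENGTH('database'),             -- string function
           SUBSTR('database', 1, 4),       -- string function
           ROUND(3.14159, 2),              -- numeric function
           ABS(-7),                        -- numeric function
           date('2024-02-28', '+1 day')    -- SQLite date function
""").fetchone()

# Functions applied to attributes located in a table.
conn.executescript("""
    CREATE TABLE student (name TEXT, enrolled TEXT);
    INSERT INTO student VALUES ('jane', '2023-09-01');
""")
from_table = conn.execute(
    "SELECT UPPER(name), strftime('%Y', enrolled) FROM student"
).fetchone()

print(scalars)
print(from_table)
```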
SQL Functions
• Define a function that, given the name of a
department, returns the count of the number of
instructors in that department.
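create function dept_count (dept_name varchar(20))
returns integer
begin
    declare d_count integer;
    -- count the instructors in the given department
    select count(*) into d_count
    from instructor
    where instructor.dept_name = dept_name;
    return d_count;
end
• Usage (here with a table-valued function, instructor_of,
whose definition is not shown)
select *
from table (instructor_of ('Music'))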
SQL Join Operators
SQL Set Operation | Assignment
• SQL set operations are used to combine the results of
two or more SQL SELECT statements.
Types of Set Operation
• UNION
• UNION ALL
• INTERSECT
• MINUS
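The four set operators can be compared on two small single-column tables. This sketch uses Python's sqlite3 module; SQLite spells Oracle's MINUS as EXCEPT, so EXCEPT appears in the code. Tables a and b are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def run(operator):
    """Combine the two SELECTs with the given set operator, sorted by x."""
    sql = f"SELECT x FROM a {operator} SELECT x FROM b ORDER BY x"
    return [row[0] for row in conn.execute(sql)]

union = run("UNION")          # all distinct values from both tables
union_all = run("UNION ALL")  # all values, duplicates kept
intersect = run("INTERSECT")  # values present in both tables
minus = run("EXCEPT")         # values in a but not in b (MINUS in Oracle)

print(union, union_all, intersect, minus)
```

Both SELECT statements must return the same number of columns with compatible data types for a set operator to apply.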
Procedural SQL
• PL/SQL is a combination of SQL with the
procedural features of programming languages. It
was developed by Oracle Corporation in the early
1990s to enhance the capabilities of SQL.
• Procedural SQL performs conditional or looping
operations by isolating critical code and making all
application programs call the shared code
• Yields better maintenance and logic control
• Standard SQL statements
• PL/SQL is one of three key programming languages
embedded in the Oracle Database, along with SQL
itself and Java.
Features of PL/SQL
• PL/SQL is tightly integrated with SQL.
• It offers extensive error checking.
• It offers numerous data types.
• It offers a variety of programming structures.
• It supports structured programming through
functions and procedures.
• It supports object-oriented programming.
• It supports the development of web applications
and server pages.
Advantages of PL/SQL
• SQL is the standard database language and PL/SQL is strongly integrated with
SQL. PL/SQL supports both static and dynamic SQL.
• Static SQL supports DML operations and transaction control from a PL/SQL block.
Dynamic SQL allows embedding DDL statements in PL/SQL blocks.
• PL/SQL allows sending an entire block of statements to the database at one
time. This reduces network traffic and provides high performance for the
applications.
• PL/SQL gives high productivity to programmers as it can query, transform, and
update data in a database.
• PL/SQL saves time on design and debugging by strong features, such as
exception handling, encapsulation, data hiding, and object-oriented data types.
• Applications written in PL/SQL are fully portable.
• PL/SQL provides high security level.
• PL/SQL provides access to predefined SQL packages.
• PL/SQL provides support for Object-Oriented Programming.
• PL/SQL provides support for developing Web Applications and Server Pages.
PL/SQL Stored Functions
• Stored function: Named group of procedural and
SQL statements that returns a value
• As indicated by a RETURN statement in its program
code
• Can be invoked only from within stored procedures
or triggers
PL/SQL Basic Data Types
Triggers
• A database trigger is a statement or set of statements
that is executed in response to a modification in
a database.
• The modification may be an insertion, deletion, or
update of records.
• Trigger: A trigger is a stored procedure in the database
that is automatically invoked whenever a special
event in the database occurs. For example, a trigger
can be invoked when a row is inserted into a
specified table or when certain table columns are
being updated.
Triggers…
• A database trigger is special stored procedure that
is run when specific actions occur within a
database. Most triggers are defined to run when
changes are made to a table’s data.
• Triggers help the database designer ensure certain
actions, such as maintaining an audit file, are
completed regardless of which program or user
makes changes to the data.
• The programs are called triggers since an event,
such as adding a record to a table, fires their
execution.
Purpose of Triggers
• To maintain database integrity.
• To safeguard a database from inconsistency,
especially in a large database.
• To prevent invalid transactions.
• Syntax:
create trigger [trigger_name]
[before | after]
{insert | update | delete}
on [table_name]
[for each row]
[trigger_body]
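As an illustration of the syntax, here is a sketch of an audit trigger written for SQLite and run through Python's sqlite3 module. The account and audit_log tables and the trigger name are hypothetical; OLD and NEW refer to the row before and after the update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE audit_log (
        account_id INTEGER, old_balance INTEGER, new_balance INTEGER
    );

    -- timing: AFTER, event: UPDATE, level: FOR EACH ROW
    CREATE TRIGGER log_balance_change
    AFTER UPDATE ON account
    FOR EACH ROW
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO account VALUES (1, 100);
    UPDATE account SET balance = 250 WHERE id = 1;
""")

# The UPDATE fired the trigger, which recorded the change.
log = conn.execute("SELECT * FROM audit_log").fetchall()
print(log)
```

The audit row is written regardless of which program or user issues the UPDATE, which is what makes triggers useful for maintaining integrity.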
What do we need to have a Trigger?
• Event-Condition-Action Model
We need to:
• Specify when the trigger should be executed.
• Specify what actions are to be taken when the trigger executes.
• Together, these requirements are called the Event-
Condition-Action model.
• Event – The event that causes the trigger to be executed
• Condition – The condition that must be satisfied by the event to
trigger an action
• Action – The actual modification to be made to the database as a
result of the event and condition.
Triggers
• Procedural SQL code automatically invoked by
RDBMS when given data manipulation event occurs
• Parts of a trigger definition
• Triggering timing - Indicates when trigger’s PL/SQL
code executes
• Triggering event - Statement that causes the trigger
to execute
• Triggering level - Statement- and row-level
• Triggering action - PL/SQL code enclosed between
the BEGIN and END keywords
BEFORE and AFTER of Trigger
• BEFORE triggers run the trigger action before the
triggering statement is run.
• AFTER triggers run the trigger action after the
triggering statement is run.
• Example:
Given a Student Report database in which student
marks assessments are recorded, create a trigger so
that the total and average of the specified marks are
automatically inserted whenever a record is inserted.
• Here, the trigger is invoked before the record is
inserted, so the BEFORE tag can be used.
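The student marks example can be sketched in SQLite through Python's sqlite3 module, with one substitution that should be stated plainly: SQLite triggers cannot assign to the NEW row, so instead of the BEFORE trigger described above this sketch uses an AFTER INSERT trigger that updates the freshly inserted row. The student table, its columns and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        id INTEGER PRIMARY KEY, name TEXT,
        subj1 INTEGER, subj2 INTEGER, subj3 INTEGER,
        total INTEGER, average REAL
    );

    -- Substitute for a BEFORE INSERT trigger: fill in the derived
    -- columns right after the row has been inserted.
    CREATE TRIGGER fill_totals
    AFTER INSERT ON student
    FOR EACH ROW
    BEGIN
        UPDATE student
        SET total = NEW.subj1 + NEW.subj2 + NEW.subj3,
            average = (NEW.subj1 + NEW.subj2 + NEW.subj3) / 3.0
        WHERE id = NEW.id;
    END;

    -- total and average are not supplied; the trigger computes them.
    INSERT INTO student (id, name, subj1, subj2, subj3)
    VALUES (1, 'Asha', 60, 75, 90);
""")

row = conn.execute("SELECT name, total, average FROM student").fetchone()
print(row)
```

In a dialect that allows assignments to the NEW row inside BEFORE triggers, the same result is obtained by setting the two columns before the insert takes place.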
BEFORE and AFTER of
Trigger
• Suppose the database Schema –
Sorted File Method
• In this method, as the name suggests, whenever a new record
has to be inserted, it is always inserted in sorted (ascending or
descending) order.
• Sorting of records may be based on the primary key or any other key.
• In the indexed sequential access method (ISAM), if a record has to be
retrieved based on its index value, the address of the data block is
fetched and the record is retrieved from memory.
Pros of ISAM:
• Because each record has the address of its data block, searching
for a record in a huge database is quick and easy.
• This method supports range retrieval and partial retrieval of
records. Since the index is based on the primary key values, we
can retrieve the data for the given range of value. In the same
way, the partial value can also be easily searched, i.e., the student
name starting with 'JA' can be easily searched.
Cons of ISAM
• This method requires extra space in the disk to store the index
value.
• When the new records are inserted, then these files have to be
reconstructed to maintain the sequence.
• When the record is deleted, then the space used by it needs to be
released. Otherwise, the performance of the database will slow
down.
6. Cluster file organization
• When records from two or more tables are stored in the same
file, it is known as a cluster. These files have two or
more tables in the same data block, and the key attributes
used to map these tables together are stored
only once.
• This method reduces the cost of searching for various
records in different files.
• The cluster file organization is used when there is a
frequent need to join the tables with the same
condition. These joins return only a few records from
both tables. In the given example, we are retrieving the
records for only particular departments. This method can't
be used to retrieve the records for the entire department.
• In this method, we can directly insert, update or delete any record. Data is sorted
based on the key with which searching is done. The cluster key is the key with
which the tables are joined.
Types of Cluster file organization:
1. Indexed Clusters:
• In an indexed cluster, records are grouped based on the
cluster key and stored together. The above
EMPLOYEE and DEPARTMENT relationship is an
example of an indexed cluster. Here, all the records
are grouped based on the cluster key DEP_ID.
2. Hash Clusters:
• It is similar to the indexed cluster. In hash cluster,
instead of storing the records based on the cluster
key, we generate the value of the hash key for the
cluster key and store the records with the same hash
key value.
Pros of Cluster file organization
• The cluster file organization is used when there is a
frequent request for joining the tables with same
joining condition.
• It provides the efficient result when there is a 1:M
mapping between the tables.
Cons of Cluster file organization
• This method has low performance for very large
databases.
• If the joining condition changes, this method cannot
be used: traversing the file then takes a lot of
time.
• This method is not suitable for tables with a 1:1
mapping.
Guidelines for the use of Cluster tables
• Consider clustering tables when tables are often accessed in join statements.
• Do not cluster tables if they are joined only occasionally or their common column
values are modified frequently. (Modifying a row's cluster key value takes longer than
modifying the value in an unclustered table, because Oracle may have to migrate the
modified row to another block to maintain the cluster.)
• Do not cluster tables if a full search of one of the tables is often required. (A full
search of a clustered table can take longer than a full search of an unclustered table.
Oracle is likely to read more blocks because the tables are stored together.)
• Consider clustering tables involved in a one-to-many (1: M) relationship if a row is
often selected from the parent table and then the corresponding rows from the child
table. (Child rows are stored in the same data block(s) as the parent row, so they are
likely to be in memory when selected, requiring Oracle to perform less I/O.)
• Do not cluster tables if the data from all tables with the same cluster key value
exceeds one or two Oracle blocks. (To access a row in a clustered table,
Oracle reads all blocks containing rows with that value. If these rows occupy multiple
blocks, accessing a single row could require more reads than accessing the same row
in an unclustered table.)
Indexing
• Indexing is a way to optimize the performance of a database by
minimizing the number of disk accesses required when a query is
processed. It is a data structure technique which is used to quickly
locate and access the data in a database.
• Indexes are created using a few database columns.
• The first column is the Search key that contains a copy of the primary
key or candidate key of the table. These values are stored in sorted
order so that the corresponding data can be accessed quickly.
Note: The data may or may not be stored in sorted order.
• The second column is the Data Reference or Pointer which contains a
set of pointers holding the address of the disk block where that
particular key value can be found.
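The effect of an index on how a query is executed can be observed with a small sketch in SQLite, run through Python's sqlite3 module. The employee table and index name are hypothetical, and EXPLAIN QUERY PLAN is SQLite's own diagnostic statement, whose wording varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(i, f"emp{i}", i % 10) for i in range(1, 1001)],
)

def plan(sql):
    """Return SQLite's query plan text (last column of each plan row)."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT name FROM employee WHERE dept_id = 3"

before = plan(query)  # no index on dept_id yet: the table is scanned
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept_id)")
after = plan(query)   # the planner can now search through the index

print(before)
print(after)
```

The index stores the sorted dept_id values together with references to the matching rows, which corresponds to the search key and pointer columns described above.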
Index structure:
• Indexes can be created using some database columns.
search-key | pointer
• The first column of the index is the search key, which
contains a copy of the primary key or candidate key of the
table. The values of the primary key are stored in sorted order
so that the corresponding data can be accessed easily.
• The second column of the index is the data reference. It
contains a set of pointers holding the address of the disk block
where the value of the particular key can be found.
Indexing Methods
1. Ordered indices
• The indices are usually sorted to make searching faster. The
indices which are sorted are known as ordered indices.
• Example: Suppose we have an employee table with thousands
of records, each of which is 10 bytes long. The IDs start
with 1, 2, 3, ... and so on, and we have to search for the
employee with ID 543.
• In the case of a database with no index, the DBMS has to scan
the disk blocks from the start until it reaches ID 543. It reads
the record after reading 543*10 = 5430 bytes.
• In the case of an index (assuming each index entry is 2 bytes),
the DBMS reads the record after reading 542*2 = 1084 bytes,
which is far fewer than in the previous case.
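The arithmetic in this example can be written out as a short Python sketch. The 2-byte index entry size is an assumption implied by the 542*2 figure.

```python
RECORD_BYTES = 10       # size of one employee record (from the example)
INDEX_ENTRY_BYTES = 2   # assumed size of one index entry
target_id = 543

# Without an index: every record up to and including ID 543 is read.
bytes_without_index = target_id * RECORD_BYTES

# With an index: the 542 index entries before ID 543 are read instead.
bytes_with_index = (target_id - 1) * INDEX_ENTRY_BYTES

print(bytes_without_index, bytes_with_index)
```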
2. Primary Index
• If the index is created on the basis of the primary key
of the table, then it is known as primary indexing.
• These primary keys are unique to each record, giving a
1:1 relation between keys and records.
• As primary keys are stored in sorted order, the
performance of the searching operation is quite
efficient.
• The primary index can be classified into two types:
Dense index and Sparse index.
2.1 . Dense index
• The dense index contains an index record for every
search key value in the data file. It makes searching
faster.
• In this, the number of records in the index table is
same as the number of records in the main table.
• It needs more space to store index record itself. The
index records have the search key and a pointer to
the actual record on the disk.
2.2. Sparse index
• In a sparse index, index records appear only for some of
the items in the data file. Each index record points to a block.
• Instead of pointing to each record in the main
table, the index points to records in the main
table at intervals.
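The difference between the two kinds of index can be modelled with a short Python sketch. The block size, keys and record values are made up; the dense index holds one entry per record, while the sparse index holds one entry per block and finishes the search by scanning inside that block.

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # records per block (illustrative)
records = [(key, f"r{key}") for key in range(10, 90, 10)]  # sorted by key
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per search key -> (block number, offset in block).
dense_index = {
    key: (b, off)
    for b, block in enumerate(blocks)
    for off, (key, _) in enumerate(block)
}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0][0] for block in blocks]

def dense_lookup(key):
    pos = dense_index.get(key)
    return None if pos is None else blocks[pos[0]][pos[1]][1]

def sparse_lookup(key):
    # Find the last block whose first key is <= key, then scan that block.
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, value in blocks[b]:
        if k == key:
            return value
    return None

print(len(dense_index), len(sparse_keys))
print(dense_lookup(50), sparse_lookup(50))
```

The sparse index needs less space (3 entries instead of 8 here) at the cost of a short scan inside the block.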
3. Clustering Index
• A clustered index can be defined as an ordered data file.
Sometimes the index is created on non-primary key columns
which may not be unique for each record.
• In this case, to identify the record faster, we will group two or
more columns to get the unique value and create index out of
them. This method is called a clustering index.
• Records that have similar characteristics are grouped, and
indexes are created for these groups.
• Example: suppose a company contains several employees in
each department. Suppose we use a clustering index, where all
employees which belong to the same Dept_ID are considered
within a single cluster, and index pointers point to the cluster as
a whole. Here Dept_Id is a non-unique key.
The previous scheme is a little confusing because one disk block is shared by
records that belong to different clusters. Using a separate disk block for
each cluster is considered a better technique.
4. Secondary Index
• In sparse indexing, as the size of the table grows, the size of
the mapping also grows. These mappings are usually kept in
primary memory so that the address fetch is faster. The
actual data is then looked up in secondary memory using the
address obtained from the mapping. If the mapping grows,
fetching the address itself becomes slower, and the
sparse index is no longer efficient. To overcome this problem,
secondary indexing is introduced.
• In secondary indexing, to reduce the size of mapping, another
level of indexing is introduced. In this method, the huge range for
the columns is selected initially so that the mapping size of the
first level becomes small. Then each range is further divided into
smaller ranges. The mapping of the first level is stored in the
primary memory, so that address fetch is faster. The mapping of
the second level and actual data are stored in the secondary
memory (hard disk).
• For example:
• If you want to find the record of roll 111 in the diagram, then it will
search the highest entry which is smaller than or equal to 111 in the
first level index. It will get 100 at this level.
• Then in the second index level, it again finds the largest entry that is
smaller than or equal to 111 and gets 110. Using the address for 110, it
goes to the data block and scans each record until it finds 111.
• This is how a search is performed in this method. Inserting, updating
or deleting is also done in the same manner.
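The two-level search for roll 111 can be traced with a small Python sketch. The layout (rolls 100 to 299, ten rolls per data block, first-level ranges of 100) is an assumption chosen to reproduce the 100 and 110 values mentioned above.

```python
from bisect import bisect_right

# Hypothetical layout: each data block holds ten consecutive roll numbers.
data_blocks = {start: list(range(start, start + 10)) for start in range(100, 300, 10)}
# Second level: for each first-level entry, the starting roll of each block.
second_level = {top: list(range(top, top + 100, 10)) for top in (100, 200)}
# First level: small enough to be kept in primary memory.
first_level = sorted(second_level)

def floor_entry(sorted_keys, key):
    """Largest entry that is smaller than or equal to key, or None."""
    i = bisect_right(sorted_keys, key)
    return sorted_keys[i - 1] if i else None

def find(roll):
    """Return (first-level entry, second-level entry, records scanned)."""
    top = floor_entry(first_level, roll)
    if top is None:
        return None
    block_start = floor_entry(second_level[top], roll)
    # Scan the data block record by record until the roll is found.
    for scanned, candidate in enumerate(data_blocks[block_start], start=1):
        if candidate == roll:
            return top, block_start, scanned
    return None

print(find(111))
```

Only the small first-level list has to stay in primary memory; the second level and the data blocks are consulted once each.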
End of Chapter 2