ADB - CH2 - Advanced SQL
Outline
Features and basic architecture
Database Design and Querying Tools
Storage and Indexing
Query Processing, Evaluation and Optimization
Assertions and views
Cursors, triggers and stored procedures
Embedded SQL, dynamic SQL, SQLJ
SQL Variations and Extensions
Transaction Management
Features and basic architecture
⚫ The design of a DBMS depends on its architecture, which can be centralized,
decentralized, or hierarchical.
⚫ The architecture of a DBMS can be seen as either single-tier or multi-tier.
⚫ An n-tier architecture divides the whole system into n related but
independent modules, each of which can be modified or replaced
independently.
⚫ In a 1-tier architecture, the user works directly on the DBMS, with no
intermediate layer.
⚫ Any change the user makes is applied directly to the DBMS itself.
⚫ It does not provide handy tools for end users.
⚫ Database designers and programmers normally prefer to use a single-tier
architecture.
3-tier Architecture
Sparse Index
In a sparse index, index records are not created for every search key. An index
record contains a search key and a pointer to the data on the disk.
To search for a record, we first follow the index record to reach a location in
the data file. If the data we are looking for is not at that location, the system
searches sequentially from there until the desired data is found.
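The lookup described above can be sketched in Python. This is a minimal illustration, not how any particular DBMS stores its index: the `blocks` list, the one-entry-per-block layout, and the function name `sparse_lookup` are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of (key, value) records sorted by search key.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Sparse index: one entry per block (its first key), not one per record.
sparse_index = [block[0][0] for block in blocks]


def sparse_lookup(key):
    """Return the value stored under key, or None if it is absent."""
    # Find the last index entry whose key is <= the search key.
    pos = bisect_right(sparse_index, key) - 1
    if pos < 0:
        return None  # key is smaller than every indexed key
    # Sequential search starting at the location the index points to.
    for k, value in blocks[pos]:
        if k == key:
            return value
        if k > key:
            break  # records are sorted, so the key cannot appear later
    return None
```

Here `sparse_lookup(50)` follows the index entry for key 40 and then scans forward inside that block; a key that falls between two records ends the scan early because the file is sorted.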
Multilevel Index
Index records comprise search-key values and data pointers. A multilevel index
is stored on the disk along with the actual database files.
As the size of the database grows, so does the size of the indices. To speed up
search operations, the index records should be kept in main memory. A large
single-level index cannot be kept in memory, which leads to multiple disk
accesses. A multilevel index breaks the index into several smaller levels so that
the outermost level is small enough to fit in main memory.
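A two-level index can be sketched as follows. The names `inner_blocks`, `outer_index`, and `multilevel_lookup` are hypothetical, record pointers are plain strings, and reading one inner block stands in for one disk access.

```python
from bisect import bisect_right

# Hypothetical inner (first-level) index, split into blocks that live on disk.
# Each entry is (search_key, record_pointer).
inner_blocks = [
    [(10, "r10"), (20, "r20"), (30, "r30")],
    [(40, "r40"), (50, "r50"), (60, "r60")],
    [(70, "r70"), (80, "r80"), (90, "r90")],
]

# Outer (second-level) index: the first key of each inner block.
# It is assumed to be small enough to stay in main memory.
outer_index = [block[0][0] for block in inner_blocks]


def multilevel_lookup(key):
    """Return (pointer, inner_blocks_read); pointer is None if key is absent."""
    pos = bisect_right(outer_index, key) - 1
    if pos < 0:
        return None, 0                  # smaller than every key: no block read
    block = inner_blocks[pos]           # one simulated disk read
    keys = [k for k, _ in block]
    i = bisect_right(keys, key) - 1
    if i >= 0 and keys[i] == key:
        return block[i][1], 1
    return None, 1
```

Only one inner block is read per lookup, however many inner blocks exist, because the in-memory outer level picks the block directly.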
The scanner identifies the query tokens, such as SQL keywords, attribute names, and
relation names.
The parser checks the query syntax to determine whether it is formulated according to the
syntax rules of the query language. The primary job of the parser is to take the
tokens extracted from the raw string of characters and translate them into the corresponding
internal data elements (i.e. relational algebra operations and operands) and structures
(i.e. query tree, query graph).
The query must also be validated by checking that all attribute and relation names are
valid and semantically meaningful names in the schema of the particular database
being queried. An internal representation of the query is then created, usually as a tree
data structure called a query tree. It is also possible to represent the query using a
graph data structure called a query graph.
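The scanning step can be illustrated with a small regular-expression tokenizer. It is only a sketch: the keyword set is deliberately tiny, and the token kinds and the function name `scan` are assumptions rather than part of any real DBMS.

```python
import re

# Hypothetical, deliberately small keyword set.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}

TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<string>'[^']*')          # single-quoted string literal
      | (?P<number>\d+(?:\.\d+)?)    # integer or decimal literal
      | (?P<name>[A-Za-z_]\w*)       # keyword or identifier
      | (?P<op><=|>=|<>|[=<>*,()])   # operators and punctuation
    )""", re.VERBOSE)


def scan(query):
    """Split a query string into (kind, text) tokens."""
    tokens, pos = [], 0
    query = query.strip()
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        kind, text = m.lastgroup, m.group(m.lastgroup)
        if kind == "name" and text.upper() in KEYWORDS:
            kind = "keyword"         # SQL keywords are case-insensitive
        tokens.append((kind, text))
        pos = m.end()
    return tokens
```

A parser would then consume this token list, check it against the grammar, and validate the names (here `make` and `vehicles`, for instance) against the database schema.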
Phases of Query Processing … cont`d
The basic set of operations for the relational model is known as the relational
algebra. Each algebra operation produces a new relation, which can be further
manipulated using operations of the same algebra.
A sequence of relational algebra operations forms a relational algebra expression,
whose result is also a relation that represents the result of a database query (or
retrieval request).
Relational algebra is a theoretical language with operations that work on one or
more relations to define another relation without changing the original relations.
The output of one operation can become the input to another operation (nesting
is possible).
Selection and Projection
The SELECT operation keeps the tuples (rows) of the base relation that satisfy a
condition and discards the others.
The PROJECT operation keeps certain attributes of the base relation and discards the others.
PROJECT creates a vertical partitioning: the result contains only the needed columns
(attributes).
Attributes that are not in the projection list are dropped.
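A minimal Python sketch of these operations, with a relation modelled as a list of dictionaries. The sample `employee` data and the function names are assumptions for illustration; `select` is included as the row-wise counterpart of `project`.

```python
def select(relation, predicate):
    """SELECT: keep the tuples (rows) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """PROJECT: keep only the listed attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        reduced = tuple(row[a] for a in attributes)
        if reduced not in seen:      # a relation is a set, so no duplicates
            seen.add(reduced)
            result.append(dict(zip(attributes, reduced)))
    return result


# Hypothetical sample relation.
employee = [
    {"name": "Abebe", "dno": 5, "salary": 30000},
    {"name": "Sara", "dno": 5, "salary": 40000},
    {"name": "Kebede", "dno": 4, "salary": 30000},
]

dept_salary = project(employee, ["dno", "salary"])   # vertical partition
only_dno = project(employee, ["dno"])                # duplicates collapse
dept5 = select(employee, lambda r: r["dno"] == 5)    # horizontal partition
```

Both functions return new lists and leave `employee` unchanged, which mirrors the point that algebra operations do not modify the original relation.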
UNION and INTERSECTION Operation
The union, denoted by R ∪ S, is a relation that includes all tuples that are either
in R or in S or in both R and S. Duplicate tuples are eliminated.
The intersection, denoted by R ∩ S, is a relation that includes all tuples that are in both R
and S.
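Python sets behave the same way, so the two operations can be tried out directly. The relations `R` and `S` below are made-up sample data; both are assumed to have the same attributes in the same order, which is what makes the operations meaningful.

```python
# Hypothetical relations with the same attributes: (name, department number).
R = {("Abebe", 5), ("Sara", 5), ("Kebede", 4)}
S = {("Sara", 5), ("Hana", 1)}

union = R | S          # tuples in R, in S, or in both; duplicates appear once
intersection = R & S   # tuples that are in both R and S
```

The tuple `("Sara", 5)` occurs in both inputs but only once in `union`, and it is the only member of `intersection`.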
CARTESIAN (cross product) Operation
This operation is used to combine tuples from two relations in a combinatorial fashion.
That is, every tuple in the first relation (R) is paired with every tuple in the
second relation (S).
The resulting relation Q has one tuple for each combination of tuples, one from R and
one from S.
Hence, if R has n tuples and S has m tuples, then R × S has n × m tuples.
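The tuple count can be checked with `itertools.product` from the Python standard library. The relations below are hypothetical sample data.

```python
from itertools import product

# Hypothetical relations as lists of tuples.
R = [("e1", "Abebe"), ("e2", "Sara"), ("e3", "Kebede")]   # n = 3
S = [("d1", "Research"), ("d2", "Admin")]                 # m = 2

# Every tuple of R is concatenated with every tuple of S.
Q = [r + s for r, s in product(R, S)]
```

`Q` holds 3 × 2 = 6 tuples, each with the attributes of R followed by the attributes of S.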
JOIN Operation
A Cartesian product followed by a select is used so commonly to identify
and select related tuples from two relations that it has its own operation, called JOIN. Thus the
JOIN operation combines the Cartesian product and selection operations.
This operation is very important for any relational database with more than a single
relation, because it allows us to process relationships among relations.
This type of JOIN is called a THETA JOIN (θ-JOIN),
where θ is the comparison operator used in the join condition.
θ can be any of { <, ≤, >, ≥, ≠, = }.
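A theta join can be written as a Cartesian product followed by a selection. The sketch below assumes relations are lists of dictionaries with distinct attribute names; `theta_join` and the sample `department` and `employee` rows are hypothetical.

```python
import operator
from itertools import product


def theta_join(R, S, r_attr, theta, s_attr):
    """JOIN = CARTESIAN PRODUCT followed by SELECT on `r_attr theta s_attr`."""
    # {**r, **s} merges the two rows; attribute names are assumed distinct.
    return [{**r, **s} for r, s in product(R, S) if theta(r[r_attr], s[s_attr])]


department = [
    {"dname": "Research", "mgr_ssn": "333"},
    {"dname": "Admin", "mgr_ssn": "987"},
]
employee = [
    {"ssn": "123", "lname": "Smith"},
    {"ssn": "333", "lname": "Wong"},
    {"ssn": "987", "lname": "Wallace"},
]

# theta is "=" here (an equijoin): each department paired with its manager.
dept_mgr = theta_join(department, employee, "mgr_ssn", operator.eq, "ssn")
```

Passing `operator.lt`, `operator.ge`, or `operator.ne` instead of `operator.eq` gives the other θ comparisons without changing the function.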
Example:
In this example we want to extract employee information about the managers of the
departments. The algebra query uses the JOIN operation to pair each department with
the employee who manages it.
JOIN Operation … example
Translating SQL Queries into Relational Algebra
An SQL query is first translated into an equivalent extended relational algebra expression
that is represented as a query tree data structure. It is then optimized. Typically, SQL
queries are decomposed into query blocks, which form the basic units that can be
translated into algebraic operators and optimized.
Example 1 of translating SQL queries into relational algebra: Consider the following SQL query on the
EMPLOYEE relation of the COMPANY relational database schema.
where c represents the result returned from the inner block. The inner block could
be translated into the following extended relational algebra expression:
The query optimizer would then choose an execution plan for each query block.
Notice that in the above example, the inner block needs to be evaluated only once to
produce the maximum salary of employees in department 5, which is then used as the
constant c by the outer block.
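The two-block evaluation can be tried out with Python's built-in `sqlite3` module. The table layout, the sample rows, and the outer query (employees earning more than the constant c) are assumptions made for this sketch; only the inner block, the maximum salary in department 5, comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Lname TEXT, Salary INTEGER, Dno INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("Smith", 30000, 5), ("Wong", 40000, 5), ("Borg", 55000, 1), ("Zelaya", 25000, 4)],
)

# Inner block: evaluated once, yields the constant c.
c = conn.execute("SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5").fetchone()[0]

# Outer block (assumed): uses c as an ordinary constant.
outer = conn.execute(
    "SELECT Lname FROM EMPLOYEE WHERE Salary > ? ORDER BY Lname", (c,)
).fetchall()

# The same query written as one nested statement gives the same answer.
nested = conn.execute(
    "SELECT Lname FROM EMPLOYEE "
    "WHERE Salary > (SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5) "
    "ORDER BY Lname"
).fetchall()
```

Because the inner block does not refer to any attribute of the outer block, computing it once and substituting the result is equivalent to running the nested statement.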
Consider the following SQL query:
select make
from vehicles
where make = 'Ford'
This can be translated into either of several equivalent relational algebra
expressions, each of which can be represented as a query tree or a query graph.
[Figures: query trees and query graph for the query above]
Heuristics Based Query Optimization
Heuristic rules are used to modify the internal representation of a query, which is usually in the form
of a query tree or a query graph data structure, to improve its expected performance.
A query tree is a tree data structure that corresponds to a relational algebra expression. It represents
the input relations of the query as leaf nodes of the tree and represents the relational algebra
operations as internal nodes.
The order of execution of operations starts at the leaf nodes, which represent the input database
relations for the query, and ends at the root node, which represents the final operation of the query.
The execution terminates when the root node operation is executed and produces the result relation
for the query.
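A query tree and its leaf-to-root evaluation can be sketched with three small node classes. The class names and the sample `vehicles` rows are assumptions for this example, and `Project` skips duplicate elimination to keep the sketch short.

```python
class Leaf:
    """Leaf node: an input relation (a list of dict rows)."""
    def __init__(self, rows):
        self.rows = rows

    def evaluate(self):
        return self.rows


class Select:
    """Internal node: keeps the rows of its child that satisfy a predicate."""
    def __init__(self, predicate, child):
        self.predicate, self.child = predicate, child

    def evaluate(self):
        return [r for r in self.child.evaluate() if self.predicate(r)]


class Project:
    """Internal node: keeps only the listed attributes (no duplicate removal)."""
    def __init__(self, attributes, child):
        self.attributes, self.child = attributes, child

    def evaluate(self):
        return [{a: r[a] for a in self.attributes} for r in self.child.evaluate()]


vehicles = [{"make": "Ford", "model": "Focus"}, {"make": "Fiat", "model": "Panda"}]

# Root is the PROJECT, its child is the SELECT, and the leaf is the relation.
tree = Project(["make"], Select(lambda r: r["make"] == "Ford", Leaf(vehicles)))
result = tree.evaluate()
```

Calling `evaluate()` on the root recurses down to the leaf first, so the relation is read, then filtered, then projected, and the root's output is the result of the query.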
The scanner and parser of an SQL query first generate a data structure that corresponds to an initial
query representation, which is then optimized according to heuristic rules.
One of the main heuristic rules is to apply SELECT and PROJECT operations before applying the
JOIN or other binary operations, because the size of the file resulting from a binary operation such as
JOIN is usually a multiplicative function of the sizes of the input files. The SELECT and PROJECT
operations reduce the size of a file and hence should be applied before a join or other binary operation.
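The effect of this rule on intermediate result sizes can be seen in a small experiment. The relations and the two plans below are made up for illustration; the sizes 100 and 10 are arbitrary.

```python
from itertools import product

# Hypothetical relations: 100 employees spread over 10 departments.
employee = [{"ssn": i, "dno": i % 10} for i in range(100)]
department = [{"dnumber": d, "dname": f"D{d}"} for d in range(10)]

# Plan A: combine first, then select.
plan_a_intermediate = list(product(employee, department))       # 100 * 10 pairs
plan_a = [(e, d) for e, d in plan_a_intermediate
          if e["dno"] == d["dnumber"] and d["dname"] == "D3"]

# Plan B: apply the SELECT on department first, then combine.
selected_dept = [d for d in department if d["dname"] == "D3"]   # 1 tuple
plan_b_intermediate = list(product(employee, selected_dept))    # 100 * 1 pairs
plan_b = [(e, d) for e, d in plan_b_intermediate if e["dno"] == d["dnumber"]]
```

Both plans return the same ten employee-department pairs, but plan B builds an intermediate result ten times smaller than plan A.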
Transformation of Relational Expressions
The heuristic query optimizer must include rules for equivalence among relational
algebra expressions to transform the initial tree into the final, optimized query tree.
The heuristic query optimizer will transform this initial query tree into an equivalent
final query tree that is efficient to execute.
Example 1 of transforming a query. Consider the following query Q on the COMPANY
database: find the last names of employees born after 1957 who work on a project named 'Aquarius'.
This query can be specified in SQL.
The initial query tree for Q is shown in step (a). Executing this tree directly first creates a very
large file containing the CARTESIAN PRODUCT of the entire EMPLOYEE, WORKS_ON,
and PROJECT files. That is why the initial query tree is never executed but is transformed into
another equivalent tree that is efficient to execute. This particular query needs only one record
from the PROJECT relation for the ‘Aquarius’ project and only the EMPLOYEE records for
those whose date of birth is after ‘1957-12-31’.
Steps in converting a query tree during heuristic optimization:
1. Initial (canonical) query tree for SQL query Q.
2. Moving SELECT operations down the query tree to reduce the number of tuples as early as possible.
3. Applying the more restrictive SELECT operation first. This is done by reordering the leaf nodes of the
tree among themselves and adjusting the rest of the tree appropriately.
4. Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
5. Moving PROJECT operations down the query tree to reduce the number of attributes.
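The equivalence of the canonical and the transformed query can be checked on sample data with Python's built-in `sqlite3` module. The column names (`Ssn`, `Lname`, `Bdate`, `Essn`, `Pno`, `Pnumber`, `Pname`), the sample rows, and both SQL formulations are assumptions for this sketch; SQLite also applies its own optimizer, so the two statements only illustrate the shape of the trees, not the plan actually executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT, Lname TEXT, Bdate TEXT);
CREATE TABLE PROJECT  (Pnumber INTEGER, Pname TEXT);
CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER);
INSERT INTO EMPLOYEE VALUES ('1', 'Smith', '1965-01-09'),
                            ('2', 'Wong', '1955-12-08'),
                            ('3', 'English', '1972-07-31');
INSERT INTO PROJECT  VALUES (10, 'Aquarius'), (20, 'Orion');
INSERT INTO WORKS_ON VALUES ('1', 10), ('2', 10), ('3', 20);
""")

# Canonical form: product of all three relations, then one big selection.
canonical = """
SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname = 'Aquarius' AND Pnumber = Pno AND Essn = Ssn
  AND Bdate > '1957-12-31'
"""

# Transformed form: selections pushed onto the base relations,
# products replaced by joins, and projections applied early.
rewritten = """
SELECT e.Lname
FROM (SELECT Pnumber FROM PROJECT WHERE Pname = 'Aquarius') AS p
JOIN WORKS_ON AS w ON p.Pnumber = w.Pno
JOIN (SELECT Ssn, Lname FROM EMPLOYEE WHERE Bdate > '1957-12-31') AS e
  ON w.Essn = e.Ssn
"""

result_canonical = sorted(conn.execute(canonical).fetchall())
result_rewritten = sorted(conn.execute(rewritten).fetchall())
```

On this data both statements return only Smith: Wong works on 'Aquarius' but was born before 1958, and English was born later but works on a different project.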
(a). Initial (canonical) query tree for SQL query Q.
Take-home exercise: Show the query transformation process for the above example.
Assertions
AS
SELECT * FROM stud_course WHERE coursetitle = @course