Advanced DB Chapter One
processing.
• History of database
• 1960s: Computers become cost effective for private
companies, and storage capacity increases.
• 1970-72: E. F. Codd proposes the relational model for
databases, disconnecting the logical organization from the
physical storage.
• 1976: P. Chen proposes the entity relationship model (ERM)
for database design.
04/09/2021
• Early 1980s: The first commercially available relational
database systems start to appear, beginning with Oracle
Version 2.
• Mid-1990s: Kaboom! The usable Internet/World Wide Web
(WWW) appears. The database born in 1995.
• Late 1990s: The large investment in Internet companies helps
create a tools-market boom for Web/Internet/DB connectors.
• Early 21st century: Solid growth of DB applications continues.
Examples: commercial websites (yahoo.com, amazon.com,
google.com), government systems (Bureau of Citizenship and
Immigration Services, Bureau of the Census), art museums,
hospitals, schools, etc.
Advanced SQL query
SQL is also used to populate, access, and manipulate the data within the relational database.
• The four main categories of SQL
1. DML (Data Manipulation Language)
2. DDL (Data Definition Language)
3. DCL (Data Control Language)
4. TCL (Transaction Control Language)
Categories of SQL Statements
• Data manipulation language (DML)
– DML statements begin with INSERT, UPDATE, DELETE, or MERGE and are
used to modify the table data by entering new rows, changing existing
rows, or removing existing rows.
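The three most common DML statements can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the Staff table and its rows are invented for illustration, and MERGE is left out because SQLite does not support it.

```python
import sqlite3

# In-memory database; the Staff table and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, lName TEXT, salary INTEGER)"
)

# INSERT: enter new rows.
conn.executemany(
    "INSERT INTO Staff (staffNo, lName, salary) VALUES (?, ?, ?)",
    [("S1", "White", 30000), ("S2", "Beech", 12000), ("S3", "Lee", 9000)],
)

# UPDATE: change existing rows.
conn.execute("UPDATE Staff SET salary = salary + 1000 WHERE staffNo = ?", ("S3",))

# DELETE: remove existing rows.
conn.execute("DELETE FROM Staff WHERE staffNo = ?", ("S2",))

rows = conn.execute(
    "SELECT staffNo, lName, salary FROM Staff ORDER BY staffNo"
).fetchall()
print(rows)
```

After the three statements, only S1 and S3 remain, and S3 carries the updated salary.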
Cont…
• Transaction control language (TCL)
– TCL statements are used to manage the changes made by
DML statements.
– Changes to the data are made permanent or undone using
COMMIT, ROLLBACK, and SAVEPOINT.
– DML changes can be grouped together into logical
transactions.
• Data control language (DCL)
– DCL keywords GRANT and REVOKE are used to give or
remove access rights to the database and the structures
within it.
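The TCL keywords can be sketched with Python's built-in sqlite3 module. The Account table, its balances, and the savepoint name after_debit are assumptions made for this example. DCL is not shown because SQLite has no GRANT or REVOKE.

```python
import sqlite3

# isolation_level=None leaves transaction control entirely to our own
# BEGIN / COMMIT / ROLLBACK statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO Account VALUES ('A', 100), ('B', 50)")

conn.execute("BEGIN")  # start one logical transaction
conn.execute("UPDATE Account SET balance = balance - 30 WHERE id = 'A'")
conn.execute("SAVEPOINT after_debit")  # mark a point we can return to

# A mistaken credit that we want to undo without losing the debit above.
conn.execute("UPDATE Account SET balance = balance + 999 WHERE id = 'B'")
conn.execute("ROLLBACK TO SAVEPOINT after_debit")

conn.execute("UPDATE Account SET balance = balance + 30 WHERE id = 'B'")
conn.execute("COMMIT")  # make the grouped changes permanent

balances = dict(conn.execute("SELECT id, balance FROM Account"))
print(balances)
```

Rolling back to the savepoint discards only the mistaken credit; the debit made before the savepoint survives and is committed together with the corrected credit.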
Query processing
• Query languages are used to make queries in a database, and
Structured Query Language (SQL) is the standard.
• Query processing refers to the range of activities involved in
extracting data from a database.
• The activities include translation of queries in high-level database
languages into expressions that can be used at the physical level of
the file system, a variety of query-optimizing transformations, and
actual evaluation of queries.
• The aim of query processing is to find information in one or more
databases and deliver it to the user quickly and efficiently.
Cont…
• The basic steps involved in processing a query are:
1. Parsing and translation.
2. Optimization.
3. Evaluation.
• Before query processing can begin, the system must translate the query
into a usable form.
• A language such as SQL is suitable for human use, but is ill suited to be
the system’s internal representation of a query.
• A more useful internal representation is one based on the extended
relational algebra.
• Thus, the first action the system must take in query processing is to
translate a given query into its internal form.
• This translation process is similar to the work performed by the parser
of a compiler.
• In generating the internal form of the query, the parser checks the
syntax of the user’s query, verifies that the relation names appearing
in the query are names of the relations in the database, and so on.
• The system constructs a parse-tree representation of the query, which
it then translates into a relational-algebra expression.
• If the query was expressed in terms of a view, the translation phase
also replaces all uses of the view by the relational-algebra expression
that defines the view.
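As an illustration of the translation step, a simple SQL query and one relational-algebra expression it could be translated into are shown here. The Staff relation and its attributes are assumed for the example, and a real system may produce a different but equivalent expression.

```latex
% SQL (hypothetical Staff relation):
%   SELECT staffNo, salary FROM Staff WHERE salary > 10000;
\[
  \Pi_{\text{staffNo},\,\text{salary}}
  \bigl( \sigma_{\text{salary} > 10000}(\text{Staff}) \bigr)
\]
```

The WHERE clause becomes a selection and the SELECT list becomes a projection applied to its result.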
SQL queries and Relational Algebra
• Relational Algebra
• Relational Algebra is a procedural query language used to query the
database tables to access data in different ways.
• It provides a formal foundation for relational model operations.
• It is used as a basis for implementing and optimizing queries in
relational database management systems.
• The fundamental operations of relational algebra are as follows:
• Select
• Project
• Union
• Set difference
• Cartesian product
Selection (or Restriction)
σ predicate (R)
– Works on a single relation R and defines a relation
that contains only those tuples (rows) of R that
satisfy the specified condition (predicate).
• Selection – An Example
• List all staff with a salary greater than £10,000.
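One way to picture this selection is as a filter over the rows of a relation. The sketch below models a relation as a list of dictionaries; the Staff rows and the helper name select are assumptions made for illustration.

```python
# Hypothetical Staff relation: each tuple is a dict of attribute -> value.
staff = [
    {"staffNo": "SL21", "lName": "White", "salary": 30000},
    {"staffNo": "SG37", "lName": "Beech", "salary": 12000},
    {"staffNo": "SA9", "lName": "Howe", "salary": 9000},
]


def select(relation, predicate):
    """Selection: keep only the tuples of the relation that satisfy predicate."""
    return [t for t in relation if predicate(t)]


# Selection with the predicate salary > 10000 applied to Staff.
high_paid = select(staff, lambda t: t["salary"] > 10000)
print([t["staffNo"] for t in high_paid])
```

The result has the same attributes as Staff but only the rows that pass the predicate.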
Projection
Π col1, . . . , coln (R)
– Works on a single relation R and defines a relation
that contains a vertical subset of R, extracting the
values of specified attributes and eliminating
duplicates.
• Produce a list of salaries for all staff, showing
only staffNo, fName, lName, and salary
details.
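Projection can be sketched as keeping only the named attributes of each tuple and then dropping duplicate rows. The Staff rows and the helper name project are hypothetical.

```python
# Hypothetical Staff relation with more attributes than we want to show.
staff = [
    {"staffNo": "SL21", "fName": "John", "lName": "White",
     "position": "Manager", "salary": 30000},
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech",
     "position": "Assistant", "salary": 12000},
    {"staffNo": "SG14", "fName": "David", "lName": "Ford",
     "position": "Supervisor", "salary": 12000},
]


def project(relation, attributes):
    """Projection: keep only the given attributes and eliminate duplicates."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:  # duplicate elimination
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


salary_list = project(staff, ["staffNo", "fName", "lName", "salary"])
distinct_salaries = project(staff, ["salary"])
print(salary_list)
print(distinct_salaries)
```

Projecting on salary alone returns two rows rather than three, because the two staff members earning 12000 collapse into a single tuple.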
Union
• R ∪ S
– Union of two relations R and S defines a relation
that contains all the tuples of R, or S, or both R and
S, duplicate tuples being eliminated.
– R and S must be union-compatible.
• If R and S have I and J tuples, respectively,
union is obtained by concatenating them into
one relation with a maximum of (I + J) tuples.
• Union – An Example
• List all cities where there is either a branch
office or a property for rent.
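Because both operands must be union-compatible, each relation is first projected onto its city attribute, and the union is taken of the two results. The sketch below uses Python sets, which eliminate duplicates automatically; the Branch and PropertyForRent rows are made up.

```python
# Hypothetical relations as sets of tuples: (branchNo, city) and (propertyNo, city).
branch = {("B005", "London"), ("B007", "Aberdeen"), ("B004", "Bristol")}
property_for_rent = {("PA14", "Aberdeen"), ("PL94", "London"), ("PG4", "Glasgow")}

# Project each relation onto city so that the operands are union-compatible.
branch_cities = {city for _, city in branch}
property_cities = {city for _, city in property_for_rent}

# Union: every city that appears in either projection, duplicates removed.
all_cities = branch_cities | property_cities
print(sorted(all_cities))
```

Each projection holds three cities, but the union holds four, which stays within the maximum of (I + J) = 6 tuples.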
Set Difference
• R – S
– Defines a relation consisting of the tuples that are
in relation R, but not in S.
– R and S must be union-compatible.
• Set Difference – An Example
• List all cities where there is a branch office
but no properties for rent.
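The same approach works for set difference: project both relations onto city, then subtract. The Branch and PropertyForRent rows below are invented for illustration.

```python
# Hypothetical relations as sets of tuples: (branchNo, city) and (propertyNo, city).
branch = {("B005", "London"), ("B007", "Aberdeen"), ("B004", "Bristol")}
property_for_rent = {("PA14", "Aberdeen"), ("PL94", "London"), ("PG4", "Glasgow")}

# Project both relations onto city so that they are union-compatible.
branch_cities = {city for _, city in branch}
property_cities = {city for _, city in property_for_rent}

# Set difference: cities with a branch office but no property for rent.
branch_only = branch_cities - property_cities
print(branch_only)
```

Note that the operation is not symmetric: swapping the operands would instead give the cities that have a property for rent but no branch office.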
Cartesian Product
• R × S
– Defines a relation that is the concatenation of
every tuple of relation R with every tuple of
relation S.
• Cartesian Product – An Example
• List the names and comments of all clients who have
viewed a property for rent.
• Cartesian Product and Selection – An Example
• Use selection operation to extract those tuples where
Client.clientNo = Viewing.clientNo.
σ Client.clientNo = Viewing.clientNo((Π clientNo, fName, lName(Client)) ×
(Π clientNo, propertyNo, comment(Viewing)))
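The combination of Cartesian product and selection can be sketched with itertools.product. The Client and Viewing rows are hypothetical and are assumed to be already projected onto the attributes named in the expression.

```python
from itertools import product

# Hypothetical, already-projected relations.
client = [  # (clientNo, fName, lName)
    ("CR76", "John", "Kay"),
    ("CR56", "Aline", "Stewart"),
    ("CR74", "Mike", "Ritchie"),
]
viewing = [  # (clientNo, propertyNo, comment)
    ("CR56", "PA14", "too small"),
    ("CR76", "PG4", "too remote"),
    ("CR56", "PG4", "no dining room"),
]

# Cartesian product: every Client tuple paired with every Viewing tuple.
pairs = list(product(client, viewing))

# Selection: keep only the pairs where the two clientNo values match.
matched = [(c, v) for c, v in pairs if c[0] == v[0]]

# Names and comments of clients who have viewed a property.
names_and_comments = [(c[1], c[2], v[2]) for c, v in matched]
print(len(pairs), names_and_comments)
```

The product alone yields 3 × 3 = 9 pairs, most of them meaningless; the selection reduces them to the three pairs that describe the same client.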
Typical stages in query decomposition are:
• Analysis: lexical and syntactical analysis of the query (correctness).
• Normalization: convert the query into a normalized form. The WHERE
predicate is converted to conjunctive (∧) or disjunctive (∨) normal
form.
• Semantic analysis: reject normalized queries that are incorrectly
formulated or contradictory. A query is incorrectly formulated if some of
its components do not contribute to generating the result, and
contradictory if its predicate cannot be satisfied by any tuple.
• Simplification: detect redundant qualifications, eliminate common sub-expressions, and
transform the query into a semantically equivalent form that is easier and more efficient
to compute.
• Query restructuring: more than one translation of a query is usually possible, so
transformation rules are used to restructure it into a more efficient form.
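To make the normalization and semantic-analysis stages concrete, here is a small worked example. The predicate and the attribute names position, salary, and branchNo are assumed for illustration only.

```latex
% Conjunctive normal form: a conjunction of disjunctions
\[
  (\text{position} = \text{'Manager'} \lor \text{salary} > 20000)
  \land \text{branchNo} = \text{'B003'}
\]
% Equivalent disjunctive normal form: a disjunction of conjunctions,
% obtained by distributing the conjunction over the disjunction
\[
  (\text{position} = \text{'Manager'} \land \text{branchNo} = \text{'B003'})
  \lor (\text{salary} > 20000 \land \text{branchNo} = \text{'B003'})
\]
% A contradictory predicate: no tuple can satisfy both conditions
\[
  \text{salary} > 20000 \land \text{salary} < 10000
\]
```

The first two predicates select exactly the same tuples; the third would be rejected during semantic analysis because its result is always empty.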
•Most real-world data is not well structured. Today's databases typically contain
much non-structured data such as text, images, video, and audio, often distributed
across computer networks. In this complex environment, efficient and accurate
query processing becomes quite challenging.
Many techniques can be applied, not only in storage and query processing but also in
concurrency control, recovery, and so on. Different techniques work better in different
usage scenarios, and the same techniques are reused across many applications.
Query processing: execute transactions on behalf of the query and return the result.
Query optimization
• Query optimization is a function of many relational database
management systems. The query optimizer attempts to determine
the most efficient way to execute a given query by considering the
possible query plans.
• It is the process of selecting the most efficient query-evaluation plan
from among the many strategies usually possible for processing a
given query, especially if the query is complex.
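Many systems let you inspect the plan the optimizer has chosen. As a sketch, SQLite's EXPLAIN QUERY PLAN can be called through Python's built-in sqlite3 module to see the plan change once an index becomes available. The Staff table, its generated rows, the index name idx_staff_branch, and the helper plan are assumptions made for this example, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, lName TEXT, branchNo TEXT)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [(f"S{i}", f"Name{i}", f"B{i % 10}") for i in range(1000)],
)

query = "SELECT lName FROM Staff WHERE branchNo = 'B3'"


def plan(sql):
    """Return the optimizer's chosen plan as one human-readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the textual description of a plan step.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)  # no index yet: the table has to be scanned
conn.execute("CREATE INDEX idx_staff_branch ON Staff (branchNo)")
plan_after = plan(query)  # the optimizer can now search via the index

print(plan_before)
print(plan_after)
```

The query text is identical in both calls; only the set of available access paths changed, and the optimizer picked a different plan as a result.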
Optimizations…
Cont…
• Everyone wants the performance of their database to be optimal.
• In particular, there is often a requirement for a specific query, or
an object based on a query, to run faster.
• The problem of query optimization is to find the sequence of steps
that produces the answer to the user's request in the most efficient
manner, given the database structure.
• The performance of a query is affected by the tables or queries that
underlie it and by the complexity of the query itself.
Approaches to Query Optimization
• READING ASSIGNMENT
• Heuristics Approach
– Properties of individual operators
– Association between operators
– Query Tree
– Transformation rules
• Cost Estimation Approach
– I/O cost
– Access Cost of Secondary Storage
– Storage Cost
– Computation Cost
– Communication Cost
• Pipelining
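A commonly cited heuristic transformation rule is to perform selections as early as possible, so that later operations work on fewer tuples. The sketch below compares two equivalent plans for the same request on made-up Client and Viewing data. It illustrates the idea only and is not taken from any particular optimizer.

```python
from itertools import product

# Hypothetical relations: 100 clients and 200 viewings (2 per client).
client = [(f"CR{i}", f"Client{i}") for i in range(100)]   # (clientNo, name)
viewing = [(f"CR{i % 100}", f"P{i}") for i in range(200)]  # (clientNo, propertyNo)

# Plan A: build the full Cartesian product, then apply both conditions.
plan_a_intermediate = list(product(client, viewing))
plan_a = [
    (c, v) for c, v in plan_a_intermediate
    if c[0] == v[0] and c[0] == "CR7"
]

# Plan B: push the selection clientNo = 'CR7' down onto Client first,
# then take the product and apply the remaining join condition.
selected_clients = [c for c in client if c[0] == "CR7"]
plan_b_intermediate = list(product(selected_clients, viewing))
plan_b = [(c, v) for c, v in plan_b_intermediate if c[0] == v[0]]

print(len(plan_a_intermediate), len(plan_b_intermediate), plan_a == plan_b)
```

Both plans return the same two tuples, but Plan A materializes 20,000 intermediate pairs while Plan B materializes only 200. A cost-based optimizer would reach the same preference by estimating these intermediate sizes rather than applying a fixed rule.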