Basic Structure: Unit II Relational Model: Structure of Relational Databases - Relational Algebra - Extended
Unit II
1. The first database systems were based on the network and hierarchical models.
These are covered briefly in appendices in the text. The relational model was first
proposed by E.F. Codd in 1970 and the first such systems (notably INGRES and
System/R) were developed in the 1970s. The relational model is now the dominant
model for commercial data processing applications.
2. Note: Attribute Name Abbreviations
The text uses fairly long attribute names which are abbreviated in the notes as
follows.
Basic Structure
1. Figure 3.1 shows the deposit and customer tables for our banking example.
Let $D_1$ denote the domain of bname, and $D_2$, $D_3$ and $D_4$ the remaining attributes'
domains respectively.
We will use the terms relation and tuple in place of table and row from now on.
Database Scheme
1. We distinguish between a database scheme (logical design) and a database
instance (data in the database at a point in time).
2. A relation scheme is a list of attributes and their corresponding domains.
3. The text uses the following conventions:
o italics for all names
o lowercase names for relations and attributes
o names beginning with an uppercase for relation schemes
Note that customers are identified by name. In the real world, this would not be
allowed, as two or more customers might share the same name.
4. The relation schemes for the banking example used throughout the text are:
o Branch-scheme = (bname, assets, bcity)
Note: some attributes appear in several relation schemes (e.g. bname, cname).
This is legal, and provides a way of relating tuples of distinct relations.
Keys
1. The notions of superkey, candidate key and primary key all apply to the
relational model.
2. For example, in Branch-scheme,
o {bname} is a superkey.
o {bname, bcity} is a superkey.
o {bname, bcity} is not a candidate key, as the superkey {bname} is
contained in it.
o {bname} is a candidate key.
o {bcity} is not a superkey, as branches may be in the same city.
o We will use {bname} as our primary key.
3. The primary key for Customer-scheme is {cname}.
4. More formally, if we say that a subset $K$ of $R$ is a superkey for $R$, we are
restricting consideration to relations $r(R)$ in which no two distinct tuples have the
same values on all attributes in $K$. In other words,
o If $t_1$ and $t_2$ are in $r$, and
o $t_1 \neq t_2$,
o then $t_1[K] \neq t_2[K]$.
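The superkey condition can be checked mechanically on a single relation instance. The following is a minimal sketch; the function name is_superkey and the sample branch rows are assumptions made for illustration, not data from the text. Note that being a superkey is a property of every legal instance of a scheme, so a check on one instance can only refute the claim, never prove it.

```python
def is_superkey(relation, key_attrs):
    """Return True if no two tuples of this instance agree on all of key_attrs.

    `relation` is a list of dicts with no duplicate rows (a relation is a set).
    """
    seen = set()
    for row in relation:
        projected = tuple(row[a] for a in key_attrs)
        if projected in seen:
            return False  # two distinct tuples share the same key values
        seen.add(projected)
    return True


# Hypothetical instance on Branch-scheme = (bname, assets, bcity).
branch = [
    {"bname": "SFU", "assets": 500000, "bcity": "Burnaby"},
    {"bname": "Lougheed", "assets": 300000, "bcity": "Burnaby"},
    {"bname": "Downtown", "assets": 900000, "bcity": "Vancouver"},
]

print(is_superkey(branch, ["bname"]))
print(is_superkey(branch, ["bname", "bcity"]))
print(is_superkey(branch, ["bcity"]))
```

In this instance two branches share a city, so {bcity} fails the check while {bname} and {bname, bcity} pass it.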
Query Languages
1. A query language is a language in which a user requests information from a
database. These are typically higher-level than programming languages.
Fundamental Operations
1. The Select Operation
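For example, to select tuples (rows) of the borrow relation where the branch is
``SFU'', we would write
$\sigma_{bname = \text{``SFU''}}(borrow)$
Let Figure 3.3 be the borrow and branch relations in the banking example.
The new relation created as the result of this operation consists of one tuple,
the single SFU row of borrow in Figure 3.3.
We also allow the logical connectives $\lor$ (or) and $\land$ (and). For example,
$\sigma_{bname = \text{``SFU''} \land amount > 1200}(borrow)$
selects the SFU loans with an amount over 1200.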
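Suppose there is one more relation, client, shown in Figure 3.4, with the scheme
Client-scheme = (cname, banker).
To find the clients who have the same name as their banker, we might write
$\sigma_{cname = banker}(client)$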
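2. The Project Operation
Project copies its argument relation for the specified attributes only. Since a
relation is a set, duplicate rows are eliminated. Projection is denoted by the
Greek capital letter pi ($\Pi$), with the attributes to keep written as a subscript.
For example, to obtain a relation showing customers and branches, but ignoring
amount and loan#, we write
$\Pi_{bname, cname}(borrow)$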
We can perform these operations on the relations resulting from other operations.
To get the names of customers having the same name as their bankers, we write
$\Pi_{cname}(\sigma_{cname = banker}(client))$
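Selection and projection compose naturally when a relation is modelled as a collection of rows. The sketch below is illustrative only: the helper names select and project, the scheme (cname, banker) for client, and the sample rows are all assumptions.

```python
def select(relation, predicate):
    """Sigma: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attrs):
    """Pi: keep only the listed attributes; the set drops duplicate rows."""
    return {tuple(t[a] for a in attrs) for t in relation}


# Hypothetical client instance on the assumed scheme (cname, banker).
client = [
    {"cname": "Turner", "banker": "Johnson"},
    {"cname": "Hayes", "banker": "Jones"},
    {"cname": "Johnson", "banker": "Johnson"},
]

# Names of customers who have the same name as their banker.
same_name = project(select(client, lambda t: t["cname"] == t["banker"]), ["cname"])

# Projection alone: Johnson appears twice as a banker but only once in the result.
bankers = project(client, ["banker"])

print(sorted(same_name))
print(sorted(bankers))
```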
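3. The Cartesian Product Operation
The cartesian product of two relations $r_1$ and $r_2$ is written $r_1 \times r_2$.
The result of $r_1 \times r_2$ is a new relation with a tuple for each possible pairing of
tuples from $r_1$ and $r_2$.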
In order to avoid ambiguity, the attribute names have attached to them the name
of the relation from which they came. If no ambiguity will result, we drop the
relation name.
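To find the clients of banker Johnson and the city in which they live, we need
information in both client and customer relations. We can get this by writing
$\sigma_{client.cname = customer.cname}(\sigma_{banker = \text{``Johnson''}}(client \times customer))$
where the outer selection keeps only the pairings in which both halves describe
the same customer.
Finally, to get just the customer's name and city, we need a projection:
$\Pi_{client.cname, ccity}(\sigma_{client.cname = customer.cname}(\sigma_{banker = \text{``Johnson''}}(client \times customer)))$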
4. The Rename Operation
The rename operation solves the problems that occur with naming when
performing the cartesian product of a relation with itself.
Suppose we want to find the names of all the customers who live on the same
street and in the same city as Smith.
To find other customers with the same information, we need to reference the
customer relation again:
Problem: how do we distinguish between the two street values appearing in the
Cartesian product, as both come from a customer relation?
Solution: use the rename operator, denoted by the Greek letter rho ($\rho$).
We write $\rho_x(r)$ to get the relation $r$ under the name $x$.
If we use this to rename one of the two customer relations we are using, the
ambiguities will disappear.
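5. The Union Operation
The union operation is denoted by $\cup$, as in set theory. It returns the union (set union)
of two compatible relations.
To find all customers of the SFU branch, we must find everyone who has a loan
or an account or both at the branch:
$\Pi_{cname}(\sigma_{bname = \text{``SFU''}}(borrow)) \cup \Pi_{cname}(\sigma_{bname = \text{``SFU''}}(deposit))$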
As in all set operations, duplicates are eliminated, giving the relation of Figure
3.5(a).
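6. The Set Difference Operation
Set difference is denoted by the minus sign ($-$). It finds tuples that are in one
relation, but not in another.
To find customers of the SFU branch who have an account there but no loan, we
write
$\Pi_{cname}(\sigma_{bname = \text{``SFU''}}(deposit)) - \Pi_{cname}(\sigma_{bname = \text{``SFU''}}(borrow))$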
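Union and set difference map directly onto Python's set operators. This is a small sketch with hypothetical data; to keep it short, borrow and deposit are cut down to (bname, cname) pairs, which is a simplification of the schemes used in the notes.

```python
# Hypothetical instances; each tuple is (bname, cname).
borrow = {("SFU", "Hayes"), ("SFU", "Smith"), ("Downtown", "Jones")}
deposit = {("SFU", "Smith"), ("SFU", "Lindsay"), ("Downtown", "Hayes")}

# Select on bname = "SFU", then project on cname.
sfu_borrowers = {cname for (bname, cname) in borrow if bname == "SFU"}
sfu_depositors = {cname for (bname, cname) in deposit if bname == "SFU"}

# Union: a loan, an account, or both. Smith is listed only once.
all_sfu_customers = sfu_borrowers | sfu_depositors

# Set difference: an account at SFU but no loan there.
account_but_no_loan = sfu_depositors - sfu_borrowers

print(sorted(all_sfu_customers))
print(sorted(account_but_no_loan))
```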
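We can do more with this operation. Suppose we want to find the largest account
balance in the bank.
Strategy:
o Find a relation $r$ containing the balances that are not the largest.
o Compute the set difference of $\Pi_{balance}(deposit)$ and $r$.
To find $r$, we write
$\Pi_{deposit.balance}(\sigma_{deposit.balance < d.balance}(deposit \times \rho_d(deposit)))$
This resulting relation contains all balances except the largest one. (See Figure
3.6(a)). The largest balance is then
$\Pi_{balance}(deposit) - \Pi_{deposit.balance}(\sigma_{deposit.balance < d.balance}(deposit \times \rho_d(deposit)))$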
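The "largest value by set difference" idea can be traced on a handful of numbers. In this sketch the balances are invented; the comprehension plays the role of pairing the relation with a renamed copy of itself and keeping every balance that is smaller than some other balance.

```python
# Hypothetical projection of deposit on balance.
balances = {500, 700, 400, 900}

# Pair every balance with every other one (the product of the relation with a
# renamed copy of itself) and keep those that are smaller than some balance.
not_largest = {b1 for b1 in balances for b2 in balances if b1 < b2}

# Whatever is never smaller than another balance must be the maximum.
largest = balances - not_largest

print(largest)
```

The algebra needs the rename step because both copies carry the same attribute names; in the sketch the two loop variables b1 and b2 serve the same purpose.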
Additional Operations
1. Additional operations are defined in terms of the fundamental operations. They do
not add power to the algebra, but are useful to simplify common queries.
2. The Set Intersection Operation
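Set intersection is denoted by $\cap$, and returns a relation that contains tuples that are
in both of its argument relations.
To find all customers having both a loan and an account at the SFU branch, we
write
$\Pi_{cname}(\sigma_{bname = \text{``SFU''}}(borrow)) \cap \Pi_{cname}(\sigma_{bname = \text{``SFU''}}(deposit))$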
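3. The Natural Join Operation
For example, to find all customers having a loan at the bank and the cities in
which they live, we need borrow and customer relations:
$\Pi_{borrow.cname, ccity}(\sigma_{borrow.cname = customer.cname}(borrow \times customer))$
Our selection predicate keeps only those pairings in which both tuples pertain to
the same cname.
This type of operation is very common, so we have the natural join, denoted by a
$\bowtie$ sign. Natural join combines a cartesian product and a selection into one
operation. It performs a selection forcing equality on those attributes that appear
in both relation schemes. Duplicates are removed as in all relation operations.
Formally,
$r \bowtie s = \Pi_{R \cup S}(\sigma_{r.A_1 = s.A_1 \land \dots \land r.A_n = s.A_n}(r \times s))$
where $R \cap S = \{A_1, \dots, A_n\}$.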
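To find the assets and names of all branches which have depositors living in
Stamford, we need customer, deposit and branch relations:
$\Pi_{bname, assets}(\sigma_{ccity = \text{``Stamford''}}(customer \bowtie deposit \bowtie branch))$
To find all customers who have both an account and a loan at the SFU branch:
$\Pi_{cname}(\sigma_{bname = \text{``SFU''}}(borrow \bowtie deposit))$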
This is equivalent to the set intersection version we wrote earlier. We see now that
there can be several ways to write a query in the relational algebra.
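The definition of natural join (a product, then equality on the shared attributes, then removal of the duplicated columns) can be written out in a few lines. This is a sketch only: the function natural_join and the sample rows are assumptions, and relations are modelled as lists of dicts keyed by attribute name.

```python
def natural_join(r, s):
    """Join two lists of dicts on every attribute name they have in common."""
    result = []
    for t in r:
        for u in s:  # every pairing, as in the cartesian product
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):  # equality on shared attributes
                merged = {**t, **u}  # shared attributes appear only once
                if merged not in result:  # a relation holds no duplicates
                    result.append(merged)
    return result


# Hypothetical instances; cname is the only shared attribute.
borrow = [
    {"bname": "SFU", "cname": "Hayes", "amount": 1500},
    {"bname": "Downtown", "cname": "Jones", "amount": 1000},
]
customer = [
    {"cname": "Hayes", "ccity": "Burnaby"},
    {"cname": "Jones", "ccity": "Vancouver"},
    {"cname": "Smith", "ccity": "Burnaby"},
]

joined = natural_join(borrow, customer)
cities = {(t["cname"], t["ccity"]) for t in joined}  # project on cname, ccity
print(sorted(cities))
```

Smith has no loan in this data, so no joined tuple is produced for Smith.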
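4. The Division Operation
Division, denoted $\div$, is suited to queries that include the phrase ``for all''.
Suppose we want to find all the customers who have an account at all branches
located in Brooklyn.
We can obtain the names of all branches located in Brooklyn by
$r_1 = \Pi_{bname}(\sigma_{bcity = \text{``Brooklyn''}}(branch))$
We can also find all cname, bname pairs for which the customer has an account
by
$r_2 = \Pi_{cname, bname}(deposit)$
Now we need to find all customers who appear in $r_2$ with every branch name in
$r_1$. The divide operation provides exactly those customers:
$\Pi_{cname, bname}(deposit) \div \Pi_{bname}(\sigma_{bcity = \text{``Brooklyn''}}(branch))$
which is simply $r_2 \div r_1$.
Formally, for relations $r(R)$ and $s(S)$ with $S \subseteq R$,
$r \div s = \Pi_{R-S}(r) - \Pi_{R-S}((\Pi_{R-S}(r) \times s) - r)$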
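Division can be computed either straight from its ``for all'' reading or from the fundamental operations (projection, product and difference). The sketch below does both on hypothetical data and checks that they agree; the function names and the sample pairs are assumptions.

```python
def divide(r, s):
    """r: set of (x, y) pairs, s: set of y values.

    Returns every x that is paired in r with all y in s.
    """
    xs = {x for (x, _) in r}
    return {x for x in xs if all((x, y) in r for y in s)}


def divide_by_formula(r, s):
    """The same result, built only from projection, product and difference."""
    xs = {x for (x, _) in r}                      # project r on the x attribute
    all_pairs = {(x, y) for x in xs for y in s}   # product of that projection with s
    missing = all_pairs - r                       # pairings that r lacks
    return xs - {x for (x, _) in missing}         # drop every x with a missing pairing


# Hypothetical data: (cname, bname) account pairs and the Brooklyn branch names.
r2 = {
    ("Smith", "Brighton"), ("Smith", "Downtown"), ("Smith", "SFU"),
    ("Jones", "Brighton"), ("Hayes", "Downtown"),
}
r1 = {"Brighton", "Downtown"}

print(divide(r2, r1))
print(divide_by_formula(r2, r1))
```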
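5. The Assignment Operation
The assignment operation, denoted $\leftarrow$, stores the result of an expression in a
temporary relation variable.
No extra relation is added to the database, but the relation variable created can be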
used in subsequent expressions. Assignment to a permanent relation would
constitute a modification to the database.
Example Queries
1. For example, to find the branch-name, loan number, customer name and amount
for loans over $1200:
$\{t \mid t \in borrow \land t[amount] > 1200\}$
This gives us all attributes, but suppose we only want the customer names. (We
would use project in the algebra.)
$\{t \mid \exists s \in borrow\ (t[cname] = s[cname] \land s[amount] > 1200)\}$
In English, we may read this equation as ``the set of all tuples $t$ such that there
exists a tuple $s$ in the relation borrow for which the values of $t$ and $s$ for the cname
attribute are equal, and the value of $s$ for the amount attribute is greater than
1200.''
How did we get the above expression? We needed tuples on scheme cname such
that there were tuples in borrow pertaining to that customer name with amount
attribute $> 1200$.
The tuples $t$ get the scheme cname implicitly as that is the only attribute $t$ is
mentioned with.
Find all customers having a loan from the SFU branch, and the cities in which
they live:
$\{t \mid \exists s \in borrow\ (t[cname] = s[cname] \land s[bname] = \text{``SFU''} \land \exists u \in customer\ (u[cname] = s[cname] \land t[ccity] = u[ccity]))\}$
In English, we might read this as ``the set of all (cname,ccity) tuples for which
cname is a borrower at the SFU branch, and ccity is the city of cname''.
Tuple variable $s$ ensures that the customer is a borrower at the SFU branch.
Tuple variable $u$ is restricted to pertain to the same customer as $s$, and also ensures
that ccity is the city of the customer.
The logical connectives $\land$ (AND) and $\lor$ (OR) are allowed, as well as $\neg$ (negation).
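1. Find all customers having a loan, an account, or both at the SFU branch:
$\{t \mid \exists s \in borrow\ (t[cname] = s[cname] \land s[bname] = \text{``SFU''}) \lor \exists u \in deposit\ (t[cname] = u[cname] \land u[bname] = \text{``SFU''})\}$
2. Find all customers who have both a loan and an account at the SFU branch.
$\{t \mid \exists s \in borrow\ (t[cname] = s[cname] \land s[bname] = \text{``SFU''}) \land \exists u \in deposit\ (t[cname] = u[cname] \land u[bname] = \text{``SFU''})\}$
3. Find customers who have an account, but not a loan at the SFU branch.
$\{t \mid \exists u \in deposit\ (t[cname] = u[cname] \land u[bname] = \text{``SFU''}) \land \neg \exists s \in borrow\ (t[cname] = s[cname] \land s[bname] = \text{``SFU''})\}$
4. Find all customers who have an account at all branches located in Brooklyn.
(We used division in relational algebra.)
For this example we will use implication, denoted by a pointing finger in the text,
but by $\Rightarrow$ here.
$\{t \mid \forall u \in branch\ (u[bcity] = \text{``Brooklyn''} \Rightarrow \exists s \in deposit\ (t[cname] = s[cname] \land s[bname] = u[bname]))\}$
In English: the set of all cname tuples $t$ such that for all tuples $u$ in the branch
relation, if the value of $u$ on attribute bcity is Brooklyn, then the customer has an
account at the branch whose name appears in the bname attribute of $u$.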
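The ``for all ... implies ... there exists'' shape of this query translates almost word for word into Python's all() and any(). The sketch below uses invented branch and deposit rows and a hypothetical helper named implies. Candidate names are drawn only from deposit, so the result can contain only values that occur in the database.

```python
def implies(p, q):
    """Logical implication: p => q is the same as (not p) or q."""
    return (not p) or q


# Hypothetical instances, reduced to the attributes the query mentions.
branch = [
    {"bname": "Brighton", "bcity": "Brooklyn"},
    {"bname": "Downtown", "bcity": "Brooklyn"},
    {"bname": "SFU", "bcity": "Burnaby"},
]
deposit = [
    {"cname": "Smith", "bname": "Brighton"},
    {"cname": "Smith", "bname": "Downtown"},
    {"cname": "Jones", "bname": "Brighton"},
    {"cname": "Hayes", "bname": "SFU"},
]

candidates = {s["cname"] for s in deposit}

result = {
    c
    for c in candidates
    if all(  # for all tuples u in branch ...
        implies(
            u["bcity"] == "Brooklyn",  # ... if u is a Brooklyn branch ...
            any(  # ... then some deposit tuple links c to that branch
                s["cname"] == c and s["bname"] == u["bname"] for s in deposit
            ),
        )
        for u in branch
    )
}

print(result)
```

Only Smith has an account at both Brooklyn branches in this data; the non-Brooklyn branch satisfies the implication trivially.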
Formal Definitions
1. A tuple relational calculus expression is of the form
$\{t \mid P(t)\}$
where $P$ is a formula.
Safety of Expressions
1. A tuple relational calculus expression may generate an infinite relation, e.g.
$\{t \mid \neg(t \in borrow)\}$
2. There are an infinite number of tuples that are not in borrow! Most of these tuples
contain values that do not appear in the database.
3. Safe Tuple Expressions
o So, the domain of a formula $P$, written $dom(P)$, is the set of all values explicitly
appearing in $P$ or that appear in relations mentioned in $P$.
o $dom(t \in borrow \land t[amount] > 1200)$ is the set of all values appearing
in borrow, together with the constant 1200 that appears in the formula.
o $dom(\neg(t \in borrow))$ is the set of all values appearing in borrow.
We may say an expression $\{t \mid P(t)\}$ is safe if all values that appear in the result
are values from $dom(P)$.
1. Domain variables take on values from an attribute's domain, rather than values for
an entire tuple.
Formal Definitions
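1. An expression is of the form
$\{\langle x_1, x_2, \dots, x_n \rangle \mid P(x_1, x_2, \dots, x_n)\}$
where the $x_i$ are domain variables and $P$ is a formula.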
Example Queries
Prepared by Mrs.D.Maladhy (AP/IT/RGCET) Page 17
UNIT-II DBMS
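1. Find branch name, loan number, customer name and amount for loans of over
$1200.
$\{\langle b, l, c, a \rangle \mid \langle b, l, c, a \rangle \in borrow \land a > 1200\}$
2. Find all customers who have a loan for an amount greater than $1200.
$\{\langle c \rangle \mid \exists b, l, a\ (\langle b, l, c, a \rangle \in borrow \land a > 1200)\}$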
3. Find all customers having a loan from the SFU branch, and the city in which they
live.
4. Find all customers having a loan, an account or both at the SFU branch.
5. Find all customers who have an account at all branches located in Brooklyn.
If you find this example difficult to understand, try rewriting this expression using
implication, as in the tuple relational calculus example. Here's my attempt:
I've used two letter variable names to get away from the problem of having to
remember what each single-letter variable stands for.
Safety of Expressions
1. As in the tuple relational calculus, it is possible to write expressions that
generate infinite relations. The solution is similar for domain relational calculus:
restrict the form to safe expressions involving values in the domain of the formula.
Deletion
1. Deletion is expressed in much the same way as a query. Instead of displaying, the
selected tuples are removed from the database. We can only delete whole tuples.
2. Some examples:
2. Delete all loans with loan numbers between 1300 and 1500.
$borrow \leftarrow borrow - \sigma_{loan\# \geq 1300 \land loan\# \leq 1500}(borrow)$
Insertions
1. To insert data into a relation, we either specify a tuple, or write a query whose
result is the set of tuples to be inserted. Attribute values for inserted tuples must
be members of the attribute's domain.
2. An insertion is expressed by
$r \leftarrow r \cup E$
where $r$ is a relation and $E$ is a relational algebra expression.
3. Some examples:
1. To insert a tuple for Smith who has $1200 in account 9372 at the SFU branch.
2. To provide all loan customers in the SFU branch with a $200 savings account.
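Deletion and insertion can both be read as assignments: the relation is replaced by itself minus (or union) the result of a query. The sketch below follows the examples above with Python sets. The tuple layout (bname, number, cname, amount), the sample rows, and the choice to reuse the loan number as the new account number are all assumptions made for illustration.

```python
# Hypothetical borrow instance; tuples are (bname, loan_no, cname, amount).
borrow = {
    ("SFU", 1250, "Hayes", 1500),
    ("SFU", 1400, "Smith", 900),
    ("Downtown", 1600, "Jones", 2000),
}

# Deletion: remove whole tuples whose loan number lies between 1300 and 1500.
to_delete = {t for t in borrow if 1300 <= t[1] <= 1500}
borrow = borrow - to_delete

# Insertion of one explicit tuple; deposit uses (bname, account_no, cname, balance).
deposit = set()
deposit = deposit | {("SFU", 9372, "Smith", 1200)}

# Insertion of a query result: a $200 account for every remaining SFU loan
# customer (assumption: the loan number doubles as the account number).
gifts = {(bname, loan_no, cname, 200) for (bname, loan_no, cname, _) in borrow
         if bname == "SFU"}
deposit = deposit | gifts

print(sorted(borrow))
print(sorted(deposit))
```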
Updating
1. Updating allows us to change some values in a tuple without necessarily changing
all.
Some examples:
Views
1. We have assumed up to now that the relations we are given are the actual
relations stored in the database.
2. For security and convenience reasons, we may wish to create a personalized
collection of relations for a user.
3. We use the term view to refer to any relation, not part of the conceptual model,
that is made visible to the user as a ``virtual relation''.
4. As relations may be modified by deletions, insertions and updates, it is generally
not possible to store views. (Why?) Views must then be recomputed for each
query referring to them.
View Definition
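1. A view is defined using the create view command:
create view v as <query expression>
where <query expression> is any legal query expression and v is the name given to
the view.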
3. Having defined a view, we can now use it to refer to the virtual relation it creates.
View names can appear anywhere a relation name can.
4. We can now find all customers of the SFU branch by writing
3. Since SQL allows a view name to appear anywhere a relation name may appear,
the clerk can write:
This insertion is represented by an insertion into the actual relation borrow, from
which the view is constructed.
The symbol null represents a null or place-holder value. It says the value is
unknown or does not exist.
This view lists the cities in which the borrowers of each branch live.
Using nulls is the only possible way to do this (see Figure 3.22 in the textbook).
If we do this insertion with nulls, now consider the expression the view actually
corresponds to:
As comparisons involving nulls are always false, this query misses the inserted
tuple.
To understand why, think about the tuples that got inserted into borrow and
customer. Then think about how the view is recomputed for the above query.
SQL
Basic structure
Set operations
Aggregate functions
Null values
Nested sub queries
Derived relations
Views
Modification of the database
Joined relations
Data definition language
Basic structure
SQL is based on set and relational operations with certain modifications and
enhancements.
A typical SQL query has the form:
select A1, A2, ..., An
from r1, r2, ..., rm
where P
o The Ai represent attributes
o The ri represent relations
o P is a predicate.
This query is equivalent to the relational algebra expression
$\Pi_{A_1, A_2, \dots, A_n}(\sigma_P(r_1 \times r_2 \times \dots \times r_m))$
The result of an SQL query is a relation.
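The correspondence between select-from-where and project-select-product can be checked on a small in-memory database. This is a sketch using Python's standard sqlite3 module; the table contents are hypothetical, and underscores replace the hyphens in attribute names because hyphenated identifiers are not valid unquoted names in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table borrower (customer_name text, loan_number text);
    create table loan (loan_number text, branch_name text, amount integer);
    insert into borrower values ('Jones', 'L-170'), ('Smith', 'L-230');
    insert into loan values ('L-170', 'Downtown', 3000), ('L-230', 'Redwood', 4000);
""")

# The SQL form: select ... from ... where ...
sql_rows = set(conn.execute("""
    select customer_name, amount
    from borrower, loan
    where borrower.loan_number = loan.loan_number
"""))

# The algebra form, spelled out by hand: product, then selection, then projection.
borrower = conn.execute("select customer_name, loan_number from borrower").fetchall()
loan = conn.execute("select loan_number, branch_name, amount from loan").fetchall()
algebra_rows = {
    (b[0], l[2])              # projection on customer_name, amount
    for b in borrower
    for l in loan             # cartesian product
    if b[1] == l[0]           # selection predicate
}

print(sql_rows == algebra_rows)
```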
The select clause
The select clause lists the attributes desired in the result of a query
o Corresponds to the projection operation of the relational algebra
E.g. Find the names of all branches in the loan relation
select branch-name
from loan
1. Find the loan number of those loans with loan amounts between $90,000 and
$100,000 (that is, ≥ $90,000 and ≤ $100,000)
select loan-number from loan
where amount between 90000 and 100000
2. Find the name, loan number and loan amount of all customers
having a loan at the Perryridge branch.
select customer-name, borrower.loan-number, amount
from borrower, loan
where borrower.loan-number = loan.loan-number and
branch-name = 'Perryridge'
The rename operation
SQL allows renaming relations and attributes using the as clause:
old-name as new-name
1. Find the name, loan number and loan amount of all customers; rename the
column name loan-number as loan-id.
select customer-name, borrower.loan-number as loan-id,
amount
from borrower, loan
where borrower.loan-number = loan.loan-number
Tuple variables
Tuple variables are defined in the from clause via the use of the as clause.
1. Find the customer names and their loan numbers for all customers having a loan
at some branch
select customer-name, t.loan-number, s.amount
from borrower as t, loan as s
where t.loan-number = s.loan-number
2. Find the names of all branches that have greater assets than some branch
located in Brooklyn.
select distinct t.branch-name
from branch as t, branch as s
where t.assets > s.assets and s.branch-city = 'Brooklyn'
String operations
SQL includes a string-matching operator, like, for comparisons on character strings.
Patterns are described using two special characters:
Percent (%). The % character matches any substring.
Underscore (_). The _ character matches any single character.
1. Find the names of all customers whose street includes the substring “Main”.
select customer-name
from customer
where customer-street like '%Main%'
To match the literal string “Main%”, declare an escape character and place it
before the %:
like 'Main\%' escape '\'
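Pattern matching with and without an escape character can be tried out with Python's standard sqlite3 module. The rows below are hypothetical and underscores replace hyphens in attribute names. One caveat that is specific to SQLite: its like operator ignores case for ASCII letters by default, which other systems may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customer (customer_name text, customer_street text)")
conn.executemany(
    "insert into customer values (?, ?)",
    [("Jones", "Main"), ("Smith", "North Main St"), ("Hayes", "Main%"), ("Curry", "Park")],
)

# % matches any substring, so every street containing "Main" qualifies.
contains_main = [row[0] for row in conn.execute(
    "select customer_name from customer "
    "where customer_street like '%Main%' order by customer_name"
)]

# With an escape character, \% stands for a literal percent sign.
literal_percent = [row[0] for row in conn.execute(
    r"select customer_name from customer "
    r"where customer_street like 'Main\%' escape '\' order by customer_name"
)]

print(contains_main)
print(literal_percent)
```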
Set operations
The set operations union, intersect, and except operate on relations and
correspond to the relational algebra operations
Each of the above operations automatically eliminates duplicates; to retain all
duplicates use the corresponding multiset versions union all, intersect all and
except all.
Suppose a tuple occurs m times in r and n times in s. Then it occurs:
o m + n times in r union all s
o min(m, n) times in r intersect all s
o max(0, m - n) times in r except all s
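The three multiset rules match the arithmetic of Python's collections.Counter, which makes them easy to check on a toy example. The counts below are invented; each Counter maps a tuple (shown here as a single letter) to the number of times it occurs.

```python
from collections import Counter

# Hypothetical multisets: "a" occurs 3 times in r and once in s, and so on.
r = Counter({"a": 3, "b": 1})
s = Counter({"a": 1, "c": 2})

union_all = r + s        # m + n occurrences
intersect_all = r & s    # min(m, n) occurrences
except_all = r - s       # max(0, m - n) occurrences; non-positive counts are dropped

print(union_all)
print(intersect_all)
print(except_all)
```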
1. Find all customers who have a loan, an account, or both:
(select customer-name from depositor)
union
(select customer-name from borrower)
2. Find all customers who have both a loan and an account.
(select customer-name from depositor)
intersect
(select customer-name from borrower)
3. Find all customers who have an account but no loan.
(select customer-name from depositor)
except
(select customer-name from borrower)
Aggregate functions
These functions operate on the multiset of values of a column of a relation, and
return a single value: avg (average), min (minimum), max (maximum), sum (total)
and count (number of values).
Outer join
An extension of the join operation that avoids loss of information.
Computes the join and then adds the tuples from one relation that do not match
any tuple in the other relation to the result of the join.
Uses null values:
o Null signifies that the value is unknown or does not exist
o All comparisons involving null are (roughly speaking) false by definition.
Relation loan
loan-number branch-name amount
Relation borrower
customer-name loan-number
Jones L-170
Smith L-230
Hayes L-155
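A left outer join on these relations can be run with Python's standard sqlite3 module. The borrower rows follow the table above; the loan rows are hypothetical, chosen so that one loan has no matching borrower, and underscores replace hyphens in attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table loan (loan_number text, branch_name text, amount integer);
    create table borrower (customer_name text, loan_number text);
    -- loan rows are hypothetical; borrower rows follow the table above
    insert into loan values
        ('L-170', 'Downtown', 3000),
        ('L-230', 'Redwood', 4000),
        ('L-260', 'Perryridge', 1700);
    insert into borrower values
        ('Jones', 'L-170'), ('Smith', 'L-230'), ('Hayes', 'L-155');
""")

# Every loan is kept; a loan with no borrower is padded with null (None in Python).
rows = conn.execute("""
    select loan.loan_number, branch_name, amount, customer_name
    from loan left outer join borrower
         on loan.loan_number = borrower.loan_number
    order by loan.loan_number
""").fetchall()

for row in rows:
    print(row)
```

Hayes does not appear because the join is a left outer join on loan; borrower tuples without a matching loan are the ones a right outer join would preserve.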
Nested subqueries
SQL provides a mechanism for the nesting of subqueries.
A subquery is a select-from-where expression that is nested within another
query.
A common use of subqueries is to perform tests for set membership, set
comparisons, and set cardinality.
Example query
Find all customers who have both an account and a loan at the bank.
select distinct customer-name
from borrower
where customer-name in (select customer-name
from depositor)
Find all customers who have a loan at the bank but do not have
an account at the bank
select distinct customer-name
from borrower
where customer-name not in (select customer-name
from depositor)
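Both membership tests can be run against a small in-memory database with Python's standard sqlite3 module. The rows below are hypothetical and underscores replace hyphens in attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table borrower (customer_name text);
    create table depositor (customer_name text);
    insert into borrower values ('Jones'), ('Smith'), ('Hayes');
    insert into depositor values ('Smith'), ('Lindsay');
""")

# Set membership: borrowers who also appear among the depositors.
both = [row[0] for row in conn.execute("""
    select distinct customer_name
    from borrower
    where customer_name in (select customer_name from depositor)
    order by customer_name
""")]

# Negated membership: borrowers with no account.
loan_only = [row[0] for row in conn.execute("""
    select distinct customer_name
    from borrower
    where customer_name not in (select customer_name from depositor)
    order by customer_name
""")]

print(both)
print(loan_only)
```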
Find all customers who have both an account and a loan at the Perryridge
branch
Embedded SQL:
The SQL standard defines embeddings of SQL in a variety of programming
languages such as Pascal, PL/I, Fortran, C, and Cobol.
A language in which SQL queries are embedded is referred to as a host language,
and the SQL structures permitted in the host language comprise embedded SQL.
The basic form of these languages follows that of the System R embedding of SQL
into PL/I.
The EXEC SQL statement is used to identify an embedded SQL request to the preprocessor:
EXEC SQL <embedded SQL statement> END-EXEC
o Note: this varies by language. E.g. the Java embedding uses
#sql { …. } ;
Example query:
From within a host language, find the names and cities of customers with more than the
variable amount dollars in some account.
Specify the query in SQL and declare a cursor for it
EXEC SQL
Dynamic SQL:
Allows programs to construct and submit SQL queries at run time.
Example of the use of dynamic SQL from within a C program.