DBMS Unit 3
Relational Algebra
• Structure of Relational Databases
• Relational Algebra
• Extended Relational-Algebra-Operations
• Views
Example of a Relation
Basic Structure
• Formally, given sets D1, D2, …, Dn, a relation r is a subset of
D1 x D2 x … x Dn
Thus a relation is a set of n-tuples (a1, a2, …, an) where
each ai ∈ Di
• Example: if
customer-name = {Jones, Smith, Curry, Lindsay}
customer-street = {Main, North, Park}
customer-city = {Harrison, Rye, Pittsfield}
Then r = { (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield)}
is a relation over customer-name x customer-street x customer-city
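The definition can be checked mechanically. The following is a minimal Python sketch, assuming each domain is modeled as a Python set and the relation as a set of tuples; the variable names are illustrative only.

```python
from itertools import product

# Domains from the example above.
customer_name = {"Jones", "Smith", "Curry", "Lindsay"}
customer_street = {"Main", "North", "Park"}
customer_city = {"Harrison", "Rye", "Pittsfield"}

# The full Cartesian product D1 x D2 x D3: every possible 3-tuple.
all_tuples = set(product(customer_name, customer_street, customer_city))

# The relation r is one particular subset of that product.
r = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
}

# r is a relation over the three domains exactly when it is a subset.
is_relation = r <= all_tuples
print(len(all_tuples), is_relation)
```

The product has 4 x 3 x 3 = 36 tuples, of which r uses only 4. Any other subset would be an equally valid relation over the same domains.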
Attribute Types
• Each attribute of a relation has a name
• The set of allowed values for each attribute is called the
domain of the attribute
• Attribute values are (normally) required to be atomic, that is,
indivisible
• E.g. multivalued attribute values are not atomic
• E.g. composite attribute values are not atomic
• The special value null is a member of every domain
• The null value causes complications in the definition of many
operations
Relation Schema
• A1, A2, …, An are attributes
• R = (A1, A2, …, An ) is a relation schema
E.g. Customer-schema =
(customer-name, customer-street, customer-city)
• r(R) is a relation on the relation schema R
E.g. customer (Customer-schema)
Relation Instance
• The current values (relation instance) of a relation are
specified by a table
• An element t of r is a tuple, represented by a row in a table
[Figure: the customer relation shown as a table, with attributes (or columns) customer-name, customer-street, customer-city]
Relations are Unordered
Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
E.g. account relation with unordered tuples
Database
• A database consists of multiple relations
• Information about an enterprise is broken up into parts, with
each relation storing one part of the information
E.g.: account : stores information about accounts
depositor : stores information about which customer
owns which account
customer : stores information about customers
• Storing all information as a single relation such as
bank(account-number, balance, customer-name, ..)
results in
• repetition of information (e.g. two customers own an account)
• the need for null values (e.g. represent a customer without an
account)
• Normalization theory deals with how to design relational
schemas
The customer Relation
The depositor Relation
E-R Diagram for the Banking
Enterprise
Keys
• Let K ⊆ R
• K is a superkey of R if values for K are sufficient to identify a
unique tuple of each possible relation r(R)
• by “possible r” we mean a relation r that could exist in the
enterprise we are modeling.
• Example: {customer-name, customer-street} and
{customer-name}
are both superkeys of Customer, if no two customers can possibly
have the same name.
• K is a candidate key if K is minimal
Example: {customer-name} is a candidate key for Customer,
since it is a superkey (assuming no two customers can possibly
have the same name), and no subset of it is a superkey.
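A superkey is a property of every possible relation r(R), so no single table can prove it. A table can, however, refute a proposed key. The sketch below is an instance-level check only; the function names are hypothetical, and the rows are the customer tuples from the earlier example.

```python
from itertools import combinations

def identifies_uniquely(rows, attrs):
    """True if no two rows of this one instance agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_minimal_on_instance(rows, attrs):
    """True if attrs is unique on rows and no proper subset of attrs is."""
    if not identifies_uniquely(rows, attrs):
        return False
    proper_subsets = (
        subset
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )
    return not any(identifies_uniquely(rows, subset) for subset in proper_subsets)

customer = [
    {"customer-name": "Jones", "customer-street": "Main", "customer-city": "Harrison"},
    {"customer-name": "Smith", "customer-street": "North", "customer-city": "Rye"},
    {"customer-name": "Curry", "customer-street": "North", "customer-city": "Rye"},
    {"customer-name": "Lindsay", "customer-street": "Park", "customer-city": "Pittsfield"},
]

print(identifies_uniquely(customer, ["customer-name", "customer-street"]))
print(is_minimal_on_instance(customer, ["customer-name", "customer-street"]))
print(is_minimal_on_instance(customer, ["customer-name"]))
```

On this instance {customer-name, customer-street} is unique but not minimal, while {customer-name} is both. Whether either is really a key still depends on the assumption that no two customers can share a name.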
Determining Keys from E-R
Sets
• Strong entity set. The primary key of the entity set becomes
the primary key of the relation.
• Weak entity set. The primary key of the relation consists of
the union of the primary key of the strong entity set and the
discriminator of the weak entity set.
• Relationship set. The union of the primary keys of the related
entity sets becomes a superkey of the relation.
• For binary many-to-one relationship sets, the primary key of the
“many” entity set becomes the relation’s primary key.
• For one-to-one relationship sets, the relation’s primary key can be
that of either entity set.
• For many-to-many relationship sets, the union of the primary
keys becomes the relation’s primary key
Schema Diagram for the Banking Enterprise
Query Languages
• Language in which user requests information from the
database.
• Categories of languages
• procedural
• non-procedural
• “Pure” languages:
• Relational Algebra
• Tuple Relational Calculus
• Domain Relational Calculus
• Pure languages form underlying basis of query languages that
people use.
Relational Algebra
• Procedural language
• Six basic operators
• select
• project
• union
• set difference
• Cartesian product
• rename
• The operators take one or more relations as inputs and give a
new relation as a result.
Select Operation – Example
• Relation r:
    A   B   C   D
    α   α   1   7
    α   β   5   4
    β   β   12  3
    β   β   23  10
• σ A=B ∧ D>5 (r):
    A   B   C   D
    α   α   1   7
    β   β   23  10
Select Operation
• Notation: σp(r)
• p is called the selection predicate
• Defined as:
σp(r) = {t | t ∈ r and p(t)}
where p is a formula in propositional calculus consisting
of terms connected by: ∧ (and), ∨ (or), ¬ (not)
Each term is one of:
<attribute> op <attribute>  or  <attribute> op <constant>
where op is one of: =, ≠, >, ≥, <, ≤
• Example of selection:
σ branch-name=“Perryridge”(account)
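Selection keeps the tuples that satisfy a predicate and leaves the schema unchanged. A minimal Python sketch follows, assuming a relation is modeled as a list of dicts; the `select` helper and the account rows are made up for illustration.

```python
def select(relation, predicate):
    """Selection: keep exactly the tuples t of relation for which predicate(t) holds."""
    return [t for t in relation if predicate(t)]

# Hypothetical sample rows for the account relation.
account = [
    {"account-number": "A-101", "branch-name": "Downtown", "balance": 500},
    {"account-number": "A-102", "branch-name": "Perryridge", "balance": 400},
    {"account-number": "A-201", "branch-name": "Perryridge", "balance": 900},
]

# Selection with a single term: branch-name = "Perryridge".
perryridge = select(account, lambda t: t["branch-name"] == "Perryridge")

# Terms can be combined with and / or / not, mirroring the connectives above.
large_perryridge = select(
    account,
    lambda t: t["branch-name"] == "Perryridge" and t["balance"] > 500,
)

print([t["account-number"] for t in perryridge])
print([t["account-number"] for t in large_perryridge])
```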
Project Operation – Example
• Relation r:
    A   B   C
    α   10  1
    α   20  1
    β   30  1
    β   40  2
• Π A,C (r):
    A   C         A   C
    α   1         α   1
    α   1    =    β   1
    β   1         β   2
    β   2
Project Operation
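• Notation: Π A1, A2, …, Ak (r), where A1, …, Ak are attribute names and r is a relation name
• Duplicate rows are removed from the result, since relations are sets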
Union Operation – Example
• Relations r, s:
    r:  A   B        s:  A   B
        α   1            α   2
        α   2            β   3
        β   1
• r ∪ s:
    A   B
    α   1
    α   2
    β   1
    β   3
Union Operation
• Notation: r ∪ s
• Defined as:
r ∪ s = {t | t ∈ r or t ∈ s}
• For r ∪ s to be valid:
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (e.g., the 2nd column
of r deals with the same type of values as does the 2nd
column of s)
• E.g. to find all customers with either an account or a loan:
Π customer-name (depositor) ∪ Π customer-name (borrower)
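The query above combines projection and union. The sketch below models both with Python sets, so duplicates disappear automatically; the `project` helper and the sample rows are assumptions for illustration.

```python
def project(relation, attrs):
    """Projection: keep only the listed attributes; the set drops duplicate rows."""
    return {tuple(row[a] for a in attrs) for row in relation}

# Hypothetical sample rows.
depositor = [
    {"customer-name": "Jones", "account-number": "A-101"},
    {"customer-name": "Jones", "account-number": "A-217"},
    {"customer-name": "Lindsay", "account-number": "A-222"},
]
borrower = [
    {"customer-name": "Jones", "loan-number": "L-170"},
    {"customer-name": "Smith", "loan-number": "L-230"},
]

with_account = project(depositor, ["customer-name"])
with_loan = project(borrower, ["customer-name"])

# Both operands have arity 1 over the same domain, so the union is valid.
either = with_account | with_loan
print(sorted(either))
```

Jones appears twice in depositor and once in borrower, yet only once in the result. This is the duplicate elimination that set semantics requires.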
Set Difference Operation – Example
• Relations r, s:
    r:  A   B        s:  A   B
        α   1            α   2
        α   2            β   3
        β   1
• r – s:
    A   B
    α   1
    β   1
Set Difference Operation
• Notation: r – s
• Defined as:
r – s = {t | t ∈ r and t ∉ s}
• Set differences must be taken between compatible relations.
• r and s must have the same arity
• attribute domains of r and s must be compatible
Cartesian-Product Operation – Example
• Relations r, s:
    r:  A   B        s:  C   D   E
        α   1            α   10  a
        β   2            β   10  a
                         β   20  b
                         γ   10  b
• r x s:
    A   B   C   D   E
    α   1   α   10  a
    α   1   β   10  a
    α   1   β   20  b
    α   1   γ   10  b
    β   2   α   10  a
    β   2   β   10  a
    β   2   β   20  b
    β   2   γ   10  b
Cartesian-Product Operation
• Notation: r x s
• Defined as:
r x s = {t q | t ∈ r and q ∈ s}
• Assume that attributes of r(R) and s(S) are disjoint (that is,
R ∩ S = ∅).
• If attributes of r(R) and s(S) are not disjoint, then renaming
must be used.
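The definition pairs every tuple of r with every tuple of s. A minimal Python sketch follows, assuming tuples are modeled as dicts keyed by attribute name; the helper name and the sample values are illustrative. It enforces the disjointness assumption explicitly.

```python
def cartesian_product(r, s):
    """Concatenate every tuple t of r with every tuple q of s."""
    result = []
    for t in r:
        for q in s:
            if set(t) & set(q):
                # Shared attribute names would collide; rename one operand first.
                raise ValueError("attribute names overlap; rename first")
            result.append({**t, **q})
    return result

# Hypothetical sample relations with disjoint schemas (A, B) and (C, D, E).
r = [{"A": "x", "B": 1}, {"A": "y", "B": 2}]
s = [{"C": "x", "D": 10, "E": "a"}, {"C": "y", "D": 20, "E": "b"}]

rs = cartesian_product(r, s)
print(len(rs))   # |r| * |s| tuples
print(rs[0])
```

The result always has |r| · |s| tuples and arity equal to the sum of the two arities, which is why a selection usually follows a product in practice.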
Composition of Operations
• Can build expressions using multiple operations
• Example: σ A=C (r x s)
• r x s:
    A   B   C   D   E
    α   1   α   10  a
    α   1   β   10  a
    α   1   β   20  b
    α   1   γ   10  b
    β   2   α   10  a
    β   2   β   10  a
    β   2   β   20  b
    β   2   γ   10  b
• σ A=C (r x s):
    A   B   C   D   E
    α   1   α   10  a
    β   2   β   10  a
    β   2   β   20  b
Rename Operation
• Allows us to name, and therefore to refer to, the results of
relational-algebra expressions.
• Allows us to refer to a relation by more than one name.
Example:
ρx (E)
returns the expression E under the name x
If a relational-algebra expression E has arity n, then
ρx (A1, A2, …, An) (E)
returns the result of expression E under the name x, and with
the attributes renamed to A1, A2, …, An.
Banking Example
branch (branch-name, branch-city, assets)
Find the loan number for each loan of an amount greater than
$1200
Π loan-number (σ amount > 1200 (loan))
Example Queries
• Find the names of all customers who have a loan, an account, or
both, from the bank
account at bank.
Query 2
Π customer-name (σ loan.loan-number = borrower.loan-number (
(σ branch-name = “Perryridge” (loan)) x borrower))
Example Queries
Find the largest account balance
• Rename account relation as d
• The query is:
Π balance (account) – Π account.balance
(σ account.balance < d.balance (account x ρd (account)))
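The basic operators have no "max", so the largest balance is found indirectly: take all balances and subtract every balance that is smaller than some other balance. The Python sketch below follows that idea with sets; the function name and the sample balances are illustrative assumptions.

```python
def largest_balance(balances):
    """Maximum via set difference: all balances minus those beaten by another."""
    all_balances = set(balances)  # projection of account on balance
    not_largest = {
        a
        for a in all_balances      # account ...
        for d in all_balances      # ... paired with the renamed copy d
        if a < d                   # keep a only if some d has a larger balance
    }
    return all_balances - not_largest

print(largest_balance([500, 700, 400, 700]))
```

The nested loop plays the role of the product of account with its renamed copy, which is why the rename is needed: without it, the two copies' attributes could not be told apart.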
Additional Operations
• Set intersection
• Natural join
• Division
• Assignment
Set-Intersection Operation
• Notation: r ∩ s
• Defined as:
• r ∩ s = { t | t ∈ r and t ∈ s }
• Assume:
• r, s have the same arity
• attributes of r and s are compatible
• Note: r ∩ s = r – (r – s)
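The note says intersection adds no expressive power, because it can be rewritten with set difference alone. A quick Python check of the identity on sample sets of tuples (the values are made up for illustration):

```python
# Two compatible relations over (A, B), modeled as sets of tuples.
r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}

difference = r - s            # tuples in r but not in s
intersection = r & s          # tuples in both
via_difference = r - (r - s)  # the same result using only set difference

print(sorted(difference))
print(sorted(intersection))
print(intersection == via_difference)
```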
Set-Intersection Operation – Example
• Relations r, s:
    r:  A   B        s:  A   B
        α   1            α   2
        α   2            β   3
        β   1
• r ∩ s:
    A   B
    α   2
Natural-Join Operation
• Notation: r ⋈ s
• Let r and s be relations on schemas R and S respectively.
Then, r ⋈ s is a relation on schema R ∪ S obtained as follows:
• Consider each pair of tuples tr from r and ts from s.
• If tr and ts have the same value on each of the attributes in R ∩ S, add a
tuple t to the result, where
• t has the same value as tr on r
• t has the same value as ts on s
• Example:
R = (A, B, C, D)
S = (E, B, D)
• Result schema = (A, B, C, D, E)
• r ⋈ s is defined as:
Π r.A, r.B, r.C, r.D, s.E (σ r.B = s.B ∧ r.D = s.D (r x s))
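The pairwise procedure above translates almost directly into code. The following Python sketch assumes tuples are dicts keyed by attribute name; the function name and sample rows are illustrative. The schemas match the example, R = (A, B, C, D) and S = (E, B, D).

```python
def natural_join(r, s):
    """Join tuples that agree on every attribute name they share."""
    result = []
    for tr in r:
        for ts in s:
            common = set(tr) & set(ts)            # attributes in both schemas
            if all(tr[a] == ts[a] for a in common):
                merged = {**tr, **ts}             # shared attributes appear once
                if merged not in result:          # keep set semantics
                    result.append(merged)
    return result

# Hypothetical sample rows.
r = [
    {"A": 1, "B": "x", "C": 10, "D": "p"},
    {"A": 2, "B": "y", "C": 20, "D": "q"},
]
s = [
    {"E": 100, "B": "x", "D": "p"},
    {"E": 200, "B": "x", "D": "q"},
    {"E": 300, "B": "y", "D": "q"},
]

for row in natural_join(r, s):
    print(row)
```

Only the pairs that agree on both B and D survive. The tuple of s with B = "x" and D = "q" matches neither row of r and is dropped.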
Natural Join Operation – Example
• Relations r, s:
    r:  A   B   C   D        s:  B   D   E
        α   1   α   a            1   a   α
        β   2   γ   a            3   a   β
        γ   4   β   b            1   a   γ
        α   1   γ   a            2   b   δ
        δ   2   β   b            3   b   ε
• r ⋈ s:
    A   B   C   D   E
    α   1   α   a   α
    α   1   α   a   γ
    α   1   γ   a   α
    α   1   γ   a   γ
    δ   2   β   b   δ
Division Operation
• Notation: r ÷ s
• Suited to queries that include the phrase “for all”.
• Let r and s be relations on schemas R and S respectively
where
• R = (A1, …, Am, B1, …, Bn)
• S = (B1, …, Bn)
The result of r ÷ s is a relation on schema
R – S = (A1, …, Am)
r ÷ s = { t | t ∈ Π R-S (r) ∧ ∀ u ∈ s ( tu ∈ r ) }
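The definition reads: keep a value t from the R – S part of r if t is paired in r with every tuple u of s. A minimal Python sketch for the simplest case, where R = (A, B) and S = (B), follows. The function name and the course-enrollment data are hypothetical.

```python
def divide(r, s):
    """r: set of (a, b) pairs; s: set of b values.

    Returns the a values that are paired in r with every b in s.
    """
    candidates = {a for (a, _) in r}  # projection of r on R - S
    return {a for a in candidates if all((a, b) in r for b in s)}

# "Which students have taken all required courses?"
takes = {("Ann", "DB"), ("Ann", "OS"), ("Bob", "DB"), ("Cy", "OS"), ("Cy", "DB")}
required = {"DB", "OS"}

print(sorted(divide(takes, required)))
```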
Division Operation – Example
[Figure: relations r(A, B) and s(B), where s contains the B-values 1 and 2, and the result r ÷ s with the single attribute A]
Assignment Operation
• The assignment operation (←) provides a convenient way to express
complex queries.
• Write query as a sequential program consisting of
• a series of assignments
• followed by an expression whose value is displayed as a result of the
query.
• Assignment must always be made to a temporary relation variable.
• Example: Write r ÷ s as
temp1 ← Π R-S (r)
temp2 ← Π R-S ((temp1 x s) – Π R-S,S (r))
result = temp1 – temp2
• The result to the right of the ← is assigned to the relation variable on the left
of the ←.
• May use variable in subsequent expressions.
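The three-step program for division can be traced in Python, with one variable per assignment. This sketch assumes R = (A, B) and S = (B), so that projecting r onto (R – S, S) is r itself; the data is made up for illustration.

```python
from itertools import product

r = {("Ann", "DB"), ("Ann", "OS"), ("Bob", "DB")}  # schema (A, B)
s = {"DB", "OS"}                                   # schema (B)

# temp1: every A value that occurs in r (the candidates).
temp1 = {a for (a, _) in r}

# temp1 x s: every pairing a candidate would need in order to qualify.
required_pairs = set(product(temp1, s))

# temp2: candidates that are missing at least one required pairing.
temp2 = {a for (a, _) in required_pairs - r}

# result: candidates with nothing missing.
result = temp1 - temp2
print(sorted(result))
```

Bob is disqualified because the pair ("Bob", "OS") is required but absent from r, so it survives the difference and lands in temp2.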
Extended Relational-Algebra-
Operations
• Generalized Projection
• Outer Join
• Aggregate Functions
Generalized Projection
• Extends the projection operation by allowing arithmetic
functions to be used in the projection list.
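As an illustration of arithmetic in a projection list, the sketch below computes the remaining credit for each customer. The `generalized_project` helper, the credit_info relation, and its attribute names are hypothetical, not part of the banking schema shown in these notes.

```python
def generalized_project(relation, expressions):
    """expressions maps each output column name to a function of the input row."""
    return [
        {name: fn(row) for name, fn in expressions.items()}
        for row in relation
    ]

# Hypothetical relation credit_info(customer-name, limit, credit-balance).
credit_info = [
    {"customer-name": "Jones", "limit": 6000, "credit-balance": 700},
    {"customer-name": "Smith", "limit": 2000, "credit-balance": 400},
]

available = generalized_project(
    credit_info,
    {
        "customer-name": lambda t: t["customer-name"],
        "available": lambda t: t["limit"] - t["credit-balance"],  # arithmetic expression
    },
)
print(available)
```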
Aggregate Operation – Example
• Relation r, column C:
    C
    7
    7
    3
    10
• g sum(C) (r):
    sum-C
    27
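The sum example (values 7, 7, 3, 10 giving 27) shows that aggregates work on multisets: both 7s must count. The Python sketch below reproduces that sum and adds a grouped sum per branch; the `group_sum` helper and the account rows are illustrative assumptions.

```python
from collections import defaultdict

# Column C of r as a list, not a set, so that the duplicate 7 is counted twice.
c_values = [7, 7, 3, 10]
sum_c = sum(c_values)
print(sum_c)

def group_sum(rows, group_attr, value_attr):
    """Sum value_attr separately for each distinct value of group_attr."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_attr]] += row[value_attr]
    return dict(totals)

# Hypothetical account rows, to be grouped by branch-name.
account = [
    {"branch-name": "Perryridge", "account-number": "A-102", "balance": 400},
    {"branch-name": "Perryridge", "account-number": "A-201", "balance": 900},
    {"branch-name": "Brighton", "account-number": "A-217", "balance": 750},
    {"branch-name": "Brighton", "account-number": "A-215", "balance": 750},
    {"branch-name": "Redwood", "account-number": "A-222", "balance": 700},
]
print(group_sum(account, "branch-name", "balance"))
```

Had the values been collected into a set first, the two 7s (and the two 750 balances) would collapse into one and both sums would be wrong.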
Aggregate Operation – Example
• Relation account grouped by branch-name:
Relation borrower
    customer-name   loan-number
    Jones           L-170
    Smith           L-230
    Hayes           L-155
Outer Join – Example
• Inner Join
loan ⋈ borrower
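An inner join drops loans with no matching borrower. A left outer join keeps them and pads the missing attributes with null. The Python sketch below uses the borrower rows listed above; the loan rows and the helper name are assumptions made for illustration.

```python
def left_outer_join(left, right, attr, right_only_attrs):
    """Join on one shared attribute; unmatched left rows are padded with None."""
    result = []
    for lrow in left:
        matches = [rrow for rrow in right if rrow[attr] == lrow[attr]]
        if matches:
            result.extend({**lrow, **rrow} for rrow in matches)
        else:
            # No partner in right: keep the left row, use None as the null value.
            result.append({**lrow, **{a: None for a in right_only_attrs}})
    return result

# Hypothetical loan rows; borrower rows as listed above.
loan = [
    {"loan-number": "L-170", "branch-name": "Downtown", "amount": 3000},
    {"loan-number": "L-230", "branch-name": "Redwood", "amount": 4000},
    {"loan-number": "L-260", "branch-name": "Perryridge", "amount": 1700},
]
borrower = [
    {"customer-name": "Jones", "loan-number": "L-170"},
    {"customer-name": "Smith", "loan-number": "L-230"},
    {"customer-name": "Hayes", "loan-number": "L-155"},
]

outer = left_outer_join(loan, borrower, "loan-number", ["customer-name"])
inner = [row for row in outer if row["customer-name"] is not None]
for row in outer:
    print(row)
```

Loan L-260 has no borrower, so it appears only in the outer join, with a null customer-name. Hayes's loan L-155 is not in loan, so a left outer join never shows it; a right or full outer join would.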
Find the loan number for each loan of an amount greater than $1200
Find the names of all customers who have a loan and an account
at the bank
{t | ∃ s ∈ borrower (t[customer-name] = s[customer-name])
∧ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}
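A tuple-calculus query states what the result tuples must satisfy, not how to compute them. A Python set comprehension reads much the same way. The sketch below mirrors the query above, with `any(...)` standing in for the existential quantifier; the sample rows are made up.

```python
# Hypothetical sample rows.
borrower = [
    {"customer-name": "Jones", "loan-number": "L-170"},
    {"customer-name": "Smith", "loan-number": "L-230"},
    {"customer-name": "Hayes", "loan-number": "L-155"},
]
depositor = [
    {"customer-name": "Jones", "account-number": "A-101"},
    {"customer-name": "Hayes", "account-number": "A-102"},
    {"customer-name": "Lindsay", "account-number": "A-222"},
]

# Names t such that some s in borrower and some u in depositor carry that name.
loan_and_account = {
    s["customer-name"]
    for s in borrower  # there exists s in borrower
    if any(u["customer-name"] == s["customer-name"] for u in depositor)  # exists u
}
print(sorted(loan_and_account))
```

Because the comprehension only ranges over tuples that exist in borrower, the result is necessarily finite.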
Example Queries
• Find the names of all customers having a loan at the Perryridge
branch
{t | ∃ s ∈ borrower (t[customer-name] = s[customer-name]
∧ ∃ u ∈ loan (u[branch-name] = “Perryridge”
∧ u[loan-number] = s[loan-number]))}
{t | ∃ s ∈ loan (s[branch-name] = “Perryridge”
∧ ∃ u ∈ borrower (u[loan-number] = s[loan-number]
∧ t[customer-name] = u[customer-name]
∧ ∃ v ∈ customer (u[customer-name] = v[customer-name]
∧ t[customer-city] = v[customer-city])))}
Safety of Expressions
• It is possible to write tuple calculus expressions that generate
infinite relations.
• An expression {t | P(t)} in the tuple relational calculus is safe if
every component of t appears in one of the relations, tuples,
or constants that appear in P
• NOTE: this is more than just a syntax condition.
• E.g. { t | t[A]=5 ∨ true } is not safe --- it defines an infinite set with
attribute values that do not appear in any relation or tuples or
constants in P.
Domain Relational Calculus
• A nonprocedural query language equivalent in power to the
tuple relational calculus
• Each query is an expression of the form:
Find the names of all customers who have a loan of over $1200
Find the names of all customers who have a loan from the
Perryridge branch and the loan amount:
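Find the names of all customers having a loan, an account, or both at the
Perryridge branch:
{ <c> | ∃ l (<c, l> ∈ borrower
∧ ∃ b,a (<l, b, a> ∈ loan ∧ b = “Perryridge”))
∨ ∃ a (<c, a> ∈ depositor
∧ ∃ b,n (<a, b, n> ∈ account ∧ b = “Perryridge”))}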
Safety of Expressions
{ <x1, x2, …, xn> | P(x1, x2, …, xn) }
Redundancy:
Data for branch-name, branch-city, assets are repeated for each loan
that a branch makes
Wastes space
Complicates updating, introducing possibility of inconsistency of
assets value
Null values
Cannot store information about a branch if no loans exist
Can use null values, but they are difficult to handle.
Decomposition
Decompose the relation schema Lending-schema into:
Branch-schema = (branch-name, branch-city,assets)
Loan-info-schema = (customer-name, loan-number,
branch-name, amount)
All attributes of an original schema (R) must appear in the
decomposition (R1, R2):
R = R1 ∪ R2
Lossless-join decomposition.
For all possible relations r on schema R
r = Π R1 (r) ⋈ Π R2 (r)
Example of Non Lossless-Join Decomposition
• Decomposition of R = (A, B)
R1 = (A)    R2 = (B)
    r:  A   B        Π A (r):  A        Π B (r):  B
        α   1                  α                  1
        α   2                  β                  2
        β   1
• Π A (r) ⋈ Π B (r):
    A   B
    α   1
    α   2
    β   1
    β   2
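Whether a decomposition loses information on a given instance can be tested by projecting, re-joining, and comparing with the original. The Python sketch below does this for R = (A, B) split into R1 = (A) and R2 = (B). Because R1 and R2 share no attributes, their natural join is simply the Cartesian product. The tuple values are illustrative.

```python
from itertools import product

# An instance r over R = (A, B).
r = {("a", 1), ("a", 2), ("b", 1)}

# Project onto R1 = (A) and R2 = (B).
r1 = {a for (a, _) in r}
r2 = {b for (_, b) in r}

# No common attributes, so the natural join degenerates to a product.
rejoined = set(product(r1, r2))

spurious = rejoined - r  # tuples the join invents
print(sorted(rejoined))
print(sorted(spurious))
```

The re-joined relation contains every original tuple plus the spurious tuple ("b", 2). The "loss" is not a loss of tuples but of the knowledge of which A values were actually paired with which B values.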
Decomposition
A functional decomposition is the process of breaking down
the functions of an organization into progressively greater
(finer and finer) levels of detail.
In decomposition, one function is described in greater detail
by a set of other supporting functions.
The decomposition of a relation schema R consists of
replacing the relation schema by two or more relation
schemas that each contain a subset of the attributes of R and
together include all attributes in R.
Decomposition helps in eliminating some of the problems of
bad design such as redundancy, inconsistencies and
anomalies.
There are two types of decomposition :
Lossy Decomposition
Lossless Join Decomposition
Lossy Decomposition:
"The decomposition of relation R into R1 and R2 is lossy when the
join of R1 and R2 does not yield the same relation as in R."
One of the disadvantages of decomposing a relation into two or more
relational schemas (or tables) is that some information may be lost
when the original relation or table is reconstructed.
Consider a table STUDENT with three attributes:
roll_no, sname, and department.
STUDENT
ROLL NO   S NAME   DEPT