Database Management System: by Hemant Tulsani
Lecture 07
By Hemant Tulsani
Assistant Professor
ECE Department
Till now we have covered…
Unit 1
• Basic concepts in DBMS
• ER Model and Diagrams
• Reduction of ER models to Relational Schemas
In this lecture...
• Structure of Relational Databases
• Fundamental Relational-Algebra-Operations
• Additional Relational-Algebra-Operations
• Extended Relational-Algebra-Operations
• Null Values
• Modification of the Database
Example of a Relation
Basic Structure
• Formally, given sets D1, D2, …, Dn, a relation r is a subset of
D1 x D2 x … x Dn
Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai ∈ Di
• Example: If
– customer_name = {Jones, Smith, Curry, Lindsay, …} /* Set of all customer names */
– customer_street = {Main, North, Park, …} /* set of all street names*/
– customer_city = {Harrison, Rye, Pittsfield, …} /* set of all city names */
Then r = { (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield) }
is a relation over
customer_name x customer_street x customer_city
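The definition above can be checked directly on small finite domains. The following is a minimal sketch, assuming the three domains are cut down to exactly the values listed on the slide (the real domains are larger, as the "…" indicates).

```python
from itertools import product

# Finite stand-ins for the three domains (assumption: only the listed values).
customer_name = {"Jones", "Smith", "Curry", "Lindsay"}
customer_street = {"Main", "North", "Park"}
customer_city = {"Harrison", "Rye", "Pittsfield"}

# The Cartesian product D1 x D2 x D3: every possible 3-tuple.
all_tuples = set(product(customer_name, customer_street, customer_city))

# The relation r from the slide, stored as a set of 3-tuples.
r = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
}

print(len(all_tuples))  # 4 * 3 * 3 = 36
print(r <= all_tuples)  # True: r is a subset of the product
```

Storing r as a Python set mirrors the fact that a relation has no duplicate tuples and no tuple order.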
Attribute Types
• Each attribute of a relation has a name
• The set of allowed values for each attribute is called the domain of the attribute
• Attribute values are (normally) required to be atomic; that is, indivisible
– E.g. the value of an attribute can be an account number,
but cannot be a set of account numbers
• Domain is said to be atomic if all its members are atomic
• The special value null is a member of every domain
• The null value causes complications in the definition of many operations
Relation Schema
• A1, A2, …, An are attributes
• Example: the customer relation; its attributes (or columns) are the headings and its tuples (or rows) are the lines below them
customer_name   customer_street   customer_city
Jones           Main              Harrison
Smith           North             Rye
Curry           North             Rye
Lindsay         Park              Pittsfield
Relations are Unordered
Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
Example: account relation with unordered tuples
Relational Database
• A database consists of multiple relations
• Information about an enterprise is broken up into parts, with each relation storing one part of the
information
account : stores information about accounts
depositor : stores information about which customer
owns which account
customer : stores information about customers
• Storing all information as a single relation such as bank (account_number, balance,
customer_name, ..) results in
– repetition of information, e.g., if two customers own an account (What gets repeated?)
– the need for null values, e.g., to represent a customer without an account
• Normalization theory deals with how to design relational schemas
The customer Relation
The depositor Relation
Keys
• Let K ⊆ R
• K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible
relation r(R)
– by “possible r ” we mean a relation r that could exist in the enterprise we are modeling.
– Example: {customer_name, customer_street } and
{customer_name}
are both superkeys of Customer, if no two customers can possibly have the same name
• In real life, an attribute such as customer_id would be used instead of customer_name
to uniquely identify customers
Keys (Cont.)
• K is a candidate key if K is a minimal superkey
Example: {customer_name} is a candidate key for Customer, since it is a superkey and no
proper subset of it is a superkey.
• Primary key: a candidate key chosen as the principal means of identifying tuples
within a relation
– Should choose an attribute whose value never, or very rarely, changes.
– E.g. email address is unique, but may change
• An alternate key is a key associated with one or more columns whose values
uniquely identify every row in the table, but which is not the primary key.
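The superkey and candidate-key definitions can be illustrated on the customer instance shown earlier. This is only a sketch: a superkey must identify tuples in every possible relation r(R), so a test on one instance can refute a key but never prove it. The helper names `is_superkey` and `is_candidate_key` are hypothetical.

```python
from itertools import combinations

customer = [
    {"customer_name": "Jones", "customer_street": "Main", "customer_city": "Harrison"},
    {"customer_name": "Smith", "customer_street": "North", "customer_city": "Rye"},
    {"customer_name": "Curry", "customer_street": "North", "customer_city": "Rye"},
    {"customer_name": "Lindsay", "customer_street": "Park", "customer_city": "Pittsfield"},
]

def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, attrs):
    """True if attrs is a superkey and no proper subset is (for this instance)."""
    if not is_superkey(rows, attrs):
        return False
    for size in range(len(attrs)):  # sizes 0 .. len(attrs) - 1: proper subsets only
        for subset in combinations(attrs, size):
            if is_superkey(rows, subset):
                return False
    return True

print(is_superkey(customer, ["customer_name", "customer_street"]))       # True
print(is_candidate_key(customer, ["customer_name", "customer_street"]))  # False
print(is_candidate_key(customer, ["customer_name"]))                     # True
```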
Foreign Keys
• A relation schema may have an attribute that corresponds to the primary key of
another relation. The attribute is called a foreign key.
– Only values occurring in the primary key attribute of the referenced
relation may occur in the foreign key attribute of the referencing relation.
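The foreign-key constraint can be sketched as a check that every referencing value appears among the referenced primary-key values. The account numbers, balances and the helper name `foreign_key_violations` are hypothetical, chosen only for illustration.

```python
# Hypothetical instances: depositor.account_number references account.account_number.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-215", "balance": 700},
]
depositor = [
    {"customer_name": "Jones", "account_number": "A-101"},
    {"customer_name": "Smith", "account_number": "A-215"},
    {"customer_name": "Curry", "account_number": "A-999"},  # no such account
]

def foreign_key_violations(referencing, fk, referenced, pk):
    """Return the referencing rows whose fk value is absent from referenced[pk]."""
    valid = {row[pk] for row in referenced}
    return [row for row in referencing if row[fk] not in valid]

violations = foreign_key_violations(
    depositor, "account_number", account, "account_number"
)
print(violations)  # the Curry row, which points at a missing account
```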
Schema diagram
Query Languages
• Language in which user requests information from the database.
• Categories of languages
– Procedural
– Non-procedural, or declarative
• “Pure” languages:
– Relational algebra
– Tuple relational calculus
– Domain relational calculus
• Pure languages form underlying basis of query languages that people use.
Relational Algebra
• Procedural language
• Six basic operators
– select: σ
– project: Π
– union: ∪
– set difference: –
– Cartesian product: x
– rename: ρ
• The operators take one or two relations as inputs and produce a new relation as a
result.
Select Operation
• Notation: σ p (r)
• p is called the selection predicate
• Defined as:
σ p (r) = {t | t ∈ r and p(t)}
• Example of selection:
σ branch_name=“Perryridge” (account)
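Selection keeps exactly the tuples that satisfy the predicate. A minimal sketch follows, assuming a relation is stored as a set of named tuples; the account numbers, the other branch names and the balances are hypothetical sample data.

```python
from collections import namedtuple

Account = namedtuple("Account", ["account_number", "branch_name", "balance"])

# Hypothetical instance of the account relation.
account = {
    Account("A-101", "Downtown", 500),
    Account("A-102", "Perryridge", 400),
    Account("A-201", "Brighton", 900),
    Account("A-215", "Perryridge", 700),
}

def select(r, p):
    """sigma_p(r): the set of tuples t in r for which p(t) holds."""
    return {t for t in r if p(t)}

perryridge = select(account, lambda t: t.branch_name == "Perryridge")
print(sorted(perryridge))  # the two Perryridge accounts, A-102 and A-215
```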
Select Operation – Example
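Relation r has attributes A, B, C, D (only the C and D values survive here):
C    D
1    7
5    7
12   3
23   10
Selection result (C and D values of the two selected tuples):
C    D
1    7
23   10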
Project Operation
• Notation:
Π A1, A2, …, Ak (r)
where A1, A2, …, Ak are attribute names and r is a relation name.
• The result is defined as the relation of k columns obtained by erasing the
columns that are not listed
• Duplicate rows removed from result, since relations are sets
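• Example: To eliminate the branch_name attribute of account
Π account_number, balance (account)
• Example with a relation r on attributes A, B, C (only the B and C values survive here):
B    C
10   1
20   1
30   1
40   2
Π A,C (r) first yields four (A, C) pairs with C values 1, 1, 1, 2; removing the one duplicate pair leaves three tuples with C values 1, 1, 2.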
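Projection, including the removal of duplicate rows, can be sketched as follows. The rows of account are hypothetical sample data; returning a Python set gives duplicate elimination for free.

```python
# Hypothetical instance of the account relation.
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-215", "branch_name": "Perryridge", "balance": 700},
]

def project(rows, attrs):
    """Pi_attrs(rows): keep only the listed columns, as a set of tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

without_branch = project(account, ["account_number", "balance"])
branches = project(account, ["branch_name"])

print(len(without_branch))  # 3: account_number is still present, so no duplicates
print(len(branches))        # 2: the two Perryridge rows collapse into one
```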
Union Operation
• Notation: r ∪ s
• Defined as:
r ∪ s = {t | t ∈ r or t ∈ s}
• For r ∪ s to be valid:
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (example: 2nd column
of r deals with the same type of values as does the 2nd
column of s)
• Example: to find all customers with either an account or a loan
Π customer_name (depositor) ∪ Π customer_name (borrower)
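The union query can be sketched with Python sets. The borrower tuples are the ones shown in the borrower relation later in these notes; the depositor tuples are hypothetical. The helper checks only the arity condition; domain compatibility is assumed, not verified.

```python
def union(r, s):
    """r ∪ s for relations stored as sets of equal-length tuples."""
    arities = {len(t) for t in r} | {len(t) for t in s}
    if len(arities) > 1:
        raise ValueError("r and s must have the same arity")
    return r | s

borrower = {("Jones", "L-170"), ("Smith", "L-230"), ("Hayes", "L-155")}
depositor = {("Hayes", "A-102"), ("Johnson", "A-101"), ("Jones", "A-217")}  # hypothetical

# Project both relations onto customer_name, then take the union.
borrower_names = {(name,) for name, _ in borrower}
depositor_names = {(name,) for name, _ in depositor}
customers = union(borrower_names, depositor_names)

print(sorted(customers))  # [('Hayes',), ('Johnson',), ('Jones',), ('Smith',)]
```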
Union Operation – Example
Relations r and s each have attributes A and B (only the B values survive here):
r: B = 1, 2, 1
s: B = 2, 3
r ∪ s: B = 1, 2, 1, 3
Set Difference Operation
• Notation: r – s
• Defined as:
r – s = {t | t ∈ r and t ∉ s}
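Set Difference Operation – Example
Relations r and s each have attributes A and B (only the B values survive here):
r: B = 1, 2, 1
s: B = 2, 3
r – s: B = 1, 1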
Cartesian-Product Operation
• Notation: r x s
• Defined as:
r x s = {t q | t ∈ r and q ∈ s}
• Assume that attributes of r(R) and s(S) are disjoint. (That is, R ∩ S = ∅).
• If attributes of r(R) and s(S) are not disjoint, then renaming must be used.
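The Cartesian product and its disjointness requirement can be sketched with rows stored as dictionaries. The relations r and s and their values are hypothetical; the helper refuses to combine tuples whose attribute names overlap, which is the case where renaming would be needed.

```python
def cartesian_product(r, s):
    """r x s: concatenate every tuple of r with every tuple of s."""
    result = []
    for t in r:
        for q in s:
            if set(t) & set(q):
                raise ValueError("attribute names overlap; rename one relation first")
            result.append({**t, **q})
    return result

# Hypothetical relations r(A, B) and s(C, D).
r = [{"A": "x", "B": 1}, {"A": "y", "B": 2}]
s = [{"C": "x", "D": 10}, {"C": "y", "D": 10}, {"C": "y", "D": 20}]

rs = cartesian_product(r, s)
print(len(rs))  # 2 * 3 = 6 tuples
print(rs[0])    # {'A': 'x', 'B': 1, 'C': 'x', 'D': 10}
```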
Cartesian-Product Operation – Example
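Relation r has attributes A, B and relation s has attributes C, D, E (the A and C values do not survive here):
r: B = 1, 2
s: (D, E) = (10, a), (10, a), (20, b), (10, b)
r x s has attributes A, B, C, D, E and 2 x 4 = 8 tuples; their (B, D, E) values are:
B    D    E
1    10   a
1    10   a
1    20   b
1    10   b
2    10   a
2    10   a
2    20   b
2    10   b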
Composition of Operations
• Can build expressions using multiple operations
• Example: σ A=C (r x s)
• r x s: the eight tuples of the Cartesian-product example
• σ A=C (r x s): three of those tuples satisfy A = C; their (B, D, E) values are:
B    D    E
1    10   a
2    10   a
2    20   b
Rename Operation
• Allows us to name, and therefore to refer to, the results of relational-algebra
expressions.
• Allows us to refer to a relation by more than one name.
• Example:
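ρ x (E)
returns the result of expression E under the name x.
• If a relational-algebra expression E has arity n, then
ρ x(A1, A2, …, An) (E)
returns the result of expression E under the name x, and with the
attributes renamed to A1, A2, …, An.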
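One way to sketch the rename operation is to qualify every attribute with the new relation name, so that two copies of the same relation can be combined without name clashes. The "name.attribute" key convention, the helper `rename` and the sample rows are assumptions made for this illustration only.

```python
def rename(rows, name, attrs=None):
    """rho_name(E), or rho_name(A1..An)(E) when attrs is given.

    Attributes are renamed positionally and prefixed with the relation name.
    """
    renamed = []
    for row in rows:
        old = list(row)  # attribute names in insertion order
        new = attrs if attrs is not None else old
        if len(new) != len(old):
            raise ValueError("need exactly one new name per attribute")
        renamed.append({f"{name}.{a}": row[o] for a, o in zip(new, old)})
    return renamed

# Hypothetical instance of account, reduced to two attributes.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-102", "balance": 400},
]

d = rename(account, "d")
x = rename(account, "x", ["num", "bal"])
print(d[0])  # {'d.account_number': 'A-101', 'd.balance': 500}
print(x[1])  # {'x.num': 'A-102', 'x.bal': 400}
```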
Banking Example
branch (branch_name, branch_city, assets)
Find the loan number for each loan of an amount greater than
$1200
Π loan_number (σ amount > 1200 (loan))
Find the names of all customers who have a loan, an account, or both, from
the bank
Π customer_name (borrower) ∪ Π customer_name (depositor)
Example Queries
• Find the names of all customers who have a loan at the Perryridge branch.
Query 1
Π customer_name (σ loan.loan_number = borrower.loan_number (…
Π balance (account) – Π account.balance (…
Additional Operations
• Set intersection
• Natural join
• Division
• Assignment
Set-Intersection Operation
• Notation: r ∩ s
• Defined as:
• r ∩ s = { t | t ∈ r and t ∈ s }
• Assume:
– r, s have the same arity
– attributes of r and s are compatible
• Note: r ∩ s = r – (r – s)
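The identity in the note can be checked on a small instance with Python's set operators. The tuples below are hypothetical (the letters stand in for arbitrary A values).

```python
# Hypothetical relations r(A, B) and s(A, B), stored as sets of tuples.
r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}

intersection = r & s         # r ∩ s
via_difference = r - (r - s)  # r – (r – s)

print(intersection)                    # {('a', 2)}
print(intersection == via_difference)  # True
```

Because intersection can be written with set difference alone, it adds convenience but no expressive power to the algebra.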
Set-Intersection Operation – Example
• Relations r and s each have attributes A and B (only the B values survive here):
r: B = 1, 2, 1
s: B = 2, 3
• r ∩ s: a single tuple, with B = 2
Natural-Join Operation
Notation: r ⋈ s
• Let r and s be relations on schemas R and S respectively.
Then, r ⋈ s is a relation on schema R ∪ S obtained as follows:
– Consider each pair of tuples tr from r and ts from s.
– If tr and ts have the same value on each of the attributes in R ∩ S, add a tuple t to the result,
where
• t has the same value as tr on r
• t has the same value as ts on s
• Example:
R = (A, B, C, D)
S = (E, B, D)
– Result schema = (A, B, C, D, E)
– r ⋈ s is defined as:
Π r.A, r.B, r.C, r.D, s.E (σ r.B = s.B ∧ r.D = s.D (r x s))
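The pairwise procedure above can be sketched directly. The borrower rows are the ones shown in the borrower relation later in these notes; the loan rows (branch names and amounts) are hypothetical, and `natural_join` is an illustrative helper, not an optimized join algorithm.

```python
def natural_join(r, s):
    """r ⋈ s: merge every pair (tr, ts) that agrees on all common attributes."""
    result = []
    for tr in r:
        for ts in s:
            common = set(tr) & set(ts)  # attribute names in R ∩ S
            if all(tr[a] == ts[a] for a in common):
                merged = {**tr, **ts}
                if merged not in result:  # relations are sets: no duplicates
                    result.append(merged)
    return result

borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
    {"customer_name": "Hayes", "loan_number": "L-155"},
]
loan = [  # hypothetical rows
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]

joined = natural_join(borrower, loan)
for row in joined:
    print(row["customer_name"], row["loan_number"], row["amount"])
```

Hayes (L-155) and loan L-260 have no partner on loan_number, so neither appears in the result. When the two schemas share no attribute, the condition holds trivially and the function degenerates to the Cartesian product.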
Natural Join Operation – Example
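• Relation r has attributes A, B, C, D and relation s has attributes B, D, E (only the B and D values survive here):
r: (B, D) = (1, a), (2, a), (4, b), (1, a), (2, b)
s: (B, D) = (1, a), (3, a), (1, a), (2, b), (3, b)
• r ⋈ s has attributes A, B, C, D, E and five tuples, with (B, D) = (1, a), (1, a), (1, a), (1, a), (2, b)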
Division Operation
• Notation: r ÷ s
• Suited to queries that include the phrase “for all”.
• Let r and s be relations on schemas R and S respectively where
– R = (A1, …, Am , B1, …, Bn )
– S = (B1, …, Bn)
The result of r ÷ s is a relation on schema
R – S = (A1, …, Am)
r ÷ s = { t | t ∈ Π R-S (r) ∧ ∀ u ∈ s ( tu ∈ r ) }
where tu means the concatenation of tuples t and u to produce a
single tuple
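The set-builder definition of division translates almost literally into code. This is a sketch under stated assumptions: the relation r(customer_name, branch_name), its rows and the helper `divide` are hypothetical, and the attributes of s are passed explicitly as `s_attrs`.

```python
def divide(r, s, s_attrs):
    """r ÷ s: the tuples t over R - S such that tu is in r for every u in s."""
    if not r:
        return set()
    rest = [a for a in r[0] if a not in s_attrs]  # attributes of R - S
    pairs = {
        (tuple(row[a] for a in rest), tuple(row[a] for a in s_attrs)) for row in r
    }
    candidates = {t for t, _ in pairs}  # Pi_{R-S}(r)
    wanted = {tuple(row[a] for a in s_attrs) for row in s}
    return {t for t in candidates if all((t, u) in pairs for u in wanted)}

# Hypothetical data: which customer has an account at which branch.
r = [
    {"customer_name": "Jones", "branch_name": "Downtown"},
    {"customer_name": "Jones", "branch_name": "Uptown"},
    {"customer_name": "Smith", "branch_name": "Downtown"},
    {"customer_name": "Hayes", "branch_name": "Uptown"},
    {"customer_name": "Hayes", "branch_name": "Perryridge"},
]
s = [{"branch_name": "Downtown"}, {"branch_name": "Uptown"}]

print(divide(r, s, ["branch_name"]))  # {('Jones',)}
```

Only Jones is paired with every branch in s, which is the "for all" reading of the query.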
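Division Operation – Example
Relation r has attributes A, B and relation s has the single attribute B (the A values do not survive here):
r: B = 1, 2, 3, 1, 1, 1, 3, 4, 6, 1, 2
s: B = 1, 2
r ÷ s has the single attribute A.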
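Another Division Example
Relation r has attributes A, B, C, D, E and relation s has attributes D, E (the A and C values do not survive here):
r: (B, D, E) = (a, a, 1), (a, a, 1), (a, b, 1), (a, a, 1), (a, b, 3), (a, a, 1), (a, b, 1), (a, b, 1)
s: (D, E) = (a, 1), (b, 1)
r ÷ s has attributes A, B, C and two tuples, both with B = a.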
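Division Operation (Cont.)
• Property
– Let q = r ÷ s
– Then q is the largest relation satisfying q x s ⊆ r
• Definition in terms of the basic algebra operation
Let r(R) and s(S) be relations, and let S ⊆ R
r ÷ s = Π R-S (r) – Π R-S ( ( Π R-S (r) x s ) – Π R-S,S (r) )
To see why
– Π R-S,S (r) simply reorders attributes of r
– Π R-S ( ( Π R-S (r) x s ) – Π R-S,S (r) ) gives those tuples t in Π R-S (r) such that for some tuple u ∈ s, tu ∉ r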
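Division can be built from projection, Cartesian product and set difference alone: start from all candidate tuples, then subtract every candidate that is missing at least one required pairing. The sketch below evaluates that construction step by step on hypothetical data, with r stored as a set of (customer, branch) pairs and s as a set of branches.

```python
# Hypothetical data.
r = {
    ("Jones", "Downtown"), ("Jones", "Uptown"),
    ("Smith", "Downtown"),
    ("Hayes", "Uptown"), ("Hayes", "Perryridge"),
}
s = {"Downtown", "Uptown"}

candidates = {a for a, _ in r}                          # Pi_{R-S}(r)
all_pairs = {(a, b) for a in candidates for b in s}     # Pi_{R-S}(r) x s
missing = all_pairs - r                                 # pairs tu that are not in r
disqualified = {a for a, _ in missing}                  # Pi_{R-S} of the missing pairs
quotient = candidates - disqualified                    # r ÷ s

# Direct evaluation of the "for all" definition, for comparison.
direct = {a for a in candidates if all((a, b) in r for b in s)}

print(quotient)                                       # {'Jones'}
print(quotient == direct)                             # True
print({(a, b) for a in quotient for b in s} <= r)     # True: q x s ⊆ r
```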
• Find the name of all customers who have a loan at the bank and the loan
amount
Π customer_name, loan_number, amount (borrower ⋈ loan)
Bank Example Queries
• Find all customers who have an account from at least the “Downtown” and the “Uptown”
branches.
Query 1
Relation borrower
customer_name loan_number
Jones L-170
Smith L-230
Hayes L-155
Outer Join – Example
• Join
loan ⋈ borrower
• Each Fi is either
– the i-th attribute of r, if the i-th attribute is not updated, or,
– if the attribute is to be updated, an expression, involving only
constants and the attributes of r, which gives the new value for the attribute
Update Examples
• Make interest payments by increasing all balances by 5 percent.
account ← Π account_number, branch_name, balance * 1.05 (account)
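The interest update can be sketched as a projection in which each output attribute is computed by an expression over the old tuple, followed by assignment back to account. The helper `generalized_project` and the sample rows are hypothetical; balances use `Decimal` so that the 5 percent increase is exact.

```python
from decimal import Decimal

# Hypothetical instance of the account relation.
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": Decimal("500")},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": Decimal("400")},
]

def generalized_project(rows, exprs):
    """Build one output tuple per input tuple; exprs maps name -> function of the tuple."""
    return [{name: f(t) for name, f in exprs.items()} for t in rows]

# account <- Pi account_number, branch_name, balance * 1.05 (account)
account = generalized_project(
    account,
    {
        "account_number": lambda t: t["account_number"],        # unchanged
        "branch_name": lambda t: t["branch_name"],              # unchanged
        "balance": lambda t: t["balance"] * Decimal("1.05"),    # updated
    },
)

print([str(t["balance"]) for t in account])  # ['525.00', '420.00']
```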