Unit 2
RELATIONAL DATA MODEL
1 Introduction
Relational data model represents the database as a collection of relations. A relation is
nothing but a table of values. Every row in the table represents a collection of related
data values. These rows in the table denote a real-world entity or relationship.
Tuple: It is nothing but a single row of a table, which contains a single record.
Relation schema: A relation schema gives the name of the relation together with its
attributes. If A1, A2, ..., An are attributes, then R = (A1, A2, ..., An) is a relation
schema.
Example: A relation schema student with its attributes is written as follows:
student = (rollNo, name, branch, contactNo).
Relation instance: A relation instance is a finite set of tuples in the RDBMS. A relation
instance never contains duplicate tuples.
Attribute domain: The set of all possible values of an attribute is called the domain of
that attribute.
Degree: The total number of attributes in a relation is called the degree of
the relation.
Note: Domain of an attribute is said to be atomic if all its possible values are atomic
i.e. not divisible.
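The definitions above can be illustrated with a minimal Python sketch, assuming a relation instance is modeled as a set of tuples. The sample rows and the name `cardinality` (the number of tuples) are made up for illustration and do not come from the notes.

```python
# Relation schema: student = (rollNo, name, branch, contactNo)
student_schema = ("rollNo", "name", "branch", "contactNo")

# Relation instance: a finite set of tuples (hypothetical sample data).
student = {
    (1, "Asha", "CSE", "555-0101"),
    (2, "Ravi", "ECE", "555-0102"),
}

# Adding a tuple that is already present changes nothing,
# because a set never holds duplicates.
student.add((1, "Asha", "CSE", "555-0101"))

degree = len(student_schema)  # number of attributes in the schema
cardinality = len(student)    # number of tuples in the instance
print(degree, cardinality)
```

Modeling the instance as a set mirrors the rule that a relation instance never contains duplicate tuples.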
3 Integrity Constraints
• Integrity constraints are a set of rules. They are used to maintain the quality of
information.
• Integrity constraints ensure that data insertion, updating, and other processes
are performed in such a way that data integrity is not affected.
• Thus, integrity constraints are used to guard against accidental damage to the database.
4. Key Constraints
• Domain constraints can be defined as the definition of a valid set of values for an
attribute.
• The data type of domain includes string, character, integer, time, date, currency,
etc. The value of the attribute must be available in the corresponding domain.
• A primary key value cannot be null. This is because the primary key value is used to
identify individual rows in a relation, and if the primary key has a null value, we cannot
identify those rows.
• A table can contain null values in fields other than the primary key.
3.1.3 Referential Integrity Constraints
• A referential integrity constraint is specified between two tables.
• An entity set can have multiple keys, out of which one is chosen as the primary
key. A primary key value must be unique and cannot be null in the relational table.
4 Foreign key
• Consider two tables R and S. An attribute A of table R is said to be a foreign
key of R if A is the primary key of S.
• A foreign key is a column (or combination of columns) in a table whose values must
match values of a column in some other table.
• FOREIGN KEY constraints enforce referential integrity, which essentially says that
if column value A refers to column value B, then column value B must exist.
Example: Consider two tables Employee(ID, Name, Dept-ID) and Department(Dept-ID,
Dept-name).
Here, attribute Dept-ID in the Employee table is a foreign key because Dept-ID is the
primary key of the Department table.
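The Employee/Department example can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the column names use underscores instead of the hyphens in the text, the sample rows are invented, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE Employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_id INTEGER REFERENCES Department(dept_id))"
)
conn.execute("INSERT INTO Department VALUES (10, 'Sales')")
conn.execute("INSERT INTO Employee VALUES (1, 'Ann', 10)")  # accepted: department 10 exists

try:
    # Department 99 does not exist, so referential integrity is violated.
    conn.execute("INSERT INTO Employee VALUES (2, 'Bob', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(rejected, employee_count)
```

The second insert fails because its Dept-ID value has no matching row in Department, which is exactly the situation the FOREIGN KEY constraint is meant to prevent.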
5 Schema Diagram
• A database schema, along with primary key and foreign key dependencies, can be
depicted pictorially by schema diagrams.
• Each relation appears as a box, with the attributes listed inside it and the relation
name above it. If there are primary key attributes, a horizontal line crosses the box,
with the primary key attributes listed above the line. Foreign key dependencies
appear as arrows from the foreign key attributes of the referencing relation to the
primary key of the referenced relation.
Example: Following figure shows the schema diagram for our banking enterprise.
6 Query Languages
A query language is a language in which a user requests information from the database.
Query languages can be categorized as either procedural or non-procedural.
In a procedural language, the user instructs the system to perform a sequence of
operations on the database to compute the desired result. In a non-procedural language,
the user describes the desired information without giving a specific procedure for
obtaining that information.
Of the languages covered here, relational algebra is procedural, whereas tuple relational
calculus and domain relational calculus are non-procedural.
Consider the following banking database. We will write all the queries for this database.
7 Relational Algebra
• The relational algebra is a procedural query language.
• It consists of a set of operations that take one or two relations as input and produce
a new relation as their result.
• The fundamental operations in the relational algebra are select, project, union,
set difference, Cartesian product, and rename. In addition to the fundamental
operations, there are several other operations—namely, set intersection, natural
join, division, and assignment.
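• The select, project, and rename operations are called unary operations because they
operate on one relation. The other three operations (union, set difference, and Cartesian
product) operate on pairs of relations and are, therefore, called binary operations.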
Example: Select those tuples of the loan relation where the branch is "Perryridge".
Solution: σ_{branch-name = "Perryridge"}(loan)
Example: Find all tuples in which the amount lent is more than $1200.
Solution: σ_{amount > 1200}(loan)
Note: In general, we allow comparisons using =, ≠, <, ≤, >, ≥ in the selection
predicate. Furthermore, we can combine several predicates into a larger predicate by using
the connectives and (∧), or (∨), and not (¬).
Example: Find those tuples pertaining to loans of more than $1200 made by the
"Perryridge" branch.
Solution: σ_{branch-name = "Perryridge" ∧ amount > 1200}(loan)
Example: List all loan numbers and the amount of the loan.
Solution: Π_{loan-number, amount}(loan)
Example: Find those customers who live in Harrison.
Solution: Π_{customer-name}(σ_{customer-city = "Harrison"}(customer))
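The select and project examples above can be mimicked in a few lines of Python. This is only a sketch: relations are assumed to be lists of dictionaries, the helper names `select` and `project` are hypothetical, and the loan rows are invented sample data.

```python
def select(relation, predicate):
    """σ_predicate(relation): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Π_attributes(relation): keep the listed attributes and drop duplicates."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:  # a relation never holds duplicate tuples
            result.append(reduced)
    return result


# Hypothetical sample data for the loan relation.
loan = [
    {"loan-number": "L-11", "branch-name": "Round Hill", "amount": 900},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-16", "branch-name": "Perryridge", "amount": 1300},
    {"loan-number": "L-17", "branch-name": "Downtown", "amount": 1000},
]

# σ_{branch-name = "Perryridge" ∧ amount > 1200}(loan)
big_perryridge = select(
    loan, lambda t: t["branch-name"] == "Perryridge" and t["amount"] > 1200
)
# Π_{loan-number}(...) applied to the selection result
loan_numbers = project(big_perryridge, ["loan-number"])
# Π_{branch-name}(loan): the duplicate "Perryridge" value is removed
branches = project(loan, ["branch-name"])
print(loan_numbers, branches)
```

Nesting `project(select(...))` corresponds to composing Π around σ, as in the Harrison example.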
2. The domains of the ith attribute of r and the ith attribute of s must be the same,
for all i.
Example: Find the names of all bank customers who have either an account or a loan
or both.
Solution: Π_{customer-name}(depositor) ∪ Π_{customer-name}(borrower)
Figure 1: Banking database
Example: Find all customers of the bank who have an account but not a loan.
Solution: Π_{customer-name}(depositor) − Π_{customer-name}(borrower)
As with the union operation, we must ensure that set differences are taken between
compatible relations. Therefore, for a set difference operation r − s to be valid, we require
that the relations r and s be of the same arity, and that the domains of the ith attribute
of r and the ith attribute of s be the same.
Figure 2: Result of Π_{account.balance}(σ_{account.balance < d.balance}(account × ρ_d(account)))
7.2 Additional Operations
7.2.1 Intersection Operation
Example: Find all customers who have both a loan and an account.
Solution: Π_{customer-name}(depositor) ∩ Π_{customer-name}(borrower)
Note: r ∩ s = r − (r − s)
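Union, set difference, and intersection map directly onto Python's set operators. The sketch below uses invented customer names and checks the identity r ∩ s = r − (r − s) from the note.

```python
# Hypothetical results of Π_{customer-name}(depositor) and Π_{customer-name}(borrower).
depositor_names = {"Hayes", "Johnson", "Jones", "Smith"}
borrower_names = {"Hayes", "Jones", "Williams"}

either = depositor_names | borrower_names        # union: account or loan or both
only_account = depositor_names - borrower_names  # set difference: account but no loan
both = depositor_names & borrower_names          # intersection: account and loan

# Intersection rewritten with set difference only: r ∩ s = r − (r − s)
both_via_difference = depositor_names - (depositor_names - borrower_names)
print(either, only_account, both, both_via_difference)
```

Because intersection can be written with set difference alone, it is listed among the additional operations rather than the fundamental ones.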
7.2.2 Join
Join is a combination of a Cartesian product followed by a selection process. A Join
operation pairs two tuples from different relations, if and only if a given join condition is
satisfied.
7.2.5 Equijoin
When Theta join uses only equality comparison operator, it is said to be equijoin. The
above example corresponds to equijoin.
7.2.6 Natural-Join
Natural join does not use any comparison operator. It does not concatenate the way
a Cartesian product does. We can perform a Natural Join only if there is at least one
common attribute that exists between two relations. In addition, the attributes must
have the same name and domain.
Natural join acts on those matching attributes where the values of attributes in both the
relations are same.
Example: Find the names of all customers who have a loan at the bank, along with
the loan number and the loan amount.
Solution: The query without natural join is
Π_{customer-name, loan.loan-number, amount}(σ_{borrower.loan-number = loan.loan-number}(borrower × loan))
The equivalent query using natural join is
Π_{customer-name, loan-number, amount}(borrower ⋈ loan)
Example: Find the names of all branches with customers who have an account in the
bank and who live in Harrison.
Solution: Π_{branch-name}(σ_{customer-city = "Harrison"}(customer ⋈ depositor ⋈ account))
Note: If there are no common attributes between two relations, then their natural join
is equal to their Cartesian product.
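A natural join can be sketched in Python as a nested loop that keeps only the pairs agreeing on every common attribute. The helper name `natural_join`, the list-of-dictionaries representation, and the sample rows (including the `tags` relation) are assumptions made for this illustration; duplicate elimination is omitted for brevity.

```python
def natural_join(r, s):
    """r ⋈ s for relations given as non-empty lists of dicts with uniform keys."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]  # attributes shared by both schemas
    result = []
    for t in r:
        for u in s:
            # With no common attributes, all([]) is True and every pair is kept,
            # so the join degenerates into the Cartesian product.
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})
    return result


# Hypothetical sample data.
borrower = [
    {"customer-name": "Jones", "loan-number": "L-17"},
    {"customer-name": "Smith", "loan-number": "L-23"},
]
loan = [
    {"loan-number": "L-17", "branch-name": "Downtown", "amount": 1000},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
]
tags = [{"tag": "x"}, {"tag": "y"}]  # shares no attribute with borrower

joined = natural_join(borrower, loan)
product = natural_join(borrower, tags)
print(joined, len(product))
```

The second call illustrates the note: without a common attribute, every tuple of `borrower` pairs with every tuple of `tags`.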
Another way to understand division is as follows. For each x value in (the first column
of) A, consider the set of y values that appear in (the second field of) tuples of A with
that x value. If this set contains (all y values in) B, then the x value is in the result of A/B.
Example: Find all customers who have an account at all the branches located in
Brooklyn.
Solution: In this query, we apply the division operator. For this we have to find the
numerator and the denominator of the query. If the numerator is N and the denominator is D,
then the final query is N ÷ D.
In this query, the denominator is all the branches located in "Brooklyn". The query for this is
D = Π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))
The numerator is all the customers who have an account, together with the branch name. The
query for this is
N = Π_{customer-name, branch-name}(depositor ⋈ account)
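The "for each x, check that its y values contain all of D" reading of division translates almost word for word into Python. This is a sketch with a hypothetical helper `divide` and invented data, where N is a set of (customer-name, branch-name) pairs and D is a set of branch names.

```python
def divide(n, d):
    """N ÷ D, where N is a set of (x, y) pairs and D is a set of y values."""
    xs = {x for (x, _) in n}
    # Keep x only if (x, y) appears in N for every y in D.
    return {x for x in xs if all((x, y) in n for y in d)}


# Hypothetical N = Π_{customer-name, branch-name}(depositor ⋈ account)
N = {
    ("Johnson", "Brighton"),
    ("Johnson", "Downtown"),
    ("Jones", "Brighton"),
    ("Smith", "Downtown"),
}
# Hypothetical D = Π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))
D = {"Brighton", "Downtown"}

result = divide(N, D)
print(result)
```

Only Johnson is paired with every branch in D, so only Johnson appears in the quotient.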
7.3 Example:
Consider the following database, which consists of three tables. Write the following queries
in relational algebra:
4. Find the names of sailors who have reserved at least one boat.
Solution: Π_{sname}(Sailors ⋈ Reserves)
5. Find the names of sailors who have reserved a red or a green boat.
Solution: temp ← Π_{sname, color}(Sailors ⋈ Reserves ⋈ Boats)
Π_{sname}(σ_{color = "red"}(temp)) ∪ Π_{sname}(σ_{color = "green"}(temp))
6. Find the names of sailors who have reserved a red and a green boat.
Solution: temp ← Π_{sname, color}(Sailors ⋈ Reserves ⋈ Boats)
Π_{sname}(σ_{color = "red"}(temp)) ∩ Π_{sname}(σ_{color = "green"}(temp))
7. Find the names of sailors who have reserved at least two boats.
Solution: temp ← Π_{sid, sname, bid}(Sailors ⋈ Reserves)
Π_{temp.sname}(σ_{temp.sid = r.sid ∧ temp.bid ≠ r.bid}(temp × ρ_r(temp)))
8. Find the sids of sailors with age over 20 who have not reserved a red boat.
Solution: Π_{sid}(σ_{age > 20}(Sailors)) − Π_{sid}(σ_{color = "red"}(Boats) ⋈ Reserves ⋈ Sailors)
9. Find the names of sailors who have reserved all boats.
Solution: N ← Π_{sname, bid}(Sailors ⋈ Reserves)
D ← Π_{bid}(Boats)
Therefore, the final query is N ÷ D
10. Find the names of sailors who have reserved all boats called Interlake.
Solution: N ← Π_{sname, bid}(Sailors ⋈ Reserves)
D ← Π_{bid}(σ_{bname = "Interlake"}(Boats))
Therefore, the final query is N ÷ D
7.4.2 Aggregate Functions
Aggregate functions take a collection of values and return a single value as a result. For
example, the aggregate function sum takes a collection of values and returns the sum of
the values. Following aggregate functions are used.
1. sum
2. avg
3. count
4. min
5. max
The symbol 𝒢 is the letter G in calligraphic font; read it as "calligraphic G." The
relational-algebra operation 𝒢 signifies that aggregation is to be applied, and its subscript
specifies the aggregate operation to be applied.
The left outer join (⟕) takes all tuples in the left relation that did not match with any
tuple in the right relation, pads the tuples with null values for all other attributes from
the right relation, and adds them to the result of the natural join.
The right outer join (⟖) is symmetric with the left outer join: It pads tuples from the
right relation that did not match any from the left relation with nulls and adds them to
the result of the natural join.
The full outer join (⟗) does both of those operations, padding tuples from the left
relation that did not match any from the right relation, as well as tuples from the right
relation that did not match any from the left relation, and adding them to the result of
the join.
Example: Consider the following two relations, Employee and FT-works.
The natural join and the left outer join are as follows:
The right outer join and the full outer join are as follows:
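The difference between a natural join and a left outer join can be observed with `sqlite3`. The tables below are a hypothetical stand-in for the Employee and FT-works relations (the original tables are not reproduced in these notes), with invented rows and underscore-style names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, city TEXT)")
conn.execute("CREATE TABLE ft_works (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Coyote", "Hollywood"), ("Smith", "Seattle")],
)
conn.executemany(
    "INSERT INTO ft_works VALUES (?, ?)",
    [("Coyote", 1500), ("Gates", 5300)],
)

# Inner join on the common attribute: only matching tuples survive.
inner = conn.execute(
    "SELECT e.name, e.city, w.salary FROM employee e "
    "JOIN ft_works w ON e.name = w.name ORDER BY e.name"
).fetchall()

# Left outer join: unmatched employee tuples are kept and padded with NULL.
left = conn.execute(
    "SELECT e.name, e.city, w.salary FROM employee e "
    "LEFT OUTER JOIN ft_works w ON e.name = w.name ORDER BY e.name"
).fetchall()
print(inner)
print(left)
```

Smith has no FT-works tuple, so Smith is missing from the inner join but appears in the left outer join with a null (Python `None`) salary. A right outer join can be obtained the same way by swapping the two tables.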
7.5.2 Insertion
To insert data into a relation, we either specify a tuple to be inserted or write a query
whose result is a set of tuples to be inserted. Obviously, the attribute values for inserted
tuples must be members of the attribute’s domain. Similarly, tuples inserted must be of
the correct arity. The relational algebra expresses an insertion by
r ← r ∪E
where r is a relation and E is a relational-algebra expression.
Example: Suppose that we wish to insert the fact that Smith has $1200 in account
A-973 at the Perryridge branch.
Solution:
depositor ← depositor ∪ {("Smith", "A-973")}
account ← account ∪ {("A-973", "Perryridge", 1200)}
7.5.3 Updating
To update the values in particular rows of a relation, we write a query of the following form:
r ← Π_{F1, F2, ..., Fn}(r)
where each Fi is either the ith attribute of r, if the ith attribute is not updated, or, if the
attribute is to be updated, Fi is an expression, involving only constants and the attributes
of r, that gives the new value for the attribute.
If we want to select some tuples from r and to update only them, we can use the following
expression; here, P denotes the selection condition that chooses which tuples to update:
r ← Π_{F1, F2, ..., Fn}(σ_P(r)) ∪ (r − σ_P(r))
Example: Suppose that interest payments are being made, and that all balances
are to be increased by 5 percent.
Solution: account ← Π_{account-number, branch-name, balance ∗ 1.05}(account)
Example: Suppose that accounts with balances over $10,000 receive 6 percent
interest, whereas all others receive 5 percent.
Solution: account ← Π_{account-number, branch-name, balance ∗ 1.06}(σ_{balance > 10000}(account)) ∪
Π_{account-number, branch-name, balance ∗ 1.05}(σ_{balance ≤ 10000}(account))
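The two-tier interest update can be sketched in Python as two selections over the original relation, each followed by a generalized projection. The function name `pay_interest` and the account rows are hypothetical.

```python
# Hypothetical account relation: (account-number, branch-name, balance).
account = [("A-101", "Downtown", 500), ("A-215", "Mianus", 12000)]


def pay_interest(rows):
    """Return the updated relation; both selections read the ORIGINAL balances."""
    # Π_{..., balance * 1.06}(σ_{balance > 10000}(account))
    high = [(num, br, bal * 1.06) for (num, br, bal) in rows if bal > 10000]
    # Π_{..., balance * 1.05}(σ_{balance ≤ 10000}(account))
    rest = [(num, br, bal * 1.05) for (num, br, bal) in rows if bal <= 10000]
    return high + rest  # union of the two disjoint parts


new_account = pay_interest(account)
balances = {num: bal for (num, _, bal) in new_account}
print(balances)
```

Because both selections are evaluated on the old relation, no account can receive interest twice, which is a risk when the same change is written as two sequential update statements.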
7.6 Views
Any relation that is not part of the logical model, but is made visible to a user as a
virtual relation, is called a view.
We define a view by using the create view statement. To define a view, we must give the
view a name, and must state the query that computes the view. The form of the create
view statement is
create view v as <query expression>
where <query expression> is any legal relational-algebra query expression. The view name
is represented by v.
Example: Consider the view consisting of branches and their customers. We wish
this view to be called all-customer. We define this view as follows:
create view all-customer as Π_{branch-name, customer-name}(depositor ⋈ account)
∪ Π_{branch-name, customer-name}(borrower ⋈ loan)
Once we have defined a view, we can use the view name to refer to the virtual relation
that the view generates. Using the view all-customer, we can find all customers of
the Perryridge branch by writing
Π_{customer-name}(σ_{branch-name = "Perryridge"}(all-customer))
8.1 Example Queries
• Find the branch-name, loan-number, and amount for loans of over $1200.
Solution: { t | t ∈ loan ∧ t[amount] > 1200 }
• Find the loan number for each loan of an amount greater than $1200.
Solution: { t | ∃ s ∈ loan(t[loan − number] = s[loan − number] ∧ s[amount] >
1200) }
• Find the names of all customers who have a loan from the Perryridge branch.
Solution: { t | ∃ s ∈ borrower (t[customer-name] = s[customer-name] ∧
∃ u ∈ loan (u[loan-number] = s[loan-number] ∧ u[branch-name] = "Perryridge")) }
• Find all customers who have a loan, an account, or both at the bank.
Solution: { t | ∃s ∈ borrower(t[customer − name] = s[customer − name])
∨ ∃u ∈ depositor(t[customer − name] = u[customer − name])}
• Find those customers who have both an account and a loan at the bank.
Solution: { t | ∃s ∈ borrower(t[customer − name] = s[customer − name])
∧ ∃u ∈ depositor(t[customer − name] = u[customer − name])}
• Find all customers who have an account at the bank but do not have a loan from
the bank.
Solution: { t | ∃ s ∈ depositor (t[customer-name] = s[customer-name])
∧ ¬∃ u ∈ borrower (t[customer-name] = u[customer-name]) }
• Find all customers who have an account at all branches located in Brooklyn.
Solution: { t | ∀ u ∈ branch (u[branch-city] = "Brooklyn" ⇒
∃ s ∈ depositor (t[customer-name] = s[customer-name] ∧
∃ w ∈ account (w[account-number] = s[account-number] ∧ w[branch-name] = u[branch-name]))) }
Note:
1. The formula P ⇒ Q means “P implies Q”; that is, “if P is true, then Q must be
true.”
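The quantifiers in the Brooklyn query have direct Python counterparts: ∀ becomes `all`, ∃ becomes `any`, and P ⇒ Q becomes `(not P) or Q`. The sketch below uses invented data and hypothetical helper names. It also restricts the candidate customers to names that occur in depositor, an assumption made here so that the result stays finite.

```python
# Hypothetical sample relations.
branch = [
    {"branch-name": "Brighton", "branch-city": "Brooklyn"},
    {"branch-name": "Downtown", "branch-city": "Brooklyn"},
    {"branch-name": "Perryridge", "branch-city": "Horseneck"},
]
account = [
    {"account-number": "A-1", "branch-name": "Brighton"},
    {"account-number": "A-2", "branch-name": "Downtown"},
    {"account-number": "A-3", "branch-name": "Brighton"},
    {"account-number": "A-4", "branch-name": "Perryridge"},
]
depositor = [
    {"customer-name": "Johnson", "account-number": "A-1"},
    {"customer-name": "Johnson", "account-number": "A-2"},
    {"customer-name": "Jones", "account-number": "A-3"},
    {"customer-name": "Jones", "account-number": "A-4"},
]


def implies(p, q):
    """P ⇒ Q is false only when P is true and Q is false."""
    return (not p) or q


def has_account_at(customer, branch_name):
    """∃ s ∈ depositor, ∃ w ∈ account linking the customer to the branch."""
    return any(
        s["customer-name"] == customer
        and w["account-number"] == s["account-number"]
        and w["branch-name"] == branch_name
        for s in depositor
        for w in account
    )


candidates = {s["customer-name"] for s in depositor}
result = {
    c
    for c in candidates
    if all(
        implies(u["branch-city"] == "Brooklyn", has_account_at(c, u["branch-name"]))
        for u in branch
    )
}
print(result)
```

Branches outside Brooklyn make the antecedent false, so the implication holds for them automatically and they do not affect the answer.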
9.1 Example Queries
• Find the branch-name, loan-number, and amount for loans of over $1200.
Solution: { < l, b, a > | < l, b, a >∈ loan ∧ a > 1200 }
• Find the loan number for each loan of an amount greater than $1200.
Solution: { < l > | ∃ b, a (< l, b, a >∈ loan ∧ a > 1200) }
• Find the names of all customers who have a loan from the Perryridge branch and
find the loan amount.
Solution: { < c, a > | ∃ l(< c, l >∈ borrower ∧ ∃b (< l, b, a >∈ loan ∧ b =
”Perryridge”))}
• Find all customers who have a loan, an account, or both at the Perryridge branch.
Solution: {< c > |∃l(< c, l >∈ borrower ∧ ∃b, a(< l, b, a >∈ loan ∧ b =
”Perryridge”)) ∨ ∃a(< c, a >∈ depositor ∧ ∃b, n(< a, b, n >∈ account ∧ b =
”Perryridge”))}
• Find all customers who have an account at all branches located in Brooklyn.
Solution: {< c > | ∀x, y, z(< x, y, z >∈ branch ∧ y = ”Brooklyn” ⇒
∃a, b(< a, x, b >∈ account∧ < c, a >∈ depositor))}
10 Exercise
1. Consider the following relational database, where the primary keys are underlined.
employee (person-name, street, city)
works (person-name, company-name, salary)
company (company-name, city)
manages (person-name, manager-name)
Give an expression in the relational algebra to express each of the following queries:
(a) Find the names of all employees who work for First Bank Corporation.
(b) Find the names and cities of residence of all employees who work for First
Bank Corporation.
(c) Find the names, street address, and cities of residence of all employees who
work for First Bank Corporation and earn more than $10,000 per annum.
(d) Find the names of all employees in this database who live in the same city as
the company for which they work.
(e) Find the names of all employees who live in the same city and on the same
street as do their managers.
(f) Find the names of all employees in this database who do not work for First
Bank Corporation.
(g) Find the names of all employees who earn more than every employee of Small
Bank Corporation.
(h) Assume the companies may be located in several cities. Find all companies
located in every city in which Small Bank Corporation is located.
(i) Find the company with the most employees.
(j) Find the company with the smallest payroll.
(k) Find those companies whose employees earn a higher salary, on average, than
the average salary at First Bank Corporation.
Solution:
(a) employee ← Π_{person-name, street, "Newtown"}(σ_{person-name = "Jones"}(employee)) ∪
(employee − σ_{person-name = "Jones"}(employee))
(b) works ← Π_{person-name, company-name, salary ∗ 1.1}(σ_{company-name = "First Bank Corporation"}(works)) ∪
(works − σ_{company-name = "First Bank Corporation"}(works))
(c) temp ← Π_{works.person-name, company-name, salary}(σ_{works.person-name = manages.manager-name}(works ×
manages))
works ← (works − temp) ∪ Π_{works.person-name, company-name, salary ∗ 1.1}(temp)
(d) temp1 ← Π_{works.person-name, company-name, salary}(σ_{works.person-name = manages.manager-name}(works ×
manages))
temp2 ← Π_{works.person-name, company-name, salary ∗ 1.03}(σ_{salary ∗ 1.1 > 100000}(temp1))
temp2 ← temp2 ∪ Π_{works.person-name, company-name, salary ∗ 1.1}(σ_{salary ∗ 1.1 ≤ 100000}(temp1))
works ← (works − temp1) ∪ temp2
(e) works ← works − σ_{company-name = "Small Bank Corporation"}(works)
(a) Π_A(r)
(b) σ_{B = 17}(r)
(c) r × s
(d) Π_{A, F}(σ_{C = D}(r × s))
Solution:
4. Let R = (A, B, C), and let r1 and r2 both be relations on schema R. Give an
expression in the domain relational calculus that is equivalent to each of the following:
(a) Π_A(r1)
(b) σ_{B = 17}(r1)
(c) r1 ∪ r2
(d) r1 ∩ r2
(e) r1 − r2
(f) Π_{A, B}(r1) ⋈ Π_{B, C}(r2)
Solution:
(d) { <a, b, c> | <a, b, c> ∈ r1 ∧ <a, b, c> ∈ r2 }
(e) { <a, b, c> | <a, b, c> ∈ r1 ∧ <a, b, c> ∉ r2 }
(f) { <a, b, c> | ∃ p, q (<a, b, p> ∈ r1 ∧ <q, b, c> ∈ r2) }
5. Let R = (A, B) and S = (A, C), and let r(R) and s(S) be relations. Write
relational-algebra expressions equivalent to the following domain-relational-calculus
expressions:
Solution:
(a) Π_A(σ_{B = 17}(r))
(b) r ⋈ s
(c) Π_A(r) ∪ (s ÷ Π_C(s))
(d) Π_{r.A}(σ_{r.B > r1.B}((r ⋈ s) × ρ_{r1}(r)))
6. Given two relations R1 and R2, where R1 contains N1 tuples,R2 contains N2 tuples,
and N2 > N1 > 0, give the minimum and maximum possible sizes (in tuples) for the
result relation produced by each of the following relational algebra expressions. In
each case, state any assumptions about the schemas for R1 and R2 that are needed
to make the expression meaningful:
(a) R1 ∪ R2
(b) R1 ∩ R2
(c) R1 − R2
(d) R1 × R2
(e) σ_{a = 5}(R1)
(f) Π_a(R1)
(g) R1 ÷ R2
Solution:
(e) Assume attribute a in R1 is a primary key. In this case
Minimum number of tuples = 0
Maximum number of tuples = 1
The key fields are underlined, and the domain of each field is listed after the field
name. Thus sid is the key for Suppliers, pid is the key for Parts, and sid and pid
together form the key for Catalog. The Catalog relation lists the prices charged for
parts by Suppliers. Write the following queries in relational algebra, tuple relational
calculus, and domain relational calculus:
(a) Find the names of suppliers who supply some red part.
(b) Find the sids of suppliers who supply some red or green part.
(c) Find the sids of suppliers who supply some red part or are at 221 Packer Street.
(d) Find the sids of suppliers who supply some red part and some green part.
(e) Find the sids of suppliers who supply every part.
(f) Find the sids of suppliers who supply every red part.
(g) Find the sids of suppliers who supply every red or green part.
(h) Find the sids of suppliers who supply every red part or supply every green
part.
(i) Find pairs of sids such that the supplier with the first sid charges more for
some part than the supplier with the second sid.
(j) Find the pids of parts that are supplied by at least two different suppliers.
(k) Find the pids of the most expensive parts supplied by suppliers named Yosemite
Sham.
Solution:
(a) Relational algebra query is
Π_{sname}(Suppliers ⋈ Catalog ⋈ Π_{pid}(σ_{color = "red"}(Parts)))
Tuple relational calculus query is
{ t | ∀ s ∈ Parts (∃ u ∈ Catalog (t[sid] = u[sid] ∧ s[pid] = u[pid])) }
8. Consider the Supplier-Parts-Catalog schema from the previous question. State what
the following queries compute:
(a) Find the Supplier names of the suppliers who supply a red part that costs less
than 100 dollars.
(b) This Relational Algebra statement does not return anything because of the
sequence of projection operators. Once the sid is projected, it is the only field
in the set. Therefore, projecting on sname will not return anything.
(c) Find the Supplier names of the suppliers who supply a red part that costs less
than 100 dollars and a green part that costs less than 100 dollars.
(d) Find the Supplier ids of the suppliers who supply a red part that costs less
than 100 dollars and a green part that costs less than 100 dollars.
(e) Find the Supplier names of the suppliers who supply a red part that costs less
than 100 dollars and a green part that costs less than 100 dollars.
Note that the Employees relation describes pilots and other kinds of employees
as well; every pilot is certified for some aircraft (otherwise, he or she would not
qualify as a pilot), and only pilots are certified to fly.
Write the following queries in relational algebra, tuple relational calculus, and
domain relational calculus. Note that some of these queries may not be expressible
in relational algebra (and, therefore, also not expressible in tuple and domain
relational calculus)! For such queries, informally explain why they cannot be expressed.
(a) Find the eids of pilots certified for some Boeing aircraft.
(b) Find the names of pilots certified for some Boeing aircraft.
(c) Find the aids of all aircraft that can be used on non-stop flights from Bonn to
Madras.
(d) Identify the flights that can be piloted by every pilot whose salary is more
than $100,000. (Hint: The pilot must be certified for at least one plane with
a sufficiently large cruising range.)
(e) Find the names of pilots who can operate some plane with a range greater
than 3,000 miles but are not certified on any Boeing aircraft.
(f) Find the eids of employees who make the highest salary.
(g) Find the eids of employees who make the second highest salary.
(h) Find the eids of pilots who are certified for the largest number of aircraft.
(i) Find the eids of employees who are certified for exactly three aircraft.
(j) Find the total amount paid to employees as salaries.
Solution:
Solution: An unsafe query is a query in relational calculus that has an infinite
number of results. An example of such a query is:
{ S | ¬(S ∈ Sailors) }
The query is for all things that are not sailors which of course is everything else.
Clearly there is an infinite number of answers, and this query is unsafe. It is
important to disallow unsafe queries because we want to be able to get back to
users with a list of all the answers to a query after a finite amount of time.
STRUCTURED QUERY LANGUAGE (SQL)
1 Basic Structure
The basic structure of an SQL expression consists of three clauses: select, from, and
where.
• The select clause corresponds to the projection operation of the relational algebra.
It is used to list the attributes desired in the result of a query.
• The from clause corresponds to the Cartesian-product operation of the relational
algebra. It lists the relations to be scanned in the evaluation of the expression.
• The where clause corresponds to the selection predicate of the relational algebra. It
consists of a predicate involving attributes of the relations that appear in the from
clause.
create table r (A1 D1, A2 D2, ..., An Dn, <integrity-constraint1>, ..., <integrity-constraintk>)
where r is the name of the relation, each Ai is the name of an attribute in
the schema of relation r, and Di is the domain type of values in the domain of attribute
Ai. The allowed integrity constraints include
• primary key (Aj1 , Aj2 , ..., Ajm ): The primary key specification says that attributes
Aj1 , Aj2 , ..., Ajm form the primary key for the relation. The primary key attributes
are required to be non-null and unique; that is, no tuple can have a null value for
a primary key attribute, and no two tuples in the relation can be equal on all the
primary-key attributes. Although the primary key specification is optional, it is
generally a good idea to specify a primary key for each relation.
• check(P): The check clause specifies a predicate P that must be satisfied by every
tuple in the relation.
Example: Consider the following definition of tables:
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
Account-schema = (account-number, branch-name, balance)
Depositor-schema = (customer-name, account-number)
• Find all loan numbers for loans made at the Perryridge branch with loan amounts
greater than $1200.
Solution:
select loan-number
from loan
where branch-name = ’Perryridge’ and amount > 1200
• Find the loan number of those loans with loan amounts between $90,000 and
$100,000.
Solution:
select loan-number
from loan
where amount <= 100000 and amount >= 90000
• For all customers who have a loan from the bank, find their names, loan numbers,
and loan amount.
Solution:
select customer-name, borrower.loan-number, amount
from borrower, loan
where borrower.loan-number = loan.loan-number
• Find the customer names, loan numbers, and loan amounts for all loans at the
Perryridge branch.
Solution:
select customer-name, borrower.loan-number, amount
from borrower, loan
where borrower.loan-number = loan.loan-number and branch-name = ’Perryridge’
Example:
• For all customers who have a loan from the bank, find their names, loan numbers,
and loan amount
Solution:
select customer-name, T.loan-number, S.amount
from borrower as T, loan as S
where T.loan-number = S.loan-number
• Find the names of all branches that have assets greater than at least one branch
located in Brooklyn.
Solution:
select distinct T.branch-name
from branch as T, branch asS
where T.assets > S.assets and S.branch-city = ’Brooklyn’
• '%idge%' matches any string containing "idge" as a substring, for example,
'Perryridge', 'Rock Ridge', 'Mianus Bridge', and 'Ridgeway'.
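The '%idge%' pattern can be tried against a small table with `sqlite3`. The table contents are invented, the column name uses an underscore instead of the hyphen in the text, and SQLite's `like` is case-insensitive for ASCII letters by default, which is an implementation detail of SQLite rather than something stated in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_name TEXT)")
conn.executemany(
    "INSERT INTO branch VALUES (?)",
    [("Perryridge",), ("Rock Ridge",), ("Downtown",), ("Ridgeway",)],
)

# % matches any substring, so '%idge%' finds "idge" anywhere in the name.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT branch_name FROM branch "
        "WHERE branch_name LIKE '%idge%' ORDER BY branch_name"
    )
]
print(matches)
```

'Downtown' is the only name without the substring "idge", so it is the only row filtered out.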
1.5 Ordering the Display of Tuples
To display the result in the sorted order, we use the order by clause.
Example: To list in alphabetic order all customers who have a loan at the Perryridge
branch
Solution:
select distinct customer-name
from borrower, loan
where borrower.loan-number = loan.loan-number and
branch-name = ’Perryridge’
order by customer-name
Example:
select *
from loan
order by amount desc, loan-number asc
• Find all customers who have both a loan and an account at the bank.
Solution:
(select customer-name
from depositor)
intersect
(select customer-name
from borrower)
• Find all customers who have an account but no loan at the bank.
Solution:
(select customer-name
from depositor)
except
(select customer-name
from borrower)
1.7 Aggregate Functions
Aggregate functions are functions that take a collection (a set or multiset) of values as
input and return a single value. SQL offers five built-in aggregate functions:
1. Average: avg
2. Minimum: min
3. Maximum: max
4. Total: sum
5. Count: count
Example:
• Find the branches where the average account balance is more than $1200.
Solution:
select branch-name, avg (balance)
from account
group by branch-name
having avg (balance) > 1200
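The group by / having query above can be run with `sqlite3` as a quick check. This sketch assumes underscore-style column names (hyphenated identifiers such as branch-name are not valid unquoted SQL names) and uses invented balances.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [
        ("A-101", "Downtown", 500),
        ("A-102", "Perryridge", 400),
        ("A-201", "Perryridge", 2600),
        ("A-215", "Mianus", 700),
    ],
)

# group by forms one group per branch; having filters whole groups,
# whereas where would filter individual tuples before grouping.
rows = conn.execute(
    "SELECT branch_name, AVG(balance) FROM account "
    "GROUP BY branch_name HAVING AVG(balance) > 1200"
).fetchall()
print(rows)
```

Only the Perryridge group has an average above 1200, even though one of its individual accounts is below that amount.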
• Find the average balance for each customer who lives in Harrison and has at least
three accounts.
Solution:
select depositor.customer-name, avg (balance)
from depositor, account, customer
where depositor.account-number = account.account-number and
depositor.customer-name = customer.customer-name and customer-city = 'Harrison'
group by depositor.customer-name
having count (distinct depositor.account-number) >= 3
• Find all customers who have both an account and a loan at the Perryridge branch.
Solution:
select distinct customer-name
from borrower, loan
where borrower.loan-number = loan.loan-number and
branch-name = ’Perryridge’ and (branch-name, customer-name)
in (select branch-name, customer-name
from depositor, account
where depositor.account-number = account.account-number)
• Find all customers who do have a loan at the bank, but do not have an account at
the bank.
Solution:
select distinct customer-name
from borrower
where customer-name not in (select customer-name from depositor)
• Find the names of all branches that have assets greater than those of at least one
branch located in Brooklyn.
Solution:
select branch-name
from branch
where assets > some (select assets
from branch
where branch-city = ’Brooklyn’)
• Find the names of all branches that have an asset value greater than that of each
branch in Brooklyn.
Solution:
select branch-name
from branch
where assets > all (select assets
from branch
where branch-city = ’Brooklyn’)
• Find those branches for which the average balance is greater than or equal to all
average balances.
Solution:
select branch-name
from account
group by branch-name
having avg (balance) >= all (select avg (balance)
from account
group by branch-name)
Example: Find all customers who have both an account and a loan at the bank.
Solution:
select customer-name
from borrower
where exists (select *
from depositor
where depositor.customer-name = borrower.customer-name)
We can test for the nonexistence of tuples in a subquery by using the not exists
construct. We can use the not exists construct to simulate the set containment (that is,
superset) operation: We can write "relation A contains relation B" as "not exists (B
except A)."
Example: Find all customers who have an account at all the branches located in
Brooklyn. Solution:
select distinct S.customer-name
from depositor as S
where not exists ((select branch-name
from branch
where branch-city = ’Brooklyn’)
except
(select R.branch-name
from depositor as T, account as R
where T.account-number = R.account-number and
S.customer-name = T.customer-name))
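The "not exists (B except A)" pattern can be exercised with `sqlite3`. The sketch below makes several assumptions: underscore-style column names, invented rows, and no parentheses around the two operands of `EXCEPT`, because SQLite does not accept the parenthesized form used in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE branch (branch_name TEXT, branch_city TEXT);
    CREATE TABLE account (account_number TEXT, branch_name TEXT);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    INSERT INTO branch VALUES
        ('Brighton', 'Brooklyn'), ('Downtown', 'Brooklyn'), ('Perryridge', 'Horseneck');
    INSERT INTO account VALUES
        ('A-1', 'Brighton'), ('A-2', 'Downtown'), ('A-3', 'Brighton');
    INSERT INTO depositor VALUES
        ('Johnson', 'A-1'), ('Johnson', 'A-2'), ('Jones', 'A-3');
    """
)

# B = all Brooklyn branches; A = branches where customer S has an account.
# "A contains B" holds exactly when (B except A) is empty.
query = """
SELECT DISTINCT S.customer_name
FROM depositor AS S
WHERE NOT EXISTS (
    SELECT branch_name FROM branch WHERE branch_city = 'Brooklyn'
    EXCEPT
    SELECT R.branch_name
    FROM depositor AS T, account AS R
    WHERE T.account_number = R.account_number
      AND S.customer_name = T.customer_name
)
"""
customers = [row[0] for row in conn.execute(query)]
print(customers)
```

Jones has an account only at Brighton, so Downtown survives the `EXCEPT` for Jones and the `NOT EXISTS` test fails. Johnson covers both Brooklyn branches and is returned.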
1.10 Test for the Absence of Duplicate Tuples
SQL includes a feature for testing whether a subquery has any duplicate tuples in its
result. The unique construct returns the value true if the argument subquery contains
no duplicate tuples.
Example: Find all customers who have at most one account at the Perryridge branch.
Solution:
select T.customer-name
from depositor as T
where unique (select R.customer-name
from account, depositor as R
where T.customer-name = R.customer-name and
R.account-number = account.account-number and
account.branch-name = 'Perryridge')
We can test for the existence of duplicate tuples in a subquery by using the not unique
construct.
Example: Find all customers who have at least two accounts at the Perryridge branch.
Solution:
select distinct T.customer-name
from depositor T
where not unique (select R.customer-name
from account, depositor as R
where T.customer-name = R.customer-name and
R.account-number = account.account-number and
account.branch-name = 'Perryridge')
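Many systems do not implement unique. One possible workaround, sketched here with Python's sqlite3 module, is to count the rows that the subquery would return: a count of at most 1 plays the role of unique, and a count of at least 2 plays the role of not unique. The sample rows and the underscore attribute names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account   (account_number TEXT, branch_name TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    -- Hypothetical sample rows.
    INSERT INTO account VALUES ('A-1', 'Perryridge', 400), ('A-2', 'Perryridge', 900),
                               ('A-3', 'Perryridge', 700), ('A-4', 'Downtown', 500);
    INSERT INTO depositor VALUES ('Hayes', 'A-1'), ('Hayes', 'A-2'),
                                 ('Jones', 'A-3'), ('Smith', 'A-4');
""")

def customers(comparison):
    """Customers whose number of Perryridge accounts satisfies the comparison."""
    sql = f"""
        SELECT DISTINCT T.customer_name
        FROM depositor AS T
        WHERE (SELECT COUNT(*)
               FROM account, depositor AS R
               WHERE T.customer_name = R.customer_name
                 AND R.account_number = account.account_number
                 AND account.branch_name = 'Perryridge') {comparison}
        ORDER BY T.customer_name
    """
    return [row[0] for row in conn.execute(sql)]

at_most_one = customers("<= 1")    # stands in for "unique"
at_least_two = customers(">= 2")   # stands in for "not unique"

print(at_most_one)
print(at_least_two)
```

Smith has no Perryridge account and still satisfies the at-most-one test, which matches unique returning true on an empty subquery.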
• Find the maximum across all branches of the total balance at each branch.
Solution:
select max(tot-balance)
from (select branch-name, sum(balance)
from account
group by branch-name) as branch-total (branch-name, tot-balance)
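A small sketch of the same derived-relation query, run from Python on an in-memory SQLite database. SQLite does not accept the "as branch-total (branch-name, tot-balance)" column-renaming form, so this version names the column inside the subquery instead. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A-1", "Perryridge", 400), ("A-2", "Perryridge", 900),
     ("A-3", "Downtown", 500), ("A-4", "Brighton", 750),
     ("A-5", "Downtown", 600)],
)

# The subquery in FROM computes one total per branch;
# the outer query takes the maximum of those totals.
max_total = conn.execute("""
    SELECT MAX(tot_balance)
    FROM (SELECT branch_name, SUM(balance) AS tot_balance
          FROM account
          GROUP BY branch_name) AS branch_total
""").fetchone()[0]

print(max_total)
```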
1.12 Example
Consider the following database schema and its corresponding instance:-
1. Find the names of sailors who have reserved boat number 103.
Solution:
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid AND R.bid=103
Solution:
SELECT B.color
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND S.sname = 'Lubber'
6. Find the names of sailors who have reserved at least one boat.
Solution:
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid
7. Compute increments for the ratings of persons who have sailed two different boats
on the same day.
Solution:
SELECT S.sname, S.rating+1 AS rating
FROM Sailors S, Reserves R1, Reserves R2
WHERE S.sid = R1.sid AND S.sid = R2.sid
AND R1.day = R2.day AND R1.bid <> R2.bid
8. Find the ages of sailors whose name begins and ends with B and has at least three
characters.
Solution:
SELECT S.age
FROM Sailors S
WHERE S.sname LIKE 'B_%B'
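In a LIKE pattern, _ matches exactly one character and % matches zero or more characters, so 'B_%B' requires a B, at least one further character, and a final B. The sketch below checks this on a few hypothetical sailor names using Python's sqlite3 module. Note that SQLite's LIKE ignores case for ASCII letters by default, so "Bob" qualifies here; other systems may compare case-sensitively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [(1, "Bob", 7, 63.5), (2, "BB", 3, 25.5), (3, "Barb", 8, 33.0),
     (4, "Brutus", 1, 33.0), (5, "Horatio", 9, 35.0)],
)

# "BB" fails because the underscore needs a third character;
# "Brutus" and "Horatio" fail because they do not end in B.
matches = [row[0] for row in conn.execute(
    "SELECT S.sname FROM Sailors S WHERE S.sname LIKE 'B_%B' ORDER BY S.sname"
)]

print(matches)
```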
9. Find the names of sailors who have reserved a red or a green boat. Solution:
SELECT S.sname
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid
AND (B.color = 'red' OR B.color = 'green')
10. Find the names of sailors who have reserved both a red and a green boat. Solution:
SELECT S.sname
FROM Sailors S, Reserves R1, Boats B1, Reserves R2, Boats B2
WHERE S.sid = R1.sid AND R1.bid = B1.bid
AND S.sid = R2.sid AND R2.bid = B2.bid
AND B1.color = 'red' AND B2.color = 'green'
FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = 'green'
11. Find the sids of all sailors who have reserved red boats but not green boats.
Solution:
SELECT S.sid
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
EXCEPT
SELECT S2.sid
FROM Sailors S2, Reserves R2, Boats B2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = 'green'
12. Find the names of sailors who have not reserved a red boat.
Solution:
SELECT S.sname
FROM Sailors S
WHERE S.sid NOT IN ( SELECT R.sid
FROM Reserves R
WHERE R.bid IN ( SELECT B.bid
FROM Boats B
WHERE B.color = 'red' ))
13. Find sailors whose rating is better than some sailor called Horatio.
Solution:
SELECT S.sid
FROM Sailors S
WHERE S.rating > ANY ( SELECT S2.rating
FROM Sailors S2
WHERE S2.sname = 'Horatio' )
15. Find the names of sailors who have reserved all boats.
Solution:
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS (( SELECT B.bid
FROM Boats B )
EXCEPT
(SELECT R.bid
FROM Reserves R
WHERE R.sid = S.sid ))
An alternative solution that does not use EXCEPT:
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS ( SELECT B.bid
FROM Boats B
WHERE NOT EXISTS ( SELECT R.bid
FROM Reserves R
WHERE R.bid = B.bid
AND R.sid = S.sid ))
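The two formulations of "reserved all boats" can be compared on a tiny instance. The following is a sketch using Python's sqlite3 module; the sample rows are hypothetical, and the EXCEPT version is written without parentheses around its operands because SQLite does not accept them there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors  (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Boats    (bid INTEGER PRIMARY KEY, color TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
    -- Hypothetical sample rows: only Dustin has reserved every boat.
    INSERT INTO Sailors VALUES (22, 'Dustin'), (31, 'Lubber'), (58, 'Rusty');
    INSERT INTO Boats VALUES (101, 'blue'), (102, 'red'), (103, 'green');
    INSERT INTO Reserves VALUES
        (22, 101, '1998-10-10'), (22, 102, '1998-10-10'), (22, 103, '1998-10-08'),
        (31, 102, '1998-11-10'), (31, 103, '1998-11-06');
""")

# "All boats" minus "boats this sailor reserved" must be empty.
with_except = [r[0] for r in conn.execute("""
    SELECT S.sname
    FROM Sailors S
    WHERE NOT EXISTS (SELECT B.bid FROM Boats B
                      EXCEPT
                      SELECT R.bid FROM Reserves R WHERE R.sid = S.sid)
""")]

# There is no boat for which a reservation by this sailor is missing.
double_not_exists = [r[0] for r in conn.execute("""
    SELECT S.sname
    FROM Sailors S
    WHERE NOT EXISTS (SELECT B.bid
                      FROM Boats B
                      WHERE NOT EXISTS (SELECT R.bid
                                        FROM Reserves R
                                        WHERE R.bid = B.bid AND R.sid = S.sid))
""")]

print(with_except, double_not_exists)
```

Rusty has no reservations at all, so for Rusty the difference is the full set of boats and the sailor is correctly excluded by both forms.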
20. Find the names of sailors who are older than the oldest sailor with a rating of 10.
Solution:
SELECT S.sname
FROM Sailors S
WHERE S.age > ( SELECT MAX ( S2.age )
FROM Sailors S2
WHERE S2.rating = 10 )
21. Find the age of the youngest sailor for each rating level.
Solution:
SELECT S.rating, MIN (S.age)
FROM Sailors S
GROUP BY S.rating
22. Find the age of the youngest sailor who is eligible to vote (i.e., is at least 18 years
old) for each rating level with at least two such sailors.
Solution:
SELECT S.rating, MIN (S.age) AS minage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT (*) > 1
23. For each red boat, find the number of reservations for this boat.
Solution:
SELECT B.bid, COUNT (*) AS sailorcount
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid
24. Find the average age of sailors for each rating level that has at least two sailors.
Solution:
SELECT S.rating, AVG (S.age) AS average
FROM Sailors S
GROUP BY S.rating
HAVING COUNT (*) > 1
25. Find the average age of sailors who are of voting age (i.e., at least 18 years old) for
each rating level that has at least two sailors.
Solution:
SELECT S.rating, AVG ( S.age ) AS average
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING 1 < ( SELECT COUNT (*)
FROM Sailors S2
WHERE S.rating = S2.rating )
26. Find the average age of sailors who are of voting age (i.e., at least 18 years old) for
each rating level that has at least two such sailors.
Solution:
SELECT S.rating, AVG ( S.age ) AS average
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING 1 < ( SELECT COUNT (*)
FROM Sailors S2
WHERE S.rating = S2.rating AND S2.age >= 18 )
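Queries 25 and 26 differ only in which sailors the HAVING subquery counts. The sketch below, which assumes Python's sqlite3 module and a hypothetical set of sailors, shows a rating level (10) that has two sailors but only one of voting age, so it appears in the answer to query 25 and not in the answer to query 26.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [(1, "Art", 7, 45.0), (2, "Bob", 7, 35.0), (3, "Zorba", 7, 16.0),
     (4, "Andy", 8, 55.5), (5, "Rusty", 8, 25.5),
     (6, "Horatio", 10, 35.0), (7, "Frodo", 10, 16.0),
     (8, "Lubber", 3, 63.5)],
)

# WHERE removes under-age sailors before grouping;
# HAVING then keeps or drops whole groups.
TEMPLATE = """
    SELECT S.rating, AVG(S.age) AS average
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING 1 < (SELECT COUNT(*) FROM Sailors S2
                WHERE S.rating = S2.rating {extra})
    ORDER BY S.rating
"""

# Query 25: the rating level needs at least two sailors of any age.
any_two = conn.execute(TEMPLATE.format(extra="")).fetchall()
# Query 26: the rating level needs at least two sailors aged 18 or more.
two_voters = conn.execute(TEMPLATE.format(extra="AND S2.age >= 18")).fetchall()

print(any_two)
print(two_voters)
```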
27. Find those ratings for which the average age of sailors is the minimum over all
ratings.
Solution:
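Aggregate operations cannot be nested, and an aggregate cannot appear in a WHERE clause, so the per-rating averages are computed in a derived table:
SELECT Temp.rating
FROM ( SELECT S.rating, AVG (S.age) AS avgage
FROM Sailors S
GROUP BY S.rating ) AS Temp
WHERE Temp.avgage = ( SELECT MIN (Temp2.avgage)
FROM ( SELECT AVG (S2.age) AS avgage
FROM Sailors S2
GROUP BY S2.rating ) AS Temp2 )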
1.13 Cursor
We can declare a cursor on any relation or on any SQL query (because every query returns
a set of rows). Once a cursor is declared, we can open it (which positions the cursor just
before the first row); fetch the next row; move the cursor (to the next row, to the row
after the next n, to the first row, or to the previous row, etc., by specifying additional
parameters for the FETCH command); or close the cursor. Thus, a cursor essentially
allows us to retrieve the rows in a table by positioning the cursor at a particular row and
reading its contents.
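The declare, open, fetch, and close steps can be imitated with the cursor object of Python's sqlite3 module. This is only an analogy to an SQL cursor: the Python cursor moves forward only, so moves to a previous row are not shown. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5),
     (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0)],
)

cur = conn.cursor()                                            # declare the cursor
cur.execute("SELECT sname, rating FROM Sailors ORDER BY sid")  # open it on a query

first = cur.fetchone()       # fetch the next (here: first) row
next_two = cur.fetchmany(2)  # fetch the following two rows
rest = cur.fetchall()        # fetch whatever remains
exhausted = cur.fetchone()   # nothing left: returns None

cur.close()                                                    # close the cursor

print(first, next_two, rest, exhausted)
```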
executed as a side effect of their program.
A condition in a trigger can be a true/false statement (e.g., all employee salaries are less
than $100,000) or a query. A query is interpreted as true if the answer set is nonempty,
and false if the query has no answers. If the condition part evaluates to true, the action
associated with the trigger is executed.
A trigger action can examine the answers to the query in the condition part of the trigger,
refer to old and new values of tuples modified by the statement activating the trigger,
execute new queries, and make changes to the database. In fact, an action can even execute
a series of data-definition commands (e.g., create new tables, change authorizations)
and transaction-oriented commands (e.g., commit), or call host language procedures.
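A minimal sketch of a trigger with a condition and an action, written in SQLite's trigger dialect and run from Python. The tables works and raise_log, the trigger name log_large_raise, and the 50% threshold are all hypothetical; other database systems use somewhat different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE works (employee_name TEXT PRIMARY KEY, salary INTEGER);
    CREATE TABLE raise_log (employee_name TEXT, old_salary INTEGER, new_salary INTEGER);

    -- Activated by an update of salary.
    -- Condition: the WHEN clause, which compares old and new values of the row.
    -- Action: the INSERT between BEGIN and END.
    CREATE TRIGGER log_large_raise
    AFTER UPDATE OF salary ON works
    FOR EACH ROW
    WHEN NEW.salary > OLD.salary * 1.5
    BEGIN
        INSERT INTO raise_log VALUES (OLD.employee_name, OLD.salary, NEW.salary);
    END;

    INSERT INTO works VALUES ('Jones', 1000), ('Smith', 2000);
""")

# An 80% raise: the condition is true, so the action runs.
conn.execute("UPDATE works SET salary = 1800 WHERE employee_name = 'Jones'")
# A 10% raise: the condition is false, so nothing is logged.
conn.execute("UPDATE works SET salary = 2200 WHERE employee_name = 'Smith'")

log = conn.execute("SELECT * FROM raise_log").fetchall()
print(log)
```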
1.15 Exercise
1. Consider the following employee database:-
where the primary keys are underlined. Give an expression in SQL for each of the
following queries.
(a) Find the names of all employees who work for First Bank Corporation.
(b) Find the names and cities of residence of all employees who work for First
Bank Corporation.
(c) Find the names, street addresses, and cities of residence of all employees who
work for First Bank Corporation and earn more than $10,000.
(d) Find all employees in the database who live in the same cities as the companies
for which they work.
(e) Find all employees in the database who live in the same cities and on the same
streets as do their managers.
(f) Find all employees in the database who do not work for First Bank
Corporation.
(g) Find all employees in the database who earn more than each employee of Small
Bank Corporation.
(h) Assume that the companies may be located in several cities. Find all companies
located in every city in which Small Bank Corporation is located.
(i) Find all employees who earn more than the average salary of all employees of
their company.
(j) Find the company that has the most employees.
(k) Find the company that has the smallest payroll.
(l) Find those companies whose employees earn a higher salary, on average, than
the average salary at First Bank Corporation.
Solution:
(c) select *
from employee
where employee-name in
(select employee-name
from works
where company-name = 'First Bank Corporation' and salary > 10000)
(d) select e.employee-name
from employee e, works w, company c
where e.employee-name = w.employee-name and e.city = c.city and
w.company-name = c.company-name
(e) select P.employee-name
from employee P, employee R, manages M
where P.employee-name = M.employee-name and M.manager-name = R.employee-name
and P.street = R.street and P.city = R.city
(f) select employee-name
from works
where company-name <> 'First Bank Corporation'
(g) select employee-name
from works
where salary > all
(select salary
from works
where company-name = 'Small Bank Corporation')
(h) select S.company-name
from company S
where not exists ((select city
from company
where company-name = 'Small Bank Corporation')
except
(select city
from company T
where S.company-name = T.company-name))
(i) select employee-name
from works T
where salary > (select avg (salary)
from works S
where T.company-name = S.company-name)
Solution:
(a) update employee
set city = 'Newton'
where person-name = 'Jones'
and company-name = 'First Bank Corporation'
update works
set salary = salary * 1.1
where employee-name in (select manager-name from manages)
and salary * 1.1 <= 100000
and company-name = 'First Bank Corporation'
where the primary keys are underlined. Construct the following SQL queries for
this relational database.
(a) Find the total number of people who owned cars that were involved in accidents
in 1989.
(b) Find the number of accidents in which the cars belonging to “John Smith”
were involved.
(c) Add a new accident to the database; assume any values for required attributes.
(d) Delete the Mazda belonging to “John Smith”.
(e) Update the damage amount for the car with license number “AABB2000” in
the accident with report number “AR2197” to $3000.
Solution:
where exists
(select *
from participated, person
where participated.driver-id = person.driver-id
and person.name = 'John Smith'
and accident.report-number = participated.report-number)
(c) We assume the driver was “Jones,” although it could be someone else. Also,
we assume “Jones” owns one Toyota. First we must find the license of the
given car. Then the participated and accident relations must be updated in
order to both record the accident and tie it to the given car. We assume values
“Berkeley” for location, '2001-09-01' for date, 4007 for report-number,
and 3000 for damage amount.
Student(snum: integer, sname: string, major: string, level: string, age: integer)
Class( cname: string, meets_at: time, room: string, fid: integer)
Enrolled( snum: integer, cname: string)
Faculty( fid: integer, fname: string, deptid: integer)
(a) Find the names of all Juniors (Level = JR) who are enrolled in a class taught
by I. Teach.
(b) Find the age of the oldest student who is either a History major or is enrolled
in a course taught by I. Teach.
(c) Find the names of all classes that either meet in room R128 or have five or
more students enrolled.
(d) Find the names of all students who are enrolled in two classes that meet at
the same time.
(e) Find the names of faculty members who teach in every room in which some
class is taught.
(f) Find the names of faculty members for whom the combined enrollment of the
courses that they teach is less than five.
(g) Print the Level and the average age of students for that Level, for each Level.
(h) Print the Level and the average age of students for that Level, for all Levels
except JR.
(i) Find the names of students who are enrolled in the maximum number of
classes.
(j) Find the names of students who are not enrolled in any class.
(k) For each age value that appears in Students, find the level value that appears
most often.
For example, if there are more FR level students aged 18 than SR, JR, or SO
students aged 18, you should print the pair (18, FR).
Solution:
(d) SELECT DISTINCT S.sname
FROM Student S
WHERE S.snum IN (SELECT E1.snum
FROM Enrolled E1, Enrolled E2, Class C1, Class C2
WHERE E1.snum = E2.snum AND E1.cname <> E2.cname
AND E1.cname = C1.cname
AND E2.cname = C2.cname AND C1.meets_at = C2.meets_at)
GROUP BY E2.snum ))
The Catalog relation lists the prices charged for parts by Suppliers. Write the
following queries in SQL:
(a) Find the pnames of parts for which there is some supplier.
(b) Find the snames of suppliers who supply every part.
(c) Find the snames of suppliers who supply every red part.
(d) Find the pnames of parts supplied by Acme Widget Suppliers and by no one
else.
(e) Find the sids of suppliers who charge more for some part than the average cost
of that part (averaged over all the suppliers who supply that part).
(f) For each part, find the sname of the supplier who charges the most for that
part.
(g) Find the sids of suppliers who supply only red parts.
(h) Find the sids of suppliers who supply a red part and a green part.
(i) Find the sids of suppliers who supply a red part or a green part.
Solution:
(a) SELECT pname
FROM Parts, Catalog
WHERE Parts.pid = Catalog.pid
(b) SELECT sname
FROM Suppliers
WHERE NOT EXISTS (( SELECT pid
FROM Parts )
EXCEPT
( SELECT pid
FROM Catalog
WHERE Suppliers.sid = Catalog.sid ))
(c) SELECT sname
FROM Suppliers
WHERE NOT EXISTS (( SELECT pid
FROM Parts
WHERE color = 'red' )
EXCEPT
( SELECT pid
FROM Catalog
WHERE Suppliers.sid = Catalog.sid ))
(d) SELECT pname
FROM Parts, Catalog, Suppliers
WHERE Parts.pid = Catalog.pid AND Catalog.sid = Suppliers.sid AND
sname = 'Acme Widget Suppliers' AND
Parts.pid NOT IN ( SELECT Catalog.pid
FROM Catalog, Suppliers
WHERE Catalog.sid = Suppliers.sid AND
sname <> 'Acme Widget Suppliers' )
FROM Parts
WHERE color = 'red'))
(h) (SELECT sid
FROM Catalog, Parts
WHERE Catalog.pid = Parts.pid AND color = 'red')
INTERSECT
(SELECT sid
FROM Catalog, Parts
WHERE Catalog.pid = Parts.pid AND color = 'green')
(i) (SELECT sid
FROM Catalog, Parts
WHERE Catalog.pid = Parts.pid AND color = 'red')
UNION
(SELECT sid
FROM Catalog, Parts
WHERE Catalog.pid = Parts.pid AND color = 'green')
6. The following relations keep track of airline flight information:
Flights(flno: integer, from: string, to: string, distance: integer, departs: time,
arrives: time, price: integer)
Aircraft( aid: integer, aname: string, cruisingrange: integer)
Certified( eid: integer, aid: integer)
Employees( eid: integer, ename: string, salary: integer)
Note that the Employees relation describes pilots and other kinds of employees
as well; every pilot is certified for some aircraft, and only pilots are certified to fly.
Write each of the following queries in SQL.
(a) Find the names of aircraft such that all pilots certified to operate them earn
more than 80,000.
(b) For each pilot who is certified for more than three aircraft, find the eid and
the maximum cruisingrange of the aircraft that he (or she) is certified for.
(c) Find the names of pilots whose salary is less than the price of the cheapest
route from Los Angeles to Honolulu.
(d) For all aircraft with cruisingrange over 1,000 miles, find the name of the aircraft
and the average salary of all pilots certified for this aircraft.
(e) Find the names of pilots certified for some Boeing aircraft.
(f) Find the aids of all aircraft that can be used on routes from Los Angeles to
Chicago.
(g) Identify the flights that can be piloted by every pilot who makes more than $100,000.
(Hint: The pilot must be certified for at least one plane with a sufficiently large
cruisingrange.)
(h) Print the enames of pilots who can operate planes with cruisingrange greater
than 3,000 miles, but are not certified on any Boeing aircraft.
(i) A customer wants to travel from Madison to New York with no more than
two changes of flight. List the choice of departure times from Madison if the
customer wants to arrive in New York by 6 p.m.
(j) Compute the difference between the average salary of a pilot and the average
salary of all employees (including pilots).
(k) Print the name and salary of every nonpilot whose salary is more than the
average salary for pilots.
Solution:
(a) SELECT DISTINCT A.aname
FROM Aircraft A
WHERE NOT EXISTS ( SELECT *
FROM Certified C, Employees E
WHERE C.aid = A.aid AND C.eid = E.eid
AND E.salary <= 80000 )
(g) SELECT DISTINCT F.from, F.to
FROM Flights F
WHERE NOT EXISTS ( SELECT *
FROM Employees E
WHERE E.salary > 100000 AND
NOT EXISTS (SELECT *
FROM Aircraft A, Certified C
WHERE A.cruisingrange > F.distance
AND E.eid = C.eid AND A.aid = C.aid))
(j) SELECT Temp1.avg - Temp2.avg
FROM (SELECT AVG (E.salary) AS avg
FROM Employees E
WHERE E.eid IN (SELECT DISTINCT C.eid
FROM Certified C )) AS Temp1,
(SELECT AVG (E1.salary) AS avg
FROM Employees E1 ) AS Temp2
7. Consider the following relational schema. An employee can work in more than one
department; the pct_time field of the Works relation shows the percentage of time
that a given employee works in a given department.
(a) Print the names and ages of each employee who works in both the Hardware
department and the Software department.
(b) For each department with more than 20 full-time-equivalent employees (i.e.,
where the part-time and full-time employees add up to at least that many full-
time employees), print the did together with the number of employees that
work in that department.
(c) Print the name of each employee whose salary exceeds the budget of all of the
departments that he or she works in.
(d) Find the managerids of managers who manage only departments with budgets
greater than $1,000,000.
(e) Find the enames of managers who manage the departments with the largest
budget.
(f) If a manager manages more than one department, he or she controls the sum
of all the budgets for those departments. Find the managerids of managers
who control more than $5,000,000.
(g) Find the managerids of managers who control the largest amount.