INTRODUCTION TO THE RELATIONAL MODEL
The main construct for representing data in the relational
model is a relation
A relation consists of a relation schema and a relation
instance
The schema specifies the relation's name, the name of each
field (or column, or attribute), and the type of each field
Example:
Students(sid: string, name: string, login: string, age: integer,
gpa: real)
An instance of the Students relation appears in Figure 3.1
The degree, also called arity, of a relation is the number
of fields
The cardinality of a relation instance is the number of
tuples in it
In Figure 3.1, the degree of the relation (the number of
columns) is five, and the cardinality of this instance is six
A relational database is a collection of relations with
distinct relation names
The relational database schema is the collection of
schemas for the relations in the database
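The terms schema, instance, degree, and cardinality can be made concrete with a small Python sketch. This is only an illustration: the names `STUDENTS_SCHEMA` and `students` are assumptions, and the third row is hypothetical rather than copied from Figure 3.1.

```python
# Schema: relation name plus (field name, type) pairs, as in the Students example.
STUDENTS_SCHEMA = ("Students", [("sid", str), ("name", str), ("login", str),
                                ("age", int), ("gpa", float)])

# An instance: a set of tuples conforming to the schema (third row is hypothetical).
students = {
    ("53688", "Smith", "smith@ee", 18, 3.2),
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("50000", "Dave", "dave@cs", 19, 3.3),
}

fields = STUDENTS_SCHEMA[1]
degree = len(fields)          # number of fields, fixed by the schema
cardinality = len(students)   # number of tuples in this particular instance

# Every tuple must have the schema's degree and field types.
conforms = all(
    len(t) == degree and all(isinstance(v, typ) for v, (_, typ) in zip(t, fields))
    for t in students
)
```

The degree is a property of the schema, while the cardinality changes whenever tuples are inserted or deleted.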
Creating and Modifying Relations Using SQL
To create the Students relation, use the following
statement:
CREATE TABLE Students
(
sid NUMBER(5),
name VARCHAR2(10),
login VARCHAR2(20),
age NUMBER(2),
gpa NUMBER(2,1)
)
Tuples are inserted using the INSERT command
We can insert a single tuple into the Students table as
follows:
INSERT INTO Students (sid, name, login, age, gpa)
VALUES (53688, 'Smith', 'smith@ee', 18, 3.2)
We can delete tuples using the DELETE command
We can delete all Students tuples with name equal to Smith
using the command:
DELETE FROM Students
WHERE name = 'Smith'
We can modify the column values in an existing row using
the UPDATE command
For example, we can increment the age and decrement the
gpa of the student with sid 53688:
UPDATE Students
SET age = age + 1, gpa = gpa - 1
WHERE sid = 53688
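The statements above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite is used instead of Oracle, so the column types NUMBER and VARCHAR2 are replaced by INTEGER, TEXT, and REAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE Students (
        sid   INTEGER,
        name  TEXT,
        login TEXT,
        age   INTEGER,
        gpa   REAL
    )
""")

# INSERT: add a single tuple.
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) VALUES (?, ?, ?, ?, ?)",
    (53688, "Smith", "smith@ee", 18, 3.2),
)

# UPDATE: increment age and decrement gpa for the student with sid 53688.
conn.execute("UPDATE Students SET age = age + 1, gpa = gpa - 1 WHERE sid = 53688")
after_update = conn.execute(
    "SELECT age, gpa FROM Students WHERE sid = 53688"
).fetchone()

# DELETE: remove every tuple whose name is Smith.
conn.execute("DELETE FROM Students WHERE name = 'Smith'")
remaining = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
```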
INTEGRITY CONSTRAINTS OVER RELATIONS
An integrity constraint (IC) is a condition that is
specified on a database schema, and restricts the data that
can be stored in an instance of the database
If a database instance satisfies all the integrity constraints
specified on the database schema, it is a legal instance
A DBMS enforces integrity constraints, in that it permits
only legal instances to be stored in the database
Integrity constraints are specified and enforced at different
times:
1. When the DBA or end user defines a database schema, he or
she specifies the ICs that must hold on any instance of this
database
2. When a database application is run, the DBMS checks for
violations and disallows changes to the data that violate
the specified ICs
Key Constraints
Consider the Students relation and the constraint that no
two students have the same sid
This integrity constraint is an example of a key
constraint
A key constraint is a statement that a certain minimal subset
of the fields of a relation is a unique identifier for a tuple
A set of fields that uniquely identifies a tuple according to a
key constraint is called a candidate key for the relation; we
often abbreviate this to just key
In the Students relation, the sid field is a candidate key
The set {sid, name} is an example of a superkey, which is
a set of fields that contains a key
Example: look at Figure 3.1
A relation may have several candidate keys
For example, the login and age fields of the Students
relation may, taken together, also identify students uniquely.
That is, {login, age} is also a key
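It may seem that login alone is a candidate key, since no two
rows in the example instance have the same login value;
however, a key must identify tuples uniquely in every legal
instance of the relation, not just in one example instance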
Out of all the available candidate keys, a database designer
can identify a primary key
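The difference between a key and a superkey can be sketched in Python. Note the limitation: a key constraint is a statement about all legal instances, so a check against one instance can refute a proposed key but never prove it. The function names and the third row are hypothetical.

```python
from itertools import combinations

FIELDS = ("sid", "name", "login", "age", "gpa")

# Hypothetical instance; the third row is an assumption for illustration.
students = [
    (53688, "Smith", "smith@ee", 18, 3.2),
    (53666, "Jones", "jones@cs", 18, 3.4),
    (53650, "Smith", "smith@math", 19, 3.8),
]

def unique_in_instance(rows, fields):
    """True if no two rows agree on all of the given fields."""
    idx = [FIELDS.index(f) for f in fields]
    projected = [tuple(r[i] for i in idx) for r in rows]
    return len(set(projected)) == len(projected)

def minimal_in_instance(rows, fields):
    """True if the fields are unique here and no proper subset is."""
    if not unique_in_instance(rows, fields):
        return False
    return not any(
        unique_in_instance(rows, subset)
        for k in range(1, len(fields))
        for subset in combinations(fields, k)
    )
```

Here `unique_in_instance(students, ("sid", "name"))` holds, but `minimal_in_instance` rejects the same set because sid alone already identifies each row; that is the superkey case.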
Specifying Key Constraints
SQL UNIQUE Constraint
The UNIQUE constraint ensures that no two records in a
database table have the same values in the constrained
column(s)
Note that you can have many UNIQUE constraints per
table, but only one PRIMARY KEY constraint per table
Example:
CREATE TABLE Students
(
sid NUMBER(5),
name VARCHAR2(10),
login VARCHAR2(20),
age NUMBER(2),
gpa NUMBER(2,1),
CONSTRAINT uc_nameage UNIQUE (name,age)
)
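A quick way to see the UNIQUE constraint at work is the following sqlite3 sketch. It assumes SQLite types in place of the Oracle ones, and the second and third rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        sid   INTEGER,
        name  TEXT,
        login TEXT,
        age   INTEGER,
        gpa   REAL,
        CONSTRAINT uc_nameage UNIQUE (name, age)
    )
""")
conn.execute("INSERT INTO Students VALUES (53688, 'Smith', 'smith@ee', 18, 3.2)")

# Same name but a different age: the (name, age) pair is new, so it is accepted.
conn.execute("INSERT INTO Students VALUES (53650, 'Smith', 'smith@math', 19, 3.8)")

# Same (name, age) pair as an existing row: rejected by the DBMS.
try:
    conn.execute("INSERT INTO Students VALUES (53700, 'Smith', 'smith@cs', 18, 3.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
```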
SQL PRIMARY KEY Constraint
The PRIMARY KEY constraint uniquely identifies each
record in a database table; primary key columns must contain
unique values and cannot contain null
Example:
CREATE TABLE Students
(
sid NUMBER(5) PRIMARY KEY,
name VARCHAR2(10),
login VARCHAR2(20),
age NUMBER(2),
gpa NUMBER(2,1)
)
SQL FOREIGN KEY Constraint
A FOREIGN KEY in one table points to a PRIMARY
KEY in another table
If one of the relations is modified, the other must be
checked, and perhaps modified, to keep the data consistent
The sid field of Enrolled is called a foreign key and refers
to Students
The foreign key in the referencing relation (Enrolled, in our
example) must match the primary key of the referenced
relation (Students)
Every sid value that appears in the Enrolled table must
appear in the primary key column of a row in the Students
table
If we try to insert the tuple <55555, Art104, A> into
Enrolled, the IC is violated because there is no tuple in
Students with the id 55555
Similarly, if we delete the tuple <53666, Jones, jones@cs,
18, 3.4> from Students, we violate the foreign key
constraint because the tuple <53666, History105, B> in
Enrolled contains sid value 53666, the sid of the deleted
Students tuple
Example:
CREATE TABLE Enrolled
(
cid VARCHAR2(20),
grade VARCHAR2(2),
sid NUMBER(5),
PRIMARY KEY(sid, cid),
FOREIGN KEY(sid) REFERENCES Students(sid)
)
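The two violations described above (inserting an Enrolled tuple for a missing student, and deleting a referenced Students tuple) can be reproduced with sqlite3. This is a sketch with simplified tables; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, which is a SQLite detail and not part of the Oracle syntax shown in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.execute("CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE Enrolled (
        cid   TEXT,
        grade TEXT,
        sid   INTEGER,
        FOREIGN KEY (sid) REFERENCES Students (sid)
    )
""")
conn.execute("INSERT INTO Students VALUES (53666, 'Jones')")
conn.execute("INSERT INTO Enrolled VALUES ('History105', 'B', 53666)")

def rejected(sql):
    """Run a statement and report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# No Students tuple has sid 55555, so this insert violates the foreign key.
insert_rejected = rejected("INSERT INTO Enrolled VALUES ('Art104', 'A', 55555)")

# Deleting Jones would orphan the History105 enrollment, so it is refused.
delete_rejected = rejected("DELETE FROM Students WHERE sid = 53666")
```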
General Constraints
We may require that student ages be within a certain range
of values
Given such an IC specification, the DBMS will reject
inserts and updates that violate the constraint
The CHECK constraint is used to limit the value range that
can be placed in a column
Example: All students must be at least 16 years old (Fig 3.5)
CREATE TABLE Students
(
sid NUMBER(5) PRIMARY KEY,
name VARCHAR2(10),
login VARCHAR2(20),
age NUMBER(2) CHECK (age >= 16),
gpa NUMBER(2,1)
)
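The effect of a CHECK constraint on inserts and updates can be sketched with sqlite3. The table is trimmed to three columns and the rows are hypothetical; the condition `age >= 16` encodes "at least 16 years old".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        sid  INTEGER PRIMARY KEY,
        name TEXT,
        age  INTEGER CHECK (age >= 16)
    )
""")

def rejected(sql):
    """Run a statement and report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

boundary_accepted = not rejected("INSERT INTO Students VALUES (1, 'Ann', 16)")
insert_rejected = rejected("INSERT INTO Students VALUES (2, 'Bob', 15)")
update_rejected = rejected("UPDATE Students SET age = 12 WHERE sid = 1")
ages = [row[0] for row in conn.execute("SELECT age FROM Students")]
```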
ENFORCING INTEGRITY CONSTRAINTS
ICs are specified when a relation is created and enforced
when a relation is modified
Potential IC violations are generally checked at the end of
each SQL statement execution
Consider the Students relation shown in Figure 3.1
The following insertion violates the primary key constraint
because there is already a tuple with the sid 53688, and it
will be rejected by the DBMS:
INSERT INTO Students (sid, name, login, age, gpa)
VALUES (53688, 'Mike', 'mike@ee', 17, 3.4)
The following insertion violates the constraint that the
primary key cannot contain null:
INSERT INTO Students (sid, name, login, age, gpa)
VALUES (null, 'Mike', 'mike@ee', 17, 3.4)
An update can cause violations, similar to an insertion:
UPDATE Students SET sid = 50000 WHERE sid = 53688
This update violates the primary key constraint because
there is already a tuple with sid 50000
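The duplicate-key insertion and the conflicting update can be reproduced with sqlite3. This sketch uses SQLite types, and the non-key fields of the row with sid 50000 are hypothetical. The null case is left out because SQLite treats a null in an INTEGER PRIMARY KEY column differently from the behaviour described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        sid INTEGER PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL
    )
""")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    (50000, "Dave", "dave@cs", 19, 3.3),
    (53688, "Smith", "smith@ee", 18, 3.2),
])

def rejected(sql):
    """Run a statement and report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_insert_rejected = rejected(
    "INSERT INTO Students VALUES (53688, 'Mike', 'mike@ee', 17, 3.4)"
)
dup_update_rejected = rejected("UPDATE Students SET sid = 50000 WHERE sid = 53688")
sids = sorted(row[0] for row in conn.execute("SELECT sid FROM Students"))
```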
Referential Integrity Enforcement (on Foreign Keys)
Deletions of Enrolled tuples do not violate referential
integrity, but insertions of Enrolled tuples could
The following insertion is illegal because there is no
student with sid 51111:
INSERT INTO Enrolled (cid, grade, sid)
VALUES ('Hindi101', 'B', 51111)
On the other hand, insertions of Students tuples do not
violate referential integrity although deletions could
Further, updates on either Enrolled or Students that change
the sid value could potentially violate referential integrity
CREATE TABLE Enrolled
(
cid VARCHAR2(20),
grade VARCHAR2(2),
sid NUMBER(5),
PRIMARY KEY(sid, cid),
FOREIGN KEY(sid) REFERENCES Students(sid)
ON DELETE CASCADE
ON UPDATE NO ACTION
)
The options are specified as part of the foreign key
declaration
The default option is NO ACTION, which means that the
action (DELETE or UPDATE) is to be rejected
The CASCADE keyword says that if a Students row is
deleted, all Enrolled rows that refer to it are to be deleted as
well
If the UPDATE clause specified CASCADE, and the sid
column of a Students row is updated, this update is also
carried out in each Enrolled row that refers to the updated
Students row
If a Students row is deleted, we can switch the enrollment to
a 'default' student by using ON DELETE SET DEFAULT
The default student is specified as part of the definition of
the sid field in Enrolled; for example, sid NUMBER(5)
DEFAULT 53666
It is really not appropriate to switch enrollments to a default
student
The correct solution in this example is to also delete all
enrollment tuples for the deleted student (that is,
CASCADE), or to reject the update
SQL also allows the use of null as the default value by
specifying ON DELETE SET NULL
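The ON DELETE options can be compared side by side with sqlite3. This is a sketch with simplified tables; the helper `enrolled_after_delete` is a hypothetical name, and `PRAGMA foreign_keys = ON` is a SQLite requirement.

```python
import sqlite3

def enrolled_after_delete(action):
    """Delete student 53666 and return what is left in Enrolled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(f"""
        CREATE TABLE Enrolled (
            cid TEXT, grade TEXT, sid INTEGER,
            FOREIGN KEY (sid) REFERENCES Students (sid) ON DELETE {action}
        )
    """)
    conn.execute("INSERT INTO Students VALUES (53666, 'Jones')")
    conn.execute("INSERT INTO Enrolled VALUES ('History105', 'B', 53666)")
    try:
        conn.execute("DELETE FROM Students WHERE sid = 53666")
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT cid, grade, sid FROM Enrolled").fetchall()

results = {a: enrolled_after_delete(a) for a in ("NO ACTION", "CASCADE", "SET NULL")}
```

With NO ACTION the delete is refused, with CASCADE the enrollment row disappears, and with SET NULL the row survives with a null sid.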
QUERYING RELATIONAL DATA
A relational database query (query, for short) is a
question about the data, and the answer consists of a new
relation containing the result
For example, we might want to find all students younger than
18, or all students enrolled in History105
A query language is a specialized language for writing
queries
SQL is the most popular commercial query language for a
relational DBMS
We can retrieve rows corresponding to students who are
younger than 18 with the following SQL query:
SELECT * FROM Students S WHERE S.age < 18
The symbol * means that we retain all fields of selected
tuples in the result
To understand this query, think of S as a variable that takes
on the value of each tuple in Students, one tuple after the
other
The condition S.age < 18 in the WHERE clause specifies
that we want to select only tuples in which the age field has
a value less than 18
Result is shown in Fig 3.6
In addition to selecting a subset of tuples, a query can
extract a subset of the fields of each selected tuple
We can compute the names and logins of students who are
younger than 18 with the following query:
SELECT S.name, S.login FROM Students S
WHERE S.age < 18
Figure 3.7 shows the result
We can also combine information in the Students and
Enrolled relations
If we want to obtain the names of all students who obtained
grade A and the cid in which they got grade A, we could
write the following query:
SELECT S.name, E.cid FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'A'
Answer is <Smith, Topology112>
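The queries in this section can be run against a small instance with sqlite3. The rows below are hypothetical stand-ins for Figures 3.1 and 3.4, chosen so that the join query returns the answer quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid INTEGER, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)
conn.execute("CREATE TABLE Enrolled (cid TEXT, grade TEXT, sid INTEGER)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    (53666, "Jones", "jones@cs", 18, 3.4),
    (53650, "Smith", "smith@math", 19, 3.8),
    (53831, "Madayan", "madayan@music", 11, 1.8),
])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("History105", "B", 53666),
    ("Topology112", "A", 53650),
])

# Selection plus projection: names and logins of students younger than 18.
young = conn.execute(
    "SELECT S.name, S.login FROM Students S WHERE S.age < 18"
).fetchall()

# Join: names of students with grade A and the course in which they got it.
grade_a = conn.execute(
    "SELECT S.name, E.cid FROM Students S, Enrolled E "
    "WHERE S.sid = E.sid AND E.grade = 'A'"
).fetchall()
```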
LOGICAL DATABASE DESIGN: ER TO RELATIONAL
The ER model is convenient for representing an initial,
high-level database design
There is a standard approach to generating a relational
database schema that closely approximates the ER design
Entity Sets to Tables
Each attribute of the entity set becomes an attribute of the
table
Note that we know both the domain of each attribute and
the (primary) key of an entity set
Consider the Employees entity set with attributes ssn,
name, and lot shown in Figure 3.8
A possible instance of the Employees entity set, containing
three Employees entities, is shown in Figure 3.9 in a tabular
format
The following SQL statement captures the preceding
information, including the domain constraints and key
information:
CREATE TABLE Employees
(
ssn VARCHAR2(11) PRIMARY KEY,
name VARCHAR2(30),
lot NUMBER(2)
)
Relationship Sets (without Constraints) to Tables
A relationship set, like an entity set, is mapped to a relation
in the relational model
To represent a relationship, we must be able to identify
each participating entity and give values to the descriptive
attributes of the relationship
Thus, the attributes of the relation include:
--The primary key attributes of each participating entity set,
as foreign key fields
--The descriptive attributes of the relationship set
Consider the Works_In2 relationship set shown in Figure
3.10
Each department has offices in several locations and we
want to record the locations at which each employee works
All the available information about the Works_In2 table is
captured by the following SQL definition:
CREATE TABLE Works_In2
(
ssn VARCHAR2(11),
did NUMBER(2),
address VARCHAR2(20),
since DATE,
PRIMARY KEY(ssn, did, address),
FOREIGN KEY(ssn) REFERENCES Employees(ssn),
FOREIGN KEY(did) REFERENCES Departments(did),
FOREIGN KEY(address) REFERENCES Locations(address)
)
Consider the Reports_To relationship set shown in Figure
3.11
The role indicators supervisor and subordinate are used to
create meaningful field names in the Reports_To table:
CREATE TABLE Reports_To
(
supervisor_ssn VARCHAR2(11),
subordinate_ssn VARCHAR2(11),
PRIMARY KEY(supervisor_ssn, subordinate_ssn),
FOREIGN KEY(supervisor_ssn) REFERENCES
Employees(ssn),
FOREIGN KEY(subordinate_ssn) REFERENCES
Employees(ssn)
)
Translating Relationship Sets with Key Constraints
If a relationship set involves n entity sets and some m of
them are linked via arrows in the ER diagram, the key for
any one of these m entity sets constitutes a key for the
relation to which the relationship set is mapped
Thus we have m candidate keys, and one of these should be
designated as the primary key
Consider the relationship set Manages shown in Figure 3.12
The table corresponding to Manages has the attributes ssn,
did, since
However, because each department has at most one
manager, no two tuples can have the same did value but
differ on the ssn value
A consequence of this observation is that did is itself a key
for Manages
First Method
CREATE TABLE Manages
(
did NUMBER(2),
ssn VARCHAR2(11),
since DATE,
PRIMARY KEY(did),
FOREIGN KEY(did) REFERENCES Departments(did),
FOREIGN KEY(ssn) REFERENCES Employees(ssn)
)
Second Method
CREATE TABLE Dept_Mgr
(
did NUMBER(2) PRIMARY KEY,
dname VARCHAR2(20),
budget NUMBER(10,2),
since DATE,
ssn VARCHAR2(11),
FOREIGN KEY(ssn) REFERENCES Employees(ssn)
)
The second method stores the manager in the same table as the
department, which eliminates the need for a separate Manages
relation
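The key constraint on Manages (each department has at most one manager) is what the primary key on did enforces. A sqlite3 sketch follows; the foreign keys are omitted for brevity, and the did, ssn, and date values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# First method: did alone is the primary key of Manages.
conn.execute("CREATE TABLE Manages (did INTEGER PRIMARY KEY, ssn TEXT, since TEXT)")
conn.execute("INSERT INTO Manages VALUES (51, '123-22-3666', '1991-01-01')")

# One employee may manage several departments: a different did is accepted.
conn.execute("INSERT INTO Manages VALUES (56, '123-22-3666', '1992-02-02')")

# A second manager for department 51 breaks the key constraint on did.
try:
    conn.execute("INSERT INTO Manages VALUES (51, '231-31-5368', '1993-03-03')")
    second_manager_rejected = False
except sqlite3.IntegrityError:
    second_manager_rejected = True

managers_of_51 = conn.execute("SELECT ssn FROM Manages WHERE did = 51").fetchall()
```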
Translating Relationship Sets with Participation
Constraints
Consider the ER diagram in Figure 3.13, which shows two
relationship sets, Manages and Works_In
Every department is required to have a manager, due to the
participation constraint, and at most one manager, due to the
key constraint
CREATE TABLE Dept_Mgr
(
did NUMBER(2) PRIMARY KEY,
dname VARCHAR2(20),
budget NUMBER(10,2),
since DATE,
ssn VARCHAR2(11) NOT NULL,
FOREIGN KEY(ssn) REFERENCES Employees(ssn) ON
DELETE NO ACTION
)
Translating Weak Entity Sets
A weak entity set always participates in a one-to-many
binary relationship and has a key constraint and total
participation
Consider the Dependents weak entity set shown in Figure
3.14, with partial key pname
CREATE TABLE Dep_Policy
(
pname VARCHAR2(20),
age NUMBER(2),
cost NUMBER(10,2),
ssn VARCHAR2(11),
PRIMARY KEY(ssn,pname),
FOREIGN KEY(ssn) REFERENCES Employees(ssn) ON
DELETE CASCADE
)
Translating Class Hierarchies
We illustrate the two basic approaches to handling ISA
hierarchies by applying them to the ER diagram shown in
Figure 3.15
Approach1:
The Employees relation contains fields ssn, name and lot
with ssn as primary key
The relation for Hourly_Emps includes the hourly_wages
and hours_worked attributes plus the key attributes of the
superclass (ssn, in this example), which serve as the primary
key for Hourly_Emps, as well as a foreign key referencing
the superclass (Employees)
The relation for Contract_Emps is similar to Hourly_Emps
For each Hourly_Emps entity, the values of the name and
lot attributes are stored in the corresponding row of the
superclass (Employees)
Note that if the superclass tuple is deleted, the delete must
be cascaded to Hourly_Emps
Approach2:
Create two relations, corresponding to Hourly_Emps and
Contract_Emps
The relation for Hourly_Emps includes all the attributes of
Hourly_Emps as well as all the attributes of Employees (i.e.,
ssn, name, lot, hourly_wages, hours_worked)
The relation for Contract_Emps includes all the attributes
of Contract_Emps as well as all the attributes of Employees
(i.e., ssn, name, lot, contractid)
Translating ER Diagrams with Aggregation
Consider the ER diagram shown in Figure 3.16
The Employees relation contains fields ssn, name and lot
with ssn as primary key
The Projects relation contains fields pid, started_on and
pbudget with pid as primary key
The Departments relation contains fields did, dname and
budget with did as primary key
The Sponsors relation contains fields did, pid and since with
(pid,did) as primary key
The Monitors relation contains attributes ssn, did, pid and
until with (ssn,pid,did) as primary key
INTRODUCTION TO VIEWS
A view is a table whose rows are not explicitly stored in
the database but are computed as needed from a view
definition
A view contains no data of its own
Consider the Students and Enrolled relations
Suppose we are often interested in finding the names and sids
of students who got a grade of B in some course, together with
the cid of that course
We can define a view for this purpose
CREATE VIEW B_Students(name, sid, course)
AS SELECT S.name, S.sid, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'B'
The view B_Students has three fields called name, sid, and
course with the same domains as the fields name and sid in
Students and cid in Enrolled
If the optional arguments name, sid, and course are omitted
from the CREATE VIEW statement, the column names
name, sid, and cid are inherited
This view can be used just like a base table, or explicitly
stored table, in defining new queries or views
Given the instances of Enrolled and Students shown in
Figure 3.4, B_Students contains the tuples shown in Figure
3.18
Consider the view RegionalSales, defined below, which
computes sales of products by category and state:
CREATE VIEW RegionalSales(category, sales, state)
AS SELECT P.category, S.sales, L.state
FROM Products P, Sales S, Locations L
WHERE P.pid = S.pid AND S.locid =
L.locid
The following query computes the total sales for each
category by state:
SELECT R.category, R.state, SUM(R.sales)
FROM RegionalSales R
GROUP BY R.category, R.state
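A view can be queried like a stored table, which the following sqlite3 sketch illustrates for RegionalSales. The base tables are reduced to the columns the view needs, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products  (pid INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE Locations (locid INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE Sales     (pid INTEGER, locid INTEGER, sales INTEGER);

    INSERT INTO Products  VALUES (1, 'Toys'), (2, 'Toys'), (3, 'Books');
    INSERT INTO Locations VALUES (10, 'WI'), (20, 'CA');
    INSERT INTO Sales     VALUES (1, 10, 5), (2, 10, 7), (1, 20, 3), (3, 20, 4);

    CREATE VIEW RegionalSales (category, sales, state) AS
        SELECT P.category, S.sales, L.state
        FROM Products P, Sales S, Locations L
        WHERE P.pid = S.pid AND S.locid = L.locid;
""")

# Total sales per (category, state), computed from the view on demand.
totals = {
    (category, state): total
    for category, state, total in conn.execute(
        "SELECT R.category, R.state, SUM(R.sales) "
        "FROM RegionalSales R GROUP BY R.category, R.state"
    )
}
```

Because the view stores no rows of its own, any later change to Products, Sales, or Locations is reflected the next time the view is queried.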
Views, Data Independence, Security
The physical schema for a relational database describes
how the relations in the conceptual schema are stored, in
terms of the file organizations and indexes used
The view mechanism provides the support for logical data
independence in the relational model
For example, if the schema of a stored relation is changed,
we can define a view with the old schema, and applications
that expect to see the old schema can now use this view
Views are also valuable in the context of security
We can define views that give a group of users access to
just the information they are allowed to see
For example, we can define a view that allows students to
see other students' names and ages but not their gpa, while
denying them access to the underlying Students table
RELATIONAL ALGEBRA
Relational algebra is a formal query language associated
with the relational model
Queries in algebra are composed using a collection of
operators
We describe the basic operators of the algebra (selection,
projection, union, cross-product, and difference)
Selection and Projection
Relational algebra includes operators to select rows from a
relation (σ) and to project columns (π)
The selection operator σ specifies the tuples to retrieve
through a selection condition
Consider the instance of the Sailors relation shown in
Figure 4.2, denoted as S2
We can retrieve rows corresponding to sailors whose
rating is above 8 by using the σ operator
The expression σ_{rating>8}(S2)
evaluates to the relation shown in Figure 4.4
The projection operator π allows us to extract columns
from a relation
For example, we can find out all sailor names and ratings
by using π
The expression π_{sname,rating}(S2) evaluates to the relation
shown in Figure 4.5
The subscript sname,rating specifies the fields to be
retained in the result
Find out only the ages of sailors
The expression π_{age}(S2) evaluates to the relation shown
in Figure 4.6
Find sailor names and ratings, whose rating is above 8
The expression π_{sname,rating}(σ_{rating>8}(S2)) produces the
result shown in Figure 4.7
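Selection and projection are easy to mimic over a set of tuples. The sketch below is one possible encoding, not a definition: the functions `select` and `project` and the contents of `S2` are assumptions standing in for the instance in Figure 4.2.

```python
from collections import namedtuple

Sailor = namedtuple("Sailor", ["sid", "sname", "rating", "age"])

# Hypothetical instance standing in for S2.
S2 = {
    Sailor(28, "yuppy", 9, 35.0),
    Sailor(31, "Lubber", 8, 55.5),
    Sailor(44, "guppy", 5, 35.0),
    Sailor(58, "Rusty", 10, 35.0),
}

def select(relation, condition):
    """Selection (sigma): keep the tuples that satisfy the condition."""
    return {t for t in relation if condition(t)}

def project(relation, fields):
    """Projection (pi): keep the named fields; the set drops duplicate rows."""
    return {tuple(getattr(t, f) for f in fields) for t in relation}

high_rated = select(S2, lambda t: t.rating > 8)
names_and_ratings = project(high_rated, ["sname", "rating"])
ages = project(S2, ["age"])
```

Projecting on age yields only two rows here because three sailors share the age 35.0 and a relation is a set.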
Set Operations
Union
Intersection
Set-difference
Cross-product
Union Operation (∪)
R ∪ S returns a relation instance containing all tuples that
occur in either relation instance R or relation instance S (or
both)
The relations R and S must be union-compatible: they have the
same number of fields, and corresponding fields, taken in
order from left to right, have the same domains
Intersection Operation (∩)
R ∩ S returns a relation instance containing all tuples that
occur in both R and S
The relations R and S must be union-compatible
Set-difference Operation (−)
R − S returns a relation instance containing all tuples that
occur in R but not in S
The relations R and S must be union-compatible
Cross-product Operation (×)
R × S returns a relation instance whose schema contains all
the fields of R (in the same order as they appear in R)
followed by all the fields of S (in the same order as they
appear in S); the result contains one tuple <r, s> for each
pair of tuples r ∈ R, s ∈ S
The cross-product operation is also called Cartesian
product
Examples
1. Find the sids that are present in S1 or S2
π_{sid}(S1) ∪ π_{sid}(S2)    result: sid = 22, 28, 31, 44, 58
2. Find the sids that are present in both S1 and S2
π_{sid}(S1) ∩ π_{sid}(S2)    result: sid = 31, 58
3. Find the sids that are present in S1 but not in S2
π_{sid}(S1) − π_{sid}(S2)    result: sid = 22
4. The result of the cross-product S1 × R1 is shown in Figure
4.11
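Python's built-in sets behave like the set operations of the algebra, which makes the examples easy to check. The sid values below are assumptions chosen to agree with the example results.

```python
from itertools import product

# Assumed sid columns of S1 and S2.
sids_S1 = {22, 31, 58}
sids_S2 = {28, 31, 44, 58}

union = sids_S1 | sids_S2          # in S1 or S2 (or both)
intersection = sids_S1 & sids_S2   # in both S1 and S2
difference = sids_S1 - sids_S2     # in S1 but not in S2

# Cross-product: every element of the first set paired with every element
# of the second, so its size is the product of the two cardinalities.
cross = set(product(sids_S1, sids_S2))
```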
Renaming
Field name conflicts can arise in some cases; for example,
sid in S1 × R1
This problem can be solved using a renaming operator (ρ)
The expression ρ(R(F), E) takes an arbitrary relational algebra
expression E and returns an instance of a new relation called R
R contains the same tuples as the result of E; the field names
in R are the same as in E, except for fields renamed in the
list F
No two fields in the result may have the same name
For example, the expression ρ(C(1 → sid1, 5 → sid2), S1 × R1)
returns a relation that contains the tuples shown in Figure
4.11 and has the following schema:
C(sid1: integer, sname: string, rating: integer, age: real, sid2:
integer, bid: integer, day: dates)
Note: Renaming is also used for naming intermediate
relations
Joins
Joins are used to combine two or more relations
Join can be defined as a cross-product followed by
selections and projections
Condition Joins
The join operation accepts a join condition c and a pair of
relation instances as arguments, and returns a relation
instance
The operation is defined as follows: R ⋈_c S = σ_c(R × S)
As an example, the result of S1 ⋈_{S1.sid<R1.sid} R1 is shown
in Figure 4.12
Equijoin
In an equijoin, the join condition consists solely of
equalities of the form R.name1 = S.name2, that is, equalities
between two fields in R and S
The result of S1 ⋈_{S1.sid=R1.sid} R1 is shown in Figure 4.13
Notice that only one field called sid appears in the result
Natural Join
The natural join R ⋈ S is an equijoin in which equalities are
specified on all fields having the same name in R and S
The equijoin expression S1 ⋈_{S1.sid=R1.sid} R1 is actually a
natural join and can simply be denoted as S1 ⋈ R1
If the two relations have no attributes in common, R ⋈ S is
simply the cross-product
Note: left outer join is denoted as ⟕, right outer join is
denoted as ⟖, and full outer join is denoted as ⟗
Division
Consider two relation instances A and B in which A has
(exactly) two fields x and y and B has just one field y, with
the same domain as in A
We define the division operation A/B as the set of all x
values (in the form of unary tuples) such that for every y
value in (a tuple of) B, there is a tuple <x, y> in A
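Division is the least intuitive of the operators, so a small sketch may help. The function `divide` and the sample relations are assumptions for illustration; A is encoded as a set of (x, y) pairs and B as a set of y values.

```python
def divide(A, B):
    """A/B: the x values paired in A with every y value in B."""
    xs = {x for x, _ in A}
    return {x for x in xs if all((x, y) in A for y in B)}

# Hypothetical instance: which supplier (x) supplies which part (y).
A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}

B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}
```

As B grows, A/B can only shrink, because each x value has to be paired with more y values.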
Examples of Relational Algebra Queries
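(Q1) Find the names of sailors who have reserved boat 103
π_{sname}((σ_{bid=103}(Reserves)) ⋈ Sailors)
Result: Dustin, Lubber and Horatio
(OR)
Using renaming operator
ρ(Temp1, σ_{bid=103}(Reserves))
ρ(Temp2, Temp1 ⋈ Sailors)
π_{sname}(Temp2)
Here is another way to write this query:
π_{sname}(σ_{bid=103}(Reserves ⋈ Sailors))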
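(Q2) Find the names of sailors who have reserved a red
boat
π_{sname}((σ_{color='red'}(Boats)) ⋈ Reserves ⋈ Sailors)
Result: Dustin, Lubber and Horatio
An equivalent expression is
π_{sname}(π_{sid}((π_{bid}(σ_{color='red'}(Boats))) ⋈ Reserves) ⋈ Sailors)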
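(Q3) Find the colors of boats reserved by Lubber
π_{color}((σ_{sname='Lubber'}(Sailors)) ⋈ Reserves ⋈ Boats)
(Q4) Find the names of sailors who have reserved at least
one boat
π_{sname}(Sailors ⋈ Reserves)
(Q5) Find the names of sailors who have reserved a red or a
green boat
ρ(Tempboats, σ_{color='red'}(Boats) ∪ σ_{color='green'}(Boats))
π_{sname}(Tempboats ⋈ Reserves ⋈ Sailors)
(Q6) Find the sids of sailors with age over 20 who have not
reserved a red boat
π_{sid}(σ_{age>20}(Sailors)) − π_{sid}((σ_{color='red'}(Boats)) ⋈ Reserves ⋈ Sailors)
(Q7) Find the names of sailors who have reserved all boats
ρ(Tempsids, (π_{sid,bid}(Reserves)) / (π_{bid}(Boats)))
π_{sname}(Tempsids ⋈ Sailors)
(Q8) Find the names of sailors who have reserved all boats
called Interlake
ρ(Tempsids, (π_{sid,bid}(Reserves)) / (π_{bid}(σ_{bname='Interlake'}(Boats))))
π_{sname}(Tempsids ⋈ Sailors)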
RELATIONAL CALCULUS
Relational calculus is an alternative to relational algebra
In contrast to the algebra, which is procedural, the calculus
is nonprocedural, or declarative
Relational calculus is of two types
1. Tuple Relational Calculus (TRC)
2. Domain Relational Calculus (DRC)
TRC has had more of an influence on SQL, while DRC has
strongly influenced QBE
Tuple Relational Calculus
A tuple variable is a variable that takes on tuples of a
relation as values
A tuple relational calculus query has the form { T | p(T) },
where T is a tuple variable and p(T) denotes a formula that
describes T
The result of this query is the set of all tuples t for which
the formula p(T) evaluates to true with T = t
Example:
(Q) Find all sailors with a rating above 7
{ S | S ∈ Sailors ∧ S.rating > 7 }
Syntax of TRC Queries
Let Rel be a relation name, R and S be tuple variables, a an
attribute of R, and b an attribute of S
Let op denote an operator in the set {<, >, =, ≤, ≥, ≠}
An atomic formula is one of the following:
--R ∈ Rel
--R.a op S.b
--R.a op constant, or constant op R.a
A formula is recursively defined to be one of the following,
where p and q are themselves formulas, and p(R) denotes a
formula in which the variable R appears:
--any atomic formula
--¬p, p ∧ q, p ∨ q, or p ⇒ q
--∃R(p(R)), where R is a tuple variable
--∀R(p(R)), where R is a tuple variable
In the last two clauses above, the quantifiers ∃ and ∀ are
said to bind the variable R
A variable is said to be free in a formula if the formula
does not contain an occurrence of a quantifier that binds it
A TRC query is defined to be an expression of the
form { T | p(T) }, where T is the only free variable in the
formula p
Examples of TRC Queries
Consider the instances B1 of Boats, R2 of Reserves, and S3
of Sailors shown in Figures 4.15, 4.16, and 4.17
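(Q) Find the names and ages of sailors with a rating above 7
{ P | ∃S ∈ Sailors (S.rating > 7 ∧ P.name = S.sname ∧ P.age = S.age) }
(Q) Find the sailor name, boat id, and reservation date for
each reservation
{ P | ∃R ∈ Reserves ∃S ∈ Sailors (R.sid = S.sid ∧ P.bid = R.bid ∧ P.day = R.day ∧ P.sname = S.sname) }
(Q) Find the names of sailors who have reserved boat 103
{ P | ∃S ∈ Sailors ∃R ∈ Reserves (R.sid = S.sid ∧ R.bid = 103 ∧ P.sname = S.sname) }
(Q) Find the names of sailors who have reserved a red boat
{ P | ∃S ∈ Sailors ∃R ∈ Reserves (R.sid = S.sid ∧ P.sname = S.sname ∧ ∃B ∈ Boats (B.bid = R.bid ∧ B.color = 'red')) }
(Q) Find the names of sailors who have reserved all boats
{ P | ∃S ∈ Sailors ∀B ∈ Boats (∃R ∈ Reserves (S.sid = R.sid ∧ R.bid = B.bid ∧ P.sname = S.sname)) }
(Q) Find sailors who have reserved all red boats
{ S | S ∈ Sailors ∧ ∀B ∈ Boats (B.color = 'red' ⇒ (∃R ∈ Reserves (S.sid = R.sid ∧ R.bid = B.bid))) }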
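TRC queries read much like Python set comprehensions, with `any` playing the role of ∃ and `all` the role of ∀. The sketch below uses hypothetical instances of Sailors, Boats, and Reserves, not the ones in Figures 4.15 to 4.17.

```python
from collections import namedtuple

Sailor = namedtuple("Sailor", ["sid", "sname", "rating", "age"])
Boat = namedtuple("Boat", ["bid", "bname", "color"])
Reserve = namedtuple("Reserve", ["sid", "bid", "day"])

# Hypothetical instances.
Sailors = {Sailor(22, "Dustin", 7, 45.0), Sailor(31, "Lubber", 8, 55.5),
           Sailor(58, "Rusty", 10, 35.0)}
Boats = {Boat(101, "Interlake", "blue"), Boat(102, "Interlake", "red"),
         Boat(103, "Clipper", "green")}
Reserves = {Reserve(22, 101, "10/10/98"), Reserve(22, 102, "10/10/98"),
            Reserve(22, 103, "10/8/98"), Reserve(31, 102, "11/10/98"),
            Reserve(31, 103, "11/6/98")}

def reserved(S, B):
    """There exists R in Reserves with R.sid = S.sid and R.bid = B.bid."""
    return any(R.sid == S.sid and R.bid == B.bid for R in Reserves)

# Sailors with a rating above 7.
above_7 = {S for S in Sailors if S.rating > 7}

# Existential quantifier: names of sailors who have reserved boat 103.
reserved_103 = {S.sname for S in Sailors
                if any(R.sid == S.sid and R.bid == 103 for R in Reserves)}

# Universal quantifier: names of sailors who have reserved all boats.
reserved_all = {S.sname for S in Sailors if all(reserved(S, B) for B in Boats)}

# Implication p => q written as (not p) or q: reserved all red boats.
reserved_all_red = {S.sname for S in Sailors
                    if all(B.color != "red" or reserved(S, B) for B in Boats)}
```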
Domain Relational Calculus
A domain variable is a variable that ranges over the
values in the domain of some attribute
Example: a domain variable can take an integer value if it
ranges over an attribute whose domain is the set of integers
A DRC query has the form { <x1, x2, ..., xn> | p(<x1, x2, ..., xn>) },
where each xi is either a domain variable or a constant and
p(<x1, x2, ..., xn>) denotes a DRC formula
The result of this query is the set of all tuples
<x1, x2, ..., xn> for which the formula evaluates to true
A DRC formula is defined in a manner that is very similar
to the definition of a TRC formula
The main difference is that the variables are now domain
variables
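Let op denote an operator in the set {<, >, =, ≤, ≥, ≠}
and let X and Y be domain variables
An atomic formula in DRC is one of the following:
--<x1, x2, ..., xn> ∈ Rel, where Rel is a relation with n
attributes and each xi is either a variable or a constant
--X op Y
--X op constant, or constant op X
A formula is recursively defined to be one of the
following, where p and q are themselves formulas, and
p(X) denotes a formula in which the variable X appears:
--any atomic formula
--¬p, p ∧ q, p ∨ q, or p ⇒ q
--∃X(p(X)), where X is a domain variable
--∀X(p(X)), where X is a domain variable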
Examples of DRC Queries
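(Q) Find all sailors with a rating above 7
{ <I, N, T, A> | <I, N, T, A> ∈ Sailors ∧ T > 7 }
(Q) Find the names of sailors who have reserved boat 103
{ <N> | ∃I, T, A (<I, N, T, A> ∈ Sailors ∧ ∃Ir, Br, D (<Ir, Br, D> ∈ Reserves ∧ Ir = I ∧ Br = 103)) }
(Or)
{ <N> | ∃I, T, A (<I, N, T, A> ∈ Sailors ∧ ∃D (<I, 103, D> ∈ Reserves)) }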
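In DRC the variables range over field values rather than whole tuples, which corresponds to tuple unpacking in a Python comprehension. The instances below are hypothetical and the variable names (I, N, T, A for sid, name, rating, age) are chosen only for this sketch.

```python
# Hypothetical instances: Sailors(sid, sname, rating, age), Reserves(sid, bid, day).
Sailors = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
Reserves = {(22, 101, "10/10/98"), (31, 103, "11/6/98")}

# All sailors with a rating above 7: each field is bound to its own variable.
above_7 = {(I, N, T, A) for (I, N, T, A) in Sailors if T > 7}

# Names of sailors who have reserved boat 103: I, T, A are existentially
# quantified, and the inner any() looks for a matching Reserves tuple.
reserved_103 = {
    (N,)
    for (I, N, T, A) in Sailors
    if any(Ir == I and Br == 103 for (Ir, Br, D) in Reserves)
}
```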