Relational Model
History
• The relational model was proposed by Codd in
1970. (At that time most databases were based on
the hierarchical model or the network model.)
• Prototype relational DBMSs were developed at
IBM and UC Berkeley by the mid-1970s.
• The relational model attracted a lot of attention at
that time because it is simple and elegant.
• Today, the relational model is by far the dominant
data model and the foundation for the leading
DBMS products.
Introduction
• The main construct for representing data in
the relational model is a set of relations.
• A relation consists of a relation schema and
a relation instance.
• Relation instance
– Simply a table with rows and columns.
• Row = tuple or record
• Column = field or attribute
• Relation schema
– Describes the column heads for the table.
– Specifies the relation’s name, the name of each
field, and the domain of each field.
• A domain is referred to by the domain name and has
a set of associated values.
Example:
Students(sid: string, name: string, login: string,
age: integer, gpa: real)
– We can also represent the schema in the
following way, with the primary key marked:
Students | sid (primary key) | name | login | age | gpa
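• An example of the corresponding relation
instance is as follows:
sid    name     login          age  gpa
50000  Dave     dave@cs        19   3.3
53666  Jones    jones@cs       18   3.4
53688  Smith    smith#@ee      18   3.2
53650  Smith    smith@math     19   3.8
53831  Madayan  madayan@music  11   1.8
53832  Guldu    guldu@music    12   2.0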
• The relational model requires that no two rows are
identical. (In practice, some commercial systems
may allow tables to have duplicate rows.)
• Each relation is defined to be a set of unique
tuples or rows.
• The order in which the rows are listed is not
important.
• The order of the fields is not important either.
• However, an alternative convention is to list fields
in a specific order and refer to a field by its
position.
• The values that appear in a column must be drawn from the domain
associated with that column (domain constraints).
• Thus, the domain of a field is essentially the type of that field, and
restricts the values that can appear in the field.
• Formally, an instance of a relation schema R(f1:D1, ..., fn:Dn) is a set of
tuples { <f1:d1, ..., fn:dn> | d1 ∈ Dom1, ..., dn ∈ Domn }, where Domi is the
set of values associated with the domain named Di.
• Using this notation (named fields convention), the first tuple in the
previous table is written as:
<sid: 50000, name: Dave, login: dave@cs, age: 19, gpa: 3.3>
• The degree, also called arity, of a relation is the
number of fields.
• The cardinality of a relation instance is the number
of tuples.
• A relational database is a collection of relations
with distinct relation names.
• The relational database schema is the collection of
schemas for the relations in the database.
• An instance of a relational database is a collection
of relation instances (one per relation schema).
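The degree and cardinality definitions can be checked on a small instance. The following is a minimal sketch, assuming an in-memory SQLite database driven from Python's sqlite3 module (the choice of SQLite is an assumption made for illustration, not part of these notes). It loads three Students tuples and reports the degree and the cardinality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(30), login CHAR(20), age INTEGER, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("50000", "Dave", "dave@cs", 19, 3.3),
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@ee", 18, 3.2),
    ],
)

cur = conn.execute("SELECT * FROM Students")
degree = len(cur.description)      # number of fields in the schema
cardinality = len(cur.fetchall())  # number of tuples in this instance
print(degree, cardinality)
```

The degree is fixed by the schema, while the cardinality changes as tuples are inserted or deleted.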
Creating and Modifying Relations Using SQL
• The subset of SQL that supports the
creation, deletion, and modification of
tables is called the Data Definition
Language (DDL).
• To create the Students relation:
CREATE TABLE Students
(sid CHAR(20),
name CHAR(30),
login CHAR(20),
age INTEGER,
gpa REAL)
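As a rough illustration of the three DDL activities named above (creation, modification, deletion of tables), the sketch below runs them against an in-memory SQLite database through Python's sqlite3 module. SQLite, the ALTER TABLE and DROP TABLE statements, and the extra column major are assumptions added for illustration; only the CREATE TABLE statement comes from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Creation: the Students relation from the notes.
conn.execute(
    """CREATE TABLE Students
       (sid CHAR(20),
        name CHAR(30),
        login CHAR(20),
        age INTEGER,
        gpa REAL)"""
)

# Modification: add a (hypothetical) column to the schema.
conn.execute("ALTER TABLE Students ADD COLUMN major CHAR(20)")
columns = [row[1] for row in conn.execute("PRAGMA table_info(Students)")]

# Deletion: remove the table and its schema.
conn.execute("DROP TABLE Students")
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(columns, tables)
```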
• To insert a single tuple:
INSERT
INTO Students (sid, name, login, age, gpa)
VALUES (53688, 'Smith', 'smith@ee', 18, 3.2)
• Note that we can optionally omit the list of
column names in the INTO clause and list
the values in the appropriate order.
• It is good style to be explicit about
column names.
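The two INSERT forms can be compared side by side. This is a minimal sketch, assuming SQLite through Python's sqlite3 module: the first statement names its columns (so the values may follow any order that matches the list), the second omits the list and must follow the order of the table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(30), login CHAR(20), age INTEGER, gpa REAL)"
)

# Explicit column list: values are matched to the listed names.
conn.execute(
    "INSERT INTO Students (name, sid, gpa, age, login) "
    "VALUES ('Smith', '53688', 3.2, 18, 'smith@ee')"
)

# Column list omitted: values must follow the order in CREATE TABLE.
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")

rows = conn.execute(
    "SELECT sid, name, login, age, gpa FROM Students ORDER BY sid"
).fetchall()
print(rows)
```

The explicit form keeps working if columns are later added or reordered, which is one reason to prefer it.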
• Delete tuples using the DELETE command:
DELETE
FROM Students S
WHERE S.name = 'Smith'
• Modify the column values of existing tuples using the UPDATE command:
UPDATE Students S
SET S.age = S.age + 1, S.gpa = S.gpa - 1
WHERE S.sid = 53688
• Note that
– The WHERE clause is applied first and
determines which rows are to be modified.
– The SET clause then determines how the rows
are to be modified.
– A column that appears in an expression on the right
side of = refers to the old value of that column.
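The three points above can be observed directly. The sketch below assumes SQLite through Python's sqlite3 module and omits the table alias S. The table Pairs is hypothetical and exists only to make the old-value rule visible: because both right-hand sides read the old values, the two columns are swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid CHAR(20), name CHAR(30), age INTEGER, gpa REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [("53688", "Smith", 18, 3.2), ("53666", "Jones", 18, 3.4)],
)

# WHERE selects only the row with sid 53688; SET then changes that row.
conn.execute("UPDATE Students SET age = age + 1, gpa = gpa - 1 WHERE sid = '53688'")
students = [
    (sid, age, round(gpa, 1))
    for sid, age, gpa in conn.execute("SELECT sid, age, gpa FROM Students ORDER BY sid")
]

# Hypothetical table: each right-hand side sees the values before the update.
conn.execute("CREATE TABLE Pairs (a INTEGER, b INTEGER)")
conn.execute("INSERT INTO Pairs VALUES (1, 2)")
conn.execute("UPDATE Pairs SET a = b, b = a")
pair = conn.execute("SELECT a, b FROM Pairs").fetchone()
print(students, pair)
```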
• Example instance of Students and an UPDATE on it:
sid    name     login          age  gpa
50000  Dave     dave@cs        19   3.3
53666  Jones    jones@cs       18   3.4
53688  Smith    smith#@ee      18   3.2
53650  Smith    smith@math     19   3.8
53831  Madayan  madayan@music  11   1.8
53832  Guldu    guldu@music    12   2.0
UPDATE Students S
SET S.gpa = S.gpa - 0.1
WHERE S.gpa >= 3.3
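To see which rows this UPDATE touches, the sketch below replays it on the six-tuple instance, assuming SQLite through Python's sqlite3 module (the alias S is dropped, and results are rounded to one decimal because gpa is stored as a floating-point value). The rows with gpa 3.3, 3.4 and 3.8 qualify. The row that drops from 3.4 to 3.3 is not decremented a second time, since the WHERE clause is evaluated against the old values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(30), login CHAR(20), age INTEGER, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("50000", "Dave", "dave@cs", 19, 3.3),
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith#@ee", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
        ("53831", "Madayan", "madayan@music", 11, 1.8),
        ("53832", "Guldu", "guldu@music", 12, 2.0),
    ],
)

conn.execute("UPDATE Students SET gpa = gpa - 0.1 WHERE gpa >= 3.3")

# Round in Python to hide floating-point noise such as 3.1999999999999997.
gpas = {sid: round(gpa, 1) for sid, gpa in conn.execute("SELECT sid, gpa FROM Students")}
print(gpas)
```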
• In SQL, we can declare
– a candidate key by the UNIQUE constraint
– a primary key by the PRIMARY KEY
constraint.
CREATE TABLE Students
(sid CHAR(20),
name CHAR(30),
login CHAR(20),
age INTEGER,
gpa REAL,
UNIQUE (name, age),
CONSTRAINT StudentsKey PRIMARY KEY (sid))
• Here (name, age) is declared as a key and sid as the primary key.
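The effect of the two declarations can be tried out as follows. This is a sketch that assumes SQLite through Python's sqlite3 module, where a violated key constraint surfaces as sqlite3.IntegrityError; other systems report the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Students
       (sid CHAR(20),
        name CHAR(30),
        login CHAR(20),
        age INTEGER,
        gpa REAL,
        UNIQUE (name, age),
        CONSTRAINT StudentsKey PRIMARY KEY (sid))"""
)
conn.execute("INSERT INTO Students VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)")

candidates = [
    ("53688", "Jones", "jones@cs", 18, 3.4),    # repeats the primary key sid
    ("53650", "Smith", "smith@math", 18, 3.8),  # repeats the key (name, age)
    ("53650", "Smith", "smith@math", 19, 3.8),  # violates neither constraint
]
rejected = []
for row in candidates:
    try:
        conn.execute("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[:2])

count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(rejected, count)
```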
• Besides Students, suppose we have the
following relation:
Enrolled(sid: string, cid: string, grade: string)
Enrolled (sid is a foreign key referring to Students):
cid     grade  sid
CSC101  C      53831
ERG202  B      53832
IEE321  A      53650
PSY203  B      53666
Students (sid is the primary key):
sid    name     login          age  gpa
50000  Dave     dave@cs        19   3.3
53666  Jones    jones@cs       18   3.4
53688  Smith    smith#@ee      18   3.2
53650  Smith    smith@math     19   3.8
53831  Madayan  madayan@music  11   1.8
53832  Guldu    guldu@music    12   2.0
• Cannot insert <55555, ART104, A> into Enrolled, as there is
no tuple in Students with sid = 55555.
• Cannot delete <53666, Jones, jones@cs, 18, 3.4> from
Students, since there is a tuple in Enrolled with sid = 53666.
• We can specify the foreign key constraints
in SQL as follows:
CREATE TABLE Enrolled (
sid CHAR(20),
cid CHAR(20),
grade CHAR(10),
PRIMARY KEY (sid, cid),
FOREIGN KEY (sid) REFERENCES Students)
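The declaration above can be exercised with the two operations that a foreign key forbids. The sketch assumes SQLite through Python's sqlite3 module; the PRAGMA line is SQLite-specific (SQLite enforces foreign keys only when it is switched on), and Students is cut down to two columns to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(30))")
conn.execute(
    """CREATE TABLE Enrolled (
           sid CHAR(20),
           cid CHAR(20),
           grade CHAR(10),
           PRIMARY KEY (sid, cid),
           FOREIGN KEY (sid) REFERENCES Students)"""
)
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones')")
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'PSY203', 'B')")

violations = []
for statement in (
    "INSERT INTO Enrolled VALUES ('55555', 'ART104', 'A')",  # no such student
    "DELETE FROM Students WHERE sid = '53666'",              # still referenced
):
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError:
        violations.append(statement.split()[0])

remaining_students = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(violations, remaining_students)
```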
• SQL provides several alternative ways to handle
foreign key violations:
– What should we do if an Enrolled row is inserted,
with a sid column value that does not appear in any
row of the Students table?
• The INSERT is simply rejected.
– What should we do if a Students row is deleted?
1. Delete all Enrolled rows that refer to the deleted Students
row.
2. Disallow the deletion of the Students row if an Enrolled row
refers to it.
3. Set the sid column to the sid of some existing ‘default’
student, for every Enrolled row that refers to the deleted
Student row.
4. For every Enrolled row that refers to it, set the sid column to
null.
– What should we do if the primary key value of a
Students row is updated?
• The options here are similar to the previous case.
• SQL allows us to choose any of the four options
on DELETE and UPDATE.
CREATE TABLE Enrolled (
sid CHAR(20),
cid CHAR(20),
grade CHAR(10),
PRIMARY KEY (sid, cid),
FOREIGN KEY (sid) REFERENCES Students
ON DELETE CASCADE
ON UPDATE NO ACTION)
• The options are specified as part of the foreign key
declaration:
– NO ACTION (default)
• The action (DELETE or UPDATE) is to be rejected.
– CASCADE
• If a Students row is deleted, all Enrolled rows that refer to it
are to be deleted as well.
– SET DEFAULT
• Switch the enrollment to a ‘default’ student.
• The default student is specified as part of the definition of the
sid field in Enrolled,
e.g., sid CHAR(20) DEFAULT '53666'.
– SET NULL
• Uses null in place of a default value for the referring sid field.
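The options listed above can be compared by deleting the same Students row under each of them. This is a sketch under stated assumptions: SQLite through Python's sqlite3 module, a hypothetical helper enrolled_after_delete, and a reduced Students table. SET NULL is left out because sid is part of the primary key of Enrolled, and a primary key field cannot be null in standard SQL.

```python
import sqlite3


def enrolled_after_delete(action):
    """Delete student 53831 under the given ON DELETE action; return Enrolled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific
    conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(30))")
    conn.execute(
        f"""CREATE TABLE Enrolled (
                sid CHAR(20) DEFAULT '53666',
                cid CHAR(20),
                grade CHAR(10),
                PRIMARY KEY (sid, cid),
                FOREIGN KEY (sid) REFERENCES Students
                    ON DELETE {action})"""
    )
    conn.executemany(
        "INSERT INTO Students VALUES (?, ?)",
        [("53666", "Jones"), ("53831", "Madayan")],
    )
    conn.execute("INSERT INTO Enrolled VALUES ('53831', 'CSC101', 'C')")
    try:
        conn.execute("DELETE FROM Students WHERE sid = '53831'")
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT sid, cid, grade FROM Enrolled").fetchall()


results = {a: enrolled_after_delete(a) for a in ("NO ACTION", "CASCADE", "SET DEFAULT")}
print(results)
```

Under SET DEFAULT the enrollment moves to student 53666, who must exist in Students for the update to be accepted.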
Logical Database Design: ER to Relational
• The ER model is convenient for
representing an initial, high-level database
design.
• Given an ER diagram describing a database,
a standard approach is taken to generate a
relational database schema that closely
approximates the ER design.
• Entity Sets to Tables
– An entity set is mapped to a relation in a
straightforward way:
• attribute of the entity set → attribute of the table.
[Figure: entity set Employees with attributes id, name, lot]
CREATE TABLE Employees (
id CHAR(11),
name CHAR(30),
lot INTEGER,
PRIMARY KEY (id))
• Relationship Sets (without Constraints) to tables
– Suppose there are no key or participation constraints.
– Must be able to identify each participating entity and
give values to the descriptive attributes of the
relationship.
– The attributes of the relation include:
• The primary key attributes of each participating entity set, as
foreign key fields.
• The descriptive attributes of the relationship set.
– The set of nondescriptive attributes is a superkey for
the relation.
• If there are no key constraints, this set of attributes is a
candidate key.
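The rules above can be sketched for a relationship set between Employees and a set of departments. The names Works_In and Departments, the column types, and the sample rows are assumptions made for illustration; the code assumes SQLite through Python's sqlite3 module. The primary keys of both participating entity sets appear as foreign keys, since is the descriptive attribute, and (id, did) is the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific
conn.executescript(
    """
    CREATE TABLE Employees (id CHAR(11), name CHAR(30), lot INTEGER, PRIMARY KEY (id));
    CREATE TABLE Departments (did INTEGER, dname CHAR(20), budget REAL, PRIMARY KEY (did));
    CREATE TABLE Works_In (
        id CHAR(11),
        did INTEGER,
        since DATE,
        PRIMARY KEY (id, did),
        FOREIGN KEY (id) REFERENCES Employees,
        FOREIGN KEY (did) REFERENCES Departments);
    INSERT INTO Employees VALUES ('e1', 'Ann', 10), ('e2', 'Bob', 11);
    INSERT INTO Departments VALUES (51, 'Toy', 1000.0), (56, 'Shoe', 2000.0);
    INSERT INTO Works_In VALUES
        ('e1', 51, '2020-01-01'), ('e1', 56, '2021-01-01'), ('e2', 51, '2020-06-01');
    """
)

# The same (id, did) pair cannot be recorded twice.
try:
    conn.execute("INSERT INTO Works_In VALUES ('e1', 51, '2022-01-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

cardinality = conn.execute("SELECT COUNT(*) FROM Works_In").fetchone()[0]
print(duplicate_rejected, cardinality)
```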
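[Figure: relationship set with descriptive attribute since; entity sets Employees (id, name, lot) and a second entity set with attribute dname]
[Figure: relationship set Reports_to on Employees, with roles supervisor and subordinate]
• We need to explicitly name the referenced field of Employees because the field
name differs from the name(s) of the referring field(s).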
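A possible table for Reports_to is sketched below, assuming SQLite through Python's sqlite3 module. The field names supervisor_id and subordinate_id are hypothetical, built from the role names. Because neither is called id, each FOREIGN KEY clause names the referenced field Employees(id) explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific
conn.executescript(
    """
    CREATE TABLE Employees (id CHAR(11), name CHAR(30), lot INTEGER, PRIMARY KEY (id));
    CREATE TABLE Reports_To (
        supervisor_id CHAR(11),
        subordinate_id CHAR(11),
        PRIMARY KEY (supervisor_id, subordinate_id),
        FOREIGN KEY (supervisor_id) REFERENCES Employees(id),
        FOREIGN KEY (subordinate_id) REFERENCES Employees(id));
    INSERT INTO Employees VALUES ('e1', 'Ann', 10), ('e2', 'Bob', 11);
    INSERT INTO Reports_To VALUES ('e1', 'e2');
    """
)

# Both roles must refer to existing employees.
try:
    conn.execute("INSERT INTO Reports_To VALUES ('e9', 'e2')")
    unknown_supervisor_rejected = False
except sqlite3.IntegrityError:
    unknown_supervisor_rejected = True

pairs = conn.execute("SELECT supervisor_id, subordinate_id FROM Reports_To").fetchall()
print(unknown_supervisor_rejected, pairs)
```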
• Translating Relationship Sets with Key
Constraints
– Suppose a relationship set involves n entity sets and
some m of them are linked via arrows in the ER
diagram.
• The key of any one of these m entity sets constitutes
a key for the relation to which the relationship set is
mapped.
• Hence, we have m candidate keys, and one of these
should be designated as the primary key.
[Figure: Manages relationship set with descriptive attribute since; entity attributes name, dname]
• As each department has at most one manager, no two tuples can have the same
did value but differ on the id value. Therefore, did is itself a key for Manages;
indeed, the set {did, id} is not a key because it is not minimal.
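The key argument can be checked on a Manages table whose primary key is did alone. The sketch assumes SQLite through Python's sqlite3 module; the column types and sample rows are illustrative assumptions. A second manager for the same department is rejected, while one employee may still manage two departments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific
conn.executescript(
    """
    CREATE TABLE Employees (id CHAR(11), name CHAR(30), lot INTEGER, PRIMARY KEY (id));
    CREATE TABLE Departments (did INTEGER, dname CHAR(20), budget REAL, PRIMARY KEY (did));
    CREATE TABLE Manages (
        id CHAR(11),
        did INTEGER,
        since DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (id) REFERENCES Employees,
        FOREIGN KEY (did) REFERENCES Departments);
    INSERT INTO Employees VALUES ('e1', 'Ann', 10), ('e2', 'Bob', 11);
    INSERT INTO Departments VALUES (51, 'Toy', 1000.0), (56, 'Shoe', 2000.0);
    -- One employee managing two departments does not violate the key.
    INSERT INTO Manages VALUES ('e1', 51, '2020-01-01'), ('e1', 56, '2021-01-01');
    """
)

# A second manager for department 51 repeats the key value did = 51.
try:
    conn.execute("INSERT INTO Manages VALUES ('e2', 51, '2022-01-01')")
    second_manager_rejected = False
except sqlite3.IntegrityError:
    second_manager_rejected = True

managed = conn.execute("SELECT did, id FROM Manages ORDER BY did").fetchall()
print(second_manager_rejected, managed)
```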
• The other approach is to include the information
about the relationship set in the table
corresponding to the entity set with the key constraint.
– Avoids creating a distinct table for the relationship.
– The drawback to this approach is that space could be
wasted if a lot of departments have no managers.
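This second approach might look as follows. The table name Dept_Mgr, the column types, and the sample rows are assumptions made for illustration; the code assumes SQLite through Python's sqlite3 module. The manager fields id and since are stored in the department's own row and stay null for a department without a manager, which is the wasted space mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific
conn.executescript(
    """
    CREATE TABLE Employees (id CHAR(11), name CHAR(30), lot INTEGER, PRIMARY KEY (id));
    CREATE TABLE Dept_Mgr (
        did INTEGER,
        dname CHAR(20),
        budget REAL,
        id CHAR(11),   -- manager; null when the department has no manager
        since DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (id) REFERENCES Employees);
    INSERT INTO Employees VALUES ('e1', 'Ann', 10);
    INSERT INTO Dept_Mgr VALUES (51, 'Toy', 1000.0, 'e1', '2020-01-01');
    INSERT INTO Dept_Mgr (did, dname, budget) VALUES (56, 'Shoe', 2000.0);
    """
)

unmanaged = conn.execute("SELECT did, id, since FROM Dept_Mgr WHERE id IS NULL").fetchall()
print(unmanaged)
```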
• Translating Relationship Sets with Participation Constraints
[Figure: ER diagram with participation constraints (labels: since, name, dname)]
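One common way to capture total participation, sketched here as an assumption rather than as the method shown on the original slide, is to fold the relationship into the department table and declare the manager field NOT NULL. The table name Dept_Mgr, the helper is_rejected, and the sample rows are hypothetical; the code assumes SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific
conn.executescript(
    """
    CREATE TABLE Employees (id CHAR(11), name CHAR(30), lot INTEGER, PRIMARY KEY (id));
    CREATE TABLE Dept_Mgr (
        did INTEGER,
        dname CHAR(20),
        budget REAL,
        id CHAR(11) NOT NULL,   -- every department must have a manager
        since DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (id) REFERENCES Employees ON DELETE NO ACTION);
    INSERT INTO Employees VALUES ('e1', 'Ann', 10);
    INSERT INTO Dept_Mgr VALUES (51, 'Toy', 1000.0, 'e1', '2020-01-01');
    """
)


def is_rejected(statement):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError:
        return True
    return False


no_manager_rejected = is_rejected(
    "INSERT INTO Dept_Mgr (did, dname, budget) VALUES (56, 'Shoe', 2000.0)"
)
manager_delete_rejected = is_rejected("DELETE FROM Employees WHERE id = 'e1'")
print(no_manager_rejected, manager_delete_rejected)
```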
• Translating weak entity sets
– The weak entity set always
• participates in a one-to-many binary relationship,
• has a key constraint
• has total participation.
– The second approach for translating a
relationship set with a key constraint (folding the relationship
into the table for the entity set) can be used.
– We must take into account that the weak entity
has only a partial key.
– Also when an owner entity is deleted, we may
want all owned weak entities to be deleted.
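These points can be sketched with a weak entity set of dependents owned by Employees. The table name Dep_Policy and its fields pname, age and cost are assumptions made for illustration; the code assumes SQLite through Python's sqlite3 module. The partial key pname is combined with the owner's key id, and ON DELETE CASCADE removes the owned rows together with their owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific
conn.executescript(
    """
    CREATE TABLE Employees (id CHAR(11), name CHAR(30), lot INTEGER, PRIMARY KEY (id));
    CREATE TABLE Dep_Policy (
        pname CHAR(20),         -- partial key of the weak entity
        age INTEGER,
        cost REAL,
        id CHAR(11) NOT NULL,   -- key of the owner entity
        PRIMARY KEY (pname, id),
        FOREIGN KEY (id) REFERENCES Employees ON DELETE CASCADE);
    INSERT INTO Employees VALUES ('e1', 'Ann', 10), ('e2', 'Bob', 11);
    INSERT INTO Dep_Policy VALUES
        ('Joe', 5, 100.0, 'e1'), ('Sue', 3, 90.0, 'e1'), ('Joe', 7, 120.0, 'e2');
    """
)

# Deleting the owner also deletes the weak entities it owns.
conn.execute("DELETE FROM Employees WHERE id = 'e1'")
remaining = conn.execute("SELECT pname, id FROM Dep_Policy").fetchall()
print(remaining)
```

The same pname may occur under two owners, since only the pair (pname, id) has to be unique.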
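[Figure: weak entity set example (labels: cost, name)]
• Translating class hierarchies
[Figure: ISA hierarchy. Employees (id, name, lot) is specialized into Hourly_Emps (hour_wages, hour_worked) and Contract_Emps (contract_id)]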
• There are two approaches:
1. Map each of the entity sets Employees,
Hourly_Emps, and Contract_Emps to a
distinct relation, e.g., Contract_Emps(id, contract_id).
2. Create two relations, corresponding to
Hourly_Emps and Contract_Emps.
• The first method is more general.
– Disadvantage: an extra relation is needed.
– More operations may be necessary when we need to get
the employee information
(e.g., looking up two relations).
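The cost of the first method shows up as a join. The sketch below assumes SQLite through Python's sqlite3 module; the column types, the ON DELETE CASCADE clauses, and the sample rows are illustrative assumptions. Each subclass table holds only its own attributes plus id, so reading an hourly employee's name together with the wage needs two relations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific
conn.executescript(
    """
    CREATE TABLE Employees (id CHAR(11), name CHAR(30), lot INTEGER, PRIMARY KEY (id));
    CREATE TABLE Hourly_Emps (
        id CHAR(11),
        hour_wages REAL,
        hour_worked INTEGER,
        PRIMARY KEY (id),
        FOREIGN KEY (id) REFERENCES Employees ON DELETE CASCADE);
    CREATE TABLE Contract_Emps (
        id CHAR(11),
        contract_id INTEGER,
        PRIMARY KEY (id),
        FOREIGN KEY (id) REFERENCES Employees ON DELETE CASCADE);
    INSERT INTO Employees VALUES ('e1', 'Ann', 10), ('e2', 'Bob', 11);
    INSERT INTO Hourly_Emps VALUES ('e1', 15.0, 40);
    INSERT INTO Contract_Emps VALUES ('e2', 7001);
    """
)

# Name lives in Employees, wage in Hourly_Emps: two relations are looked up.
hourly = conn.execute(
    "SELECT E.name, H.hour_wages "
    "FROM Employees E JOIN Hourly_Emps H ON E.id = H.id"
).fetchall()
print(hourly)
```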
• Translating ER diagrams with aggregation
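[Figure: Employees (id, name, address) participates in Monitors (until) with the aggregation of Sponsors (since) between Projects (pid, started-on, pbudget) and Departments (did, dname, budget)]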
– The Employees, Projects and Departments entity sets
and the Sponsors relationship set are mapped as
described previously.
– For the Monitors relationship set, we create a relation with
attributes (id, did, pid, until).
– Consider the Sponsors relation: it has attributes pid, did
and since. We need this relation for two reasons:
• To record the descriptive attributes of the Sponsors relationship
(since).
• Not every sponsorship has a monitor, and thus some <pid, did>
pairs in the Sponsors relation may not appear in the Monitors
relation.
– If Sponsors has no descriptive attributes and has total
participation in Monitors, every possible instance of the
Sponsors relation can be obtained from the <pid, did>
columns of Monitors, so Sponsors can be dropped.
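The relationship between the Sponsors and Monitors relations can be sketched as follows, assuming SQLite through Python's sqlite3 module. The Projects and Departments tables are left out to keep the example short, and the column types and sample rows are illustrative assumptions. Monitors refers to a sponsorship through the pair (pid, did), and a sponsorship without a monitor appears only in Sponsors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific
conn.executescript(
    """
    CREATE TABLE Employees (id CHAR(11), name CHAR(30), address CHAR(40), PRIMARY KEY (id));
    CREATE TABLE Sponsors (pid INTEGER, did INTEGER, since DATE, PRIMARY KEY (pid, did));
    CREATE TABLE Monitors (
        id CHAR(11),
        did INTEGER,
        pid INTEGER,
        until DATE,
        PRIMARY KEY (id, did, pid),
        FOREIGN KEY (id) REFERENCES Employees,
        FOREIGN KEY (pid, did) REFERENCES Sponsors (pid, did));
    INSERT INTO Employees VALUES ('e1', 'Ann', '1 Main St');
    INSERT INTO Sponsors VALUES (1, 51, '2020-01-01'), (2, 51, '2021-01-01');
    INSERT INTO Monitors VALUES ('e1', 51, 1, '2025-12-31');
    """
)

# Sponsorships that no employee monitors are visible only in Sponsors.
unmonitored = conn.execute(
    "SELECT pid, did FROM Sponsors EXCEPT SELECT pid, did FROM Monitors"
).fetchall()

# A monitor must refer to an existing sponsorship.
try:
    conn.execute("INSERT INTO Monitors VALUES ('e1', 56, 9, '2025-12-31')")
    dangling_monitor_rejected = False
except sqlite3.IntegrityError:
    dangling_monitor_rejected = True
print(unmonitored, dangling_monitor_rejected)
```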