Lecture 4

The document discusses the relational data model and relational database constraints. It covers topics like entity-relationship diagrams, subclasses and superclasses, specialization and generalization, constraints on specialization and generalization, and categories or union types.

Uploaded by

zomukoza
CSE221: Database Systems

Lecture 4: The Relational Data Model and Relational Database Constraints
Professor Shaker El-Sappagh
Spring 2023
Outline
• EER diagrams at a glance: generalization/specialization and union types
• Relational Model Concepts
• Relational Model Constraints and Relational Database Schemas
• Update Operations and Dealing with Constraint Violations
EER diagram

Subclasses and Superclasses (1)


• An entity type may have additional meaningful subgroupings of its entities
• Example: EMPLOYEE may be further grouped into:
• SECRETARY, ENGINEER, TECHNICIAN, …
• Based on the EMPLOYEE’s Job
• MANAGER
• EMPLOYEEs who are managers (the role they play)
• SALARIED_EMPLOYEE, HOURLY_EMPLOYEE
• Based on the EMPLOYEE’s method of pay
• EER diagrams extend ER diagrams to represent these additional subgroupings,
called subclasses or subtypes
Subclasses and Superclasses

Subclasses and Superclasses (2)


• Each of these subgroupings is a subset of EMPLOYEE entities
• Each is called a subclass of EMPLOYEE
• EMPLOYEE is the superclass for each of these subclasses
• These are called superclass/subclass relationships:
• EMPLOYEE/SECRETARY
• EMPLOYEE/TECHNICIAN
• EMPLOYEE/MANAGER
• …

Subclasses and Superclasses (3)


• These are also called IS-A relationships
• SECRETARY IS-A EMPLOYEE, TECHNICIAN IS-A EMPLOYEE, ….
• Note: An entity that is a member of a subclass represents the same real-world
entity as some member of the superclass:
• The subclass member is the same entity in a distinct specific role
• An entity cannot exist in the database merely by being a member of a subclass; it
must also be a member of the superclass
• A member of the superclass can be optionally included as a member of any number
of its subclasses

Subclasses and Superclasses (4)


• Examples:
• A salaried employee who is also an engineer belongs to the two subclasses:
• ENGINEER, and
• SALARIED_EMPLOYEE
• A salaried employee who is also an engineering manager belongs to the three subclasses:
• MANAGER,
• ENGINEER, and
• SALARIED_EMPLOYEE
• It is not necessary that every entity in a superclass be a member of some subclass
Representing Specialization in EER Diagrams

Attribute Inheritance in Superclass / Subclass Relationships

• An entity that is a member of a subclass inherits:
• All attributes of the entity as a member of the superclass
• All relationships of the entity as a member of the superclass
• Example:
• In the previous slide, SECRETARY (as well as TECHNICIAN and ENGINEER)
inherits the attributes Name, SSN, …, from EMPLOYEE
• Every SECRETARY entity will have values for the inherited attributes

Specialization (1)
• Specialization is the process of defining a set of subclasses of a
superclass
• The set of subclasses is based upon some distinguishing
characteristics of the entities in the superclass
• Example: {SECRETARY, ENGINEER, TECHNICIAN} is a specialization of
EMPLOYEE based upon job type.
• Example: MANAGER is a specialization of EMPLOYEE based on the role
the employee plays
• A superclass may have several specializations

Specialization (2)
• Example: Another specialization of EMPLOYEE based on method of pay is
{SALARIED_EMPLOYEE, HOURLY_EMPLOYEE}.
• Attributes of a subclass are called specific or local attributes.
• For example, the attribute TypingSpeed of SECRETARY
• The subclass can also participate in specific relationship types.
• For example, a relationship BELONGS_TO of HOURLY_EMPLOYEE
Specialization (3)

Generalization

• Generalization is the reverse of the specialization process


• Several classes with common features are generalized into a superclass;
• original classes become its subclasses
• Example: CAR, TRUCK generalized into VEHICLE;
• both CAR, TRUCK become subclasses of the superclass VEHICLE.
• We can view {CAR, TRUCK} as a specialization of VEHICLE
• Alternatively, we can view VEHICLE as a generalization of CAR and TRUCK
Generalization (2)

Constraints on Specialization and Generalization (3)

• Two basic constraints can apply to a specialization/generalization:


• Disjointness Constraint:
• Completeness Constraint:

Constraints on Specialization and Generalization (4)

• Disjointness Constraint:
• Specifies that the subclasses of the specialization must be disjoint:
• an entity can be a member of at most one of the subclasses of the specialization
• Specified by d in EER diagram
• If not disjoint, the specialization is overlapping:
• that is, the same entity may be a member of more than one subclass of the specialization
• Specified by o in EER diagram

Constraints on Specialization and Generalization (5)

• Completeness (Exhaustiveness) Constraint:


• Total specifies that every entity in the superclass must be a member of some
subclass in the specialization/generalization
• Shown in EER diagrams by a double line
• Partial allows an entity not to belong to any of the subclasses
• Shown in EER diagrams by a single line

Constraints on Specialization and Generalization (6)

• Hence, we have four types of specialization/generalization:


• Disjoint, total
• Disjoint, partial
• Overlapping, total
• Overlapping, partial
• Note: Generalization usually is total because the superclass is derived
from the subclasses.
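As a rough illustration of the two constraints, the following Python sketch classifies a specialization from one example population. The function name `classify_specialization` and the entity identifiers are hypothetical. Note that a single population can only show whether the data is consistent with a constraint; the constraint itself is a design decision.

```python
def classify_specialization(superclass, subclasses):
    """Classify one example population of a specialization.

    superclass: set of entity identifiers
    subclasses: dict mapping subclass name -> set of entity identifiers
    """
    # Every subclass member must also be a member of the superclass.
    for name, members in subclasses.items():
        if not members <= superclass:
            raise ValueError(f"{name} contains entities outside the superclass")

    covered = set().union(*subclasses.values())
    memberships = sum(len(members) for members in subclasses.values())

    # Disjoint: no entity appears in more than one subclass.
    disjointness = "disjoint" if memberships == len(covered) else "overlapping"
    # Total: every superclass entity appears in at least one subclass.
    completeness = "total" if covered == superclass else "partial"
    return disjointness, completeness


# Hypothetical EMPLOYEE population specialized by job type.
employees = {"e1", "e2", "e3", "e4"}
by_job = {"SECRETARY": {"e1"}, "ENGINEER": {"e2"}, "TECHNICIAN": {"e3"}}
print(classify_specialization(employees, by_job))  # ('disjoint', 'partial')
```

Here employee e4 belongs to no subclass, so this population is only compatible with a partial specialization.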
Example of disjoint partial Specialization
Example of overlapping total Specialization

Specialization/Generalization Hierarchies, Lattices & Shared Subclasses (1)
• A subclass may itself have further subclasses specified on it
• This forms a hierarchy or a lattice
• A hierarchy has the constraint that every subclass has only one superclass
(called single inheritance); this is basically a tree structure
• In a lattice, a subclass can be a subclass of more than one superclass
(called multiple inheritance)
Shared Subclass “Engineering_Manager”

Specialization/Generalization Hierarchies, Lattices & Shared Subclasses (2)
• In a lattice or hierarchy, a subclass inherits attributes not only of its direct
superclass, but also of all its predecessor superclasses
• A subclass with more than one superclass is called a shared subclass (multiple
inheritance)
• Can have:
• specialization hierarchies or lattices, or
• generalization hierarchies or lattices,
• depending on how they were derived

Specialization/Generalization Hierarchies, Lattices & Shared Subclasses (3)
• In specialization, start with an entity type and then define subclasses
of the entity type by successive specialization
• Called a top-down conceptual refinement process
• In generalization, start with many entity types and generalize those
that have common properties
• Called a bottom-up conceptual synthesis process
• In practice, a combination of both processes is usually employed
Specialization / Generalization Lattice Example (UNIVERSITY)

Categories (UNION TYPES) (1)


• Most superclass/subclass relationships we have seen thus far have a single superclass
• A shared subclass is a subclass in:
• more than one distinct superclass/subclass relationship
• each relationship has a single superclass
• a shared subclass leads to multiple inheritance
• In some cases, we need to model a single superclass/subclass relationship with more
than one superclass
• Superclasses can represent different entity types
• Such a subclass is called a category or UNION TYPE

Categories (UNION TYPES) (2)


• Example: In a database for vehicle registration, a vehicle owner can be a PERSON,
a BANK (holding a lien on a vehicle) or a COMPANY.
• A category (UNION type) called OWNER is created to represent a subset of the union
of the three superclasses COMPANY, BANK, and PERSON
• A category member must exist in at least one (typically just one) of its superclasses
• This differs from a shared subclass, which is a:
• subset of the intersection of its superclasses
• a shared subclass member must exist in all of its superclasses
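The union-versus-intersection distinction can be sketched with Python sets. All entity identifiers below are hypothetical and chosen only to illustrate the two subset conditions.

```python
# Hypothetical entity identifiers, for illustration only.
person = {"p1", "p2"}
bank = {"b1"}
company = {"c1", "c2"}

# Category (union type): OWNER is a subset of the UNION of its superclasses,
# so each owner needs to exist in at least one of them.
owner = {"p1", "b1", "c2"}
is_valid_category = owner <= (person | bank | company)

# Shared subclass: a member must exist in ALL of its superclasses,
# so the subclass is a subset of their INTERSECTION.
engineer = {"e1", "e2", "e3"}
manager = {"e2", "e3", "e4"}
salaried_employee = {"e2", "e3", "e5"}
engineering_manager = {"e2"}
is_valid_shared_subclass = engineering_manager <= (engineer & manager & salaried_employee)

print(is_valid_category, is_valid_shared_subclass)  # True True
```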
Two categories (UNION types): OWNER, REGISTERED_VEHICLE
Relational model
Relational Model Concepts
• The relational Model of Data is based on the concept of a
Relation
• The strength of the relational approach to data management comes
from the formal foundation provided by the theory of relations
• A Relation is a mathematical concept based on the ideas of sets
• The model was first proposed by Dr. E.F. Codd of IBM Research in 1970 in
the following paper:
• "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM,
June 1970
• The above paper caused a major revolution in the field of database
management and earned Dr. Codd the coveted ACM Turing Award
Informal Definitions

• Informally, a relation looks like a table of values.

• A relation typically contains a set of rows.

• The data elements in each row represent certain facts that correspond to a real-world
entity or relationship
• In the formal model, rows are called tuples

• Each column has a column header that gives an indication of the meaning of the data
items in that column
• In the formal model, the column header is called an attribute name (or just attribute)
Example of a Relation
Informal Definitions
• Key of a Relation:
• Each row has a value of a data item (or set of items) that uniquely
identifies that row in the table
• Called the key
• In the STUDENT table, SSN is the key

• Sometimes row-ids or sequential numbers are assigned as keys to identify
the rows in a table
• Called artificial key or surrogate key
Formal Definitions - Schema
• The Schema (or description) of a Relation:
• Denoted by R(A1, A2, ..., An)
• R is the name of the relation
• The attributes of the relation are A1, A2, ..., An
• Example:
CUSTOMER (Cust-id, Cust-name, Address, Phone#)
• CUSTOMER is the relation name
• Defined over the four attributes: Cust-id, Cust-name, Address, Phone#
• Each attribute has a domain or a set of valid values.
• For example, the domain of Cust-id is the set of 6-digit numbers.
Formal Definitions - Tuple
• A tuple is an ordered list of values (enclosed in angle brackets ‘< … >’)
• Each value is derived from an appropriate domain.
• A row in the CUSTOMER relation is a 4-tuple, because it consists of four values, for
example:
• <632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000">
• A relation is a set of such tuples (rows)
Formal Definitions - Domain
• A domain has a logical definition:
• Example: “USA_phone_numbers” is the set of 10-digit phone numbers valid in the U.S.
• A domain also has a data type or a format defined for it.
• USA_phone_numbers may have the format (ddd)ddd-dddd, where each d is a decimal digit.
• Dates have various formats, such as year, month, and day formatted as yyyy-mm-dd, or as
dd mm, yyyy, etc.
• The attribute name designates the role played by a domain in a relation:
• Used to interpret the meaning of the data elements corresponding to that attribute
• Example: The domain Date may be used to define two attributes named “Invoice-date” and
“Payment-date” with different meanings
Formal Definitions - State
• The relation state is a subset of the Cartesian product of the domains
of its attributes
• each domain contains the set of all possible values the attribute can take.
• Example: attribute Cust-name is defined over the domain of character
strings of maximum length 25
• dom(Cust-name) is varchar(25)
• The role these strings play in the CUSTOMER relation is that of the
name of a customer.
Formal Definitions - Summary
• Formally,
• Given R(A1, A2, ..., An)
• r(R) ⊆ dom(A1) × dom(A2) × ... × dom(An)
• R(A1, A2, …, An) is the schema of the relation
• R is the name of the relation
• A1, A2, …, An are the attributes of the relation
• r(R): a specific state (or "value" or "population") of relation R – this is a set of
tuples (rows)
• r(R) = {t1, t2, …, tm} where each ti is an n-tuple
• ti = <v1, v2, …, vn> where each vj ∈ dom(Aj)
Formal Definitions - Example
• Let R(A1, A2) be a relation schema:
• Let dom(A1) = {0,1}
• Let dom(A2) = {a,b,c}
• Then: dom(A1) × dom(A2) is all possible combinations:
{<0,a> , <0,b> , <0,c>, <1,a>, <1,b>, <1,c> }
• The relation state r(R) ⊆ dom(A1) × dom(A2)
• For example: r(R) could be {<0,a> , <0,b> , <1,c> }
• this is one possible state (or “population” or “extension”) r of the relation R, defined
over A1 and A2.
• It has three 2-tuples: <0,a> , <0,b> , <1,c>
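The same example can be reproduced as a small Python sketch, using `itertools.product` for the Cartesian product and a set of tuples for the relation state. The variable names are illustrative only.

```python
from itertools import product

dom_a1 = [0, 1]
dom_a2 = ["a", "b", "c"]

# dom(A1) x dom(A2): every possible 2-tuple over the two domains.
all_tuples = set(product(dom_a1, dom_a2))

# One possible relation state r(R): any subset of the Cartesian product.
state = {(0, "a"), (0, "b"), (1, "c")}

print(len(all_tuples))      # 6
print(state <= all_tuples)  # True
```

A value outside the domains, such as `(2, "a")`, would make the subset test fail, which corresponds to a domain constraint violation.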
Definition Summary
Informal Terms                 Formal Terms
Table                          Relation
Column Header                  Attribute
All possible Column Values     Domain
Row                            Tuple
Table Definition               Schema of a Relation
Populated Table                State of the Relation
Example – A relation STUDENT
Characteristics Of Relations
• Ordering of tuples in a relation r(R):
• The tuples are not considered to be ordered, even though they appear in some
order when shown in tabular form.
• Ordering of attributes in a relation schema R (and of values within each tuple):
• We will consider the attributes in R(A1, A2, ..., An) and the values in t = <v1, v2,
..., vn> to be ordered.
• (However, a more general alternative definition of a relation does not require this
ordering. It includes both the name and the value for each of the attributes.)
• Example: t = { <name, “John”>, <SSN, 123456789> }
• This representation may be called “self-describing”.
Same state as previous Figure (but with
different order of tuples)
Characteristics Of Relations
• Values in a tuple:
• All values are considered atomic (indivisible).
• Each value in a tuple must be from the domain of the attribute for that
column
• If tuple t = <v1, v2, …, vn> is a tuple (row) in the relation state r of R(A1, A2, …, An)
• Then each vi must be a value from dom(Ai)

• A special null value is used to represent values that are unknown or not
available or inapplicable in certain tuples.
Characteristics Of Relations
• Notation:
• We refer to component values of a tuple t by:
• t[Ai] or t.Ai
• This is the value vi of attribute Ai for tuple t
• Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of
attributes Au, Av, ..., Aw, respectively in t
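A minimal sketch of this notation, assuming a tuple is stored in the self-describing form as a Python dict from attribute name to value. The helper `subtuple` is a hypothetical name, not part of any library.

```python
def subtuple(t, attributes):
    """Return t[Au, Av, ..., Aw]: the values of the listed attributes, in that order."""
    return tuple(t[a] for a in attributes)


t = {"Name": "John", "SSN": 123456789}

print(t["SSN"])                     # t[SSN] -> 123456789
print(subtuple(t, ["SSN", "Name"])) # t[SSN, Name] -> (123456789, 'John')
```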
CONSTRAINTS
Constraints determine which values are permissible and which are not in the
database.
They are of three main types:
1. Inherent or Implicit Constraints: These are based on the data model itself. (E.g., the
relational model does not allow a list as a value for any attribute)
2. Schema-based or Explicit Constraints: They are expressed in the schema by using
the facilities provided by the model. (E.g., max. cardinality ratio constraint in the ER
model)
3. Application-based or Semantic Constraints: These are beyond the expressive power
of the model and must be specified and enforced by the application programs.
Relational Integrity Constraints
• Constraints are conditions that must hold on all valid relation states.
• There are three main types of (explicit schema-based) constraints that can be
expressed in the relational model:
• Key constraints
• Entity integrity constraints
• Referential integrity constraints
• Another schema-based constraint is the domain constraint
• Every value in a tuple must be from the domain of its attribute (or it could be null, if
allowed for that attribute)
Key Constraints
• Superkey of R:
• Is a set of attributes SK of R with the following condition:
• No two tuples in any valid relation state r(R) will have the same value for SK
• That is, for any distinct tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK]
• This condition must hold in any valid state r(R)
• A key that is chosen as the primary key cannot be NULL (see entity integrity).
• Key of R:
• A "minimal" superkey
• That is, a key is a superkey K such that removal of any attribute from K results in a set
of attributes that is not a superkey (does not possess the superkey uniqueness
property)

• A key is a superkey, but not vice versa
Key Constraints (continued)
• Example: Consider the CAR relation schema:
• CAR(State, Reg#, SerialNo, Make, Model, Year)
• CAR has two keys:
• Key1 = {State, Reg#}
• Key2 = {SerialNo}
• Both are also superkeys of CAR
• {SerialNo, Make} is a superkey but not a key.
• In general:
• Any key is a superkey (but not vice versa)
• Any set of attributes that includes a key is a superkey
• A minimal superkey is also a key
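The superkey and key definitions can be sketched as checks against one relation state. The function names and the CAR rows below are hypothetical. A state can only refute a key: the real constraint must hold in every valid state, so passing this check on sample data does not prove that a set of attributes is a key.

```python
def is_superkey(rows, attrs):
    """True if no two rows of this state agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """True if attrs is a superkey and dropping any one attribute breaks uniqueness."""
    if not is_superkey(rows, attrs):
        return False
    return all(
        not is_superkey(rows, [a for a in attrs if a != removed])
        for removed in attrs
    )


# Hypothetical state of CAR (only some attributes shown).
cars = [
    {"State": "TX", "Reg#": "ABC-739", "SerialNo": "A69352", "Make": "Ford"},
    {"State": "FL", "Reg#": "ABC-739", "SerialNo": "B43696", "Make": "Ford"},
    {"State": "TX", "Reg#": "TVP-347", "SerialNo": "X83554", "Make": "Oldsmobile"},
]

print(is_key(cars, ["State", "Reg#"]))          # True
print(is_key(cars, ["SerialNo"]))               # True
print(is_superkey(cars, ["SerialNo", "Make"]))  # True
print(is_key(cars, ["SerialNo", "Make"]))       # False: Make can be removed
```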
Key Constraints (continued)
• If a relation has several candidate keys, one is chosen arbitrarily to be the primary
key.
• The primary key attributes are underlined.
• Example: Consider the CAR relation schema:
• CAR(State, Reg#, SerialNo, Make, Model, Year)
• We chose SerialNo as the primary key
• The primary key value is used to uniquely identify each tuple in a relation
• Provides the tuple identity
• Also used to reference the tuple from another tuple
• General rule: Choose as primary key the smallest of the candidate keys (in terms of
size)
CAR table with two candidate keys – LicenseNumber chosen as
Primary Key
Relational Database Schema
• Relational Database Schema:
• A set S of relation schemas that belong to the same database.
• S is the name of the whole database schema
• S = {R1, R2, ..., Rn} and a set IC of integrity constraints.
• R1, R2, …, Rn are the names of the individual relation schemas within the
database S
• Following slide shows a COMPANY database schema with 6 relation
schemas
COMPANY Database Schema
Relational Database State
• A relational database state DB of S is a set of relation states DB = {r1, r2, ..., rn}
such that each ri is a state of Ri and such that the ri relation states satisfy the
integrity constraints specified in IC.
• A relational database state is sometimes called a relational database snapshot
or instance.
• We will not use the term instance since it also applies to single tuples.
• A database state that does not meet the constraints is an invalid state
Populated database state
• Each relation will have many tuples in its current relation state
• The relational database state is the collection of all the individual relation states
• Whenever the database is changed, a new state arises
• Basic operations for changing the database:
• INSERT a new tuple in a relation
• DELETE an existing tuple from a relation
• MODIFY an attribute of an existing tuple
• Next slide (Fig. 5.6) shows an example state for the COMPANY database schema
shown in Fig. 5.5.
Populated database state for COMPANY
Entity Integrity
• Entity Integrity:
• The primary key attributes PK of each relation schema R in S cannot have null
values in any tuple of r(R).
• This is because primary key values are used to identify the individual tuples.
• t[PK] ≠ null for any tuple t in r(R)
• If PK has several attributes, null is not allowed in any of these attributes
• Note: Other attributes of R may be constrained to disallow null values, even
though they are not members of the primary key.
Referential Integrity
• A constraint involving two relations
• The previous constraints involve a single relation.
• Used to specify a relationship among tuples in two relations:
• The referencing relation and the referenced relation.
Referential Integrity
• Tuples in the referencing relation R1 have attributes FK (called
foreign key attributes) that reference the primary key attributes PK of
the referenced relation R2.
• A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
• A referential integrity constraint can be displayed in a relational
database schema as a directed arc from R1.FK to R2.
Referential Integrity (or foreign key) Constraint
• Statement of the constraint
• The value in the foreign key column (or columns) FK of the referencing
relation R1 can be either:
(1) a value of the primary key PK that exists in some tuple of the
referenced relation R2, or
(2) a null.
• In case (2), the FK in R1 should not be a part of its own primary key.
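The statement of the constraint can be sketched as a check over two relation states held as lists of dicts. The function `fk_violations`, the attribute names, and the sample rows are hypothetical. As an assumption, a foreign key with a null in any of its columns is treated as satisfying the constraint.

```python
def fk_violations(referencing, fk, referenced, pk):
    """Return the tuples of the referencing state whose FK matches no PK value."""
    pk_values = {tuple(row[a] for a in pk) for row in referenced}
    violations = []
    for row in referencing:
        value = tuple(row[a] for a in fk)
        if any(v is None for v in value):
            continue  # case (2): a null foreign key is allowed
        if value not in pk_values:
            violations.append(row)  # case (1) fails: no such primary key value
    return violations


# Hypothetical states: EMPLOYEE.Dno references DEPARTMENT.Dnumber.
department = [{"Dnumber": 1}, {"Dnumber": 5}]
employee = [
    {"Ssn": "111", "Dno": 5},
    {"Ssn": "222", "Dno": None},
    {"Ssn": "333", "Dno": 7},
]

print(fk_violations(employee, ["Dno"], department, ["Dnumber"]))
# [{'Ssn': '333', 'Dno': 7}]
```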
Referential Integrity Constraints for COMPANY database
Other Types of Constraints
• Semantic Integrity Constraints:
• based on application semantics and cannot be expressed by
the model per se
• Example: “the max. no. of hours per employee for all
projects he or she works on is 56 hrs per week”
• A constraint specification language may have to be used to
express these
• SQL-99 allows CREATE TRIGGER and CREATE ASSERTION to
express some of these semantic constraints
• Keys, Permissibility of Null values, Candidate Keys (Unique in
SQL), Foreign Keys, Referential Integrity etc. are expressed by
the CREATE TABLE statement in SQL.
Update Operations on Relations
• INSERT a tuple.
• DELETE a tuple.
• MODIFY a tuple.
• Integrity constraints should not be violated by the update operations.
• Several update operations may have to be grouped together.
• Updates may propagate to cause other updates automatically. This
may be necessary to maintain integrity constraints.
Update Operations on Relations
• In case of integrity violation, several actions can be taken:
• Cancel the operation that causes the violation (RESTRICT or REJECT option)
• Perform the operation but inform the user of the violation
• Trigger additional updates so the violation is corrected (CASCADE option, SET
NULL option)
• Execute a user-specified error-correction routine

Possible violations for each operation
• INSERT may violate any of the constraints:
• Domain constraint:
• if one of the attribute values provided for the new tuple is not of the specified attribute
domain
• Key constraint:
• if the value of a key attribute in the new tuple already exists in another tuple in the
relation
• Referential integrity:
• if a foreign key value in the new tuple references a primary key value that does not exist
in the referenced relation
• Entity integrity:
• if the primary key value is null in the new tuple
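The four INSERT violations can be reproduced with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `department` and `employee` tables and their columns are hypothetical, the domain of `age` is approximated with a CHECK constraint because SQLite does not enforce declared column types strictly, and SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE department (
    dnumber INTEGER NOT NULL PRIMARY KEY
);
CREATE TABLE employee (
    ssn TEXT NOT NULL PRIMARY KEY,                 -- key + entity integrity
    age INTEGER CHECK (age BETWEEN 16 AND 120),    -- stand-in for a domain
    dno INTEGER REFERENCES department(dnumber)     -- referential integrity
);
INSERT INTO department VALUES (5);
INSERT INTO employee VALUES ('123456789', 30, 5);
""")


def try_insert(row):
    """Insert one employee row; return 'ok' or the database's error message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee (ssn, age, dno) VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = {
    "domain": try_insert(("111111111", 7, 5)),        # age outside the domain
    "key": try_insert(("123456789", 40, 5)),          # duplicate key value
    "referential": try_insert(("222222222", 40, 9)),  # department 9 does not exist
    "entity": try_insert((None, 40, 5)),              # null primary key
    "valid": try_insert(("333333333", 40, None)),     # null FK is allowed
}
for name, outcome in results.items():
    print(name, "->", outcome)
```

Only the last row is accepted; each of the others is rejected, which corresponds to the RESTRICT (reject) option for handling violations.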
Possible violations for each operation
• DELETE may violate only referential integrity:
• If the primary key value of the tuple being deleted is referenced from other tuples in
the database
• Can be remedied by several actions: RESTRICT, CASCADE, SET NULL (see Chapter 6 for
more details)
• RESTRICT option: reject the deletion
• CASCADE option: propagate the deletion by also deleting the referencing tuples
• SET NULL option: set the foreign keys of the referencing tuples to NULL
• One of the above options must be specified during database design for each foreign
key constraint
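The three options can be compared with a small `sqlite3` sketch. The tables and the helper `delete_department` are hypothetical; the helper rebuilds the same two-table database with a different `ON DELETE` action each time, deletes the referenced department, and reports what is left.

```python
import sqlite3


def delete_department(on_delete):
    """Delete department 5 under the given ON DELETE action.

    Returns (remaining departments, remaining employees).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(f"""
    CREATE TABLE department (dnumber INTEGER NOT NULL PRIMARY KEY);
    CREATE TABLE employee (
        ssn TEXT NOT NULL PRIMARY KEY,
        dno INTEGER REFERENCES department(dnumber) ON DELETE {on_delete}
    );
    INSERT INTO department VALUES (5);
    INSERT INTO employee VALUES ('123456789', 5);
    """)
    try:
        conn.execute("DELETE FROM department WHERE dnumber = 5")
    except sqlite3.IntegrityError:
        pass  # RESTRICT: the deletion is rejected
    departments = conn.execute("SELECT dnumber FROM department").fetchall()
    employees = conn.execute("SELECT ssn, dno FROM employee").fetchall()
    conn.close()
    return departments, employees


print(delete_department("RESTRICT"))  # ([(5,)], [('123456789', 5)])
print(delete_department("CASCADE"))   # ([], [])
print(delete_department("SET NULL"))  # ([], [('123456789', None)])
```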
Possible violations for each operation
• UPDATE may violate domain constraint and NOT NULL constraint on an attribute
being modified
• Any of the other constraints may also be violated, depending on the attribute
being updated:
• Updating the primary key (PK):
• Similar to a DELETE followed by an INSERT
• Need to specify similar options to DELETE
• Updating a foreign key (FK):
• May violate referential integrity
• Updating an ordinary attribute (neither PK nor FK):
• Can only violate domain constraints
How to model inheritance in relational database
• There are three possible strategies for converting the
entities connected by inheritance in the logical
model into the tables of the physical model:
• One table per inheritance hierarchy.
• One table per entity.
• One table per entity with all attributes.
How to model inheritance in relational database
1. One table per inheritance hierarchy:
• Only one table represents the whole inheritance hierarchy.
• This table has one column for each attribute in the hierarchy.
• All the attributes from every entity in the hierarchy appear in the generated table.
How to model inheritance in relational database
1. One table per inheritance hierarchy:
• Table vehicle has columns taken from four different entities in the inheritance model:
vehicle, car, bike, and electrical_bike
• The generated table has an additional column called discriminator, which is
used to identify which type of entity in the hierarchy the record represents
• Many columns are nullable
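A minimal `sqlite3` sketch of the single-table strategy follows. Only the table name `vehicle`, the entity names, and the `discriminator` column come from the slide; the attribute columns (`brand`, `doors`, `gears`, `battery_wh`) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle (
    vehicle_id    INTEGER NOT NULL PRIMARY KEY,
    discriminator TEXT NOT NULL
        CHECK (discriminator IN ('vehicle', 'car', 'bike', 'electrical_bike')),
    brand         TEXT NOT NULL,  -- assumed attribute of vehicle
    doors         INTEGER,        -- assumed attribute of car, NULL otherwise
    gears         INTEGER,        -- assumed attribute of bike, NULL otherwise
    battery_wh    INTEGER         -- assumed attribute of electrical_bike
);
INSERT INTO vehicle VALUES (1, 'car', 'BrandA', 4, NULL, NULL);
INSERT INTO vehicle VALUES (2, 'electrical_bike', 'BrandB', NULL, 9, 500);
""")

# The discriminator selects the rows that represent one entity type.
cars = conn.execute(
    "SELECT vehicle_id, brand, doors FROM vehicle WHERE discriminator = 'car'"
).fetchall()
print(cars)  # [(1, 'BrandA', 4)]
```

Every subtype-specific column has to be nullable, because a row of one type has no value for the attributes of the other types.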
How to model inheritance in relational database
2. One table per entity

• We have one table per entity in the inheritance hierarchy.
• We have foreign keys between every parent and child entity connected by inheritance.
• The discriminator column is not needed; different objects (cars, bikes) are stored in
different tables.
• The relationship is only between the car_owner and car tables.
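The table-per-entity strategy can be sketched in `sqlite3` as follows. The column names and rows are assumptions, and so is the choice to let each child table reuse the parent's `vehicle_id` as both its primary key and its foreign key; the slide only states that parent and child tables are linked by foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE vehicle (
    vehicle_id INTEGER NOT NULL PRIMARY KEY,
    brand      TEXT NOT NULL
);
CREATE TABLE car (
    vehicle_id INTEGER NOT NULL PRIMARY KEY REFERENCES vehicle(vehicle_id),
    doors      INTEGER NOT NULL
);
CREATE TABLE bike (
    vehicle_id INTEGER NOT NULL PRIMARY KEY REFERENCES vehicle(vehicle_id),
    gears      INTEGER NOT NULL
);
INSERT INTO vehicle VALUES (1, 'BrandA'), (2, 'BrandB');
INSERT INTO car VALUES (1, 4);
INSERT INTO bike VALUES (2, 21);
""")

# Inherited attributes are recovered by joining the child with its parent.
rows = conn.execute("""
    SELECT v.vehicle_id, v.brand, c.doors
    FROM car AS c JOIN vehicle AS v ON v.vehicle_id = c.vehicle_id
""").fetchall()
print(rows)  # [(1, 'BrandA', 4)]

# A car row without a matching vehicle row is rejected by the foreign key.
try:
    conn.execute("INSERT INTO car VALUES (3, 2)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```

No column is nullable here and no discriminator is needed, at the cost of a join whenever inherited attributes are read.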
How to model inheritance in relational database
3. One table per entity with all attributes
• Create one table per entity in the inheritance hierarchy,
but each table will have all the attributes of its parent
entities.
• Note the composition of the primary keys in the tables
car and bike. For example, the bike primary key is
formed by the columns bike_id and vehicle_id.
How to model inheritance in relational database
3. One table per entity with all attributes
• Note the duplication of data.
Thank you
