DBS-er Model

The document provides an overview of Entity Relationship (E-R) modeling, detailing the concepts of entities, attributes, and relationships within a database. It explains the structure of E-R diagrams, including the representation of different types of relationships, cardinality constraints, and the distinction between strong and weak entity sets. Additionally, it discusses advanced features such as specialization, generalization, and aggregation, as well as the implications for relational database design.

Entity-Relationship Modeling (E-R Model)

 Database can be modeled as:


– Collection of entities
– Relationship among entities

 Entity is an object that exists and is distinguishable from other objects.


– Example: specific person, company, event, plant

 Entities have attributes


– Example: people have names and addresses

 An entity set is a set of entities of the same type that share the same properties.
– Example: set of all persons, companies, trees, holidays
Entity Sets customer and loan

[Figure: entity set customer with attributes customer_id, customer_name, customer_street, customer_city; entity set loan with attributes loan_number, amount]


Relationship Sets
• A relationship is an association among several entities
Example:
[Figure: customer entity Hayes linked by relationship set depositor to account entity A-102]

• A relationship set is a mathematical relation among n ≥ 2 entities, each taken from entity sets
{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
where (e1, e2, …, en) is a relationship

Example:
(Hayes, A-102) ∈ depositor
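
The set definition above can be checked mechanically. The sketch below is illustrative only: it treats entity sets as Python sets of identifiers, and the entities "Jones" and "A-215" are made-up additions to the Hayes / A-102 example.

```python
from itertools import product

customer = {"Hayes", "Jones"}      # entity set E1 ("Jones" is a made-up entity)
account = {"A-102", "A-215"}       # entity set E2 ("A-215" is a made-up entity)
depositor = {("Hayes", "A-102"), ("Jones", "A-215")}


def is_relationship_set(rel, *entity_sets):
    """True if every tuple in rel takes its i-th component from the i-th entity set."""
    return set(rel) <= set(product(*entity_sets))
```

A relationship set is therefore any subset of the Cartesian product E1 × E2 × … × En; a tuple that mentions an unknown entity makes the check fail.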
Relationship Set borrower
Relationship Sets… Contd.

 An attribute can also be property of a relationship set.

 For instance, the depositor relationship set between entity sets customer and
account may have the attribute access-date
Degree of a Relationship Set

 Refers to number of entity sets that participate in a relationship set.

 Relationship sets that involve two entity sets are binary (or degree two).
Generally, most relationship sets in a database system are binary.

 Relationship sets may involve more than two entity sets.


 Example: Suppose employees of a bank may have jobs (responsibilities) at multiple
branches, with different jobs at different branches. Then there is a ternary relationship
set between entity sets employee, job, and branch

 Relationships between more than two entity sets are rare. Most relationships
are binary.
E-R Diagram with a Ternary Relationship
Attributes
 An entity is represented by a set of attributes, that is, descriptive properties
possessed by all members of an entity set.
Example:
customer = (customer_id, customer_name, customer_street, customer_city )
loan = (loan_number, amount )

 Domain – the set of permitted values for each attribute

 Attribute types:
– Simple and composite attributes.
– Single-valued and multi-valued attributes
• Example: multivalued attribute: phone_numbers
– Derived attributes
• Can be computed from other attributes
• Example: age, given date_of_birth
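
As a rough illustration of the attribute types listed above, the following sketch models a customer in Python. The class layout, the `Name` helper and the `age` method are assumptions made for this example, not part of the slides: `name` stands for a composite attribute, `phone_numbers` for a multivalued one, and `age` for a derived one.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Name:                       # composite attribute: built from component attributes
    first_name: str
    last_name: str


@dataclass
class Customer:
    customer_id: str              # simple, single-valued attribute
    name: Name                    # composite attribute
    date_of_birth: date
    phone_numbers: set = field(default_factory=set)   # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        years = today.year - self.date_of_birth.year
        # Subtract one year if this year's birthday has not happened yet.
        birthday_pending = (today.month, today.day) < (
            self.date_of_birth.month, self.date_of_birth.day)
        return years - 1 if birthday_pending else years
```

Storing only date_of_birth and computing age on demand avoids keeping a value that would go stale every year.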
Composite Attributes
Relationship Sets with Attributes

E-R Diagram With Composite, Multivalued, and Derived Attributes
Mapping Cardinality Constraints

 Express the number of entities to which another entity can be associated via a
relationship set.

 Most useful in describing binary relationship sets.

 For a binary relationship set the mapping cardinality must be one of the
following types:
– One to one

– One to many

– Many to one

– Many to many
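
The four types can be told apart by counting partners on each side. The helper below is a sketch under the assumption that a binary relationship instance is given as (a, b) pairs; the function name is made up. It reports the tightest type consistent with one instance, whereas the real constraint is a statement about all legal instances.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship instance from A to B given as (a, b) pairs."""
    pairs = set(pairs)
    a_counts = Counter(a for a, _ in pairs)   # number of B entities per A entity
    b_counts = Counter(b for _, b in pairs)   # number of A entities per B entity
    a_has_many = any(n > 1 for n in a_counts.values())
    b_has_many = any(n > 1 for n in b_counts.values())
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"      # one A entity maps to several B entities
    if b_has_many:
        return "many to one"      # several A entities map to one B entity
    return "one to one"
```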
Mapping Cardinalities…contd.

[Figures: one-to-one and one-to-many mappings between sets A and B; many-to-one and many-to-many mappings between sets A and B]

Note: Some elements in A and B may not be mapped to any elements in the other set
E-R Diagrams

 Rectangles represent entity sets.

 Diamonds represent relationship sets.

 Lines link attributes to entity sets and entity sets to relationship sets.

 Ellipses represent attributes


 Double ellipses represent multivalued attributes.
 Dashed ellipses denote derived attributes.
 Underline indicates primary key attributes
Roles
• Entity sets of a relationship need not be distinct.

• The labels “manager” and “worker” are called roles; they specify how employee entities
interact via the works_for relationship set.

• Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to
rectangles.

• Role labels are optional, and are used to clarify semantics of the relationship
Cardinality Constraints
• We express cardinality constraints by drawing either a directed line (→), signifying
“one,” or an undirected line (—), signifying “many,” between the relationship set and
the entity set.
• One-to-one relationship:
– A customer is associated with at most one loan via the relationship borrower
– A loan is associated with at most one customer via borrower
One-To-Many Relationship
• In a one-to-many relationship, a loan is associated with at most one customer via borrower,
and a customer is associated with several (including 0) loans via borrower
Many-To-One Relationships
• In a many-to-one relationship, a loan is associated with several (including 0) customers
via borrower, and a customer is associated with at most one loan via borrower

Many-To-Many Relationship
 A customer is associated with several (possibly 0) loans via borrower
 A loan is associated with several (possibly 0) customers via borrower
Participation of an Entity Set in a Relationship Set

 Total participation (indicated by double line): every entity in the entity set participates in at
least one relationship in the relationship set
 E.g. participation of loan in borrower is total
 every loan must have a customer associated to it via borrower

 Partial participation: some entities may not participate in any relationship in the relationship
set
 Example: participation of customer in borrower is partial
Alternative Notation for Cardinality Limits
 Cardinality limits can also express participation constraints
Weak Entity Sets
 An entity set that does not have a primary key is referred to as a weak entity set.

 The existence of a weak entity set depends on the existence of an identifying entity
set
– It must relate to the identifying entity set via a total, one-to-many relationship
set from the identifying to the weak entity set
– Identifying relationship depicted using a double diamond

 The discriminator (or partial key) of a weak entity set is the set of attributes that
distinguishes among all the entities of a weak entity set.

 The primary key of a weak entity set is formed by the primary key of the strong
entity set on which the weak entity set is existence dependent, plus the weak entity
set’s discriminator.
Weak Entity Sets … .contd

 We depict a weak entity set by double rectangles.

 We underline the discriminator of a weak entity set with a dashed line.

 payment_number – discriminator of the payment entity set

 Primary key for payment – (loan_number, payment_number)


Reduction to Relation Schemas

 Primary keys allow entity sets and relationship sets to be expressed uniformly
as relation schemas that represent the contents of the database.

 A database which conforms to an E-R diagram can be represented by a


collection of schemas.

 For each entity set and relationship set there is a unique schema that is assigned
the name of the corresponding entity set or relationship set.

 Each schema has a number of columns (generally corresponding to attributes),


which have unique names.
Representing Entity Sets as Schemas

 A strong entity set reduces to a schema with the same attributes.

 A weak entity set becomes a table that includes a column for the primary key
of the identifying strong entity set
payment =
( loan_number, payment_number, payment_date, payment_amount )

Representing Relationship Sets as Schemas

 A many-to-many relationship set is represented as a schema with attributes


for the primary keys of the two participating entity sets, and any descriptive
attributes of the relationship set.

 Example: schema for relationship set borrower


borrower = (customer_id, loan_number )
Redundancy of Schemas
• Many-to-one and one-to-many relationship sets that are total on the many-side can be
represented by adding an extra attribute to the “many” side, containing the primary key
of the “one” side

• Example: Instead of creating a schema for relationship set account_branch, add an
attribute branch_name to the schema arising from entity set account
Redundancy of Schemas…contd.

• For one-to-one relationship sets, either side can be chosen to act as the
“many” side.
- That is, an extra attribute can be added to either of the tables corresponding
to the two entity sets.

• If participation is partial on the “many” side, replacing a schema by an extra
attribute in the schema corresponding to the “many” side could result in null
values

• The schema corresponding to a relationship set linking a weak entity set to its
identifying strong entity set is redundant.

• Example: The payment schema already contains the attributes that would
appear in the loan_payment schema (i.e., loan_number and
payment_number).
Composite and Multivalued Attributes
 Composite attributes are flattened out by creating a separate attribute for each
component attribute
– Example: given entity set customer with composite attribute name with component
attributes first_name and last_name, the schema corresponding to the entity set has two
attributes, name.first_name and name.last_name

 A multivalued attribute M of an entity E is represented by a separate schema EM


– Schema EM has attributes corresponding to the primary key of E and an attribute
corresponding to multivalued attribute M

– Example: Multivalued attribute dependent_names of employee is represented by a


schema:
employee_dependent_names = ( employee_id, dname)

– Each value of the multivalued attribute maps to a separate tuple of the relation on schema
EM
For example, an employee entity with primary key 123-45-6789 and dependents Jack
and Jane maps to two tuples:
(123-45-6789 , Jack) and (123-45-6789 , Jane)
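
The mapping of a multivalued attribute to a separate schema can be sketched in a few lines. The function name and the dictionary layout below are assumptions made for illustration; only the employee example comes from the slide.

```python
def flatten_multivalued(entities, key, attr):
    """Return one (key value, attribute value) tuple per value of a multivalued attribute."""
    return [(e[key], value) for e in entities for value in e[attr]]


employees = [
    {"employee_id": "123-45-6789", "dependent_names": ["Jack", "Jane"]},
]

# Tuples of the relation on schema employee_dependent_names = (employee_id, dname)
employee_dependent_names = flatten_multivalued(
    employees, "employee_id", "dependent_names")
```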
Extended E-R Features: Specialization

• Top-down design process; we designate subgroupings within an entity set that


are distinctive from other entities in the set.

• These subgroupings become lower-level entity sets that have attributes or


participate in relationships that do not apply to the higher-level entity set.

• Depicted by a triangle component labeled ISA (E.g. customer “is a” person).

• Attribute inheritance – a lower-level entity set inherits all the attributes and
relationship participation of the higher-level entity set to which it is linked.
Specialization
Extended ER Features: Generalization
 A bottom-up design process – combine a number of entity sets that share the same
features into a higher-level entity set.

 Specialization and generalization are simple inversions of each other; they are
represented in an E-R diagram in the same way.

 The terms specialization and generalization are used interchangeably.


Specialization and Generalization

 Can have multiple specializations of an entity set based on different features.


 E.g. permanent_employee vs. temporary_employee, in addition to officer vs.
secretary vs. teller
 Each particular employee would be
- a member of one of permanent_employee or temporary_employee,
- and also a member of one of officer, secretary, or teller
 The ISA relationship is also referred to as a superclass-subclass relationship
Design Constraints on a Specialization/Generalization
 Constraint on which entities can be members of a given lower-level entity set.
– Condition-defined
Example: all customers over 65 years are members of senior-citizen entity
set; senior-citizen ISA person.
– User-defined
 Constraint on whether or not entities may belong to more than one lower-level
entity set within a single generalization.
– Disjoint
• An entity can belong to only one lower-level entity set
• Noted in E-R diagram by writing disjoint next to the ISA triangle
– Overlapping
• An entity can belong to more than one lower-level entity set
• Completeness constraint -- specifies whether or not an entity in the higher-level
entity set must belong to at least one of the lower-level entity sets within a
generalization.
– Total : an entity must belong to one of the lower-level entity sets
– Partial: an entity need not belong to one of the lower-level entity sets
Attribute Defined or Condition Defined Subclasses: In some specializations we can
determine exactly the entities that will become members of each subclass by placing a
condition on the value of some attribute of the superclass. Such subclasses are called
predicate-defined (or condition-defined) subclasses.

User Defined Subclass: When we do not have a condition for determining membership in
a subclass, the subclass is called user-defined. Membership in such a subclass is
determined by the database users when they apply the operation to add an entity to the
subclass
A total specialization constraint specifies that every entity in the superclass must be a
member of at least one subclass in the specialization. A double line is used to display a
total specialization.

E.g.: every EMPLOYEE must be either an HOURLY_EMPLOYEE or a SALARIED_EMPLOYEE

A partial specialization allows an entity not to belong to any of the subclasses. A single line is
used to display a partial specialization.
Aggregation

 Consider the ternary relationship works_on, which we saw earlier


 Suppose we want to record managers for tasks performed by an employee at a branch
Aggregation … . contd

 Relationship sets works_on and manages represent overlapping information


– Every manages relationship corresponds to a works_on relationship
– However, some works_on relationships may not correspond to any manages
relationships
• So we can’t discard the works_on relationship

 Eliminate this redundancy via aggregation


– Treat relationship as an abstract entity
– Allows relationships between relationships
– Abstraction of relationship into new entity

 Without introducing redundancy, the following diagram represents:


– An employee works on a particular job at a particular branch
– An employee, branch, job combination may have an associated manager
E-R Diagram With Aggregation
Representing Specialization via Schemas

• Method 1:
– Form a schema for the higher-level entity set
– Form a schema for each lower-level entity set, including the primary key
of the higher-level entity set and local attributes

schema      attributes
person      name, street, city
customer    name, credit_rating
employee    name, salary

– Drawback: getting information about an employee requires
accessing two relations, the one corresponding to the low-level
schema and the one corresponding to the high-level schema
Representing Specialization as Schemas (Cont.)
• Method 2:
– Form a schema for each entity set with all local and inherited attributes

schema      attributes
person      name, street, city
customer    name, street, city, credit_rating
employee    name, street, city, salary

– If specialization is total, the schema for the generalized entity set (person)
is not required to store information
• Can be defined as a “view” relation containing the union of the specialization
relations
• But an explicit schema may still be needed for foreign key constraints

– Drawback: street and city may be stored redundantly for people who are
both customers and employees
E-R Design Decisions

 The use of an attribute or entity set to represent an object.

 Whether a real-world concept is best expressed by an entity set or a relationship set.

 The use of a ternary relationship versus a pair of binary relationships.

 The use of a strong or weak entity set.

 The use of specialization/generalization – contributes to modularity in the design.

 The use of aggregation – can treat the aggregate entity set as a single unit without
concern for the details of its internal structure.
E-R Diagram for a Banking Enterprise
Data Base Design
Relational Database Design

 Features of Good Relational Design

 Atomic Domains and First Normal Form

 Decomposition Using Functional Dependencies

 Functional Dependency Theory

 Algorithms for Functional Dependencies

 Decomposition Using Multivalued Dependencies

 More Normal Form

 Database-Design Process

 Modeling Temporal Data


The Banking Schema
 branch = (branch_name, branch_city, assets)

 customer = (customer_id, customer_name, customer_street, customer_city)

 loan = (loan_number, amount)

 account = (account_number, balance)

 employee = (employee_id, employee_name, telephone_number, start_date)

 dependent_name = (employee_id, dname)

 account_branch = (account_number, branch_name)

 loan_branch = (loan_number, branch_name)

 borrower = (customer_id, loan_number)

 depositor = (customer_id, account_number)

 cust_banker = (customer_id, employee_id, type)

 works_for = (worker_employee_id, manager_employee_id)

 payment = (loan_number, payment_number, payment_date, payment_amount)

 savings_account = (account_number, interest_rate)

 checking_account = (account_number, overdraft_amount)


Design Alternatives: Combine Schemas
 Suppose we combine borrower and loan to get
bor_loan = (customer_id, loan_number, amount )
 Result is possible repetition of information (L-100 in example below)
Design Alternatives: A Combined Schema Without Repetition
 Consider combining loan_branch and loan
loan_amt_br = (loan_number, amount, branch_name)

 No repetition, but we may have to create a tuple with a null value for amount
Design Alternatives: Smaller Schemas
 Suppose we had started with bor_loan. How would we know to split up (decompose) it
into borrower and loan?

 Write a rule “if there were a schema (loan_number, amount), then loan_number
would be a candidate key”

 Denote as a functional dependency:
loan_number → amount

 In bor_loan, because loan_number is not a candidate key, the amount of a loan may have
to be repeated. This indicates the need to decompose bor_loan.

 Not all decompositions are good. Suppose we decompose employee into


employee1 = (employee_id, employee_name)
employee2 = (employee_name, telephone_number, start_date)

 The next slide shows how we lose information - we cannot reconstruct the original
employee relation -- and so, this is a lossy decomposition.
A Lossy Decomposition
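
The loss of information can be reproduced with a small experiment. The sketch below is illustrative only: the two employees named "Kim" and their phone numbers and dates are made-up data, and `project` and `natural_join` are minimal helpers written for this example.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = []
    for x in r:
        for y in s:
            shared = x.keys() & y.keys()
            if all(x[a] == y[a] for a in shared):
                out.append({**x, **y})
    return out


# Made-up instance: two different employees who share the same name.
employee = [
    {"employee_id": "e1", "employee_name": "Kim",
     "telephone_number": "555-0101", "start_date": "2001-03-01"},
    {"employee_id": "e2", "employee_name": "Kim",
     "telephone_number": "555-0199", "start_date": "2004-07-15"},
]

employee1 = project(employee, ["employee_id", "employee_name"])
employee2 = project(employee, ["employee_name", "telephone_number", "start_date"])
rejoined = natural_join(employee1, employee2)
```

Joining on employee_name pairs each Kim with both phone numbers, so `rejoined` holds four tuples where the original relation had two. The extra tuples mean we can no longer tell which phone number belongs to which employee.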
Guideline 1
• Design a relation schema so that it is easy to explain its meaning. Do
not combine attributes from multiple entity types and relationship
types into a single relation. Intuitively, if a relation schema
corresponds to one entity type or one relationship type, it is
straightforward to interpret and to explain its meaning. Otherwise, if
the relation corresponds to a mixture of multiple entities and
relationships, semantic ambiguities will result and the relation cannot
be easily explained.
First Normal Form

 Domain is atomic if its elements are considered to be indivisible units


– Examples of non-atomic domains:
• Set of names, composite attributes
• Identification numbers like CS101 that can be broken up into parts

 A relational schema R is in first normal form if the domains of all attributes of


R are atomic.

 Non-atomic values complicate storage and encourage redundant (repeated) storage of


data
– E.g.: Set of accounts stored with each customer, and set of owners stored with each
account

– We assume all relations are in first normal form


First Normal Form…contd.

• Atomicity is actually a property of how the elements of the domain are used.
– E.g.: Strings would normally be considered indivisible

– Suppose that students are given roll numbers which are strings of the form
CS0012 or EE1127

– If the first two characters are extracted to find the department, the domain
of roll numbers is not atomic.

– Doing so is a bad idea: leads to encoding of information in application


program rather than in the database.
Goal — Devise a Theory for the Following
 Decide whether a particular relation R is in “good” form.

 In the case that a relation R is not in “good” form, decompose it into a set of
relations {R1, R2, ..., Rn} such that:

– Each relation is in good form

– The decomposition is a lossless-join decomposition

 Our theory is based on:

– Functional dependencies

– Multivalued dependencies
Functional Dependencies

 Constraints on the set of legal relations.

 Require that the value for a certain set of attributes determines uniquely the
value for another set of attributes.

 A functional dependency is a generalization of the notion of a key.

 Let R be a relation schema, with α ⊆ R and β ⊆ R

 The functional dependency α → β holds on R if and only if for any legal
relations r(R), whenever any two tuples t1 and t2 of r agree on the attributes α,
they also agree on the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]

 E.g.: Consider r(A, B) with the following instance of r:

A B
1 4
1 5
3 7

 On this instance, A → B does NOT hold, but B → A does hold.
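
Whether a single instance satisfies a functional dependency can be tested directly from the definition. The function below is a minimal sketch; its name and the list-of-dicts representation are assumptions for illustration. Passing this test on one instance does not show that the dependency holds on the schema.

```python
def fd_holds(rows, lhs, rhs):
    """True if the instance `rows` satisfies lhs -> rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores `right` the first time `left` is seen
        # and returns the stored value on every later occurrence.
        if seen.setdefault(left, right) != right:
            return False
    return True


# The instance of r(A, B) from the slide.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
```

Here `fd_holds(r, ["A"], ["B"])` is False because the two tuples with A = 1 disagree on B, while `fd_holds(r, ["B"], ["A"])` is True.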


List all functional dependencies that hold for the following relations

A B C D
a1 b1 c1 d1
a1 b2 c1 d2
a2 b2 c2 d2
a2 b3 c2 d3
a3 b3 c2 d4

A B C
a1 b1 c1
a1 b1 c2
a2 b1 c1
a2 b1 c3
A B C D
a1 b1 c1 d1
a1 b2 c1 d2
a2 b2 c2 d2
a2 b3 c2 d3
a3 b3 c2 d4
In the above relation r the following functional dependency holds:

A → C

A B C
a1 b1 c1
a1 b1 c2
a2 b1 c1
a2 b1 c3

Give all functional dependencies satisfied by the above relation r.

A → B and C → B
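
An answer like this can be cross-checked by brute force. The sketch below is an assumption-laden helper, not a method from the slides: it enumerates dependencies with a single attribute on the right and a minimal, non-empty left side, and so skips trivial dependencies and those implied by a smaller left side.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if the instance `rows` satisfies lhs -> rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def minimal_fds(rows, attrs):
    """All satisfied FDs X -> a with one RHS attribute and a minimal, non-empty X."""
    found = []
    for a in attrs:
        others = [x for x in attrs if x != a]
        for size in range(1, len(others) + 1):
            for lhs in combinations(others, size):
                # Skip left sides that contain a smaller left side already found for a.
                if any(set(l) <= set(lhs) for l, r in found if r == a):
                    continue
                if fd_holds(rows, lhs, (a,)):
                    found.append((lhs, a))
    return found


# The three-attribute relation r(A, B, C) from the slide.
r = [dict(zip("ABC", t)) for t in [
    ("a1", "b1", "c1"), ("a1", "b1", "c2"),
    ("a2", "b1", "c1"), ("a2", "b1", "c3"),
]]
```

On this instance `minimal_fds(r, "ABC")` reports exactly A → B and C → B, which agrees with the answer above.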
Emp_Proj (Ssn, Pnumber, Hours, Ename, Pname, Plocations)

• FD1: {Ssn, Pnumber} → Hours
• FD2: Ssn → Ename
• FD3: Pnumber → {Pname, Plocations}
Functional Dependencies …Contd.
• K is a superkey for relation schema R if and only if K → R

• K is a candidate key for R if and only if
K → R, and for no α ⊂ K, α → R

• Functional dependencies allow us to express constraints that cannot be
expressed using superkeys.

• Consider the schema: bor_loan = (customer_id, loan_number, amount).
We expect this functional dependency to hold:
loan_number → amount
but would not expect the following to hold:
amount → customer_id
Use of Functional Dependencies
 We use functional dependencies to:
– Test relations to see if they are legal under a given set of functional
dependencies.
• If a relation r is legal under a set F of functional dependencies, we say
that r satisfies F.
– Specify constraints on the set of legal relations
• We say that F holds on R if all legal relations on R satisfy the set of
functional dependencies F.
 Note: A specific instance of a relation schema may satisfy a functional
dependency even if the functional dependency does not hold on all legal
instances.
 E.g.: A specific instance of bor_loan may, by chance, satisfy
amount → customer_id.
• A functional dependency is trivial if it is satisfied by all instances of a relation
General form: α → β is trivial if β ⊆ α
- E.g.: {customer_name, loan_number} → customer_name
- E.g.: customer_name → customer_name
Closure of a Set of Functional Dependencies
 Given a set F of functional dependencies, there are certain other functional
dependencies that are logically implied by F.

– E.g.: If A → B and B → C, then we can infer that A → C

 The set of all functional dependencies logically implied by F is the closure of F.

 We denote the closure of F by F+.

 F+ is a superset of F.
Procedure for Computing F+
• To compute the closure of a set of functional dependencies F:

F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further
Closure of a Set of Functional Dependencies
 We can find all of F+ by applying Armstrong’s Axioms:


– IR1: If β ⊆ α, then α → β (reflexivity)
– IR2: If α → β, then γα → γβ (augmentation)
– IR3: If α → β and β → γ, then α → γ (transitivity)
– IR4: If x → yz, then x → y (decomposition / projective)
– IR5: If x → y and x → z, then x → yz (union / additive)
– IR6: If x → y and wy → z, then wx → z (pseudotransitive)
(IR1 to IR3 are Armstrong's axioms proper; IR4 to IR6 can be derived from them.)
 These rules are
– sound (generate only functional dependencies that actually hold) and
– complete (generate all functional dependencies that hold).
• R = (Ssn, Pnumber, Hours, Ename, Pname, Plocation)
F = { Ssn → Ename,
      Pnumber → {Pname, Plocation},
      {Ssn, Pnumber} → Hours }
Compute Members of F+
{Ssn}+ = {Ssn, Ename}
{Pnumber}+ = {Pnumber, Pname, Plocation}
{Ssn, Pnumber}+ = {Ssn, Pnumber, Ename, Pname, Plocation, Hours}
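
Attribute closures like these can be computed with a simple fixed-point loop. The sketch below uses the usual textbook iteration, which the slides do not spell out, so the function name and the representation of dependencies as (left side, right side) pairs of sets are assumptions.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already derivable, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]
```

Calling `closure({"Ssn", "Pnumber"}, F)` yields all six attributes of R, which is why {Ssn, Pnumber} is a superkey of this schema.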
• Given R = (A, B, C, G, H, I)
F = { A → B
      A → C
      CG → H
      CG → I
      B → H }
Compute F+.
• Members of F+
– A → H
• by transitivity from A → B and B → H
– AG → I
• by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
– CG → HI
• by augmenting CG → I to infer CG → CGI,
and augmenting of CG → H to infer CGI → HI,
and then transitivity
• Compute the closure of the following set F of functional dependencies for relation
schema R = (A, B, C, D, E)

A → BC
CD → E
B → D
E → A
• First Normal Form
DEPARTMENT (Dname, Dnumber, Dmgr_ssn, Dlocations)

- Remove the attribute Dlocations that violates 1NF and place it in a separate relation
DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT. The primary key
of this relation is the combination {Dnumber, Dlocations}.

DEPT_LOCATIONS (Dnumber, Dlocations)

- Expand the key so that there will be a separate tuple in the original DEPARTMENT
relation for each location of a department.

Before:
Dname     Dnumber  Dmgr_ssn  Dlocations
Research  10       35        {Lab1, Lab2}
Admin     1        25        First floor

After:
Dname     Dnumber  Dmgr_ssn  Dlocations
Research  10       35        Lab1
Research  10       35        Lab2
Admin     1        25        First floor

- If a maximum number of values is known for the attribute, replace the attribute by that many
atomic attributes.

Dname     Dnumber  Dmgr_ssn  Dlocation1   Dlocation2
Research  10       35        Lab1         Lab2
Admin     1        25        First floor  Null
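
The first two approaches can be sketched on the DEPARTMENT rows shown above. The helper names `split_out` and `expand` are made up for this example, and relations are represented as lists of dictionaries purely for illustration.

```python
department = [
    {"Dname": "Research", "Dnumber": 10, "Dmgr_ssn": 35, "Dlocations": ["Lab1", "Lab2"]},
    {"Dname": "Admin", "Dnumber": 1, "Dmgr_ssn": 25, "Dlocations": ["First floor"]},
]


def split_out(rows, key, attr):
    """Approach 1: move the multivalued attribute into its own relation."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    side = [{key: row[key], attr: value} for row in rows for value in row[attr]]
    return base, side


def expand(rows, attr):
    """Approach 2: one tuple per value, repeating all other attributes."""
    return [{**row, attr: value} for row in rows for value in row[attr]]


department_1nf, dept_locations = split_out(department, "Dnumber", "Dlocations")
department_expanded = expand(department, "Dlocations")
```

The first approach avoids repeating Dname and Dmgr_ssn for every location, which the expanded relation does for Research.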
• Insertion Anomalies:
- To insert a new employee tuple into EMP_DEPT, we must include either
the attribute values for the department that the employee works for or
NULLs.
- It is difficult to insert a new department that has no employees yet into the
EMP_DEPT relation. The only option is to place NULL values in the employee attributes,
which is not possible because Ssn, the primary key of EMP_DEPT, cannot be NULL.

• Deletion Anomalies: If we delete from EMP_DEPT an employee tuple that happens to
represent the last employee working for a particular department,
the information concerning that department is lost from the database.

• Update Anomalies: In EMP_DEPT, if we change the value of one of the
attributes of a particular department, we must update the tuples of all
employees who work in that department.

EMP_DEPT (Ename, Ssn, Bdate, Address, Dnumber, Dname, Dmgr_Ssn)
• Guideline 2: Design the base relation schemas so that no insertion,
deletion, or modification anomalies are present in the relations. If any
anomalies are present, note them clearly and make sure that the
programs that update the database will operate correctly.

• If many of the attributes do not apply to all tuples in the relation, as is common in
real-world problems, we end up with many NULLs in those tuples. This can waste
space at the storage level and may also lead to problems with understanding the meaning
of the attributes.

• Issues with NULL values in tuples:
- It is unclear how to account for them when aggregate functions are applied
- Select and join operations require comparisons, which are problematic when NULLs are involved
- NULLs have multiple interpretations:
  - The attribute does not apply to this tuple.
  - The attribute value for this tuple is unknown.
  - The value is known but absent; that is, it has not been recorded yet.

• Guideline 3: As far as possible, avoid placing attributes in a base relation whose
values may frequently be NULL. If NULLs are unavoidable, make sure that they
apply in exceptional cases only and do not apply to a majority of tuples in the relation.

• Generation of Spurious Tuples

EMP_PROJ (Ssn, Pnumber, Hours, Ename, Pname, Plocation)

EMP_LOCS (Ename, Plocation)

EMP_PROJ1 (Ssn, Pnumber, Hours, Pname, Plocation)

• The natural join of EMP_LOCS and EMP_PROJ1 generates spurious tuples


Canonical Cover
• Sets of functional dependencies may have redundant dependencies that can be inferred from
the others

– E.g.: A → C is redundant in: {A → B, B → C, A → C}

– Parts of a functional dependency may be redundant

• E.g.: on RHS: {A → B, B → C, A → CD} can be simplified to
{A → B, B → C, A → D}
• E.g.: on LHS: {A → B, B → C, AC → D} can be simplified to
{A → B, B → C, A → D}

• Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to
F, having no redundant dependencies or redundant parts of dependencies
Extraneous Attributes
• An attribute of a functional dependency is extraneous if we can remove it without changing the
closure of the set of functional dependencies.

• Consider a set F of functional dependencies and the functional dependency α → β in F.

– Attribute A is extraneous in α if A ∈ α
and F logically implies (F – {α → β}) ∪ {(α – A) → β}.

– Attribute A is extraneous in β if A ∈ β
and the set of functional dependencies
(F – {α → β}) ∪ {α → (β – A)} logically implies F.

• E.g.: Given F = {A → C, AB → C}
– B is extraneous in AB → C because {A → C, AB → C} logically implies A → C (i.e.
the result of dropping B from AB → C).

• Example: Given F = {A → C, AB → CD}
– C is extraneous in AB → CD since AB → C can be inferred even after deleting C
Testing if an Attribute is Extraneous

• Consider a set F of functional dependencies and the functional dependency
α → β in F.

• To test if attribute A ∈ α is extraneous in α
1. compute ({α} – A)+ using the dependencies in F
2. check that ({α} – A)+ contains β; if it does, A is extraneous in α

• To test if attribute A ∈ β is extraneous in β
1. compute α+ using only the dependencies in
F’ = (F – {α → β}) ∪ {α → (β – A)},
2. check that α+ contains A; if it does, A is extraneous in β
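
Both tests reduce to attribute-closure computations, as the following sketch shows. The helper names (`fd`, `closure`, `extraneous_in_lhs`, `extraneous_in_rhs`) and the single-letter attribute encoding are assumptions made for this illustration.

```python
def fd(lhs, rhs):
    """Build a dependency from strings of single-letter attributes, e.g. fd("AB", "C")."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def extraneous_in_lhs(attr, lhs, rhs, fds):
    """A in alpha is extraneous if (alpha - A)+ computed under F contains beta."""
    return set(rhs) <= closure(set(lhs) - {attr}, fds)


def extraneous_in_rhs(attr, lhs, rhs, fds):
    """A in beta is extraneous if alpha+ under F' contains A, where F' replaces
    alpha -> beta by alpha -> (beta - A)."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    reduced = [d for d in fds if d != (lhs, rhs)] + [(lhs, rhs - {attr})]
    return attr in closure(lhs, reduced)


F1 = [fd("A", "C"), fd("AB", "C")]     # B is extraneous in AB -> C
F2 = [fd("A", "C"), fd("AB", "CD")]    # C is extraneous in AB -> CD
```

With these definitions `extraneous_in_lhs("B", "AB", "C", F1)` and `extraneous_in_rhs("C", "AB", "CD", F2)` are both true, whereas D is not extraneous in AB → CD because nothing else in F2 derives D.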
Canonical cover
• A canonical cover Fc for F is a set of functional dependencies such that F logically
implies all dependencies in Fc and Fc logically implies all dependencies in F.
Furthermore, Fc must have the following properties:

- No functional dependency in Fc contains an extraneous attribute.

- Each left side of a functional dependency in Fc is unique. That is, there are no
two dependencies
α1 → β1 and α2 → β2 in Fc such that α1 = α2
Canonical Cover
 A canonical cover for F is a set of dependencies Fc such that

– F logically implies all dependencies in Fc, and

– Fc logically implies all dependencies in F, and

– No functional dependency in Fc contains an extraneous attribute, and

– Each left side of a functional dependency in Fc is unique.

 To compute a canonical cover for F:

repeat
    Use the union rule to replace any dependencies in F of the form
        α1 → β1 and α1 → β2 with α1 → β1 β2
    Find a functional dependency α → β with an
        extraneous attribute either in α or in β
    If an extraneous attribute is found, delete it from α → β
until F does not change
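The loop above can be sketched as follows. This is one possible rendering under stated assumptions: attributes are single letters, dependencies are (lhs, rhs) string pairs, and the helper names (closure, remove_one_extraneous, canonical_cover) are invented for this example. A canonical cover is not unique in general, so the order in which attributes are tried can change the result.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def remove_one_extraneous(fc):
    """Return fc with one extraneous attribute deleted, or None if there is none."""
    for i, (lhs, rhs) in enumerate(fc):
        for a in sorted(lhs):
            # A in alpha is extraneous if (alpha - A)+ under F contains beta.
            if rhs <= closure(lhs - {a}, fc):
                return fc[:i] + [(lhs - {a}, rhs)] + fc[i + 1:]
        for a in sorted(rhs):
            # A in beta is extraneous if alpha+ under F' contains A.
            reduced = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
            if a in closure(lhs, reduced):
                return reduced
    return None


def canonical_cover(fds):
    fc = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        # Union rule: merge dependencies that share a left side.
        merged = {}
        for lhs, rhs in fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        fc = [(l, r) for l, r in merged.items() if r]
        step = remove_one_extraneous(fc)
        if step is None:
            return fc
        fc = step


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

Each pass removes at most one attribute and then re-applies the union rule, which mirrors the repeat/until structure of the pseudocode rather than optimizing for speed.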
Extraneous Attributes

 Consider the following set F of functional dependencies on schema (A, B, C). Compute the
canonical cover for F.
A → BC
B → C
A → B
AB → C

Answer: A → B
        B → C

• F = {AB → CD, A → E, E → C}. Find the canonical cover of F.


Extraneous Attributes
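 Consider the following set F of functional dependencies on schema (A, B, C). Compute the
canonical cover for F.
A → BC
B → C
A → B
AB → C

Answer: A → B
        B → C

– Union rule: A → BC and A → B combine into A → BC, leaving {A → BC, B → C, AB → C}.
– A is extraneous in AB → C: (F – {AB → C}) ∪ {B → C} is implied by F, because B → C is already
in F. This leaves {A → BC, B → C}.
– C is extraneous in A → BC: A → B and B → C imply A → C, so A → BC reduces to A → B.

• F = {A → BC, B → AC, C → AB}. Find the canonical cover of F.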


Computation of Super key from FD’s
Given: Drinkers(name, addr, beersLiked, manf, favBeer)

Reasonable FD’s to assert:


1. name → addr
2. name → favBeer
3. beersLiked → manf

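name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

Because beersLiked → manf, both Bud rows repeat manf = A.B.
Because name → addr, both Janeway rows repeat addr = Voyager.
Because name → favBeer, both Janeway rows repeat favBeer = WickedAle.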
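As a rough check of which attribute sets are superkeys of Drinkers under the three FDs listed above, one can compute attribute closures. The sketch below assumes those three FDs are the only ones that hold; the helper names closure and is_superkey are illustrative.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


SCHEMA = {"name", "addr", "beersLiked", "manf", "favBeer"}
FDS = [
    ({"name"}, {"addr"}),
    ({"name"}, {"favBeer"}),
    ({"beersLiked"}, {"manf"}),
]


def is_superkey(attrs, schema=SCHEMA, fds=FDS):
    # A superkey is a set of attributes whose closure is the whole schema.
    return closure(attrs, fds) == set(schema)


print(sorted(closure({"name"}, FDS)))
print(is_superkey({"name"}))
print(is_superkey({"name", "beersLiked"}))
```

Under these assumptions neither name nor beersLiked alone determines every attribute, while the pair {name, beersLiked} does.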
Compute the closure of the following set F of functional dependencies for relation schema R = {A, B,
C, D, E}, and list the candidate keys for R.
A → BC
CD → E
B → D
E → A

Given: A → BC and B → D, so A → D; then A → CD and CD → E, so A → E; therefore A → ABCDE.

E → A and A → ABCDE, so E → ABCDE.
CD → E, so CD → ABCDE.
B → D, so BC → CD, so BC → ABCDE.

Attribute closure:

A → ABCDE    AB → ABCDE    ABC → ABCDE    ABCD → ABCDE
B → BD       AC → ABCDE    ABD → ABCDE    ABCE → ABCDE
C → C        AD → ABCDE    ABE → ABCDE    ABDE → ABCDE
D → D        AE → ABCDE    ACD → ABCDE    ACDE → ABCDE
E → ABCDE    BC → ABCDE    ACE → ABCDE    BCDE → ABCDE
             BD → BD       ADE → ABCDE
             BE → ABCDE    BCD → ABCDE
             CD → ABCDE    BCE → ABCDE
             CE → ABCDE    BDE → ABCDE
             DE → ABCDE    CDE → ABCDE

The candidate keys are A, E, CD, and BC.
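The enumeration above can be automated with a brute-force search over attribute subsets, smallest first. This is only a sketch: it is exponential in the number of attributes, assumes single-letter attribute names, and uses illustrative function names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) == set(schema):
                keys.append(attrs)
    return keys


F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(["".join(sorted(k)) for k in candidate_keys("ABCDE", F)])
```

Because subsets are visited in order of size, any superkey that contains an already found key is skipped, so only minimal superkeys are reported.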
Lossless-join Decomposition
 For the case of R = (R1, R2), we require that for all possible relations r on schema R

r = ΠR1(r) ⋈ ΠR2(r)

 A decomposition of R into R1 and R2 is a lossless join if and only if at least one of the
following dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2

 If R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a lossless
decomposition.

bor_loan = (customer_id, loan_number, amount)

decomposed into

borrower = (customer_id, loan_number)
loan = (loan_number, amount)

Here borrower ∩ loan = {loan_number}. Provided loan_number → amount holds, loan_number is a
superkey of loan, and thus the decomposition is lossless.
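The superkey test for a two-way decomposition can be sketched as below. The dependency loan_number → amount is an assumption made for this example (the text above does not list the FDs of bor_loan), and the function names are illustrative.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    # Lossless if (R1 ∩ R2)+ covers all of R1 or all of R2.
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# Assumed FD for the bor_loan example: loan_number -> amount.
fds = [({"loan_number"}, {"amount"})]
borrower = {"customer_id", "loan_number"}
loan = {"loan_number", "amount"}
print(is_lossless(borrower, loan, fds))

# A split that shares only `amount` fails the test under the same FD.
print(is_lossless({"customer_id", "amount"}, {"loan_number", "amount"}, fds))
```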
Check the following:
 R = (A, B, C)
F = {A → B, B → C}
– Can be decomposed in two different ways

 R1 = (A, B), R2 = (B, C)
– Lossless-join decomposition:
R1 ∩ R2 = {B} and B → BC
– Dependency preserving

 R1 = (A, B), R2 = (A, C)
– Lossless-join decomposition:
R1 ∩ R2 = {A} and A → AB
– Not dependency preserving
(B → C cannot be checked without computing R1 ⋈ R2)
Let Fi be the set of dependencies in F+ that include only attributes in Ri.
- A decomposition is dependency preserving if
(F1 ∪ F2 ∪ … ∪ Fn)+ = F+
- If it is not, then checking updates for violation of functional dependencies may
require computing joins, which is expensive.
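Computing F+ directly is exponential, so the definition is rarely checked literally. The sketch below uses a commonly taught equivalent test that is not spelled out in the text above: for each α → β in F, grow a result set from α by repeatedly taking closures restricted to each Ri, and check that it reaches β. Single-letter attributes and the function names are assumptions of this example.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # Attributes derivable from result while staying inside Ri.
                gained = closure(result & set(ri), fds) & set(ri)
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False  # this dependency needs a join to be checked
    return True


F = [("A", "B"), ("B", "C")]
print(preserves_dependencies(["AB", "BC"], F))
print(preserves_dependencies(["AB", "AC"], F))
```

The two calls correspond to the two decompositions of R = (A, B, C) discussed earlier: the first keeps B → C inside R2, the second does not.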
BCNF Decomposition
 R = (A, B, C)
F = {A → B,
B → C}
Key = {A}
- R is not in BCNF (B → C holds but B is not a superkey)
- Decomposition
– R1 = (B, C)
– R2 = (A, B)
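A minimal sketch of the BCNF check and one decomposition step for this example follows. The names bcnf_violation and split_on are illustrative. Checking only the dependencies listed in F is enough to decide whether the original schema R is in BCNF; for the sub-schemas produced by a decomposition, dependencies in F+ would also need to be considered, which this sketch does not do.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violation(schema, fds):
    """Return the first nontrivial FD in fds whose left side is not a superkey."""
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue  # trivial dependency
        if not set(schema) <= closure(lhs, fds):
            return lhs, rhs
    return None


def split_on(schema, lhs, rhs):
    # Replace R by (alpha ∪ beta) and (R - (beta - alpha)).
    return lhs | rhs, set(schema) - (rhs - lhs)


F = [("A", "B"), ("B", "C")]
violation = bcnf_violation("ABC", F)
print(violation)
print(split_on("ABC", *violation))
```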
Comparison of BCNF and 3NF
 It is always possible to decompose a relation into a set of relations that are in
3NF such that:
– the decomposition is lossless
– the dependencies are preserved

 It is always possible to decompose a relation into a set of relations that are in
BCNF such that:
– the decomposition is lossless
but it may not be possible to preserve all dependencies.
Multivalued Dependencies (MVDs)
 Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency
α →→ β
holds on R if in any legal relation r(R), for all pairs of tuples t1 and t2 in r such
that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R – β] = t2[R – β]
t4[β] = t2[β]
t4[R – β] = t1[R – β]

E.g.:
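As an illustration of the definition, the sketch below tests whether a small relation instance satisfies an MVD. The relation (course, teacher, book) and its values are made up for this example; rows are dictionaries and the function name satisfies_mvd is illustrative. Only t3 is constructed explicitly, because t4 is the same construction with t1 and t2 swapped and the loops visit both orders.

```python
def satisfies_mvd(rows, alpha, beta):
    """Check alpha ->> beta on a relation given as a list of dicts."""
    if not rows:
        return True
    attrs = sorted(rows[0])
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # t3 takes its beta values from t1 and R - beta from t2.
                t3 = tuple(t1[a] if a in beta else t2[a] for a in attrs)
                if t3 not in present:
                    return False
    return True


# Hypothetical data: teachers and books of a course vary independently.
r = [
    {"course": "DB", "teacher": "Avi", "book": "DBSC"},
    {"course": "DB", "teacher": "Hank", "book": "Ullman"},
]
print(satisfies_mvd(r, {"course"}, {"teacher"}))

r_full = r + [
    {"course": "DB", "teacher": "Avi", "book": "Ullman"},
    {"course": "DB", "teacher": "Hank", "book": "DBSC"},
]
print(satisfies_mvd(r_full, {"course"}, {"teacher"}))
```

The first instance fails because the combination (Avi, Ullman) is missing; the second contains every teacher/book combination for the course and passes.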
Use of Multivalued Dependencies
 We use multivalued dependencies in two ways:
1. To test relations to determine whether they are legal under a given set of
functional and multivalued dependencies

2. To specify constraints on the set of legal relations. We shall thus concern
ourselves only with relations that satisfy a given set of functional and multivalued
dependencies

 If a relation r fails to satisfy a given multivalued dependency, we can construct a
relation r′ that does satisfy the multivalued dependency by adding tuples to r.

Fourth Normal Form

 A relation schema R is in 4NF with respect to a set D of functional and multivalued
dependencies if for all multivalued dependencies in D+ of the form α →→ β, where
α ⊆ R and β ⊆ R, at least one of the following holds:
- α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R)
- α is a superkey for schema R

 If a relation is in 4NF, it is in BCNF


Non Additive Join Decomposition into 4NF

Whenever we decompose a relation schema R into R1 = (X ∪ Y) and R2 = (R – Y)
based on an MVD X →→ Y that holds in R, the decomposition has the nonadditive
join property.

In general, R1 and R2 form a nonadditive join decomposition of R if and only if
(R1 ∩ R2) →→ (R1 – R2)
OR
(R1 ∩ R2) →→ (R2 – R1)

Algorithm: Input: A universal relation R and a set of functional & multivalued
dependencies F.
1. Set D := {R}
2. While there is a relation schema Q in D that is not in 4NF, do
{ choose a relation schema Q in D that is not in 4NF;
find a nontrivial MVD X →→ Y in Q that violates 4NF;
replace Q in D by two relation schemas (Q – Y) and (X ∪ Y);
};
 Decompose the relation schema R into 4NF with the nonadditive join property
R = (A, B, C, G, H, I)
F = { A →→ B
B →→ HI
CG →→ H }
 R is not in 4NF since A →→ B is nontrivial and A is not a superkey for R

 Decomposition
a) R1 = (A, B) (R1 is in 4NF)

b) R2 = (A, C, G, H, I) (R2 is not in 4NF; decompose on CG →→ H into R3 and R4)

c) R3 = (C, G, H) (R3 is in 4NF)

d) R4 = (A, C, G, I) (R4 is not in 4NF; decompose on A →→ I into R5 and R6)

 Since A →→ B and B →→ HI, A →→ HI, and hence A →→ I holds on R4

e) R5 = (A, I) (R5 is in 4NF)
f) R6 = (A, C, G) (R6 is in 4NF)
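The decomposition above can be reproduced with a simplified sketch of the 4NF algorithm. It rests on strong assumptions that are specific to this example: there are no functional dependencies (so no proper subset of a schema is a superkey), the derived MVD A →→ HI is supplied by hand rather than inferred, and an MVD X →→ Y is applied to a sub-schema Q by restricting Y to Q. The function name decompose_4nf is illustrative.

```python
def decompose_4nf(schema, mvds):
    """Split schema on the given MVDs until none applies (no FDs assumed)."""
    done, todo = [], [frozenset(schema)]
    while todo:
        q = todo.pop()
        for x, y in mvds:
            x = frozenset(x)
            y_in_q = (frozenset(y) & q) - x  # Y restricted to Q
            # Nontrivial on Q: Y' is non-empty and X ∪ Y' is not all of Q.
            if x <= q and y_in_q and (x | y_in_q) != q:
                todo += [x | y_in_q, q - y_in_q]
                break
        else:
            done.append(q)  # no listed MVD violates 4NF on Q
    return done


# Given MVDs plus the derived A ->> HI (added by hand for this sketch).
MVDS = [("A", "B"), ("B", "HI"), ("CG", "H"), ("A", "HI")]
for rel in decompose_4nf("ABCGHI", MVDS):
    print("".join(sorted(rel)))
```

A full implementation would have to compute the relevant part of D+ and test superkeys using the functional dependencies as well; this sketch only mirrors the control flow of the algorithm.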
