
CH - 1 Relational Database Design Updated

The document provides an overview of advanced database systems and the evolution of database technology from file processing to database management systems with query and transaction processing capabilities. It discusses several types of advanced database systems that have emerged to address the needs of new applications, including object-oriented, object-relational, spatial, temporal, text, multimedia, heterogeneous, legacy, and web-based databases. It also covers data warehousing and data mining techniques.

Uploaded by

abel bahiru
Copyright © All Rights Reserved

Advanced Database Systems

Chapter One
Relational Database Design

Introduction and overview
⚫ Database technology has evolved from primitive file processing to database management systems with query and transaction processing: manual records, then file-based storage (e.g., Excel), then DBMSs with SQL.
⚫ Relational database systems have been widely used in business applications.
⚫ With the advancement of database technology, various kinds of advanced database systems have emerged and are under development to address the requirements of new database applications.
Introduction…

The new database applications include handling:

• Spatial data (such as maps)
• Hypertext and multimedia (including text, images, videos, and audio data)
• Time-related data (such as historical records or stock exchange data)
• The World Wide Web (a huge, widely distributed information repository made available by the Internet)
Introduction…

⚫ Such databases or information repositories require sophisticated facilities to efficiently store, retrieve, and update large amounts of complex data.

⚫ Further progress has led to increasing demand for efficient and effective data analysis and data understanding tools.

⚫ This need is a result of the explosive growth in data collected from applications including business and management, government administration, science and engineering, and environmental control (e.g., banking).
Introduction…

⚫ These applications require efficient data structures and scalable methods for handling complex object structures, variable-length records, semi-structured and unstructured data, text and multimedia data, and database schemas with complex structures and dynamic changes.
⚫ In response to these needs, advanced database systems and application-oriented database systems have been developed.
These include:
⚫ object-oriented and object-relational database systems,
⚫ temporal and time-series databases,
⚫ text and multimedia database systems,
⚫ heterogeneous and legacy database systems, and
⚫ web-based global information systems.
Object oriented databases(OODB)

 Based on the object-oriented programming paradigm, where, in general terms, each entity is considered an object.
 Data and code relating to an object are encapsulated into a single unit.
 Each object has associated with it the following:
 A set of variables that describe the object (these correspond to the attributes in the entity-relationship and relational models).
 A set of messages that the object can use to communicate with other objects, or with the rest of the database system.
 A set of methods, where each method holds the code to implement a message. Upon receiving a message, the method returns a value in response.
Introduction…

For instance, the method for the message get_photo(employee) will retrieve and return a photo of the given employee object.
Objects that share a common set of properties can be grouped into an object class.

Each object is an instance of its class.

Object classes can be organized into class/subclass hierarchies so that each class represents properties that are common to objects in that class.
Object Relational Databases(ORDB) …
 Are constructed based on an object-relational data model
 This model extends the relational model by providing a
rich data type for handling complex objects and object
orientation.
 In addition special constructs for relational query
languages are included to manage the added data types.
 Are becoming increasingly popular in industry and
applications.
Spatial Databases

⚫ Contain spatially related information.
⚫ Such databases include geographic (map) databases, VLSI chip design databases, and medical and satellite image databases.
⚫ Spatial data may be represented in raster format, consisting of n-dimensional bit maps or pixel maps.
⚫ For example, a 2D satellite image may be represented as raster data, where each pixel registers the rainfall in a given area.
⚫ Maps can be represented in vector format, where roads, bridges, buildings, and lakes are represented as unions of basic geometric constructs such as points, lines, polygons, and the partitions and networks formed by these shapes.
Introduction…

 Geographic database applications


⚫ Forestry and ecology planning
⚫ Location of telephone and electric cables, pipes and
sewage system
⚫ Vehicle navigation and dispatching system
⚫ Urban planning
Temporal Databases And Time series Database

⚫ Both store time-related data.
⚫ A time-series database stores sequences of values that change with time, such as data collected regarding the stock exchange.
⚫ Data mining techniques can be used to find the characteristics of object evolution or the trend of changes for objects in the database.
⚫ Such information can be useful in decision making and strategic planning.
⚫ Examples: bacteria growth, expiration dates.
⚫ The mining of banking data may aid in scheduling bank tellers according to the volume of customer traffic.
⚫ Stock exchange data can be mined to help plan investment strategies.
⚫ Time may be decomposed according to fiscal years, academic years, or calendar years; years may be further decomposed into quarters or months.
Text databases and Multimedia databases
 Text databases are databases that contain word
descriptions for objects
 These word descriptions are actually not simple
keywords but rather long sentences or paragraphs such
as documents.
 Text databases may be highly unstructured (such as home web pages on the WWW).
 Some text databases may be semi-structured (such as e-mail messages and many HTML/XML web pages).
 Others are relatively well structured (such as library databases).
Introduction…

⚫ Multimedia databases store image, audio, and video data.
⚫ They are used in applications such as picture content-based retrieval, voice mail systems, video-on-demand systems, the WWW, and speech-based user interfaces that recognize spoken commands.
⚫ Multimedia databases must support large objects, since data objects such as video can require gigabytes of storage.
⚫ Specialized storage and search techniques are also required.
⚫ Real-time retrieval (lip synchronization)
Heterogeneous Databases and Legacy databases
 Objects in one component database may differ greatly from objects in other component databases, making it difficult to assimilate their semantics into the overall heterogeneous database.
 A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, multimedia databases, or file systems.
 The heterogeneous databases in a legacy database may be connected by intra- or inter-computer networks.
 Information exchange across such databases is difficult, since one needs to work out precise transformation rules from one representation to another, considering diverse semantics.
The World Wide Web
 The WWW and its associated distributed information services, such as America Online, Yahoo, and AltaVista, provide rich, world-wide, online information services, where data objects are linked together to facilitate interactive access.
 Users seeking information of interest traverse from one object via links to another.
 Web services that provide keyword-based search without understanding the context behind particular web pages can only offer limited help to users.
Data warehouses
 Refers to a database that is maintained separately from an
organization’s operational databases to support decision
making.
 Data warehouse systems allow for the integration of a
variety of application systems
 They support information processing by providing a solid
platform of consolidated historical data for analysis
 Non volatile
 Time variant (5-10 years of historical data)
 Integrated from multiple heterogeneous sources
Data Mining
⚫ Refers to extracting or “mining” knowledge from large amounts of data
⚫ Also called knowledge mining from databases, knowledge extraction, data analysis, or data archaeology; alternatively viewed as simply an essential step in the process of knowledge discovery in databases
 The steps in knowledge discovery are:
⚫ Data cleaning: remove noise and inconsistent data
⚫ Data integration: combine multiple data sources
⚫ Data selection: retrieve the data relevant to the analysis
⚫ Data transformation: convert the data into a form appropriate for mining
⚫ Data mining: extract knowledge (patterns)
⚫ Pattern evaluation: identify the truly interesting patterns using interestingness measures
⚫ Knowledge presentation: visualize and present the mined knowledge to users
Relational Data Model
Here the data elements are organized in multiple named tables (relations) with rows and columns.
The user of the database system may query these tables, insert new tuples (rows), and update (modify) tuples. There are several languages for expressing these operations.
The relational model is the most widely used data model for commercial data processing because it provides greater flexibility in data organization and future enhancement.
Data in one table can be related to data in another table by a
common field
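
The operations named above (query, insert, update, and relating two tables through a common field) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names are borrowed from the Employee/Dept example used later in these slides, and SQLite is an assumed choice of DBMS.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept TEXT PRIMARY KEY, dname TEXT)")
conn.execute(
    "CREATE TABLE employee (emp TEXT PRIMARY KEY, ename TEXT, worksfordept TEXT)"
)

# Insert new tuples (rows).
conn.executemany("INSERT INTO dept VALUES (?, ?)", [("d1", "Pay"), ("d2", "Tax")])
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("e1", "red", "d1"), ("e3", "brown", "d2")],
)

# Update (modify) a tuple.
conn.execute("UPDATE employee SET worksfordept = 'd1' WHERE emp = 'e3'")

# Query: relate the two tables through the common field.
rows = conn.execute(
    "SELECT e.ename, d.dname FROM employee e "
    "JOIN dept d ON e.worksfordept = d.dept ORDER BY e.emp"
).fetchall()
print(rows)
```

The join condition e.worksfordept = d.dept is the "common field" that relates the two tables.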
Relational Data Model

The relational model is a combination of 3 components:
1. The structural part, which defines the database as a collection of tables
2. The integrity part: entity integrity and referential integrity
3. The manipulative part: the relational algebra and relational calculus are the tools used to manipulate data in the database.
Relational Data Model(5)
 The following terminologies are used by the data
model, programmer and user respectively
Basic Structure of a Relation(1)
 Represent data as a two-dimensional table called a relation
Seven Characteristics of a Relation

1. The name of the relation is different from all others
2. Each cell of the relation contains only one value
3. Each attribute has a name that is distinct
4. All the values of a particular attribute are from the same domain
5. The order of the attributes makes no difference
6. There are no duplicate tuples
7. The order of the tuples makes no difference
Keys
 A superkey is a set of one or more attributes that allows us to uniquely identify an entity in the entity set
 A candidate key is a minimal superkey. One of the candidate keys is selected to be the primary key
 The primary key is the candidate key chosen to identify entities within an entity set
 A foreign key is a primary key of one relation that is used in another relation to create a connection between the relations
Find Candidate Keys

R(A, B, C, D)
11 12 33 44
3 2 1 4
4 1 2 3

 X X
{A, B} {A, C} {A,
D}

  
{B, C} {B, D} {C, D}
X
{A,C,D}

 = okay  = not okay
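
A minimal sketch of the exercise above: test every attribute combination for uniqueness and keep only the minimal ones. The rows below are a hypothetical instance of R(A, B, C, D), not necessarily the data on the slide. Note that an instance can only rule a key out; whether a set of attributes really is a key depends on the constraints of the schema, not on one sample of rows.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Minimal attribute sets that identify every row of this instance."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Supersets of a key already found are superkeys, not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys

# Hypothetical instance of R(A, B, C, D).
R = [
    {"A": 1, "B": 1, "C": 3, "D": 4},
    {"A": 1, "B": 2, "C": 3, "D": 4},
    {"A": 3, "B": 2, "C": 1, "D": 4},
    {"A": 4, "B": 1, "C": 2, "D": 3},
]
keys = candidate_keys(R, ["A", "B", "C", "D"])
print(keys)
```

For this instance {A, B} and {B, C} identify every row, while {A, C, D} does not, because the first two rows agree on A, C, and D.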


How to determine keys

 Strong entity set: the entity primary key becomes the relation primary key
 Weak entity set: the primary key of the relation is the union
of the strong entity set primary key and the discriminator
 Relationship set: the union of the primary keys of the
related entity sets becomes a superkey of the relation
 Combined tables: in a many-to-one, the primary key of the
many becomes the relation primary key. In a one-to-one
either primary key can be used
 Multivalued attributes: the entity primary key becomes
the primary key
Entity Integrity

 No component of the primary key of a base relation is allowed to accept nulls, and the primary key must be distinct

Surname   Given Name   Salary
Red       John         $40,000
Black     (null)       $50,000
Red       Fred         $60,000
Black     (null)       $70,000
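
The rule can be checked mechanically. The sketch below assumes that (Surname, Given Name) is the intended primary key of the table above and uses None for the missing given names; both choices are assumptions made for illustration.

```python
def entity_integrity_violations(rows, primary_key):
    """Return (row index, reason) pairs for rows that break entity integrity."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        key = tuple(row.get(a) for a in primary_key)
        if any(v is None for v in key):
            problems.append((i, "null in primary key"))
        elif key in seen:
            problems.append((i, "duplicate primary key"))
        else:
            seen.add(key)
    return problems

# Rows modelled on the table above; None stands for a missing given name.
people = [
    {"surname": "Red", "given": "John", "salary": 40000},
    {"surname": "Black", "given": None, "salary": 50000},
    {"surname": "Red", "given": "Fred", "salary": 60000},
    {"surname": "Black", "given": None, "salary": 70000},
]
# Assumption: (surname, given) is the intended primary key.
violations = entity_integrity_violations(people, ["surname", "given"])
print(violations)
```

Both "Black" rows are rejected because part of the key is null, which is exactly what entity integrity forbids.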
Foreign Key

 Can the foreign key accept nulls?
 What should happen on an attempt to delete the target of a foreign key reference?
 What should happen on an attempt to update the target of a foreign key reference?

Employee
Emp#   ename   Worksfordept
e1     red     d1
e2     blue    (null)
e3     brown   d2

Dept
Dept   Dname
d1     Pay
d2     Tax
d3     Art

Worksfordept in Employee is the foreign key referencing Dept
Referential Integrity

 If a relation R2 includes a foreign key FK matching the primary key PK of some relation R1, then every value of FK in R2 must either
(a) be equal to the value of PK in some tuple of R1, or
(b) be wholly null

 Note that PK and FK may comprise more than one attribute and that R1 and R2 are not necessarily distinct

 Stated more simply, a foreign key value should be a valid primary key value or null
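
A small sketch of cases (a) and (b) using Python's sqlite3 module. SQLite is an assumed choice of DBMS; it enforces foreign keys only after PRAGMA foreign_keys = ON. The employee e4 and the department d9 are hypothetical values added to trigger a violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE dept (dept TEXT PRIMARY KEY, dname TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " emp TEXT PRIMARY KEY, ename TEXT,"
    " worksfordept TEXT REFERENCES dept(dept))"
)
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [("d1", "Pay"), ("d2", "Tax"), ("d3", "Art")],
)

# (a) The FK equals an existing PK value: accepted.
conn.execute("INSERT INTO employee VALUES ('e1', 'red', 'd1')")
# (b) The FK is null: accepted.
conn.execute("INSERT INTO employee VALUES ('e2', 'blue', NULL)")

# The FK names a department that does not exist: rejected.
try:
    conn.execute("INSERT INTO employee VALUES ('e4', 'green', 'd9')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, count)
```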
Foreign Key Rules

 For each foreign key, three questions need to be answered:

 Can the foreign key accept nulls?
 What should happen on an attempt to delete the target of a foreign key reference?
 What should happen on an attempt to update the target of a foreign key reference?

Employee
Emp#   ename   Worksfordept
e1     red     d1
e2     blue    (null)
e3     brown   d2

Dept
Dept   Dname
d1     Pay
d2     Tax
d3     Art

Worksfordept in the Employee table is a foreign key referencing Dept
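
In SQL, these three questions are typically answered in the foreign key declaration itself. The sketch below shows one possible set of answers (nulls allowed, SET NULL on delete, CASCADE on update) using sqlite3; the choice of rules and the new department code d20 are assumptions for illustration, not the slides' prescription.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dept (dept TEXT PRIMARY KEY, dname TEXT)")
# worksfordept is nullable; deletes null it out, updates are propagated.
conn.execute(
    "CREATE TABLE employee ("
    " emp TEXT PRIMARY KEY, ename TEXT,"
    " worksfordept TEXT"
    " REFERENCES dept(dept) ON DELETE SET NULL ON UPDATE CASCADE)"
)
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [("d1", "Pay"), ("d2", "Tax"), ("d3", "Art")],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("e1", "red", "d1"), ("e2", "blue", None), ("e3", "brown", "d2")],
)

# Delete the target of e1's reference: e1.worksfordept becomes NULL.
conn.execute("DELETE FROM dept WHERE dept = 'd1'")
# Update the target of e3's reference: e3.worksfordept follows the change.
conn.execute("UPDATE dept SET dept = 'd20' WHERE dept = 'd2'")

result = conn.execute(
    "SELECT emp, worksfordept FROM employee ORDER BY emp"
).fetchall()
print(result)
```

Other answers are possible, for example rejecting the delete or update outright while references exist.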


Characteristics of a good database design

 A good database has the following characteristics:


 It is easy to modify and maintain without affecting
other fields or tables in the database.
 Information is easy to retrieve, and user
applications are easy to develop and build.
 The database is scalable, meaning that it can be
expanded to meet the changing needs of an
organization.
Enhanced ER or Extended ER (EER)

EER Model Concepts


Includes all modeling concepts of basic ER
Additional concepts:
subclasses/superclasses
specialization/generalization
These are fundamental to conceptual modeling
The additional EER concepts are used to model
applications more completely and more accurately
EER includes some object-oriented concepts, such as
inheritance
Subclasses and Superclasses (1)

An entity type may have additional meaningful subgroupings of its entities
Example: EMPLOYEE may be further grouped into:
SECRETARY, ENGINEER, TECHNICIAN, …
Based on the EMPLOYEE’s Job
MANAGER
EMPLOYEEs who are managers
SALARIED_EMPLOYEE, HOURLY_EMPLOYEE
Based on the EMPLOYEE’s method of pay
EER diagrams extend ER diagrams to represent these
additional subgroupings, called subclasses or subtypes
Subclasses and Super classes
Subclasses and Superclasses (2)

Each of these subgroupings is a subset of EMPLOYEE entities


Each is called a subclass of EMPLOYEE
EMPLOYEE is the superclass for each of these subclasses
These are called superclass/subclass relationships:
EMPLOYEE/SECRETARY
EMPLOYEE/TECHNICIAN
EMPLOYEE/MANAGER

Subclasses and Superclasses (3)

These are also called IS-A relationships


SECRETARY IS-A EMPLOYEE, TECHNICIAN IS-A
EMPLOYEE, ….
Note: An entity that is member of a subclass represents
the same real-world entity as some member of the
superclass:
The subclass member is the same entity in a distinct specific
role
An entity cannot exist in the database merely by being a
member of a subclass; it must also be a member of the
superclass
A member of the superclass can be optionally included as a
member of any number of its subclasses
Subclasses and Superclasses (4)

Examples:
A salaried employee who is also an engineer belongs to the
two subclasses:
ENGINEER, and
SALARIED_EMPLOYEE
A salaried employee who is also an engineering manager
belongs to the three subclasses:
MANAGER,
ENGINEER, and
SALARIED_EMPLOYEE
It is not necessary that every entity in a superclass be a
member of some subclass
Representing Specialization in EER Diagrams
Attribute Inheritance in Superclass / Subclass Relationships

An entity that is a member of a subclass inherits:
All attributes of the entity as a member of the superclass
All relationships of the entity as a member of the superclass
Example:
In the previous slide, SECRETARY (as well as TECHNICIAN and ENGINEER) inherits the attributes Name, SSN, …, from EMPLOYEE
Every SECRETARY entity will have values for the inherited
attributes
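
Attribute inheritance has a direct analogue in object-oriented languages. The sketch below models EMPLOYEE as a superclass and SECRETARY and ENGINEER as subclasses; the subclass-specific attributes typing_speed and eng_type, and the sample values, are hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class Employee:                # superclass
    name: str
    ssn: str

@dataclass
class Secretary(Employee):     # subclass: SECRETARY IS-A EMPLOYEE
    typing_speed: int          # hypothetical subclass-specific attribute

@dataclass
class Engineer(Employee):      # subclass: ENGINEER IS-A EMPLOYEE
    eng_type: str              # hypothetical subclass-specific attribute

s = Secretary(name="Joan", ssn="123-45-6789", typing_speed=70)

# Every SECRETARY has values for the inherited attributes.
inherited = [f.name for f in fields(Secretary)]
print(inherited, isinstance(s, Employee))
```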
Specialization (1)

Specialization is the process of defining a set of subclasses of a superclass
The set of subclasses is based upon some distinguishing characteristics of the entities in the superclass

Example: {SECRETARY, ENGINEER, TECHNICIAN} is a specialization of EMPLOYEE based upon job type.
A superclass may have several specializations
Generalization

Generalization is the reverse of the specialization process
Several classes with common features are generalized into a superclass; the original classes become its subclasses
Example: CAR, TRUCK generalized into VEHICLE;
both CAR, TRUCK become subclasses of the superclass
VEHICLE.
We can view {CAR, TRUCK} as a specialization of
VEHICLE
Alternatively, we can view VEHICLE as a generalization of
CAR and TRUCK
Generalization (2)
Generalization and Specialization (1)
Diagrammatic notations are sometimes used to distinguish between generalization and specialization
Arrow pointing to the generalized superclass represents a
generalization
Arrows pointing to the specialized subclasses represent a
specialization
We do not use this notation because it is often subjective as to which
process is more appropriate for a particular situation
We advocate not drawing any arrows
Data Modeling with Specialization and Generalization
A superclass or subclass represents a collection (or set or grouping)
of entities
It also represents a particular type of entity
Shown in rectangles in EER diagrams (as are entity types)
We can call all entity types (and their corresponding collections)
classes, whether they are entity types, superclasses, or subclasses
UML Example for Displaying Specialization / Generalization
Functional Dependencies and Normalization

1. Informal Design Guidelines for Relational Databases
2. Functional Dependencies (FDs)
   2.1 Definition of FD
   2.2 Inference Rules for FDs
   2.3 Proof of the Inference Rules
3. Normal Forms
   3.1 Introduction to Normalization
   3.2 First Normal Form
   3.3 Second Normal Form
   3.4 Third Normal Form
   3.5 Boyce-Codd Normal Form (BCNF)
   3.6 Fourth Normal Form
1. Informal Design Guidelines for Relational Databases
GUIDELINE 1
 Design a relation schema so that it is easy to explain its
meaning.
 Do not combine attributes from multiple entity types
and relationship types into a single relation

[Figure: EMPLOYEE * DEPARTMENT, with callouts marking the attributes from department and the attributes from project]
1. Informal Design Guidelines for Relational Databases (Continued…)

GUIDELINE 2
Design the base relation schemas so that no insertion, deletion, or modification anomalies are present.
GUIDELINE 3
Relations should be designed so that their tuples have as few NULL values as possible.

A NULL can mean: not applicable, unknown, or known but absent.
1. Informal Design Guidelines for Relational Databases (Continued…)

GUIDELINE 4
Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
2.1 Functional Dependencies (FDs)

 An attribute (or set of attributes) X functionally determines an attribute (or set of attributes) Y if the value of X determines a unique value for Y.
 For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y]
 If X is a candidate key of R, then X → Y for any subset Y of R
 X → Y holds if, whenever two tuples have the same value for X, they must have the same value for Y
 FDs are derived from the real-world constraints on the attributes
 Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs
 FDs and keys are used to define normal forms for relations
 FDs are constraints that are derived from the meaning and interrelationships of the data attributes
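
The tuple-based definition translates directly into a check over a relation instance. The rows below are hypothetical. Such a check can only show that an FD is violated by the data; that an FD holds in general follows from the meaning of the attributes, not from one instance.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y seen for each X; any other Y is a violation.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical employee/project rows.
rows = [
    {"SSN": "1", "PNUMBER": 10, "ENAME": "Smith", "HOURS": 32.5},
    {"SSN": "1", "PNUMBER": 20, "ENAME": "Smith", "HOURS": 7.5},
    {"SSN": "2", "PNUMBER": 10, "ENAME": "Wong", "HOURS": 40.0},
]

print(fd_holds(rows, ["SSN"], ["ENAME"]))             # SSN -> ENAME
print(fd_holds(rows, ["SSN"], ["HOURS"]))             # SSN -> HOURS
print(fd_holds(rows, ["SSN", "PNUMBER"], ["HOURS"]))  # {SSN, PNUMBER} -> HOURS
```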
Examples of FD Constraints:

 Social security number determines employee name: SSN → ENAME
 Project number determines project name and location: PNUMBER → {PNAME, PLOCATION}
Examples of FD constraints: (Cont.)

 Employee SSN and project number determine the hours per week that the employee works on the project: {SSN, PNUMBER} → HOURS

TEACH
TEACHER   COURSE      TEXT
Smith     D.S.        Bartram
Smith     D.M.        Al-Nour
Hall      Compilers   Hoffman
Brown     D.S.        Augenthaler

TEACHER → COURSE does not hold (Smith teaches both D.S. and D.M.)
COURSE → TEXT does not hold (D.S. uses both Bartram and Augenthaler)
TEXT → COURSE possibly holds (P): no tuple in this instance violates it

 An FD is a property of the attributes in the schema R
 The constraint must hold on every relation instance r(R)
 If K is a key of R, then K functionally determines all attributes in R
2.2 Inference Rules for FDs

 Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
 Armstrong’s rules of inference:

A1. Reflexivity: If Y ⊆ X, then X → Y (trivial functional dependency)
A2. Augmentation: If X → Y, then XZ → YZ (Notation: XZ stands for X ∪ Z)
A3. Transitivity: If X → Y and Y → Z, then X → Z

 A1, A2, and A3 form a sound and complete set of inference rules
2.2 Inference Rules for FDs(Continued…)

 Additional useful rules of inference:

 Decomposition: If X → YZ, then X → Y and X → Z
 Union: If X → Y and X → Z, then X → YZ
 Pseudotransitivity: If X → Y and WY → Z, then WX → Z

 The three inference rules stated above, as well as any other inference rules, can be deduced from Armstrong’s inference rules A1, A2, and A3 (completeness property)
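
One common way to apply the inference rules mechanically is to compute the closure of a set of attributes: keep adding the right-hand side of every FD whose left-hand side is already covered. The slides do not present this algorithm, so treat it as a supplementary sketch; the FD set F below simply encodes the premises of the pseudotransitivity rule.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # lhs is covered, so rhs follows by transitivity and augmentation.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)

# Premises of pseudotransitivity: X -> Y and WY -> Z.
F = [({"X"}, {"Y"}), ({"W", "Y"}, {"Z"})]

print(sorted(closure({"W", "X"}, F)))
print(implies(F, {"W", "X"}, {"Z"}))  # WX -> Z follows
print(implies(F, {"X"}, {"Z"}))       # X -> Z does not follow
```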
2.3 Proof of the Rules of Inference

 1. Reflexivity: If Y ⊆ X, then X → Y
Example: X = {a, b, c, d, e}, Y = {a, b, c}
Proof.
Assume t1, t2 ∈ r of R and t1[X] = t2[X]
Since Y ⊆ X, t1[Y] = t2[Y]
∴ X → Y holds

 Home Work:
– Prove the Augmentation rule.
2.3 Proof of the Rules of Inference(Continued…)

2. Transitivity: If X → Y and Y → Z, then X → Z
Proof.
Let t1, t2 ∈ r of R and t1[X] = t2[X]
1) X → Y (given)
2) Y → Z (given)
3) If t1[X] = t2[X], then t1[Y] = t2[Y] (from 1)
4) If t1[Y] = t2[Y], then t1[Z] = t2[Z] (from 2)
5) If t1[X] = t2[X], then t1[Z] = t2[Z] (from 3 and 4)
6) X → Z (from 5)
2.3 Proof of the Rules of Inference(Continued…)

3. Decomposition: If X → YZ, then X → Y and X → Z
Proof.
1) X → YZ (given)
2) YZ → Y (reflexive rule)
3) X → Y (transitive rule on (1) and (2))

1) X → YZ (given)
2) YZ → Z (reflexive rule)
3) X → Z (transitive rule on (1) and (2))
2.3 Proof of the Rules of Inference(Continued…)

4. Union: If X → Y and X → Z, then X → YZ
Proof.
1) X → Y (given)
2) X → Z (given)
3) X → XY (augmenting (1) with X; note: XX = X ∪ X = X)
4) XY → YZ (augmenting (2) with Y)
5) X → YZ (transitive rule on (3) and (4))
2.3 Proof of the Rules of Inference(Continued…)

5. Pseudotransitivity: If X → Y and WY → Z, then WX → Z
Proof.
1) X → Y (given)
2) WY → Z (given)
3) WX → WY (augmenting (1) with W)
4) WX → Z (transitive rule on (3) and (2))
3.1 Introduction to Normalization

 Normalization is the process of decomposing relations with anomalies to produce smaller, well-structured sets of relations with desirable properties.

 It often refers to a series of tests performed on relations to determine whether they satisfy or violate the requirements of a normal form.

 The process of normalization was first developed by Codd in 1972. Codd initially defined three normal forms: 1NF, 2NF, 3NF.

 Boyce and Codd together introduced a stronger definition of 3NF called Boyce-Codd Normal Form (BCNF) in 1974.
3.1 Introduction to Normalization (Continued…)

 All four of these normal forms are based on functional dependencies among the attributes of a relation.
 A functional dependency describes the relationship between attributes in a relation.
– For example, if A and B are attributes or sets of attributes of relation R, B is functionally dependent on A (denoted as A → B) if each value of A is associated with exactly one value of B.
 In 1977 and 1979, a fourth (4NF) and fifth (5NF) normal form were introduced, which go beyond BCNF. However, they deal with situations that are quite rare.
3.2 First Normal Form (1NF)

 A relation R is said to be in first normal form if and only if all its columns contain only atomic values

 1NF addresses 2 issues:
– A column can't contain repeating groups
– Each row of data must have a unique identifier

 It disallows composite attributes, multivalued attributes, and nested relations.

 Consider the example in the next 2 slides:
1NF Example
(a) A relation schema that is not in 1NF

(b) Example relation instances


1NF Example(Continued…)
(c) 1NF relation with redundancy

Alternative 1

DMGSSN → PLOCATION
Alternative 2 KEY:{DNUMBER,DLOCATION}
(better)
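
The conversion to 1NF with redundancy can be sketched as flattening a multivalued column into one row per value. The DEPARTMENT rows and the column name DLOCATIONS below are hypothetical; the resulting key {DNUMBER, DLOCATION} matches the one named above.

```python
# Hypothetical DEPARTMENT rows whose DLOCATIONS column holds a list of values,
# so the relation is not in 1NF.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]

def to_1nf(rows, multivalued, atomic_name):
    """Emit one row per value of the multivalued attribute."""
    flat = []
    for row in rows:
        for value in row[multivalued]:
            new_row = {k: v for k, v in row.items() if k != multivalued}
            new_row[atomic_name] = value
            flat.append(new_row)
    return flat

flat = to_1nf(department, "DLOCATIONS", "DLOCATION")
for row in flat:
    print(row)
```

Every cell is now atomic, at the cost of repeating DNAME once per location.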
3.3 Second Normal Form(2NF)

 A relation R is said to be in second normal form (2NF) if it is in first normal form and all its non-key attributes are dependent on all of the components of the composite key

 A relation that is in 1NF and contains a partial functional dependency is not in second normal form. The relation will be in 2NF once the partial functional dependency is removed

 Consider the examples in the next slides:


2NF Example 1

 STUDENT(SID, SNAME, C_CODE, GRADE)

 (SID, C_CODE) → GRADE : Full FD

 SID → SNAME, but C_CODE ↛ SNAME : SNAME depends on only part of the key (partial FD)

 So the STUDENT relation must be split into the following 2 relations:

 STUDENT_COURSE_GRADE(SID, C_CODE, GRADE)

 STUDENT_DETAILS(SID, SNAME)
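
A minimal sketch of the 2NF test applied to this example: given the key and the FDs, list the FDs whose left side is only part of the key and whose right side includes a non-key attribute. It is a simplification that assumes the relation has a single candidate key.

```python
def partial_dependencies(key, fds):
    """FDs whose left side is a proper subset of the key and whose right
    side contains a non-key attribute (the pattern that 2NF forbids)."""
    key = set(key)
    return [(lhs, rhs) for lhs, rhs in fds
            if set(lhs) < key and set(rhs) - key]

fds = [
    (("SID", "C_CODE"), ("GRADE",)),  # full FD
    (("SID",), ("SNAME",)),           # depends on part of the key only
]

# STUDENT(SID, SNAME, C_CODE, GRADE) with key {SID, C_CODE}
violations = partial_dependencies(("SID", "C_CODE"), fds)
print(violations)

# After the split, neither relation has a partial dependency.
grade_ok = partial_dependencies(("SID", "C_CODE"), [fds[0]])
details_ok = partial_dependencies(("SID",), [fds[1]])
print(grade_ok, details_ok)
```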

2NF Example 2

 {SSN, PNUMBER} → HOURS is a full FD since neither SSN → HOURS nor PNUMBER → HOURS holds
 {SSN, PNUMBER} → ENAME is not a full FD (it is called a partial functional dependency) since SSN → ENAME also holds
3.4 Third Normal Form(3NF)

 A relation R is said to be in third normal form (3NF) if it is in second normal form and contains no transitive dependencies

 If a relation is in 3NF, then all non-key attributes are functionally dependent only upon the primary key

 Consider the examples in the next slides:
3NF Example 1

STUDENT(SID: pk, Activity, Fee)

SID   Activity   Fee
100   Swimming   100
200   Tennis     100
300   Golf       300
400   Swimming   100

 Identify the normal form of the given relation.
A. 1NF B. 2NF C. 3NF
 Identify the kind of anomaly that exists in the given relation.
A. Insertion anomaly B. Deletion anomaly C. Update anomaly D. All
 Notice the following functional dependencies:
 SID → Activity
 Activity → Fee
3NF Example 1 (Continued…)

 Since the relation suffers from all the anomalies, it must be decomposed into smaller relations. There are 3 possible decompositions for this relation:
– STUDENT_ACTIVITY(SID: pk, Activity)
– ACTIVITY_FEE(Activity: pk, Fee)
– STUDENT_FEE(SID: pk, Fee)

 Of the 3 possible decompositions, two are enough to represent the original relation. Which pair do you think is the best?
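
One way to explore the question is to project the STUDENT instance from the previous slide onto each pair of schemas and join the projections back. The helper functions project and natural_join are written here for illustration only.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(left, right):
    """Join two non-empty relations on all attribute names they share."""
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

student = [
    {"SID": 100, "Activity": "Swimming", "Fee": 100},
    {"SID": 200, "Activity": "Tennis", "Fee": 100},
    {"SID": 300, "Activity": "Golf", "Fee": 300},
    {"SID": 400, "Activity": "Swimming", "Fee": 100},
]

decompositions = {
    "STUDENT_ACTIVITY + ACTIVITY_FEE": (["SID", "Activity"], ["Activity", "Fee"]),
    "STUDENT_ACTIVITY + STUDENT_FEE": (["SID", "Activity"], ["SID", "Fee"]),
    "ACTIVITY_FEE + STUDENT_FEE": (["Activity", "Fee"], ["SID", "Fee"]),
}

sizes = {}
for name, (a, b) in decompositions.items():
    joined = natural_join(project(student, a), project(student, b))
    sizes[name] = len(joined)
    print(name, "->", len(joined), "tuples after rejoining")
```

On this instance the first two pairs rebuild the four original tuples, while joining ACTIVITY_FEE with STUDENT_FEE on Fee yields seven tuples, three of them spurious. Of the two lossless pairs, only the first keeps Activity and Fee together in one relation, so the dependency between them can still be checked within a single table.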
3NF Example 2

 SSN → DMGRSSN is a transitive FD since SSN → DNUMBER and DNUMBER → DMGRSSN hold
 SSN → ENAME is non-transitive since there is no set of attributes X where SSN → X and X → ENAME
3NF Example 2 (Continued…)

FD2 and FD3 violate 2NF, i.e., ENAME, PNAME, and PLOCATION are partially dependent on {SSN, PNUMBER}

It is not a primary key

SSN→DNUMBER
DNUMBER →DMGRSSN
SSN →DMGRSSN
3.5 Boyce – Codd Normal Form(BCNF)

 A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X → A holds in R, X is a key of R or contains a key of R

 BCNF is a stricter form of 3NF in which every determinant is a key

 Consider the example in the next slide:

BCNF Example 1

FD1
FD2
BCNF Example 1 (Continued…)

There are three possible decompositions:

1. {STUDENT, COURSE} and {STUDENT, INSTRUCTOR}
Generates spurious tuples

2. {COURSE, INSTRUCTOR} and {COURSE, STUDENT}
“Lost” FD1
Generates spurious tuples

3. {INSTRUCTOR, COURSE} and {INSTRUCTOR, STUDENT}
Lossless join
BCNF Example 2

 STUDENT(SID, Major, Advisor, Major-GPA)

SID   Major                  Advisor     MGPA
100   Microprocessors        Mr. Gim     3.45
100   Software Engineering   Dr. Manoj   3.92
200   Web-Services           Mr. Seyo    3.22
300   Software Engineering   Dr. Manoj   3.78
400   Databases              Mr. Mezg    3.56
500   Databases              Mr. Mezg    3.27

 Identify the normal form of the relation STUDENT and any anomalies that exist in it.
– Update anomaly: a change of advisor
– Insertion anomaly: adding a new advisor
– Deletion anomaly: removing a student who is an advisor's only advisee also removes the advisor's information
BCNF Example 2 (Continued…)

Student Relation
Before BCNF

Student and
Advisor Relations
in BCNF
3.6 Fourth Normal Form(4NF)

 A relation R is in fourth normal form (4NF) if and only if it is in BCNF and there is no multivalued dependency in the relation

 Multivalued dependency:
 Consider a relation R which has three attributes A, B, C. For each value of A there is a set of values of B and a set of values of C. If the sets of values of B and C are independent of each other, then there exists a multivalued dependency between the attributes A, B, C in the relation R.
 A ↠ B implies that for each value of A there is a set of values of B
 A ↠ C implies that for each value of A there is a set of values of C

 Basically, whenever two independent 1:M relationships A:B and A:C occur in the same relation, a multivalued dependency may occur
4NF (Continued…)
 Multivalued dependency example:
SUBJECT(COURSE, INSTRUCTOR, REFERENCE BOOK)
– COURSE ↠ INSTRUCTOR
– COURSE ↠ REFERENCE BOOK

COURSE    INSTRUCTOR   REFERENCE BOOK
CSE 101   Mr. Sena     R1
CSE 101   Mr. Mezg     R2
CSE 309   Mr. Mezg     Ref 1
CSE 309   Mr. Mezg     Ref 2
CSE 309   Mr. Tesfa    Ref 1
CSE 309   Mr. Tesfa    Ref 2
CSE 309   Mr. Tesfa    Ref 3


4NF (Continued…)
 The relation SUBJECT can be converted into 4NF by splitting it into two relations, TEXT and REFERENCE
– TEXT(COURSE, INSTRUCTOR)

Course    Instructor
CSE 101   Mr. Sena
CSE 101   Mr. Mezg
CSE 309   Mr. Mezg
CSE 309   Mr. Tesfa

– REFERENCE(COURSE, REFERENCE BOOK)

Course    Reference Book
CSE 101   R1
CSE 101   R2
CSE 309   Ref 1
CSE 309   Ref 2
CSE 309   Ref 3
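
The independence condition behind a multivalued dependency can be tested on an instance: within each group of rows sharing the same X value, every Y value must appear together with every Z value. The sketch below uses small hypothetical rows (C1, I1, B1, and so on), not the SUBJECT data above.

```python
def mvd_holds(rows, x, y, z):
    """Check X ->> Y on rows over attributes X, Y, Z: in each X-group,
    every Y-value must be paired with every Z-value."""
    groups = {}
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        ys, zs, pairs = groups.setdefault(xv, (set(), set(), set()))
        ys.add(yv)
        zs.add(zv)
        pairs.add((yv, zv))
    return all(len(pairs) == len(ys) * len(zs)
               for ys, zs, pairs in groups.values())

# Hypothetical rows: two instructors and two books for one course.
subject = [
    {"COURSE": "C1", "INSTRUCTOR": "I1", "BOOK": "B1"},
    {"COURSE": "C1", "INSTRUCTOR": "I1", "BOOK": "B2"},
    {"COURSE": "C1", "INSTRUCTOR": "I2", "BOOK": "B1"},
    {"COURSE": "C1", "INSTRUCTOR": "I2", "BOOK": "B2"},
]

holds = mvd_holds(subject, ["COURSE"], ["INSTRUCTOR"], ["BOOK"])
# Dropping one combination breaks the independence of instructors and books.
broken = mvd_holds(subject[:3], ["COURSE"], ["INSTRUCTOR"], ["BOOK"])
print(holds, broken)
```

When the check succeeds, splitting the relation into (COURSE, INSTRUCTOR) and (COURSE, BOOK) loses nothing: joining the two projections on COURSE gives back exactly the original rows.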
End of Slides

Questions?
