CH - 3 Fundamentals of A Database System

Uploaded by abrehamcs65
CHAPTER THREE

Relational Data Model


The Relational Data Model is an implementation (representational) model proposed by E.F. Codd in 1970.
The model is the basis for designing databases that are implemented in a Relational Database
Management System (RDBMS).

Structure of Relational Database


The main construct for representing data in the relational database is a two-dimensional table
called a relation.
Example
- “EMPLOYEES” relation

Employees
EmpId | Name         | BDate    | SubCity | Kebele | Phone
E001  | Alemu Girma  | 01/10/70 | Bole    | 06     | 011-663-0712
E004  | Kelem Belete | 12/04/68 | Gulele  | 03     | 011-227-2525
Fig 1. Typical Employee relation instance

The columns of the table represent the attributes of the relation, and the rows (other than the
heading row) represent the tuples (records) of the relation.
A relation in the relational model consists of:
- The relation schema, which describes the column heads of the table, and
- The relation instance, which is the table with its set of tuples.
The set of relation schemas forms the schema of the relational database, called the database schema
(relational database schema).
In the relational model, the relation schemas are described first. A relation schema specifies:
- The relation's name
- The name of each attribute (field or column)
- The domain of each attribute: a domain is referred to in a relation schema by its domain
name and has a set of associated values.
Example
- Employees (EmpId:string, Name:string, BDate:date, SubCity:string, Kebele:integer, Phone:string)
- Projects (PrjId:integer, Name:string, SDate:date, DDate:date, CDate:date)
- Teams (Name:string, Descr:string)
Properties of Relations
- Rows (tuples) in a single relation are unique; that is, no two tuples are identical.
- Relations are sets of tuples, not lists; that is, the order of tuples in a relation is immaterial.
- Attributes are atomic.
- The values that appear in a column must be drawn from the domain associated with that
column.
- The degree, also called arity, of a relation is the number of attributes in the relation.
- The relation names in a relational database are distinct.
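The properties above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: each domain is modelled as a Python predicate, the names `EMPLOYEES_SCHEMA` and `make_relation` are hypothetical, dates are kept as strings, and Kebele values are stored as integers. Because the instance is a `frozenset`, duplicate tuples collapse and tuple order does not matter, and the degree is simply the number of attributes in the schema.

```python
from typing import Callable, Iterable

# Hypothetical domains for the Employees schema: each domain is a predicate on values.
EMPLOYEES_SCHEMA: dict[str, Callable[[object], bool]] = {
    "EmpId": lambda v: isinstance(v, str),
    "Name": lambda v: isinstance(v, str),
    "BDate": lambda v: isinstance(v, str),  # dates kept as "dd/mm/yy" strings here
    "SubCity": lambda v: isinstance(v, str),
    "Kebele": lambda v: isinstance(v, int),
    "Phone": lambda v: isinstance(v, str),
}


def make_relation(schema: dict, rows: Iterable[tuple]) -> frozenset:
    """Build a relation instance, rejecting tuples whose values fall outside their domains."""
    attrs = list(schema)
    checked = []
    for row in rows:
        if len(row) != len(attrs):
            raise ValueError(f"{row} has {len(row)} values, but the degree is {len(attrs)}")
        for attr, value in zip(attrs, row):
            if not schema[attr](value):
                raise ValueError(f"{value!r} is not in the domain of {attr}")
        checked.append(row)
    # A set, not a list: duplicate tuples collapse and tuple order is immaterial.
    return frozenset(checked)


fig1 = [
    ("E001", "Alemu Girma", "01/10/70", "Bole", 6, "011-663-0712"),
    ("E004", "Kelem Belete", "12/04/68", "Gulele", 3, "011-227-2525"),
]
employees = make_relation(EMPLOYEES_SCHEMA, fig1)
same = make_relation(EMPLOYEES_SCHEMA, list(reversed(fig1)) + [fig1[0]])
degree = len(EMPLOYEES_SCHEMA)
```

Here `employees == same` is `True` even though the second list is reversed and repeats a row, and `degree` is 6. Passing a Kebele value of `"06"` (a string) would raise a `ValueError`, since it lies outside the assumed integer domain.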

Key Constraints
A key constraint is a statement that a certain minimal subset of the attributes of a relation is a
unique identifier for a tuple in the relation.
A set of attributes that uniquely identifies a tuple according to a key constraint is called a
candidate key for the relation, often abbreviated simply as key.
Key attributes in the relational model are indicated by underlining them in the relation schema.
Example
- Employees (EmpId, Name, BDate, SubCity, Kebele, Phone)

- Projects (PrjId, Name, SDate, DDate, CDate)

- Teams (Name, Descr)

REMARK: Note that a key for a relation may not be directly inferred from the high-level
conceptual models in some cases.
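A key constraint is a statement about every legal instance, so looking at one instance can refute a proposed key but never prove it. With that caveat, the sketch below (hypothetical helper names) checks whether an attribute set uniquely identifies the tuples of a given instance and whether it is minimal, meaning no proper non-empty subset also does. The sample rows are a few columns of the Emp_Teams instance used later in this chapter.

```python
from itertools import combinations


def identifies_uniquely(rows, attrs, candidate):
    """True if no two tuples of the instance agree on all attributes in `candidate`."""
    idx = [attrs.index(a) for a in candidate]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)


def is_key_in_instance(rows, attrs, candidate):
    """Unique identification plus minimality, judged on this one instance only."""
    if not identifies_uniquely(rows, attrs, candidate):
        return False
    # Minimal: no proper, non-empty subset of the candidate also identifies tuples.
    return not any(
        identifies_uniquely(rows, attrs, subset)
        for size in range(1, len(candidate))
        for subset in combinations(candidate, size)
    )


attrs = ["EmpId", "Name", "TeamId"]
rows = [
    ("E001", "Alemu Girma", 1), ("E001", "Alemu Girma", 3),
    ("E004", "Kelem Belete", 2), ("E005", "Mulu Tasew", 3),
    ("E008", "Belachew K", 1), ("E003", "Almaz B", 5),
    ("E005", "Mulu Tasew", 2),
]

print(is_key_in_instance(rows, attrs, ("EmpId", "TeamId")))          # True
print(is_key_in_instance(rows, attrs, ("EmpId",)))                   # False: E001 repeats
print(is_key_in_instance(rows, attrs, ("EmpId", "TeamId", "Name")))  # False: not minimal
```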

Foreign Key Constraints


The most common integrity constraint involving two relations is the foreign key constraint. It
keeps data consistent when data in a relation is modified.
A foreign key in the referencing relation requires a match to a primary key in the referenced
relation. That is, the referencing relation must have an attribute whose data type is compatible
with the primary key of the referenced relation, so that it can make the reference.
Example
- Employees (EmpId, Name, BDate, SubCity, Kebele, Phone)

- WorkSchedule (SDate, EDate, HoursPerDay, Employee)

In the above example, for “WorkSchedule” to refer to the “Employees” relation instance, it
has an attribute ‘Employee’ of the same type as ‘EmpId’, the primary key of the “Employees”
relation. The foreign key constraint is implemented through the ‘Employee’ attribute in the
referencing relation “WorkSchedule”.

(a) Referencing relation: WorkSchedule
…  | Hours | Employee
…  | 8     | E001
…  | 6     | E004
…  | 8     | E002
…  | 4     | E004

(b) Referenced relation: Employees
EmpId | Name         | …
E001  | Alemu Girma  | …
E004  | Kelem Belete | …
E002  | Mulken Getu  | …
Fig 2. Foreign Constraint in Relational Model

NOTE: - A single tuple can be referenced by zero or more tuples in the referencing relation, but
a tuple with a single foreign key attribute can reference only one tuple.
- A foreign key can refer to the same relation.
- A relational database consists of relations that are related to each other through foreign keys.
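The foreign key rule can be sketched as a referential-integrity check on the Fig 2 data. The helper `dangling_references` is hypothetical, not an RDBMS feature: it returns the referencing tuples whose foreign key value matches no primary key value in the referenced relation. A `Counter` also shows that one referenced tuple (E004) may be referenced by several tuples.

```python
from collections import Counter

work_schedule = [
    {"Hours": 8, "Employee": "E001"},
    {"Hours": 6, "Employee": "E004"},
    {"Hours": 8, "Employee": "E002"},
    {"Hours": 4, "Employee": "E004"},
]
employees = [
    {"EmpId": "E001", "Name": "Alemu Girma"},
    {"EmpId": "E004", "Name": "Kelem Belete"},
    {"EmpId": "E002", "Name": "Mulken Getu"},
]


def dangling_references(referencing, fk, referenced, pk):
    """Referencing tuples whose foreign key value has no matching primary key value."""
    keys = {row[pk] for row in referenced}
    return [row for row in referencing if row[fk] not in keys]


print(dangling_references(work_schedule, "Employee", employees, "EmpId"))  # []
# How many WorkSchedule tuples reference each employee (zero or more per employee).
print(Counter(row["Employee"] for row in work_schedule))
```

Adding a row such as `{"Hours": 5, "Employee": "E009"}` (a hypothetical employee) to `work_schedule` would make it appear in the dangling list, which is exactly the situation the foreign key constraint forbids.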

Entity-Relationship (E/R) Model to Relational Model Mapping
The second phase in database design is implementation design, which transforms the conceptual
data model into an internal model (schema), such as a relational data model, for implementation
in a relational database management system (RDBMS).
The entity sets and relationship sets of an E/R diagram describe a relational schema, while the
sets of entities and relationships form the relational instance of the E/R schema, which is not
part of the database design.

Entity Sets to Relations


Strong entity sets in the E/R model are mapped to relations in the relational model with the same
name and attributes. The primary keys assigned to the entity sets are also represented as keys of
the relations.
Example
The relations for the strong entity sets in figure 3 that have only simple, single-valued
attributes are as follows:
- Projects(ProjId, Name, SDate, DDate)
- Customers(CustId, Name, Address)

Handling Weak Entity Sets


Suppose W is a weak entity set with attribute set {a1, a2, a3, … an} and identifying strong entity
set E, and let the primary key of E be the set {b1, b2, … bm}. Then the attributes of the relation
for the weak entity set must include attributes for its complete key (including those belonging to
the identifying strong entity set) and its own non-key attributes. That is, the set of attributes of
the mapped relation is {a1, a2, a3, … an} ∪ {b1, b2, … bm}.
The primary key of the weak entity set's relation thus includes:
- The discriminator of the weak entity set, and
- The primary key of the identifying strong entity set.
Example
For the weak entity set (TEAMS) in figure 3 above the corresponding relation is:
- Teams(ProjId, Name, Descr)
Handling Composite and Multivalued Attributes
- Composite attributes of the E/R model are represented in the relational model by creating a
separate attribute for each component of the composite attribute (note that the composite
attribute itself is not mapped into a separate attribute).
- Multivalued attributes are handled by creating a relation, named after the attribute, whose
attributes correspond to the components of the multivalued attribute plus the primary key of
the entity set or relationship set to which the attribute belongs. The primary key of the newly
created relation consists of:
  - The primary key of the entity set or relationship set, and
  - The attribute or set of attributes from the multivalued attribute.
Example
Consider the EMPLOYEES entity set in figure 3 above; the corresponding relations for the
entity set are:
- Employees(EmpId, Name, BDate, Age)
- Addresses(City, SubCity, Kebele, HNo, Phone1, Phone2, EmpId)
REMARK
Note that if a multivalued attribute has a fixed, small number of values, it can be represented by
a separate attribute for each value. For example, consider the phone attribute above (Phone1 and
Phone2).
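The mapping of a multivalued, composite attribute can be sketched as follows. The input format is an assumption made for illustration: each entity is a Python dict, the multivalued attribute holds a list of dicts (one per composite value), and the employee and address values are hypothetical (Age from the text is omitted to keep the sketch short). Each value of the multivalued attribute becomes one tuple of the new relation, carrying the owning entity's primary key.

```python
def map_multivalued(entities, key, entity_attrs, mv_attr, mv_components):
    """Split entities into the entity relation and one relation for a multivalued attribute."""
    entity_rel = {tuple(e[a] for a in entity_attrs) for e in entities}
    mv_rel = {
        # One tuple per value: the composite value's components plus the owner's key.
        tuple(value[c] for c in mv_components) + (e[key],)
        for e in entities
        for value in e[mv_attr]
    }
    return entity_rel, mv_rel


# Hypothetical EMPLOYEES entities with a multivalued composite attribute "Addresses".
entities = [
    {"EmpId": "E001", "Name": "Alemu Girma", "BDate": "01/10/70",
     "Addresses": [
         {"City": "Addis Ababa", "SubCity": "Bole", "Kebele": 6, "HNo": "101"},
         {"City": "Addis Ababa", "SubCity": "Kirkos", "Kebele": 2, "HNo": "17"},
     ]},
    {"EmpId": "E004", "Name": "Kelem Belete", "BDate": "12/04/68",
     "Addresses": [
         {"City": "Addis Ababa", "SubCity": "Gulele", "Kebele": 3, "HNo": "45"},
     ]},
]

employees, addresses = map_multivalued(
    entities, "EmpId", ["EmpId", "Name", "BDate"],
    "Addresses", ["City", "SubCity", "Kebele", "HNo"],
)
```

Here `employees` holds two tuples and `addresses` holds three, each ending with the owner's EmpId. As the text describes, the primary key of the address relation is the address components together with EmpId.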

Relationship Sets to Relations


Suppose an entity set E with primary key {a11, a12, a13, … a1n} is related to an entity set F with
primary key {a21, a22, a23, … a2m} through a relationship R, and let R have the descriptive
attribute set {b1, b2, b3, … bp}. Then the relationship is represented by a relation whose
attributes are:
- The keys of the connected entity sets: {a11, a12, a13, … a1n} ∪ {a21, a22, a23, … a2m}, and
- The attributes of the relationship itself: {b1, b2, b3, … bp}.
The union of the primary keys of the related entity sets forms a superkey for the relationship
relation. If the relationship is many-to-many, this superkey is also the primary key of the
relation; otherwise, the primary key of the entity set on the many side becomes the primary key
of the relation.
Example
From figure 3 above the corresponding relations for the relationship sets are:
- Assigned(EmpId, ProjId, TeamName)
- Owns(ProjId, CustId)
NOTE: Supporting relationships (for example WorksOn) need not be transformed into relations if
their sole purpose is to identify a weak entity set by passing the identifying strong entity
set's primary key on to the weak entity set; transforming them would only introduce
redundancy.
Suppose entity sets E and F are related through a many-to-one relationship R from E to F. Then
the relations for E and R that come out of this E/R model can be combined into a single relation
S whose schema consists of:
- All attributes of the entity set E,
- The key attributes of the entity set F, and
- All attributes of the relationship R.
If the participation of E in R is total, it is also possible to include all attributes of F in the
relation S and have one single relation S in place of the three relations E, F and R.
The primary key of S is the primary key of E.
Example
Consider the entity sets “PROJECTS” and “CUSTOMERS” and the corresponding
relationship “Owns”, then we can have:
- Projects(ProjId, Name, SDate, DDate, CustId) and Customers(CustId, Name, Address),
or
- Projects(ProjId, Name, SDate, DDate, CustId, Name, Address)
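Folding a many-to-one relationship into the relation of the "many" side can be sketched in Python. The helper name `fold_many_to_one` and the sample Projects and Owns rows are hypothetical. The helper refuses input in which a project is related to more than one customer (the relationship would then not be many-to-one), and it fills the added attributes with `None` for projects that do not take part in Owns, which is the NULL case the combined relation has to allow when participation is not total.

```python
def fold_many_to_one(e_rows, r_rows, e_key, r_attrs):
    """Merge relationship R into E's relation; r_attrs = F's key plus R's own attributes."""
    related = {}
    for r in r_rows:
        if r[e_key] in related:
            raise ValueError(f"{r[e_key]} appears twice in R, so R is not many-to-one")
        related[r[e_key]] = {a: r[a] for a in r_attrs}
    missing = {a: None for a in r_attrs}  # used for E tuples that do not take part in R
    return [{**e, **related.get(e[e_key], missing)} for e in e_rows]


# Hypothetical instances of Projects(ProjId, Name, SDate, DDate) and Owns(ProjId, CustId).
projects = [
    {"ProjId": 1, "Name": "Payroll", "SDate": "01/01/15", "DDate": "30/06/15"},
    {"ProjId": 2, "Name": "Inventory", "SDate": "01/03/15", "DDate": "31/12/15"},
    {"ProjId": 3, "Name": "Website", "SDate": "01/05/15", "DDate": "31/08/15"},
]
owns = [{"ProjId": 1, "CustId": "C01"}, {"ProjId": 2, "CustId": "C01"}]

merged = fold_many_to_one(projects, owns, "ProjId", ["CustId"])
# Each merged row has the schema Projects(ProjId, Name, SDate, DDate, CustId).
```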

Representation of Generalization and Specialization


A hierarchical structure (specialization and generalization, or inheritance) can be represented in
the relational model in three different ways:
1. E/R Style: One relation for each lower-level entity set and one for the higher-level entity set.
Every relation of a lower-level entity set includes:
- The key attribute(s) of the higher-level entity set, which form the primary key of the
relation, and
- The attributes of that lower-level entity set.
For a total and disjoint generalization, the higher-level entity set need not be mapped into a
relation; instead, all its attributes are passed to the relations of all immediate lower-level
entity sets.
2. Use of Nulls: One relation containing all the attributes of the higher-level entity set and of
all the lower-level entity sets; entities have NULL in the attributes that do not belong to
them. This involves a large number of NULL values for a disjoint generalization.
3. Object-Oriented Approach: One relation per subset of subclasses, with all relevant
attributes, including:
- The attributes of the higher-level entity set, and
- The attributes of that lower-level entity set.
The primary key of the higher-level entity set becomes the primary key of each relation.
Example
Consider the entity set “EMPLOYEES” and its lower-level entity sets; then:
- FullTimeEmployees(EmpId, Salary, Saving, Allowance)
- PartTimeEmployees(EmpId, HourlyPay, ContractPeriod)

Dependencies
In database design, the two most common pitfalls that result in a bad design are:
- Repetition of information, and
- Inability to represent certain information (loss of information).

Functional Dependencies
A functional dependency is a kind of constraint that helps to remove redundancy in relational
database design.

Definition: A functional dependency, denoted X → A, is an assertion about a relation R that
whenever two tuples of R agree on all the attributes of X, they must also agree on the
attribute A. We say that “X → A holds in R” or “X functionally determines A”.
Note that in the notation X → A, X represents a set of attributes and A represents a single
attribute; that is, A1 A2 A3 … An → B.
The functional dependency is a generalization of the notion of a superkey.
Example:
- Consider the Teams relation Teams(PrjId, Name, Descr); then
PrjId, Name → Descr
- For the Employees relation
Employees(EmpId, NationalId, Name, BDate, Age, Gender, City, HAddr, Phone):
EmpId → Name; EmpId → Age; Name BDate → Gender
A functional dependency A1 A2 A3…An  B is said to be trivial dependency if B is an element
of {A 1, A2, A3…An }.
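Whether a functional dependency holds is a property of the schema, but a given instance can at least be tested for violations: X → A is violated as soon as two tuples agree on X and disagree on A. The sketch below (hypothetical helper name `fd_holds`) runs this test on a few columns of the Emp_Teams instance shown in the normalization section; a `True` result only means this instance does not refute the dependency.

```python
def fd_holds(rows, lhs, rhs):
    """False as soon as two tuples agree on every attribute in lhs but differ on rhs."""
    first_seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and compare later ones to it.
        if first_seen.setdefault(x, y) != y:
            return False
    return True


cols = ["EmpId", "Name", "TeamId", "Project", "TeamName"]
rows = [dict(zip(cols, r)) for r in [
    ("E001", "Alemu Girma", 1, 1, "Programmer"),
    ("E001", "Alemu Girma", 3, 2, "Programmer"),
    ("E004", "Kelem Belete", 2, 1, "Tester"),
    ("E005", "Mulu Tasew", 3, 2, "Programmer"),
    ("E008", "Belachew K", 1, 1, "Programmer"),
    ("E003", "Almaz B", 5, 3, "Programmer"),
    ("E005", "Mulu Tasew", 2, 1, "Tester"),
]]

print(fd_holds(rows, ["EmpId"], ["Name"]))                  # True
print(fd_holds(rows, ["TeamId"], ["Project", "TeamName"]))  # True
print(fd_holds(rows, ["Project"], ["TeamName"]))            # False
print(fd_holds(rows, ["EmpId", "Name"], ["Name"]))          # True (trivial)
```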

Rules of Functional Dependency


Combining Rule:
The functional dependencies
A1 A2 A3 … An → B1
A1 A2 A3 … An → B2
⋮
A1 A2 A3 … An → Bm
can be written as:
A1 A2 A3 … An → B1 B2 … Bm
Splitting Rule:
The functional dependency A1 A2 A3 … An → B1 B2 … Bm can be written as A1 A2 A3 … An → Bi for
i = 1, 2, 3, … m.
Closure of Attributes
Suppose {A1, A2, … An} is a set of attributes and S is a set of functional dependencies in
relation R. The closure of the set {A1, A2, … An} under S is the set of all attributes B such
that A1 A2 … An → B follows from S. The closure of A1, A2, … An is denoted by
{A1, A2, … An}+.
The closure set of attributes can be determined by repeatedly applying the following three
rules known as Armstrong’s Axioms:
Reflexivity Rule
If α is a set of attributes and β ⊆ α, then α → β holds.

Augmentation Rule
If α → β holds and γ is a set of attributes, then γα → γβ holds.

Transitivity Rule
If α → β holds and β → γ holds, then α → γ holds.


The algorithm for computing the closure X+ of a set of attributes X is given below.
1. Start from the given set of attributes X; this working set grows until it becomes the closure.
2. Repeatedly search for some functional dependency B1 B2 … Bm → C such that all of B1, B2, … Bm
are in X but C is not, and add C to X.
3. Repeat step 2 as many times as necessary until no more attributes can be added to X.
4. The set X, once no more attributes can be added to it, is the closure X+.
Example: Consider a relation with attributes A, B, C, D, E and F. Suppose that this relation
has the functional dependencies AB → C, BC → AD, D → E and CF → B. What is the
closure of {A, B}, that is, {A, B}+?
Solution:
X = {A, B}
From the functional dependency AB → C, we add C to X, that is, X = {A, B, C}.
Similarly, BC → AD gives X = {A, B, C, D}, and
D → E gives X = {A, B, C, D, E}.
No more changes to X are possible (CF → B cannot be applied because F is not in X).
Thus {A, B}+ = {A, B, C, D, E}.

From the closure it follows that AB → D holds, since D ∈ {A, B}+.


Exercise: Test whether D → A follows from the functional dependency set.
To test D → A, first determine the closure of {D}:
X = {D}
From the functional dependency D → E, we add E to X, that is, X = {D, E}.
No more changes to X are possible. Thus {D}+ = {D, E}.
Since A is not in {D}+, D → A does not follow from the given dependencies.
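The closure algorithm translates almost line for line into Python. In this sketch attributes are single letters and each functional dependency is a pair of strings, an encoding chosen only for brevity; adding a whole right-hand side at once is equivalent to first splitting it into single-attribute dependencies, as step 2 assumes. The loop stops once a full pass over the dependencies adds nothing, which is step 3.

```python
def closure(attrs, fds):
    """Compute attrs+ under fds, where each FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Step 2: the whole left side is already in the result, the right side is not yet.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]

print(sorted(closure("AB", F)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("D", F)))   # ['D', 'E']
print("A" in closure("D", F))    # False, so D -> A does not follow from F
```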

Multivalued Dependencies
A multivalued dependency for a relation R is a constraint stating that when the values of one set
of attributes are fixed, the values in certain other attributes are independent of the values of
all the other attributes in R.

That is, for a multivalued dependency X ↠ Y in R, where X and Y are subsets of the attributes of
R: if t and u are tuples in the relation instance r of schema R that agree on X, then there exists
a third tuple v in r that agrees:
1. with both t and u on X,
2. with t on Y, and
3. with u on all attributes of R that are not in X or Y, i.e. on R − (X ∪ Y).

Rules of Multivalued Dependency


Multivalued dependency is a generalization of functional dependency. That is,

if α → β holds, then α ↠ β also holds.

All the rules for functional dependencies except the splitting rule also apply to multivalued
dependencies.
Complementation Rule
One additional rule for multivalued dependencies that has no counterpart for functional
dependencies is the complementation rule.

The rule states that if X ↠ Y holds, then X ↠ (R − (X ∪ Y)) also holds, where R is the set of
attributes of the relation schema R.
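The three-condition definition of X ↠ Y can be checked directly on an instance by building the required tuple v for every pair t, u that agree on X. The helper `mvd_holds` and the EmpId/Project/Skill relation below are hypothetical: an employee's projects and skills are recorded independently, so EmpId ↠ Project holds, and, as the complementation rule predicts, EmpId ↠ Skill holds too. As with functional dependencies, an instance can only refute a multivalued dependency, not prove it.

```python
def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y: for t, u agreeing on X, the tuple v built from t and u must be present."""
    table = set(rows)
    pos = {a: i for i, a in enumerate(attrs)}
    for t in rows:
        for u in rows:
            if any(t[pos[a]] != u[pos[a]] for a in x):
                continue  # t and u do not agree on X, so no v is required
            # v agrees with t (and u) on X, with t on Y, and with u on R - (X U Y).
            v = tuple(t[i] if a in x or a in y else u[i] for i, a in enumerate(attrs))
            if v not in table:
                return False
    return True


attrs = ["EmpId", "Project", "Skill"]
rows = [
    ("E001", "P1", "SQL"), ("E001", "P1", "Java"),
    ("E001", "P2", "SQL"), ("E001", "P2", "Java"),
]

print(mvd_holds(rows, attrs, ["EmpId"], ["Project"]))       # True
print(mvd_holds(rows, attrs, ["EmpId"], ["Skill"]))         # True (complementation)
print(mvd_holds(rows[:-1], attrs, ["EmpId"], ["Project"]))  # False: (E001, P2, Java) missing
```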

Normalization and Normal Forms


In relational databases, normalization is a process that helps to:
- eliminate redundancy,
- organize data efficiently,
- reduce the potential for anomalies during data operations, and
- improve data consistency.
The formal classifications used for quantifying "how normalized" a relational database is are
called normal forms (abbreviated as NF).

Normalization and Denormalization


Following standard database normalization recommendations when designing databases can
greatly improve a database's performance by helping to:
- Reduce the total amount of redundant data in the database. The less data there is, the less work
the RDBMS has to perform, which speeds up its performance.
- Reduce the use of NULLs in the database. NULLs can considerably reduce database
performance, especially in WHERE clauses.
- Reduce the number of columns in tables. The fewer columns a table has, the more rows fit on a
single data page, which helps to boost the read performance of the RDBMS.
- Reduce the amount of SQL code. The less code there is, the less has to run, which speeds up
the application.
- Maximize the use of clustered indexes. The more the data is separated into multiple tables by
normalization, the more clustered indexes become available to help speed up data access.
- Reduce the total number of indexes. The fewer columns tables have, the less need there is for
multiple indexes to retrieve data, and the fewer indexes there are, the smaller the performance
cost of data insertion, modification and deletion.
Redundancy in a database design results in data anomalies, classified as:
- Insertion anomalies
- Deletion anomalies
- Modification anomalies
Example: Consider the relation schemas for Employees and Teams combined into a single relation
as follows:
Emp_Teams(EmpId, Name, BDate, Gender, TeamId, Project, TeamName)
There is clearly redundant data in the “Emp_Teams” relation: the team details are repeated for
every employee assigned to the team. Consider the following instance of the relation:
Emp_Teams
EmpId | Name         | BDate    | Gender | TeamId | Project | TeamName
E001  | Alemu Girma  | 01/10/70 | M      | 1      | 1       | Programmer
E001  | Alemu Girma  | 01/10/70 | M      | 3      | 2       | Programmer
E004  | Kelem Belete | 12/04/68 | M      | 2      | 1       | Tester
E005  | Mulu Tasew   | 10/05/69 | F      | 3      | 2       | Programmer
E008  | Belachew K   | 02/11/62 | M      | 1      | 1       | Programmer
E003  | Almaz B      | 05/06/65 | F      | 5      | 3       | Programmer
E005  | Mulu Tasew   | 10/05/69 | F      | 2      | 1       | Tester

- Insertion anomalies: Suppose we want to insert a new employee who works in project 1 as a
programmer. Then the corresponding team details have to be entered correctly; if they are
entered incorrectly, consistency will be violated.
- Deletion anomalies: Suppose E003 is to be removed from the employees list. Then the
information about team 5 will also be removed, and vice versa.
- Modification anomalies: During data updates, consistency may also be violated, as in the case
of insertion.
Although normalization is a way to remove redundancy anomalies and preserve consistency,
integrity and maintainability, it may also lead to:
- An increase in storage space
- Complex queries (queries with many joins of tables)
In such situations it may be desirable to denormalize some of the tables in order to reduce storage
space and the number of required joins.
Denormalization is the process of selectively taking normalized tables and re-combining the data
in them. Sometimes the addition of a single column of redundant data to a table from another
table can reduce a 4-way join into a 2-way join, significantly boosting performance by reducing
the time it takes to perform the join.
Databases intended for Online Transaction Processing (OLTP) are typically normalized. By contrast,
databases intended for Online Analytical Processing (OLAP) are primarily "read-only" databases
that tend to extract historical data accumulated over a long period. For such databases,
redundant or "denormalized" data may facilitate Business Intelligence applications.
While denormalization can boost storage and query performance, it can also have negative
effects. For example, by adding redundant data to tables, you risk the following problems:
- More data means the RDBMS has to read more data pages than otherwise needed, hurting
performance.
- Redundant data can lead to data anomalies and bad data.
- In many cases, extra code has to be written to keep redundant data in separate tables in sync,
which adds to database overhead.

Normal Forms
The normalization procedure provides:
- A framework for analyzing relation schemas based on functional and multivalued
dependencies.
- A series of normal form tests that can be carried out on individual relation schemas so that
the relational database can be normalized to any desired degree.
Normalization through decomposition needs to preserve two additional properties of a
relational schema:
- Lossless (nonadditive) join: the nonadditive join property guarantees that no spurious tuples
are generated when the decomposed relations are joined back together.
- Dependency preservation: dependency preservation ensures that each functional dependency is
represented in one of the individual relations resulting from the decomposition.
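For a decomposition of R into two relations R1 and R2 when only functional dependencies are given, the standard test for the lossless (nonadditive) join property is that the common attributes R1 ∩ R2 functionally determine all of R1 or all of R2. The sketch below (hypothetical helper names) checks this with an attribute closure on the Projects example from the BCNF section. The attribute names PName and CName are assumptions introduced to keep the project name and the customer name distinct.

```python
def closure(attrs, fds):
    """attrs+ under fds; each FD is a (lhs, rhs) pair of attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff the common attributes determine all of r1 or all of r2."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


F = [
    ({"ProjId"}, {"PName", "SDate", "DDate", "CustId"}),
    ({"CustId"}, {"CName", "Address"}),
]
# Shared CustId determines all of the Customers side: lossless.
good = is_lossless_binary({"ProjId", "PName", "SDate", "DDate", "CustId"},
                          {"CustId", "CName", "Address"}, F)
# No shared attributes at all: joining back produces spurious tuples.
bad = is_lossless_binary({"ProjId", "PName", "SDate", "DDate"},
                         {"CustId", "CName", "Address"}, F)
print(good, bad)  # True False
```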

First Normal Form (1NF)


A relation (table) R is in 1NF if and only if all underlying domains of its attributes contain only
atomic (simple, indivisible) values; that is, the value of any attribute in a tuple (row) must be a
single value from the domain of that attribute.
1NF therefore disallows multivalued attributes, composite attributes and their combinations in the
relational schema.
Normalization (Decomposition)
Form a new relation for each non-atomic attribute or nested relation.

Second Normal Form (2NF)


A relation schema R is in 2NF if it is in 1NF and every non-prime attribute A in R is fully
functionally dependent on the primary key (i.e. not partially dependent on any candidate key).
A functional dependency X → Y is a full functional dependency if removing any attribute from X
means the dependency no longer holds.
NOTE: Most relational schemas that are carefully mapped from an E/R model are in 2NF.
Normalization (Decomposition)
Decompose and set up a new relation for each partial key together with its dependent
attribute(s). Make sure to keep a relation with the original primary key and any attributes that
are fully functionally dependent on it.
Example: Consider the relation schemas for Employees and Teams combined into a single relation
as follows:
Emp_Teams(EmpId, Name, BDate, Gender, TeamId, Project, TeamName)
EmpId → Name, BDate, Gender
TeamId → Project, TeamName
Then upon decomposition we will have:
Employees(EmpId, Name, BDate, Gender)
Teams(TeamId, Project, TeamName)
Emp_Teams(EmpId, TeamId)
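The partial dependencies that drive this decomposition can be found mechanically: for each proper, non-empty subset of the primary key, compute its closure and keep the non-prime attributes it already determines. The sketch below (hypothetical helper names) assumes the primary key of Emp_Teams is {EmpId, TeamId} and uses the two dependencies listed above.

```python
from itertools import combinations


def closure(attrs, fds):
    """attrs+ under fds; each FD is a (lhs, rhs) pair of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """Map each proper subset of the key to the non-prime attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & nonprime
            if determined:
                found[subset] = determined
    return found


F = [
    ({"EmpId"}, {"Name", "BDate", "Gender"}),
    ({"TeamId"}, {"Project", "TeamName"}),
]
key = {"EmpId", "TeamId"}
nonprime = {"Name", "BDate", "Gender", "Project", "TeamName"}

for subset, attrs in partial_dependencies(key, nonprime, F).items():
    print(subset, "->", sorted(attrs))
# ('EmpId',) -> ['BDate', 'Gender', 'Name']
# ('TeamId',) -> ['Project', 'TeamName']
```

Each entry corresponds to one new relation of the decomposition (Employees and Teams), while the key itself stays behind in Emp_Teams(EmpId, TeamId).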
Third Normal Form (3NF)
3NF requires that a relation schema R be in 2NF and that no non-prime attribute of R be
transitively dependent on the primary key. In summary, all non-key attributes are mutually
independent. Thus, any relation in which all attributes are prime attributes (part of some key) is
guaranteed to be at least in 3NF.
That is, if X → Y is a non-trivial functional dependency in R, then either:
- X is a superkey for schema R, or
- Attribute Y is a member of a candidate key (a prime attribute).
Normalization (Decomposition)
Decompose and set up a relation that includes the non-key attribute(s) that functionally
determine(s) other non-key attributes.

Boyce-Codd Normal Form (BCNF)


BCNF requires that there be no non-trivial functional dependencies of attributes on anything
other than a superset of a candidate key (called a superkey). At this stage, all attributes depend
on a key, the whole key and nothing but the key (excluding trivial dependencies).
A table is in BCNF if and only if it is in 3NF and every non-trivial, left-irreducible functional
dependency has a candidate key as its determinant. In more informal terms, a table is in BCNF if
it is in 3NF and its only determinants are candidate keys.
That is, if X → Y is a non-trivial functional dependency in R, then
X is a superkey for schema R.
Note that the major goals of database design with functional dependencies are:
- BCNF,
- Lossless join, and
- Dependency preservation.
However, in certain situations it is necessary to settle for 3NF instead of BCNF in order to
preserve dependencies.
Example:
A relation that is in 3NF but not in BCNF:

R(A, B, C, D) and F = {AB → CD, BC → AD, A → C}

AB and BC are candidate keys, thus A → C does not violate 3NF (C is a prime attribute),
whereas it violates BCNF since A is not a superkey.
A relation that is in 3NF and in BCNF:
R(A, B) is guaranteed to be in BCNF since its only possible functional dependencies are
A → B, B → A and/or trivial ones such as AB → AB; whenever A → B or B → A holds, its
determinant is a key of R.
Example: Consider the Projects relation from the E/R model:
- Projects(ProjId, Name, SDate, DDate, CustId, Name, Address)
ProjId → Name, SDate, DDate, CustId, Name, Address
CustId → Name, Address
Then upon decomposition we will have:
Projects(ProjId, Name, SDate, DDate, CustId)
Customers(CustId, Name, Address)

Fourth Normal Form (4NF)


4NF requires that there be no non-trivial multivalued dependencies of attribute sets on
anything other than a superset of a candidate key.
A table is in 4NF if and only if it is in BCNF and its multivalued dependencies are functional
dependencies. 4NF removes an unwanted form of redundancy: multivalued dependencies.

That is, if X ↠ Y is a non-trivial multivalued dependency in R, then
X is a superkey for schema R.
Example:
Consider relation R and its dependency set

R(A, B, C, D) and F = {ABCD, AB C}


Then the relation can be normalized as:
R1(A, B, C) and R2(A, B, D)

Although there are also higher normal forms, such as 5NF (also called PJNF), DKNF and 6NF, most
relational database designs are sufficiently normalized at the BCNF level, or even at 3NF.
