Chap 4 Database Constraints and Normalization

Chapter 4 discusses the relational database model, focusing on the concepts of primary and foreign keys, which establish relationships between tables. It also covers database constraints, including integrity and domain constraints, which ensure data integrity and consistency. Additionally, the chapter explains various types of keys and assertions used in database management systems.

Chapter 4: Database Constraints and Normalization

Compiled by: Er. ASHOK G.M 1


Relational Database Model
• The relational model was developed by E. F. Codd.
• It is the most widely used database model.
• Data are stored in relations (tables). Each column of a relation represents
an attribute, and each row (tuple) stores the values for one entry.
Attributes (columns): ISBN, Bname, Price, Author
Values/information (rows):

ISBN              | Bname                       | Price   | Author
000-124-456       | The Old Man and The Sea     | Rs. 97  | Ernest Hemingway
978-1-85326-067-4 | Far from the Madding Crowd  | Rs. 200 | Thomas Hardy
978-81-291-0818-0 | One Night @ The Call Center | Rs. 200 | Chetan Bhagat



Contd…
• The concepts of primary key and foreign key create logical relationships
between relations.
• A primary key is the candidate key chosen by the database designer to
uniquely identify every entity in an entity set.
• It combines NOT NULL and UNIQUE: the column (or combination of two or more
columns) holds a unique, non-null value in every row, which makes a
particular record easier and faster to find.
• A foreign key is an attribute (or set of attributes) in one table that
refers to the primary key of another table.
• It enforces referential integrity: every foreign key value in one table
must match a primary key value in the referenced table.



Contd…
ISBN              | Bname                       | Price   | Author
000-124-456       | The Old Man and The Sea     | Rs. 97  | Ernest Hemingway
978-1-85326-067-4 | Far from the Madding Crowd  | Rs. 200 | Thomas Hardy
978-81-291-0818-0 | One Night @ The Call Center | Rs. 200 | Chetan Bhagat

Foreign Key: the ISBN column of the table below refers to ISBN in the table above.

ID   | Name              | Grade | ISBN
0001 | Sanjay Sharma     | BBA   | 978-81-291-0818-0
0002 | Sushil Shrestha   | BSC   | 000-124-456
0030 | Samikshaya Sharma | BBA   | 978-1-85326-067-4



Database Constraints
• Constraints limit the data, or the type of data, that can be inserted into,
updated in, or deleted from a table.
• Their purpose is to maintain data integrity during every insert, update,
and delete on a table. An RDBMS supports several types of constraints.
Types:
• Integrity Constraints
• Domain Constraints



Integrity Constraints

➢Integrity constraints are a set of rules used to maintain the quality of
information.
➢They ensure that data insertion, updating, and other operations are
performed in such a way that data integrity is not affected.
➢Thus, integrity constraints guard against accidental damage to the
database.



Entity Integrity
- Generally applies to an attribute.
- Concerned with the presence of a primary key in each relation (table).
- It requires the following:
- The primary key must be NOT NULL.
- The primary key must be UNIQUE.
NOTE:
If a primary key value is NULL, the tuple it belongs to cannot be
identified individually.



IN SQL (Entity Integrity)
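CREATE TABLE Library (
ISBN VARCHAR(17),      -- ISBN values contain hyphens, so a character type is needed
Bname VARCHAR(50),     -- long enough for the titles in the example
Price DECIMAL(8, 2),   -- portable alternative to the vendor-specific MONEY type
Author VARCHAR(50),
CONSTRAINT pk_id PRIMARY KEY (ISBN));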

Primary Key: ISBN

ISBN              | Bname                       | Price   | Author
000-124-456       | The Old Man and The Sea     | Rs. 97  | Ernest Hemingway
978-1-85326-067-4 | Far from the Madding Crowd  | Rs. 200 | Thomas Hardy
978-81-291-0818-0 | One Night @ The Call Center | Rs. 200 | Chetan Bhagat
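
The entity integrity rules can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as the database; the rejected rows are hypothetical data. NOT NULL is written explicitly on ISBN because SQLite, for historical reasons, does not by itself reject NULL in a non-integer primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Library (
        ISBN   VARCHAR(17) NOT NULL,   -- explicit NOT NULL (needed in SQLite)
        Bname  VARCHAR(50),
        Price  DECIMAL(8, 2),
        Author VARCHAR(50),
        CONSTRAINT pk_id PRIMARY KEY (ISBN)
    )
""")
conn.execute(
    "INSERT INTO Library VALUES (?, ?, ?, ?)",
    ("000-124-456", "The Old Man and The Sea", 97, "Ernest Hemingway"),
)

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Library VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Hypothetical rows: a repeated ISBN, a missing ISBN, and a valid new book.
duplicate_ok = try_insert(("000-124-456", "Another Book", 150, "Someone Else"))
null_ok = try_insert((None, "Book Without ISBN", 120, "Unknown"))
new_ok = try_insert(("978-1-85326-067-4", "Far from the Madding Crowd", 200, "Thomas Hardy"))
```

Only the third insert succeeds: the first breaks UNIQUE and the second breaks NOT NULL.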



Domain Constraints
• Integrity constraints guard against accidental damage to the database,
by ensuring that authorized changes to the database do not result in a
loss of data consistency.
• Domain constraints are the most elementary form of integrity
constraint.
• A domain constraint defines the valid set of values for an attribute.
• Domain data types include string, character, integer, time, date,
currency, etc. Every value of the attribute must belong to the
corresponding domain.



Domain Constraints (Cont.)
• They test values inserted in the database, and test queries to
ensure that the comparisons make sense.
• New domains can be created from existing data types.
• E.g. create domain Dollars numeric(12,2)
create domain Pounds numeric(12,2)
• We cannot assign or compare a value of type Dollars to a value of
type Pounds.
• However, we can convert the type as below:
cast(r.A as Pounds)
(The value should also be multiplied by the dollar-to-pound conversion rate.)



• The check clause in SQL permits domains to be restricted:
Use the check clause to ensure that an hourly-wage domain allows only values
greater than or equal to a specified value.
create domain hourly-wage numeric(5,2)
constraint value-test check(value >= 4.00)
The domain has a constraint that ensures that the hourly-wage is at least
4.00.
The clause constraint value-test is optional; it is useful for indicating which
constraint an update violated.
• Domain checks can have complex conditions:
create domain AccountType char(10)
constraint account-type-test
check (value in ('Checking', 'Saving'))
• A check condition can also contain a subquery:
check (branch-name in (select branch-name from branch))
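
The two domain checks above can be approximated in a runnable sketch. It assumes Python's sqlite3 module, which has no create domain statement, so the same conditions are written as named column CHECK constraints. The table employee_wage and its rows are hypothetical, and underscores replace the hyphens in the names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_wage (
        emp_id       INTEGER PRIMARY KEY,
        hourly_wage  NUMERIC(5, 2)
            CONSTRAINT value_test CHECK (hourly_wage >= 4.00),
        account_type CHAR(10)
            CONSTRAINT account_type_test
            CHECK (account_type IN ('Checking', 'Saving'))
    )
""")

def accepted(row):
    """Return True if the row passes every CHECK constraint."""
    try:
        conn.execute("INSERT INTO employee_wage VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    accepted((1, 4.00, "Checking")),  # boundary value satisfies >= 4.00
    accepted((2, 3.50, "Saving")),    # wage below the minimum
    accepted((3, 9.75, "Current")),   # account type outside the allowed set
]
```

Only the first row is stored; the other two are rejected by value_test and account_type_test respectively.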



Types of Integrity Constraint

➢Not null constraints
➢Null constraint
➢Default constraints
➢Check constraints
➢Primary key constraints
➢Referential constraints



Referential Integrity Constraint
• Ensures that a value that appears in one relation for a given set of
attributes also appears for a certain set of attributes in another
relation.
• Example: If “Perryridge” is a branch name appearing in one of the tuples in
the account relation, then there exists a tuple in the branch relation for
branch “Perryridge”.
• Formal Definition
• Let $r_1(R_1)$ and $r_2(R_2)$ be relations with primary keys $K_1$ and $K_2$ respectively.
• The subset $\alpha$ of $R_2$ is a foreign key referencing $K_1$ in relation $r_1$ if, for every $t_2$
in $r_2$, there is a tuple $t_1$ in $r_1$ such that $t_1[K_1] = t_2[\alpha]$.
• A referential integrity constraint is also called a subset dependency, since it can be
written as
$\Pi_{\alpha}(r_2) \subseteq \Pi_{K_1}(r_1)$



Referential Integrity Constraints Contd..



Referential Integrity in SQL
• Primary and candidate keys and foreign keys can be specified as part of the
SQL create table statement:
• The primary key clause lists attributes that comprise the primary key.
• The unique key clause lists attributes that comprise a candidate key.
• The foreign key clause lists the attributes that comprise the foreign key
and the name of the relation referenced by the foreign key.
• By default, a foreign key references the primary key attributes of the
referenced table
foreign key (account-number) references account
• Short form for specifying a single column as foreign key
account-number char (10) references account
• Reference columns in the referenced table can be explicitly specified
• but must be declared as primary/candidate keys
foreign key (account-number) references account(account-number)



Referential Integrity in SQL – Example
create table customer
(customer-name char(20),
customer-street char(30),
customer-city char(30),
primary key (customer-name))
create table branch
(branch-name char(15),
branch-city char(30),
assets integer,
primary key (branch-name))



Referential Integrity in SQL – Example (Cont.)
create table account
(account-number char(10),
branch-name char(15),
balance integer,
primary key (account-number),
foreign key (branch-name) references branch)
create table depositor
(customer-name char(20),
account-number char(10),
primary key (customer-name, account-number),
foreign key (account-number) references account,
foreign key (customer-name) references customer)
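
A minimal runnable sketch of the branch and account relations, assuming Python's sqlite3 module. Identifiers use underscores because hyphenated names such as branch-name are not valid unquoted SQL identifiers, and the inserted rows are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE branch (
        branch_name CHAR(15) PRIMARY KEY,
        branch_city CHAR(30),
        assets      INTEGER
    );
    CREATE TABLE account (
        account_number CHAR(10) PRIMARY KEY,
        branch_name    CHAR(15),
        balance        INTEGER,
        FOREIGN KEY (branch_name) REFERENCES branch  -- defaults to branch's primary key
    );
    INSERT INTO branch VALUES ('Perryridge', 'Sample City', 1000000);
""")

def accepted(sql, params):
    """Run one statement; return False if it violates a constraint."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

insert_account = "INSERT INTO account VALUES (?, ?, ?)"
valid_account = accepted(insert_account, ("A-102", "Perryridge", 400))
orphan_account = accepted(insert_account, ("A-999", "Nowhere", 50))  # no such branch
delete_branch = accepted("DELETE FROM branch WHERE branch_name = ?", ("Perryridge",))
```

The orphan account is rejected because no branch tuple matches it, and the delete is rejected because account A-102 still refers to Perryridge.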



Cascading Actions in SQL
create table account (
...
foreign key(branch-name) references branch
on delete cascade
on update cascade
...)
• Due to the on delete cascade clauses, if a delete of a tuple in
branch results in referential-integrity constraint violation,
the delete “cascades” to the account relation, deleting the
tuple that refers to the branch that was deleted.
• Cascading updates are similar.
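
The cascading behaviour can be observed in a short sketch, again assuming Python's sqlite3 module, underscore identifiers, and hypothetical rows (the branch names Downtown and Uptown are made up for the example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE branch (branch_name CHAR(15) PRIMARY KEY);
    CREATE TABLE account (
        account_number CHAR(10) PRIMARY KEY,
        branch_name    CHAR(15),
        FOREIGN KEY (branch_name) REFERENCES branch
            ON DELETE CASCADE
            ON UPDATE CASCADE
    );
    INSERT INTO branch  VALUES ('Perryridge'), ('Downtown');
    INSERT INTO account VALUES ('A-102', 'Perryridge'), ('A-101', 'Downtown');
""")

list_accounts = "SELECT account_number, branch_name FROM account ORDER BY account_number"

# Renaming a branch cascades to the accounts that refer to it.
conn.execute("UPDATE branch SET branch_name = 'Uptown' WHERE branch_name = 'Downtown'")
after_update = conn.execute(list_accounts).fetchall()

# Deleting a branch cascades to (deletes) its accounts.
conn.execute("DELETE FROM branch WHERE branch_name = 'Perryridge'")
after_delete = conn.execute(list_accounts).fetchall()
```

After the update, A-101 refers to Uptown; after the delete, A-102 is gone.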



Cascading Actions in SQL (Cont.)
• If there is a chain of foreign-key dependencies across multiple relations,
with on delete cascade specified for each dependency, a deletion or
update at one end of the chain can propagate across the entire chain.
• If a cascading update or delete causes a constraint violation that cannot be
handled by a further cascading operation, the system aborts the
transaction.
• As a result, all the changes caused by the transaction and its cascading actions are
undone.
• Referential integrity checking can be deferred to the end of a transaction
• Intermediate steps are then allowed to violate referential integrity provided later steps
remove the violation
• Otherwise it would be impossible to create some database states, e.g. insert two
tuples whose foreign keys point to each other
• E.g. spouse attribute of relation marriedperson(name, address, spouse)
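
The marriedperson case can be sketched as follows, assuming Python's sqlite3 module. The foreign key is declared DEFERRABLE INITIALLY DEFERRED so that SQLite checks it at commit time; the people and addresses are hypothetical.

```python
import sqlite3

# isolation_level=None gives autocommit mode, so BEGIN/COMMIT are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE marriedperson (
        name    TEXT PRIMARY KEY,
        address TEXT,
        spouse  TEXT REFERENCES marriedperson (name)
                DEFERRABLE INITIALLY DEFERRED
    )
""")

# Two tuples whose foreign keys point to each other.
conn.execute("BEGIN")
conn.execute("INSERT INTO marriedperson VALUES ('Asha', 'Kathmandu', 'Bikram')")  # spouse not yet present
conn.execute("INSERT INTO marriedperson VALUES ('Bikram', 'Kathmandu', 'Asha')")  # violation removed here
conn.execute("COMMIT")
rows_after_valid = conn.execute("SELECT COUNT(*) FROM marriedperson").fetchone()[0]

# A violation that is never repaired is caught when the transaction tries to commit.
conn.execute("BEGIN")
conn.execute("INSERT INTO marriedperson VALUES ('Chandra', 'Pokhara', 'Nobody')")
try:
    conn.execute("COMMIT")
    dangling_committed = True
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")
    dangling_committed = False
rows_after_invalid = conn.execute("SELECT COUNT(*) FROM marriedperson").fetchone()[0]
```

The first transaction commits even though its first insert temporarily violates the constraint; the second is rolled back, leaving two rows.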



Referential Integrity in SQL (Cont.)
• Alternative to cascading:
• on delete set null
• on delete set default
• Null values in foreign key attributes complicate SQL
referential integrity semantics, and are best prevented using
not null
• if any attribute of a foreign key is null, the tuple is defined to satisfy
the foreign key constraint!



Primary Key constraints
➢The primary key (entity integrity) constraint states that a primary
key value cannot be null.
➢This is because the primary key value is used to identify
individual rows in a relation; if the primary key were null,
those rows could not be identified.
➢Fields other than the primary key may contain null values.



Unique Key constraints

➢A key is an attribute, or set of attributes, that uniquely identifies an
entity within its entity set. A unique key constraint ensures that no two
rows share the same value for that key.



KEYS in DBMS
➢A key is an attribute or set of attributes that helps you identify a
row (tuple) in a relation (table).
➢Keys allow you to find the relationship between two tables.
➢Keys help you uniquely identify a row in a table by a combination
of one or more columns in that table.
➢A key is also helpful for finding a unique record or row in the table.
Types of Keys in Database Management System
➢Super Key
➢Primary Key
➢Candidate Key
➢Alternate Key
➢Foreign Key



Super key
A super key is a set of one or more attributes that uniquely identifies
rows in a table. A super key may contain additional attributes that are
not needed for unique identification.

EmpSSN     | EmpNum | Empname
9812345098 | AB05   | Shown
9876512345 | AB06   | Roslyn
199937890  | AB07   | James

In the example above, EmpSSN and EmpNum are super keys.



Primary Key
➢A primary key is a column or group of columns in a table that uniquely identifies
every row in that table.
➢A primary key value cannot be duplicated: the same value cannot appear more
than once in the table.
➢A table cannot have more than one primary key.
Rules for defining a primary key:
➢Two rows cannot have the same primary key value.
➢Every row must have a primary key value.
➢The primary key field cannot be null.
➢The value in a primary key column should not be modified while a foreign key
refers to it, unless the change is propagated to the referencing rows (for
example with on update cascade).
➢In the following example, StudID is the primary key.
StudID | Roll No | First Name | LastName | Email
1      | 11      | Tom        | Price    | [email protected]
2      | 12      | Nick       | Wright   | [email protected]
3      | 13      | Dana       | Natan    | [email protected]
Alternate key
➢An alternate key is a column or group of columns that uniquely identifies
every row in a table but was not chosen as the primary key.
➢A table can have multiple choices for a primary key, but only one can be set as
the primary key.
➢All candidate keys that are not the primary key are called alternate keys.
Example:
In this table, StudID, Roll No, and Email all qualify to become the primary key.
Since StudID is the primary key, Roll No and Email become alternate keys.

StudID | Roll No | First Name | LastName | Email
1      | 11      | Tom        | Price    | [email protected]
2      | 12      | Nick       | Wright   | [email protected]
3      | 13      | Dana       | Natan    | [email protected]



Candidate Key
➢A candidate key is a minimal set of attributes that uniquely identifies tuples in a table.
➢A candidate key is a super key with no redundant attributes: removing any
attribute from it destroys uniqueness.
➢The primary key should be selected from the candidate keys.
➢Every table must have at least one candidate key.
➢A table can have multiple candidate keys but only a single primary key.
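
The relationship between super keys and candidate keys can be illustrated with a small Python sketch. It reuses the employee rows from the Super key example and adds one hypothetical row (AB08) that repeats a name. Note the assumption: uniqueness in one sample of rows only suggests which attribute sets are keys; a real key is a rule about every legal state of the table.

```python
from itertools import combinations

ATTRS = ("EmpSSN", "EmpNum", "Empname")
ROWS = [
    ("9812345098", "AB05", "Shown"),
    ("9876512345", "AB06", "Roslyn"),
    ("199937890", "AB07", "James"),
    ("1234509876", "AB08", "James"),  # hypothetical extra row with a repeated name
]

def is_superkey(attr_subset):
    """True if no two rows share the same values on attr_subset."""
    idx = [ATTRS.index(a) for a in attr_subset]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)

# Every non-empty attribute subset that identifies rows uniquely is a super key.
superkeys = [
    set(c)
    for size in range(1, len(ATTRS) + 1)
    for c in combinations(ATTRS, size)
    if is_superkey(c)
]

# A candidate key is a super key with no proper subset that is also a super key.
candidate_keys = [k for k in superkeys if not any(other < k for other in superkeys)]
```

Here six attribute sets are super keys, but only {EmpSSN} and {EmpNum} are candidate keys; Empname alone fails because James appears twice.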



Foreign key
➢A foreign key is a column (or group of columns) that creates a relationship between two tables.
➢The purpose of foreign keys is to maintain data integrity and allow navigation
between related rows of two tables.
➢It acts as a cross-reference between two tables because it references the primary key of
another table.



Assertions
• An assertion is a predicate expressing a condition that we wish the
database always to satisfy.
• An assertion in SQL takes the form
create assertion <assertion-name> check <predicate>
• When an assertion is made, the system tests it for validity, and tests it
again on every update that may violate the assertion
• This testing may introduce a significant amount of overhead; hence assertions
should be used with great care.
• Asserting
for all X, P(X)
is achieved in a round-about fashion using
not exists X such that not P(X)



Assertion Example
• The sum of all loan amounts for each branch must be less than the sum of all
account balances at the branch.
create assertion sum-constraint check
(not exists (select * from branch
where (select sum(amount) from loan
where loan.branch-name = branch.branch-name)
>= (select sum(balance) from account
where account.branch-name = branch.branch-name)))
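
Few database systems implement create assertion, so the sketch below only evaluates the same predicate as an ordinary query. It assumes Python's sqlite3 module, underscore identifiers, and hypothetical rows; an application or trigger would have to run such a check after each relevant update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch  (branch_name TEXT PRIMARY KEY);
    CREATE TABLE loan    (loan_number TEXT PRIMARY KEY, branch_name TEXT, amount INTEGER);
    CREATE TABLE account (account_number TEXT PRIMARY KEY, branch_name TEXT, balance INTEGER);
    INSERT INTO branch  VALUES ('Perryridge');
    INSERT INTO loan    VALUES ('L-15', 'Perryridge', 1500);
    INSERT INTO account VALUES ('A-102', 'Perryridge', 4000);
""")

# "For all branches, loans < balances" written as "no branch exists with loans >= balances".
SUM_CONSTRAINT = """
    SELECT NOT EXISTS (
        SELECT * FROM branch
        WHERE (SELECT SUM(amount) FROM loan
               WHERE loan.branch_name = branch.branch_name)
           >= (SELECT SUM(balance) FROM account
               WHERE account.branch_name = branch.branch_name)
    )
"""

def assertion_holds():
    """Return True if no branch violates the sum constraint."""
    return bool(conn.execute(SUM_CONSTRAINT).fetchone()[0])

holds_before = assertion_holds()  # loans 1500, balances 4000
conn.execute("INSERT INTO loan VALUES ('L-16', 'Perryridge', 3000)")  # loans now total 4500
holds_after = assertion_holds()
```

The predicate holds for the initial data and fails once the second loan pushes the loan total above the account total.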



Assertion Example
• Every loan has at least one borrower who maintains an account with a minimum
balance of $1000.00
create assertion balance-constraint check
(not exists (
select * from loan
where not exists (
select *
from borrower, depositor, account
where loan.loan-number = borrower.loan-number
and borrower.customer-name = depositor.customer-name
and depositor.account-number = account.account-number
and account.balance >= 1000)))



Triggers

• A trigger is a statement that is executed automatically by the
system as a side effect of a modification to the database.
• To design a trigger mechanism, we must:
• Specify the conditions under which the trigger is to be executed.
• Specify the actions to be taken when the trigger executes.
• Triggers were introduced to the SQL standard in SQL:1999, but most
databases supported them even earlier using non-standard syntax.



Trigger Example
• Suppose that instead of allowing negative account balances, the
bank deals with overdrafts by
• setting the account balance to zero
• creating a loan in the amount of the overdraft
• giving this loan a loan number identical to the account number of the
overdrawn account
• The condition for executing the trigger is an update to the account
relation that results in a negative balance value.



Trigger Example in SQL
create trigger overdraft-trigger after update on account
referencing new row as nrow for each row
when nrow.balance < 0
begin atomic
    insert into borrower
        (select customer-name, account-number
         from depositor
         where nrow.account-number = depositor.account-number);
    insert into loan values
        (nrow.account-number, nrow.branch-name, - nrow.balance);
    update account set balance = 0
        where account.account-number = nrow.account-number;
end
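
A runnable variant of the overdraft trigger, assuming Python's sqlite3 module. SQLite's trigger dialect differs from the SQL:1999 syntax above: the new row is always called NEW, there is no referencing clause, and the body is enclosed in BEGIN ... END. Identifiers use underscores and the account data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account   (account_number TEXT PRIMARY KEY, branch_name TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    CREATE TABLE loan      (loan_number TEXT PRIMARY KEY, branch_name TEXT, amount INTEGER);
    CREATE TABLE borrower  (customer_name TEXT, loan_number TEXT);

    CREATE TRIGGER overdraft_trigger AFTER UPDATE OF balance ON account
    FOR EACH ROW WHEN NEW.balance < 0
    BEGIN
        -- the account's depositors become borrowers of the new loan
        INSERT INTO borrower
            SELECT customer_name, account_number
            FROM depositor
            WHERE depositor.account_number = NEW.account_number;
        -- the loan number equals the account number; the amount is the overdraft
        INSERT INTO loan VALUES (NEW.account_number, NEW.branch_name, -NEW.balance);
        UPDATE account SET balance = 0
            WHERE account.account_number = NEW.account_number;
    END;

    INSERT INTO account   VALUES ('A-102', 'Perryridge', 300);
    INSERT INTO depositor VALUES ('Hayes', 'A-102');
""")

# Withdrawing 500 from a balance of 300 would leave -200 and fires the trigger.
conn.execute("UPDATE account SET balance = balance - 500 WHERE account_number = 'A-102'")
balance = conn.execute("SELECT balance FROM account").fetchone()[0]
loans = conn.execute("SELECT * FROM loan").fetchall()
borrowers = conn.execute("SELECT * FROM borrower").fetchall()
```

After the update the balance is reset to zero and a loan of 200 with loan number A-102 exists for the depositor.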



Triggering Events and Actions in SQL
• The triggering event can be insert, delete or update
• Triggers on update can be restricted to specific attributes
• E.g. create trigger overdraft-trigger after update of balance on account
• Values of attributes before and after an update can be referenced
• referencing old row as <name>: for deletes and updates
• referencing new row as <name>: for inserts and updates
• Triggers can be activated before an event, which lets them serve as extra
constraints. E.g. convert blanks to null.
create trigger setnull-trigger before update on r
referencing new row as nrow
for each row
when nrow.phone-number = ' '
set nrow.phone-number = null



Statement Level Triggers
• Instead of executing a separate action for each affected row,
a single action can be executed for all rows affected by a
statement
• Use for each statement instead of for each row
• Use referencing old table or referencing new table to refer to
temporary tables (called transition tables) containing the affected
rows
• Can be more efficient when dealing with SQL statements that update
a large number of rows



When Not To Use Triggers
• Triggers were used earlier for tasks such as
• maintaining summary data (e.g. total salary of each department)
• Replicating databases by recording changes to special relations
(called change or delta relations) and having a separate process that
applies the changes over to a replica
• There are better ways of doing these now:
• Databases today provide built in materialized view facilities to
maintain summary data
• Databases provide built-in support for replication
• Encapsulation facilities can be used instead of triggers in
many cases
• Define methods to update fields
• Carry out actions as part of the update methods instead of
through a trigger



Security
• Security - protection from malicious attempts to steal or modify
data.
• Database system level
• Authentication and authorization mechanisms to allow specific users access only to required
data
• We concentrate on authorization in the rest of this chapter
• Operating system level
• Operating system super-users can do anything they want to the database! Good operating
system level security is required.
• Network level: must use encryption to prevent
• Eavesdropping (unauthorized reading of messages)
• Masquerading (pretending to be an authorized user or sending messages supposedly from
authorized users)



Security (Cont.)
• Physical level
• Physical access to computers allows destruction of data by intruders;
traditional lock-and-key security is needed
• Computers must also be protected from floods, fire, etc.
• Human level
• Users must be screened to ensure that authorized users do not give
access to intruders
• Users should be trained on password selection and secrecy



Authorization
Forms of authorization on parts of the database:

• Read authorization - allows reading, but not modification of data.


• Insert authorization - allows insertion of new data, but not
modification of existing data.
• Update authorization - allows modification, but not deletion of
data.
• Delete authorization - allows deletion of data



Authorization (Cont.)
Forms of authorization to modify the database schema:
• Index authorization - allows creation and deletion of indices.
• Resource authorization - allows creation of new relations.
• Alteration authorization - allows addition or deletion of attributes in a
relation.
• Drop authorization - allows deletion of relations.



Granting of Privileges
• The passage of authorization from one user to another may be represented by an
authorization graph.
• The nodes of this graph are the users.
• The root of the graph is the database administrator.
• Consider graph for update authorization on loan.
• An edge Ui →Uj indicates that user Ui has granted update authorization on loan
to Uj.
[Figure: authorization grant graph with nodes DBA, U1, U2, U3, U4, U5]



Authorization Grant Graph
• Requirement: All edges in an authorization graph must be part of
some path originating with the database administrator
• If DBA revokes grant from U1:
• Grant must be revoked from U4 since U1 no longer has authorization
• Grant must not be revoked from U5 since U5 has another authorization path
from DBA through U2
• Must prevent cycles of grants with no path from the root:
• DBA grants authorization to U7
• U7 grants authorization to U8
• U8 grants authorization to U7
• DBA revokes authorization from U7
• Must revoke grant U7 to U8 and from U8 to U7 since there is no path
from DBA to U7 or to U8 anymore.
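
The revocation rule can be sketched in Python by treating grants as directed edges and keeping only those whose grantor is still reachable from the DBA. The edge set below is an assumption reconstructed from the revocation example (DBA to U1, U2, U3; U1 to U4 and U5; U2 to U5), and the function names are illustrative.

```python
from collections import deque

def reachable_from_root(edges, root="DBA"):
    """Return the set of users that have a grant path starting at the root."""
    seen, queue = {root}, deque([root])
    while queue:
        user = queue.popleft()
        for grantor, grantee in edges:
            if grantor == user and grantee not in seen:
                seen.add(grantee)
                queue.append(grantee)
    return seen

def revoke(edges, grantor, grantee, root="DBA"):
    """Remove one grant, then drop every grant whose grantor lost its path from the root."""
    remaining = {e for e in edges if e != (grantor, grantee)}
    authorized = reachable_from_root(remaining, root)
    return {(g, u) for (g, u) in remaining if g in authorized}

def holders(edges):
    """Users who currently hold the authorization."""
    return {grantee for _, grantee in edges}

# Assumed edges for the update authorization on loan.
graph = {("DBA", "U1"), ("DBA", "U2"), ("DBA", "U3"),
         ("U1", "U4"), ("U1", "U5"), ("U2", "U5")}
after_u1_revoked = revoke(graph, "DBA", "U1")

# A cycle of grants with no remaining path from the root is removed entirely.
cycle = {("DBA", "U7"), ("U7", "U8"), ("U8", "U7")}
after_u7_revoked = revoke(cycle, "DBA", "U7")
```

U4 loses the authorization, U5 keeps it through U2, and the U7/U8 cycle disappears once the DBA's grant to U7 is revoked.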



Security Specification in SQL
• The grant statement is used to confer authorization
grant <privilege list>
on <relation name or view name> to <user list>
• <user list> is:
• a user-id
• public, which allows all valid users the privilege granted
• A role (more on this later)
• Granting a privilege on a view does not imply granting any privileges
on the underlying relations.
• The grantor of the privilege must already hold the privilege on the
specified item (or be the database administrator).



Privileges in SQL
• select: allows read access to a relation, or the ability to query using the view
• Example: grant users U1, U2, and U3 select authorization on the branch
relation:
grant select on branch to U1, U2, U3
• insert: the ability to insert tuples
• update: the ability to update using the SQL update statement
• delete: the ability to delete tuples.
• references: ability to declare foreign keys when creating relations.
• usage: In SQL-92; authorizes a user to use a specified domain
• all privileges: used as a short form for all the allowable privileges



Privilege To Grant Privileges
• with grant option: allows a user who is granted a privilege to pass
the privilege on to other users.
• Example:
grant select on branch to U1 with grant option
gives U1 the select privileges on branch and allows U1 to grant this
privilege to others



Roles
• Roles permit common privileges for a class of users to be
specified just once, by creating a corresponding "role"
• Privileges can be granted to or revoked from roles, just like users
• Roles can be assigned to users, and even to other roles
• SQL:1999 supports roles
create role teller
create role manager

grant select on branch to teller
grant update (balance) on account to teller
grant all privileges on account to manager

grant teller to manager

grant teller to alice, bob
grant manager to avi
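
How privileges flow through nested roles can be sketched in plain Python. The two dictionaries mirror the grant statements above; representing a privilege as a (privilege, relation) pair and the function name effective_privileges are assumptions made for the illustration, not part of SQL.

```python
# Privileges granted directly to each role.
ROLE_PRIVILEGES = {
    "teller": {("select", "branch"), ("update(balance)", "account")},
    "manager": {("all privileges", "account")},
}

# Roles granted to users and to other roles:
# grant teller to manager; grant teller to alice, bob; grant manager to avi
GRANTED_ROLES = {
    "manager": {"teller"},
    "alice": {"teller"},
    "bob": {"teller"},
    "avi": {"manager"},
}

def effective_privileges(principal, seen=None):
    """Collect the privileges of a user or role, following role grants transitively."""
    seen = set() if seen is None else seen
    if principal in seen:  # guard against cyclic role grants
        return set()
    seen.add(principal)
    privileges = set(ROLE_PRIVILEGES.get(principal, set()))
    for role in GRANTED_ROLES.get(principal, set()):
        privileges |= effective_privileges(role, seen)
    return privileges

alice_privileges = effective_privileges("alice")
avi_privileges = effective_privileges("avi")
```

alice receives only the teller privileges, while avi receives the manager privileges plus everything teller has, because teller was granted to manager.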



Revoking Authorization in SQL
• The revoke statement is used to revoke authorization.
revoke <privilege list>
on <relation name or view name> from <user list> [restrict|cascade]
• Example:
revoke select on branch from U1, U2, U3 cascade
• Revocation of a privilege from a user may cause other users also to lose that
privilege; referred to as cascading of the revoke.
• We can prevent cascading by specifying restrict:
revoke select on branch from U1, U2, U3 restrict
With restrict, the revoke command fails if cascading revokes are required.



Revoking Authorization in SQL (Cont.)
• <privilege-list> may be all, to revoke all privileges the revokee may hold.
• If <revokee-list> includes public, all users lose the privilege except those
granted it explicitly.
• If the same privilege was granted twice to the same user by different
grantors, the user may retain the privilege after the revocation.
• All privileges that depend on the privilege being revoked are also revoked.



Limitations of SQL Authorization
• SQL does not support authorization at a tuple level
• E.g. we cannot restrict students to see only (the tuples storing) their own grades
• With the growth in Web access to databases, database accesses come primarily from
application servers.
• End users don't have database user ids, they are all mapped to the same database
user id
• All end-users of an application (such as a web application) may be mapped to a single
database user
• The task of authorization in above cases falls on the application program, with no support
from SQL
• Benefit: fine grained authorizations, such as to individual tuples, can be implemented
by the application.
• Drawback: Authorization must be done in application code, and may be dispersed all
over an application
• Checking for absence of authorization loopholes becomes very difficult since it
requires reading large amounts of application code



Audit Trails
• An audit trail is a log of all changes (inserts/deletes/updates) to the
database along with information such as which user performed the
change, and when the change was performed.
• Used to track erroneous/fraudulent updates.
• Can be implemented using triggers, but many database systems
provide direct support.
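
A trigger-based audit trail can be sketched as follows, assuming Python's sqlite3 module. The audit_log table and its columns are hypothetical. SQLite has no notion of database users, so this sketch records the operation, the old and new values, and a timestamp, but not which user made the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE audit_log (
        log_id         INTEGER PRIMARY KEY AUTOINCREMENT,
        operation      TEXT,
        account_number TEXT,
        old_balance    INTEGER,
        new_balance    INTEGER,
        changed_at     TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- one trigger per kind of change
    CREATE TRIGGER audit_insert AFTER INSERT ON account
    BEGIN
        INSERT INTO audit_log (operation, account_number, old_balance, new_balance)
        VALUES ('insert', NEW.account_number, NULL, NEW.balance);
    END;
    CREATE TRIGGER audit_update AFTER UPDATE ON account
    BEGIN
        INSERT INTO audit_log (operation, account_number, old_balance, new_balance)
        VALUES ('update', NEW.account_number, OLD.balance, NEW.balance);
    END;
    CREATE TRIGGER audit_delete AFTER DELETE ON account
    BEGIN
        INSERT INTO audit_log (operation, account_number, old_balance, new_balance)
        VALUES ('delete', OLD.account_number, OLD.balance, NULL);
    END;
""")

conn.execute("INSERT INTO account VALUES ('A-102', 400)")
conn.execute("UPDATE account SET balance = 350 WHERE account_number = 'A-102'")
conn.execute("DELETE FROM account WHERE account_number = 'A-102'")

trail = conn.execute(
    "SELECT operation, account_number, old_balance, new_balance "
    "FROM audit_log ORDER BY log_id"
).fetchall()
```

The log keeps all three changes even though the account row itself no longer exists.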



Authentication
• Password based authentication is widely used, but is susceptible to sniffing on
a network
• Challenge-response systems avoid transmission of passwords
• DB sends a (randomly generated) challenge string to user
• User encrypts string and returns result.
• DB verifies identity by decrypting result
• Can use public-key encryption system by DB sending a message encrypted using
user’s public key, and user decrypting and sending the message back
• Digital signatures are used to verify authenticity of data
• E.g. use private key (in reverse) to encrypt data, and anyone can verify authenticity
by using public key (in reverse) to decrypt data. Only holder of private key could
have created the encrypted data.
• Digital signatures also help ensure nonrepudiation: sender
cannot later claim to have not created the data
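
The challenge-response idea can be sketched with Python's standard library. This sketch swaps in a different technique from the encrypt/decrypt description above: it uses a keyed hash (HMAC-SHA256) over the challenge, with a secret assumed to be shared between the user and the DB. The function names and the secret are hypothetical. The password-derived secret itself never travels over the network.

```python
import hashlib
import hmac
import secrets

def make_challenge() -> bytes:
    """DB side: generate a fresh random challenge string."""
    return secrets.token_bytes(16)

def respond(shared_secret: bytes, challenge: bytes) -> bytes:
    """User side: prove knowledge of the secret without transmitting it."""
    return hmac.new(shared_secret, challenge, hashlib.sha256).digest()

def verify(shared_secret: bytes, challenge: bytes, response: bytes) -> bool:
    """DB side: recompute the expected response and compare in constant time."""
    expected = hmac.new(shared_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

secret = b"hypothetical shared secret"
challenge = make_challenge()
genuine = verify(secret, challenge, respond(secret, challenge))
impostor = verify(secret, challenge, respond(b"wrong guess", challenge))
```

Because each challenge is random, a sniffed response cannot be replayed against a later challenge.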



Relational Database Design Using ER to Relational Mapping

Consider a COMPANY database example to illustrate the mapping procedure.


The COMPANY ER schema is shown below:



• The corresponding COMPANY relational database schema is shown below:



• Step 1: Mapping of Regular Entity Types. For each regular (strong) entity type E in the
ER schema, create a relation R that includes all the simple attributes of E. Include only
the simple component attributes of a composite attribute. Choose one of the key
attributes of E as the primary key for R. If the chosen key of E is composite, then the
set of simple attributes that form it will together form the primary key of R.
• If multiple keys were identified for E during the conceptual design, the information
describing the attributes that form each additional key is kept in order to specify
secondary (unique) keys of relation R. Knowledge about keys is also kept for indexing
purposes and other types of analyses.
• Step 2: Mapping of Weak Entity Types. For each weak entity type W in the ER schema
with owner entity type E, create a relation R and include all simple attributes (or
simple components of composite attributes) of W as attributes of R. In addition,
include as foreign key attributes of R, the primary key attribute(s) of the relation(s)
that correspond to the owner entity type(s); this takes care of mapping the identifying
relationship type of W. The primary key of R is the combination of the primary key(s) of
the owner(s) and the partial key of the weak entity type W, if any.
• If there is a weak entity type E2 whose owner is also a weak entity type E1, then E1
should be mapped before E2 to determine its primary key first.



• Step 3: Mapping of Binary 1:1 Relationship Types. For each binary 1:1
relationship type R in the ER schema, identify the relations S and T that
correspond to the entity types participating in R. There are three possible
approaches: (1) the foreign key approach, (2) the merged relation
approach, and (3) the cross-reference or relationship relation approach.
• Foreign key approach: Choose one of the relations—S, say—and include as a foreign
key in S the primary key of T. It is better to choose an entity type with total participation
in R in the role of S. Include all the simple attributes (or simple components of
composite attributes) of the 1:1 relationship type R as attributes of S.
• Merged relation approach: An alternative mapping of a 1:1 relationship type is to
merge the two entity types and the relationship into a single relation. This is possible
when both participations are total, as this would indicate that the two tables will have
the exact same number of tuples at all times.
• Cross-reference or relationship relation approach: The third option is to set up a third
relation R for the purpose of cross-referencing the primary keys of the two relations S
and T representing the entity types.



• Step 4: Mapping of Binary 1:N Relationship Types. For each regular binary 1:N
relationship type R, identify the relation S that represents the participating
entity type at the N-side of the relationship type.
• Include as foreign key in S the primary key of the relation T that represents the
other entity type participating in R; we do this because each entity instance on
the N-side is related to at most one entity instance on the 1-side of the
relationship type. Include any simple attributes (or simple components of
composite attributes) of the 1:N relationship type as attributes of S.
• Step 5: Mapping of Binary M:N Relationship Types. For each binary M:N
relationship type R, create a new relation S to represent R. Include as foreign
key attributes in S the primary keys of the relations that represent the
participating entity types; their combination will form the primary key of S.
Also include any simple attributes of the M:N relationship type (or simple
components of composite attributes) as attributes of S.
• Notice that we cannot represent an M:N relationship type by a single foreign
key attribute in one of the participating relations (as we did for 1:1 or 1:N
relationship types) because of the M:N cardinality ratio; we must create a
separate relationship relation S.
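
Steps 4 and 5 can be sketched with Python's sqlite3 module. The relation and attribute names (department, employee, project, works_on) are illustrative, loosely modelled on a COMPANY-style schema, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT);

    -- Step 4 (1:N): the N-side relation employee carries the key of the 1-side relation.
    CREATE TABLE employee (
        ssn  TEXT PRIMARY KEY,
        name TEXT,
        dno  INTEGER REFERENCES department (dnumber)
    );

    CREATE TABLE project (pnumber INTEGER PRIMARY KEY, pname TEXT);

    -- Step 5 (M:N): a separate relation whose primary key combines both foreign keys.
    CREATE TABLE works_on (
        essn  TEXT    REFERENCES employee (ssn),
        pno   INTEGER REFERENCES project (pnumber),
        hours REAL,                      -- simple attribute of the relationship type
        PRIMARY KEY (essn, pno)
    );

    INSERT INTO department VALUES (5, 'Research');
    INSERT INTO employee   VALUES ('111', 'Asha', 5), ('222', 'Bikram', 5);
    INSERT INTO project    VALUES (1, 'ProductX'), (2, 'ProductY');
    INSERT INTO works_on   VALUES ('111', 1, 10.0), ('111', 2, 5.0), ('222', 1, 20.0);
""")

# One employee can work on many projects and one project can have many employees.
projects_per_employee = conn.execute("""
    SELECT e.name, COUNT(*)
    FROM employee e JOIN works_on w ON w.essn = e.ssn
    GROUP BY e.name ORDER BY e.name
""").fetchall()

# The combined primary key allows each (employee, project) pair only once.
try:
    conn.execute("INSERT INTO works_on VALUES ('111', 1, 3.0)")
    duplicate_pair_allowed = True
except sqlite3.IntegrityError:
    duplicate_pair_allowed = False
```

A single dno column is enough for the 1:N relationship, whereas the M:N relationship needs its own relation because neither side can hold just one foreign key value.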



• Step 6: Mapping of Multivalued Attributes. For each multivalued attribute A,
create a new relation R. This relation R will include an attribute corresponding
to A, plus the primary key attribute K—as a foreign key in R—of the relation
that represents the entity type or relationship type that has A as a multivalued
attribute. The primary key of R is the combination of A and K. If the
multivalued attribute is composite, we include its simple components.
• Step 7: Mapping of N-ary Relationship Types. For each n-ary relationship type
R, where n > 2, create a new relation S to represent R.
• Include as foreign key attributes in S the primary keys of the relations that
represent the participating entity types. Also include any simple attributes of
the n-ary relationship type (or simple components of composite attributes) as
attributes of S. The primary key of S is usually a combination of all the foreign
keys that reference the relations representing the participating entity types.
• However, if the cardinality constraint on any entity type E participating in R is 1, then the primary key of S should not include the foreign key attribute that references the relation E′ corresponding to E.


Informal Design Guidelines for Relational Schema
• Making sure that the semantics of the attributes is clear in the schema
• Reducing the redundant information in tuples
• Reducing the NULL values in tuples
• Disallowing the possibility of generating spurious tuples
• Spurious tuples are rows that appear in the result when two tables are joined incorrectly. They are extra tuples (rows) that represent information that is not valid.


Functional Dependency
▪ A functional dependency (FD) describes how one attribute determines another attribute in a relation of a database management system (DBMS).
▪ Functional dependencies help you maintain the quality of data in the database.
▪ A functional dependency is denoted by an arrow →.
▪ The functional dependency of Y on X is represented by X → Y.
▪ Functional dependencies play a vital role in distinguishing good database design from bad.


Functional Dependencies (Cont.)

• Let R be a relation schema, with α ⊆ R and β ⊆ R
• The functional dependency
α → β
holds on R if and only if for any legal relations r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
• Example: Consider r(A,B ) with the following instance of r.
A B
1 4
1 5
3 7
• On this instance, A → B does NOT hold, but B → A does hold.
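
The definition above can be checked mechanically on a given instance. The following is a minimal Python sketch, not part of the original slides; the function name fd_holds and the list-of-dicts representation of a relation are assumptions made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on this instance.

    rows is a list of dicts (one dict per tuple); lhs and rhs are lists
    of attribute names. This representation is an illustrative choice.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The instance of r(A, B) from the example above.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]

print(fd_holds(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(fd_holds(r, ["B"], ["A"]))  # True: every B value appears with one A value
```

Note that such a check can only refute an FD on a particular instance; whether an FD holds on the schema is a statement about all legal instances.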



For example:
Assume we have a Student table with attributes: Roll_no, Name, GPA.

Roll_no  Name    GPA
1        Prabin  3
2        Suman   3
3        Suman   3.5
4        Tilak   2.5

Here the Roll_no attribute uniquely identifies the Name attribute of the Student table, because if we know the Roll_no, we can tell the student name associated with it.
This functional dependency can be written as:
Roll_no → Name
We can say that Name is functionally dependent on Roll_no.

Roll_no → Name   True   (1 → Prabin, 2 → Suman, 3 → Suman, 4 → Tilak)
Roll_no → GPA    True
GPA → Name       False  (3 → Prabin, 3 → Suman)
Name → GPA       False  (Suman → 3, Suman → 3.5)
Types of Functional dependency

• Fully-Functional Dependency
• Partial Dependency
• Transitive Dependency
• Multivalued Dependency
• Trivial Functional Dependency
• Non trivial Functional Dependency



Fully Functional Dependency
• An attribute is fully functionally dependent on a set of attributes if it is functionally dependent on that set and not on any proper subset of it.
For example, an attribute Q is fully functionally dependent on a set of attributes P if it is functionally dependent on P and not on any proper subset of P.

ProjectID → ProjectCost
Here ProjectCost is fully functionally dependent on ProjectID.

• {EmpID, ProjectID} → {Days} holds, and it is a full functional dependency.
• Together, EmpID and ProjectID determine the Days spent on the project by the employee.
• EmpID → Days does not hold, so Days does not depend on a proper subset of the determinant.


Partial Dependency
• Partial dependency occurs when a non-prime attribute is functionally dependent on part of a candidate key.

• Consider a relation with the attributes StudentID, ProjectNo, StudentName and ProjectName.
• The prime attributes are StudentID and ProjectNo; together they form the candidate key.
• The non-prime attributes, StudentName and ProjectName, are partially dependent if they are functionally dependent on only part of the candidate key.
• FDs:
{StudentID, ProjectNo} → StudentName
{StudentID, ProjectNo} → ProjectName
StudentID → StudentName TRUE
ProjectNo → ProjectName TRUE

• StudentName can be determined by StudentID alone, which makes StudentName partially dependent on the key.
• ProjectName can be determined by ProjectNo alone, which makes ProjectName partially dependent on the key.


Transitive Dependency
• A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies. For example,
X -> Z is a transitive dependency if the following three conditions hold true:
X -> Y
Y does not -> X
Y -> Z
Note: A transitive dependency can only occur in a relation of three or more attributes.
Example:
{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the CEO's age)
Therefore, according to the rule of transitive dependency,
{Company} -> {Age} should hold. This makes sense because if we know the company name, we can find its CEO's age.


Trivial Functional Dependency
A functional dependency is called trivial if the attributes on its right-hand side are already included in its left-hand side.
So, X -> Y is a trivial functional dependency if Y is a subset of X:
Y ⊆ X

Consider a table with two columns, Emp_id and Emp_name.

{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency,
as Emp_id is a subset of {Emp_id, Emp_name}.


Non-Trivial Functional Dependency
A functional dependency A -> B is non-trivial when B is not a subset of A.
When A ∩ B = ϕ, then A → B is called completely non-trivial.
Example:

{Company} -> {CEO} (if we know the company, we know the CEO's name)
CEO is not a subset of Company, and hence this is a non-trivial functional dependency.


Multivalued Dependency
A multivalued dependency occurs when a single table contains two or more independent multivalued attributes. A multivalued dependency is a full constraint between two sets of attributes in a relation: it requires that certain tuples be present in the relation.
Example:
Name  ProjectNo  Hobby
A     P1         FOOTBALL
A     P2         FOOTBALL
B     P3         GOLF
B     P3         TENNIS

In this example, ProjectNo and Hobby are independent of each other but both depend on Name. These two columns are said to be multivalued dependent on Name.
This dependence can be represented like this:
Name →→ ProjectNo
Name →→ Hobby
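
The phrase "requires that certain tuples be present" can be made concrete: X →→ Y holds if, for any two tuples that agree on X, the tuple that combines the X and Y values of the first with the remaining values of the second is also in the relation. The Python sketch below is illustrative only; the function name mvd_holds and the data layout are assumptions, and the hobby values are sample data.

```python
def mvd_holds(rows, x, y, all_attrs):
    """Return True if the multivalued dependency x ->> y holds on rows.

    rows is a list of dicts; x, y and all_attrs are lists of attribute
    names (an illustrative representation, not from the slides).
    """
    z = [a for a in all_attrs if a not in x and a not in y]
    present = {tuple(row[a] for a in all_attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Required tuple: X and Y values from t1, the rest from t2.
                t3 = {a: t1[a] for a in x + y}
                t3.update({a: t2[a] for a in z})
                if tuple(t3[a] for a in all_attrs) not in present:
                    return False
    return True


attrs = ["Name", "ProjectNo", "Hobby"]
rows = [
    {"Name": "A", "ProjectNo": "P1", "Hobby": "FOOTBALL"},
    {"Name": "A", "ProjectNo": "P2", "Hobby": "FOOTBALL"},
    {"Name": "B", "ProjectNo": "P3", "Hobby": "GOLF"},
    {"Name": "B", "ProjectNo": "P3", "Hobby": "TENNIS"},
]

print(mvd_holds(rows, ["Name"], ["ProjectNo"], attrs))  # True

# Adding a hobby for only one of A's projects breaks the dependency,
# because the tuple (A, P2, CRICKET) would also have to be present.
broken = rows + [{"Name": "A", "ProjectNo": "P1", "Hobby": "CRICKET"}]
print(mvd_holds(broken, ["Name"], ["ProjectNo"], attrs))  # False
```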



Inference Rule (IR):

▪ Armstrong's axioms are the basic inference rules.
▪ Armstrong's axioms are used to infer functional dependencies on a relational database.
▪ An inference rule is a type of assertion. It can be applied to a set of FDs (functional dependencies) to derive other FDs.
▪ Using the inference rules, we can derive additional functional dependencies from the initial set.
▪ There are 7 inference rules for functional dependencies:


1. Reflexive Rule (IR1)
In the reflexive rule, if Y is a subset of X, then X determines Y.
If Y ⊆ X then X → Y
2. Augmentation Rule (IR2)
In augmentation, if X determines Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ
SID → NAME then {SID, PHONE} → {NAME, PHONE}
3. Transitive Rule (IR3)
In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.
If X → Y and Y → Z then X → Z
SID → NAME and NAME → CITY then SID → CITY
4. Union Rule (IR4)
The union rule says, if X determines Y and X determines Z, then X must also determine Y and Z.
If X → Y and X → Z then X → YZ
5. Decomposition Rule (IR5)
The decomposition rule is also known as the project rule. It is the reverse of the union rule.
This rule says, if X determines Y and Z, then X determines Y and X determines Z separately.
If X → YZ then X → Y and X → Z
6. Pseudo Transitive Rule (IR6)
In the pseudo transitive rule, if X determines Y and YZ determines W, then XZ determines W.
If X → Y and YZ → W then XZ → W
7. Self-determination Rule (IR7)
Every set of attributes determines itself.
A → A


1. Let the set of FDs be F = {A → B, C → X, BX → Z}. Prove AC → Z.
Soln: C → X and BX → Z, then BC → Z (pseudo transitive)
BC → Z and A → B, then AC → Z (pseudo transitive)
2. Let F = {A → B, C → D, C ⊆ B}. Prove A → C.
Soln: C ⊆ B, then B → C (reflexivity rule)
A → B and B → C, then A → C (transitive)
3. Let F = {A → B, BC → D}. Prove/disprove AC → D, B → D, AD → B.
Soln: A → B and BC → D, then AC → D (pseudo transitive): proved
B → D cannot be derived, because B on its own determines nothing but B under F: disproved
A → B, then AD → BD (augmentation)
AD → BD, then AD → B and AD → D (decomposition): proved


Practice Session

• Let F= { XY-> W, Y->Z, WZ-> P, WP->QR, Q-> X}


Show: 1) XY-> P
2)QP->X
3) XYP->R
4)WP->X
5) Q  WP



Closure of a Set of Functional Dependencies
• Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F.
• For example: If A → B and B → C, then we can infer that A → C
• The set of all functional dependencies logically implied by F is the closure of F.
• We denote the closure of F by F+.
• We can find all of F+ by applying Armstrong's Axioms:
• if β ⊆ α, then α → β (reflexivity)
• if α → β, then γα → γβ (augmentation)
• if α → β, and β → γ, then α → γ (transitivity)
• These rules are
• sound (generate only functional dependencies that actually hold) and
• complete (generate all functional dependencies that hold).
Closure of a set of Functional Dependencies (Cont.)
• We can further simplify manual computation of F+ by using the following additional rules.
• If α → β holds and α → γ holds, then α → βγ holds (union)
• If α → βγ holds, then α → β holds and α → γ holds (decomposition)
• If α → β holds and γβ → δ holds, then αγ → δ holds (pseudotransitivity)
The above rules can be inferred from Armstrong's axioms.
Procedure for Computing F+

• To compute the closure of a set of functional dependencies F:

F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further
Example
Given a relation R = (A, B, C, G, H, I) with a set of FDs F = {A → B, A → C, CG → H, CG → I, B → H}, find some members of the closure F+.
• A → H, by transitivity from A → B and B → H
• AG → I, by augmenting A → C with G to get AG → CG, and then transitivity with CG → I
• CG → HI, by augmenting CG → I with CG to infer CG → CGI, augmenting CG → H with I to infer CGI → HI, and then transitivity
Practice:
1. Given R = {A, B, C, D} and the set of FDs F = {A → BC, B → AC, C → AB}, find F+.
2. Given R = {A, B, C, D, E} and the set of FDs F = {A → BC, B → CD, E → A, C → ED}, find the closure set of FDs.


Closure of Attribute Sets

• Given a set of attributes α, the closure of α under F (denoted by α+) is the set of attributes that are functionally determined by α under F
• Algorithm to compute α+, the closure of α under F

result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
Example of Attribute Set Closure
• R = (A, B, C, G, H, I)
• F = {A → B, A → C, CG → H, CG → I, B → H}
• (AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
• Is AG a candidate key?
1. Is AG a super key?
    1. Does AG → R? == Is (AG)+ ⊇ R? YES
2. Is any proper subset of AG a superkey?
    1. Does A → R? == Is (A)+ ⊇ R? A+ = {A, B, C, H}, so NO
    2. Does G → R? == Is (G)+ ⊇ R? G+ = {G}, so NO
• AG is a superkey and no proper subset of it is a superkey, so AG is a candidate key.
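
The attribute closure algorithm and the candidate key test above translate almost line for line into code. This is a minimal Python sketch under stated assumptions: each attribute is a single character, an FD is a pair of strings (lhs, rhs), and the names closure and is_candidate_key are illustrative rather than taken from the slides.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_candidate_key(attrs, all_attrs, fds):
    """A candidate key is a superkey with no proper subset that is a superkey."""
    attrs, all_attrs = set(attrs), set(all_attrs)
    if not closure(attrs, fds) >= all_attrs:
        return False
    # Dropping one attribute at a time is enough: every proper subset lies
    # inside one of these, and any superset of a superkey is a superkey.
    return all(not closure(attrs - {a}, fds) >= all_attrs for a in attrs)


R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(sorted(closure("AG", F)))      # ['A', 'B', 'C', 'G', 'H', 'I']
print(sorted(closure("A", F)))       # ['A', 'B', 'C', 'H']
print(sorted(closure("G", F)))       # ['G']
print(is_candidate_key("AG", R, F))  # True
```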
Practice:
1. Given R= {A,B,C,D,E,F} and the set of Fds: { A→B,C→DE,
AC→F,D→AF,E→CF}. Determine closure set of attributes for A, B, C,
DE
2. Given R= {A,B,C,D,E,F,G} and the set of Fds: { A→B,BC→DE, AEG→G}.
Determine attribute Closure A+, AC+, ABC+



Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
• Testing for superkey:
    • To test if α is a superkey, we compute α+, and check if α+ contains all attributes of R.
• Testing functional dependencies
    • To check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+.
    • That is, we compute α+ by using attribute closure, and then check if it contains β.
    • This is a simple and cheap test, and very useful
• Computing closure of F
    • For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.
Covers and Equivalence
• Let F be a set of FDs; its closure F+ contains all FDs that can be derived from F. Let G be another set of FDs. We say that F and G are equivalent iff F+ = G+.
• If F and G are equivalent, each one covers the other, and either can be used to represent the other.
• A set of FDs E is said to be covered by a set of FDs F if every FD in E is also in F+.
Example: F = {A → B, A → C} and G = {A → B, B → C}. Determine which set covers the other.
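
The cover test can be run with attribute closure: F covers G when, for every FD X → Y in G, Y is contained in the closure of X under F. The Python sketch below works through the example; the function names and the single-character attribute encoding are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """Return True if f covers g, i.e. every FD in g is in f+."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """Two sets of FDs are equivalent when each covers the other."""
    return covers(f, g) and covers(g, f)


F = [("A", "B"), ("A", "C")]
G = [("A", "B"), ("B", "C")]

print(covers(G, F))      # True: A+ under G is {A, B, C}, so A -> C follows
print(covers(F, G))      # False: B+ under F is {B}, so B -> C does not follow
print(equivalent(F, G))  # False
```

So in this example G covers F, but F does not cover G, and the two sets are not equivalent.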



Non redundant Covers
• For a given set of FDs F, if a proper subset G of F covers F, then F is redundant, and we can remove the redundant FDs to obtain a non-redundant cover.
• Steps to find a non-redundant cover
1. Pick an FD X → Y to check for redundancy.
2. Find the attribute closure of its determinant X using all the other FDs, excluding X → Y itself.
3. If the closure contains Y, then X → Y is redundant.
4. Remove the redundant FDs from the set, one at a time, to obtain a non-redundant cover.
Example: Let R = {A, B, C, D} and a set of FDs be F = {A → B, B → C, BC → D, DA → B}.
Find the non-redundant cover.
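
As a rough sketch of this procedure (an illustration in Python, not the author's method), an FD X → Y can be dropped when Y is already in the closure of X computed from the remaining FDs. The function names and the single-character attribute encoding are assumptions.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nonredundant_cover(fds):
    """Drop, one at a time, every FD that the remaining FDs already imply."""
    result = list(fds)
    for fd in fds:
        lhs, rhs = fd
        rest = [g for g in result if g != fd]
        # fd is redundant if its right side follows from the other FDs.
        if set(rhs) <= closure(lhs, rest):
            result = rest
    return result


F = [("A", "B"), ("B", "C"), ("BC", "D"), ("DA", "B")]
print(nonredundant_cover(F))  # [('A', 'B'), ('B', 'C'), ('BC', 'D')]
```

Here DA → B is dropped because A → B alone already puts B in (DA)+. In general the result can depend on the order in which the FDs are examined, so a set of FDs may have more than one non-redundant cover.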



Extraneous Attributes
• Consider a set F of functional dependencies and the functional dependency α → β in F.
• Attribute A is extraneous in α if A ∈ α and F logically implies (F – {α → β}) ∪ {(α – A) → β}.
• Attribute A is extraneous in β if A ∈ β and the set of functional dependencies (F – {α → β}) ∪ {α → (β – A)} logically implies F.
• Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always implies a weaker one
• Example: Given F = {A → C, AB → C}
    • B is extraneous in AB → C because {A → C, AB → C} logically implies A → C (i.e. the result of dropping B from AB → C).
• Example: Given F = {A → C, AB → CD}
    • C is extraneous in AB → CD since AB → C can be inferred even after deleting C
Canonical Cover
• Sets of functional dependencies may have redundant
dependencies that can be inferred from the others
• For example: A → C is redundant in {A → B, B → C, A → C} since it can be inferred from the other two FDs
• Parts of a functional dependency may be redundant
• E.g.: on RHS: {A → B, B → C, A → CD} can be simplified to
{A → B, B → C, A → D}
• E.g.: on LHS: {A → B, B → C, AC → D} can be simplified to
{A → B, B → C, A → D}
• Intuitively, a canonical cover of F is a “minimal” set of
functional dependencies equivalent to F, having no
redundant dependencies or redundant parts of
dependencies
Testing if an Attribute is Extraneous
• Consider a set F of functional dependencies and the functional dependency α → β in F.
• To test if attribute A ∈ α is extraneous in α
    • compute ({α} – A)+ using the dependencies in F
    • check that ({α} – A)+ contains β; if it does, A is extraneous in α
• To test if attribute A ∈ β is extraneous in β
    • compute α+ using only the dependencies in F’ = (F – {α → β}) ∪ {α → (β – A)},
    • check that α+ contains A; if it does, A is extraneous in β
Canonical Cover
• A canonical cover for F is a set of dependencies Fc such that
• F logically implies all dependencies in Fc, and
• Fc logically implies all dependencies in F, and
• No functional dependency in Fc contains an extraneous attribute, and
• Each left side of functional dependency in Fc is unique.
• To compute a canonical cover for F:
repeat
    Use the union rule to replace any dependencies in F of the form
        α1 → β1 and α1 → β2 with α1 → β1 β2
    Find a functional dependency α → β with an
        extraneous attribute either in α or in β
    If an extraneous attribute is found, delete it from α → β
until F does not change
• Note: Union rule may become applicable after some extraneous
attributes have been deleted, so it has to be re-applied
Computing a Canonical Cover
• R = (A, B, C)
F = {A → BC ,B → C, A → B, AB → C}, compute the canonical cover Fc
• Combine A → BC and A → B into A → BC (union rule)
• Set is now {A → BC, B → C, AB → C}
• A is extraneous in AB → C
• Check if the result of deleting A from AB → C is implied by the other dependencies
• Yes: in fact, B → C is already present!
• Set is now {A → BC, B → C}
• C is extraneous in A → BC
• Check if A → C is logically implied by A → B and the other dependencies
• Yes: using transitivity on A → B and B → C.
• Can use attribute closure of A in more complex cases
• The canonical cover is: A→B
B→C
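
The walkthrough above can be reproduced with a short program. This is a minimal Python sketch of the repeat/until procedure, assuming single-character attributes and FDs given as (lhs, rhs) string pairs; the function names are illustrative. It may delete extraneous attributes in a different order than the slide does, and in general a set of FDs can have more than one canonical cover.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    fc = [(frozenset(lhs), set(rhs)) for lhs, rhs in fds]
    changed = True
    while changed:
        changed = False
        # Union rule: merge dependencies that share a left side.
        merged = {}
        for lhs, rhs in fc:
            merged.setdefault(lhs, set()).update(rhs)
        if len(merged) != len(fc):
            changed = True
        fc = list(merged.items())
        # Find one extraneous attribute and delete it.
        for i, (lhs, rhs) in enumerate(fc):
            found = False
            for a in sorted(lhs):
                # a is extraneous in lhs if (lhs - a)+ under F contains rhs.
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, fc):
                    fc[i] = (lhs - {a}, rhs)
                    found = True
                    break
            if not found:
                for a in sorted(rhs):
                    # a is extraneous in rhs if lhs+ still contains a after
                    # a has been removed from this dependency.
                    reduced = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
                    if a in closure(lhs, reduced):
                        fc[i] = (lhs, rhs - {a})
                        found = True
                        break
            if found:
                changed = True
                break
        fc = [(lhs, rhs) for lhs, rhs in fc if rhs]  # drop emptied FDs
    return {("".join(sorted(lhs)), "".join(sorted(rhs))) for lhs, rhs in fc}


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
print(sorted(canonical_cover(F)))  # [('A', 'B'), ('B', 'C')]
```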
Practice Session

• Let R(P, Q, R, S) and FDs = {P -> QR, Q -> R, P -> Q, PQ -> S}. Find Fc.


Decomposition of a Relation

• Decomposition is the process of breaking down a complex relation into simpler structures (tables). It is the fundamental process during normalization.
• Properties:
• Lossy Decomposition
• Loss Less Decomposition
• Attribute Preservation
• Dependency Preservation
• Lack of Redundancies



Lossless-join Decomposition
• For the case of R = (R1, R2), we require that for all possible relations r on schema R
r = ΠR1(r) ⋈ ΠR2(r)
• A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2
Example
• R = (A, B, C)
F = {A → B, B → C}
• Can be decomposed in two different ways
• R1 = (A, B), R2 = (B, C)
    • Lossless-join decomposition:
      R1 ∩ R2 = {B} and B → BC
    • Dependency preserving
• R1 = (A, B), R2 = (A, C)
    • Lossless-join decomposition:
      R1 ∩ R2 = {A} and A → AB
    • Not dependency preserving
      (cannot check B → C without computing R1 ⋈ R2)
• R1 = (A, C), R2 = (B, C) ??
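
The lossless-join condition for a two-way decomposition reduces to one attribute closure: compute (R1 ∩ R2)+ and check whether it contains all of R1 or all of R2. The Python sketch below applies this to the three decompositions in the example; the function names and the single-character attribute encoding are assumptions.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


F = [("A", "B"), ("B", "C")]

print(is_lossless("AB", "BC", F))  # True: B+ = {B, C} contains R2
print(is_lossless("AB", "AC", F))  # True: A+ = {A, B, C} contains R1
print(is_lossless("AC", "BC", F))  # False: C+ = {C} contains neither
```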
Dependency Preservation
• Let Fi be the set of dependencies in F+ that include only attributes in Ri
• A decomposition is dependency preserving, if
(F1 ∪ F2 ∪ … ∪ Fn)+ = F+
• If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.
Testing for Dependency Preservation
• To check if a dependency α → β is preserved in a decomposition of R into R1, R2, …, Rn we apply the following test (with attribute closure done with respect to F)
• result = α
  while (changes to result) do
      for each Ri in the decomposition
          t = (result ∩ Ri)+ ∩ Ri
          result = result ∪ t
• If result contains all attributes in β, then the functional dependency α → β is preserved.
• We apply the test on all dependencies in F to check if a decomposition is dependency preserving
• This procedure takes polynomial time, instead of the exponential time required to compute F+ and (F1 ∪ F2 ∪ … ∪ Fn)+
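
The test above can be sketched in Python as follows. The names is_preserved and closure, and the encoding of attributes as single characters, are assumptions made for illustration; the example reuses R = (A, B, C) with F = {A → B, B → C} and the two decompositions discussed earlier.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, decomposition, fds):
    """Check whether lhs -> rhs can be enforced within the decomposed relations."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            ri = set(ri)
            # t = (result ∩ Ri)+ ∩ Ri, with the closure taken under F.
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result


F = [("A", "B"), ("B", "C")]

print(all(is_preserved(l, r, ["AB", "BC"], F) for l, r in F))  # True
print(is_preserved("B", "C", ["AB", "AC"], F))                 # False
```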
Definition: Normalization
▪ Database normalization is a technique for organizing the data in a database.
▪ Normalization is a systematic approach of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics such as insertion, update and deletion anomalies.
▪ It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.
▪ Normalization is used mainly for two purposes:
    ▪ Eliminating redundant (repeated) data.
    ▪ Ensuring data dependencies make sense, i.e. data is logically stored.
▪ A properly normalized database should have the following characteristics:
    • Scalar values in each field
    • Absence of redundancy.
    • Minimal use of null values.
    • Minimal loss of information.
Levels of Normalization
• Levels of normalization are based on the amount of redundancy in the database.
• Various levels of normalization are:
    • First Normal Form (1NF)
    • Second Normal Form (2NF)
    • Third Normal Form (3NF)
    • Boyce-Codd Normal Form (BCNF)
    • Fourth Normal Form (4NF)
    • Fifth Normal Form (5NF)
    • Domain Key Normal Form (DKNF)
[Figure: arrows labelled Number of Tables, Redundancy and Complexity alongside the list of normal forms]

Most databases should be in 3NF or BCNF in order to avoid the database anomalies.


Levels of Normalization
[Figure: nested diagram of the normal forms 1NF, 2NF, 3NF, BCNF, 4NF, 5NF and DKNF]

Each higher level is a subset of the lower level


Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized.
These are :
▪Insertion anomaly
▪Update anomaly
▪Deletion anomaly
Example: Suppose a manufacturing company stores the employee details in a table named Employee that has four attributes: emp_id, emp_name, emp_address and emp_dept (the department in which the employee works). At some point in time the table looks like this:

emp_id  emp_name  emp_address  emp_dept
101     Sabin     Pulchowk     D001
101     Sabin     Pulchowk     D002
123     Mohan     New Road     D890
166     Rabin     Kalimati     D900
166     Rabin     Kalimati     D004

The above table is not normalized. We will see the problems that we face when a table is not normalized.
Update anomaly: In the above table we have two rows for employee Sabin, as he belongs to two departments of the company. If we want to update Sabin's address, we have to update it in both rows, or the data will become inconsistent. If the correct address is updated in one row but not in the other, then according to the database Sabin has two different addresses, which is incorrect.
Insert anomaly: Suppose a new employee joins the company who is under training and not yet assigned to any department. We would not be able to insert this employee into the table if the emp_dept field does not allow nulls.
Delete anomaly: Suppose the company closes department D890 at some point. Deleting the rows that have emp_dept as D890 would also delete the information about employee Mohan, since he is assigned only to this department.
First Normal Form
• We say a relation is in 1NF if all values stored in the relation are single-
valued and atomic.
• 1NF places restrictions on the structure of relations.
• Values must be simple.



First Normal Form
The following is not in 1NF:

EmpNum  EmpPhone  EmpDegrees
123     233-9876
333     233-1231  BA, BSc, PhD
679     233-1231  BSc, MSc

EmpDegrees is a multi-valued field:
employee 679 has two degrees: BSc and MSc
employee 333 has three degrees: BA, BSc, PhD

To obtain 1NF relations we must, without loss of information, replace the above with two relations.
First Normal Form
Employee
EmpNum  EmpPhone
123     233-9876
333     233-1231
679     233-1231

EmployeeDegree
EmpNum  EmpDegree
333     BA
333     BSc
333     PhD
679     BSc
679     MSc

An outer join between Employee and EmployeeDegree will produce the information we saw before.
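
The split into two 1NF relations can be illustrated with a few lines of Python. This is only a sketch: the list-of-dicts layout, the comma-separated encoding of EmpDegrees and the function name to_1nf are assumptions made for the example.

```python
# The non-1NF relation, with EmpDegrees holding several values per row.
employees = [
    {"EmpNum": 123, "EmpPhone": "233-9876", "EmpDegrees": ""},
    {"EmpNum": 333, "EmpPhone": "233-1231", "EmpDegrees": "BA, BSc, PhD"},
    {"EmpNum": 679, "EmpPhone": "233-1231", "EmpDegrees": "BSc, MSc"},
]


def to_1nf(rows):
    """Split the multi-valued EmpDegrees field into its own relation."""
    employee = [{"EmpNum": r["EmpNum"], "EmpPhone": r["EmpPhone"]} for r in rows]
    employee_degree = [
        {"EmpNum": r["EmpNum"], "EmpDegree": degree.strip()}
        for r in rows
        for degree in r["EmpDegrees"].split(",")
        if degree.strip()  # an employee without degrees contributes no rows
    ]
    return employee, employee_degree


employee, employee_degree = to_1nf(employees)
print(len(employee))         # 3
print(len(employee_degree))  # 5
```

Employee 123 has no rows in EmployeeDegree, which is why an outer join (rather than an inner join) is needed to get every employee back.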

Second Normal Form
A relation is in 2NF if it is in 1NF, and every non-key attribute
is fully dependent on each candidate key. (That is, we don’t
have any partial functional dependency.)

• 2NF (and 3NF) both involve the concepts of key and non-key attributes.
• A key attribute is any attribute that is part of a key; any attribute that is not a key attribute is a non-key attribute.
• Relations that are not in BCNF have data redundancies
• A relation in 2NF will not have any partial dependencies
Second Normal Form
Consider this InvoiceLine table:
InvNum  LineNum  ProdNum  Qty  InvDate
Functional dependencies:
{InvNum, LineNum} → {ProdNum, Qty}
InvNum → InvDate
With these dependencies, {InvNum, LineNum} is a candidate key. InvDate is a non-key attribute, and it depends on InvNum alone, which is only part of that key.
Table InvoiceLine is not in 2NF since there is a partial dependency of InvDate on InvNum. Table InvoiceLine is only in 1NF.
Second Normal Form
InvoiceLine
InvNum  LineNum  ProdNum  Qty  InvDate
The above relation has redundancies: the invoice date is repeated on each invoice line.
We can improve the database by decomposing the relation into two relations:
InvNum  LineNum  ProdNum  Qty
InvNum  InvDate
Is the following relation in 2NF?

inv_no  line_no  prod_no  prod_desc  qty

prod_no → prod_desc, so a transitive dependency occurs and the relation is not in 3NF.
YES, it is in 2NF, but not in 3NF, nor in BCNF.
Third Normal Form
• A relation is in third normal form if it satisfies the following conditions:
▪ It is in second normal form
▪ There is no transitive functional dependency
• By transitive functional dependency, we mean we have the following relationships in the table: A is functionally dependent on B, and B is functionally dependent on C. In this case, A is transitively dependent on C via B.
• This definition of 3NF differs from BCNF only in the specification of non-key attributes: 3NF is weaker than BCNF. (BCNF requires all determinants to be candidate keys.)
• A relation in 3NF will not have any transitive dependencies of a non-key attribute on a candidate key through another non-key attribute.
Third Normal Form
Consider this Employee relation. Candidate keys are? …

EmpNum  EmpName  DeptNum  DeptName

EmpName, DeptNum, and DeptName are non-key attributes.
DeptNum determines DeptName, a non-key attribute, and DeptNum is not a candidate key.
Is the relation in 3NF? … no
Is the relation in BCNF? … no
Is the relation in 2NF? … yes
Third Normal Form

EmpNum  EmpName  DeptNum  DeptName

We correct the situation by decomposing the original relation into two 3NF relations. Note the decomposition is lossless.

EmpNum  EmpName  DeptNum
DeptNum  DeptName
Boyce-Codd Normal Form or BCNF

• Boyce-Codd Normal Form (BCNF) is an extension of the third normal form, and is also known as 3.5 Normal Form.
• For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two conditions:
▪ It should be in Third Normal Form.
▪ For any non-trivial dependency A → B, A should be a super key. In particular, A cannot be a non-prime attribute when B is a prime attribute.


In 3NF, but not in BCNF:

student_no  course_no  instr_no

An instructor teaches one course only.
A student takes a course and has one instructor.

{student_no, course_no} → instr_no
instr_no → course_no

Since we have instr_no → course_no, but instr_no is not a candidate key, the relation is not in BCNF.
student_no  course_no  instr_no

is decomposed into two relations:

student_no  instr_no

course_no  instr_no

{student_no, instr_no} → student_no
{student_no, instr_no} → instr_no
instr_no → course_no
4th Normal Form
For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:
▪ It should be in the Boyce-Codd Normal Form.
▪ The table should not have any multi-valued dependency.

S_id  Course  Hobby
1     C#      music
1     C#      dance
2     C#      dance
2     Php     dance

In this example, Course and Hobby are independent of each other but both depend on S_id. These two columns are said to be multi-valued dependent on S_id.
To make the above relation satisfy the 4th normal form, we can decompose the table into 2 tables.

Course
S_id  Course
1     C#
2     C#
2     Php

Hobby
S_id  Hobby
1     music
1     dance
2     dance

Now these relations satisfy the fourth normal form.
Fifth Normal Form
• A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation
• This implies that relations that have been decomposed in previous normal forms can be recombined via natural joins to recreate the original relation.


Fifth Normal Form (5NF)
• Fifth normal form is satisfied when all tables are broken
into as many tables as possible in order to avoid
redundancy. Once it is in fifth normal form it cannot be
broken into smaller relations without changing the facts or
the meaning.



Domain Key Normal Form (DKNF)
• A relation is in DKNF when insertion, deletion and update anomalies are not present in the database. Domain-Key Normal Form is the highest form of normalization, because these anomalies are removed.
• All constraints are enforced through the domain constraints and key constraints.
• A table is in Domain-Key Normal Form only if it is also in 4NF, 3NF and the other normal forms. It is based on the following constraints:
Domain Constraint
The values of an attribute come from a fixed set of values; for example, EmployeeID should be four digits long:

EmpID  EmpName  EmpAge
0921   Hari     25
0922   Geeta    24

Key Constraint
An attribute or a combination of attributes is a candidate key.
General Constraint
A predicate on the set of all relations. Every general constraint should be a logical consequence of the domain constraints and key constraints applied to the relation. The practical utility of DKNF is limited.


Denormalization
• Denormalization is a database optimization technique in which we add
redundant data to one or more tables. This can help us avoid costly joins in a
relational database.
• Note that denormalization does not mean simply ‘reversing normalization’ or ‘not to normalize’. It is an optimization technique that is applied after normalization.
• The process of taking a normalized schema and deliberately adding selected redundancy to it is called denormalization, and designers use it to tune the performance of systems to support time-critical operations.
• For example, in a normalized database, we might have a Courses table and a
Teachers table. Each entry in Courses would store the teacherID for a Course but not
the teacherName. When we need to retrieve a list of all Courses with the Teacher’s
name, we would do a join between these two tables.
• Denormalization, then, strikes a different compromise. Under denormalization, we
decide that we’re okay with some redundancy and some extra effort to update the
database in order to get the efficiency advantages of fewer joins by adding an extra
attribute teacherName in Courses Table.
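
The Courses and Teachers example can be tried out with Python's built-in sqlite3 module. This is a hedged sketch: the column names courseID and title, the table name CoursesDenorm and the sample rows are assumptions added for illustration; only teacherID and teacherName come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: Courses stores only the teacherID.
    CREATE TABLE Teachers (teacherID INTEGER PRIMARY KEY, teacherName TEXT);
    CREATE TABLE Courses (
        courseID  INTEGER PRIMARY KEY,
        title     TEXT,
        teacherID INTEGER REFERENCES Teachers(teacherID)
    );
    INSERT INTO Teachers VALUES (1, 'Hari'), (2, 'Geeta');
    INSERT INTO Courses VALUES (10, 'DBMS', 1), (11, 'Networks', 2);

    -- Denormalized design: teacherName is copied into the course rows.
    CREATE TABLE CoursesDenorm AS
        SELECT c.courseID, c.title, c.teacherID, t.teacherName
        FROM Courses c JOIN Teachers t ON t.teacherID = c.teacherID;
""")

# Normalized: listing courses with teacher names needs a join.
normalized = conn.execute("""
    SELECT c.title, t.teacherName
    FROM Courses c JOIN Teachers t ON t.teacherID = c.teacherID
    ORDER BY c.courseID
""").fetchall()

# Denormalized: the same list comes from a single table.
denormalized = conn.execute(
    "SELECT title, teacherName FROM CoursesDenorm ORDER BY courseID"
).fetchall()

print(normalized == denormalized)  # True
```

The price of the simpler query is that renaming a teacher now requires updating Teachers and every matching row in CoursesDenorm.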



• Pros of Denormalization:
• Retrieving data is faster since we do fewer joins
• Queries to retrieve can be simpler(and therefore less likely to have bugs),
since we need to look at fewer tables.
• Cons of Denormalization:
• Updates and inserts are more expensive.
• Denormalization can make update and insert code harder to write.
• Data may be inconsistent.
• Data redundancy necessitates more storage.
