Unit II (2) Object Oriented Programming

Uploaded by

aishwarya patil
UNIT – II

Relational Database Design

Reference: Database System Concepts (6th edition) by A. Silberschatz, H. Korth, and S. Sudarshan.
Outline
• Relational Model: Basic concepts, Attributes and Domains
• Relational Integrity: Domain, Referential Integrities
• Codd's Rules
• Features of Good Relational Design
• Atomic Domains and First Normal Form
• Decomposition Using Functional Dependencies
• Functional Dependency Theory
• Normalization
Relational Model
• The relational model stores data in the form of tables. The concept was proposed by Dr. E. F. Codd, a researcher at IBM, in 1970.
• The relational model consists of three major components:
  • The set of relations and set of domains
  • Integrity rules
  • The operations
• A relational database is a database that lets you group its data items into one or more independent tables that can be related to one another through fields common to each related table.
Basic Structure
• Formally, given sets D1, D2, …, Dn, a relation r is a subset of
  D1 x D2 x … x Dn
  Thus a relation is a set of n-tuples (a1, a2, …, an) where each ai ∈ Di
• Example: if
  customer-name = {Jones, Smith, Curry}
  customer-street = {Main, North, Park}
  customer-city = {Harrison, Rye, Pittsfield}
  then
  r = { (Jones, Main, Harrison),
        (Smith, North, Rye),
        (Curry, North, Rye) }
  is a relation over customer-name x customer-street x customer-city
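
The definition above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides, that builds the Cartesian product of the three example domains and confirms that r is a subset of it. The variable names are illustrative.

```python
from itertools import product

# The three example domains from the slide.
customer_name = {"Jones", "Smith", "Curry"}
customer_street = {"Main", "North", "Park"}
customer_city = {"Harrison", "Rye", "Pittsfield"}

# D1 x D2 x D3: every possible 3-tuple over the domains.
all_tuples = set(product(customer_name, customer_street, customer_city))

# A relation is any subset of that product.
r = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
}

print(len(all_tuples))   # 27 possible tuples (3 * 3 * 3)
print(r <= all_tuples)   # True: r is a relation over the three domains
```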
Characteristics of Relational Database
• Relational database systems have the following characteristics:
  • The whole data is conceptually represented as an orderly arrangement of data into rows and columns, called a relation or table.
  • All values are scalar. That is, at any given row/column position in the relation there is one and only one value.
  • All operations are performed on an entire relation and the result is an entire relation, a concept known as closure.
Basic Terminology used in Relational Model

• The figure shows a relation with the formal names of its basic components marked. The entire structure is, as we have said, a relation.
• A single row of a table, which contains a single record for that relation, is called a tuple. Strictly, each row is an n-tuple, but the "n-" is usually dropped.
Basic Terminology used in Relational Model

• Relation instance − A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.
• Relation schema − A relation schema describes the relation name (table name) and its attributes and their names.
  A1, A2, …, An are attributes
  R = (A1, A2, …, An) is a relation schema
  E.g. Customer-schema = (customer-name, customer-street, customer-city)
• Cardinality of a relation: The number of tuples in a relation determines its cardinality. The relation in the figure has a cardinality of 4.
• Degree of a relation: Each column of the relation is called an attribute. The number of attributes in a relation determines its degree. The relation in the figure has a degree of 3.
• Domains: A domain definition specifies the kind of data represented by the attribute.
Basic Terminology used in Relational Model

• Keys are a very important part of a relational database. They are used to establish and identify relationships between tables. They also ensure that each record within a table can be uniquely identified by a combination of one or more fields within the table.
• Super Key -
  A super key is a set of attributes within a table that uniquely identifies each record within the table. A super key is a superset of a candidate key.
• Candidate Key -
  Candidate keys are the minimal sets of fields from which the primary key can be selected. A candidate key is an attribute or set of attributes that can act as a primary key for a table, uniquely identifying each record in that table.
• Primary Key -
  The primary key is the candidate key that is most appropriate to become the main key of the table. It is a key that uniquely identifies each record in a table.
Basic Terminology used in Relational Model

• Candidate Key -
• Example: in an employee table, EMPLOYEE_ID is best suited to be the primary key because it is issued by the employee's own employer. The other uniquely identifying attributes, such as passport number, SSN and license number, are considered candidate keys.
Basic Terminology used in Relational Model

• Composite Key
  A key that consists of two or more attributes that together uniquely identify an entity occurrence is called a composite key. No single attribute of a composite key is a key on its own.
Basic Terminology used in Relational Model

• Foreign key -
  A foreign key is a field (or collection of fields) in one table that uniquely identifies a row of another table or the same table. In simpler words, the foreign key is defined in a second table, but it refers to the primary key in the first table.
Basic Terminology used in Relational Model
• Secondary or Alternative Key
  The candidate keys that are not selected as the primary key are known as secondary keys or alternative keys.
• Non-key Attribute
  Non-key attributes are the attributes of a table other than the candidate key attributes.
• Non-prime Attribute
  Non-prime attributes are attributes other than prime attributes, that is, attributes that do not belong to any candidate key.


Basic Terminology used in Relational Model
• Constraints - Every relation has some conditions that must hold for it to be a valid relation. These conditions are called relational integrity constraints. There are three main integrity constraints:
  • Key constraints
  • Domain constraints
  • Referential integrity constraints
• Key Constraints - There must be at least one minimal subset of attributes in the relation that can identify a tuple uniquely. This minimal subset of attributes is called a key for that relation. If there is more than one such minimal subset, they are called candidate keys. Key constraints enforce that:
  • in a relation with a key attribute, no two tuples can have identical values for the key attributes.
  • a key attribute cannot have NULL values.
• Key constraints are also referred to as entity constraints.


Basic Terminology used in Relational Model
• Domain Constraints - Attributes take specific values in real-world scenarios. For example, age can only be a positive integer. The same constraints are applied to the attributes of a relation: every attribute is bound to a specific range of values. For example, age cannot be less than zero and telephone numbers cannot contain a digit outside 0-9.
• Referential integrity Constraints - Referential integrity constraints work on the concept of foreign keys. A foreign key is a key attribute of a relation that can be referred to in another relation.
• The referential integrity constraint states that if a relation refers to a key attribute of a different or the same relation, then that key element must exist.
• A foreign key must have a matching primary key or it must be null.


Basic Terminology used in Relational Model
• Example: The Customer table contains information about the customers, with CNO as the primary key. The Customer_Loan table stores CNO, LNO and AMOUNT. Its primary key is the combination of CNO and LNO. Here, CNO also acts as a foreign key and refers to CNO of the Customer table.
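
The Customer / Customer_Loan example can be tried with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the NAME column, the column types and the sample rows are invented for illustration and do not come from the slides. It shows the database rejecting a loan row whose CNO has no matching customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# NAME and the column types are assumptions made for this sketch.
conn.execute("CREATE TABLE Customer (CNO INTEGER PRIMARY KEY, NAME TEXT)")
conn.execute("""
    CREATE TABLE Customer_Loan (
        CNO INTEGER NOT NULL REFERENCES Customer(CNO),
        LNO INTEGER NOT NULL,
        AMOUNT REAL,
        PRIMARY KEY (CNO, LNO)
    )
""")

conn.execute("INSERT INTO Customer VALUES (1, 'Jones')")
conn.execute("INSERT INTO Customer_Loan VALUES (1, 100, 5000.0)")  # customer 1 exists

try:
    # There is no customer 2, so this violates referential integrity.
    conn.execute("INSERT INTO Customer_Loan VALUES (2, 101, 750.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

Because CNO is part of the primary key of Customer_Loan, it is declared NOT NULL here; a foreign key outside the primary key could instead be left NULL, as the rule above allows.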
Codd's 12 Rules

• Dr Edgar F. Codd, after his extensive research on the relational model of database systems, came up with twelve rules that a relational database system should obey.
• Rule 0 (Foundation Rule) - The system must manage its stored data using only its relational capabilities. This is the foundation rule, which acts as a base for all the other rules.
• Rule 1: Information Rule - The data stored in a database, be it user data or metadata, must be a value of some table cell. Everything in a database must be stored in a table format.
• Rule 2: Guaranteed Access Rule - Every single data element (value) is guaranteed to be logically accessible with a combination of table name, primary key (row value), and attribute name (column value). No other means, such as pointers, can be used to access data.
Codd's 12 Rules

• Rule 3: Systematic Treatment of NULL Values - The NULL values in a database must be given a systematic and uniform treatment. This is a very important rule because a NULL can be interpreted as one of the following: data is missing, data is not known, or data is not applicable.
• Rule 4: Active Online Catalog - The structure description of the entire database must be stored in an online catalog, known as the data dictionary, which can be accessed by authorized users. Users can use the same query language to access the catalog that they use to access the database itself.
• Rule 5: Comprehensive Data Sub-Language Rule - A database can only be accessed using a language having linear syntax that supports data definition, data manipulation, and transaction management operations. This language can be used directly or by means of some application. If the database allows access to data without any help of this language, then it is considered a violation.
Codd's 12 Rules

• Rule 6: View Updating Rule - All the views of a database that can theoretically be updated must also be updatable by the system.
• Rule 7: High-Level Insert, Update, and Delete Rule - A database must support high-level insertion, update, and deletion. This must not be limited to a single row; that is, it must also support union, intersection and minus operations to yield sets of data records.
• Rule 8: Physical Data Independence - The data stored in a database must be independent of the applications that access the database. Any change in the physical structure of a database must not have any impact on how the data is accessed by external applications.
Codd's 12 Rules
• Rule 9: Logical Data Independence - The logical data in a database must be independent of its user's view (application). Any change in logical data must not affect the applications using it. For example, if two tables are merged or one is split into two different tables, there should be no impact or change on the user application. This is one of the most difficult rules to apply.
• Rule 10: Integrity Independence - A database must be independent of the application that uses it. All its integrity constraints can be independently modified without the need for any change in the application. This rule makes a database independent of the front-end application and its interface.
• Rule 11: Distribution Independence - The end user must not be able to see that the data is distributed over various locations. Users should always get the impression that the data is located at one site only. This rule has been regarded as the foundation of distributed database systems.
• Rule 12: Non-Subversion Rule - If a system has an interface that provides access to low-level records, then the interface must not be able to subvert the system and bypass security and integrity constraints.
Introduction of Database Normalization

Database normalization is the process of organizing the attributes of the database to reduce or eliminate data redundancy (having the same data in different places).

Problems because of data redundancy

Data redundancy unnecessarily increases the size of the database, as the same data is repeated in many places. Inconsistency problems also arise during insert, delete and update operations.

Functional Dependency

A functional dependency is a constraint between two sets of attributes in a relation of a database. A functional dependency is denoted by an arrow (→). If an attribute A functionally determines B, then it is written as A → B.

For example, employee_id → name means employee_id functionally determines the name of the employee. As another example, in a timetable database, {student_id, time} → {lecture_room}: student ID and time determine the lecture room where the student should be.
What does functionally dependent mean?
A functional dependency A → B means that every occurrence of a particular value of A is paired with the same value of B. For example, in the table below A → B is true, but B → A is not true, as there are different values of A for B = 3.
A B
------
1 3
2 3
4 0
1 3
4 0
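
The check described above can be sketched in a few lines of Python. The helper name `fd_holds` and the list-of-dicts representation are choices made for this illustration, not something the slides prescribe. Note that it tests one instance only; it cannot prove that a dependency holds on every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds on this instance (rows are dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any later mismatch breaks the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True

# The A/B table shown above.
table = [
    {"A": 1, "B": 3},
    {"A": 2, "B": 3},
    {"A": 4, "B": 0},
    {"A": 1, "B": 3},
    {"A": 4, "B": 0},
]

print(fd_holds(table, ["A"], ["B"]))  # True
print(fd_holds(table, ["B"], ["A"]))  # False: B = 3 appears with A = 1 and A = 2
```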
Trivial Functional Dependency
X → Y is trivial only when Y is a subset of X.
Examples:

ABC → AB
ABC → A
ABC → ABC
Non Trivial Functional Dependencies
X → Y is a non-trivial functional dependency when Y is not a subset of X.
X → Y is called completely non-trivial when the intersection of X and Y is empty.
Examples:
Id → Name,
Name → DOB
Semi Non Trivial Functional Dependencies
X → Y is called semi non-trivial when Y is not a subset of X but the intersection of X and Y is not empty.
Examples:

AB → BC,
AD → DC
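
The three categories above differ only in how X and Y overlap, which a short Python sketch can make concrete. The function name `classify_fd` is hypothetical and chosen for this illustration.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y by comparing the attribute sets X (lhs) and Y (rhs)."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"                 # Y is a subset of X
    if x & y:
        return "semi non-trivial"        # X and Y overlap, but Y is not inside X
    return "completely non-trivial"      # X and Y share no attribute

print(classify_fd("ABC", "AB"))          # trivial
print(classify_fd("AB", "BC"))           # semi non-trivial
print(classify_fd(["Id"], ["Name"]))     # completely non-trivial
```

Single-letter attributes can be passed as a string such as "ABC"; longer attribute names need a list, because a string is split into its characters.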
Combine Schemas?
• Suppose we combine instructor and department into inst_dept
  • (No connection to relationship set inst_dept)
• Result is possible repetition of information
A Combined Schema Without Repetition
• Consider combining relations
  • sec_class(sec_id, building, room_number) and
  • section(course_id, sec_id, semester, year)
  into one relation
  • section(course_id, sec_id, semester, year, building, room_number)
• No repetition in this case
What About Smaller Schemas?
• Suppose we had started with inst_dept. How would we know to split up (decompose) it into instructor and department?
• Write a rule "if there were a schema (dept_name, building, budget), then dept_name would be a candidate key"
• Denote as a functional dependency:
  dept_name → building, budget
• In inst_dept, because dept_name is not a candidate key, the building and budget of a department may have to be repeated.
  • This indicates the need to decompose inst_dept
• Not all decompositions are good. Suppose we decompose
  employee(ID, name, street, city, salary) into
  employee1 (ID, name)
  employee2 (name, street, city, salary)
• The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is a lossy decomposition.
A Lossy Decomposition
Example of Lossless-Join Decomposition

• Lossless join decomposition
• Decomposition of R = (A, B, C)
  R1 = (A, B)
  R2 = (B, C)

r
A  B  C
--------
α  1  A
β  2  B

ΠA,B(r)
A  B
-----
α  1
β  2

ΠB,C(r)
B  C
-----
1  A
2  B

ΠA,B(r) ⋈ ΠB,C(r)
A  B  C
--------
α  1  A
β  2  B
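
The difference between a lossless and a lossy decomposition can be tested on a concrete instance by projecting and re-joining. The following Python sketch is an illustration only: the helper names are hypothetical, the strings "alpha" and "beta" stand in for α and β, and the employee rows are made-up sample data (two employees sharing a name), not data from the slides. It checks a single instance, not the schema in general.

```python
def project(rows, attrs):
    """Projection onto attrs; the result is a set, so duplicate tuples disappear."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Natural join of two sets of tuples, each tuple a frozenset of (attr, value)."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            # Tuples combine only when they agree on every shared attribute.
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

def rejoins_exactly(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {frozenset(row.items()) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# The R = (A, B, C) instance from the slide.
r = [{"A": "alpha", "B": 1, "C": "A"}, {"A": "beta", "B": 2, "C": "B"}]
print(rejoins_exactly(r, ["A", "B"], ["B", "C"]))  # True

# Made-up employee rows: two different people who happen to share a name.
employee = [
    {"ID": 1, "name": "Kim", "street": "Main", "city": "Rye", "salary": 75000},
    {"ID": 2, "name": "Kim", "street": "North", "city": "Harrison", "salary": 67000},
]
print(rejoins_exactly(employee, ["ID", "name"],
                      ["name", "street", "city", "salary"]))  # False
```

In the employee case the join on name pairs each ID with both addresses, producing four tuples instead of two, which is the information loss the earlier slide describes.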
Functional Dependencies
• Constraints on the set of legal relations.
• Require that the value for a certain set of attributes determines uniquely the value for another set of attributes.
• A functional dependency is a generalization of the notion of a key.
Functional Dependencies (Cont.)
• Let R be a relation schema,
  α ⊆ R and β ⊆ R
• The functional dependency
  α → β
  holds on R if and only if for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
  t1[α] = t2[α] ⇒ t1[β] = t2[β]
• Example: Consider r(A, B) with the following instance of r.
  A  B
  -----
  1  4
  1  5
  3  7
• On this instance, A → B does NOT hold, but B → A does hold.
Functional Dependencies (Cont.)
• K is a superkey for relation schema R if and only if K → R
• K is a candidate key for R if and only if
  • K → R, and
  • for no α ⊂ K, α → R
• Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema:
  inst_dept (ID, name, salary, dept_name, building, budget).
  We expect these functional dependencies to hold:
  dept_name → building
  and ID → building
  but would not expect the following to hold:
  dept_name → salary
Use of Functional Dependencies
• We use functional dependencies to:
  • test relations to see if they are legal under a given set of functional dependencies.
    • If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.
  • specify constraints on the set of legal relations
    • We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F.
• Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances.
  • For example, a specific instance of instructor may, by chance, satisfy
    name → ID.
Functional Dependencies (Cont.)
• A functional dependency is trivial if it is satisfied by all instances of a relation
  • Example:
    • ID, name → ID
    • name → name
  • In general, α → β is trivial if β ⊆ α
Closure of a Set of Functional
Dependencies
• Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F.
  • For example: If A → B and B → C, then we can infer that A → C
• The set of all functional dependencies logically implied by F is the closure of F.
• We denote the closure of F by F+.
• F+ is a superset of F.
• We can find F+, the closure of F, by repeatedly applying Armstrong's Axioms:
  • if β ⊆ α, then α → β (reflexivity)
  • if α → β, then γα → γβ (augmentation)
  • if α → β, and β → γ, then α → γ (transitivity)
• These rules are
  • sound (generate only functional dependencies that actually hold), and
  • complete (generate all functional dependencies that hold).
Example
• R = (A, B, C, G, H, I)
  F = { A → B
        A → C
        CG → H
        CG → I
        B → H }
• Some members of F+
  • A → H
    • by transitivity from A → B and B → H
Closure of Functional Dependencies
(Cont.)
• Additional rules:
  • If α → β holds and α → γ holds, then α → βγ holds (union)
  • If α → βγ holds, then α → β holds and α → γ holds (decomposition)
  • If α → β holds and γβ → δ holds, then αγ → δ holds (pseudotransitivity)
The above rules can be inferred from Armstrong's axioms.
Closure of Attribute Sets
• Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F
• Algorithm to compute α+, the closure of α under F
  result := α;
  while (changes to result) do
    for each β → γ in F do
    begin
      if β ⊆ result then result := result ∪ γ
    end
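
The closure algorithm above translates almost line for line into Python. This is a minimal sketch; the function name `attribute_closure` and the representation of F as a list of (left side, right side) pairs of sets are assumptions made for the illustration.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:                      # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:            # "for each beta -> gamma in F"
            if lhs <= result and not rhs <= result:
                result |= rhs           # "result := result U gamma"
                changed = True
    return result

# A small example: F = {A -> B, B -> C}.
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(attribute_closure({"A"}, F)))  # ['A', 'B', 'C']
print(sorted(attribute_closure({"B"}, F)))  # ['B', 'C']
```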
Example of Attribute Set Closure
• R = (A, B, C, G, H, I)
• F = { A → B
        A → C
        CG → H
        CG → I
        B → H }
• (AG)+
  1. result = AG
  2. result = ABCG (A → C and A → B)
  3. result = ABCGH (CG → H and CG ⊆ AGBC)
  4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
• Is AG a candidate key?
  1. Is AG a super key?
     1. Does AG → R? == Is (AG)+ ⊇ R
  2. Is any subset of AG a superkey?
     1. Does A → R? == Is (A)+ ⊇ R
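
The candidate-key test in this example can be reproduced with a short Python sketch. The helper names are hypothetical. The minimality check removes one attribute at a time, which is sufficient because any subset of a non-superkey is also not a superkey.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds (a list of (lhs, rhs) pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure contains every attribute of the schema."""
    return attribute_closure(attrs, fds) >= set(schema)

def is_candidate_key(attrs, schema, fds):
    """A candidate key is a superkey from which no attribute can be dropped."""
    attrs = set(attrs)
    return is_superkey(attrs, schema, fds) and not any(
        is_superkey(attrs - {a}, schema, fds) for a in attrs
    )

R = set("ABCGHI")
F = [({"A"}, {"B"}), ({"A"}, {"C"}), ({"C", "G"}, {"H"}),
     ({"C", "G"}, {"I"}), ({"B"}, {"H"})]

print(sorted(attribute_closure({"A", "G"}, F)))  # ['A', 'B', 'C', 'G', 'H', 'I']
print(is_superkey({"A"}, R, F))                  # False: (A)+ = ABCH
print(is_candidate_key({"A", "G"}, R, F))        # True
```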
Canonical Cover
• Sets of functional dependencies may have redundant dependencies that can be inferred from the others
  • For example: A → C is redundant in: {A → B, B → C, A → C}
  • Parts of a functional dependency may be redundant
    • E.g.: on RHS: {A → B, B → C, A → CD} can be simplified to
      {A → B, B → C, A → D}
    • E.g.: on LHS: {A → B, B → C, AC → D} can be simplified to
      {A → B, B → C, A → D}
• Intuitively, a canonical cover of F is a "minimal" set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies
Canonical Cover
• A canonical cover for F is a set of dependencies Fc such that
  • F logically implies all dependencies in Fc, and
  • Fc logically implies all dependencies in F, and
  • No functional dependency in Fc contains an extraneous attribute, and
  • Each left side of a functional dependency in Fc is unique.
• To compute a canonical cover for F:
  Fc := F
  repeat
    Use the union rule to replace any dependencies in Fc
      α1 → β1 and α1 → β2 with α1 → β1 β2
    Find a functional dependency α → β with an
      extraneous attribute either in α or in β
      /* Note: test for extraneous attributes done using Fc, not F */
    If an extraneous attribute is found, delete it from α → β
  until Fc does not change
• Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied
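
The procedure above can be sketched in Python as follows. This is one possible reading of the algorithm, not a reference implementation: the function names and the dictionary representation (left side mapped to the merged right side, which applies the union rule automatically) are choices made for this illustration. A canonical cover is not unique, so a different processing order can give a different but equivalent result.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, an iterable of (lhs, rhs) pairs of sets."""
    fds = list(fds)
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    """fds: (lhs, rhs) pairs, each side an iterable of attribute names."""
    cover = {}
    for lhs, rhs in fds:                       # union rule: one entry per left side
        cover.setdefault(frozenset(lhs), set()).update(rhs)
    changed = True
    while changed:                             # "repeat ... until Fc does not change"
        changed = False
        for lhs in list(cover):
            rhs = cover[lhs]
            # Left side: a is extraneous if (lhs - a)+ under Fc already gives rhs.
            for a in sorted(lhs):
                reduced = lhs - {a}
                if reduced and rhs <= closure(reduced, cover.items()):
                    del cover[lhs]
                    cover.setdefault(reduced, set()).update(rhs)  # union rule again
                    changed = True
                    break
            if changed:
                break
            # Right side: a is extraneous if lhs still determines it once a is removed.
            for a in sorted(rhs):
                trial = [(l, r - {a} if l == lhs else r) for l, r in cover.items()]
                if a in closure(lhs, trial):
                    rhs.discard(a)
                    if not rhs:
                        del cover[lhs]
                    changed = True
                    break
            if changed:
                break
    return cover

# The two examples from the previous slide (single-letter attributes as strings).
print(canonical_cover([("A", "B"), ("B", "C"), ("A", "C")]))
print(canonical_cover([("A", "B"), ("B", "C"), ("AC", "D")]))
```

For the first input the result is A → B and B → C. For the second it is A → BD and B → C, which is the simplified set {A → B, B → C, A → D} with the union rule applied to the two dependencies that share the left side A.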
Multivalued Dependencies
• Suppose we record names of children, and phone numbers for instructors:
  • inst_child(ID, child_name)
  • inst_phone(ID, phone_number)
• If we were to combine these schemas to get
  • inst_info(ID, child_name, phone_number)
  • Example data:
    (99999, David, 512-555-1234)
    (99999, David, 512-555-4321)
    (99999, William, 512-555-1234)
    (99999, William, 512-555-4321)
• This relation is in BCNF
  • Why?
MVD (Cont.)
• Tabular representation of α →→ β
Example (Cont.)
• In our example:
  ID →→ child_name
  ID →→ phone_number
• The above formal definition is supposed to formalize the notion that given a particular value of Y (ID) it has associated with it a set of values of Z (child_name) and a set of values of W (phone_number), and these two sets are in some sense independent of each other.
• Note:
  • If Y → Z then Y →→ Z
  • Indeed we have (in above notation) Z1 = Z2. The claim follows.
Use of Multivalued Dependencies
• We use multivalued dependencies in two ways:
  1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies
  2. To specify constraints on the set of legal relations. We shall thus concern ourselves only with relations that satisfy a given set of functional and multivalued dependencies.
• If a relation r fails to satisfy a given multivalued dependency, we can construct a relation r′ that does satisfy the multivalued dependency by adding tuples to r.
NORMALIZATION:
• The major aim of relational database design is to group attributes into relations so as to minimize data redundancy and reduce the file storage space required by base relations.
• Functional dependencies can be used to group attributes into relations that are in a known normal form.

Normalization is the process of reducing/decomposing a relation into a set of smaller relations that are free from data redundancy, while ensuring data integrity.
It is defined as a step-by-step process of decomposing a complex relation into a simple and stable data structure.
Objectives
• Problems associated with redundant data.
• Identification of various types of update anomalies such as insertion, deletion, and modification anomalies.

I. Insertion anomalies: It may be impossible to store certain information without storing some other, unrelated information.
II. Deletion anomalies: It may be impossible to delete certain information without losing some other, unrelated information.
III. Update anomalies: If one copy of such repeated data is updated, all copies need to be updated to prevent inconsistency.

• Increasing storage requirements: The storage requirements may increase over time.
Successive normal forms of a relation:
• There are successively higher normal forms such as 2NF, 3NF, BCNF, 4NF, 5NF.
• Each normal form is an improvement over the earlier form.
• The relations in a higher normal form are a subset of the relations in each lower normal form.
1NF
• A relation is said to be in 1NF if and only if it contains no repeating attributes or groups of attributes.
• The relation/table is said to be in 1NF when each cell of the table/relation contains precisely one (atomic) value.
• A relation schema R is in 1NF:
  – if and only if all the attributes of the relation R are atomic in nature.
  – Atomic: the smallest level to which data may be broken down and remain meaningful
2NF
A relation is said to be in Second Normal Form if and only if:
– It is in the First Normal Form, and
– No partial dependency exists between non-key attributes and key attributes.

A relation schema R is in 2NF if every non-prime attribute A in R is fully functionally dependent on every key of R.
3NF
A relation R is said to be in the Third Normal Form (3NF) if and only if:
– It is in 2NF, and
– No transitive dependency exists between non-key attributes and key attributes.

A relation is in 3NF if and only if it is in 2NF and there are no transitive functional dependencies.
BCNF
A relation is said to be in Boyce-Codd Normal Form (BCNF)
- if and only if all the determinants are candidate keys.
BCNF is a stronger form of 3NF: every BCNF relation is in 3NF, but not every 3NF relation is in BCNF.
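
A BCNF check can be sketched with the attribute-closure idea from earlier in the unit. The sketch below uses the common equivalent formulation that the left side of every non-trivial functional dependency must be a superkey; the function names are hypothetical, and the schema and dependencies are the ones from the student/course example in this unit.

```python
def closure(attrs, fds):
    """Closure of attrs under fds (a list of (lhs, rhs) pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Non-trivial dependencies whose left side does not determine the whole schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= set(schema)]

schema = {"SID", "CID", "S_name", "C_name", "Grade", "F_Name", "F_phone"}
fds = [
    ({"SID"}, {"S_name"}),
    ({"SID", "CID"}, {"Grade"}),
    ({"CID"}, {"C_name"}),
    ({"CID"}, {"F_Name"}),
    ({"F_Name"}, {"F_phone"}),
]

for lhs, rhs in bcnf_violations(schema, fds):
    print(sorted(lhs), "->", sorted(rhs))

# A two-attribute faculty relation with F_Name -> F_phone has no violations.
print(bcnf_violations({"F_Name", "F_phone"}, [({"F_Name"}, {"F_phone"})]))  # []
```

On the full schema only {SID, CID} → Grade passes, because {SID, CID} is the only determinant whose closure covers every attribute; the other four dependencies are reported.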
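Normal Form   Basic Motivation
--------------------------------------------------------------------------------
1NF           Removing non-atomicity
2NF           Removing partial dependency (part of key attribute → non-key attribute)
3NF           Removing transitive dependency
BCNF          Removing any remaining redundancy caused by functional dependencies
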

Convert table in Normal forms
• SID = Student ID, S_name = Student Name,
  CID = Course ID, C_name = Course Name,
  Grade = Student's Grade in Course,
  F_Name (Faculty) = Faculty Name,
  F_phone = Faculty Phone

Functional Dependencies are:

SID → S_name
SID and CID → Grade
CID → C_name
CID → F_Name
F_Name → F_phone
SID  CID           S_name    C_name        Grade  F_Name               F_phone
--------------------------------------------------------------------------------
1    IS318, IS301  Anand     Database, EC  A, B   Vishal, Pradipkumar  60192, 45869
2    IS318         Vijay     Database      A      Vishal               60192
3    IS318         Shrikant  Database      B      Vishal               60192
4    IS301, IS318  Vinay     EC, Database  A, B   Pradipkumar, Vishal  45869, 60192
1NF

SID  CID    S_name    C_name    Grade  Faculty      F_phone
------------------------------------------------------------
1    IS318  Anand     Database  A      Vishal       60192
1    IS301  Anand     Program   B      Pradipkumar  45869
2    IS318  Vijay     Database  A      Vishal       60192
3    IS318  Shrikant  Database  B      Vishal       60192
4    IS301  Vinay     Program   A      Pradipkumar  45869
4    IS318  Vinay     Database  B      Vishal       60192



2NF
• SID and CID → Grade

SID  CID    Grade
------------------
1    IS318  A
1    IS301  B
2    IS318  A
3    IS318  B
4    IS301  A
4    IS318  B

• SID → S_name

SID  S_name
-------------
1    Anand
2    Vijay
3    Shrikant
4    Vinay

• CID → C_name, CID → F_Name

CID    C_name    Faculty      F_phone
---------------------------------------
IS318  Database  Vishal       60192
IS301  Program   Pradipkumar  45869
3NF
SID  CID    Grade
------------------
1    IS318  A
1    IS301  B
2    IS318  A
3    IS318  B
4    IS301  A
4    IS318  B

SID  S_name
-------------
1    Anand
2    Vijay
3    Shrikant
4    Vinay

CID    C_name    FID
----------------------
IS318  Database  1
IS301  Program   2

FID  Faculty      F_phone
--------------------------
1    Vishal       60192
2    Pradipkumar  45869

Final table list in 3NF:

Grade (SID, CID, Grade)
Student (SID, S_name)
Course (CID, C_name, FID)
Faculty (FID, Faculty, F_phone)
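
The final 3NF design can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column types and the key and foreign-key declarations are inferred from the functional dependencies rather than stated in the slides. It loads the rows shown in the 3NF tables and joins the four tables back into the wide 1NF view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types and constraints below are assumptions made for this sketch.
conn.executescript("""
    CREATE TABLE Student (SID INTEGER PRIMARY KEY, S_name TEXT);
    CREATE TABLE Faculty (FID INTEGER PRIMARY KEY, Faculty TEXT, F_phone TEXT);
    CREATE TABLE Course  (CID TEXT PRIMARY KEY, C_name TEXT,
                          FID INTEGER REFERENCES Faculty(FID));
    CREATE TABLE Grade   (SID INTEGER REFERENCES Student(SID),
                          CID TEXT REFERENCES Course(CID),
                          Grade TEXT,
                          PRIMARY KEY (SID, CID));
""")

conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [(1, "Anand"), (2, "Vijay"), (3, "Shrikant"), (4, "Vinay")])
conn.executemany("INSERT INTO Faculty VALUES (?, ?, ?)",
                 [(1, "Vishal", "60192"), (2, "Pradipkumar", "45869")])
conn.executemany("INSERT INTO Course VALUES (?, ?, ?)",
                 [("IS318", "Database", 1), ("IS301", "Program", 2)])
conn.executemany("INSERT INTO Grade VALUES (?, ?, ?)",
                 [(1, "IS318", "A"), (1, "IS301", "B"), (2, "IS318", "A"),
                  (3, "IS318", "B"), (4, "IS301", "A"), (4, "IS318", "B")])

# Joining the four tables rebuilds the single wide table from the 1NF step.
rows = conn.execute("""
    SELECT g.SID, g.CID, s.S_name, c.C_name, g.Grade, f.Faculty, f.F_phone
    FROM Grade g
    JOIN Student s ON s.SID = g.SID
    JOIN Course  c ON c.CID = g.CID
    JOIN Faculty f ON f.FID = c.FID
    ORDER BY g.SID, g.CID
""").fetchall()

for row in rows:
    print(row)
```

Each fact is now stored once: changing a faculty phone number touches a single Faculty row, yet the join still returns all six rows of the 1NF table.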
