
Unit 2 - Dbms-Complete

The document discusses key concepts in relational databases including: 1. A relational database represents data as a collection of relations where each relation is a table of values. A relation schema defines the structure of a relation including its attributes. 2. Constraints such as domain constraints, key constraints, and referential integrity constraints are used to maintain consistency in the database. Domain constraints specify valid values for attributes. Key constraints require each tuple to have a unique key value. Referential integrity constraints require foreign keys to match values in the referenced relation. 3. Relational algebra provides basic operations to manipulate relations, including unary operations like select and project to retrieve subsets of tuples from a single relation.

Uploaded by

arumobileworld

DATABASE CONCEPT & ORACLE – UNIT-2 II Semester BCA [2019-2020]

UNIT II
CHAPTER 4
RELATIONAL DATA MODEL AND
RELATIONAL DATABASE CONSTRAINTS

4.1 Relational Model Concepts:


The relational model represents the database as a collection of relations. A relation is a
table of values where each row represents a collection of data values.

4.1.1 Domain: A domain D is a set of atomic values. Each value in the domain is indivisible. A
common method of specifying a domain is to specify a data type from which the data values
forming the domain are drawn.

Example:
Set of telephone numbers- Numbers
Set of student names- Character strings representing names of students
They are logical definitions of domains. A data type or format is specified for each domain.

4.1.2 Relation Schema: A relation schema R(A1,A2,….,An) is made up of a relation name R and a list
of attributes A1,A2,….,An. Each attribute Ai is the name of the role played by some domain D in
the relation schema R. D is called the domain of Ai and is denoted by dom(Ai). A relation
schema is used to describe a relation. The degree of a relation is the number of attributes n in its
relation schema. A relation (or relation state) r of the relation schema R(A1,A2,….,An), also
denoted by r(R), is a set of n-tuples r={t1,t2,…,tm}. Each n-tuple t is an ordered list of n values
t=<v1,v2,….,vn>, where each value vi, 1<=i<=n, is an element of dom(Ai) or a special null
value.

Tuple: Each row in a relation is called a tuple.


Attribute: The column header in a relation is called an attribute of a relation.

4.1.3 Characteristics of Relations:


1. Ordering of tuples in a relation: A relation is defined as a set of tuples. Mathematically,
elements of a set do not have any order. Hence the tuples in a relation do not have any
order. When they are stored physically on a disk, there is always an order, so when a
table is displayed, it is displayed in the order in which it is stored unless a different order is
explicitly requested.
2. Ordering of values within a tuple: An n-tuple is an ordered list of n values.
Hence the ordering of values in a tuple is important.
Alternative definition of a relation: A relation schema R={A1,A2,…,An} is a set of attributes and the relation state
r(R) is a finite set of mappings r={t1,t2,…,tm}, where each tuple ti is a mapping from R
to D. Here D is the union of the attribute domains, i.e.
D=dom(A1) ∪ dom(A2) ∪ …. ∪ dom(An). A tuple can then be
considered as a set of (<attribute>,<value>) pairs, where each pair maps an
attribute Ai to a value vi from dom(Ai). Under this definition the order of attributes within a tuple is not important.
3. Values and nulls in a tuple: Each value in a tuple is an atomic value. This model is
called a flat relational model. Hence composite and multi-valued attributes are not allowed. Multi-valued
attributes must be represented by a separate relation. Null values are used to
represent the values of attributes that may be unknown or may not apply to a tuple.
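
The characteristics above can be illustrated with a small Python sketch. The STUDENT attributes and values below are made up for illustration, and None is used here as a stand-in for a null value.

```python
# Hypothetical STUDENT relation; attribute names and values are illustrative only.
schema = ("ROLLNO", "NAME", "PHONE")

# A relation state is a set of tuples: order and repeated entries do not matter.
r1 = {(121, "Priya", 9876), (122, "Rama", None)}   # None stands in for null
r2 = {(122, "Rama", None), (121, "Priya", 9876), (121, "Priya", 9876)}
same_state = (r1 == r2)

# Alternative definition: a tuple is a mapping from attribute names to values,
# so the order in which the values are listed no longer matters.
def as_mapping(t):
    return dict(zip(schema, t))

t1 = as_mapping((121, "Priya", 9876))
t2 = {"PHONE": 9876, "ROLLNO": 121, "NAME": "Priya"}
same_tuple = (t1 == t2)
```

Both comparisons evaluate to True: the two sets hold the same two tuples, and the two mappings pair every attribute with the same value.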

4.2 Relational Model Constraints and Database Schemas :


Constraints are restrictions imposed on the actual values in the database state. Constraints
on the database can be divided into three categories.
• Model based constraints: These are constraints that are inherent in the data model.
• Schema based constraints: These are constraints that can be directly expressed in the
schemas of the data model, typically by specifying them in the DDL.
• Application based constraints: These are constraints that are expressed and enforced
in application programs.

[Mrs. Deepa D. Hegde-SDM College of Business Management, Mangalore] 1



4.2.1 Schema Based constraints: These constraints include


a. Domain constraint
b. Key constraints
c. Constraints on null values
d. Entity Integrity constraint
e. Referential Integrity constraint

Domain Constraint: This constraint specifies that within each tuple the value of each
attribute A must be an atomic value from the domain dom(A). The data types for domain
include standard numeric data types for integer and real values, fixed length and variable
length strings.

Key Constraints: A relation is a set of tuples. By definition all the elements in a set are
distinct, so no two tuples can have the same combination of values for all attributes.
Let SK be a subset of the attributes of R with the property that no two tuples in any relation
state r of R have the same combination of values for SK.
That is, for any two distinct tuples t1 and t2, t1[SK]≠t2[SK].
Such a set of attributes is called a super key of the relation schema R. A super key SK
specifies a uniqueness constraint: no two tuples in the state r of R can have the same value
for SK. Every relation has at least one default super key, the set of all its attributes.
Key: A key K of R is a minimal super key, i.e. removing any attribute from K leaves a set of
attributes that is no longer a super key.
Candidate Key: A relation schema can have more than one key. Each of these keys is called a
candidate key.
Primary key: The candidate key whose values are chosen to identify tuples in a relation is called
the primary key.
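
As a rough sketch, the super key and key tests can be written in Python as below. The function names and the EMPLOYEE rows are hypothetical. A key is a property of the schema, so a check on one state can only show that a set of attributes is not a key; it cannot prove that it is one for every state.

```python
def is_superkey(rows, attrs):
    """True if no two rows of this state agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key(rows, attrs):
    """True if attrs is a super key and dropping any one attribute breaks uniqueness."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, [a for a in attrs if a != dropped]) for dropped in attrs
    )

# Hypothetical EMPLOYEE state
rows = [
    {"ENO": 1, "NAME": "Rama", "DNO": 4},
    {"ENO": 2, "NAME": "Gita", "DNO": 4},
    {"ENO": 3, "NAME": "Rama", "DNO": 5},
]
eno_is_key = is_key(rows, ["ENO"])                       # minimal super key
name_is_superkey = is_superkey(rows, ["NAME"])           # "Rama" occurs twice
eno_name_is_superkey = is_superkey(rows, ["ENO", "NAME"])
eno_name_is_key = is_key(rows, ["ENO", "NAME"])          # not minimal: ENO alone suffices
```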

Constraints on null values: This is a constraint on attributes that specifies whether null values are
permitted or not. E.g. If every student tuple must have a valid non-null value for the name
attribute, then the name attribute is constrained to be not null.
Entity Integrity Constraint: It states that no primary key value can be null. This is because
primary keys are used to identify the tuples in a relation.

Referential Integrity Constraint: The referential integrity constraint is specified between
two relations and is used to maintain consistency among tuples in the two relations. This
constraint specifies that a tuple in one relation that refers to another relation must refer to an
existing tuple in that relation. The concept of foreign key is used to define referential integrity.
Foreign Key: Let R1, R2 be two relation schemas. A set of attributes FK in a relation schema
R1 is a foreign key of R1 that references a relation R2 if it satisfies the following rules:
• The attributes in FK have the same domains as the primary key attributes PK of R2. The
attributes FK are said to refer to the relation R2.
• A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some
tuple t2 in the current state r2(R2) or is null. In the former case we have t1[FK]=t2[PK], and we say that
t1 refers to the tuple t2.
Here R1 is called the referencing relation and R2 is called the referenced relation. If
these conditions hold, a referential integrity constraint from R1 to R2 is said to hold.
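
The second rule can be checked mechanically. The sketch below is only an illustration: the function name is hypothetical, the first EMP row and the DEPT rows follow the EMP/DEPT example used later in these notes, the rows E004 and E005 are made up to show a violation and a null foreign key, and None stands in for null.

```python
def dangling_references(r1, fk, r2, pk):
    """Return the tuples of r1 whose non-null FK value matches no PK value in r2."""
    pk_values = {tuple(t[a] for a in pk) for t in r2}
    bad = []
    for t in r1:
        value = tuple(t[a] for a in fk)
        if any(v is None for v in value):
            continue  # a null foreign key is allowed
        if value not in pk_values:
            bad.append(t)
    return bad

DEPT = [{"DNUM": "D001", "DNAME": "HR"}, {"DNUM": "D002", "DNAME": "SALES"}]
EMP = [
    {"EMPNO": "E001", "ENAME": "RAMA", "DNO": "D001"},
    {"EMPNO": "E004", "ENAME": "MADE-UP", "DNO": "D009"},  # hypothetical: no such department
    {"EMPNO": "E005", "ENAME": "MADE-UP", "DNO": None},    # hypothetical: null foreign key
]
bad = dangling_references(EMP, ["DNO"], DEPT, ["DNUM"])
```

Only the E004 tuple is reported, because D009 does not occur as a DNUM value in DEPT.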

4.3 Operations on Relations: Violation of Constraints

4.3.1 Insert operation: It provides a list of attribute values for a new tuple that is to be inserted
into R. Insert operation can violate the following constraints:
• Domain constraint can be violated if an attribute value does not match the specified
domain.
• Key constraint can be violated if a key value in the new tuple t already exists in another
tuple in the relation r(R).
• Entity integrity constraint can be violated if the primary key of the new tuple t is null.
• Referential integrity constraint can be violated if the value of any foreign key in t refers
to a tuple that does not exist in the referenced relation.
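
The four checks listed above can be sketched as one Python function. This is an illustration under simplifying assumptions, not how any particular DBMS implements them: domains are approximated by Python types, the function and parameter names are hypothetical, and None stands in for null.

```python
def check_insert(new, relation, domains, primary_key, foreign_keys):
    """Return the names of the constraints that inserting `new` would violate.

    domains:      attribute -> Python type (a stand-in for dom(A))
    foreign_keys: attribute -> set of primary key values in the referenced relation
    """
    violations = []
    # Domain constraint: every non-null value must belong to its domain.
    for attr, typ in domains.items():
        value = new.get(attr)
        if value is not None and not isinstance(value, typ):
            violations.append("domain")
            break
    # Entity integrity and key constraints on the primary key.
    pk = tuple(new.get(a) for a in primary_key)
    if any(v is None for v in pk):
        violations.append("entity integrity")
    elif any(tuple(t[a] for a in primary_key) == pk for t in relation):
        violations.append("key")
    # Referential integrity: a non-null foreign key must match an existing tuple.
    for attr, referenced in foreign_keys.items():
        value = new.get(attr)
        if value is not None and value not in referenced:
            violations.append("referential integrity")
            break
    return violations

emp = [{"EMPNO": "E001", "ENAME": "RAMA", "DNO": "D001"}]
domains = {"EMPNO": str, "ENAME": str, "DNO": str}
fks = {"DNO": {"D001", "D002"}}

ok = check_insert({"EMPNO": "E002", "ENAME": "GITA", "DNO": "D002"}, emp, domains, ["EMPNO"], fks)
dup = check_insert({"EMPNO": "E001", "ENAME": "X", "DNO": "D001"}, emp, domains, ["EMPNO"], fks)
```

Here `ok` is an empty list, while `dup` reports a key violation because E001 already exists.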


4.3.2 Delete Operation: This operation deletes a tuple from a relation.


• It can violate only the referential integrity constraint. This occurs when the tuple being
deleted is referenced by foreign keys from other tuples in the database.

4.3.3 Update Operation: This operation is used to modify / change values of one or more
attributes in tuple(s) in a relation R. It is necessary to specify a condition on attributes of the
relation to select the tuple(s) to be modified. Updating an attribute that is neither a primary key nor a
foreign key creates no problem. Modifying a primary key is similar to deleting a tuple and
inserting another in its place. Modifying a foreign key requires that the new value refers to an
existing tuple in the referenced relation (or is null). When updating, the DBMS checks that the new value is of the
correct data type and domain.

Assignment
Short Answers (2 marks)
1. Define primary key
2. Define candidate key
3. Define foreign key
4. Define domain of a relation
5. Define relation schema
6. Define tuple in a relation
7. Define attribute in a relation

Long answers (4 or more marks)


1. Explain Entity and referential integrity constraints.
2. Explain the characteristics of relation
3. Explain the schema based constraints on a relation.


CHAPTER 5
RELATIONAL ALGEBRA
The basic set of operations for the relational model is the relational algebra. These operations
enable a user to specify basic retrieval requests. The result of a retrieval is a new relation that is
formed from one or more relations.

5.1 Unary Relational Operations:


The operations that operate on single relation are called unary operations. The unary
operations in relational algebra are
a) Select b) Project

5.1.1 SELECT Operation: The select operation is used to select a subset of tuples from a
relation that satisfy a selection condition. It can be considered as a filter that keeps only those
tuples that satisfy a qualifying condition. The select operation can be visualized as a horizontal
partition of the relation into two sets of tuples: those that satisfy the condition are selected and
those that do not satisfy the condition are discarded. The select operation is denoted by
σ_{<select condition>}(R)
E.g. To select employee tuples whose department is 4, it can be specified as follows:
σ_{DNO=4}(EMPLOYEE)
The boolean expression specified in the selection condition is made up of a number of clauses of
the form
<attribute name> <comparison operator> <constant value>
OR
<attribute name> <comparison operator> <attribute name>
The comparison operator can be any of the elements in the set {=, <, ≤, >, ≥, ≠}.
Two or more selection conditions can be combined using boolean operators like AND, OR and
NOT.
condition1 AND condition2 is true only if both condition1 and condition2 are true, otherwise it
is false.
condition1 OR condition2 is true if at least one of the conditions is true, otherwise it is false.
NOT condition is true if condition is false, and false otherwise.
The select operation is commutative,
i.e. σ_{condition1}(σ_{condition2}(R)) = σ_{condition2}(σ_{condition1}(R))
A sequence of selects can therefore be applied in any order. We can also combine a cascade of select
operations into a single select operation with an AND condition,
i.e. σ_{condition1}(σ_{condition2}(…..(σ_{condition n}(R)))) = σ_{condition1 AND condition2 AND ….. AND condition n}(R)
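
The commutative and cascade properties of the select operation can be tried out with a short Python sketch. The EMPLOYEE rows, the SAL threshold and the function names are assumptions made for illustration.

```python
def select(relation, condition):
    """Select: keep only the tuples for which condition(t) is true."""
    return [t for t in relation if condition(t)]

# Hypothetical EMPLOYEE relation
EMPLOYEE = [
    {"NAME": "Rama", "DNO": 4, "SAL": 30000},
    {"NAME": "Gita", "DNO": 5, "SAL": 45000},
    {"NAME": "Rita", "DNO": 4, "SAL": 52000},
]

def in_dept_4(t):
    return t["DNO"] == 4

def high_paid(t):
    return t["SAL"] > 40000

# Applying the two selects in either order gives the same result ...
a = select(select(EMPLOYEE, high_paid), in_dept_4)
b = select(select(EMPLOYEE, in_dept_4), high_paid)
# ... and so does a single select with the conditions combined by AND.
c = select(EMPLOYEE, lambda t: in_dept_4(t) and high_paid(t))
```

All three results contain only the tuple for Rita.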

5.1.2 PROJECT Operation: The project operation selects certain columns from the table and
discards the other columns. It can be visualized as a vertical partitioning of the relation into two relations:
one has the required attributes and contains the result of the operation, and the other contains the
discarded columns. The project operation is denoted by
π_{<attribute list>}(R)
Here, <attribute list> is the desired list of attributes that are to be projected from the relation R.

For example, if the name and salary of each employee are to be listed, it can be written as follows:
π_{NAME,SAL}(EMPLOYEE)
The result of a project operation is a set of tuples, so duplicate tuples are eliminated. Hence the
number of tuples in a relation resulting from a project operation is always less than or equal to
the total number of tuples in R.
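
A minimal Python sketch of the project operation is given below. The EMPLOYEE rows are made up, and the function name is hypothetical. It shows why the result can have fewer tuples than R: two employees share DNO 4, so projecting on DNO alone leaves only two tuples.

```python
def project(relation, attrs):
    """Project: keep only the listed attributes, then drop duplicate tuples."""
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in result:   # a relation is a set, so duplicates are removed
            result.append(row)
    return result

# Hypothetical EMPLOYEE relation
EMPLOYEE = [
    {"NAME": "Rama", "DNO": 4, "SAL": 30000},
    {"NAME": "Gita", "DNO": 5, "SAL": 45000},
    {"NAME": "Rita", "DNO": 4, "SAL": 52000},
]

name_sal = project(EMPLOYEE, ["NAME", "SAL"])   # three tuples, no duplicates
dnos = project(EMPLOYEE, ["DNO"])               # duplicates of DNO 4 collapse
```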

5.2 Set Theory Operations:


Standard mathematical operations on sets can also be applied to relational algebra. The
three operations in this category are
a) UNION b) INTERSECTION c)MINUS


Union Compatibility:
These are binary operations and are applied on two relations, but the relations must have the same
type of tuples. Two relations R(A1,A2,….,An) and S(B1,B2,…,Bn) are said to be union
compatible if they have the same degree n and if dom(Ai)=dom(Bi) for 1 ≤ i ≤ n,
i.e. the two relations should have the same number of attributes and each corresponding pair of
attributes should have the same domain. This condition is known as union compatibility.
Example: Two union compatible relations

STUDENT
FNAME LNAME
Suresh Rao
Ramesh Krishna
Ravi Reddy
Vipul Kumar
Vinay Kumar
Sachin Kumar

INSTRUCTOR
FNAME LNAME
Sachin Kumar
Rohit Sharma
Ravi Reddy

Let R and S be two union compatible relations. The given set of binary operations can be defined
as follows:
5.2.1 UNION – The result of this operation is denoted by R ∪ S. The result is a relation that
includes all tuples that are in R or in S or in both R and S. Duplicate tuples are eliminated.

Example: STUDENT ∪ INSTRUCTOR

FNAME LNAME
Suresh Rao
Ramesh Krishna
Ravi Reddy
Vipul Kumar
Vinay Kumar
Sachin Kumar
Rohit Sharma

5.2.2 INTERSECTION – The result of this operation is denoted by R ∩ S. It results in a relation
that contains all tuples that are present in both R and S.

Example: STUDENT ∩ INSTRUCTOR


FNAME LNAME
Ravi Reddy
Sachin Kumar

5.2.3 SET DIFFERENCE – The result of this operation is denoted by R-S. It is a relation that
includes all tuples that are in R but not in S.
Examples:

a) STUDENT-INSTRUCTOR
FNAME LNAME
Suresh Rao
Ramesh Krishna
Vipul Kumar
Vinay Kumar

b) INSTRUCTOR-STUDENT
FNAME LNAME
Rohit Sharma
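
Since union compatible relations are sets of tuples of the same type, the three operations map directly onto Python's built-in set operators. The sketch below uses the STUDENT and INSTRUCTOR tuples from the example; the variable names are only illustrative.

```python
STUDENT = {
    ("Suresh", "Rao"), ("Ramesh", "Krishna"), ("Ravi", "Reddy"),
    ("Vipul", "Kumar"), ("Vinay", "Kumar"), ("Sachin", "Kumar"),
}
INSTRUCTOR = {("Sachin", "Kumar"), ("Rohit", "Sharma"), ("Ravi", "Reddy")}

union = STUDENT | INSTRUCTOR              # duplicates appear only once
intersection = STUDENT & INSTRUCTOR       # tuples present in both
student_only = STUDENT - INSTRUCTOR       # STUDENT - INSTRUCTOR
instructor_only = INSTRUCTOR - STUDENT    # INSTRUCTOR - STUDENT
```

The results have 7, 2, 4 and 1 tuples respectively, matching the tables above. Note that set difference is not commutative.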


5.2.4 Cartesian Product:


It is also called cross product or cross join and is denoted by ×. This is a binary set
operation. It is used to combine tuples from two relations in a combinatorial fashion. If a relation R has n tuples
and a relation S has m tuples, the total number of tuples in R × S is n*m. It creates tuples with the
combined attributes of the two relations. We can select only related tuples from the two relations by
specifying appropriate selection conditions.

5.3 Binary Relational Operators:

5.3.1 Join Operation:


The join operation, denoted by ⋈, is used to combine related tuples from two relations into a
single tuple. This operation is used to process relationships among relations. The general form of the
join operation on two relations R(A1,A2,…,An) and S(B1,B2,B3…,Bm) is R ⋈_{<join condition>} S.
The result of the above operation is a relation Q with n+m attributes
Q(A1,A2…,An,B1,B2,….,Bm). Q has one tuple for each combination of tuples, one from R and
one from S, whenever the combination satisfies the join condition. The main difference between
Cartesian product and join is that in Cartesian product all combinations of tuples are included,
whereas in a join the result contains only the combinations that satisfy the join condition. The different
types of join are
a) Equi Join b) Natural Join c) Theta Join

a) Equi-Join – When the join condition involves only equality comparisons on the attributes of the
2 tables, the join is called an equi-join. The result of an equi-join always has
one or more pairs of attributes that have identical values in every tuple.
EMP
EMPNO ENAME DNO
E001 RAMA D001
E002 GITA D002
E003 RITA D001

DEPT
DNUM DNAME
D001 HR
D002 SALES

EMP ⋈_{DNO=DNUM} DEPT
EMPNO ENAME DNO DNUM DNAME
E001 RAMA D001 D001 HR
E002 GITA D002 D002 SALES
E003 RITA D001 D001 HR

b) Natural Join- An equi-join repeats the join attribute in every tuple of the result. A natural join
removes this duplicate attribute.
The definition of natural join requires that the two join attributes have the same name in both
the relations. In case the two attributes do not have the same name, renaming is done first.
EMP
EMPNO ENAME DNAME
E001 RAMA HR
E002 GITA SALES
E003 RITA HR


DEPT
DNAME DLOCATION
HR CHENNAI
SALES KANNUR

EMP⋈ DEPT
EMPNO ENAME DNAME DLOCATION
E001 RAMA HR CHENNAI
E002 GITA SALES KANNUR
E003 RITA HR CHENNAI

c) Theta Join – When the join condition may use any of the comparison operators on the attributes,
i.e. {=,<,>,<=,>=,!=}, the join is called a theta join. An equi-join is the special case in which only = is used.
CAR
CarModel CarPrice
CarA 20,000
CarB 30,000
CarC 50,000

BOAT
BoatModel BoatPrice
Boat1 10,000
Boat2 40,000
Boat3 60,000

CAR ⋈_{CarPrice>=BoatPrice} BOAT
CarModel CarPrice BoatModel BoatPrice
CarA 20,000 Boat1 10,000
CarB 30,000 Boat1 10,000
CarC 50,000 Boat1 10,000
CarC 50,000 Boat2 40,000
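
A join can be viewed as a Cartesian product followed by a selection. The Python sketch below builds the theta join of the CAR and BOAT example in that way. The function name is hypothetical, and the sketch assumes the two relations have no attribute names in common.

```python
from itertools import product

def theta_join(r, s, condition):
    """Cartesian product of r and s, keeping only the pairs that satisfy condition."""
    return [{**t, **u} for t, u in product(r, s) if condition(t, u)]

CAR = [
    {"CarModel": "CarA", "CarPrice": 20000},
    {"CarModel": "CarB", "CarPrice": 30000},
    {"CarModel": "CarC", "CarPrice": 50000},
]
BOAT = [
    {"BoatModel": "Boat1", "BoatPrice": 10000},
    {"BoatModel": "Boat2", "BoatPrice": 40000},
    {"BoatModel": "Boat3", "BoatPrice": 60000},
]

all_pairs = list(product(CAR, BOAT))   # the full Cartesian product: 3 * 3 pairs
result = theta_join(CAR, BOAT, lambda c, b: c["CarPrice"] >= b["BoatPrice"])
pairs = [(t["CarModel"], t["BoatModel"]) for t in result]
```

Of the 9 combinations in the Cartesian product, only the 4 shown in the table above satisfy the join condition. Replacing the condition with an equality test gives an equi-join.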

Outer Join Operations:


The join operations where only matching tuples are kept in the result are called inner
joins. In inner joins, tuples without a match and tuples with null values in the join attributes are eliminated. This
amounts to a loss of information if the result of the join is supposed to contain all the information. A
set of operations called outer joins can be used when we need all the tuples in R, or all those in S,
or all those in both R and S, regardless of whether they have matching tuples in the other
relation. This satisfies the need of queries in which tuples from both the tables are to be
combined by matching corresponding rows but without losing any tuples for lack of matching
values. The different types of outer joins are
a. Left Outer Join
b. Right Outer Join
c. Full Outer Join

Left Outer Join (⟕): In R ⟕ S, this operation keeps all the tuples of the first or left relation, i.e. R.
If no match is found in S, the attributes of S in the join result are padded with null values.


Employee
Name EmpId DeptName
Hari 3415 Finance
Samit 2241 Sales
Geetha 3401 Finance
Haritha 2202 Sales
Tom 1123 Executive

Dept
DeptName Manager
Sales Haritha
Production Charles

Employee ⟕ Dept
Name EmpId DeptName Manager
Hari 3415 Finance NULL
Samit 2241 Sales Haritha
Geetha 3401 Finance NULL
Haritha 2202 Sales Haritha
Tom 1123 Executive NULL

Right Outer Join (⟖): In R ⟖ S, this operation keeps all the tuples of the second or right relation,
i.e. S. If no match is found in R, the attributes of R in the join result are padded with null values.
Employee
Name EmpId DeptName
Hari 3415 Finance
Samit 2241 Sales
Geetha 3401 Finance
Haritha 2202 Sales
Tom 1123 Executive

Dept
DeptName Manager
Sales Haritha
Production Charles

Employee ⟖ Dept
Name EmpId DeptName Manager
Samit 2241 Sales Haritha
Haritha 2202 Sales Haritha
NULL NULL Production Charles

Full Outer Join (⟗): This operation keeps all the tuples in both the left and right relations.
When no match is found for a tuple in the other relation, the corresponding attributes are padded with null
values as needed.
Employee
Name EmpId DeptName
Hari 3415 Finance
Samit 2241 Sales
Geetha 3401 Finance
Haritha 2202 Sales
Tom 1123 Executive
Dept
DeptName Manager
Sales Haritha
Production Charles


Employee ⟗ Dept
Name EmpId DeptName Manager
Hari 3415 Finance NULL
Samit 2241 Sales Haritha
Geetha 3401 Finance NULL
Haritha 2202 Sales Haritha
Tom 1123 Executive NULL
NULL NULL Production Charles
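
The three outer joins of the Employee and Dept example can be reproduced with the Python sketch below. It is a simplified illustration: the function and its parameters are hypothetical, the join is a natural join on one shared attribute, both inputs are assumed to be non-empty with no nulls in the join attribute, and None stands in for NULL.

```python
def outer_join(r, s, attr, keep_left=True, keep_right=True):
    """Natural join of r and s on attr; unmatched tuples of the kept side(s)
    are padded with None. Assumes non-empty inputs and no None in attr."""
    r_attrs = list(r[0].keys())
    s_attrs = list(s[0].keys())
    result = []
    matched_s = set()
    for t in r:
        partners = [i for i, u in enumerate(s) if u[attr] == t[attr]]
        for i in partners:
            result.append({**t, **s[i]})
            matched_s.add(i)
        if not partners and keep_left:
            result.append({**{a: None for a in s_attrs}, **t})
    if keep_right:
        for i, u in enumerate(s):
            if i not in matched_s:
                result.append({**{a: None for a in r_attrs}, **u})
    return result

Employee = [
    {"Name": "Hari", "EmpId": 3415, "DeptName": "Finance"},
    {"Name": "Samit", "EmpId": 2241, "DeptName": "Sales"},
    {"Name": "Geetha", "EmpId": 3401, "DeptName": "Finance"},
    {"Name": "Haritha", "EmpId": 2202, "DeptName": "Sales"},
    {"Name": "Tom", "EmpId": 1123, "DeptName": "Executive"},
]
Dept = [
    {"DeptName": "Sales", "Manager": "Haritha"},
    {"DeptName": "Production", "Manager": "Charles"},
]

left = outer_join(Employee, Dept, "DeptName", keep_right=False)
right = outer_join(Employee, Dept, "DeptName", keep_left=False)
full = outer_join(Employee, Dept, "DeptName")
```

The left, right and full outer joins return 5, 3 and 6 tuples, as in the tables above.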
5.3.2 Division Operator:
The division operation is denoted by ÷ and is useful for a special kind of query that
sometimes occurs in database operations. The division operator is applied to two relations R(Z) ÷
S(X), where X ⊆ Z. Let Y = Z − X be the attributes of R that are not in S; the result T has the
attributes Y. For a tuple t to appear in the result T of the division, the values in t must
appear in R in combination with every tuple in S.
Example: T = R ÷ S

R
Empno Pno
101 1
102 1
103 1
104 1
101 2
103 2
102 3
103 3
104 3
101 4
102 4
103 4

S
Pno
1
4

T
Empno
101
102
103

T contains the Empno values that appear in R with both Pno 1 and Pno 4.
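
The division example can be checked with a few lines of Python. The function name is hypothetical, R is held as a set of (Empno, Pno) pairs and S as the set of Pno values 1 and 4 from the example.

```python
def divide(r, s):
    """r: set of (empno, pno) pairs; s: set of pno values.
    Returns the empno values that are paired in r with every pno in s."""
    candidates = {empno for empno, _ in r}
    return {e for e in candidates if all((e, p) in r for p in s)}

R = {
    (101, 1), (102, 1), (103, 1), (104, 1),
    (101, 2), (103, 2),
    (102, 3), (103, 3), (104, 3),
    (101, 4), (102, 4), (103, 4),
}
S = {1, 4}
T = divide(R, S)
```

Employee 104 is left out of T because the pair (104, 4) does not occur in R.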

Summary of Relational Algebra Operations:


5.4 Additional Relational Operations:


5.4.1 Aggregate Functions and Grouping
Functions that are applied to a collection of values from the database are called aggregate
functions. Example: Retrieving the average salary or total salary of employees. The most common
functions applied to collections of numeric values are SUM, AVERAGE, MINIMUM,
MAXIMUM and COUNT.
Aggregate functions are specified using the operator ℱ (script F) in the following form:
<grouping attributes> ℱ <function list> (R)
where <grouping attributes> is a list of attributes of the relation R and
<function list> is a list of (<function> <attribute>) pairs.

Example: To retrieve the department number, the number of employees in the department and their
average salary, we can write it as
ρ_{R(DNO,NO-OF-EMP,AVG-SAL)}(DNO ℱ COUNT ENO, AVERAGE SAL (EMPLOYEE))
Here ρ is the rename operation, which names the result relation R and its attributes.
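
The grouping in this example can be sketched in Python as follows. The EMPLOYEE rows are made up for illustration and the function name is hypothetical; the function groups the tuples by DNO and then computes COUNT and AVERAGE for each group.

```python
from collections import defaultdict

def count_and_average_by_dno(employees):
    """Group employees by DNO and return DNO -> (number of employees, average salary)."""
    groups = defaultdict(list)
    for e in employees:
        groups[e["DNO"]].append(e["SAL"])
    return {dno: (len(sals), sum(sals) / len(sals)) for dno, sals in groups.items()}

# Hypothetical EMPLOYEE relation
EMPLOYEE = [
    {"ENO": 1, "DNO": 4, "SAL": 30000},
    {"ENO": 2, "DNO": 5, "SAL": 45000},
    {"ENO": 3, "DNO": 4, "SAL": 50000},
]
result = count_and_average_by_dno(EMPLOYEE)
```

For this data, department 4 has 2 employees with an average salary of 40000.0 and department 5 has 1 employee with an average salary of 45000.0.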

Assignment
Short Answers (2 marks)
1. What is the purpose of SELECT operation in relational algebra?
2. What is the purpose of PROJECT operation in relational algebra?
3. List the set theory operations in relational algebra.
4. What is union compatibility?
5. What do you mean by outer union operation?

Long answers (4 or more marks)


1. Explain SELECT operation in relational algebra.
2. Explain PROJECT operation in relational algebra.
3. Explain set theory operations in relational algebra.
4. Explain the various types of join operations in relational algebra.


CHAPTER 6
FUNCTIONAL DEPENDENCIES AND NORMALIZATION
6.1 Functional Dependency:
Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y
in R, written X→Y, if and only if each value of X is associated with exactly one value of Y. Here X is called the
determinant and Y is called the dependent.
Example: Student table
Rollno Name Sub-code Subject Marks
121 Priya BCA01 DBMS 59
121 Priya BCA02 OOPS 70
122 Rama BCA01 DBMS 80
122 Rama BCA02 OOPS 65
123 Payal BCA03 NETWORKS 70

Here, rollno does not uniquely identify rows in the table, therefore it cannot be a primary key.
Similarly, sub-code does not uniquely identify rows in the table. But a combination of rollno and
sub-code uniquely identifies a row in the table. Hence (rollno,sub-code) together form the
primary key of the table. In terms of functional dependencies, the table shows rollno→name,
sub-code→subject and (rollno,sub-code)→marks.
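
Whether a functional dependency holds in a given table can be tested mechanically. The Python sketch below uses the Student table from the example; the function name is hypothetical. Such a test can only show that a dependency is violated in the data at hand. That it holds for every possible state is a design decision.

```python
def holds(rows, X, Y):
    """True if X -> Y holds in this table: equal X values always have equal Y values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False
    return True

STUDENT = [
    {"Rollno": 121, "Name": "Priya", "Sub-code": "BCA01", "Subject": "DBMS", "Marks": 59},
    {"Rollno": 121, "Name": "Priya", "Sub-code": "BCA02", "Subject": "OOPS", "Marks": 70},
    {"Rollno": 122, "Name": "Rama", "Sub-code": "BCA01", "Subject": "DBMS", "Marks": 80},
    {"Rollno": 122, "Name": "Rama", "Sub-code": "BCA02", "Subject": "OOPS", "Marks": 65},
    {"Rollno": 123, "Name": "Payal", "Sub-code": "BCA03", "Subject": "NETWORKS", "Marks": 70},
]

rollno_name = holds(STUDENT, ["Rollno"], ["Name"])                 # True
subcode_subject = holds(STUDENT, ["Sub-code"], ["Subject"])        # True
rollno_marks = holds(STUDENT, ["Rollno"], ["Marks"])               # False: 121 has 59 and 70
key_marks = holds(STUDENT, ["Rollno", "Sub-code"], ["Marks"])      # True
```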

6.2 Normal Forms based on Primary Keys


Normalization: Normalization is a technique used to reduce redundancy in tables. It is a formal
process for deciding which attributes should be grouped together in a relation. It serves as a tool
for validating and improving logical design, so that logical design avoids unnecessary
duplication of data i.e. it eliminates redundancy and promotes integrity.

Normal Forms:
The different forms of normalization that can be applied to relations are as follows:
• First Normal Form 1NF
• Second Normal Form 2NF
• Third Normal Form 3NF
• Boyce-Codd Normal Form BCNF
First Normal Form (1NF)
A relation R is said to be in 1NF if every attribute of R takes only single atomic values. In
order to transform un-normalized table to 1NF we identify and remove repeating groups within a
table.
Example
DEPT
Deptno Deptname DeptLoc
D001 Accounts Chennai
D002 R&D Delhi,Bangalore

The above table is transformed to 1NF by splitting it into the following two tables:

Deptno Deptname
D001 Accounts
D002 R&D

Deptno DeptLoc
D001 Chennai
D002 Delhi
D002 Bangalore
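
The same transformation can be sketched in Python. The un-normalized DEPT table is held with a list of locations per department, and the variable names are only illustrative.

```python
# Un-normalized DEPT: the DeptLoc column holds a list of values.
dept = [
    ("D001", "Accounts", ["Chennai"]),
    ("D002", "R&D", ["Delhi", "Bangalore"]),
]

# Table 1: one row per department, without the repeating group.
dept_names = [(deptno, name) for deptno, name, _ in dept]

# Table 2: one row per (department, location) pair, so every value is atomic.
dept_locs = [(deptno, loc) for deptno, _, locs in dept for loc in locs]
```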
6.3 Second Normal Form (2NF):
Second normal form is based on functional dependency. A relation is in 2NF if every non-prime
attribute A in R is fully functionally dependent on the primary key of R.


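Example:
EMP-PROJ
ECODE P-NUMBER HOURS ENAME PROJ-NAME PLOCATION

With (ECODE, P-NUMBER) as the primary key, HOURS depends on the whole key, but ENAME depends
only on ECODE, and PROJ-NAME and PLOCATION depend only on P-NUMBER. Hence EMP-PROJ is not
in 2NF. It is decomposed into 2NF as follows:

E1
ECODE P-NUMBER HOURS

E2
ECODE ENAME

E3
P-NUMBER PROJ-NAME PLOCATION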

6.4 Third Normal Form (3NF):

Transitive Dependency:
A functional dependency X→Z is a transitive dependency if there is a set of attributes Y in R,
which is neither a candidate key nor a subset of any key of R, such that X→Y and Y→Z both hold.

Example:
ECODE ENAME DOB DEPTNO DNAME DMNGR
The above table has transitive dependency. Here, ecode→deptno and deptno→dmngr. Hence,
ecode→dmngr.

Third Normal Form (3NF):


Third Normal form is based on transitive dependency. A relation R is said to be in 3NF if every
non-prime attribute of R satisfies both of the following conditions:
a) It is fully functionally dependent on every key of R
b) It is non-transitively dependent on every key of R

Example:
EMPDEPT
ECODE ENAME DOB ADDR DEPTNO DNAME DMNGR

3NF
EMP
ECODE ENAME DOB ADDR DEPTNO

DEPT
DEPTNO DNAME DMNGR
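
The decomposition can be checked on sample data: projecting EMPDEPT onto the two new schemas and joining them again on DEPTNO should give back the original tuples. The Python sketch below uses made-up rows and omits DOB and ADDR for brevity; the helper names are hypothetical.

```python
def project(rows, attrs):
    """Keep only attrs and drop duplicate tuples."""
    out = []
    for r in rows:
        p = {a: r[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out

def natural_join(r, s):
    """Join on all attribute names the two relations share (assumes non-empty inputs)."""
    common = [a for a in r[0] if a in s[0]]
    return [{**t, **u} for t in r for u in s if all(t[a] == u[a] for a in common)]

# Hypothetical EMPDEPT rows (DOB and ADDR left out)
EMPDEPT = [
    {"ECODE": "E1", "ENAME": "Rama", "DEPTNO": "D1", "DNAME": "HR", "DMNGR": "Hari"},
    {"ECODE": "E2", "ENAME": "Gita", "DEPTNO": "D2", "DNAME": "Sales", "DMNGR": "Tom"},
    {"ECODE": "E3", "ENAME": "Rita", "DEPTNO": "D1", "DNAME": "HR", "DMNGR": "Hari"},
]

EMP = project(EMPDEPT, ["ECODE", "ENAME", "DEPTNO"])
DEPT = project(EMPDEPT, ["DEPTNO", "DNAME", "DMNGR"])   # HR is now stored only once
rejoined = natural_join(EMP, DEPT)
```

DEPT holds each department once, which removes the repeated DNAME and DMNGR values, and the join of EMP and DEPT reproduces EMPDEPT.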

6.5 Boyce-Codd Normal Form (BCNF)


A relation is said to be in BCNF if and only if every determinant is a candidate key. The
difference between 3NF and BCNF is that for a functional dependency X→Y, 3NF allows
this dependency if Y is a prime attribute (part of some candidate key) even when X is not a candidate key; whereas in BCNF X
must be a candidate key. Hence, BCNF is stronger than 3NF.
Example:
PRODUCT(PROD_NO,PNAME,PRICE)
Here, PROD_NO →PNAME,PRICE
In the above example, PROD_NO is the candidate key. Hence the above relation is in BCNF

Assignment
Short Answers (2 marks)
1. Define normalization
2. Define 1NF
3. Define functional dependency
4. Define transitive dependency

Long answers (4 or more marks)


1. Explain 1NF with an example
2. Explain 2NF with an example
3. Explain 3NF with an example
4. Explain BCNF with an example


CHAPTER 7
DISK STORAGE
7.1 Introduction:
The collection of data that makes up a computerized database must be physically stored
on a computer storage medium. The DBMS can then retrieve, update and process data as and
when needed. The two main categories of computer storage media are
a. Primary Storage – This category includes storage media that can be directly accessed by
CPU. It provides fast access to data but is of limited storage. Example: RAM, ROM
b. Secondary Storage – Data stored on these devices cannot be directly accessed by the
CPU. It must first be copied into primary storage. These devices have larger
capacity, cost less and provide slower access to data. Example: Magnetic disks, optical
disks and tapes

7.2 Secondary Storage Devices:

7.2.1 Hardware Description of Disk Devices:


Magnetic disks are used to store large amounts of data. The most basic unit of data on the
disk is a single bit of information. By magnetizing an area on the disk one can represent a bit
value of either 0 or 1. To code information, bits are grouped into bytes. The capacity of the
disk is the number of bytes that can be stored on it.
Disks are all made up of magnetic material shaped as thin circular disks and protected by
a plastic or acrylic cover. A disk is single sided if it can store information only on one surface
and is double sided if information can be stored on both surfaces. To increase the storage
capacity, disks are assembled into a disk pack, which includes many disks.
Information is stored on disk surfaces in concentric circles of small width, each having a
distinct diameter. Each circle is called a track. For disk packs, the set of tracks with the same diameter on the
various surfaces is called a cylinder. Each track is divided into smaller blocks called sectors. The
division of tracks into sectors is hard coded and cannot be changed. The division of tracks into
equal sized disk blocks is set by the operating system during disk formatting. A disk is a random
access addressable device. Transfer of data between main memory and disk takes place in units of
disk blocks. The physical address of a block is a combination of cylinder number, track number
and block number. A buffer is a contiguous reserved area in main storage that
holds one block. For a read operation, the block of data from the disk is copied into the buffer. For a
write operation, the block of data from the buffer is copied onto the disk.
The actual hardware mechanism that reads or writes a block is the read/write head, which
is a part of the disk drive. Read/write heads are attached to a mechanical arm. All arms are
connected to an actuator attached to an electric motor, which moves the read/write heads and
positions them over the required tracks. Some disk units have as many read/write heads as there
are tracks on each disk. These are called fixed head disks. Disk units with an actuator are called
movable head disks. The time required for the disk controller to mechanically position the
read/write head on the correct track is called seek time. Rotational delay (rotational latency) is the
time required for the beginning of the desired block to rotate into position under the read/write
head. Block transfer time is the time needed to transfer the data between the buffer and the disk.
The total time to access a block is the sum of these three times.
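
As a worked example, the time to access one block can be estimated by adding seek time, rotational delay and block transfer time. The Python sketch below uses assumed figures (8 ms seek time, 7200 rpm, a 4096-byte block and a transfer rate of 100,000,000 bytes per second), none of which come from these notes. It also uses the common estimate that the average rotational delay is half of one revolution.

```python
def avg_block_access_ms(seek_ms, rpm, block_bytes, transfer_rate_bytes_per_s):
    """Estimated average time in milliseconds to read or write one block."""
    rotation_ms = 60_000 / rpm                 # time for one full revolution
    rotational_delay_ms = rotation_ms / 2      # on average, half a revolution
    transfer_ms = block_bytes / transfer_rate_bytes_per_s * 1000
    return seek_ms + rotational_delay_ms + transfer_ms

# All figures below are assumptions chosen for illustration.
total_ms = avg_block_access_ms(
    seek_ms=8, rpm=7200, block_bytes=4096, transfer_rate_bytes_per_s=100_000_000
)
```

With these figures the rotational delay is about 4.17 ms and the transfer time about 0.04 ms, giving roughly 12.2 ms in total. The mechanical delays (seek and rotation) dominate the actual transfer.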


7.2.2 Magnetic Tapes:


Disks are random access storage devices because an arbitrary disk block may be accessed
at random. Magnetic tapes are sequential access devices, i.e. to access the nth block on tape, we
must first scan the preceding n-1 blocks. A tape drive is used to read data from or to write
data to a tape reel. A read/write head is used to read/write data on the tape. The main characteristic
of tape is that data blocks can be accessed only in sequential order. For this reason access to data
can be slow, and hence tapes are not used for online applications. Tapes are usually used for backing up
the database.
One reason for backup is to keep copies of disk files in case the data is lost because of a
disk crash. Hence, disk files are periodically copied onto tape. Database files that are seldom
used or are outdated, but are required for historical record keeping, can be archived on magnetic
tapes.

7.3 Buffering of Blocks:


When several blocks need to be transferred from disk to main memory and all block
addresses are known, several buffers can be reserved in the main memory to speed up the data
transfer. While one buffer is being read or written, the CPU can process data in the other buffer. This is
possible because an independent disk I/O controller transfers data blocks between memory and
disk independently of, and in parallel with, CPU processing. Buffering is most useful when
processes can run concurrently in a parallel fashion, either because a separate disk I/O processor is
available or because multiple CPU processors exist.

Double Buffering: The CPU can start processing a block once its transfer to main memory is
complete. At the same time, the disk I/O processor can read and transfer the next block
into a different buffer. The same technique can be used to write a continuous stream of blocks from
memory to disk. Double buffering permits continuous reading or writing of consecutive disk blocks,
thus eliminating seek time and rotational delay for all but the first block transfer.
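
The overlap between I/O and processing can be sketched as follows. This is only an illustration: a single worker thread stands in for the disk I/O processor, an in-memory list stands in for the disk, and the names read_block and process_all are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def read_block(disk, i):
    """Stand-in for the disk I/O processor copying block i into a buffer."""
    return list(disk[i])


def process_all(disk, process):
    """Process every block while the next block is fetched in parallel."""
    results = []
    if not disk:
        return results
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_block, disk, 0)      # start filling the first buffer
        for i in range(len(disk)):
            current = pending.result()                # wait until this buffer is full
            if i + 1 < len(disk):
                # Start filling the other buffer before processing this one.
                pending = io.submit(read_block, disk, i + 1)
            results.append(process(current))          # CPU works on the full buffer
    return results


blocks = [[1, 2], [3, 4], [5]]
totals = process_all(blocks, sum)
print(totals)
```

At any moment only two buffers are in use: the one being processed and the one being filled.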

7.4 Placing File Records on Disk


7.4.1 Files and Records:
Data are usually stored in the form of records. Each record is a collection of related data
values. A collection of field names and their corresponding data types constitutes a record type or
record format definition. A file is a sequence of records. A file may contain two types of records.
a. Fixed Length Records: If every record in the file has the same size, the file is said to be
made up of fixed length records.
b. Variable Length Records: If different records in the file have different sizes, the file is
said to be made up of variable length records.

A file may have variable length records for the following reasons:
• The file records are of the same record type, but one or more fields are of varying size.
• The file records are of the same record type, but one or more fields may have multiple values
in individual records. Such fields are called repeating fields.
• The file records are of the same record type, but one or more fields are optional.
• The file contains records of different record types and hence of varying size.

For variable-length fields, each record has a value for each field, but we do not know the
exact length of some field values. To determine the bytes within a particular record that represent each
field, we can use special characters (such as $, % or ?) which do not appear in any field value to
terminate variable-length fields. These characters are called separator characters.
When fields are optional, the values in a record can be stored as <field name, field value> pairs. Separators are used
to separate a field name from its field value, to separate one field from the next, and to delimit repeating fields.
We can also assign a short field type code (say, an integer) to each field and store the
record as a sequence of <field-type, field value> pairs. A repeating field needs one separator character
to separate the repeating values of the field and another separator to indicate the termination of
the field.
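
A record stored as <field name, field value> pairs can be sketched as below. The particular separator characters, and the use of a terminator at the end of the record, are assumptions made for this illustration; any characters that never occur in a field value would do.

```python
FIELD_SEP = "%"    # assumed: separates one <name, value> pair from the next
NAME_SEP = "="     # assumed: separates a field name from its value
RECORD_END = "$"   # assumed: terminates the record


def encode_record(fields):
    """Serialize a dict as name=value pairs; fields set to None are left out."""
    pairs = [f"{name}{NAME_SEP}{value}"
             for name, value in fields.items() if value is not None]
    return FIELD_SEP.join(pairs) + RECORD_END


def decode_record(text):
    """Rebuild the dict of stored fields from an encoded record."""
    if not text.endswith(RECORD_END):
        raise ValueError("record is missing its terminator")
    body = text[:-len(RECORD_END)]
    fields = {}
    for pair in (body.split(FIELD_SEP) if body else []):
        name, _, value = pair.partition(NAME_SEP)
        fields[name] = value
    return fields


encoded = encode_record({"Name": "Smith", "Dept": None, "Salary": "40000"})
decoded = decode_record(encoded)
print(encoded, decoded)
```

Because the optional Dept field has no value, it takes no space in the stored record, which is why such records vary in length.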

7.4.2 Record blocking:


Records in a file must be allocated to disk blocks because the block is the unit of transfer
between disk and memory. When the block size is larger than the record size, each block will
contain several records. Suppose the block size is B bytes. For a file of fixed length records of
size R bytes with B >= R, we can fit
bfr = ⌊B/R⌋ records per block. Here ⌊x⌋ is the floor function, which rounds a number x down to
an integer. The value of bfr is called the blocking factor for the file. R may not divide B exactly,
so we have some unused space in each block that is equal to B - (bfr * R) bytes. To utilize this
unused space, we can store part of a record on one block and the rest on another. A pointer at the
end of the first block points to the block that contains the rest of the record, in case the blocks are not
consecutive. This organization is called spanned because records can span more than one block.
When a record is larger than a block, we must use the spanned organization. Variable length records
usually use the spanned organization. When records are not allowed to cross block boundaries, the
organization is unspanned. This is generally used with fixed length records.
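
The blocking-factor arithmetic can be checked with a short calculation. This sketch assumes fixed length records in an unspanned organization; the function names and the sample sizes (B = 512, R = 100, 1000 records) are illustrative only.

```python
import math


def blocking_factor(B, R):
    """bfr = floor(B / R): whole records that fit in one block."""
    if R > B:
        raise ValueError("record is larger than a block; spanning is required")
    return B // R


def unused_bytes_per_block(B, R):
    """Space left over in each block: B - (bfr * R)."""
    return B - blocking_factor(B, R) * R


def blocks_needed(num_records, B, R):
    """Blocks required for the file when records may not cross blocks."""
    return math.ceil(num_records / blocking_factor(B, R))


bfr = blocking_factor(512, 100)
waste = unused_bytes_per_block(512, 100)
blocks = blocks_needed(1000, 512, 100)
print(bfr, waste, blocks)
```

For these sample sizes each block holds 5 records and leaves 12 bytes unused, so the file needs 200 blocks.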

7.4.3 Allocating File Blocks on disks:


There are several techniques for allocating the blocks of a file on disk. They are
a. Contiguous allocation: The file blocks are allocated to consecutive disk blocks. Reading
the whole file is fast and easy, but expanding the file is difficult.
b. Linked allocation: Each file block contains a pointer to the next file block. It is
easy to expand the file, but reading the whole file becomes slow.
A combination of the above two approaches allocates clusters of consecutive disk blocks, and
the clusters are linked. Clusters are also called file segments or extents. Another method is
indexed allocation, where one or more index blocks contain pointers to the actual file blocks.

7.4.4 File Headers:


A file header or file descriptor contains information about a file that is needed by the
system programs that access the file records. It includes information to determine the disk addresses of
the file blocks, as well as record format descriptions, which may include field lengths, the order of
the fields and field type codes.

7.5 Operation on Files:


The two most common types of operations on files are
1. Retrieval operations – do not change any data but locate certain records so that they can be
processed by the user.
2. Update operations – change the file by adding, deleting or
modifying records.
In either case, we have to select one or more records based on a selection condition or
filtering condition. The operations a DBMS performs to process user requests
are
• Open – prepares the file for reading or writing
• Reset – sets the file pointer of an open file to the beginning of the file
• Find/Locate – searches for the first record that satisfies the search condition and
transfers the block containing that record into a main memory buffer
• Read/Get – copies the current record from the buffer to a program variable in the user
program
• FindNext – searches for the next record in the file that satisfies the search
condition
• Delete – deletes the current record and updates the file on disk to reflect the
deletion
• Modify – modifies some field values of the current record and updates the file on
disk
• Insert – inserts a new record in the file by locating the block where the record is to
be inserted
• Close – completes file access by releasing the buffers.
All the above operations except Open and Close are called record-at-a-time operations because
each applies to a single record.
The following are set-at-a-time operations, which apply to a set of records:
• Find All – locates all records that satisfy a given search condition
• Find n – searches for the first record that satisfies a given search condition and then
continues to locate the next n-1 records that satisfy it
• Find Ordered – retrieves all the records in the file in some specified order
• Reorganize – starts the reorganization process, for example reordering the file records by sorting
them in a specified order.
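
The record-at-a-time operations Reset, Find and FindNext can be pictured as a cursor moving through the blocks of a file. The class below is a hypothetical sketch in which a list of lists stands in for the disk blocks; it is not the interface of any particular DBMS.

```python
class RecordFile:
    """Toy file: a list of blocks, each block a list of records."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reset()

    def reset(self):
        """Reset: place the file pointer before the first record."""
        self.block_no, self.slot = 0, -1
        self.condition = None

    def find(self, condition):
        """Find: locate the first record that satisfies the condition."""
        self.reset()
        self.condition = condition
        return self.find_next()

    def find_next(self):
        """FindNext: continue the search after the current record."""
        b, s = self.block_no, self.slot + 1
        while b < len(self.blocks):
            buffer = self.blocks[b]           # "transfer" the block into a buffer
            while s < len(buffer):
                if self.condition(buffer[s]):
                    self.block_no, self.slot = b, s
                    return buffer[s]          # Read/Get: hand back the current record
                s += 1
            b, s = b + 1, 0
        return None


f = RecordFile([[{"id": 1, "dept": "A"}, {"id": 2, "dept": "B"}],
                [{"id": 3, "dept": "A"}]])
first = f.find(lambda r: r["dept"] == "A")
second = f.find_next()
third = f.find_next()
print(first, second, third)
```

A set-at-a-time operation such as Find All could be built on top of this by calling find once and then find_next until it returns None.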

7.6 Files of unordered Records (Heap files):


This is the most basic type of organization, in which records are placed in the order in which
they are inserted. New records are placed at the end of the file. Such files are known as heap files
or pile files. This organization is used to collect and store records for future use. Inserting a new record is very
efficient: the last disk block of the file is copied into a buffer, the new record is added, and the block
is rewritten to disk. The address of the last file block is kept in the file header.
Searching for a record using any search condition involves a linear search through the file, block
by block. Deleting a record is also slow because we first need to locate the record to be
deleted. The program finds the block, copies it into a buffer, deletes the
record from the buffer and finally rewrites the block to disk. This leaves unused space in the block,
so deleting many records wastes storage space. Another technique is to keep an extra bit or byte with
each record, called the deletion marker. A record is deleted by setting the deletion marker to a
certain value. A different value of the marker indicates a valid (not deleted) record. Both methods
require periodic reorganization of the file to reclaim the unused space. This organization is
sometimes also referred to as sequential file organization.
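
The behaviour described above (append at the end, linear search, deletion markers, periodic reorganization) can be sketched as follows. The class and its method names are hypothetical, and in-memory lists stand in for disk blocks.

```python
class HeapFile:
    """Toy heap file: each slot is [deleted_marker, record]."""

    def __init__(self, records_per_block):
        self.bfr = records_per_block
        self.blocks = [[]]

    def insert(self, record):
        """Append to the last block, starting a new block when it is full."""
        if len(self.blocks[-1]) == self.bfr:
            self.blocks.append([])
        self.blocks[-1].append([False, record])

    def search(self, condition):
        """Linear search, block by block, skipping records marked deleted."""
        for block in self.blocks:
            for deleted, record in block:
                if not deleted and condition(record):
                    return record
        return None

    def delete(self, condition):
        """Set the deletion marker of the first matching record."""
        for block in self.blocks:
            for slot in block:
                if not slot[0] and condition(slot[1]):
                    slot[0] = True
                    return True
        return False

    def reorganize(self):
        """Repack the valid records to reclaim the space of deleted ones."""
        live = [rec for block in self.blocks for deleted, rec in block if not deleted]
        self.blocks = [[]]
        for rec in live:
            self.insert(rec)


heap = HeapFile(records_per_block=2)
for value in [1, 2, 3, 4, 5]:
    heap.insert(value)
heap.delete(lambda r: r == 2)
blocks_before = len(heap.blocks)
heap.reorganize()
blocks_after = len(heap.blocks)
print(blocks_before, blocks_after)
```

Until reorganize is called, the deleted record still occupies its slot, which is the wasted space the text refers to.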

7.7 Files of Ordered Records (Sorted Files):


The records of a file can be physically ordered based on the values of one of their fields,
called the ordering field. If the ordering field is also a key field, then it has a unique
value in each record. Such a field is called the ordering key of the file. These files are known as
ordered (sequential) files. Ordered files have the following advantages over unordered files.
1. Reading the records in order of the ordering key values is efficient because
no sorting is required.
2. Finding the next record in order of the ordering key usually requires no additional block
access, because the next record is normally in the same block as the current one.
3. A search condition based on the value of the ordering key results in faster access.
The binary search method can be used to search for records in sorted files. Inserting and deleting a
record is expensive because the records must remain physically ordered. To insert a record, we must
find its correct position in the file and make space for it at that position. To overcome this
problem, two files are used: a transaction file, which is unordered, and a master file, which is
ordered. New records are added to the end of the transaction file. Periodically this file is sorted
and merged with the master file during reorganization.
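
Binary search on a sorted file works on blocks rather than on individual records. The sketch below assumes that every block is non-empty and that records are sorted on the ordering key both within and across blocks; the function name and the read counter are illustrative.

```python
def binary_search_blocks(blocks, key, key_of=lambda record: record):
    """Return (record or None, number of blocks read)."""
    low, high = 0, len(blocks) - 1
    reads = 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]                   # one block access
        reads += 1
        if key < key_of(block[0]):
            high = mid - 1                    # key lies in an earlier block
        elif key > key_of(block[-1]):
            low = mid + 1                     # key lies in a later block
        else:
            # Key falls within this block's range: search it in the buffer.
            for record in block:
                if key_of(record) == key:
                    return record, reads
            return None, reads
    return None, reads


sorted_blocks = [[1, 3], [5, 7], [9, 11], [13, 15]]
found, found_reads = binary_search_blocks(sorted_blocks, 9)
missing, missing_reads = binary_search_blocks(sorted_blocks, 4)
print(found, found_reads, missing, missing_reads)
```

Each step halves the number of candidate blocks, whereas a heap file would have to be scanned block by block.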

7.8 Hashing Techniques:


Hashing techniques provide very fast access to records under certain search conditions. This
organization is called a hash file. The search condition must be an equality condition
on a single field, called the hash field of the file. If the hash field is also a key field, it is
called the hash key. Hashing uses a function h, called the hash function, that is applied to the
hash field value of a record and yields the address of the disk block in which the record is
stored. A search for the record within the block can be carried out in a main memory buffer.

7.8.1 Internal Hashing:


For internal files, hashing is typically implemented as a hash table through the use of an
array of records. Suppose the array index range is from 0 to M-1. We then have M slots whose
addresses correspond to the array indexes. We choose a hash function that transforms the hash field
value into an integer between 0 and M-1. One commonly used hash function is h(K) = K
mod M, which returns the remainder of an integer hash field value K after division by M. This
value is used as the record address. Non-integer hash field values can be transformed into integers
before the mod function is applied. For character strings, the numeric (ASCII) codes associated
with the characters can be used in the transformation.
One technique, called folding, involves applying an arithmetic function such as addition or
a logical function such as exclusive OR to different portions of the hash field value to calculate
the hash address. The problem with most
hashing functions is that they do not guarantee that distinct values will hash to distinct addresses,
because the hash field space (the number of possible values a hash field can take) is usually much
larger than the address space (the number of available addresses for records).
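
The hash functions mentioned above can be sketched as follows. The way a string is turned into an integer (summing character codes) and the way the key is split for folding (groups of three decimal digits) are assumptions chosen for this illustration; other transformations are equally possible.

```python
def hash_mod(key, M):
    """h(K) = K mod M for an integer hash field value K."""
    return key % M


def string_to_int(text):
    """Assumed transformation: add up the character codes of the string."""
    return sum(ord(ch) for ch in text)


def fold_hash(key, M, chunk_digits=3):
    """Folding: split the key's digits into portions, add them, then take mod M."""
    digits = str(key)
    portions = [int(digits[i:i + chunk_digits])
                for i in range(0, len(digits), chunk_digits)]
    return sum(portions) % M


print(hash_mod(1234, 10))
print(hash_mod(string_to_int("AB"), 10))
print(fold_hash(123456789, 1000))
# Two distinct keys can hash to the same address (a collision):
print(hash_mod(15, 10) == hash_mod(25, 10))
```

The last line shows why collisions are unavoidable once there are more possible key values than addresses.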
A collision occurs when the hash field value of a record that is being inserted hashes to
an address that already contains a different record. In such a situation, we must insert the new record
in some other position, since its hash address is occupied. The process of finding another
position is called collision resolution. The different methods of collision resolution are:
• Open addressing – Proceeding from the occupied position specified by the hash address, the
program checks the subsequent positions in order until an unused position is found.
• Chaining – Various overflow locations are kept, usually by extending the array with a
number of overflow positions. In addition, a pointer field is added to each record location.
A collision is resolved by placing the new record in an unused overflow location and
setting the pointer of the occupied hash address location to the address of that overflow
location.
• Multiple hashing – The program applies a second hash function if the first results in a
collision.
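
Open addressing can be sketched as below, using h(K) = K mod M and checking subsequent positions one at a time, wrapping around at the end of the array. The class name is hypothetical, and deletion is deliberately left out because it needs extra bookkeeping that the notes do not cover.

```python
class OpenAddressTable:
    """Toy hash table with M slots and open addressing (linear probing)."""

    def __init__(self, M):
        self.M = M
        self.slots = [None] * M               # each slot holds (key, record) or None

    def insert(self, key, record):
        """Store the record at its hash address or the next unused position."""
        start = key % self.M
        for step in range(self.M):
            pos = (start + step) % self.M
            if self.slots[pos] is None or self.slots[pos][0] == key:
                self.slots[pos] = (key, record)
                return pos
        raise OverflowError("hash table is full")

    def find(self, key):
        """Follow the same probe sequence until the key or an unused slot is met."""
        start = key % self.M
        for step in range(self.M):
            pos = (start + step) % self.M
            if self.slots[pos] is None:
                return None
            if self.slots[pos][0] == key:
                return self.slots[pos][1]
        return None


table = OpenAddressTable(5)
positions = [table.insert(k, r) for k, r in [(7, "a"), (12, "b"), (4, "c"), (9, "d")]]
print(positions, table.find(12), table.find(22))
```

Keys 7 and 12 both hash to address 2, so 12 is placed in the next free position; key 9 hashes to the occupied address 4 and wraps around to position 0.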

7.8.2 External Hashing:


Hashing for disk files is called external hashing. To suit the characteristics of disk
storage, the target address space is made up of buckets, each of which holds multiple records. A bucket
is either one disk block or a cluster of contiguous blocks. The hash function maps a key into a
relative bucket number, rather than assigning an absolute block address to the bucket. A table
maintained in the file header converts the bucket number into the corresponding disk block
address.
The collision problem is less severe with buckets, because many records can fit in a single
bucket. However, when a bucket is full and a new record hashes to it, the collision is resolved by
keeping a linked list of overflow records for the bucket. The
pointers in the linked list are record pointers, which include both the block address and the
relative record position within the block. When a fixed number of buckets is allocated, the hashing
scheme is called static hashing.
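
Static external hashing with overflow handling can be sketched as follows. Python lists stand in for the disk blocks of each bucket and for the linked list of overflow records, and the class name and capacities are hypothetical.

```python
class BucketFile:
    """Toy static hash file: a fixed number of buckets with overflow lists."""

    def __init__(self, num_buckets, bucket_capacity):
        self.M = num_buckets
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = [[] for _ in range(num_buckets)]  # stand-in for linked lists

    def insert(self, key, record):
        """Place the record in its bucket, or in the overflow list if it is full."""
        b = key % self.M                      # relative bucket number
        if len(self.buckets[b]) < self.capacity:
            self.buckets[b].append((key, record))
        else:
            self.overflow[b].append((key, record))
        return b

    def find(self, key):
        """Search the bucket first, then its overflow records."""
        b = key % self.M
        for k, record in self.buckets[b] + self.overflow[b]:
            if k == key:
                return record
        return None


hash_file = BucketFile(num_buckets=3, bucket_capacity=2)
for k in [3, 6, 9, 4]:
    hash_file.insert(k, f"rec{k}")
print(hash_file.buckets, hash_file.overflow, hash_file.find(9))
```

Keys 3 and 6 fill bucket 0 without any problem; only the third record hashing to that bucket has to go to the overflow list.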

7.9 Other Primary Organizations:


7.9.1 Files of Mixed Records
The file organizations we have studied so far assume that all records of a particular file
are of the same record type. The records could be of EMPLOYEEs, PROJECTs, STUDENTs, or
DEPARTMENTs, but each file contains records of only one type. In most database applications,
we encounter situations in which numerous types of entities are interrelated in various ways.
Relationships among records in various files can be represented by connecting fields. For
example, a STUDENT record can have a connecting field Major_dept whose value gives the
name of the DEPARTMENT in which the student is majoring. This Major_dept field refers to a
DEPARTMENT entity, which should be represented by a record of its own in the
DEPARTMENT file. If we want to retrieve field values from two related records, we must
retrieve one of the records first. Then we can use its connecting field value to retrieve the related
record in the other file. Hence, relationships are implemented by logical field references among
the records in distinct files.

7.10 RAID Technology:


RAID stands for Redundant Array of Independent Disks. The main goal of RAID is to
improve the performance of disk storage in terms of access time and capacity. Here a large array
of small disks acts as a single higher-performance disk. A concept called data striping is used,
which utilizes parallelism to improve disk performance. Data striping distributes data
transparently over multiple disks to make them appear as a single large, fast disk. Striping
improves the overall I/O performance and accomplishes load balancing among the disks. Reliability
can be improved by storing redundant information on the disks.
One technique for introducing redundancy is mirroring, also called shadowing. Data is written
redundantly to two identical physical disks that are treated as one logical disk. When data is
read, it can be retrieved from the disk with the shorter seek time and rotational delay. If one disk
fails, the other is used until the first is repaired. To incorporate redundancy, we must consider two
problems:
1. Selecting a technique for computing the redundant information
2. Selecting a method of distributing the redundant information across the disk array.
The first problem can be solved by using error correcting codes involving parity bits or Hamming
codes. The second problem can be solved either by storing the redundant information on a small
number of disks or by distributing it uniformly across all disks.
Data striping can be applied at a fine level of granularity by breaking a byte of data into
bits and spreading the bits across different disks. This is called bit-level data striping: it
consists of splitting a byte of data and writing bit j to the jth disk. Blocks of a file can also be
striped across disks. This is called block-level striping.
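
Both granularities of striping can be sketched in a few lines. The round-robin placement of blocks and the numbering of bits from the least significant end are assumptions made for this illustration.

```python
def stripe_blocks(blocks, num_disks):
    """Block-level striping: block i goes to disk i mod num_disks (assumed)."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks


def stripe_bits(byte, num_disks=8):
    """Bit-level striping: bit j of the byte is written to the jth disk."""
    return [(byte >> j) & 1 for j in range(num_disks)]


block_layout = stripe_blocks(["b0", "b1", "b2", "b3", "b4"], 2)
bit_layout = stripe_bits(0b00000101)
print(block_layout)
print(bit_layout)
```

Because consecutive blocks land on different disks, they can be read or written in parallel.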

RAID Organization and Levels:


Different RAID organizations are defined based on the granularity of data interleaving (striping)
and the pattern used to compute redundant information. The RAID levels range from 0 to 6.
RAID level 0 uses data striping and has no redundant data. It has good write performance.
RAID level 1 uses mirrored disks. It has good read performance. RAID level 2 uses memory-style
redundancy based on Hamming codes, which provide both error detection and error
correction. RAID level 3 uses a single parity disk, relying on the disk controller to figure out which
disk has failed. RAID levels 4 and 5 use block-level striping, with level 5 distributing
data and parity information across all disks. RAID level 6 applies the P+Q redundancy scheme using
Reed-Solomon codes to protect against up to two disk failures by using just two redundant disks.
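
The idea behind a single parity disk can be illustrated with bytewise exclusive OR: the parity block is the XOR of the data blocks, so any one lost block is the XOR of the parity with the surviving blocks. This is a sketch of the principle only, not of any controller's actual implementation, and the function names are hypothetical.

```python
from functools import reduce


def parity_block(blocks):
    """XOR the equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the single missing block from the survivors and the parity."""
    return parity_block(list(surviving_blocks) + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = parity_block(data)
# Suppose the disk holding data[1] fails:
recovered = rebuild([data[0], data[2]], parity)
print(parity, recovered)
```

The controller must know which disk failed, since parity alone shows that something is wrong but not where.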

7.11 New Storage Systems:


Most large organizations have moved to a concept called storage area networks (SANs).
In a SAN, online storage peripherals are configured as nodes on a high-speed network and can be
attached to and detached from servers in a very flexible manner. Several companies have emerged
as SAN providers and supply their own proprietary topologies. They allow storage systems to be
placed at longer distances from the servers and provide different performance and connectivity
options. The main advantages claimed include:
• Flexible many-to-many connectivity among servers and storage devices using Fibre
Channel hubs and switches
• Up to 10 km separation between a server and a storage system using appropriate fiber
optic cables
• Better isolation capabilities allowing non-disruptive addition of new peripherals and
servers

Assignment
Short Answers (2 marks)
1. Define mirroring
2. Define striping
3. Define seek time
4. Define block transfer time
5. What are the limitations in internal hashing?
6. What do you mean by a track?
7. Define cylinder.

Long answers (4 or more marks)


1. What are the reasons for variable length records? What types of separator characters are
needed for each?
2. Write a note on magnetic tape storage devices.
3. Explain the function of the disk controller in a magnetic disk.
4. What are sorted files? What are the advantages of such a file over unordered files?
5. Explain the different operations on files.
6. Discuss hash file organization with reference to disk files.
7. Explain RAID technology.
8. Explain the concept of double buffering. How does it improve data access?
9. Explain Heap (unordered) file organization. Also mention the drawbacks.
10. Explain the hardware mechanism of hard disk.
11. Explain the concept of internal hashing.
12. Explain the concept of external hashing.
