
A Lalitha Associate Professor Avinash Degree College: Unit-II Database Integrity and Normalization

The document discusses database integrity and normalization. It defines data redundancy as repetition of the same data in multiple places in a database. This can lead to issues like insertion, update, and deletion anomalies. The document then discusses various normal forms including 1NF, 2NF, 3NF, BCNF, 4NF and 5NF. It explains the rules and requirements to satisfy each normal form. The goal of normalization is to organize data in tables to minimize redundancy and dependency.

Uploaded by

lalitha

Unit-II

DATABASE INTEGRITY AND NORMALIZATION

A LALITHA
ASSOCIATE PROFESSOR
AVINASH DEGREE COLLEGE
Redundancy and associated problems

Data redundancy means repetition of the same data in more than one place in a database.
Some of the problems associated with redundancy are:
1. Insertion anomaly
2. Update anomaly
3. Deletion anomaly

Employee_ID  Name           Department  Student_Group
123          J. Longfellow  Accounting  Beta Alpha Psi
234          B. Rech        Marketing   Marketing Club
234          B. Rech        Marketing   Management Club
456          A. Bruchs      CIS         Technology Org.
456          A. Bruchs      CIS         Beta Alpha Psi

Single valued dependencies
Functional dependency: an attribute is functionally dependent on another attribute if its value is determined by the value of that attribute.
Ex: A -> B, where A is the determinant and B is the functionally dependent attribute.
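A functional dependency can be tested against a table instance by checking that equal determinant values never pair with different dependent values. The sketch below is illustrative only: the helper name `fd_holds` is an assumption, the rows are the employee table from the redundancy slide, and a check on sample data can refute a dependency but cannot prove that it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, each value of lhs maps to one value of rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key
        if seen.setdefault(key, val) != val:
            return False
    return True

COLS = ("Employee_ID", "Name", "Department", "Student_Group")
DATA = [
    (123, "J. Longfellow", "Accounting", "Beta Alpha Psi"),
    (234, "B. Rech", "Marketing", "Marketing Club"),
    (234, "B. Rech", "Marketing", "Management Club"),
    (456, "A. Bruchs", "CIS", "Technology Org."),
    (456, "A. Bruchs", "CIS", "Beta Alpha Psi"),
]
employees = [dict(zip(COLS, row)) for row in DATA]

print(fd_holds(employees, ["Employee_ID"], ["Name", "Department"]))  # True
print(fd_holds(employees, ["Employee_ID"], ["Student_Group"]))       # False
```

Employee_ID determines Name and Department, but not Student_Group, which is why the name and department are repeated for every group an employee belongs to.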
NORMALIZATION

Normalisation is the process of decomposing a relation into two or more relations, with specific relationships set up between the resulting tables, so that all anomalies are removed. The normal forms are:
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
1NF
A relation is in first normal form if it contains no non-atomic values and each row provides a unique combination of values.
Equivalently, a relation is in 1NF if every attribute value in the relation is atomic, which means that the relation must not contain multi-valued attributes.
Ex: Consider a company X that wants to store the personal details of its employees. It creates a table named Employee.
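A minimal sketch of bringing such a table into 1NF follows. The Employee rows, the column names (`Emp_ID`, `Name`, `Phones`) and the helper `to_1nf` are hypothetical and chosen only for illustration; the idea is that each multi-valued attribute value is split so that every row holds one atomic value.

```python
def to_1nf(rows, multi_attr, atomic_attr):
    """Emit one row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != multi_attr}
        for value in row[multi_attr]:
            flat.append({**base, atomic_attr: value})
    return flat

# Hypothetical, non-1NF data: Phones holds a list of values.
employee = [
    {"Emp_ID": 101, "Name": "Asha", "Phones": ["555-0101", "555-0102"]},
    {"Emp_ID": 102, "Name": "Ravi", "Phones": ["555-0103"]},
]

employee_1nf = to_1nf(employee, "Phones", "Phone")
for row in employee_1nf:
    print(row)
```

After the split, (Emp_ID, Phone) identifies each row, and no attribute holds more than one value.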
2NF
A table is said to be in 2NF if both of the following conditions hold:
• The table is in 1NF (first normal form).
• Every non-key attribute is functionally dependent on the full set of primary key attributes.
After decomposing into second normal form

Composite primary key: CustomerID and StoreID
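As a hedged illustration of the 2NF rule, assume a purchase table keyed on (CustomerID, StoreID) with a third, hypothetical column `PurchaseLocation` that depends on StoreID alone. That is a partial dependency, so the column moves to its own table. The sample rows and the helper `project` are assumptions made for this sketch.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (order preserved)."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

COLS = ("CustomerID", "StoreID", "PurchaseLocation")  # PurchaseLocation is hypothetical
DATA = [
    (1, 1, "Los Angeles"),
    (1, 3, "San Francisco"),
    (2, 1, "Los Angeles"),
    (3, 2, "New York"),
    (4, 3, "San Francisco"),
]
table = [dict(zip(COLS, row)) for row in DATA]

# PurchaseLocation depends only on StoreID, so it is split off.
purchases = project(table, ["CustomerID", "StoreID"])
stores = project(table, ["StoreID", "PurchaseLocation"])

print(len(purchases), len(stores))  # 5 3
```

Each store's location is now stored once instead of once per purchase.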


3NF

A table design is said to be in 3NF if both of the following conditions hold:
• The relation is in second normal form.
• There are no transitive dependencies.
A transitive dependency is a functional dependency in which the value of a non-key field is determined by the value of another non-key field that is not a candidate key.
(Figure: Project table and Project Manager table.)
BCNF

• It is an advanced version of 3NF, which is why it is also referred to as 3.5NF.
• A table is said to be in BCNF if every determinant is a candidate key in that relation.
• BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every non-trivial functional dependency X -> Y, X is a superkey of the table.
(Figure: the table after decomposition into Boyce-Codd normal form.)
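The superkey condition in the BCNF definition can be checked with an attribute-closure computation: X is a superkey exactly when the closure of X contains every attribute of the relation. The relation R(Student, Course, Teacher) and its dependencies below are a hypothetical example, and the function names are assumptions made for this sketch.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """List the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]

# Hypothetical relation and dependencies.
attrs = ("Student", "Course", "Teacher")
fds = [
    (("Student", "Course"), ("Teacher",)),
    (("Teacher",), ("Course",)),
]

print(bcnf_violations(attrs, fds))  # [(('Teacher',), ('Course',))]
```

Here Teacher -> Course violates BCNF because Teacher alone does not determine Student, so Teacher is not a superkey.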
4NF

• Fourth normal form (4NF) is a level of database normalization in which every non-trivial multivalued dependency has a candidate key as its determinant.
• It builds on the first three normal forms (1NF, 2NF and 3NF) and the Boyce-Codd Normal Form (BCNF). It states that, in addition to meeting the requirements of BCNF, a table must not contain more than one multivalued dependency. (A multivalued dependency occurs when two attributes in a table are independent of each other but both depend on a third attribute.)
(Figure: the tables after decomposition into 4NF.)
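A multivalued dependency can be recognised in sample data: for each value of the third attribute, every value of one independent attribute appears with every value of the other. The sketch below checks this on a hypothetical (Student, Course, Hobby) table; the data and the name `mvd_holds` are assumptions for illustration.

```python
from itertools import product

def mvd_holds(rows, x, y, z):
    """True if, for each x value, the (y, z) pairs form a full cross product."""
    groups = {}
    for r in rows:
        groups.setdefault(r[x], set()).add((r[y], r[z]))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True

# Hypothetical data: courses and hobbies are independent facts about a student.
table = [
    {"Student": "S1", "Course": c, "Hobby": h}
    for c in ("DB", "OS")
    for h in ("Chess", "Music")
]

print(mvd_holds(table, "Student", "Course", "Hobby"))       # True
print(mvd_holds(table[:-1], "Student", "Course", "Hobby"))  # False
```

When the check succeeds, the table can be split into (Student, Course) and (Student, Hobby), which removes the repeated combinations.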
5NF
A database is said to be in 5NF if and only if:
• It is in 4NF.
• If a table is decomposed further to eliminate redundancy and anomalies, re-joining the decomposed tables by means of their candidate keys must neither lose original data nor produce new records. In simple words, joining two or more decomposed tables should neither lose records nor create new ones.
(Figure: the tables after decomposition into fifth normal form.)



Decomposition

Decomposition is the process of breaking something down into parts or elements. In a database, it breaks a table into multiple tables.
If a relation is not decomposed properly, problems such as loss of information may result.
Properties of Decomposition
1. Lossless Decomposition
2. Dependency Preservation
3. Attribute preservation
1. Lossless Decomposition
Decomposition must be lossless: no information from the relation being decomposed may be lost.
Ex: Let E be a relational schema with instance e, decomposed into E1, E2, E3, …, En with instances e1, e2, e3, …, en.
If e1 ⋈ e2 ⋈ e3 ⋈ … ⋈ en = e, then the decomposition is called a 'Lossless Join Decomposition'.
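The lossless-join condition can be tried out on a small instance by projecting it, joining the projections, and comparing the result with the original. The sketch below uses the employee table from the redundancy slide; the helper names are assumptions, and a test on one instance only illustrates the property rather than proving it for the schema.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicates."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

def natural_join(r, s):
    """Join two row lists on all attributes they share."""
    out = []
    for a in r:
        for b in s:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                merged = {**a, **b}
                if merged not in out:
                    out.append(merged)
    return out

def is_lossless(rows, *attr_sets):
    """True if joining the projections gives back exactly the original rows."""
    joined = project(rows, attr_sets[0])
    for attrs in attr_sets[1:]:
        joined = natural_join(joined, project(rows, attrs))
    as_set = lambda rs: {tuple(sorted(r.items())) for r in rs}
    return as_set(joined) == as_set(rows)

COLS = ("Employee_ID", "Name", "Department", "Student_Group")
DATA = [
    (123, "J. Longfellow", "Accounting", "Beta Alpha Psi"),
    (234, "B. Rech", "Marketing", "Marketing Club"),
    (234, "B. Rech", "Marketing", "Management Club"),
    (456, "A. Bruchs", "CIS", "Technology Org."),
    (456, "A. Bruchs", "CIS", "Beta Alpha Psi"),
]
e = [dict(zip(COLS, row)) for row in DATA]

good = is_lossless(e, ["Employee_ID", "Name", "Department"],
                   ["Employee_ID", "Student_Group"])
bad = is_lossless(e, ["Employee_ID", "Student_Group"],
                  ["Student_Group", "Name", "Department"])
print(good, bad)  # True False
```

The second split joins on Student_Group, which two employees share, so the join produces spurious rows such as employee 123 paired with A. Bruchs.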
2. Dependency Preservation
A dependency is an important constraint on the database.
Every dependency must be satisfied by at least one decomposed table.
If A → B holds, the attribute set B is functionally dependent on the attribute set A. The dependency is easiest to check when both sets are in the same relation.
3. Attribute preservation
This is a simple requirement: all the attributes of the relation being decomposed must be preserved through the process of normalization.
Start with a universal relation schema R = {A1, A2, …, An}, the set of attributes.
D = {R1, R2, …, Rm} is a decomposition of R such that every attribute of R appears in at least one Ri, i.e. R1 ∪ R2 ∪ … ∪ Rm = R.
File organization and its types
File organization describes the way in which records are stored in blocks and the way the blocks are placed on the storage medium (disk).
Types of File organization
Heap file organization(unordered files)
It is the simplest and most basic type of organization. It works with data blocks. In heap file organization, records are inserted at the file's end, and inserting them requires no sorting or ordering.

Advantages
It is a very good method of file organization for bulk insertion.
For a small database, fetching and retrieving records is faster than with sequential file organization.
Disadvantages
This method is inefficient for a large database because it takes time to search for or modify a record.
Deletion can leave unused space, so the file may need reorganization.
Sequential File Organization

This method is the easiest method for file organization. In this method, records are stored sequentially. There are two ways to implement it:
1. Pile file method
• It is quite a simple method. The records are stored in a sequence, i.e., one after another, in the order in which they are inserted into the table.
• To update or delete a record, the record is first searched for in the memory blocks. When it is found, it is marked for deletion and the new record is inserted.

2. Sorted file method
• In this method, the new record is always inserted at the file's end, and the sequence is then sorted in ascending or descending order. Records are sorted on the primary key or any other key.
• When a record is modified, it is updated and the file is sorted again, so that the updated record ends up in the right place.
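The two sequential methods can be contrasted in a few lines. This is a minimal in-memory sketch, assuming records are (key, name) tuples and Python lists stand in for the file; the function names and sample records are illustrative only.

```python
import bisect

pile = []         # pile file: records kept in arrival order
sorted_file = []  # sorted file: records kept in key order

def insert(record):
    pile.append(record)
    sorted_file.append(record)  # add at the end of the file...
    sorted_file.sort()          # ...then re-sort on the key (first field)

def find_sorted(key):
    """Binary search, possible only because the file is kept sorted."""
    i = bisect.bisect_left(sorted_file, (key,))
    if i < len(sorted_file) and sorted_file[i][0] == key:
        return sorted_file[i]
    return None

for rec in [(800, "Jain"), (500, "Inder"), (900, "Rashi")]:
    insert(rec)

print(pile)              # [(800, 'Jain'), (500, 'Inder'), (900, 'Rashi')]
print(sorted_file)       # [(500, 'Inder'), (800, 'Jain'), (900, 'Rashi')]
print(find_sorted(900))  # (900, 'Rashi')
```

Re-sorting after every insert is what makes the sorted file method cost extra time, while the pile file pays instead at search time, since it has to be scanned record by record.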
Advantages of sequential file organization
• It is a fast and efficient method for handling huge amounts of data.
• Files can easily be stored on cheaper storage mechanisms such as magnetic tape.
• It is simple in design and requires little effort to store the data.
• It is used when most of the records have to be accessed, e.g. calculating student grades or generating salary slips.
• It is used for report generation and statistical calculations.
Disadvantages of sequential file organization
• It wastes time: we cannot jump directly to the record that is required but have to move through the file sequentially.
• The sorted file method takes more time and space for sorting the records.
Indexed sequential access method (ISAM)
ISAM is an advanced sequential file organization. In this method, records are stored in the file using the primary key. An index value is generated for each primary key and mapped to the record. This index contains the address of the record in the file.

Pros of ISAM:
• Since each index entry holds the address of the record's data block, searching for a record in a huge database is quick and easy.
• This method supports range retrieval and partial retrieval of records. Since the index is based on the primary key values, we can retrieve the data for a given range of values. In the same way, a partial value can also be searched easily, e.g. student names starting with 'JA'.

Cons of ISAM:
• This method requires extra space on the disk to store the index values.
• When new records are inserted, the files have to be reconstructed to maintain the sequence.
• When a record is deleted, the space it used needs to be released. Otherwise, the performance of the database will slow down.
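The index lookup and range retrieval described for ISAM can be sketched as follows. This is an in-memory approximation, not a real ISAM implementation: list positions stand in for disk addresses, the class name `IndexedFile` is an assumption, and the sample keys and names are taken from the employee table used later for inverted lists.

```python
import bisect

class IndexedFile:
    def __init__(self, records):
        # Data file kept in primary-key order; records are (key, payload).
        self.data = sorted(records)
        # Index: the i-th key is stored at address i of the data file.
        self.keys = [k for k, _ in self.data]

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.data[i][1]
        return None

    def range(self, lo, hi):
        """Return payloads for all keys k with lo <= k <= hi."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return [payload for _, payload in self.data[i:j]]

f = IndexedFile([(105, "Vijay"), (101, "John"), (111, "Smith"),
                 (103, "Ajay"), (109, "Sam")])
print(f.get(105))         # Vijay
print(f.range(103, 109))  # ['Ajay', 'Vijay', 'Sam']
```

Because the index is ordered on the primary key, a range query is two binary searches plus one contiguous read.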
Hash File Organization or Direct file organization or Random or Relative

Hash file organization computes a hash function on some fields of the records. The output of the hash function determines the location of the disk block where each record is to be placed.
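A minimal sketch of this idea follows, assuming an integer key, a modulo hash function, and a small fixed number of blocks; all three are illustrative choices, and the class name `HashFile` is hypothetical. Each block keeps a list of records so that two keys with the same block address do not overwrite each other.

```python
class HashFile:
    def __init__(self, n_blocks=4):
        self.blocks = [[] for _ in range(n_blocks)]

    def _address(self, key):
        # Hash function: the key alone determines the block address.
        return key % len(self.blocks)

    def insert(self, key, record):
        self.blocks[self._address(key)].append((key, record))

    def find(self, key):
        # Only one block has to be examined, whatever the file size.
        for k, record in self.blocks[self._address(key)]:
            if k == key:
                return record
        return None

h = HashFile()
for key, name in [(101, "John"), (103, "Ajay"), (105, "Vijay")]:
    h.insert(key, name)

print(h.find(105))  # Vijay
print(h.find(102))  # None
```

Keys 101 and 105 both map to block 1 here, which shows why collisions have to be planned for when the hash columns are chosen.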

Advantages of Hash File Organization

• Records need not be sorted after any transaction, so the effort of sorting is avoided.
• Since the block address is given by the hash function, accessing any record is very fast. Similarly, updating or deleting a record is also very quick.
• This method can handle multiple transactions because each record is independent of the others: since the storage location of one record does not depend on another, multiple records can be accessed at the same time.
• It is suitable for online transaction systems such as online banking and ticket booking systems.
Disadvantages of Hash File Organization
• This method may accidentally delete data. If a record hashes to an address that already holds a record, the older record is overwritten by the newer one and data is lost. The hash columns therefore need to be selected with the utmost care, and a correct backup and recovery mechanism has to be established.
• Since the records are stored at effectively random locations, they are scattered across the memory, which is therefore not used efficiently.
• System design is complex and costly.
• File updating is more difficult than with sequential files.
Types of Indexes
Indexing is a data structure technique to efficiently retrieve records from database files based on the attributes on which the indexing has been done. Indexing in database systems is similar to the index we see in books.
Primary Index: If the index is created on the basis of the primary key of the table, it is known as primary indexing.
Secondary Index: An index generated from a field that is a candidate key is known as a secondary index.
Clustering Index: A clustering index is defined on an ordered data file. The index is sometimes created on non-primary-key columns.
Dense index
The dense index contains an index record for every search key value in the data file. It makes searching faster but requires more space to store the index records themselves.
Sparse index
• An index record appears for only a few items in the data file. Each item points to a block.
• Instead of pointing to each record in the main table, the index points to records in the main table at intervals.
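The difference between the two can be shown on a small sorted key file split into blocks. In this sketch the block size, the key values and the variable names are assumptions: the dense index holds one entry per key, while the sparse index holds only the first key of each block and the lookup scans inside the chosen block.

```python
import bisect

BLOCK_SIZE = 3
keys = [101, 103, 105, 109, 111, 122, 130, 145]  # data file, sorted on the key
blocks = [keys[i:i + BLOCK_SIZE] for i in range(0, len(keys), BLOCK_SIZE)]

# Dense index: one entry per key -> (block number, offset in block).
dense = {k: (b, off) for b, blk in enumerate(blocks) for off, k in enumerate(blk)}

# Sparse index: one entry per block, holding the first key of that block.
sparse = [blk[0] for blk in blocks]

def sparse_lookup(key):
    b = bisect.bisect_right(sparse, key) - 1  # last block whose first key <= key
    if b < 0 or key not in blocks[b]:
        return None
    return (b, blocks[b].index(key))          # scan inside the block

print(len(dense), len(sparse))          # 8 3
print(dense[111], sparse_lookup(111))   # (1, 1) (1, 1)
```

Both lookups reach the same record; the sparse index stores fewer entries at the cost of a short scan within the block.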

Multi-level Index
A multi-level index breaks the index down into several smaller indices so that the outermost level is small enough to be saved in a single disk block, which can easily be accommodated anywhere in main memory.
Tree structure
A tree is a non-linear data structure that organizes data in a hierarchical structure. Its definition is recursive: each child of a node is itself the root of a subtree.

Binary tree
A binary tree is an important class of tree data structure in which a node can have at most two children, i.e. every node or vertex has no child node, one child node or two child nodes.
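A binary tree node needs only a value and two optional child links. The sketch below is a minimal illustration; the class and function names are assumptions, and the in-order traversal is included only to show how the recursive definition is used.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None   # at most two children: left and right
    right: Optional["Node"] = None

def child_count(node):
    return sum(child is not None for child in (node.left, node.right))

def inorder(node):
    """Visit the left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)

root = Node(2, Node(1), Node(3))
print(child_count(root), child_count(root.left))  # 2 0
print(inorder(root))                              # [1, 2, 3]
```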

Multi-Key file organization

• Multi-key file organization is a scheme that allows records to be accessed by more than one key column. In other words, it is a technique used to sort a file based on multi-key values.
• Numerous techniques have been used to implement multi-key file organization. Most of these techniques are based on building indexes to provide direct access by key value.
Two common techniques for this organisation are:
• Multi-list file organisation
• Inverted file organisation
Linked Lists
• A linked list is a linear data structure in which the elements are not stored at contiguous memory locations.
Multi-list file Organisation

Multi-list file organisation is a multi-index linked file organisation. A linked file organisation is a logical organisation in which the physical ordering of records is not of concern. In a linked organisation, the sequence of records is governed by links that determine the next record in the sequence. The linking of records can be ordered or unordered, but unordered linking makes searching for information in a file very costly. Thus, it may be a good idea to link records in order of increasing primary key.
Consider the employee data given in the figure, with record numbers A, B, C, D, E, …. Suppose that Empid is the key field of the data records. Let us describe the multi-list file organisation for this data file.
Record Number  Empid  Name      Job                Qualification  Gender  City       Married/Single  Salary
A              800    Jain      Software Engineer  B. Tech.       Male    New Delhi  Single          15,000/-
B              500    Inder     Software Manager   B. Tech.       Female  New Delhi  Married         18,000/-
C              900    Rashi     Software Manager   MCA            Female  Mumbai     Single          16,000/-
D              700    Gurpreet  Software Engineer  B. Tech.       Male    Mumbai     Married         12,000/-
E              600    Meena     Software Manager   MCA            Female  Mumbai     Single          13,000/-
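One way to picture the multi-list organisation for this data is a separate chain of links per attribute value, with each chain ordered by increasing Empid. The sketch below is an assumption-laden illustration: only the Empid, Job and City columns of the employee table are used, dictionaries stand in for the link fields stored in each record, and the function names are hypothetical.

```python
records = {
    "A": {"Empid": 800, "Job": "Software Engineer", "City": "New Delhi"},
    "B": {"Empid": 500, "Job": "Software Manager", "City": "New Delhi"},
    "C": {"Empid": 900, "Job": "Software Manager", "City": "Mumbai"},
    "D": {"Empid": 700, "Job": "Software Engineer", "City": "Mumbai"},
    "E": {"Empid": 600, "Job": "Software Manager", "City": "Mumbai"},
}

def build_multilist(records, attr):
    """Return (heads, links): the first record per value and next-record links."""
    heads, links, tails = {}, {}, {}
    # Link records in order of increasing primary key (Empid).
    for rec_no in sorted(records, key=lambda r: records[r]["Empid"]):
        value = records[rec_no][attr]
        links[rec_no] = None
        if value in tails:
            links[tails[value]] = rec_no  # extend the existing chain
        else:
            heads[value] = rec_no         # the index points at the chain head
        tails[value] = rec_no
    return heads, links

def traverse(heads, links, value):
    chain, current = [], heads.get(value)
    while current is not None:
        chain.append(current)
        current = links[current]
    return chain

job_heads, job_links = build_multilist(records, "Job")
city_heads, city_links = build_multilist(records, "City")
print(traverse(job_heads, job_links, "Software Manager"))  # ['B', 'E', 'C']
print(traverse(city_heads, city_links, "Mumbai"))          # ['E', 'D', 'C']
```

The index for each attribute stores only the head of a chain; the remaining records with the same value are reached by following links.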
Inverted-List File Organisation

Like the indexed-sequential storage method, the inverted list organization maintains an index. The two methods differ, however, in the index level and in record storage. The indexed-sequential method has multiple indexes for a given key, whereas the inverted list method has a single index for each key type. In an inverted list, records are not necessarily stored in a particular sequence. They are placed in the data storage area, and the indexes are updated with the record keys and locations.
For example, consider an employee table with the following fields: RECORD_NO, EMPID, NAME, LOCATION, SKILL and SALARY.

Record no.  Emp id  Name   Location  Skill       Salary
1           101     John   UP        accountant  25000
2           103     Ajay   TS        executive   30000
3           105     Vijay  MP        engineer    45000
4           109     Sam    TS        accountant  52000
5           111     Smith  UP        engineer    39000
6           122     Arun   MP        engineer    31500
7           130     James  UP        executive   18000
8           145     Ruth   UP        engineer    25000


Examples of inverted lists for the above table are:

1. Inverted list of skills

SKILL       ADDRESSES
ACCOUNTANT  1, 4
EXECUTIVE   2, 7
ENGINEER    3, 5, 6, 8

2. Inverted list of locations

LOCATION  ADDRESSES
UP        1, 5, 7, 8
TS        2, 4
MP        3, 6
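The two inverted lists above can be built mechanically from the employee table. The sketch below is illustrative: it keeps only the Location and Skill fields, uses record numbers as addresses, and the function name `inverted_list` is an assumption.

```python
from collections import defaultdict

employees = {
    1: {"Location": "UP", "Skill": "accountant"},
    2: {"Location": "TS", "Skill": "executive"},
    3: {"Location": "MP", "Skill": "engineer"},
    4: {"Location": "TS", "Skill": "accountant"},
    5: {"Location": "UP", "Skill": "engineer"},
    6: {"Location": "MP", "Skill": "engineer"},
    7: {"Location": "UP", "Skill": "executive"},
    8: {"Location": "UP", "Skill": "engineer"},
}

def inverted_list(records, attr):
    """Map each value of attr to the record numbers that contain it."""
    index = defaultdict(list)
    for record_no in sorted(records):
        index[records[record_no][attr]].append(record_no)
    return dict(index)

skills = inverted_list(employees, "Skill")
locations = inverted_list(employees, "Location")
print(skills["engineer"])  # [3, 5, 6, 8]
print(locations["UP"])     # [1, 5, 7, 8]

# A query on two keys intersects the two address lists.
print(sorted(set(skills["engineer"]) & set(locations["UP"])))  # [5, 8]
```

Answering "engineers located in UP" touches only the two index entries and then the two matching records, without scanning the data file.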
THANK YOU
