Unit 4

The document discusses different types of file organization in database management systems, including sequential, heap, hash, B+ tree, and clustered file organization. It explains how each type works, along with its advantages and disadvantages.

Uploaded by

Jagyandutta Naik
Copyright © All Rights Reserved

Input, output and form design

File organization and database design

File organization and database design are two important topics in database
management systems (DBMS).

What is File Organization?


File Organization refers to the logical relationships among various records that
constitute the file, particularly with respect to the means of identification and
access to any specific record. In simple terms, storing the files in a certain order is
called File Organization.
• A database holds a huge amount of data, stored across multiple files.
• Each file holds data in the form of records, so a file is simply a collection of records.
• A record can have more than one attribute; that is, a record is a collection of attributes (fields).
Objectives of File Organization
• It makes the selection (retrieval) of records faster.
• Operations such as inserting, deleting, and updating records become faster and easier.
• It helps prevent duplicate records from being inserted.
• It stores records efficiently at minimal cost.
Types of File Organizations
Common types of file organization are:
• Sequential File Organization
• Heap File Organization
• Hash File Organization
• B+ Tree File Organization
• Clustered File Organization
• ISAM (Indexed Sequential Access Method)

Sequential File Organization


Sequential File Organization stores records one after another, either in the order of insertion or sorted on a key. It is a simple and easy technique, but it has drawbacks: access to a specific record is slow, sorting costs extra time and space, and records are difficult to update or delete. There are two ways to implement Sequential File Organization:
• Pile File Method
• Sorted File Method
Pile File Method stores the records in the order of insertion, without any sorting. It
is suitable for applications that need to access all the records in a file, such as report
generation or statistical calculations.
Sorted File Method stores the records in a sorted order, based on some key
attribute. It is suitable for applications that need to access specific records based on
the key value, such as searching or indexing.
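The difference between the two methods can be sketched in Python. This is a minimal in-memory sketch, assuming each record is a (key, data) tuple and a Python list stands in for the file; the class names PileFile and SortedFile are hypothetical. The pile file only appends and must scan to find a key, while the sorted file pays extra work on insert so that lookups can use binary search.

```python
from bisect import bisect_left, insort
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # hypothetical record layout: (key, data)


class PileFile:
    """Pile File Method: records kept in insertion order, no sorting."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def insert(self, rec: Record) -> None:
        self.records.append(rec)  # cheap: just add at the end

    def find(self, key: int) -> Optional[Record]:
        for rec in self.records:  # must scan record by record
            if rec[0] == key:
                return rec
        return None


class SortedFile:
    """Sorted File Method: records kept sorted on the key attribute."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def insert(self, rec: Record) -> None:
        insort(self.records, rec)  # shifts later records to keep the order

    def find(self, key: int) -> Optional[Record]:
        i = bisect_left(self.records, (key,))  # binary search on the key
        if i < len(self.records) and self.records[i][0] == key:
            return self.records[i]
        return None


pile, srt = PileFile(), SortedFile()
for rec in [(30, "C"), (10, "A"), (20, "B")]:
    pile.insert(rec)
    srt.insert(rec)
print([r[0] for r in pile.records])  # [30, 10, 20]
print([r[0] for r in srt.records])   # [10, 20, 30]
```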

Heap File Organization

Heap file organization is a simple and basic type of file organization in DBMS. It works
with data blocks, where records are inserted at the end of the file without any sorting
or ordering. When a data block is full, a new record is stored in any other available
block. This makes insertion very efficient, but searching, updating, or deleting records
can be slow and time-consuming, as the entire file has to be scanned until the
requested record is found. Heap file organization is also known as unordered or pile
file organization.

Here is an example of how heap file organization works:

Suppose we have five records R1, R3, R6, R4 and R5 in a heap file, and we want to insert a new record R2. If data block 3 is full, R2 is inserted into any other block that has space, say data block 1.

| Data Block 1 | Data Block 2 | Data Block 3 |
|--------------|--------------|--------------|
| R1           | R3           | R6           |
| R2           | R4           | R5           |

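The insertion and search behaviour of a heap file can be sketched as follows. This is a minimal sketch, assuming a block capacity of two records (to match the table above) and a first-fit policy, which is only one way to choose "any other available block"; the HeapFile class is hypothetical.

```python
from typing import List, Optional

BLOCK_CAPACITY = 2  # assumed number of records per data block


class HeapFile:
    def __init__(self) -> None:
        self.blocks: List[List[str]] = []

    def insert(self, rec: str) -> int:
        """Place rec in the first block with free space; return its 1-based block number."""
        for i, block in enumerate(self.blocks):
            if len(block) < BLOCK_CAPACITY:
                block.append(rec)
                return i + 1
        self.blocks.append([rec])  # every block is full: allocate a new one
        return len(self.blocks)

    def find(self, rec: str) -> Optional[int]:
        """No ordering to exploit, so scan block after block."""
        for i, block in enumerate(self.blocks):
            if rec in block:
                return i + 1
        return None

    def delete(self, rec: str) -> bool:
        i = self.find(rec)  # deletion also needs the full scan first
        if i is None:
            return False
        self.blocks[i - 1].remove(rec)  # the freed slot can be reused later
        return True


heap = HeapFile()
heap.blocks = [["R1"], ["R3", "R4"], ["R6", "R5"]]  # state before inserting R2
print(heap.insert("R2"))  # 1: block 3 is full, block 1 has room
```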
Hash File Organization

Hash file organization is a method of storing and accessing records in a database using
a hash function. A hash function takes a value of an attribute or a set of attributes,
called the hash key, and maps it to the address of a disk block, called the hash bucket,
where the record is stored. This allows for direct and fast access to records without
using an index structure.

However, hash file organization also has some drawbacks, such as:

• It is difficult to support range queries, because the records are not stored in sorted order.
• Bucket overflow can occur when more records hash to the same bucket than the bucket can hold. This can be handled by using overflow buckets, chaining, or rehashing.
• Space utilization may be poor if the hash function does not distribute the records evenly among the buckets.

Here is an example of how hash file organization works:

Suppose we have a table of students with the following schema:

Student (ID, Name, Age, GPA)

We use the ID attribute as the hash key and apply the hash function ID mod 5 to generate the bucket address (0 to 4). Bucket address k is stored in Data Block k + 1. For example, if ID = 104, the bucket address is 104 mod 5 = 4, which is Data Block 5.

| Data Block 1 | Data Block 2 | Data Block 3 | Data Block 4 | Data Block 5 |
|--------------|--------------|--------------|--------------|--------------|
| ID = 100     | ID = 101     | ID = 102     | ID = 103     | ID = 104     |
| Name = Alice | Name = Bob   | Name = Carol | Name = David | Name = Eve   |
| Age = 20     | Age = 21     | Age = 19     | Age = 22     | Age = 18     |
| GPA = 3.5    | GPA = 3.2    | GPA = 3.8    | GPA = 3.4    | GPA = 3.6    |

If we want to insert a new record with ID = 105, Name = Frank, Age = 20, and
GPA = 3.7, then the bucket address is 105 mod 5 = 0. Since data block 1 is
not full, we can insert the record there.

| Data Block 1 | Data Block 2 | Data Block 3 | Data Block 4 | Data Block 5 |
|--------------|--------------|--------------|--------------|--------------|
| ID = 100     | ID = 101     | ID = 102     | ID = 103     | ID = 104     |
| Name = Alice | Name = Bob   | Name = Carol | Name = David | Name = Eve   |
| Age = 20     | Age = 21     | Age = 19     | Age = 22     | Age = 18     |
| GPA = 3.5    | GPA = 3.2    | GPA = 3.8    | GPA = 3.4    | GPA = 3.6    |
| ID = 105     |              |              |              |              |
| Name = Frank |              |              |              |              |
| Age = 20     |              |              |              |              |
| GPA = 3.7    |              |              |              |              |

If we want to search for the record with ID = 103, then the bucket address
is 103 mod 5 = 3. We can directly go to data block 4 and retrieve the record.

If we want to delete the record with ID = 102, then the bucket address is
102 mod 5 = 2. We can directly go to data block 3 and remove the record.

If we want to update the record with ID = 104, then the bucket address is
104 mod 5 = 4. We can directly go to data block 5 and modify the record.
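The example above can be sketched in Python using the same hash function, ID mod 5. This is a minimal sketch, assuming buckets have unlimited capacity (so overflow handling is left out) and that records are plain dictionaries; HashFile and bucket_of are hypothetical names. Every operation computes the bucket first and touches only that one bucket.

```python
from typing import Dict, List, Optional

NUM_BUCKETS = 5


def bucket_of(student_id: int) -> int:
    """Hash function from the example: bucket address = ID mod 5."""
    return student_id % NUM_BUCKETS


class HashFile:
    def __init__(self) -> None:
        # Bucket k corresponds to "Data Block k + 1" in the tables above.
        self.buckets: List[List[Dict]] = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, record: Dict) -> int:
        b = bucket_of(record["ID"])
        self.buckets[b].append(record)  # no capacity limit in this sketch
        return b

    def search(self, student_id: int) -> Optional[Dict]:
        for rec in self.buckets[bucket_of(student_id)]:  # read one bucket only
            if rec["ID"] == student_id:
                return rec
        return None

    def delete(self, student_id: int) -> bool:
        bucket = self.buckets[bucket_of(student_id)]
        for i, rec in enumerate(bucket):
            if rec["ID"] == student_id:
                del bucket[i]
                return True
        return False

    def update(self, student_id: int, **changes) -> bool:
        rec = self.search(student_id)
        if rec is None:
            return False
        rec.update(changes)  # modify the record in place inside its bucket
        return True


students = HashFile()
rows = [(100, "Alice", 20, 3.5), (101, "Bob", 21, 3.2), (102, "Carol", 19, 3.8),
        (103, "David", 22, 3.4), (104, "Eve", 18, 3.6)]
for sid, name, age, gpa in rows:
    students.insert({"ID": sid, "Name": name, "Age": age, "GPA": gpa})
print(students.insert({"ID": 105, "Name": "Frank", "Age": 20, "GPA": 3.7}))  # 0
print(students.search(103)["Name"])  # David
```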

B+ Tree File Organization


o B+ tree file organization is an advanced form of the indexed sequential access method. It uses a tree-like structure to store records in a file.
o It uses the same key-index concept, where the primary key is used to sort the records. For each primary key, an index value is generated and mapped to the record.
o The B+ tree is similar to a binary search tree (BST), but a node can have more than two children. In this method, all the records are stored only in the leaf nodes. Intermediate nodes act as pointers to the leaf nodes and do not contain any records.
The above B+ tree shows that:

o There is one root node of the tree, i.e., 25.
o There is an intermediate layer of nodes. They do not store the actual records; they hold only pointers to the leaf nodes.
o The intermediate node to the left of the root holds a key smaller than the root key, and the node to the right holds a larger key, i.e., 15 and 30 respectively.
o There is a single leaf level, which holds only the values, i.e., 10, 12, 17, 20, 24, 27 and 29.
o Searching for any record is easy because all the leaf nodes are at the same depth (the tree is balanced).
o Any record can be reached by traversing a single path from the root to a leaf.
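The single-path search and the linked leaf level can be sketched in Python. The figure is not reproduced here, so this is a hypothetical two-level tree that reuses the leaf values 10 to 29 under a root with separator keys 17 and 27; it is not the exact shape described above. The sketch covers only searching, not the node splits and merges a real B+ tree performs on insertion and deletion. Internal nodes hold only separator keys, and the leaves are chained so that a range query can walk sideways.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]                 # the actual values live only in leaves
    next: Optional["Leaf"] = None   # pointer to the next leaf (sequential access)


@dataclass
class Internal:
    keys: List[int]                          # separator keys only, no records
    children: List[Union["Internal", Leaf]]


Node = Union[Internal, Leaf]


def find_leaf(node: Node, key: int) -> Leaf:
    """Follow one root-to-leaf path, choosing a child at each level."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root: Node, key: int) -> bool:
    return key in find_leaf(root, key).keys


def range_search(root: Node, low: int, high: int) -> List[int]:
    """Descend once to the leaf holding `low`, then follow the leaf chain."""
    leaf: Optional[Leaf] = find_leaf(root, low)
    result: List[int] = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


# Hypothetical tree: root [17, 27] over leaves [10, 12] -> [17, 20, 24] -> [27, 29]
l1, l2, l3 = Leaf([10, 12]), Leaf([17, 20, 24]), Leaf([27, 29])
l1.next, l2.next = l2, l3
root = Internal([17, 27], [l1, l2, l3])

print(search(root, 20))             # True
print(range_search(root, 11, 25))   # [12, 17, 20, 24]
```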

B+ Tree File Organization has some advantages, such as:

• Searching is easy and fast, because the records are sorted and any record can be reached by traversing a single path in the tree.
• The tree grows or shrinks dynamically as the number of records increases or decreases.
• The tree stays balanced after insertions and deletions, so performance does not degrade as the file changes.

B+ Tree File Organization also has some disadvantages, such as:

• It is inefficient for static files, where the records rarely change.
• It requires extra space for storing the pointers and the index values.

Clustered File Organization


o When records from two or more tables are stored in the same file, the arrangement is known as a cluster. Such a file holds two or more tables in the same data block, and the key attributes used to map these tables together are stored only once.
o This method reduces the cost of searching for related records in different files.
o Cluster file organization is used when tables are frequently joined on the same condition and these joins return only a few records from both tables. In the given example, we retrieve the records for only particular departments. This method is not meant for retrieving the records of every department at once.
In this method, we can directly insert, update, or delete any record. Data is sorted on the key used for searching. The cluster key is the key on which the tables are joined.
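The idea of storing related rows of two tables together, with the cluster key kept once, can be sketched in Python. This is a minimal in-memory sketch, assuming hypothetical EMPLOYEE and DEPARTMENT tables clustered on DEP_ID with made-up rows; a dictionary per department stands in for a shared data block. Joining the tables for one department then means reading a single cluster.

```python
from typing import Dict, List, Tuple

# Hypothetical rows: DEPARTMENT(DEP_ID, DEP_NAME), EMPLOYEE(EMP_ID, EMP_NAME, DEP_ID)
departments = [(14, "HR"), (15, "Sales")]
employees = [(1, "Asha", 14), (2, "Ravi", 15), (3, "Meena", 14)]


def build_clusters(depts, emps) -> Dict[int, Dict]:
    """Group rows of both tables under the cluster key DEP_ID.

    The key is stored once per cluster instead of in every employee row.
    """
    clusters: Dict[int, Dict] = {}
    for dep_id, dep_name in depts:
        clusters[dep_id] = {"DEP_NAME": dep_name, "EMPLOYEES": []}
    for emp_id, emp_name, dep_id in emps:
        clusters[dep_id]["EMPLOYEES"].append((emp_id, emp_name))  # DEP_ID not repeated
    return clusters


def join_for_department(clusters: Dict[int, Dict], dep_id: int) -> List[Tuple[int, str, str]]:
    """Join EMPLOYEE and DEPARTMENT for one department by reading one cluster."""
    cluster = clusters[dep_id]
    return [(emp_id, emp_name, cluster["DEP_NAME"])
            for emp_id, emp_name in cluster["EMPLOYEES"]]


clusters = build_clusters(departments, employees)
print(join_for_department(clusters, 14))  # [(1, 'Asha', 'HR'), (3, 'Meena', 'HR')]
```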

Advantages:

• It makes joining faster and easier, as the related records are stored together and the key attributes are stored only once.
• It can handle dynamic changes in the number and size of records.

Disadvantages:

• It is not suitable for static files, where the records do not change often.
• It requires extra space for storing the pointers and the index values.

Database design

Database design is the process of organizing data according to a database model. It involves determining what data must be stored and how the data elements interrelate. With this information, the designer can begin to fit the data to the database model. A good database design can improve the performance, consistency, security, and scalability of the database system. Database design has several steps, such as:

• Identifying the purpose and scope of the database
• Listing the entities and attributes of the data
• Defining the relationships and constraints among the data
• Normalizing the data to reduce redundancy and anomalies
• Choosing a suitable database management system (DBMS)
• Implementing the physical design of the database

File structure
A file structure is a combination of representations for data in files, together with the operations for accessing that data. It enables applications to read, write, and modify data. File structures may also help to find the data that matches certain criteria. Improving the file structure can make an application dramatically faster, in some cases hundreds of times faster.

A good file structure should:

• Provide fast access to a large amount of data
• Reduce the number of disk accesses
• Manage growth by splitting collections of records as they grow
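Why the number of disk accesses matters can be shown with a small calculation. This sketch assumes a hypothetical file of 1,000,000 records stored 100 per block, and counts block reads only: an unordered file needs on average half the blocks for a successful linear search, while a file sorted on the search key needs at most about log2(blocks + 1) block reads with binary search.

```python
import math


def block_accesses(num_records: int, records_per_block: int) -> dict:
    """Estimate block reads to find one record (assumed, simplified cost model)."""
    blocks = math.ceil(num_records / records_per_block)
    return {
        "blocks": blocks,
        # Unordered file: a successful linear scan reads half the blocks on average.
        "linear_avg": blocks / 2,
        # File sorted on the key: binary search over the blocks, worst case.
        "binary_worst": math.ceil(math.log2(blocks + 1)),
    }


print(block_accesses(1_000_000, 100))
# {'blocks': 10000, 'linear_avg': 5000.0, 'binary_worst': 14}
```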

It is relatively easy to develop file structure designs that meet these goals when the files
never change. However, as files change, grow, or shrink, designing file structures that
can have these qualities is more difficult.

Database design

1. Database designs provide the blueprints of how the data is going to be stored in
a system. A proper design of a database highly affects the overall performance
of any application.
2. The designing principles defined for a database give a clear idea of the
behaviour of any application and how the requests are processed.
3. A proper database design also ensures that all the requirements of users are met.
4. Lastly, the processing time of an application is greatly reduced when the constraints of an efficient database design are properly implemented.

Life Cycle
Requirement Analysis

First, planning must establish the basic requirements of the project on which the database design will be based. These requirements are defined in the following stages:

Planning - This stage is concerned with planning the entire DDLC (Database
Development Life Cycle). The strategic considerations are taken into account before
proceeding.

System definition - This stage defines the boundaries and scope of the database after planning.

Database Designing

The next step is designing the database around the user requirements, splitting the design into separate models so that no single model carries the whole load or all the dependencies. This model-centric approach is where the logical and physical models play a crucial role.

Physical Model - The physical model is concerned with the practices and
implementations of the logical model.

Logical Model - This stage is primarily concerned with developing a model based on the proposed requirements. The entire model is designed on paper, without any implementation or DBMS-specific considerations.

Implementation

The last step covers the implementation methods and checks that the behaviour matches our requirements. This is ensured by continuously testing the database with different data sets and by converting the data into a machine-understandable form. This step focuses mainly on data manipulation: queries are run to check whether the application has been designed satisfactorily.
Data conversion and loading - This stage imports and converts data from the old system to the new one.

Testing - This stage is concerned with identifying errors in the newly implemented system. Testing is a crucial step because it checks the database directly and compares it against the requirement specifications.

Objective of database

The objective of a database is to provide a systematic way of storing, organizing, and retrieving data that can serve many applications efficiently and reliably. Some of the specific objectives of a database are:
Mass Storage
A DBMS can store a very large amount of data, which makes it an ideal technology for large organizations. It can store huge numbers of records, and all that data can be fetched whenever it is needed.

Removes Duplicity
When there is a lot of data, duplicate entries are likely to creep in. A DBMS can prevent duplicate records through constraints such as primary keys and unique keys: when a new record is stored, the DBMS checks that the same key value has not been inserted before.

To reduce data redundancy and inconsistency by storing data in a single place and
enforcing rules and constraints on the data.
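As a minimal sketch of a DBMS rejecting a duplicate, the following uses Python's built-in sqlite3 module with a throwaway in-memory database; the student table is hypothetical. The duplicate is rejected only because the table declares a primary key; without that constraint, both rows would be stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student VALUES (?, ?)", (104, "Eve"))

try:
    # Same primary key again: the DBMS refuses to store the duplicate.
    conn.execute("INSERT INTO student VALUES (?, ?)", (104, "Eve"))
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(conn.execute("SELECT COUNT(*) FROM student").fetchone()[0])  # 1
```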

Multiple Users Access

No one handles the whole database alone; many users can access it. Two or more users may access the database at the same time and make changes, and the DBMS ensures that they can work concurrently without interfering with each other.

To facilitate data access and manipulation by using a standard query language (such as SQL) and providing various functions and tools for data processing.

To enable data sharing and collaboration by allowing multiple users and applications to access and modify the data concurrently and consistently.
Data Protection
Information such as bank details, employees' salary details, and sales and purchase details should always be kept secure, and every company needs to protect its data from unauthorized use. A DBMS provides strong security for its data: no one can alter or modify the information without the privilege to do so.

To protect data security and privacy by implementing mechanisms to control the access and usage of the data and prevent unauthorized or malicious actions.

Data Back-up and recovery

Database failures do happen, and losing all the data is not an acceptable outcome. There must be a backup of the database so that it can be recovered after a failure. A DBMS has the ability to back up and recover all the data in the database.

It thus supports data durability and recovery by creating backups and logs of the data and restoring the data in case of system failures or crashes.
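A backup-and-restore cycle can be sketched with Python's sqlite3 module, whose Connection.backup method (Python 3.7 and later) copies a whole database into another connection. The account table and values are hypothetical, and both databases are in memory only to keep the sketch self-contained; a real backup would be written to separate storage.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
live.execute("INSERT INTO account VALUES (1, 500)")
live.commit()

# In practice this target would be a file on separate storage.
backup = sqlite3.connect(":memory:")
live.backup(backup)  # copy the entire database into the backup connection

live.close()  # simulate losing the live database
print(backup.execute("SELECT balance FROM account WHERE id = 1").fetchone())  # (500,)
```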

Integrity
Integrity means your data is authentic and consistent. A DBMS applies various validity checks that keep the data accurate and consistent.

It thus ensures data integrity and quality by maintaining the accuracy and validity of the data and preventing data anomalies and errors.
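Validity checks of this kind can be declared as constraints so that the DBMS enforces them on every insert. This sketch uses Python's sqlite3 module with a hypothetical student table; the rules (age must be positive, GPA must lie between 0.0 and 4.0) are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id  INTEGER PRIMARY KEY,
        age INTEGER CHECK (age > 0),
        gpa REAL    CHECK (gpa BETWEEN 0.0 AND 4.0)
    )
""")
conn.execute("INSERT INTO student VALUES (101, 21, 3.2)")  # valid row is stored

for bad_row in [(102, -5, 3.0), (103, 20, 9.9)]:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError:
        print("rejected", bad_row)  # the CHECK constraint blocks invalid data

print(conn.execute("SELECT COUNT(*) FROM student").fetchone()[0])  # 1
```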

Everyone can work on DBMS

There is no need to master a programming language to work with a DBMS. Even an accountant with little technical knowledge can work with one: the definitions and descriptions are provided in it, so a person from a non-technical background can use it.

Platform Independent
A DBMS can run on many platforms; no particular platform is required to work with a database management system.
Normalization

A large database defined as a single relation may result in data duplication. This
repetition of data may result in:

o Making relations very large.
o It isn't easy to maintain and update data, as it would involve searching many records in the relation.
o Wastage and poor utilization of disk space and resources.
o The likelihood of errors and inconsistencies increases.

So to handle these problems, we should analyze relations that contain redundant data and decompose them into smaller, simpler, well-structured relations that satisfy desirable properties. Normalization is the process of decomposing relations into relations with fewer attributes.

What is Normalization?

o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate undesirable characteristics like insertion, update, and deletion anomalies.
o Normalization divides larger tables into smaller ones and links them using relationships.
o Normal forms are used to reduce redundancy in database tables.

Types of Normal Forms:

Normalization works through a series of stages called normal forms. The normal forms apply to individual relations. A relation is said to be in a particular normal form if it satisfies that form's constraints.

Following are the various types of normal forms:

| Normal Form | Description |
|-------------|-------------|
| 1NF | A relation is in 1NF if every attribute contains only atomic (indivisible) values. |
| 2NF | A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key. |
| 3NF | A relation is in 3NF if it is in 2NF and no transitive dependency exists. |
| BCNF | Boyce-Codd normal form is a stronger version of 3NF: for every non-trivial functional dependency X → Y, X must be a superkey. |
| 4NF | A relation is in 4NF if it is in Boyce-Codd normal form and has no non-trivial multi-valued dependency. |
| 5NF | A relation is in 5NF if it is in 4NF and contains no non-trivial join dependency that is not implied by its candidate keys; joining its decompositions should be lossless. |
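Decomposing a relation and checking that the join is lossless can be sketched with Python sets. This is a hypothetical example: ENROLLMENT(student_id, course_id, student_name, course_title) with key (student_id, course_id), where student_name depends on student_id alone and course_title on course_id alone. Those partial dependencies violate 2NF, so the relation is split along them, and a natural join of the pieces should give back exactly the original rows.

```python
from typing import Set, Tuple

# Hypothetical 1NF relation with key (student_id, course_id) and partial dependencies.
enrollment: Set[Tuple[int, str, str, str]] = {
    (1, "DB", "Asha", "Databases"),
    (1, "OS", "Asha", "Operating Systems"),
    (2, "DB", "Ravi", "Databases"),
}

# Decompose by projecting onto each dependency.
student = {(sid, name) for sid, _, name, _ in enrollment}    # student_id -> student_name
course = {(cid, title) for _, cid, _, title in enrollment}   # course_id -> course_title
enroll = {(sid, cid) for sid, cid, _, _ in enrollment}       # the original key

# Each name and title is now stored once instead of once per enrollment.
print(len(student), len(course), len(enroll))  # 2 2 3

# Natural join of the projections. A dict lookup is valid here because
# student_id determines student_name and course_id determines course_title.
names, titles = dict(student), dict(course)
rejoined = {(sid, cid, names[sid], titles[cid]) for sid, cid in enroll}
print(rejoined == enrollment)  # True: the decomposition is lossless
```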

Advantages of Normalization

o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.

Disadvantages of Normalization

o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal
forms, i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious
problems.
