Module 5 File Organization 1

The document discusses different file organization techniques used in database management systems. It describes heap file organization, sequential file organization, and indexed sequential file organization. Key aspects like structure, advantages, and disadvantages of each technique are explained.

Uploaded by

CSE-41-Rituparna Meher

FILE ORGANIZATION

File Organization
A file organization is a way of arranging the records in a file when the file is stored on
secondary storage (disk, tape, etc.).

The different ways of arranging the records enable different operations to be carried out efficiently
over the file.

A database management system supports several file organization techniques. One of the most important
tasks of the DBA is to choose the best organization for each file, based on its use.
The organization of records in a file is influenced by a number of factors that must be taken into
consideration when choosing a particular technique. These factors are:
a) fast retrieval, update and transfer of records,
b) efficient use of disk space,
c) high throughput,
d) type of use,
e) efficient manipulation,
f) security from unauthorized access,
g) scalability,
h) reduction in cost,
i) protection from failure.
File & Record
• A file is a sequence of related records.
• A collection of field names and their corresponding data types constitutes a record type.
• A data type, associated with each field, specifies the types of values a field can take. All records
in a file are of the same record type.
Records and Record Types
Data is generally stored in the form of records. A record is a collection of fields or data items,
and each data item is formed of one or more bytes. Each record has a unique identifier called the
record-id. The records in a file are one of the following two types:
(i) Fixed-length records.
(ii) Variable-length records.
Fixed Length Record
• Each record in the file has exactly the same size.
• The record slots are uniform and are arranged contiguously in the file.
Advantage
1. Insertion and deletion of records in the file are simple to implement, since the space made
available by a deleted record is the same as that needed to insert a new record.
Disadvantages
1. Because the record length is fixed, space is wasted whenever the data is shorter than the slot.
For example, if the length is set to 50 characters and most of the records are shorter than 25
characters, more than half of the allocated space is wasted.
2. It is an inflexible approach. For example, if the length of the record must be increased,
then major changes in the program and the database are needed.
Variable Length Record
• Records in the file need not be of the same size, so the file can hold records of
different sizes.
Advantages
1. It reduces manual mistakes, as the database automatically adjusts the size of each record.
2. It saves a lot of memory space when record lengths vary.
3. It is a flexible approach, since future enhancements are easy to implement.
Disadvantage
1. It increases the overhead of the DBMS, because the database has to keep track of the size of
every record.
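The trade-off between the two record types can be sketched in a few lines of Python. This is only an illustration under assumed names: the layout (a 4-byte record id plus a name field), the 50-byte slot width, and the helper functions are hypothetical and not taken from any particular DBMS.

```python
import struct

# Hypothetical layout: a 4-byte record id followed by a name field.
FIXED = struct.Struct("<I50s")     # fixed-length: the name always occupies 50 bytes
VAR_HEADER = struct.Struct("<IH")  # variable-length: id + 2-byte length of the name


def pack_fixed(rec_id: int, name: str) -> bytes:
    """Every record is FIXED.size bytes; short names are padded with zero bytes."""
    return FIXED.pack(rec_id, name.encode("utf-8"))


def read_fixed(buf: bytes, slot: int) -> tuple[int, str]:
    """Slot i starts at i * FIXED.size, so a record can be located by arithmetic."""
    rec_id, raw = FIXED.unpack_from(buf, slot * FIXED.size)
    return rec_id, raw.rstrip(b"\x00").decode("utf-8")


def pack_variable(rec_id: int, name: str) -> bytes:
    """Only the bytes actually needed are stored, plus a small length header."""
    data = name.encode("utf-8")
    return VAR_HEADER.pack(rec_id, len(data)) + data


def read_variable(buf: bytes, offset: int) -> tuple[int, str, int]:
    """Return (id, name, offset of the next record); the stored size is read first."""
    rec_id, length = VAR_HEADER.unpack_from(buf, offset)
    start = offset + VAR_HEADER.size
    return rec_id, buf[start:start + length].decode("utf-8"), start + length
```

With the names "Ann" and "Bartholomew", the fixed-length file takes 54 bytes per record regardless of content, while the variable-length file stores only the header and the name bytes but must read each length header to find where the next record begins.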
Types Of Files
The following three types of files are used in database systems :
1. Master file
2. Transaction file
3. Report file.

1. Master file: This file contains information of a permanent nature about the entities. The master
file acts as a source of reference data for processing transactions. It accumulates
information based on the transaction data.

2. Transaction file: This file contains records that describe the activities carried out by the
organization. This file is created as a result of processing transactions and preparing transaction
documents. It is also used to update the master file permanently.

3. Report file: This file is created by extracting data from different records to prepare a report,
e.g. a report file about the weekly sales of a particular item.
File Organization Techniques
Different types of file organization are used by applications. The operations to be performed
and the selection of the storage device are the major factors that influence the choice of a
particular file organization. The different types of file organization are as follows:
1. Heap file organization
2. Sequential file organization
3. Indexed sequential file organization
4. Hashing or direct file organization
Heap File Organization
• A heap file is an unordered set of records.
• Heap file organization is the most basic form of file organization. Records are stored in data
blocks (pages) with no ordering among them.
• In such an organization, records are stored in the file in the order in which they are inserted, and
new records are always placed at the end of the file.
• Every record in the file has a unique id, and every page in the file is the same size. It is the
DBMS's responsibility to store and manage new records.
• Inserting a new record
The insertion of a new record is very efficient. It is performed in the following steps:
• The last disk block of the file is copied into a buffer.
• The new record is added.
• The block in the buffer is then written back to the disk.

Let's consider a scenario where we have five records in a heap file, R1, R3, R6, R4, and R5,
and we need to add a new record, R2. If data block 3 is full, the DBMS places R2 in
a data block that it selects, such as data block 1.
• Accessing a record: in this method, we must traverse the file from the beginning until we reach
the requested record. Fetching records from very large tables is therefore time consuming,
because the records are neither sorted nor ordered and, in the worst case, all the data must be checked.

• Deleting or updating a record: first we need to search for the record. Searching works the same
way as retrieval: start from the beginning of the file and scan until the record is found. A small
file can be searched quickly, but the larger the file, the more time is spent
fetching.

• In addition, when a record is deleted, it is removed from the data block, but its space is
not freed and cannot be reused. Hence, as the number of records grows, the space occupied by the
file also grows and efficiency decreases. For the database to perform well, the DBA has to free
this unused space periodically.
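The insert, search, and delete behaviour described above can be sketched as a toy in-memory heap file. The class name `HeapFile`, the block capacity of 2, and the deleted flag are assumptions made for illustration. Unlike the slide example, this sketch always appends to the last block and does not model the DBMS choosing some other block with free space.

```python
class HeapFile:
    """Toy in-memory heap file: a list of fixed-capacity blocks (lists of records)."""

    def __init__(self, block_capacity=2):
        self.block_capacity = block_capacity
        self.blocks = [[]]

    def insert(self, key, value):
        """Append to the last block; start a new block when the last one is full."""
        if len(self.blocks[-1]) >= self.block_capacity:
            self.blocks.append([])
        self.blocks[-1].append([key, value, False])  # False = not deleted

    def search(self, key):
        """Linear scan from the first block; returns (value, records examined)."""
        examined = 0
        for block in self.blocks:
            for rec in block:
                examined += 1
                if rec[0] == key and not rec[2]:
                    return rec[1], examined
        return None, examined

    def delete(self, key):
        """Mark the record as deleted; its slot is not freed or reused."""
        for block in self.blocks:
            for rec in block:
                if rec[0] == key and not rec[2]:
                    rec[2] = True
                    return True
        return False

    def slots_used(self):
        """Number of slots occupied, including slots held by deleted records."""
        return sum(len(block) for block in self.blocks)

    def compact(self):
        """Periodic reorganization: rewrite only the live records into fresh blocks."""
        live = [rec for block in self.blocks for rec in block if not rec[2]]
        self.blocks = [[]]
        for key, value, _ in live:
            self.insert(key, value)
```

Inserting R1, R3, R6, R4, R5 and then R2 places R2 last, so finding it examines every record. Deleting a record leaves `slots_used()` unchanged until `compact()` is run, which mirrors the periodic clean-up the DBA has to perform.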
Heap File Organization
Advantages
1. Insertion of a new record is fast and efficient.
2. The fill factor of this file organization can reach 100%, because records are packed into blocks in arrival order.
3. Space is fully utilized as long as records are not deleted.

Disadvantages
1. Searching for and accessing a record is slow.
2. Deleting many records wastes space.
3. The cost of updating data is comparatively high.
Sequential File Organization
• In sequential file organization, records are stored in sequential order according to the "search
key".
• A search key is an attribute or a set of attributes used to order the records.
• The search key need not be the primary key.
• It is the simplest method of file organization.
• The sequential method is based on the tape model. Devices that support sequential access include
magnetic tapes, cassettes, and card readers. Editors and compilers also use this approach to
access files.
• The structure of a sequential file is shown in the figure.
• The records are stored in sequential order, one after another.
• Pointers are used to reach the next record from any record, which allows
fast retrieval of records.
• Sequential file organization is a type of file organization used in database management
systems (DBMS) where data is stored sequentially in a file or table.
• In this method, data is accessed sequentially in the order in which it is stored, starting from the
beginning of the file and proceeding towards the end.
• In a sequential file, records are stored one after the other, without any index. This
makes it easy to add new records at the end of the file, but searching for a specific record can be
slow and inefficient because the system may have to search through the entire file to find the desired
record.

• Sequential file organization is suitable for situations where data is accessed in a serial manner,
such as batch processing or generating reports. It is also used in situations where data is not
frequently updated, as adding or deleting a record can cause the entire file to be rewritten.

• However, this method is not efficient for situations that require frequent updates or random
access to data. To improve the efficiency of accessing data in a sequential file, various
techniques such as buffering, caching, and memory-mapped files can be used.
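The cost profile described above can be sketched with a sorted list standing in for the file. The function names and the sample keys are hypothetical. The second function counts how many existing records sit after the insertion point, which is a rough proxy for how much of the file would have to be rewritten.

```python
import bisect


def sequential_search(records, key):
    """Scan from the start; stop early once keys exceed the target (file is sorted).

    Returns (value or None, number of records examined).
    """
    examined = 0
    for k, value in records:
        examined += 1
        if k == key:
            return value, examined
        if k > key:
            break
    return None, examined


def insert_sorted(records, key, value):
    """Return (new file contents, number of existing records that had to move)."""
    keys = [k for k, _ in records]
    pos = bisect.bisect_left(keys, key)
    new_records = records[:pos] + [(key, value)] + records[pos:]
    return new_records, len(records) - pos
```

Appending a record whose key is larger than every existing key moves nothing, while inserting near the front shifts almost the whole file, which is why this organization suits batch processing better than frequent updates.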
Sequential File Organization
Advantages
1. It is easy to understand.
2. It is an efficient file system for small files.
3. Construction and reconstruction are much easier than for other file organizations.
4. It supports tape media, editors, and compilers.
5. It contains sorted records.
Disadvantages
1. It is an inefficient file system for medium and large files.
2. Updates and maintenance are not easy.
3. Storage space is used inefficiently because of fixed-size blocks.
4. Linear search takes more time.
5. Before an update is applied, all transactions must be stored sequentially.
Indexed Sequential File Organization
• Indexed sequential file organization is used to overcome the disadvantages of sequential file
organization.
• It also preserves the advantages of sequential access.
• This organization enables fast searching of records with the use of an index.
• Some basic terms associated with indexed sequential file organization are as follows:
  • Block: A block is a unit of storage in which records are saved.
  • Index: An index is a table with a search key by which the block of a record can be found.
  • Pointer: A pointer is a variable that points from an index entry to the starting address of a block.
Components Of Indexed Sequential File Organization
Data File: The records in the data file are stored in sorted order based on a key field.
Index File: The index file contains index entries, which are pointers to blocks (or records) in the
data file. These index entries are also sorted based on the key.
Overflow Area: This is used to store new records that are inserted but cannot fit into the primary
data area. The overflow area is also organized in a sorted manner.

Operations of ISAM (Indexed Sequential Access Method)
Searching: When searching for a record, the system first looks in the index to find the disk block
that contains the record. Then that specific block is fetched to retrieve the record. This usually takes
two disk I/O operations, one for the index and one for the data block.
Insertion: If a new record needs to be inserted, it is placed in its correct sorted position. If the block
is full, the record is placed in the overflow area.
Deletion: Deleting a record involves marking it as deleted. The space is then reclaimed during a
subsequent reorganization of the data file.
Updating: Updating a record involves either modifying it in place if the key is not changed, or deleting and
re-inserting it if the key is modified.
Example of ISAM
Let's consider a simplified library database where we have a file containing book information sorted
by ISBN numbers.

Data File              Index File
ISBN   TITLE           ISBN   Block Address
123    Book A          123    1
234    Book B          234    2
345    Book C          345    3
456    Book D          456    4

Searching for ISBN 345

1. Search in the index for ISBN 345, and find that it is located at block 3.
2. Go to block 3 in the data file to retrieve the record for ISBN 345 ("Book C").

Inserting a new book with ISBN 133
1. The new book would ideally go into block position 1 (between ISBN 123 and 234).
2. Since the block is full, the new record with ISBN 133 ("New Book") will go into the overflow area.
3. An index entry for ISBN 133 pointing to the overflow area is added.

Updated Overflow Area

ISBN   TITLE
133    New Book
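The library example can be traced with a small sketch. The class `IsamFile` and its methods are hypothetical, and the sketch simplifies two things: blocks are numbered from 0 rather than 1, and instead of adding an index entry for an overflow record it simply searches the overflow area after the primary block.

```python
import bisect


class IsamFile:
    """Toy indexed sequential file: sorted blocks, a static index, an overflow area."""

    def __init__(self, blocks, block_capacity):
        self.blocks = [list(block) for block in blocks]    # each block sorted by key
        self.block_capacity = block_capacity
        self.index = [block[0][0] for block in self.blocks]  # first key of each block
        self.overflow = []                                   # kept sorted by key

    def _block_for(self, key):
        """Use the index to pick the block whose first key is <= key."""
        pos = bisect.bisect_right(self.index, key) - 1
        return max(pos, 0)

    def search(self, key):
        """Index lookup, then one block read, then the overflow area if needed."""
        for k, value in self.blocks[self._block_for(key)]:
            if k == key:
                return value
        for k, value in self.overflow:
            if k == key:
                return value
        return None

    def insert(self, key, value):
        """Place the record in its sorted position, or in overflow if the block is full."""
        block = self.blocks[self._block_for(key)]
        if len(block) < self.block_capacity:
            bisect.insort(block, (key, value))
            return "primary"
        bisect.insort(self.overflow, (key, value))
        return "overflow"
```

Built with one book per block and a capacity of 1, searching for ISBN 345 goes straight to the third block, and inserting ISBN 133 finds the first block full and falls back to the overflow area, as in the example above.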
Components Of Indexed Sequential File Organization
Advantages
1. Efficient file system for medium and large files.
2. Easy to update.
3. Easier to update than direct files.
4. Efficient use of storage.
5. Searching of records is fast.
6. Maintains the advantages of a sequential file.

Disadvantages
1. Inefficient file system for small files.
2. It is an expensive method.
3. More complex structure than a sequential file.
4. Indexes need additional storage space.
5. Performance degrades as files grow.
Hash File Organization
• Hash file organization is a type of file organization where data is stored in a file or table using a
hash function.
• A hash function is a mathematical function that converts a key value into a hash code, which is
used to map the key to the location in the file or table where the data is stored.

[Figure: a hash function maps the key field to the address of the record.]
• Any mathematical function that maps a key to a valid block address can be used as a hash function. It can be simple or complex.

• The hash function is applied to columns or attributes to get the block address. The records are
stored randomly, so this is also known as direct or random file organization.

• If the hash function is applied to a column that is a key, that column is called
the hash key. If it is applied to a non-key column, that column is called the hash column.

• Hash file organization is suitable for situations where data needs to be accessed quickly and
efficiently based on the value of its key, and where the data is not frequently updated. However,
if the hash function is poorly designed or if there are many collisions (where two different keys map to
the same location), performance and efficiency suffer.
• In the diagram above, a hash table maps the hashed keys to a specific data block in memory.
Data blocks are represented as rectangular boxes, with each box containing one or more data
records. Each data record is shown as a separate row in the data block.

• When a new data record is inserted into the hash file, the hash function is used to hash its key,
and the resulting hashed key is used to determine which data block it should be stored in. Then
the data record is inserted into the appropriate data block.

• In this example, Record 1 and Record 4 are stored in Data Block 1, Record 5 and Record 2 in
Data Block 2, and Record 3, Record 7, and Record 6 in Data Block 3.
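A minimal sketch of this mapping is given below. The hash function behind the diagram is not stated, so the division-remainder function used here (key modulo the number of blocks) is an assumption, and the resulting placement of records differs from the one in the figure. The class name `HashFile` and the 0-based block addresses are likewise hypothetical.

```python
class HashFile:
    """Toy hash file: each data block is a list of (key, record) pairs."""

    def __init__(self, num_blocks=3):
        self.blocks = [[] for _ in range(num_blocks)]

    def block_address(self, key: int) -> int:
        """Division-remainder hash: key mod number of blocks (0-based address)."""
        return key % len(self.blocks)

    def insert(self, key: int, record: str) -> None:
        """Hash the key, then store the record in the block it maps to."""
        self.blocks[self.block_address(key)].append((key, record))

    def lookup(self, key: int):
        """Only the one block the key hashes to is searched."""
        for k, record in self.blocks[self.block_address(key)]:
            if k == key:
                return record
        return None
```

Keys that leave the same remainder collide and share a block, for example keys 1, 4 and 7 with three blocks. A lookup still touches a single block, but that block grows longer as collisions accumulate.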
• There are various hashing techniques. These are as follows:
  • Mid-Square Method
  • Folding Method
  • Division Method
  • Division-Remainder Method
  • Radix Transformation Method
  • Polynomial Conversion Method
  • Truncation Method
  • Conversion using Digital Gates
Direct File Organization
• Direct file organization is used to access the records of a file randomly.
• In direct file organization, records can be stored anywhere in the storage area, but can be
accessed directly without any sequential searching.
• It overcomes the drawbacks of sequential, indexed sequential and B-tree organization.
• For efficient organization and direct access to an individual record, a mapping or
transformation process is required that converts the key field of a record into its physical address
location.
• Direct file organization depends on hashing, which provides the basis of the mapping
procedure.
• Because a hashing algorithm can map different keys to the same address, collision resolution techniques are needed.
Devices that support direct access include CDs and floppy disks.
• Direct file organization is also known as random file organization.
• Searching (reading or retrieving from a direct file): To read a record from a direct file, supply
the key field of the record. The hashing algorithm maps that key field to the
physical location of the record.
• Updating records in a direct file:
1. Adding a new record: To add a new record to a direct file, specify its key field. The
mapping procedure and the collision resolution technique yield a free address location for
that record.
2. Deleting a record from a direct file: To delete a record, first search for that record and, after
finding it, change its status code to deleted or vacant.
3. Modifying a record: To modify a record, first search for that record, then make the
necessary modifications. Then rewrite the modified record to the same location.
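The four operations above can be sketched with a status code per slot. The source does not name a collision resolution technique, so linear probing is used here as an assumption, as are the class name `DirectFile`, the slot count of 7, and the modulo hash.

```python
EMPTY, OCCUPIED, DELETED = "empty", "occupied", "deleted"


class DirectFile:
    """Toy direct file: slots of (status code, key, record), linear probing."""

    def __init__(self, num_slots=7):
        self.slots = [(EMPTY, None, None)] * num_slots

    def _probe(self, key):
        """Yield addresses starting at the home address key % num_slots."""
        home = key % len(self.slots)
        for step in range(len(self.slots)):
            yield (home + step) % len(self.slots)

    def search(self, key):
        """Return the address of the record, or None if it is not in the file."""
        for addr in self._probe(key):
            status, k, _ = self.slots[addr]
            if status == EMPTY:          # a never-used slot ends the probe sequence
                return None
            if status == OCCUPIED and k == key:
                return addr
        return None

    def add(self, key, record):
        """Store the record at the first free address on its probe sequence."""
        if self.search(key) is not None:
            raise KeyError(f"key {key} already present")
        for addr in self._probe(key):
            if self.slots[addr][0] != OCCUPIED:
                self.slots[addr] = (OCCUPIED, key, record)
                return addr
        raise RuntimeError("file is full")

    def delete(self, key):
        """Search for the record, then change its status code to deleted."""
        addr = self.search(key)
        if addr is None:
            return False
        self.slots[addr] = (DELETED, None, None)
        return True

    def modify(self, key, record):
        """Search for the record, then rewrite it at the same location."""
        addr = self.search(key)
        if addr is None:
            return False
        self.slots[addr] = (OCCUPIED, key, record)
        return True
```

Marking a slot as deleted rather than empty matters here: a later search must keep probing past it, otherwise records that were placed further along because of a collision would become unreachable.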
Direct File Organization
Advantages
1. Records need not be sorted when they are added.
2. It gives the fastest retrieval of records.
3. It makes efficient use of memory.
4. It supports fast storage devices.
5. Searching time depends on the mapping procedure, not on the logarithm of the number of search keys
as in a B-tree.
Disadvantages
1. Space is wasted if the hashing method is not chosen properly.
2. It does not support sequential storage devices.
3. A direct file system is complex and expensive.
4. Collision resolution techniques add extra overhead.
Indexing
• An index is a collection of data entries that is used to locate a record in a file.
• Each index table record consists of two parts: the first part holds the value of a prime or
non-prime attribute of the file record, known as the indexing field, and the second part holds a
pointer to the location where the record is physically stored.
• In general, an index table is like the index of a book, which consists of topic names and
page numbers.
• When searching for a file record, the index is searched to find the record's address,
instead of searching for the record itself in secondary memory.
• On the basis of properties that affect the efficiency of searching, indexes can be
classified into two categories.
1. Ordered indexing
2. Hashed indexing
• Ordered indexing: In ordered indexing, the records of the file are stored in some sorted order in
physical memory. The values in the index are ordered (sorted) so that binary search can be
performed on the index.

• Hashed indexing: Hashing allows us to avoid accessing an index structure. A hashed index consists of two
fields: the first field holds the search-key attribute values and the second field holds a
pointer to the hash file structure. A hashed index relies on the values of records being
uniformly distributed by a hash function.
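The two categories can be contrasted in a short sketch. The sample file, the field names, and the helper functions are hypothetical; a Python list position stands in for a disk pointer, and a built-in `dict` stands in for a hashed structure.

```python
import bisect

# Hypothetical data file: list position ("pointer") -> (roll_no, name)
data_file = [(42, "Dev"), (7, "Asha"), (19, "Ravi"), (3, "Mina")]


def build_ordered_index(records):
    """Sorted list of (indexing field, pointer) entries."""
    return sorted((key, pointer) for pointer, (key, _) in enumerate(records))


def ordered_lookup(index, key):
    """Binary search over the sorted index entries; returns a pointer or None."""
    pos = bisect.bisect_left(index, (key, -1))
    if pos < len(index) and index[pos][0] == key:
        return index[pos][1]
    return None


def build_hashed_index(records):
    """Hash-based mapping from indexing field to pointer (Python dict)."""
    return {key: pointer for pointer, (key, _) in enumerate(records)}
```

Both indexes return the same pointer for a key that exists. The ordered index also keeps keys in sorted order, which helps with range scans, while the hashed index is limited to exact-match lookups.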
Ordered Indexing
Ordered indexes can be divided into two categories.
1. Dense indexing
2. Sparse indexing
Dense and Sparse Indexes
Dense index: In dense indexing there is a record in the index table for each unique value of the
search-key attribute of the file, with a pointer to the first data record with that value. The other records
with the same value of the search-key attribute are stored sequentially after the first record. Because the
file is sorted on the search key, the order of entries in the index matches the order of the data records, as shown in the figure.

Advantages
1. It is an efficient technique for small and medium-sized data files.
2. Searching is comparatively fast and efficient.

Disadvantages
1. The index table is large and requires more memory space.
2. Insertion and deletion are comparatively complex.
3. It is inefficient for large data files.
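A dense index over a file sorted on a non-unique search key can be sketched as follows. The sample records (department, student name) and the function names are hypothetical.

```python
def build_dense_index(sorted_records):
    """One entry per distinct search-key value -> position of its first record."""
    index = {}
    for position, (key, _) in enumerate(sorted_records):
        index.setdefault(key, position)  # keep only the first position per value
    return index


def dense_lookup(sorted_records, index, key):
    """Jump to the first matching record, then read the duplicates that follow it."""
    if key not in index:
        return []
    position = index[key]
    matches = []
    while position < len(sorted_records) and sorted_records[position][0] == key:
        matches.append(sorted_records[position][1])
        position += 1
    return matches
```

Because every distinct value has its own entry, a key that is missing from the index is known to be absent without touching the data file. The price is an index that grows with the number of distinct values.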
Sparse index: In contrast, in sparse indexing the index table holds records for only some of the
unique values of the search-key attribute, each with a pointer to the first data record with that value.
To search for a record using a sparse index, we find the index entry with the largest search-key value that is
less than or equal to the value we are looking for. Starting at the record that entry points to, a linear search is performed to retrieve the
desired record. There is at most one sparse index, since it is not possible to build a sparse index
that is not clustered.

Advantages
1. The index table is small and hence saves memory space (especially for large files).
2. Insertion and deletion are comparatively easy.

Disadvantages
1. Searching is comparatively slower, since the index table is searched and then a linear search is
performed inside secondary memory.
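The two-step search described above can be sketched as follows. The choice of one index entry per block of `block_size` records is an assumption for illustration, as are the function names and the sample keys.

```python
import bisect


def build_sparse_index(sorted_records, block_size):
    """One (key, position) entry for the first record of each block."""
    return [
        (sorted_records[i][0], i)
        for i in range(0, len(sorted_records), block_size)
    ]


def sparse_lookup(sorted_records, index, key):
    """Find the last index entry with key <= target, then scan forward linearly."""
    keys = [k for k, _ in index]
    entry = bisect.bisect_right(keys, key) - 1
    if entry < 0:
        return None                      # target is smaller than every indexed key
    position = index[entry][1]
    while position < len(sorted_records) and sorted_records[position][0] <= key:
        if sorted_records[position][0] == key:
            return sorted_records[position][1]
        position += 1
    return None
```

With six records and a block size of 2, the index holds three entries instead of six. A lookup for a key in the middle of a block pays for that saving with a short linear scan.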
Clustered and Non-clustered Indexes
Clustered index: In a clustering index file, records are stored physically in order on a non-prime key
attribute that does not have a unique value for each record. The non-prime key field is known as the
clustering field and the index is known as a clustering index. Its structure resembles that of a dense index,
although a clustering index may also be sparse. A file can have at
most one clustered index, as it can be clustered on at most one search-key attribute.
Non-clustered index: An index that is not clustered is known as a non-clustered index. A data file
can have more than one non-clustered index.
