File Organizations and Indexes

The document discusses file organizations and indexing structures in databases, detailing how records are stored on disks and accessed efficiently through various methods such as heap, sorted, and hashed files. It explains the importance of indexing for optimizing database performance, including primary and clustering indexes, and addresses challenges like collisions in hashing and the limitations of primary indexes. Additionally, it covers the characteristics of dense and sparse indexes and their applications in improving data retrieval speed.

Uploaded by

Inpagaran Shinthujen

File Organizations and Indexes

Rangana Jayashanka
Introduction
• Databases are stored physically as files of
records, which are typically stored on
magnetic disks.
• We will discuss the organization of databases in
storage and the techniques for accessing them
efficiently using various algorithms, some of
which require auxiliary data structures called
indexes.
Introduction
• The data stored on disk is organized as files of
records.
• Primary file organizations determine how the
records of a file are physically placed on the
disk, and hence how the records can be
accessed.
Introduction
• Heap file (unordered file) – places the records on
disk in no particular order by appending new
records at the end of the file.
• Sorted file (sequential file) – keeps the records
ordered by the value of a particular field (sort key).
• Hashed file – uses a hash function applied to a
particular field (hash key) to determine a record’s
placement on disk.
• B-trees – use tree structures.
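To make the three placement rules concrete, here is a minimal in-memory sketch in Python. It assumes records are (key, payload) tuples and models each file as a Python list (or a list of buckets), whereas a real file lives in disk blocks; the function names are hypothetical.

```python
import bisect


def heap_insert(heap_file, record):
    """Heap (unordered) file: simply append the new record at the end."""
    heap_file.append(record)


def sorted_insert(sorted_file, record):
    """Sorted (sequential) file: keep records ordered by the sort key record[0]."""
    keys = [r[0] for r in sorted_file]
    pos = bisect.bisect_right(keys, record[0])  # binary search for the insert position
    sorted_file.insert(pos, record)


def hashed_insert(buckets, record):
    """Hashed file: the hash key decides which bucket receives the record."""
    buckets[record[0] % len(buckets)].append(record)


heap, sorted_file, buckets = [], [], [[] for _ in range(3)]
for rec in [(30, "C"), (10, "A"), (20, "B")]:
    heap_insert(heap, rec)
    sorted_insert(sorted_file, rec)
    hashed_insert(buckets, rec)

print(heap)         # [(30, 'C'), (10, 'A'), (20, 'B')]
print(sorted_file)  # [(10, 'A'), (20, 'B'), (30, 'C')]
print(buckets)      # [[(30, 'C')], [(10, 'A')], [(20, 'B')]]
```

Insertion is cheapest for the heap file, but a search on it must scan every record. The sorted file pays at insertion time so that searches on the sort key can use binary search.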
Introduction
• A secondary organization or auxiliary access
structure allows efficient access to the records
of a file based on fields other than those
used for the primary file organization.
• Most of these exist as indexes.
Files
• A file is a sequence of records.
• In many cases, all records in a file are of the
same record type.
• If every record in the file has exactly the same
size (in bytes), the file is said to be made up of
fixed-length records.
• If different records in the file have different
sizes, the file is said to be made up of
variable-length records.
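Fixed-length records are convenient because record i always starts at byte offset i × record size. The sketch below shows this. The EMPLOYEE layout (a 20-byte NAME, a 4-byte ENO and an 8-byte SALARY) is a hypothetical example, encoded with Python's standard struct module.

```python
import struct

# Hypothetical fixed-length EMPLOYEE layout: 20-byte NAME, 4-byte int ENO,
# 8-byte float SALARY; "<" means little-endian with no padding.
RECORD_FORMAT = "<20sid"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 32 bytes for every record


def pack_record(name, eno, salary):
    # The "s" format pads short names with zero bytes and truncates long ones
    # (ASCII names assumed, so truncation never splits a character).
    return struct.pack(RECORD_FORMAT, name.encode("utf-8"), eno, salary)


def unpack_record(raw):
    name, eno, salary = struct.unpack(RECORD_FORMAT, raw)
    return name.rstrip(b"\x00").decode("utf-8"), eno, salary


def read_record(file_bytes, i):
    """Fixed length lets us jump straight to record i without scanning."""
    start = i * RECORD_SIZE
    return unpack_record(file_bytes[start:start + RECORD_SIZE])


data = pack_record("Abbas", 1, 5000.0) + pack_record("Agarkar", 2, 6000.0)
print(RECORD_SIZE, len(data))  # 32 64
print(read_record(data, 1))    # ('Agarkar', 2, 6000.0)
```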
Variable length records
• A file may have variable-length records for
several reasons.
1. One or more of the fields are of varying size
(variable-length fields), e.g., the NAME field.
2. One or more fields may have multiple values
for individual records.
3. One or more fields are optional.
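One common way to store variable-length records is to prefix every field, and the record itself, with its length. This is an assumption here, not the only scheme. The sketch mainly covers reason 1 (fields of varying size). An absent optional field can be stored as a zero-length value. The helper names are hypothetical.

```python
import struct


def encode_record(fields):
    """Encode string fields as: [record length][field length][field bytes]..."""
    body = bytearray()
    for value in fields:
        data = value.encode("utf-8")
        body += struct.pack("<H", len(data))  # 2-byte length prefix per field
        body += data                          # an empty string encodes an absent optional field
    # The record-level length lets a reader skip to the next record.
    return struct.pack("<H", len(body)) + bytes(body)


def decode_record(raw):
    """Decode the first record in raw back into a list of strings."""
    total, = struct.unpack_from("<H", raw, 0)
    body = raw[2:2 + total]
    fields, pos = [], 0
    while pos < len(body):
        n, = struct.unpack_from("<H", body, pos)
        pos += 2
        fields.append(body[pos:pos + n].decode("utf-8"))
        pos += n
    return fields


r1 = encode_record(["Abbas", "Engineer"])
r2 = encode_record(["Agarkar", "Engineer"])
print(len(r1), len(r2))   # 19 21: a longer NAME gives a longer record
print(decode_record(r2))  # ['Agarkar', 'Engineer']
```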
Hashing Technology
• Provides very fast access to records under
certain search conditions.
• This organization is usually called a hash file.
• The search condition must be an equality
condition on a single field, called the hash field
of the file.
• In most cases, the hash field is also a key field
of the file, in which case it is called the hash
key.
Hashing Technology
• The idea behind hashing is to provide a
function h, called a hash function or
randomization function.
• The hash function is applied to the hash field
value of a record and yields the address of the
disk block in which the record is stored.
Hashing Technology
Figure: a hash table of M slots with addresses 0, 1, 2, 3, …, M-2, M-1; each slot stores one record with the fields Name, ENO, JOB and Salary.
Hashing Technology
• Hashing is typically implemented as a hash
table through the use of an array of records.
• Assume the array index range is from 0 to M-1;
then we have M slots whose addresses
correspond to the array indexes.
• We choose a hash function that transforms
the hash field value into an integer between 0
and M-1, e.g., h(K) = K mod M.
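Here is a minimal sketch of a hash file with M slots and h(K) = K mod M. It assumes integer ENO values as the hash key and one record per slot. It deliberately rejects collisions, which the following slides address. The class name HashFile is hypothetical.

```python
class HashFile:
    """M slots addressed 0..M-1; the address of a record is h(K) = K mod M."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m

    def h(self, key):
        return key % self.m

    def insert(self, record):
        addr = self.h(record["ENO"])
        if self.slots[addr] is not None:
            # A different record already occupies this address: a collision.
            raise ValueError(f"collision at slot {addr}")
        self.slots[addr] = record
        return addr

    def search(self, eno):
        """Equality search on the hash field: one slot is examined."""
        rec = self.slots[self.h(eno)]
        return rec if rec is not None and rec["ENO"] == eno else None


f = HashFile(7)
print(f.insert({"ENO": 15, "Name": "Abbas", "JOB": "Clerk", "Salary": 5000}))  # 1
print(f.search(15)["Name"])  # Abbas
print(f.h(22))               # 1, so inserting ENO 22 would collide with ENO 15
```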
Collision
• A drawback of most hashing functions is that they
do not guarantee that distinct values will hash to
distinct addresses.
• The hash field space (the number of possible
values a hash field can take) is usually much
larger than the address space (the number of
available addresses for records).
• The hashing function maps the hash field
space to the address space, so some distinct
values must share an address.
Collision
• A collision occurs when the hash field value of
a record that is being inserted hashes to an
address that already contains a different
record.
• The process of finding another position is
called collision resolution.
Collision resolution methods
1. Open Addressing
2. Chaining
3. Multiple Hashing
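The first two resolution methods can be sketched as follows, assuming integer keys and h(K) = K mod M. Open addressing here uses linear probing: on a collision, it checks the next slot until an empty one is found. Chaining keeps a list of records for each address. Both classes are hypothetical illustrations. The open-addressing search assumes no deletions, because an empty slot ends the probe.

```python
class OpenAddressingTable:
    """Open addressing with linear probing."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m

    def insert(self, key, value):
        start = key % self.m
        for i in range(self.m):
            addr = (start + i) % self.m  # probe h(K), h(K)+1, ... wrapping around
            if self.slots[addr] is None or self.slots[addr][0] == key:
                self.slots[addr] = (key, value)
                return addr
        raise OverflowError("table is full")

    def search(self, key):
        start = key % self.m
        for i in range(self.m):
            entry = self.slots[(start + i) % self.m]
            if entry is None:
                return None  # empty slot: the key was never inserted
            if entry[0] == key:
                return entry[1]
        return None


class ChainedTable:
    """Chaining: each address holds a list of all records that hash to it."""

    def __init__(self, m):
        self.m = m
        self.chains = [[] for _ in range(m)]

    def insert(self, key, value):
        chain = self.chains[key % self.m]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # replace an existing record with this key
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self.chains[key % self.m]:
            if k == key:
                return v
        return None


oa = OpenAddressingTable(7)
print(oa.insert(15, "A"), oa.insert(22, "B"))  # 1 2: 22 collides at slot 1 and moves to slot 2
ch = ChainedTable(7)
ch.insert(15, "A")
ch.insert(22, "B")
print(ch.chains[1])  # [(15, 'A'), (22, 'B')]
```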
Blocking
• The records of a file must be allocated to disk
blocks because a block is the unit of data
transfer between disk and memory.
• Blocking: refers to storing a number of records
in one block on the disk.
• Blocking factor (bfr) refers to the number of
records per block.
Blocking
• Blocking factor $bfr = \lfloor B/R \rfloor$, the maximum number of
records that can be stored in a block, where
B – block size (bytes)
R – record length (bytes)
• Normally B is not an exact multiple of R,
therefore each block has $B - bfr \cdot R$ bytes of unused space (unspanned records).
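The blocking-factor formula and the leftover space can be checked with two small helper functions. The function names are hypothetical, and B = 1024 bytes and R = 100 bytes are used as sample values.

```python
def blocking_factor(block_size, record_size):
    """bfr = floor(B / R): whole (unspanned) records that fit in one block."""
    return block_size // record_size


def unused_bytes_per_block(block_size, record_size):
    """Bytes left over in each block because B is not a multiple of R."""
    return block_size - blocking_factor(block_size, record_size) * record_size


print(blocking_factor(1024, 100), unused_bytes_per_block(1024, 100))  # 10 24
```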
Indexing Structures
• We assume that a file already exists with some
primary organization such as the unordered,
ordered, or hashed organizations.
• We will describe additional auxiliary access
structures called indexes.
• Indexes are used to speed up the retrieval of records in
response to certain search conditions.
Indexing Structures
• The Indexing Structures typically provide
secondary access paths, which provide
alternative ways of accessing the records
without affecting the physical placement of
records on disk.
• They enable efficient access to records based
on the indexing fields that are used to
construct the index.
Indexing Structures
• Any field of the file can be used to create an
index and multiple indexes on different fields
can be constructed on the same file.
• Indexes based on ordered files – single-level
indexes
• Indexes based on tree data structures –
multilevel indexes, B+ trees
Indexing Structures
• Single-level ordered indexes:
– primary
– secondary
– clustering
• By viewing a single-level index as an ordered
file, one can develop additional indexes for it,
giving rise to the concept of multilevel
indexes.
Single-level Ordered Indexes
• An ordered index is similar to the index at the
end of a textbook, which lists important terms
in alphabetical order along with the page
numbers where each term appears.
Single-level Ordered Indexes
• An indexing access structure is usually defined
on a single field of a file, called an indexing
field.
• The index typically stores each value of the
index field along with a list of pointers to all
disk blocks that contain records with that field
value.
• The values in the index are ordered so that we
can do a binary search on the index.
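Here is a minimal sketch of a single-level ordered index. It assumes the data file is a Python list of blocks (each a list of record dicts) and that a block pointer is simply the block's position in that list. The index keeps the distinct field values sorted, so a lookup is a binary search using the standard bisect module. The function names are hypothetical.

```python
import bisect


def build_index(blocks, field):
    """Map each distinct value of field to the sorted list of blocks containing it."""
    pointers = {}
    for block_no, block in enumerate(blocks):
        for record in block:
            ptrs = pointers.setdefault(record[field], [])
            if not ptrs or ptrs[-1] != block_no:  # record each block only once
                ptrs.append(block_no)
    keys = sorted(pointers)  # ordered index values enable binary search
    return keys, [pointers[k] for k in keys]


def search_index(index, value):
    keys, block_lists = index
    i = bisect.bisect_left(keys, value)  # binary search on the index
    if i < len(keys) and keys[i] == value:
        return block_lists[i]
    return []


blocks = [[{"CustID": 1}, {"CustID": 2}], [{"CustID": 1}, {"CustID": 3}], [{"CustID": 1}]]
idx = build_index(blocks, "CustID")
print(search_index(idx, 1))  # [0, 1, 2]
```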
Single-level Ordered Indexes
• The index file is much smaller than the data
file, so searching the index using a binary
search is reasonably efficient.
• Types of single-level indexes
1. Primary indexes
2. Clustering indexes
3. Secondary indexes
Single-level Ordered Indexes
OrderID CustID Value Date
001 1 1000 2023-01-10
002 2 1500 2023-02-15
003 1 2000 2023-01-17
………. 3
………. 1
………. 1
020 2 2750 2023-02-10
Single-level Ordered Indexes
• SELECT * FROM Orders WHERE CustID = 1 ORDER BY Date;
• EXPLAIN SELECT * FROM Orders WHERE CustID = 1 ORDER BY Date;
• CREATE INDEX CustIndex ON Orders(CustID);
• EXPLAIN SELECT * FROM Orders WHERE CustID = 1 ORDER BY Date;
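The effect of CREATE INDEX can be observed with Python's built-in sqlite3 module, using the visible rows of the Orders table above. Two assumptions apply. First, SQLite spells the command EXPLAIN QUERY PLAN rather than the MySQL-style EXPLAIN shown on the slide. Second, the exact wording of the plan varies between SQLite versions, so the sketch only checks whether CustIndex appears.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID TEXT, CustID INTEGER, Value INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [("001", 1, 1000, "2023-01-10"),
     ("002", 2, 1500, "2023-02-15"),
     ("003", 1, 2000, "2023-01-17"),
     ("020", 2, 2750, "2023-02-10")],
)

query = "SELECT * FROM Orders WHERE CustID = 1 ORDER BY Date"
before = query_plan(conn, query)  # full table scan: no index on CustID yet
conn.execute("CREATE INDEX CustIndex ON Orders(CustID)")
after = query_plan(conn, query)   # now a search using CustIndex
rows = conn.execute(query).fetchall()

print(before)
print(after)
print(rows)
```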
Primary Indexes
• A primary index is an ordered file whose records
are of fixed length with two fields.
• First field – ordering key field (primary key).
• Second field – pointer to the disk block (a
block address).
• There is one index entry in the index file for
each block in the data file.
Primary Indexes
• Each index entry has the value of the primary
key field for the first record in a block and a
pointer to that block as its two field values.
• Index entry i is written as <K(i), P(i)>.
Primary Indexes
Primary Indexes
• We use the NAME field to create a primary index
on the ordered file.
• Assume that each value of NAME is unique.
• Each entry in the index has a NAME value and a
pointer.
<K(1) = (Abbas), P(1) = address of block 1>
<K(2) = (Agarkar), P(2) = address of block 2>
<K(3) = (Akthar), P(3) = address of block 3>
Primary Indexes
• The total number of entries in the index is the
same as the number of disk blocks in the
ordered data file.
• The first record in each block of the data file is
called the anchor record of the block, or
simply the block anchor.
• Indexes can also be characterized as dense or
sparse.
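Here is a sketch of building and using a primary (sparse) index. It assumes the data file is a list of blocks already sorted on NAME and that P(i) is simply the block number. The anchors match the slide (Abbas, Agarkar, Akthar). The other names and the function names are hypothetical fill-ins.

```python
import bisect


def build_primary_index(blocks):
    """One entry <K(i), P(i)> per block: the anchor record's key and the block number."""
    return [(block[0], block_no) for block_no, block in enumerate(blocks)]


def find_block(blocks, index, key):
    """Return the number of the block holding key, or None if the key is absent."""
    anchors = [k for k, _ in index]
    i = bisect.bisect_right(anchors, key) - 1  # binary search: last anchor <= key
    if i < 0:
        return None  # key sorts before the first anchor
    block_no = index[i][1]
    return block_no if key in blocks[block_no] else None  # one data-block access


blocks = [
    ["Abbas", "Adams", "Afridi"],
    ["Agarkar", "Ahmed", "Ajmal"],
    ["Akthar", "Ali", "Amla"],
]
index = build_primary_index(blocks)
print(index)                               # [('Abbas', 0), ('Agarkar', 1), ('Akthar', 2)]
print(find_block(blocks, index, "Ahmed"))  # 1
```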
Primary Indexes
• Dense index – has an index entry for every
search key value in the data file.
• Sparse index – has index entries for only some
of the search values.
• A primary index is a sparse (nondense) index. It
includes one entry for each disk block of the
data file, holding the key of that block's anchor record.
Primary Indexes
• The index file for a primary index needs
substantially fewer blocks than the data
file, for two reasons.
1. There are fewer index entries than there are
records in the data file.
2. Each index entry is typically smaller in size
than a data record because it has only two
fields.
Primary Indexes
• A binary search on the index file hence requires
fewer block accesses than a binary search on the
data file.
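• Binary search on an ordered data file of b blocks requires
$\lceil \log_2 b \rceil$ block accesses.
• If the primary index file occupies $b_i$ blocks, locating a record
requires a total of $\lceil \log_2 b_i \rceil + 1$ block accesses:
$\lceil \log_2 b_i \rceil$ for the binary search on the index, plus one
to access the data block that contains the record.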
Average Access Time for Basic File Organizations

Type               Access Method    Average time to access a specific record
Heap (Unordered)   Linear Search    b/2
Ordered            Linear Search    b/2
Ordered            Binary Search    $\log_2 b$
Primary Indexes – Example 1
Suppose that we have an ordered file with r = 30,000 records stored on a disk
with block size B = 1024 bytes. File records are of fixed size and are unspanned,
with record length R = 100 bytes. How many block accesses are required to
search for a record in the data file?

The blocking factor for the file is $bfr = \lfloor B/R \rfloor = \lfloor 1024/100 \rfloor = 10$ records per block.
The number of blocks needed for the file is $b = \lceil r/bfr \rceil = \lceil 30000/10 \rceil = 3000$ blocks.
A binary search on the data file would need approximately
$\lceil \log_2 b \rceil = \lceil \log_2 3000 \rceil = 12$ block accesses.
Primary Indexes – Example 1
Now suppose that the ordering key field of the file is V = 9 bytes
long, a block pointer is P = 6 bytes long, and we have constructed
a primary index for the file. How many block accesses are
required to search for a record using the index?

The size of each index entry is $R_i = V + P = 9 + 6 = 15$ bytes, so the
blocking factor for the index is $bfr_i = \lfloor B/R_i \rfloor = \lfloor 1024/15 \rfloor = 68$
entries per block.

The total number of index entries $r_i$ is equal to the number of
blocks in the data file, which is 3000.
Primary Indexes – Example 1
• The number of index blocks is hence $b_i = \lceil r_i/bfr_i \rceil = \lceil 3000/68 \rceil = 45$ blocks.
• A binary search on the index file would need
$\lceil \log_2 b_i \rceil = \lceil \log_2 45 \rceil = 6$ block accesses.
• To search for a record using the index, we need
one additional block access to the data file, for a
total of 6 + 1 = 7 block accesses. This is an
improvement over binary search on the data file,
which required 12 block accesses.
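The arithmetic of Example 1 can be reproduced in a few lines of Python. All values come from the example itself.

```python
import math

r, B, R = 30_000, 1024, 100  # records, block size (bytes), record length (bytes)
V, P = 9, 6                  # ordering key size and block pointer size (bytes)

bfr = B // R                           # floor(B / R) records per block
b = math.ceil(r / bfr)                 # blocks in the data file
data_search = math.ceil(math.log2(b))  # binary search on the data file

Ri = V + P                             # size of one index entry
bfri = B // Ri                         # index entries per block
ri = b                                 # one index entry per data block
bi = math.ceil(ri / bfri)              # blocks in the index file
index_search = math.ceil(math.log2(bi)) + 1  # binary search on index + 1 data block

print(bfr, b, data_search)         # 10 3000 12
print(Ri, bfri, bi, index_search)  # 15 68 45 7
```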
Limitations of Primary indexes
• A major problem with a primary index is the insertion and deletion of records.
• If we attempt to insert a record in its correct
position in the data file, we must not only
move records to make space for the new record
but also change some index entries, since
moving records changes the anchor records
of some blocks.
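Here is a small sketch of the insertion problem. It assumes each block holds at most bfr = 3 integer keys and that there is no overflow area. For simplicity it re-packs the whole file, which has the same effect as shifting records forward block by block. The function names and sample keys are hypothetical.

```python
def insert_sorted(blocks, key, bfr):
    """Insert key in order and re-pack records into blocks of at most bfr records."""
    records = sorted([k for block in blocks for k in block] + [key])
    return [records[i:i + bfr] for i in range(0, len(records), bfr)]


def anchors(blocks):
    """The anchor record of each block is its first record."""
    return [block[0] for block in blocks]


blocks = [[10, 20, 30], [40, 50, 60], [70, 80]]
new_blocks = insert_sorted(blocks, 25, bfr=3)
changed = [i for i, (old, new) in enumerate(zip(anchors(blocks), anchors(new_blocks)))
           if old != new]

print(new_blocks)  # [[10, 20, 25], [30, 40, 50], [60, 70, 80]]
print(changed)     # [1, 2]: index entries for blocks 1 and 2 must be rewritten
```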
Summary
Indexing is used to optimize the performance
of a database by minimizing the number of
disk accesses required when a query is
processed.
The index is a type of data structure.
It is used to locate and access the data in a
database table quickly.
Summary
• Index structure: indexes can be created using
one or more columns of a table.
Summary
• The first column of the index is the search key. It
contains a copy of the primary key or candidate key of the
table.
• The search-key values are stored in sorted order
so that the corresponding data can be accessed quickly.
• The second column of the index is the data
reference. It contains a set of pointers holding the
address of the disk block where the record with that
key value can be found.
Indexing Methods
Indexing Methods
• Primary Index − Primary index is defined on
an ordered data file. The data file is ordered
on a key field. The key field is generally the
primary key of the relation.

• Clustering Index − Clustering index is defined
on an ordered data file. The data file is
ordered on a non-key field.
Indexing Methods
• The primary index can be classified into two
types: Dense index and Sparse index.
• The dense index contains an index record for
every search key value in the data file. It
makes searching faster.
• In a sparse index, index records appear for only
a few items in the data file. Each item points
to a block.
Indexing Methods
Dense index

The dense index contains an index record for every
search key value in the data file. This makes searching
faster.
In a dense index, the number of records in the index table is
the same as the number of records in the main table.
It needs more space to store the index records themselves. Each
index record holds the search key and a pointer to
the actual record on the disk.
Indexing Methods
Dense index
Indexing Methods
Sparse index

In a sparse index, index records appear for only
a few items in the data file. Each item points to a block.
Instead of pointing to each record in the main
table, the index points to records in the main
table at intervals (gaps).
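Here is a sketch that compares the two index types on the same hypothetical data file of 3 blocks with a unique, sorted key. The dense index has one entry per record and the sparse index one entry per block. Because of this, a sparse lookup must finish by scanning the chosen block. The data and function names are illustrative assumptions.

```python
import bisect

blocks = [[101, 102, 103], [104, 105, 106], [107, 108]]  # sorted, unique keys

# Dense index: one entry per search-key value -> (block number, slot in block).
dense = [(key, (block_no, slot))
         for block_no, block in enumerate(blocks)
         for slot, key in enumerate(block)]

# Sparse index: one entry per block -> key of the block's first record.
sparse = [(block[0], block_no) for block_no, block in enumerate(blocks)]


def dense_lookup(key):
    keys = [k for k, _ in dense]
    i = bisect.bisect_left(keys, key)  # exact match expected in a dense index
    return dense[i][1] if i < len(keys) and keys[i] == key else None


def sparse_lookup(key):
    keys = [k for k, _ in sparse]
    i = bisect.bisect_right(keys, key) - 1  # last block whose first key <= key
    if i < 0:
        return None
    block_no = sparse[i][1]
    block = blocks[block_no]
    return (block_no, block.index(key)) if key in block else None  # scan the block


print(len(dense), len(sparse))               # 8 3
print(dense_lookup(105), sparse_lookup(105))  # (1, 1) (1, 1)
```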
Indexing Methods
Sparse index
Indexing Methods
Clustering Index
A clustering index is defined on an
ordered data file. Sometimes the index is
created on non-primary-key columns whose values
may not be unique for each record.
In this case, records that share the same value of
that column are grouped together, and the index
holds one entry for each distinct value, pointing to the group.
This method is called a clustering index.
Indexing Methods
Clustering Index

Example: suppose a company has several employees in
each department. With a clustering index, all
employees that belong to the same Dept_ID are considered
a single cluster, and index pointers point to the cluster as
a whole. Here Dept_ID is a non-unique key.
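Here is a sketch of the Dept_ID example. It assumes employee records are (Dept_ID, name) tuples stored in blocks ordered on Dept_ID. The names and function names are hypothetical. The clustering index holds one entry per distinct Dept_ID, pointing to the first block of its cluster, and a lookup scans forward from there.

```python
import bisect


def build_clustering_index(blocks):
    """One entry per distinct Dept_ID: (Dept_ID, first block containing it)."""
    index = []
    for block_no, block in enumerate(blocks):
        for dept_id, _ in block:
            if not index or index[-1][0] != dept_id:
                index.append((dept_id, block_no))
    return index


def employees_in_dept(blocks, index, dept_id):
    keys = [k for k, _ in index]
    i = bisect.bisect_left(keys, dept_id)  # binary search on the index
    if i == len(keys) or keys[i] != dept_id:
        return []
    result = []
    for block in blocks[index[i][1]:]:  # scan forward from the cluster's first block
        for d, name in block:
            if d == dept_id:
                result.append(name)
            elif d > dept_id:
                return result  # file is ordered on Dept_ID, so the cluster has ended
    return result


blocks = [[(1, "A"), (1, "B"), (2, "C")], [(2, "D"), (2, "E"), (3, "F")], [(3, "G")]]
index = build_clustering_index(blocks)
print(index)                                # [(1, 0), (2, 0), (3, 1)]
print(employees_in_dept(blocks, index, 2))  # ['C', 'D', 'E']
```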
Indexing Methods
Clustering Index
