Database Systems - BIT - University of Colombo - Year 3 (Lecture Note 5)

The document discusses different file organization techniques for databases including unordered files, ordered files, hashed files, and the average access times for each. It also covers topics like blocking, disk storage devices, and basic operations on files.

Uploaded by

Amali

IT6405: Database Systems II
BIT – 3rd Year
Semester 6
IT6405: Database Systems II – Topic 2

Learning Outcome
After successful completion of this
course students will be able to:
– create stored procedures and triggers
– describe data storage & access and
manipulate query processing techniques
– demonstrate transaction processing
techniques of database systems
– determine designs for distributed databases

Outline of Syllabus
1. Stored Procedures and Triggers
2. Data Storage and Querying
3. Transaction Management
4. Distributed Databases

References
1. Elmasri, Navathe, Somayajulu, and Gupta,
“Fundamentals of Database Systems”, 5th Edition,
Pearson Education (2008)
Note: 6th Edition released in 2011
2. Silberschatz A., Korth H.F. and Sudarshan S., “Database
System Concepts”, 5th Edition, McGraw Hill (2006).
Note: 6th Edition released in 2010
3. Ramakrishnan, Gehrke, “Database Management
Systems”, 3rd edition, McGraw Hill

IT6405: Database Systems II

Data Storage, Indexing

Duration: 15 hours

File Organization and Storage Structures
Primary Storage (Main Memory)
• Fast
• Volatile
• Expensive

Secondary Storage (Files in disks or tapes)

• Non-Volatile

Disk Storage Devices

• Preferred secondary storage device for high
storage capacity and low cost.
• Data stored as magnetized areas on magnetic
disk surfaces.
• A disk pack contains several magnetic disks
connected to a rotating spindle.
• Disks are divided into concentric circular
tracks on each disk surface. Track capacities
vary typically from 4 to 50 Kbytes.

Disk Storage Devices

• Since a track usually contains a large amount of
information, it is divided into smaller blocks or
sectors.

• The block size B is fixed for each system.

• Typical block sizes range from B=512 bytes to B=4096 bytes. Whole blocks are transferred
between disk and main memory for processing.

Disk Storage Devices

• A read-write head moves to the track that contains
the block to be transferred.
• Disk rotation moves the block under the read-write
head for reading or writing.
• Reading or writing a disk block is time consuming
because of the seek time s and rotational delay
(latency) rd.

Blocking
• Blocking: refers to storing a number of records in
one block on the disk.
• Blocking factor (bfr) refers to the number of
records per block.
• There may be empty space in a block if an integral
number of records do not fit in one block.
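
The blocking factor can be worked out with simple arithmetic. The sketch below assumes fixed-length records of R bytes that may not span two blocks (an assumption, as are the sample sizes; the lecture gives no record size). Under that assumption bfr = floor(B / R), and the empty space per block is B - bfr * R.

```python
import math

def blocking_factor(block_size: int, record_size: int) -> int:
    """Records per block for fixed-length, unspanned records (assumed model)."""
    if record_size <= 0 or record_size > block_size:
        raise ValueError("record size must be positive and fit in one block")
    return block_size // record_size

def unused_bytes_per_block(block_size: int, record_size: int) -> int:
    """Empty space left when an integral number of records does not fill the block."""
    return block_size - blocking_factor(block_size, record_size) * record_size

def blocks_needed(num_records: int, block_size: int, record_size: int) -> int:
    """Number of blocks required to hold num_records records."""
    return math.ceil(num_records / blocking_factor(block_size, record_size))

# Hypothetical numbers: B = 512 bytes, R = 100 bytes, 30000 records.
print(blocking_factor(512, 100))         # 5 records per block
print(unused_bytes_per_block(512, 100))  # 12 bytes unused in each block
print(blocks_needed(30000, 512, 100))    # 6000 blocks
```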

Files of Records
• A file is a sequence of records, where each record is a
collection of data values (or data items).

• A file descriptor (or file header) includes information
that describes the file, such as the field names and
their data types, and the addresses of the file blocks
on disk.

• Records are stored on disk blocks. The blocking
factor bfr for a file is the (average) number of file
records stored in a disk block.

Operations on Files
• OPEN: Readies the file for access, and associates
a pointer that will refer to a current file record at
each point in time.
• FIND: Searches for the first file record that satisfies
a certain condition, and makes it the current file
record.
• FINDNEXT: Searches for the next file record (from
the current record) that satisfies a certain condition,
and makes it the current file record.
• READ: Reads the current file record into a
program variable.
• INSERT: Inserts a new record into the file, and makes
it the current file record.

Operations on Files
• DELETE: Removes the current file record from the
file, usually by marking the record to indicate that it
is no longer valid.
• MODIFY: Changes the values of some fields of the
current file record.
• CLOSE: Terminates access to the file.
• REORGANIZE: Reorganizes the file records. For
example, the records marked deleted are physically
removed from the file or a new organization of the
file records is created.
• READ_ORDERED: Read the file blocks in order of
a specific field of the file.

Unordered Files
• Also called a heap or a pile file.
• New records are inserted at the end of the file.
• To search for a record, a linear search through
the file records is necessary. This requires
reading and searching half the file blocks on the
average, and is hence quite expensive.
• Record insertion is quite efficient.
• To delete a record, the record is marked as
deleted. Space is reclaimed during periodic
reorganization.
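
The behaviour of a heap file can be sketched with a small in-memory model. Everything here is a hypothetical illustration: the class name `HeapFile`, the use of Python lists as "blocks", and the `deleted` flag as the deletion marker are assumptions rather than anything the lecture prescribes.

```python
class HeapFile:
    """Toy heap (unordered) file: a list of blocks, each holding up to bfr slots."""

    def __init__(self, bfr):
        self.bfr = bfr          # blocking factor: records per block
        self.blocks = [[]]      # start with one empty block

    def insert(self, record):
        """Append the record at the end of the file (cheap)."""
        if len(self.blocks[-1]) == self.bfr:
            self.blocks.append([])
        self.blocks[-1].append({"data": record, "deleted": False})

    def find(self, predicate):
        """Linear search. Returns (record or None, number of blocks read)."""
        for blocks_read, block in enumerate(self.blocks, start=1):
            for slot in block:
                if not slot["deleted"] and predicate(slot["data"]):
                    return slot["data"], blocks_read
        return None, len(self.blocks)

    def delete(self, predicate):
        """Mark the first matching record as deleted; space is not reclaimed yet."""
        for block in self.blocks:
            for slot in block:
                if not slot["deleted"] and predicate(slot["data"]):
                    slot["deleted"] = True
                    return True
        return False

    def reorganize(self):
        """Rewrite the file without the records marked as deleted."""
        live = [s["data"] for b in self.blocks for s in b if not s["deleted"]]
        self.blocks = [[]]
        for record in live:
            self.insert(record)
```

Inserting five records with bfr = 2 fills three blocks; after deleting one record and calling `reorganize()`, the remaining four fit in two blocks.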

Ordered Files
• Also called a sequential file.
• File records are kept sorted by the values of an
ordering field.
• Insertion is expensive: records must be inserted in the
correct order.
• A binary search can be used to search for a record on its
ordering field value. This requires reading and searching
about log2(b) blocks on average for a file of b blocks, an
improvement over the b/2 blocks of a linear search.
• Reading the records in order of the ordering field is
quite efficient.
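
A binary search over the blocks of an ordered file might look like the sketch below. It assumes each block is a non-empty sorted list of ordering-field values and counts one block access per probe; the function name and data layout are illustrative only.

```python
def binary_search_blocks(blocks, key):
    """Binary search on an ordered file.

    blocks: list of non-empty, sorted lists of ordering-field values.
    Returns (found, blocks_read).
    """
    low, high = 0, len(blocks) - 1
    blocks_read = 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]        # simulates reading one block from disk
        blocks_read += 1
        if key < block[0]:
            high = mid - 1         # key can only be in an earlier block
        elif key > block[-1]:
            low = mid + 1          # key can only be in a later block
        else:
            return key in block, blocks_read
    return False, blocks_read

# Eight blocks of two records each (hypothetical data).
blocks = [[1, 3], [5, 7], [9, 11], [13, 15], [17, 19], [21, 23], [25, 27], [29, 31]]
print(binary_search_blocks(blocks, 5))   # (True, 2)
print(binary_search_blocks(blocks, 29))  # (True, 4)
```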

Average Access Times

[Table: average time to access a specific record, by type of file organization.]
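
Using the two estimates stated earlier in these notes (about half the blocks for a linear search of a heap file, about log2 of the number of blocks for a binary search of an ordered file), the difference can be put into numbers. The file size of 1024 blocks is a made-up example.

```python
import math

def avg_block_accesses(num_blocks):
    """Approximate average block accesses to find one record in a file of num_blocks blocks."""
    return {
        "heap file, linear search": num_blocks / 2,
        "ordered file, binary search": math.log2(num_blocks),
    }

for organization, cost in avg_block_accesses(1024).items():
    print(f"{organization}: {cost:g} block accesses")
```

For 1024 blocks this gives 512 accesses for the linear search against 10 for the binary search.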

Hashed Files
• The file blocks are divided into M equal-sized
buckets, numbered bucket 0, bucket 1, ..., bucket M-1.

• One of the file fields is designated to be the hash key
of the file.

• The record with hash key value K is stored in bucket
i, where i = h(K) and h is the hashing function.

• Search is very efficient on the hash key.

• Collisions occur when a new record hashes to a
bucket that is already full. An overflow file is kept for
storing such records.
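
A minimal sketch of such a hashed file follows. It assumes integer hash keys and h(K) = K mod M, which is only one possible choice of hash function; the class name, the bucket capacity, and the single shared overflow list are also assumptions made for illustration.

```python
class StaticHashFile:
    """Toy static hash file with M fixed-capacity buckets and one overflow area."""

    def __init__(self, num_buckets, bucket_capacity):
        self.M = num_buckets
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = []   # records whose home bucket was already full

    def h(self, key):
        """Assumed hash function for integer keys."""
        return key % self.M

    def insert(self, key, record):
        bucket = self.buckets[self.h(key)]
        if len(bucket) < self.capacity:
            bucket.append((key, record))
        else:
            self.overflow.append((key, record))   # collision on a full bucket

    def search(self, key):
        """Check the home bucket first, then fall back to the overflow area."""
        for k, record in self.buckets[self.h(key)]:
            if k == key:
                return record
        for k, record in self.overflow:
            if k == key:
                return record
        return None

f = StaticHashFile(num_buckets=4, bucket_capacity=2)
for key, name in [(1, "a"), (5, "b"), (9, "c")]:   # all three hash to bucket 1
    f.insert(key, name)
print(f.overflow)    # [(9, 'c')]
print(f.search(9))   # c
```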

Hashed Files
• There are numerous methods for collision
resolution, including the following:

– Open addressing: Proceeding from the occupied
position specified by the hash address, the program
checks the subsequent positions in order until an
unused (empty) position is found.

– Chaining: A collision is resolved by placing the new
record in an unused overflow location and setting
the pointer of the occupied hash address location to
the address of that overflow location.

– Multiple hashing: The program applies a second
hash function if the first results in a collision.
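
Open addressing, the first of these methods, can be sketched as linear probing over a fixed-size table. The table representation (a Python list with `None` for empty positions) and the hash function `key % m` are assumptions; deletion is left out because it would need extra markers for vacated positions.

```python
def open_addressing_insert(table, key):
    """Insert key by linear probing. Returns the position used."""
    m = len(table)
    start = key % m                    # assumed hash address
    for step in range(m):
        pos = (start + step) % m       # check subsequent positions in order
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")

def open_addressing_search(table, key):
    """Return the position of key, or None if it is not in the table."""
    m = len(table)
    start = key % m
    for step in range(m):
        pos = (start + step) % m
        if table[pos] is None:         # an empty position ends the probe sequence
            return None
        if table[pos] == key:
            return pos
    return None

table = [None] * 7
print([open_addressing_insert(table, k) for k in (10, 17, 24)])  # [3, 4, 5]
print(open_addressing_search(table, 24))                         # 5
```

All three keys hash to position 3, so the second and third land in the next free positions.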

Hashed Files
• The hash function h should distribute the records
uniformly among the buckets; otherwise, search
time will be increased because many overflow
records will exist.
• Main disadvantage of static hashing:
the fixed number of buckets M is a problem if the
number of records in the file grows or shrinks.

Hashed Files Limitation

• Inappropriate for some retrievals:
– Based on pattern matching,
e.g. find all students with an ID like 98xxxxxx.
– Involving ranges of values,
e.g. find all students with IDs from 50100000 to
50199999.
– Based on a field other than
the hash field.

Indexes
• Index: a data structure that allows particular records in a
file to be located more quickly, much like the index in a book.

• An index can be sparse or dense:

– Sparse: has an index record for only some of the search key
values (e.g. Staff IDs: CS001, EE001, MA001).
Applicable to ordered data files only.
– Dense: has an index record for every search key value (e.g.
Staff IDs: CS001, CS002, ..., CS089, EE001,
EE002, ...).

Indexes
• Data file: a file containing the logical
records
• Index file: a file containing the index
records
• Indexing field: the field used to order the
index records in the index file

Dense Index

– The index is usually specified on one
field of the file (although it could be
specified on several fields).
– One form of an index is a file of
entries <field value, pointer to
record>, which is ordered by field
value.
– The index is called an access path
on the field.

Sparse Index

Primary Index
• Defined on an ordered data file.

• The data file is ordered on a key field.

• Includes one index entry for each block in the data file;
the index entry holds the key field value of the first
record in the block, which is called the block anchor.

• A primary index is a nondense (sparse) index, since it
has one entry for each disk block of the data file (the
key of its anchor record) rather than one for every
search value.
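
As a rough sketch, a primary index can be modelled as a sorted list of (anchor key, block number) pairs, searched for the last anchor that does not exceed the wanted key. The data layout and function names are hypothetical.

```python
from bisect import bisect_right

def build_primary_index(blocks):
    """One (anchor key, block number) entry per data block."""
    return [(block[0], block_no) for block_no, block in enumerate(blocks)]

def primary_lookup(index, blocks, key):
    """Return the number of the block holding key, or None if key is absent."""
    anchors = [anchor for anchor, _ in index]
    pos = bisect_right(anchors, key) - 1   # last anchor <= key
    if pos < 0:
        return None                        # key is smaller than every anchor
    block_no = index[pos][1]
    return block_no if key in blocks[block_no] else None

# Data file ordered on a key field, three records per block (made-up data).
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
index = build_primary_index(blocks)
print(index)                               # [(10, 0), (40, 1), (70, 2)]
print(primary_lookup(index, blocks, 50))   # 1
```

The index holds three entries for nine records, which is what makes it sparse.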

© 2010, University of Colombo School of Computing

Clustering Index
• Defined on an ordered data file.

• The data file is ordered on a non-key field, unlike a primary
index, which requires that the ordering field of the data
file have a distinct value for each record.

• Includes one index entry for each distinct value of
the field; the index entry points to the first data block
that contains records with that field value.

• It is another example of a nondense index.
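
A clustering index can be sketched in the same style. Here each block is simply a list of ordering-field values (for example department numbers, a made-up choice), and the index keeps one entry per distinct value pointing to the first block that contains it.

```python
def build_clustering_index(blocks):
    """One (value, first block number) entry per distinct ordering-field value."""
    first_block = {}
    for block_no, block in enumerate(blocks):
        for value in block:
            first_block.setdefault(value, block_no)   # keep only the first block seen
    return sorted(first_block.items())

def blocks_with_value(index, blocks, value):
    """Block numbers holding records with this value, found via the index."""
    start = dict(index).get(value)
    if start is None:
        return []
    result = []
    for block_no in range(start, len(blocks)):
        if value not in blocks[block_no]:
            break              # the file is ordered, so matching blocks are contiguous
        result.append(block_no)
    return result

# File ordered on a non-key field; duplicates spill across block boundaries.
blocks = [[1, 1, 2], [2, 2, 3], [3, 4, 4]]
index = build_clustering_index(blocks)
print(index)                                # [(1, 0), (2, 0), (3, 1), (4, 2)]
print(blocks_with_value(index, blocks, 2))  # [0, 1]
```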

Secondary Index
• A secondary index provides a secondary means of
accessing a file for which some primary access already
exists.

• The secondary index may be on a field which is a candidate
key and has a unique value in every record, or on a non-key
field with duplicate values.

• The index is an ordered file with two fields.

• The first field is of the same data type as some
non-ordering field of the data file that is an indexing field.

• The second field is either a block pointer or a record
pointer.

Secondary Index
• There can be many secondary indexes (and
hence, indexing fields) for the same file.

• Includes one entry for each record in the data file;
hence, it is a dense index.
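
The following sketch builds a dense secondary index on a non-ordering field. Records are modelled as dictionaries and a record pointer as a (block number, slot) pair; both choices, and the field names `id` and `dept`, are assumptions for illustration.

```python
from bisect import bisect_left

def build_secondary_index(blocks, field):
    """One (field value, record pointer) entry per record, ordered by field value."""
    entries = []
    for block_no, block in enumerate(blocks):
        for slot, record in enumerate(block):
            entries.append((record[field], (block_no, slot)))
    entries.sort()
    return entries

def secondary_lookup(index, value):
    """Return the record pointers of all records with the given field value."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, value)       # first entry with key >= value
    pointers = []
    while i < len(index) and index[i][0] == value:
        pointers.append(index[i][1])
        i += 1
    return pointers

# Data file ordered on "id"; the secondary index is on the non-ordering field "dept".
blocks = [
    [{"id": 1, "dept": "EE"}, {"id": 2, "dept": "CS"}],
    [{"id": 3, "dept": "MA"}, {"id": 4, "dept": "CS"}],
]
index = build_secondary_index(blocks, "dept")
print(secondary_lookup(index, "CS"))   # [(0, 1), (1, 1)]
```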

Multi-Level Indexes
• Since a single-level index is an ordered file, we
can create a primary index to the index itself.
• In this case, the original index file is called the first-level
index and the index to the index is called the
second-level index.
• We can repeat the process, creating a third, fourth,
..., top level until all entries of the top level fit in one
disk block.
• A multi-level index can be created for any type of first-level
index (primary, secondary, clustering) as long as
the first-level index consists of more than one disk
block.
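
The number of levels follows from repeated division by the number of index entries that fit in one block (the fan-out). The sketch below assumes every level has the same fan-out; the sample values of 30000 first-level entries and a fan-out of 68 are illustrative and not taken from the lecture.

```python
import math

def index_blocks_per_level(first_level_entries, fan_out):
    """Blocks needed at each index level, from the first level up to the top level."""
    if fan_out < 2:
        raise ValueError("fan-out must be at least 2")
    blocks_per_level = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        blocks_per_level.append(blocks)
        if blocks <= 1:
            break              # the top level fits in one disk block
        entries = blocks       # the next level has one entry per block of this level
    return blocks_per_level

levels = index_blocks_per_level(first_level_entries=30000, fan_out=68)
print(levels)        # [442, 7, 1]
print(len(levels))   # 3 levels
```

Searching such an index reads one block per level, so three index block accesses here, compared with about log2(442), roughly 9, for a binary search of the first level alone.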

Multi-Level Indexes
• Such a multi-level index is a form of search tree.

• However, insertion and deletion of index
entries is a severe problem because every level
of the index is an ordered file.

Dynamic Multilevel Indexes Using B+-Trees
• Most multi-level indexes use the B+-tree data
structure because of the insertion and deletion problem.
• It leaves space in each tree node (disk block) to
allow for new index entries.
• The data structure is a variation of search trees
that allows efficient insertion and deletion of
search values.
• In the B+-tree data structure, each node
corresponds to a disk block.
• Each node is kept between half full and completely full.

Dynamic Multilevel Indexes Using B+-Trees

• An insertion into a node that is not full is quite
efficient.

• If a node is full, the insertion causes a split into two
nodes.

• Splitting may propagate to other tree levels.
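
A leaf split can be sketched as follows. The leaf is modelled as a sorted list of keys only (data pointers omitted), and the convention that the largest key of the left half is copied up as the separator is an assumption; some textbooks use the smallest key of the right half instead.

```python
from bisect import insort

def insert_into_leaf(leaf_keys, key, max_keys):
    """Insert key into a sorted leaf holding at most max_keys keys.

    Returns (left, right, separator). right and separator are None when
    the leaf had room and no split was needed.
    """
    keys = list(leaf_keys)
    insort(keys, key)                   # keep the leaf sorted
    if len(keys) <= max_keys:
        return keys, None, None
    mid = (len(keys) + 1) // 2          # left half gets the extra key, if any
    left, right = keys[:mid], keys[mid:]
    separator = left[-1]                # copied up to the parent (assumed convention)
    return left, right, separator

print(insert_into_leaf([1, 3], 2, max_keys=3))      # ([1, 2, 3], None, None)
print(insert_into_leaf([1, 3, 5], 4, max_keys=3))   # ([1, 3], [4, 5], 3)
```

If the parent is itself full when the separator arrives, the parent splits in the same way, which is how a split propagates to higher levels.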

Dynamic Multilevel Indexes Using B+-Trees

• A deletion is quite efficient if a node does not
become less than half full.
• If a deletion causes a node to become less than
half full, it must be merged with neighboring
nodes.

B+ tree
The structure of the internal nodes of a B+ tree
of order p is as follows:
• Each internal node is of the form
<P_1, K_1, P_2, K_2, ..., P_{q-1}, K_{q-1}, P_q>
where q ≤ p. Each P_i is a tree pointer.
• Within each node, K_1 < K_2 < ... < K_{q-1}.
• Each node has at most p tree pointers.
• Each node with q tree pointers, q ≤ p, has q-1
search key field values.
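
Choosing which tree pointer to follow inside an internal node reduces to a binary search over its keys. The sketch assumes the convention that the subtree under P_i holds search values X with K_{i-1} < X ≤ K_i (the lecture does not state which side equal keys go to), and it returns a 0-based position, so 0 stands for P_1.

```python
from bisect import bisect_left

def child_index(keys, search_key):
    """0-based index of the tree pointer to follow in an internal node.

    keys: the sorted key values K_1 .. K_{q-1} of the node.
    Assumed convention: pointer i covers K_{i-1} < X <= K_i.
    """
    return bisect_left(keys, search_key)

keys = [10, 20, 30]   # a node with q = 4 tree pointers
print([child_index(keys, x) for x in (5, 10, 15, 30, 35)])   # [0, 0, 1, 2, 3]
```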

B+ tree
The structure of the leaf nodes of a B+ tree of
order p is as follows:
• Each leaf node is of the form
<<K_1, Pr_1>, <K_2, Pr_2>, ..., <K_{q-1}, Pr_{q-1}>, P_next>
where q ≤ p. Each Pr_i is a data pointer. P_next
points to the next leaf node of the B+ tree.
• Within each node, K_1 < K_2 < ... < K_{q-1}.
• All leaf nodes are at the same level.
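
Because every leaf carries a pointer to the next leaf, a range of key values can be read by walking the leaf chain. In this sketch the `Leaf` class, the string data pointers, and the choice to start from a given leaf (rather than descending from the root) are simplifying assumptions.

```python
class Leaf:
    """Leaf node: sorted (key, data pointer) entries plus the P_next pointer."""

    def __init__(self, entries, next_leaf=None):
        self.entries = entries
        self.next = next_leaf

def range_scan(start_leaf, low, high):
    """Collect data pointers for all keys in [low, high] by following P_next."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key, pointer in leaf.entries:
            if key > high:
                return result          # keys are sorted, so nothing further matches
            if key >= low:
                result.append(pointer)
        leaf = leaf.next
    return result

leaf3 = Leaf([(50, "r50"), (60, "r60")])
leaf2 = Leaf([(30, "r30"), (40, "r40")], leaf3)
leaf1 = Leaf([(10, "r10"), (20, "r20")], leaf2)
print(range_scan(leaf1, 20, 50))   # ['r20', 'r30', 'r40', 'r50']
```

This kind of ordered traversal is what makes range retrievals practical on a B+-tree, whereas a hashed file offers no comparable ordering.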

Difference between B-tree and B+-tree

• In a B-tree, pointers to data records exist at all
levels of the tree.

• In a B+-tree, all pointers to data records exist at
the leaf-level nodes.

• A B+-tree can have fewer levels (or a higher capacity
of search values) than the corresponding B-tree.

