Lecture 16

This document provides an overview of storage and indexing concepts. It discusses different file organizations like heap files, sorted files, and indexed files. It also covers different types of indexes like hash-based and tree-based indexes. The document compares the costs of common operations like scans, searches, inserts and deletes on different file organizations and indexes. It provides assumptions used for the cost analysis and comparisons.

Uploaded by

Chandrika Surya

Overview of Storage and Indexing

Chapter 8

Comp 521 – Files and Databases Fall 2016 1

Data on External Storage
• Solid State Disks, Secure Digital (SD) non-volatile memory:
  • Block addressable storage device, relatively symmetric R/W speeds, access latency
• Disks: Can retrieve random page at fixed cost
  • But reading consecutive pages is much cheaper than reading them in random order
• Tapes: Can only read pages sequentially
  • Cheaper than disks; used for archival storage
• File organization: Method of arranging a file of records on external storage.
  • Record id (rid) is sufficient to physically locate record
  • Indexes are data structures that allow us to find the record ids of records with given values in index search key fields
• Architecture: Buffer manager stages pages from external storage to main memory buffer pool. File and index layers make calls to the buffer manager.


Alternative File Organizations
Many alternatives exist, each ideal for some situations, and not so good in others:
• Heap (random order) files: Suitable when typical access is a file scan retrieving all records.
• Sorted Files: Best if records must be retrieved in some order, or only a "range" of records is needed.
• Indexes: Data structures to organize records via trees or hashing.
  • Like sorted files, they speed up searches for a subset of records, based on values in certain ("search key") fields
  • Updates are much faster than in sorted files.


Indexes
• An index is an auxiliary data structure that speeds up selections on the search key fields of the index.
  • Any subset of attributes from a relation can be a search key.
  • Search key is not necessarily a relation key (a set of fields that uniquely identify a tuple in a relation).
• An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k.
  • Given data entry k*, we can find the record with key k in at most one disk I/O. (Details soon …)
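
The relationship between search keys, data entries, and rids can be sketched in a few lines of Python. This is only an illustrative model, not how a DBMS stores an index: the `heap_file` contents, the `(page, slot)` form of a rid, and the helper names `build_index` and `fetch` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical heap file: a list of pages, each page a list of records
# of the form (name, age, salary).
heap_file = [
    [("Ann", 25, 3000), ("Bob", 31, 4200)],
    [("Cat", 25, 5100), ("Dan", 49, 8000)],
]

def build_index(pages, key_field):
    """Map each search-key value k to the rids of the records with key k."""
    index = defaultdict(list)
    for page_no, page in enumerate(pages):
        for slot_no, record in enumerate(page):
            # A rid is modeled as (page number, slot number).
            index[record[key_field]].append((page_no, slot_no))
    return index

def fetch(pages, rid):
    """Follow a rid to its record: a single page access in this model."""
    page_no, slot_no = rid
    return pages[page_no][slot_no]

# The search key here is age, which is not a key of the relation.
age_index = build_index(heap_file, key_field=1)
matches = [fetch(heap_file, rid) for rid in age_index[25]]
print(matches)
```

Because age is not a relation key, one search-key value can lead to several rids, each of which costs at most one page access to follow.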


Hash-Based Index
• Place all records with a common attribute together.
• Index is a collection of buckets.
  • Bucket = primary page plus zero or more overflow pages.
  • Buckets contain data entries.
• Hashing function, r = h(key): Mapping from the index's search key to a bucket in which the (data entry for) record r belongs.
[Figure: a key fed to the hash function H(x)]
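
A minimal sketch of the bucket structure described above, assuming integer search keys, a fixed number of buckets, and a tiny page capacity so that overflow pages appear quickly. The hash function `key % NUM_BUCKETS` and all class and constant names are assumptions for illustration.

```python
PAGE_CAPACITY = 2   # data entries per page (assumed, kept tiny on purpose)
NUM_BUCKETS = 4     # assumed fixed number of buckets

class Bucket:
    def __init__(self):
        # pages[0] is the primary page; any further pages are overflow pages.
        self.pages = [[]]

    def insert(self, entry):
        if len(self.pages[-1]) == PAGE_CAPACITY:
            self.pages.append([])          # chain a new overflow page
        self.pages[-1].append(entry)

    def search(self, key):
        # Every page of the bucket may have to be read.
        return [rid for page in self.pages for (k, rid) in page if k == key]

class HashIndex:
    def __init__(self):
        self.buckets = [Bucket() for _ in range(NUM_BUCKETS)]

    def _h(self, key):
        return key % NUM_BUCKETS           # assumed hash function

    def insert(self, key, rid):
        self.buckets[self._h(key)].insert((key, rid))

    def search(self, key):
        return self.buckets[self._h(key)].search(key)

index = HashIndex()
for key, rid in [(25, "r1"), (29, "r2"), (33, "r3"), (30, "r4")]:
    index.insert(key, rid)

print(index.search(29))
print(len(index.buckets[1].pages))  # keys 25, 29, 33 all hash to bucket 1
```

An equality search touches only one bucket, but a long overflow chain makes that bucket more expensive to read.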


Tree-Based Index
[Figure: non-leaf pages above leaf pages; leaf pages are "ordered" by search key]
• Leaf pages contain data entries, and are chained (prev & next)
• Non-leaf pages have index entries; only used to direct searches:
  Index entry: P0 | K1 | P1 | K2 | P2 | ... | Km | Pm
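
The way a non-leaf page directs a search can be sketched as follows. The sketch assumes the usual convention that pointer Pi leads to keys k with Ki <= k < Ki+1 (P0 leads to keys below K1), that keys are unique, and that leaves are chained in one direction only. The class names and the sample tree are hypothetical.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, entries):
        self.entries = entries    # sorted list of (key, rid) data entries
        self.next = None          # chain to the next leaf page

class NonLeaf:
    def __init__(self, keys, children):
        assert len(children) == len(keys) + 1
        self.keys = keys          # K1 .. Km
        self.children = children  # P0 .. Pm

def find_leaf(node, key):
    """Descend from the root to the leaf that could hold `key`."""
    while isinstance(node, NonLeaf):
        # bisect_right counts how many Ki are <= key, which is the index of Pi.
        node = node.children[bisect_right(node.keys, key)]
    return node

def range_search(root, lo, hi):
    """Return data entries with lo <= key <= hi by walking the leaf chain."""
    leaf = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for k, rid in leaf.entries:
            if k > hi:
                return out
            if k >= lo:
                out.append((k, rid))
        leaf = leaf.next
    return out

# Hypothetical two-level tree with three chained leaves.
l0 = Leaf([(5, "r1"), (10, "r2")])
l1 = Leaf([(20, "r3"), (27, "r4")])
l2 = Leaf([(33, "r5"), (40, "r6")])
l0.next, l1.next = l1, l2
root = NonLeaf(keys=[20, 33], children=[l0, l1, l2])

print(range_search(root, 10, 33))
```

Only the first leaf is located through the non-leaf pages; the rest of the range is read by following the leaf chain.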


Alternative Data/Index Organizations
• In data entry k* we store one of the following:
  • The actual data record with its key k (clustered)
  • <k, rid of data record with search key value k>
  • <k, list of rids of data records with search key k>
• Data organization choice is independent of the indexing method.
  • Clustered indices save on accesses, but you can only have 1 clustered index per relation
  • Unclustered alternatives trade off uniformity of index entries versus size considerations
  • Often, indices contain auxiliary information
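
The three choices for a data entry k* can be compared side by side on a toy relation. The records, the `(page, slot)` rids, and the variable names `alt1`, `alt2`, `alt3` are assumptions made for this sketch.

```python
# Hypothetical relation: rid -> record, with age as the search key.
records = {
    (0, 0): {"name": "Ann", "age": 25},
    (0, 1): {"name": "Bob", "age": 31},
    (1, 0): {"name": "Cat", "age": 25},
}

# Choice 1: the data entry is the record itself, stored in key order.
alt1 = sorted(records.values(), key=lambda rec: rec["age"])

# Choice 2: one <k, rid> entry per record; all entries have the same size.
alt2 = sorted((rec["age"], rid) for rid, rec in records.items())

# Choice 3: one <k, list of rids> entry per distinct key; entry size varies.
alt3 = {}
for k, rid in alt2:
    alt3.setdefault(k, []).append(rid)

print(alt2)
print(alt3)
```

Choice 2 repeats the key for every matching record, while choice 3 stores each key once at the price of variable-length entries.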
Index Classifications
• Primary vs. Secondary: If search key contains primary key, then it is called a primary index.
  • Unique index: Search key contains a candidate key.
• Clustered vs. Unclustered:
  • Clustered: tuples are sorted by search key and stored sequentially in data blocks
    • A file can be clustered on at most one search key.
  • Unclustered: search keys are stored with record ids (rids) that identify the block containing the associated tuple


Clustered vs. Unclustered Index
• Index type (Hash or Tree) is independent of the data’s organization (clustered or unclustered).
• To build clustered index, we must first sort the records (perhaps allowing for some free space on each page for future inserts).
  • Later inserts might create overflow pages. Thus, eventual order of data records is “close to”, but not identical to, the sort order.
[Figure: CLUSTERED vs. UNCLUSTERED. In both panels, index entries direct the search for data entries (index blocks), which refer to data records (data blocks).]
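
Why clustering saves accesses can be seen by counting the distinct data pages that hold the records matching a range query. The sketch below is a toy model: 1000 unique integer keys, 10 records per page, and a seeded shuffle standing in for an unclustered layout are all assumptions.

```python
import random

RECORDS_PER_PAGE = 10  # assumed page capacity

def pages_touched(keys_in_file_order, lo, hi):
    """Count distinct data pages holding a record with lo <= key <= hi."""
    return len({
        position // RECORDS_PER_PAGE
        for position, key in enumerate(keys_in_file_order)
        if lo <= key <= hi
    })

keys = list(range(1000))

# Clustered: records are stored in search-key order.
clustered = sorted(keys)

# Unclustered: records are stored in an order unrelated to the search key.
unclustered = keys[:]
random.Random(0).shuffle(unclustered)

clustered_pages = pages_touched(clustered, 100, 199)
unclustered_pages = pages_touched(unclustered, 100, 199)
print(clustered_pages, unclustered_pages)
```

The 100 matching records sit on consecutive pages in the clustered file, whereas in the shuffled file they are spread over many more pages, each a potential separate I/O.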

Costs / Benefits of Indexing
• Adding an index incurs
  • Storage overhead
  • Maintenance overhead
• Without indexing, searching the records of a database for a particular record would require on average

  Number of Records * Cost to read a Record * 0.5

  (assumes records are in random order)
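
The factor of 0.5 follows from the target record being equally likely to sit at any position, so a linear search examines about half the records on average. A short check, with hypothetical function names and an assumed per-record read cost:

```python
def expected_search_cost(num_records, cost_per_record):
    """The slide's estimate: half the records are read on average."""
    return num_records * cost_per_record * 0.5

def average_records_examined(num_records):
    """Exact average when the target is equally likely at positions 1..N."""
    return sum(range(1, num_records + 1)) / num_records

n = 1000
print(average_records_examined(n))   # exact average, (N + 1) / 2
print(expected_search_cost(n, 1.0))  # the 0.5 * N approximation
```

The exact average is (N + 1) / 2, which the slide's formula approximates as 0.5 N; the difference is negligible for large files.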


Cost Model for Our Analysis
We ignore CPU costs, for simplicity:
• B: The number of data pages
• R: Number of records per page
• D: (Average) time to read or write a block
• Measuring number of page I/O’s ignores gains of pre-fetching a sequence of pages; thus, even I/O cost is only approximated.
• Average-case analysis; based on several simplistic assumptions.

Good enough to show the overall trends!

Comparing File Organizations
• Heap file (random record order; insert at eof)
• Sorted files, sorted on <age, sal>
• Clustered B+ tree file, clustered on search key <age, sal>
• Heap file with unclustered B+ tree index on search key <age, sal>
• Heap file with unclustered hash index on search key <age, sal>


Operations to Compare
• Scan: Fetch all records from disk
    SELECT *
    FROM Emp
• Equality search
    SELECT *
    FROM Emp
    WHERE Age = 25
• Range selection
    SELECT *
    FROM Emp
    WHERE Age > 30
• Insert a record
    INSERT
    INTO Emp(Name, Age, Salary)
    VALUES('Jordan', 49, 3000000)
• Delete a record
    DELETE
    FROM Emp
    WHERE Name = 'Bristow'


Assumptions in Our Analysis
• Heap Files:
  • Equality selection is on key => exactly one match
• Sorted Files:
  • Files compacted after deletions.
• Indexes:
  • Search key overhead = 10% size of record
  • Hash: No overflow buckets.
    • 80% page occupancy => File size = 1.25 × data size
  • Tree: 67% occupancy (this is typical).
    • Implies file size = 1.5 × data size
    • Tree Fan-out = F
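
The size factors above follow directly from the occupancy figures, and combining them with the 10% entry size appears to be where the 0.125 and 0.15 terms in the cost table come from. The arithmetic can be checked with a short sketch; the names below are assumptions.

```python
ENTRY_FRACTION = 0.10  # a data entry is assumed to be 10% the size of a record

def size_factor(occupancy):
    """Pages needed per page of fully packed data at the given occupancy."""
    return 1.0 / occupancy

hash_factor = size_factor(0.80)  # 80% occupancy
tree_factor = size_factor(0.67)  # 67% occupancy, roughly 1.49

# Index pages per data page when the index stores <k, rid> entries only.
hash_index_pages = ENTRY_FRACTION * hash_factor
tree_index_pages = ENTRY_FRACTION * tree_factor

print(round(hash_factor, 2), round(tree_factor, 2))
print(round(hash_index_pages, 3), round(tree_index_pages, 2))
```

The tree factor is 1/0.67, about 1.49, which the slide rounds to 1.5.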


Assumptions (contd.)
• Scans:
  • Leaf levels of a tree-index are chained.
  • Index data-entries plus actual file scanned for unclustered indexes.
• Range searches:
  • We use tree indexes to restrict the set of data records fetched, but ignore hash indexes.


Cost of Operations
File Type | Scan | Equality Search | Range Search | Insert | Delete
Heap | BD | 0.5BD | BD | 2D | Search + D
Sorted | BD | D log_2 B | D log_2 B + #matches | Search + BD | Search + BD
Clustered | 1.5BD | D log_F 1.5B | D log_F 1.5B + #matches | Search + D | Search + D
Unclustered tree index | BD(R+0.15) | D(1 + log_F 0.15B) | D(1 + log_F 0.15B + #matches) | D(log_F 0.15B) | Search + 2D
Unclustered hash index | BD(R+0.125) | 2D | BD | 4D | Search + 2D

Several assumptions underlie these (rough) estimates! We’ll cover them in the next few lectures.
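
To get a feel for the magnitudes, the Scan and Equality Search columns can be evaluated for concrete parameters. The sketch below transcribes only those two columns; the values B = 1000, R = 100, D = 0.015 s, and F = 100 are assumptions chosen for illustration, as is the function name.

```python
import math

def estimated_costs(B, R, D, F):
    """Scan and equality-search estimates, transcribed from the cost table."""
    levels_clustered = math.log(1.5 * B, F)     # log_F 1.5B
    levels_unclustered = math.log(0.15 * B, F)  # log_F 0.15B
    return {
        "heap": {"scan": B * D, "equality": 0.5 * B * D},
        "sorted": {"scan": B * D, "equality": D * math.log2(B)},
        "clustered": {"scan": 1.5 * B * D,
                      "equality": D * levels_clustered},
        "unclustered tree": {"scan": B * D * (R + 0.15),
                             "equality": D * (1 + levels_unclustered)},
        "unclustered hash": {"scan": B * D * (R + 0.125),
                             "equality": 2 * D},
    }

# Assumed workload parameters, for illustration only.
costs = estimated_costs(B=1000, R=100, D=0.015, F=100)
for name, c in costs.items():
    print(f"{name:17s} scan={c['scan']:10.3f}s  equality={c['equality']:.4f}s")
```

With these numbers, an equality search on the heap file takes seconds while every indexed or sorted organization stays well under a second. Scanning through an unclustered index, on the other hand, is far more expensive than scanning the file directly.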


Indexes and Workload
• For each query in the workload:
  • Which relations does it access?
  • Which attributes are retrieved?
  • Which attributes are involved in selection/join conditions? How selective are the conditions applied likely to be?
• For each update in the workload:
  • Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
  • The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.


Index-Only Plans
• Some queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.
• Index on <E.dno>: the index stores a count of tuples with the same key.
    SELECT E.dno, COUNT(*)
    FROM Emp E
    GROUP BY E.dno
• A tree index on <E.dno, E.sal> would give the answer.
    SELECT E.dno, MIN(E.sal)
    FROM Emp E
    GROUP BY E.dno
• Index on <E.age, E.sal> or <E.sal, E.age>: average the index keys.
    SELECT AVG(E.sal)
    FROM Emp E
    WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000
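
An index-only evaluation of the first two queries can be sketched by scanning only the leaf-level data entries of a tree index on <dno, sal>. The entries and the function name below are hypothetical; the point is that the rids are never followed, so no data page is read.

```python
from itertools import groupby

# Hypothetical leaf-level data entries of a tree index on <dno, sal>:
# ((dno, sal), rid) pairs in key order. The rids are never dereferenced.
entries = [
    ((10, 3000), "rid1"), ((10, 4500), "rid7"),
    ((20, 2800), "rid3"), ((20, 5200), "rid4"), ((20, 6100), "rid9"),
]

def count_and_min_sal_by_dno(index_entries):
    """Answer COUNT(*) and MIN(sal) per dno from the index entries alone."""
    result = {}
    for dno, group in groupby(index_entries, key=lambda e: e[0][0]):
        group = list(group)
        # Entries are sorted by (dno, sal), so the first one has the minimum sal.
        result[dno] = (len(group), group[0][0][1])
    return result

print(count_and_min_sal_by_dno(entries))
```

The same idea covers the AVG query: with age and sal both in the search key, the qualifying salaries can be read from the index entries themselves.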
Summary
• Alternative file organizations, each suited for different situations.
• If selection queries are frequent, data organization and indices are important.
  • Hash-based indexes
  • Sorted files
  • Tree-based indexes
• An index maps search-keys to associated tuples.
• Understanding the workload of an application, and its performance goals, is essential for a good design.