Index and Hashing 2017 Combined

Indexing and hashing are techniques used to improve data retrieval efficiency in databases. Indexes are data structures that allow the database management system to locate records more quickly. There are two main types of indexes: ordered indexes where search keys are stored in sorted order, and hash indexes where keys are distributed uniformly using a hash function. Files can be organized using different indexing techniques like primary, secondary, sparse, and dense indexes to support efficient querying of data.

Indexing and Hashing

Week 4
National College of Ireland
Dublin, Ireland.
Indexing and Hashing

• Introduction to Indexing
• Basic Concepts
• Ordered Indices
• Static Hashing
• Dynamic Hashing
• Comparison of Ordered Indexing and Hashing
• Index Definition in SQL
• Tree Structures
Introduction to Indexing
• Main memory (RAM) is significantly faster than a hard disk drive (HDD).
• SSDs are faster than HDDs, but they have not yet eliminated this gap.
• Databases are stored on HDDs as files.
• Files are stored as collections of BLOCKS on HDD.
• Many blocks can be read into a PAGE or SEGMENT in the RAM.
• For simplicity, we estimate the cost of an operation by counting the number of blocks that are read from or written to the hard disk.
• Blocked access (transferring several consecutive blocks in one request) can significantly lower the cost of I/O.
• In general, we assume that each relation (table) is stored in a separate file with B blocks and R records per block.
File Organizations
• Technique for physically arranging the records of a file on secondary storage
• Factors for selecting a file organization:
▪ Fast data retrieval and throughput
▪ Efficient storage space utilization
▪ Protection from failure and data loss
▪ Minimizing need for reorganization
▪ Accommodating growth
▪ Security from unauthorized use
• Types of file organizations
i. Heap – no particular order
ii. Sequential
iii. Indexed
iv. Hashed
Sequential file organization
• Sequential storage: records of the file are stored in sequence by the primary key field values.
• Average time to find desired record = log2(n), using a binary search over the sorted records
• If this were a heap, average time to find desired record = n/2, using a linear scan
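The two search costs above can be checked with a small experiment. The following is a minimal sketch, not part of the original slides: the function names are hypothetical, and "time" is approximated by the number of records examined rather than by block accesses.

```python
import math


def average_heap_probes(n: int) -> float:
    """Approximate records examined by a linear scan of an unordered (heap) file.

    Uses the slide's n/2 approximation for a key that is present.
    """
    return n / 2


def binary_search_probes(sorted_keys, target):
    """Binary search over sorted keys; returns (position or None, probes used)."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


keys = list(range(0, 2048, 2))  # 1024 sorted keys
worst = max(binary_search_probes(keys, k)[1] for k in keys)
print("heap average:", average_heap_probes(len(keys)))
print("binary search worst case:", worst, "vs log2(n) =", math.log2(len(keys)))
```

For 1024 records the linear scan examines about 512 records on average, while the binary search never needs more than 11 probes.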
Indexed File Organizations
• Storage of records sequentially or nonsequentially with an index that allows software to locate individual records
• Index: a table or other data structure used to determine the location in a file of records that satisfy some condition
• Primary keys are automatically indexed
• Other fields or combinations of fields can also be indexed; these are called secondary keys (or nonunique keys)
• The index uses a tree search: average time to find the desired record is based on the depth of the tree and the length of the list
Introduction to Indexing
• The use of indexes makes the retrieval of data more efficient.
– An index is a data structure that allows the DBMS to locate particular records in a file more quickly, and thereby speed up responses to user queries.
• Heap File
– No specific structure
– Uses sequential (linear) access to records
• Hash File
– Uses a hash function of a set of hash fields
– Allows direct access if the hash field values are known
Introduction to Indexing
• Indexes
– An index access structure is associated with a particular search key and contains records consisting of the key value and the address of the logical record in the data file containing the key value
– Data File: the file containing the logical records
– Index File: the file containing the index records
– Values in the index are usually sorted (ordered) according to the indexing field, which is usually based on a single attribute.
– When an index is ordered, we can perform an efficient binary search on the index.
– NOTE: a file can have more than one index.
Introduction to Indexing
• Indexing mechanisms used to speed up access to desired data.
– e.g., author catalog in library
• Search Key – An attribute or set of attributes used to look up records in a file.
• An index file consists of records (called index entries) of the form (search-key, pointer)
• Index files are typically much smaller than the original file
• Two basic kinds of indices:
i. Ordered indices: search keys are stored in sorted order
ii. Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”.
Files without an Index
• Ordered File
• Consider the following example of the Staff table:

Sno   Lname   Position   NIN       Bno
SG14  White   Manager    WK4416    B5
SG21  Black   Snr Asst   WL7868    B3
SG24  Ford    Deputy     WL87678   B3
SG36  Brown   Assistant  WF765675  B4
SG37  Black   Assistant  WD7867    B4
SL20  Red     Manager    WG786     B5
SL21  Murphy  Assistant  WF766675  B4
SL37  Whyte   Deputy     WD78167   B3
SL66  Blue    Manager    WG7816    B5

Operations to consider:
• Find Sno = SG36
• Find Sno from SG36 to SL37
• Find Lname of Murphy
• Find Lname of Black
• Find all Managers
• Insert new record
• Delete record
Types of Index
i. Primary Indexes
ii. Clustering Indexes
iii. Secondary Indexes
iv. Multilevel Indexes

Primary Indexes
• If the data file is sequentially ordered and the indexing field is a key field of the file (i.e., it is guaranteed to have a unique value in each record), then the index is called a primary index.
[Figure: Primary index on the ordering key field of the file]
Clustering Index
• If the data file is sequentially ordered on a non-key field (values may be repeated) and the indexing field corresponds to this non-key field, then the index is called a clustering index.
• In SQL Server, when a clustered index is created on a non-unique column, the engine adds an additional hidden value to duplicate keys to make each key unique.
• A clustered index is the most common type of table organization.
[Figure: A clustering index on the Dept_number ordering nonkey field of an EMPLOYEE file]
Secondary Index
• A Secondary Index is a data structure that contains a subset of attributes from a table, along with an alternate key to support query operations.
[Figure: Dense secondary index (with block pointers) on a nonordering key field of a file]
Unique and Nonunique Indexes
• Unique (primary) Index
– Typically done for primary keys, but could also apply
to other unique fields

• Nonunique (secondary) index

– Done for fields that are often used to group individual
entities (e.g. zip code, product category)
Single-level Ordered Indexes
• A file can have at most one primary index or one clustering
index, but not both.

• A file can have several secondary indexes

• Secondary indexes do not affect the physical organisation of
records.

• Further, an index can be sparse or dense

• A sparse index has an index record for only some of the search
key values in the file.
• A dense index has an index record for every search key value
in the file.
Sparse Index Files
• Sparse index – an index record appears for only some of the search-key values in the file; index records are not created for every search key.
• An index record here contains a search key and an actual pointer to the data on the disk.
• To search for a record, we first follow the index record to reach the actual location of the data.
• If the data we are looking for is not where we arrive by following the index, the system searches sequentially from there until the desired data is found.

Sparse index entries: Brighton, Miami, Redwood

Data file (branch name, account number, balance):
Brighton    A-217  750
Downtown    A-101  500
Downtown    A-110  600
Miami       A-215  700
Perryridge  A-102  400
Perryridge  A-201  900
Perryridge  A-218  700
Redwood     A-222  700
Round Hill  A-305  350
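The sparse lookup described above can be sketched in a few lines. This is an illustrative sketch only: list positions stand in for disk pointers, and the names `DATA`, `SPARSE_INDEX` and `sparse_lookup` are hypothetical. The rule used to pick the starting entry (the largest index key not greater than the key being searched) is the usual one for sparse indexes and is an assumption here, since the slide does not spell it out.

```python
from bisect import bisect_right

# Data file from the slide, sorted by branch name: (branch, account, balance)
DATA = [
    ("Brighton", "A-217", 750),
    ("Downtown", "A-101", 500),
    ("Downtown", "A-110", 600),
    ("Miami", "A-215", 700),
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Perryridge", "A-218", 700),
    ("Redwood", "A-222", 700),
    ("Round Hill", "A-305", 350),
]

# Sparse index: (search key, position of the first record with that key)
SPARSE_INDEX = [("Brighton", 0), ("Miami", 3), ("Redwood", 7)]


def sparse_lookup(branch):
    """Return all records for `branch` using the sparse index plus a short scan."""
    keys = [key for key, _ in SPARSE_INDEX]
    slot = bisect_right(keys, branch) - 1  # largest index key <= branch
    if slot < 0:
        return []  # branch sorts before every indexed key
    pos = SPARSE_INDEX[slot][1]
    matches = []
    # Sequential scan from the indexed position until we pass the key.
    while pos < len(DATA) and DATA[pos][0] <= branch:
        if DATA[pos][0] == branch:
            matches.append(DATA[pos])
        pos += 1
    return matches


print(sparse_lookup("Perryridge"))
print(sparse_lookup("Downtown"))
```

"Downtown" has no index entry of its own, so the lookup starts at the "Brighton" entry and scans forward, which is the sequential step the slide describes.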
Dense Index Files
• Dense index – an index record appears for every search-key value in the file.
• e.g., index on ID attribute of instructor relation

Dense index entries: Brighton, Downtown, Miami, Perryridge, Redwood, Round Hill

Data file (branch name, account number, balance):
Brighton    A-217  750
Downtown    A-101  500
Downtown    A-110  600
Miami       A-215  700
Perryridge  A-102  400
Perryridge  A-201  900
Perryridge  A-218  700
Redwood     A-222  700
Round Hill  A-305  350
Single-level Ordered Indexes
Primary Index Performance
• The index file requires significantly fewer blocks than the data
file
i. Sparse index
ii. Index file record typically smaller in size than data file record
• A binary search on the index file requires fewer block accesses
than a binary search on the data file
• Insertion and deletion of records is problematic
• Not only do we have to move records in the data file, we also have to change some index entries
• Storage overhead is not a serious problem
Single-level Ordered Indexes
• Primary Indexes
• Example:

Index file (primary key value -> block address):
9666162 -> Block 1
9684535 -> Block 2
9716262 -> Block 3

Data file (sId values stored in each block):
Block 1: 9666162, 9667145, 9674545
Block 2: 9684535, 9695352, 9706363
Block 3: 9716262, 9723437, 9733255

• The first record in each block of the data file is called the anchor record.
• The total number of entries in the index is the same as the number of disk blocks in the data file.
Single-level Ordered Indexes
Clustering Index (Example)

Index file (clustering field value -> block address):
0 -> Block 1
1 -> Block 1
2 -> Block 2
3 -> Block 3

Data file (clustering field values stored in each block):
Block 1: 0, 0, 1
Block 2: 1, 1, 2
Block 3: 2, 3, 3
Single-level Ordered Indexes
Secondary Index Performance
• A secondary index is built on a non-ordering field of the data file
• The index file is itself another sorted file whose records are of fixed or
variable length consisting of two fields
• The first field is of the same data type as the indexing field of the data file
• The second field is a pointer to a disk block or a record

Indexing Field Value Block Address / Record Pointer

• We can consider two cases

i. The index access structure is constructed on a key field
ii. The index access structure is constructed on a non-key field
Secondary Indices Example
Secondary index on salary field of instructor

• Index record points to a bucket that contains pointers to all the

actual records with that particular search-key value.
• Secondary indices have to be dense
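The bucket-of-pointers idea can be illustrated with a small sketch. The instructor rows below are hypothetical (the slide's figure data did not survive extraction), list positions stand in for record pointers, and the function names are made up for this example.

```python
from collections import defaultdict

# Hypothetical instructor records, stored in ID order (not in salary order).
INSTRUCTORS = [
    (101, "Avery", 65000),
    (102, "Blake", 90000),
    (103, "Casey", 40000),
    (104, "Drew", 90000),
    (105, "Ellis", 65000),
]


def build_secondary_index(records, field):
    """Map each distinct field value to a bucket of record positions."""
    buckets = defaultdict(list)
    for position, record in enumerate(records):
        buckets[record[field]].append(position)
    # Keep the index ordered by search key, as an ordered index would be.
    return dict(sorted(buckets.items()))


SALARY_INDEX = build_secondary_index(INSTRUCTORS, field=2)


def find_by_salary(salary):
    """Follow the bucket of pointers for one salary value."""
    return [INSTRUCTORS[pos] for pos in SALARY_INDEX.get(salary, [])]


print(SALARY_INDEX)
print(find_by_salary(90000))
```

Because the file is not sorted by salary, every salary value needs its own index entry; a sparse index would give no way to find the values in between, which is why the index has to be dense.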
Single-level Ordered Indexes (Summary)

Index Type | Number of Index Entries | Dense / Sparse | Use Block Anchor
Primary | Equal to the number of blocks in the data file | Sparse | Yes
Clustering | Equal to the number of distinct indexing field values | Sparse | Yes if separate blocks are used for records with different indexing field values; no otherwise
Secondary on a key field | Equal to the number of records in the data file | Dense | No
Secondary on a non-key field | Equal to the number of records for option 1; equal to the number of distinct indexing field values for options 2 and 3 | Dense for option 1; sparse for options 2 and 3 | No
Multi-level Indexes
• When an index file becomes large and extends over many pages, the search time for the required index entry increases
• A binary search requires approximately log2(p) page accesses for an index with p pages.
• A multi-level index attempts to overcome this problem by reducing the search range
– Treat the index like any other file
– Split the index into a number of smaller indexes
– Maintain an index to the indexes
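To see why an index on the index helps, the page-access counts can be compared directly. This is a rough sketch under stated assumptions: `fanout` (the number of entries that fit in one index page) is a parameter introduced here for illustration and does not appear in the slides, and one page read per level is assumed.

```python
import math


def binary_search_accesses(p: int) -> int:
    """Approximate page accesses for a binary search over p index pages."""
    return math.ceil(math.log2(p)) if p > 1 else 1


def multilevel_accesses(p: int, fanout: int) -> int:
    """Index levels needed (one page read per level) above p first-level pages.

    Each higher level holds one entry per page of the level below it.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels = 1
    while p > 1:
        p = math.ceil(p / fanout)  # pages needed at the next level up
        levels += 1
    return levels


print(binary_search_accesses(1000))    # single-level index, binary search
print(multilevel_accesses(1000, 100))  # multi-level index, assumed fan-out 100
```

With 1000 index pages and an assumed fan-out of 100, the binary search needs about 10 page accesses while the multi-level index needs 3.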

Multi-level Indexes
Multi-level Indexes Performance
• Search performance is increased when searching for a record based on a specified indexing field value
• Problems with insertions and deletions are still evident
• To retain the benefits of multi-level indexing while reducing index insertion and deletion problems, a multi-level structure is adopted that leaves some space in each of its blocks for inserting new entries.
• This is called a dynamic multi-level index and is often implemented using data structures called B-trees and B+-trees.
Multi-level Indexes

• If an index does not fit in memory, access becomes expensive.
• To reduce the number of disk accesses to index records, treat the index kept on disk as a sequential file and construct a sparse index on it.
– Outer index: a sparse index on the main index
– Inner index: the main index file
• If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on.
• Indices at all levels must be updated on insertion into or deletion from the file.
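Multi-level Indexes
[Figure: a two-level index. Entries in the outer index point to blocks of the inner index (Index Block 0, Index Block 1, ...), whose entries point to the data blocks (Data Block 0, Data Block 1, ...).]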
Index Update: Deletion
• If the deleted record was the only record in the file with its particular search-key value, the search key is also deleted from the index.
Index Update: Insertion
Single-level index insertion:
• Perform a lookup using the search-key value appearing in the record to be inserted.
• Dense indices – if the search-key value does not appear in the index, insert it.
• Sparse indices – if the index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created. In this case, the first search-key value appearing in the new block is inserted into the index.
• Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithms.
Introduction to Hashing
• Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string.
• Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value.
• Hash functions are also widely used in cryptography, for example in digital signatures and integrity checks.


Static Hashing
• Hashing is an effective technique to calculate the direct location of a data record on the disk without using an index structure.
• In static hashing, when a search-key value is provided, the hash function always computes the same address.
• For example, if a mod-4 hash function is used, it generates only 4 distinct values (0 to 3). The output address is always the same for a given key.
• The number of buckets provided remains unchanged at all times.
• A bucket is a unit of storage containing one or more records (a bucket is typically a disk block).
Operations
(Access, Insertion and Deletion)
• Insertion: When a record is to be inserted using static hashing, the hash function h computes the bucket address for search key K, where the record will be stored.
– Bucket address = h(K)
• Search: When a record needs to be retrieved, the same hash function can be used to retrieve the address of the bucket where the data is stored.
• Delete: This is simply a search followed by a deletion operation.
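The three operations can be sketched with a fixed array of buckets. This is a minimal in-memory sketch assuming integer search keys and the mod-4 hash function mentioned earlier; the class name `StaticHashFile` and the record strings are hypothetical, and real buckets would be disk blocks.

```python
NUM_BUCKETS = 4  # fixed for the lifetime of the file (static hashing)


def h(key: int) -> int:
    """mod-4 hash function: always yields one of the addresses 0, 1, 2, 3."""
    return key % NUM_BUCKETS


class StaticHashFile:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key, record):
        # Bucket address = h(K)
        self.buckets[h(key)].append((key, record))

    def search(self, key):
        # Only the bucket at address h(K) has to be examined.
        return [rec for k, rec in self.buckets[h(key)] if k == key]

    def delete(self, key):
        # A search followed by removal; returns how many records were removed.
        bucket = self.buckets[h(key)]
        before = len(bucket)
        bucket[:] = [(k, rec) for k, rec in bucket if k != key]
        return before - len(bucket)


f = StaticHashFile()
for k in range(10):
    f.insert(k, f"rec{k}")
print(f.search(6))
print(f.delete(6), f.search(6))
```

However many records are inserted, the number of buckets stays at four; the buckets just grow longer, which is the weakness that dynamic hashing addresses.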

Hash Functions
• The worst hash function maps all search-key values to the same bucket; this makes access time proportional to the number of search-key values in the file.
• An ideal hash function is:
i. Uniform: Each bucket is assigned the same number of search-key values from the set of all possible values.
ii. Random: Each bucket will have the same number of records assigned to it irrespective of the actual distribution of search-key values in the file.
• Typical hash functions perform computation on the internal binary representation of the search-key.
• For example, for a string search-key, the binary representations of all the characters in the string could be added and the sum modulo the number of buckets could be returned.
Example of Hash File Organization
• Hash file organization of the account file, using branch-name as key.
• There are 10 buckets.
• The binary representation of the ith character of the alphabet is assumed to be the integer i.
• The hash function returns the sum of the binary representations of the characters modulo 10.

bucket 0: (empty)
bucket 1: (empty)
bucket 2: (empty)
bucket 3: Brighton A-217 750; Round Hill A-305 350
bucket 4: Redwood A-222 700
bucket 5: Perryridge A-102 400; Perryridge A-201 900; Perryridge A-218 700; Miami A-215 700
bucket 6: (empty)
bucket 7: (empty)
bucket 8: Downtown A-101 500; Downtown A-110 600
bucket 9: (empty)
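The hash function in this example is simple enough to write out. The sketch below assumes that letters are counted case-insensitively (a = 1, ..., z = 26) and that spaces and other characters count as 0; the slides do not state either detail, but with these assumptions the function reproduces the bucket assignments of the example.

```python
def letter_value(ch: str) -> int:
    """Position of a letter in the alphabet (a=1 ... z=26); anything else is 0."""
    ch = ch.lower()
    return ord(ch) - ord("a") + 1 if "a" <= ch <= "z" else 0


def h(branch_name: str, num_buckets: int = 10) -> int:
    """Sum of the character values modulo the number of buckets."""
    return sum(letter_value(ch) for ch in branch_name) % num_buckets


for name in ["Brighton", "Downtown", "Miami", "Perryridge", "Redwood", "Round Hill"]:
    print(name, "-> bucket", h(name))
```

For instance, the letters of "Perryridge" sum to 125, so the record goes to bucket 5, and "Miami" (sum 45) lands in the same bucket.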
Handling of Bucket Overflows
• Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets.
• Overflow chaining: the overflow buckets of a given bucket are chained together in a linked list.
• The above scheme is called closed hashing. An alternative, called open hashing, is not suitable for database applications.
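Overflow chaining can be sketched as a linked list of fixed-capacity buckets. This is an illustrative in-memory sketch; the class name `Bucket`, the `capacity` parameter and the method names are hypothetical and not taken from the slides.

```python
class Bucket:
    """A fixed-capacity bucket with a link to an overflow bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # next bucket in the chain, created on demand

    def insert(self, record):
        if len(self.records) < self.capacity:
            self.records.append(record)
            return
        # Bucket is full: push the record down the overflow chain.
        if self.overflow is None:
            self.overflow = Bucket(self.capacity)
        self.overflow.insert(record)

    def scan(self):
        """Yield every record in the primary bucket and all overflow buckets."""
        bucket = self
        while bucket is not None:
            yield from bucket.records
            bucket = bucket.overflow

    def chain_length(self):
        """Number of buckets in the chain, including the primary bucket."""
        length, bucket = 0, self
        while bucket is not None:
            length += 1
            bucket = bucket.overflow
        return length


b = Bucket(capacity=2)
for record in range(5):
    b.insert(record)
print(list(b.scan()), b.chain_length())
```

A search in an overflowing bucket has to walk the whole chain, so long chains erode the constant-time lookup that hashing is meant to provide.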
Hash Indices

• Hashing can be used not only for file organization, but also for index-structure creation. A hash index organizes the search keys, with their associated record pointers, into a hash file structure.
• Hash indices are always secondary indices: if the file itself is organized using hashing, a separate primary hash index on it using the same search-key is unnecessary.
• However, the term hash index is used to refer to both secondary index structures and hash organized files.
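Example of Hash Indices
Hash index on account number (bucket: search keys stored, each with a pointer to its record):
bucket 0: (empty)
bucket 1: A-215, A-305
bucket 2: A-101, A-110
bucket 3: A-217, A-102, A-201
bucket 4: A-218
bucket 5: (empty)
bucket 6: A-222

Data file (branch name, account number, balance):
Brighton    A-217  750
Downtown    A-101  500
Downtown    A-110  600
Miami       A-215  700
Perryridge  A-102  400
Perryridge  A-201  900
Perryridge  A-218  700
Redwood     A-222  700
Round Hill  A-305  350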
Deficiencies of Static Hashing
• In static hashing, function (h) maps search-key values to a fixed set (B) of bucket addresses.
• Databases grow with time. If the initial number of buckets is too small, performance will degrade due to too many overflows.
• If the file size at some point in the future is anticipated and the number of buckets is allocated accordingly, a significant amount of space will be wasted initially.
• If the database shrinks, space will again be wasted.
• One option is periodic reorganization of the file with a new hash function, but it is very expensive.
• These problems can be avoided by using techniques that allow the number of buckets to be modified dynamically.
Dynamic Hashing
• Good for a database that grows and shrinks rapidly in size
• Allows the hash function to be modified dynamically
• Extendable hashing – one form of dynamic hashing
• The hash function generates values over a large range, typically b-bit integers with b = 32.
• At any time, use only the last i bits of the hash value to index into a table of bucket addresses, where:
– 0 ≤ i ≤ 32
– Initially i = 0
– The value of i grows and shrinks as the size of the database grows and shrinks.
• The actual number of buckets is at most 2^i, and this also changes dynamically due to merging and splitting of buckets.
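The mechanics of extendable hashing are easier to follow in code. The following is a simplified in-memory sketch, not a definitive implementation: it only grows (bucket merging and shrinking of i are omitted), it uses Python's built-in `hash` truncated to 32 bits as a stand-in hash function, and the class and method names are hypothetical. As on the slide, the last i bits of the hash value index the bucket address table.

```python
class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth  # hash bits this bucket actually depends on
        self.capacity = capacity
        self.items = {}                 # key -> value


class ExtendableHash:
    def __init__(self, bucket_capacity=2, hash_bits=32):
        self.capacity = bucket_capacity
        self.hash_bits = hash_bits
        self.i = 0  # number of hash bits currently used; initially 0
        self.directory = [Bucket(0, bucket_capacity)]  # always 2**i entries

    def _hash(self, key):
        return hash(key) & ((1 << self.hash_bits) - 1)

    def _slot(self, key):
        return self._hash(key) & ((1 << self.i) - 1)  # last i bits

    def get(self, key, default=None):
        return self.directory[self._slot(key)].items.get(key, default)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket):
        if bucket.local_depth == self.i:
            if self.i == self.hash_bits:
                raise RuntimeError("cannot split: all hash bits are in use")
            # Double the table; slots s and s + 2**i share a bucket at first.
            self.directory = self.directory + self.directory
            self.i += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth, self.capacity)
        new_bit = 1 << (bucket.local_depth - 1)
        # Move the keys whose newly used bit is 1 into the sibling bucket.
        for key in list(bucket.items):
            if self._hash(key) & new_bit:
                sibling.items[key] = bucket.items.pop(key)
        # Repoint the table entries that now belong to the sibling.
        for slot, b in enumerate(self.directory):
            if b is bucket and slot & new_bit:
                self.directory[slot] = sibling

    def num_buckets(self):
        return len({id(b) for b in self.directory})


table = ExtendableHash(bucket_capacity=2)
for k in range(50):
    table.put(k, k * k)
print("i =", table.i, "table entries =", len(table.directory),
      "buckets =", table.num_buckets())
```

Several table entries may point to the same bucket, which is why the number of buckets can be smaller than the number of table entries. A production version would also have to cope with many keys sharing one hash value, which this sketch only guards against with an exception.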
Comparison of Ordered Index and Hashing
• Issues to consider:
▪ Cost of periodic re-organization
▪ Relative frequency of insertions and deletions
▪ Is it desirable to optimize average access time at the expense of worst-case access time?
▪ Expected type of queries:
– Hashing is generally better at retrieving records having a specified value of the key.
– If range queries are common, ordered indices are to be preferred.
B-Tree Index
• Provides a multi-level access structure
• Tree is always balanced
• Space wasted by deletion never becomes excessive
– Each node is at least half-full
• Each node in a B-tree of order p can have at most p-1 search values
[Figure: B-tree structures. (a) A node in a B-tree with q−1 search values. (b) A B-tree of order p=3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.]
Advantages of B-Tree Index
▪ B-Tree Index speeds up data access
– Storage engine traverses from root node to leaf node with the help of pointers
▪ Increases performance of the following query patterns:
– Full Value (e.g. ‘London’, ‘Bristol’)
– Leftmost Value or Column Prefix (e.g. ‘Lon’ from ‘London’, ‘Mary’ from ‘Mary Hwe’)
– Range of Values (e.g. 1 to 99, Aaron to Fritz, Aaron to Kei%)
▪ Because a B-Tree keeps its keys in sorted order, it can also improve the performance of ORDER BY clauses
B+-Trees
• Data pointers stored only at the leaf nodes
– Leaf nodes have an entry for every value of the search field, and a data pointer to the record if the search field is a key field
– For a nonkey search field, the pointer points to a block containing pointers to the data file records
• Internal nodes
– Some search field values from the leaf nodes are repeated to guide the search
[Figure: The nodes of a B+-tree. (a) Internal node of a B+-tree with q−1 search values. (b) Leaf node of a B+-tree with q−1 search values and q−1 data pointers.]
Difference between B-tree and B+-tree
• In a B-tree, pointers to data records exist at all levels of the tree
• In a B+-tree, all pointers to data records exist at the leaf-level nodes
• A B+-tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree
Visual Representations
▪ B-Tree
▪ Clustered Index
▪ Secondary Index
Clustered Index B-Tree
Root node: 1 …
Intermediate node: 1-30 | 31 …
Leaf nodes: 1-10 | … | 21-30 | …
Building Clustered B-Tree Index
▪ Assumption: Each page contains 10 rows

Step 1: the first 10 rows fill one data page.
Page 0:1: Row1 to Row10

Step 2: the next 10 rows go to a second data page.
Page 0:1: Row1 to Row10
Page 0:2: Row11 to Row20

Step 3: an index page is created above the data pages.
Page 1:1 (index): Row 1-10 -> Page 0:1; Row 11-20 -> Page 0:2
Page 0:1: Row1 to Row10
Page 0:2: Row11 to Row20
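The page-building steps can be mimicked with plain lists. This is a simplified sketch: row keys are integers, the page ids are formatted like the slide's 0:1 and 0:2 labels, and the function names are hypothetical. Only a single index level is built.

```python
ROWS_PER_PAGE = 10  # the slide's assumption


def build_leaf_pages(keys, rows_per_page=ROWS_PER_PAGE):
    """Sort the rows by key and cut them into pages of at most rows_per_page."""
    keys = sorted(keys)
    return [keys[i:i + rows_per_page] for i in range(0, len(keys), rows_per_page)]


def build_index_page(leaf_pages):
    """One index entry per leaf page: (lowest key, highest key, page id)."""
    return [(page[0], page[-1], f"0:{n}") for n, page in enumerate(leaf_pages, start=1)]


def find_page(index_page, key):
    """Return the id of the leaf page whose key range covers `key`, if any."""
    for low, high, page_id in index_page:
        if low <= key <= high:
            return page_id
    return None


leaves = build_leaf_pages(range(1, 21))   # Row1 .. Row20
index = build_index_page(leaves)          # plays the role of page 1:1
print(index)
print(find_page(index, 14))
```

Looking up row 14 reads the index page first and then exactly one data page, instead of scanning both data pages.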
Traversing Clustered B-Tree Index
Root node: 1 …
Intermediate node: 1-30 | 31 …
Leaf nodes: 1-10 | … | 21-30 | …

The leaf nodes contain the entire data, ordered by the key column.
Building Secondary Index B-Tree
Root node: 1 …
Intermediate node: 1-30 | 31 …
Leaf nodes: 1-10 | … | 21-30 | …

Instead of data, the leaf nodes contain pointers to the clustered index.
Building Hash Index

Query to answer:
SELECT FirstName, LastName
FROM TableName
WHERE FirstName = 'Jeff'

Table:
FirstName  LastName
Aaron      Skonnard
Fritz      Onion
Keith      Brown
Mike       Woodring
Jeff       Ross
Megan      Russell

The hash index stores HashFn(FirstName) together with a pointer to the row:
HashFn()  Value             FirstName  LastName
1254      Pointer to row 1  Aaron      Skonnard
5487      Pointer to row 2  Fritz      Onion
6587      Pointer to row 3  Keith      Brown
6842      Pointer to row 4  Mike       Woodring
4786      Pointer to row 5  Jeff       Ross
9587      Pointer to row 6  Megan      Russell

Lookup: HashFn('Jeff') = 4786, so the query is answered through the index, conceptually as:
SELECT Value
FROM TableName
WHERE HashFn() = 4786
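The same lookup can be sketched outside SQL. In the sketch below, `hash_fn` is a stand-in for the slide's HashFn(): it uses CRC32 reduced to four digits, so its values will not match the illustrative numbers in the table above. The names `ROWS`, `HASH_INDEX` and `lookup` are hypothetical, and list positions play the role of row pointers.

```python
import zlib

ROWS = [
    ("Aaron", "Skonnard"),
    ("Fritz", "Onion"),
    ("Keith", "Brown"),
    ("Mike", "Woodring"),
    ("Jeff", "Ross"),
    ("Megan", "Russell"),
]


def hash_fn(value: str) -> int:
    """Stand-in hash function (an assumption, not the slide's HashFn)."""
    return zlib.crc32(value.encode("utf-8")) % 10_000


def build_hash_index(rows):
    """Map hash_fn(FirstName) to the positions of the rows with that hash."""
    index = {}
    for row_number, (first_name, _last_name) in enumerate(rows):
        index.setdefault(hash_fn(first_name), []).append(row_number)
    return index


HASH_INDEX = build_hash_index(ROWS)


def lookup(first_name):
    """Hash the search value, follow the pointers, then re-check the name."""
    candidates = HASH_INDEX.get(hash_fn(first_name), [])
    # Different names can share a hash value, so verify each candidate row.
    return [ROWS[n] for n in candidates if ROWS[n][0] == first_name]


print(lookup("Jeff"))
```

The final comparison against the stored first name matters: two different names may hash to the same value, and only the re-check keeps such collisions from returning wrong rows.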
Module Resources
Recommended Book Resources
• Thomas Connolly, Carolyn Begg 2014, Database Systems: A Practical Approach to Design, Implementation, and Management, 6th Edition, Pearson Education [ISBN: 1292061189] [Present in our Library]
Supplementary Book Resources
• Gordon S. Linoff, Data Analysis Using SQL and Excel, Wiley [ISBN: 0470099518]
• Eric Redmond, Jim Wilson, Seven Databases in Seven Weeks, Pragmatic Bookshelf [ISBN: 1934356921]
• Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, High Performance MySQL, O'Reilly Media [ISBN: 1449314287]
Other Resources
• Website: http://www.thearling.com
• Website: http://www.mongodb.org
• Website: https://app.pluralsight.com
• Website: https://www.tutorialspoint.com/dbms/dbms_hashing.htm
