
Index 1

Uploaded by

Mx A
Copyright
© All Rights Reserved
Big Data Search

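[Figure: DBMS architecture. Front ends (applications, web forms, embedded SQL, interactive SQL) send SQL commands to the query evaluation engine, which sits above the files and access methods, buffer manager, and disk space manager layers, alongside the concurrency control and recovery managers. The database holds data, indexes, and the catalog.]
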
Learning Objectives

• Data Storage
  - How is data physically stored?
  - How does data storage affect performance?

• Indexing
  - Index types
  - How are they used?
  - How are they maintained?
  - How do they improve data processing performance?

Alternative File Organizations

• Many alternatives exist, each ideal for some situations and not so good in others:
  - Heap (random order) files: suitable when typical access is a file scan retrieving all records
  - Sorted files: best if records must be retrieved in some order, or only a ‘range’ of records is needed
  - Indexes: data structures that organize records via trees or hashing
    - Like sorted files, they speed up searches for a subset of records, based on values in certain (“search key”) fields of the index
    - Updates are much faster than in sorted files
• Scan operation: steps through all records in the file one at a time, so it accesses all pages!
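The trade-off above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page layout (a list of pages, each a list of keys), the page size of 4 records, and the helper names `paginate`, `heap_scan`, and `sorted_file_search` are hypothetical and do not come from the slides. The sketch counts how many pages are read to find one key in a heap file versus a sorted file.

```python
import random

PAGE_SIZE = 4  # records per page (assumed for illustration)

def paginate(keys, page_size=PAGE_SIZE):
    """Split a list of keys into fixed-size pages."""
    return [keys[i:i + page_size] for i in range(0, len(keys), page_size)]

def heap_scan(pages, target):
    """Scan every page of a heap file; return (found, pages_read)."""
    found = False
    pages_read = 0
    for page in pages:
        pages_read += 1
        if target in page:
            found = True
    return found, pages_read

def sorted_file_search(pages, target):
    """Binary search over the pages of a sorted file; return (found, pages_read)."""
    lo, hi = 0, len(pages) - 1
    pages_read = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        page = pages[mid]
        pages_read += 1
        if target < page[0]:
            hi = mid - 1
        elif target > page[-1]:
            lo = mid + 1
        else:
            return target in page, pages_read
    return False, pages_read

keys = list(range(1000))
sorted_pages = paginate(keys)          # sorted file: 250 pages in key order

shuffled = keys[:]
random.Random(0).shuffle(shuffled)
heap_pages = paginate(shuffled)        # heap file: same records, random order

heap_found, heap_cost = heap_scan(heap_pages, 737)
sorted_found, sorted_cost = sorted_file_search(sorted_pages, 737)
print("heap file pages read:  ", heap_cost)
print("sorted file pages read:", sorted_cost)
```

The scan touches every page of the heap file, while the binary search on the sorted file needs only about log2 of the page count. The price of the sorted file, as the slide notes, is slower updates, since inserts must preserve the order.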

Queries are slow!


Book Index
• A database index is similar to a book index!
• Book index: lists important terms in alphabetical order, each with the page number(s) where the term appears
• Searching in a book:
  1. Search the book index for a term to find a list of addresses (i.e., page numbers)
  2. Use these addresses to retrieve the specified pages
  3. Search for the term in each page
• Otherwise, search through the entire book word by word (≈ linear search)
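The three search steps above can be sketched in Python. The tiny four-page "book" and the function names `build_book_index`, `lookup_with_index`, and `lookup_linear` are hypothetical, chosen only to make the analogy concrete.

```python
# A hypothetical four-page book: page number -> page text.
pages = {
    1: "a database stores records in files",
    2: "an index speeds up search on a file",
    3: "a heap file keeps records in random order",
    4: "a sorted file supports range search",
}

def build_book_index(pages):
    """Map each term to the sorted list of page numbers where it appears."""
    index = {}
    for page_no, text in pages.items():
        for term in set(text.split()):
            index.setdefault(term, []).append(page_no)
    return {term: sorted(nums) for term, nums in sorted(index.items())}

def lookup_with_index(index, pages, term):
    """Consult the index, fetch only the listed pages, then search them."""
    hits = []
    for page_no in index.get(term, []):   # step 1: addresses from the index
        text = pages[page_no]             # step 2: retrieve that page
        if term in text.split():          # step 3: search within the page
            hits.append(page_no)
    return hits

def lookup_linear(pages, term):
    """No index: read every page word by word."""
    return [no for no, text in sorted(pages.items()) if term in text.split()]

book_index = build_book_index(pages)
print(lookup_with_index(book_index, pages, "search"))
print(lookup_linear(pages, "search"))
```

Both lookups return the same pages; the difference is that the indexed lookup reads only the pages listed under the term, whereas the linear lookup reads the whole book.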
Indexes

• An index on a file speeds up selections on the search key fields for the index
  - Any subset of the fields of a relation can be the search key for an index on the relation
  - A search key is NOT the same as a key (a minimal set of fields that uniquely identifies a record in a relation)
• An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given search key value k

Index Structure

• One form of an index is a file of data entries (records):
      k*: <field value, pointer to record>
• The field is called a search key (any field!)
• The pointer points to the record that includes the field value
• An index is also called an access path or access method

Example index on GPA

An index contains a collection of data entries, and supports efficient retrieval of records matching a given search condition (e.g., GPA > 2.5)
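A minimal sketch of such an index in Python, assuming a small made-up student table. The records, the use of list positions as record pointers, and the names `build_index` and `range_greater_than` are all illustrative assumptions. Each data entry is a <field value, pointer to record> pair, kept sorted on the field value so that the condition GPA > 2.5 becomes a binary search followed by a sequential read of the entries.

```python
from bisect import bisect_right

# Hypothetical student records; the list position acts as the record pointer.
records = [
    ("Ann", 3.7),
    ("Bob", 2.1),
    ("Cho", 2.9),
    ("Dee", 2.5),
    ("Eli", 3.2),
]

def build_index(records, field):
    """Return data entries <field value, pointer>, sorted by field value."""
    return sorted((rec[field], ptr) for ptr, rec in enumerate(records))

def range_greater_than(index, records, low):
    """Fetch the records whose search key value is strictly greater than low."""
    keys = [key for key, _ in index]
    start = bisect_right(keys, low)       # first data entry with key > low
    return [records[ptr] for _, ptr in index[start:]]

gpa_index = build_index(records, field=1)
print(range_greater_than(gpa_index, records, 2.5))
```

The records themselves stay in their original order; only the index is sorted, which is why it can be added to an existing file without reorganizing it.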
Index Alternatives

• Alternatives for data entry k* in an index (rid = record id):
  1. Actual data record with key value k
  2. <k, rid of data record with search key value k>
  3. <k, list of rids of data records with search key k>
• The first alternative is like a special file organization that can be used instead of sorting a file
• The second and third alternatives contain data entries that point to data records and are independent of the file organization of the indexed file
• Examples of indexing techniques: B+ trees, hash-based structures

Index Alternatives Cont.

• Alternative 1:
  - If this is used, the index structure is a file organization for data records (instead of a heap file or sorted file)
  - At most one index on a given collection of data records can use Alternative 1 (otherwise, data records are duplicated, leading to redundant storage and potential inconsistency)

Index Alternatives Cont.

• Alternatives 2 and 3:
  - Data entries are typically much smaller than data records, so these are better than Alternative 1 for large data records (the portion of the index structure used to direct the search, which depends on the size of the data entries, is much smaller than with Alternative 1)
  - Alternative 3 is more compact than Alternative 2, but leads to variable-sized data entries even if search keys are of fixed length
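
To make the three alternatives concrete, here is a small Python sketch over a made-up (name, age) table indexed on age. The records, the use of list positions as rids, and the function names are assumptions for illustration only; a real index would arrange these entries in a B+ tree or hash structure rather than a flat sorted list.

```python
from collections import defaultdict

# Hypothetical records (name, age); the list position stands in for the rid.
records = [("Ann", 20), ("Bob", 22), ("Cho", 20), ("Dee", 21)]

def alternative_1(records, field):
    """Data entry is the actual record: the index is the file organization."""
    return sorted(records, key=lambda rec: rec[field])

def alternative_2(records, field):
    """One <k, rid> data entry per record."""
    return sorted((rec[field], rid) for rid, rec in enumerate(records))

def alternative_3(records, field):
    """One <k, list of rids> data entry per distinct search key value."""
    groups = defaultdict(list)
    for rid, rec in enumerate(records):
        groups[rec[field]].append(rid)
    return sorted(groups.items())

print(alternative_1(records, 1))
print(alternative_2(records, 1))
print(alternative_3(records, 1))
```

With Alternative 3 the entry for age 20 carries two rids while the others carry one, which shows why its data entries are variable-sized even though the keys have a fixed length.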
Index Classifications

• Primary vs. secondary: if the search key contains the primary key, the index is called a primary index; otherwise it is a secondary index
  - Unique index: the search key contains a candidate key
• Clustered vs. unclustered: if the order of data records in the file is the same as, or ‘close to’, the order of data entries in the index, the index is called a clustered index
  - Alternative 1 implies clustered; in practice, clustered also implies Alternative 1 (since sorted files are rare)
  - A file can be clustered on at most one search key
  - The cost of retrieving data records through an index varies greatly based on whether the index is clustered or not!

Index Classifications Cont.

• Suppose that Alternative 2 is used for data entries, and that the data records are stored in a heap file:
  - To build a clustered index, first sort the heap file (leaving some free space on each page for future inserts)

[Figure: clustered vs. unclustered index. In both panels, index entries direct the search to the data entries in the index file, and the data entries point to data records in the data file.]
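
The cost difference between clustered and unclustered indexes can be illustrated with a rough Python simulation. Everything here is an assumption made for the sketch: 1000 records, 10 records per data page, rids that are simply slot numbers in the data file, and a cost model that counts each distinct data page once (as if a page stays buffered after its first read).

```python
import random

PAGE_SIZE = 10  # records per data page (assumed)

def pages_touched(index, low, high):
    """Count distinct data pages read for the range low <= key <= high.

    index is a sorted list of <key, rid> data entries; a rid is the record's
    slot number in the data file, so its page number is rid // PAGE_SIZE.
    """
    return len({rid // PAGE_SIZE for key, rid in index if low <= key <= high})

keys = list(range(1000))

# Clustered: the data file is sorted on the search key, so rid order = key order.
clustered_index = [(key, rid) for rid, key in enumerate(keys)]

# Unclustered: the same records sit in the data file in random order.
shuffled = keys[:]
random.Random(0).shuffle(shuffled)
unclustered_index = sorted((key, rid) for rid, key in enumerate(shuffled))

clustered_cost = pages_touched(clustered_index, 100, 199)
unclustered_cost = pages_touched(unclustered_index, 100, 199)
print("clustered:  ", clustered_cost, "data pages")
print("unclustered:", unclustered_cost, "data pages")
```

In the clustered case the 100 matching records sit on 10 adjacent pages. In the unclustered case they are scattered, so the same query touches many more pages, approaching one page per matching record in the worst case.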


Index Structures
• Single-level Indexes
  1. Primary Indexes
  2. Clustering Indexes
  3. Secondary Indexes
• Multi-level Indexes

Primary Index
• Defined on an ordered data file
• The data file is ordered on the primary key field
• A primary index is an ordered file
• Entry i in the primary index is:
  - Fixed length, with 2 fields: <K(i), P(i)>
  - K(i): the search key value, which is a primary key value
  - P(i): a pointer to a disk block (i.e., block address)

Primary Index
Example: entry 1 = <K(1), P(1)>
    <K(1) = “Aaron, Ed”, P(1) = block 1>

What is entry 2?

    <K(2) = “Abbot, Diane”, P(2) = block 1>  ✗
  or
    <K(2) = “Adams, John”, P(2) = block 2>  ✓

<K(2) = “Abbot, Diane”, P(2) = block 1> is redundant: because the file is sorted, “Abbot, Diane” has to fall between “Aaron, Ed” and “Adams, John” (i.e., in block 1). The first record in each block serves as its block anchor.

Index File
• The index file holds <K(i), P(i)> entries
  - K(i): the primary key value of the block anchor
  - P(i): a pointer to a disk block (i.e., block address)
• An index file might be stored in multiple blocks!

[Figure: index file. Each entry pairs a block anchor primary key value K(i) with a block pointer P(i).]

Primary Index
• One index entry for each block in the data file
  - The index entry holds the key field value of the first record in the block (the block anchor)
• A primary index is nondense (sparse)
  - It includes an entry for each disk block of the data file, NOT for each search key value
• A primary index typically occupies multiple blocks
• “Create an index!”
  - But does a primary index make queries run faster?!
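
The sparse primary index described above can be sketched in Python. The three-record blocks are hypothetical: only “Aaron, Ed”, “Abbot, Diane”, and “Adams, John” appear in the slides, and the remaining names, the block size, and the function names `build_primary_index` and `lookup` are filled in purely for illustration. Blocks are numbered from 1 to match the slides.

```python
from bisect import bisect_right

# Hypothetical ordered data file: each block holds records sorted by primary key.
data_blocks = [
    ["Aaron, Ed", "Abbot, Diane", "Acosta, Marc"],      # block 1
    ["Adams, John", "Adams, Robin", "Akers, Jan"],      # block 2
    ["Alexander, Ed", "Alfred, Bob", "Allen, Sam"],     # block 3
]

def build_primary_index(blocks):
    """One <K(i), P(i)> entry per block: the block anchor and the block number."""
    return [(block[0], block_no) for block_no, block in enumerate(blocks, start=1)]

def lookup(index, blocks, key):
    """Find the last anchor <= key, then search only that one block.

    Returns the block number holding the key, or None if the key is absent.
    """
    anchors = [anchor for anchor, _ in index]
    pos = bisect_right(anchors, key) - 1
    if pos < 0:
        return None                        # key sorts before the first anchor
    block_no = index[pos][1]
    return block_no if key in blocks[block_no - 1] else None

primary_index = build_primary_index(data_blocks)
print(primary_index)
print(lookup(primary_index, data_blocks, "Abbot, Diane"))
```

The index holds three entries for nine records, and “Abbot, Diane” is found through the “Aaron, Ed” anchor without needing an entry of its own, which is exactly why the index can stay sparse.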
Advantages of a Primary Index
• A primary index might be stored in multiple blocks, but it occupies much less space than the data file, because:
  1. There are fewer index entries than records
  2. Each index entry is typically smaller than a data record
     - An index entry has only 2 fields
     - More index entries than data records fit in a block
• Binary search is more efficient on the index file!
  - Let size of data file = B_data blocks
  - Let size of index file = B_index blocks
  - Typically: B_index << B_data (much smaller)
  - Then: log2(B_index) < log2(B_data), so a binary search on the index reads fewer blocks than a binary search on the data file (plus one more block access to fetch the data block itself)
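
A worked version of this comparison, as a short Python calculation. All sizes below are hypothetical values picked only to make the arithmetic concrete (they are not from the slides), and records are assumed not to span block boundaries.

```python
import math

def blocks_needed(num_items, item_size, block_size):
    """Blocks required when items do not span block boundaries."""
    per_block = block_size // item_size
    return math.ceil(num_items / per_block)

# Hypothetical sizes, chosen only for illustration.
BLOCK_SIZE = 1024      # bytes per disk block
RECORD_SIZE = 100      # bytes per data record
NUM_RECORDS = 30_000
ENTRY_SIZE = 15        # bytes per index entry: key K(i) plus block pointer P(i)

b_data = blocks_needed(NUM_RECORDS, RECORD_SIZE, BLOCK_SIZE)
# A primary index has one entry per data block.
b_index = blocks_needed(b_data, ENTRY_SIZE, BLOCK_SIZE)

# Binary search directly on the ordered data file.
search_data_file = math.ceil(math.log2(b_data))
# Binary search on the index, plus one access to read the data block.
search_with_index = math.ceil(math.log2(b_index)) + 1

print("B_data  =", b_data, "blocks")
print("B_index =", b_index, "blocks")
print("block accesses without index:", search_data_file)
print("block accesses with index:   ", search_with_index)
```

Under these assumptions the data file takes 3000 blocks and the index 45, so a lookup drops from 12 block accesses to 7.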
