Lt20 21 Index
Physical models:
Car: engine, transmission, master cylinder, brake lines, brake pads
Bicycle: chain from pedal to wheels, gears, wire from handle to brake pads
Central Processing Unit (CPU)
Input and output devices, e.g. mouse, keyboard, monitors, printers
Communication mechanisms, e.g. internal bus, network card, modem
Storage Hierarchy
Main memory - fast, but content is lost when power is off
Secondary storage - slower, retains content without power
Tertiary storage - very slow, retains content, very large capacity
on secondary storage, e.g. disks
Use main memory to improve performance
Use tertiary storage (e.g. tapes) for backup, archival, etc.
Concepts
A field represents a property or attribute of a relation or an entity.
A record represents a row in a relational table: a collection of fields, one for each attribute in the relational schema of the table.
A file is a collection of records. A relation is typically stored as a file of records.
A homogeneous collection of records may represent a relation.
A heterogeneous collection may be a union of related relations.
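The field / record / file terminology can be made concrete with a small sketch. The Employee schema and its two sample records are hypothetical, chosen only for illustration; a real DBMS stores records in pages on disk, not in a Python list.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Employee:
    # Hypothetical relational schema: one field per attribute.
    name: str
    age: int
    sal: int

# A homogeneous file: every record follows the same schema,
# so the file as a whole can represent one relation.
employee_file = [
    Employee("Jones", 40, 6003),
    Employee("Tracy", 44, 5004),
]

# The field names are the attribute names of the schema.
field_names = [f.name for f in fields(Employee)]
print(field_names)
```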
File organization
DBMS ARCHITECTURE
[Figure: DBMS components: Web Forms; SQL Interface; Parser; Optimizer; File and Access Methods; Buffer Manager; Disk Space Manager; Transaction Manager; Lock Manager; Concurrency Control; Recovery Manager; System Catalog]
Architecture: Buffer manager fetches pages from external storage to main memory buffer pool. File and index layers make calls to the buffer manager.
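A minimal sketch of the buffer manager's role, under stated assumptions: the class name BufferPool, the dict standing in for external storage, and the least-recently-used replacement policy are all illustrative choices, not something these slides specify. Pinning and writing back dirty pages are left out.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: caches disk pages in a fixed number of frames."""

    def __init__(self, disk, capacity):
        self.disk = disk              # hypothetical external storage: page_id -> contents
        self.capacity = capacity      # number of frames in the buffer pool
        self.frames = OrderedDict()   # page_id -> contents, least recently used first
        self.disk_reads = 0           # counts page I/Os

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # hit: mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)    # evict the least recently used page
        self.disk_reads += 1                   # miss: one page I/O
        page = self.disk[page_id]
        self.frames[page_id] = page
        return page

disk = {i: f"page-{i}" for i in range(5)}
pool = BufferPool(disk, capacity=2)
for pid in [0, 1, 0, 2, 0, 1]:
    pool.get_page(pid)
print(pool.disk_reads)  # 4: two of the six requests were served from the pool
```

The file and index layers would call get_page rather than touching the disk directly, which is why repeated access to a hot page costs no extra I/O.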
Indexes
An index on a file speeds up selections on the search key fields for the index.
Any subset of the fields of a relation can be the search key for an index on the relation. Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation).
An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k.
The choice of alternative for data entries is orthogonal to the indexing technique used to locate data entries with a given key value k.
Examples of indexing techniques: B+ trees, hash-based structures.
Typically, an index contains auxiliary information that directs searches to the desired data entries.
Example of Alternative 1
6 data entries, sorted by colour
Columns: Location (1 to 6), shape (round, square, rectangle), colour (Red, Blue), holes (2, 4, 8, 2, 4, 8)
Example of Alternative 2
Location   colour
1          Red
2          Red
3          Red
4          Blue
5          Blue
6          Blue
Example of Alternative 3
Locations   colour
1, 2, 3     Red
4, 5, 6     Blue
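The difference between Alternatives 2 and 3 can be sketched with the colour example: Alternative 2 keeps one <key, rid> pair per record, while Alternative 3 keeps one <key, rid-list> pair per distinct key value. Under Alternative 1 the data entries would be the data records themselves. The variable names are illustrative, and the location number is used as the record id.

```python
from collections import defaultdict

# Data records identified by location; only the search-key field (colour) is shown.
records = {1: "red", 2: "red", 3: "red", 4: "blue", 5: "blue", 6: "blue"}

# Alternative 2: one <key, rid> data entry per record, kept sorted by key.
alt2 = sorted((colour, rid) for rid, colour in records.items())

# Alternative 3: one <key, rid-list> data entry per distinct key value.
grouped = defaultdict(list)
for colour, rid in alt2:
    grouped[colour].append(rid)
alt3 = sorted(grouped.items())

print(alt2)
print(alt3)
```

Alternative 3 stores each key value once, so it is more compact when many records share a key, at the price of variable-length data entries.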
Index Classification
Primary vs. secondary: If search key contains primary key, then called primary index.
Unique index: Search key contains a candidate key.
Clustered vs. unclustered: If the order of data records is the same as, or "close to", the order of data entries, then the index is called a clustered index.
Alternative 1 implies clustered; in practice, clustered also implies Alternative 1 (since sorted files are rare).
A file can be clustered on at most one search key.
The cost of retrieving data records through an index varies greatly based on whether the index is clustered or not!
Index Classification (contd.)
Suppose that Alternative (2) is used for data entries, and that the data records are stored in a Heap file.
To build a clustered index, first sort the Heap file (with some free space on each page for future inserts). Overflow pages may be needed for inserts. (Thus, the order of data records is "close to", but not identical to, the sort order.)
[Figure: data entries and data records]
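Why clustering matters for retrieval cost can be sketched by counting the data pages a range selection touches. The page capacity, the key range, and the random placement used for the unclustered case are all assumptions made for illustration; the count is of distinct pages, which is the best case for the unclustered layout.

```python
import random

RECORDS_PER_PAGE = 10          # assumed page capacity
keys = list(range(1000))       # assumed search-key values, one record per key

def pages_touched(layout, lo, hi):
    """Count distinct data pages holding records with lo <= key <= hi.

    layout[i] is the key stored in slot i; slot i lives on page i // RECORDS_PER_PAGE.
    """
    return len({slot // RECORDS_PER_PAGE
                for slot, key in enumerate(layout) if lo <= key <= hi})

clustered = sorted(keys)       # data records stored in search-key order

rng = random.Random(0)
unclustered = keys[:]
rng.shuffle(unclustered)       # data records stored in arbitrary (heap) order

print(pages_touched(clustered, 100, 199))    # 10: matching records are contiguous
print(pages_touched(unclustered, 100, 199))  # far more: matches are scattered
```

With the clustered layout the 100 matching records fill exactly 10 adjacent pages; with the scattered layout nearly every matching record can sit on a different page.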
Hash-Based Indexes
Good for equality selections.
Index is a collection of buckets. Bucket = primary page plus zero or more overflow pages.
Hashing function h: h(r) = bucket in which record r belongs. h looks at the search key fields of r.
If Alternative (1) is used, the buckets contain the data records; otherwise, they contain <key, rid> or <key, rid-list> pairs.
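[Figure: a file of records (name, age, sal) hashed on age by h1 into buckets h(age) = 00, 01, 10, and a second index on sal using h2 with buckets h(sal) = 00 and h(sal) = 11. Sample records: Samu,44,3000; Jones,40,6003; Tracy,44,5004; Sahu,25,3000; Kones,33,4003; Tincy,29,2007; Sanju,50,5004; John,22,6003. The sal index holds the values 3000, 3000, 5004, 5004 and 4003, 2007, 6003, 6003.]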
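A minimal sketch of a hash-based index with Alternative (2) data entries. The hash function (key mod 4), the page capacity of two entries, and the sample ages are assumptions made for illustration; the point is the bucket structure of one primary page plus overflow pages, and that an equality selection reads only one bucket.

```python
PAGE_CAPACITY = 2   # assumed number of data entries per page
NUM_BUCKETS = 4     # assumed number of buckets

def h(key):
    # Illustrative hash function on the search key.
    return key % NUM_BUCKETS

class HashIndex:
    def __init__(self):
        # Each bucket is a list of pages: the primary page plus any overflow pages.
        self.buckets = [[[]] for _ in range(NUM_BUCKETS)]

    def insert(self, key, rid):
        pages = self.buckets[h(key)]
        if len(pages[-1]) == PAGE_CAPACITY:
            pages.append([])            # last page is full: chain an overflow page
        pages[-1].append((key, rid))    # Alternative (2) data entry: <key, rid>

    def lookup(self, key):
        """Equality selection: scan only the pages of one bucket."""
        return [rid for page in self.buckets[h(key)]
                for k, rid in page if k == key]

index = HashIndex()
ages = [44, 40, 44, 25, 33, 29, 50, 22]   # sample search-key values
for rid, age in enumerate(ages):
    index.insert(age, rid)

print(index.lookup(44))        # [0, 2]
print(len(index.buckets[0]))   # 2: primary page plus one overflow page
```

A range selection gets no help from this structure, since neighbouring key values hash to unrelated buckets.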
Tree Indexes
[Figure: tree index with non-leaf pages above the leaf pages]
Leaf pages contain data entries and are chained (prev. & next). Non-leaf pages contain index entries and direct searches.
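The roles of the two kinds of pages can be sketched with a two-level tree: one non-leaf page directs the search to a leaf, and the leaf chain is then followed for a range selection. The keys, record ids, and the single level of non-leaf pages are illustrative assumptions; this is not a full B+ tree (no inserts, splits, or prev pointers).

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, entries):
        self.entries = entries   # sorted list of (key, rid) data entries
        self.next = None         # link to the next leaf in key order

# Leaf level: data entries in key order, chained left to right.
leaves = [Leaf([(5, "r1"), (8, "r2")]),
          Leaf([(12, "r3"), (17, "r4")]),
          Leaf([(21, "r5"), (30, "r6")])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right

# One non-leaf page: separators[i] is the smallest key in leaves[i + 1].
separators = [12, 21]

def find_leaf(key):
    """Use the index entries to pick the leaf that could contain key."""
    return leaves[bisect_right(separators, key)]

def range_search(lo, hi):
    """Return rids with lo <= key <= hi by walking the leaf chain."""
    leaf, out = find_leaf(lo), []
    while leaf is not None:
        for key, rid in leaf.entries:
            if key > hi:
                return out
            if key >= lo:
                out.append(rid)
        leaf = leaf.next
    return out

print(range_search(8, 21))  # ['r2', 'r3', 'r4', 'r5']
```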
Measuring the number of page I/Os ignores the gains of pre-fetching a sequence of pages; thus, even I/O cost is only approximated.
Average-case analysis, based on several simplistic assumptions.
Good enough to show the overall trends!
Operations to Compare
Scan: fetch all records from disk
Equality search
Range selection
Insert a record
Delete a record
Sorted Files:
Files compacted after deletions.
Indexes:
Alt (2), (3): data entry size = 10% of the size of a record
Hash: no overflow buckets.
80% page occupancy => file size = 1.25 x data size
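The constants that appear in the cost formulas follow from these assumptions by simple arithmetic. The factor 1.5 is taken from the clustered scan cost 1.5BD and corresponds to pages that are about two-thirds full; reading it that way is an assumption, since the occupancy figure for trees is not stated here.

```latex
\begin{align*}
\text{file size} &= \frac{\text{data size}}{0.8} = 1.25 \times \text{data size} \\
\text{hash index, data entries} &: \; 0.1 \times 1.25 = 0.125 \\
\text{tree index, data entries} &: \; 0.1 \times 1.5 = 0.15
\end{align*}
```

This is where the 0.125 in BD(R + 0.125) and the 0.15 in BD(R + 0.15) come from: data entries are one tenth the size of a record, scaled by the occupancy factor of the index.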
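File type                   (a) scan        (b) equality         (c) range                             (d) insert           (e) delete
1. Heap                     ...             ...                  BD                                    2D                   ...
2. Sorted                   ...             ...                  D log2 B + # matching pages           Search + BD          Search + BD
3. Clustered                1.5BD           ...                  ...                                   Search + D           Search + D
4. Unclustered tree index   BD(R + 0.15)    D(1 + logF 0.15B)    D(logF 0.15B + # matching records)    D(3 + logF 0.15B)    Search + 2D
5. Unclustered hash index   BD(R + 0.125)   2D                   BD                                    4D                   Search + 2D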
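The formulas for the two unclustered indexes can be evaluated with a short sketch. The symbols are not defined in this section, so their meaning here is an assumption: B is the number of data pages, R the number of records per page, D the average time to read or write one page, and F the fan-out of the tree. "Search" in the delete cost is taken to be the equality-search cost. The function names and the sample values are illustrative.

```python
import math

def unclustered_tree_costs(B, R, D, F, matching_records):
    """Costs for an unclustered tree index, as written in the cost table."""
    descent = math.log(0.15 * B, F)       # logF(0.15B): levels above the leaf pages
    search = D * (1 + descent)            # equality search
    return {
        "scan": B * D * (R + 0.15),
        "equality": search,
        "range": D * (descent + matching_records),
        "insert": D * (3 + descent),
        "delete": search + 2 * D,         # Search + 2D
    }

def unclustered_hash_costs(B, R, D):
    """Costs for an unclustered hash index, as written in the cost table."""
    search = 2 * D                        # equality search
    return {
        "scan": B * D * (R + 0.125),
        "equality": search,
        "range": B * D,                   # hashing does not help range selections
        "insert": 4 * D,
        "delete": search + 2 * D,         # Search + 2D
    }

# Assumed sample values: 10,000 pages, 100 records per page, 15 ms per page I/O.
B, R, D, F = 10_000, 100, 0.015, 100
tree = unclustered_tree_costs(B, R, D, F, matching_records=50)
hashed = unclustered_hash_costs(B, R, D)
for op in ("scan", "equality", "range", "insert", "delete"):
    print(f"{op:9s} tree={tree[op]:12.3f}s  hash={hashed[op]:12.3f}s")
```

Even with rough inputs, the trends in the table show up: the hash index wins on equality search, and a full scan through either unclustered index costs far more than the B page reads of the underlying file.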