Index 1

Uploaded by

Mx A


Big Data Search

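[Figure: DBMS architecture. Applications, Web Forms, Embedded SQL, and Interactive SQL issue SQL Commands to the Query Evaluation Engine, which sits on top of Files and Access Methods, the Buffer Manager, and the Disk Space Manager, alongside Concurrency Control and the Recovery Manager. The Database holds Data, Indexes, and the Catalog.]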

Learning Objectives

• Data Storage
  • How is data physically stored?
  • How does data storage affect performance?

• Indexing
  • Index types
  • How are they used?
  • How are they maintained?
  • How do they improve data processing performance?

Alternative File Organizations

• Many alternatives exist, each ideal for some situations and not so good in others:
  • Heap (random order) files: suitable when typical access is a file scan retrieving all records
  • Sorted files: best if records must be retrieved in some order, or only a 'range' of records is needed
  • Indexes: data structures that organize records via trees or hashing
    • Like sorted files, they speed up searches for a subset of records, based on values in certain ("search key") fields of the index
    • Updates are much faster than in sorted files
• Scan operation: steps through all records in the file one at a time, so it accesses all pages!
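
To make the page-access difference concrete, here is a minimal sketch that counts page reads for an equality search. The page size, the in-memory list-of-pages model, and the function names are assumptions made for illustration; they are not from the slides.

```python
PAGE_SIZE = 4  # records per page (hypothetical)


def paginate(records, page_size=PAGE_SIZE):
    """Split a list of records into fixed-size pages."""
    return [records[i:i + page_size] for i in range(0, len(records), page_size)]


def heap_scan(pages, key):
    """Scan every page of an unordered file; return (matches, pages_read)."""
    matches = []
    pages_read = 0
    for page in pages:
        pages_read += 1
        matches.extend(r for r in page if r == key)
    return matches, pages_read


def sorted_search(pages, key):
    """Binary search over the pages of a sorted file; return (found, pages_read)."""
    lo, hi = 0, len(pages) - 1
    pages_read = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        page = pages[mid]
        pages_read += 1
        if key < page[0]:
            hi = mid - 1
        elif key > page[-1]:
            lo = mid + 1
        else:
            return key in page, pages_read
    return False, pages_read


keys = list(range(64))
heap_pages = paginate([(k * 37) % 64 for k in keys])  # same keys, scrambled order
sorted_pages = paginate(keys)
```

With 16 pages, the heap scan always reads all 16, while the binary search on the sorted file needs at most 5 page reads.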

Queries are slow!

Book Index
• A database index is similar to a book index!
• Book index: lists important terms in alphabetical order, each with the page number(s) where the term appears
• Searching in a book:
  1. Search the book index for a term to find a list of addresses (i.e., page numbers)
  2. Use these addresses to retrieve the specified pages
  3. Search for the term in each page
• Otherwise, search through the entire book word by word (≈ linear search)
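
The three search steps, and the linear fallback, can be sketched as follows. The "book" is modeled as a list of page strings and the function names are hypothetical, chosen only for this illustration.

```python
def build_book_index(pages):
    """Map each term to the list of page numbers (1-based) where it appears."""
    index = {}
    for page_no, text in enumerate(pages, start=1):
        for term in set(text.lower().split()):
            index.setdefault(term, []).append(page_no)
    return index


def find_with_index(index, pages, term):
    """Step 1: look up addresses. Step 2: fetch those pages. Step 3: search each."""
    term = term.lower()
    hits = []
    for page_no in index.get(term, []):
        text = pages[page_no - 1]
        if term in text.lower().split():
            hits.append(page_no)
    return hits


def find_without_index(pages, term):
    """Fallback: read every page word by word (linear search)."""
    term = term.lower()
    return [n for n, text in enumerate(pages, start=1)
            if term in text.lower().split()]


# A tiny hypothetical three-page book.
book = [
    "heap files store records",
    "an index speeds up search",
    "a sorted file supports range search",
]
book_index = build_book_index(book)
```

Both functions return the same pages; the indexed version only touches the pages listed for the term, while the fallback reads the whole book.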

Indexes

• An index on a file speeds up selections on the search key fields for the index
  • Any subset of the fields of a relation can be the search key for an index on the relation
  • A search key is NOT the same as a key (a minimal set of fields that uniquely identifies a record in a relation)
• An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given search key value k

Index Structure

• One form of an index is a file of data entries (records):

k*: <field value, pointer to record>

• The field is called the search key (it can be any field!)
• The pointer points to the record that contains the field value
• An index is also called an access path or access method

Example index on GPA

An index contains a collection of data entries, and supports efficient retrieval of records matching a given search condition (e.g., GPA > 2.5)
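
As a rough sketch of the GPA > 2.5 condition, the data entries can be kept sorted on GPA so that a binary search finds where the matching entries start. The student records below are invented for the example, and the rid is simply the record's position in a Python list.

```python
from bisect import bisect_right

# Hypothetical student records; the rid is the position in this list.
records = [
    ("Ana", 3.1), ("Ben", 2.4), ("Cho", 3.8), ("Dev", 2.5), ("Eli", 2.9),
]

# Data entries <GPA, rid>, kept sorted by GPA (the search key).
gpa_index = sorted((gpa, rid) for rid, (_, gpa) in enumerate(records))
gpa_keys = [gpa for gpa, _ in gpa_index]


def students_with_gpa_above(threshold):
    """Return records with GPA > threshold via the index, without a full scan."""
    start = bisect_right(gpa_keys, threshold)  # first entry with GPA > threshold
    return [records[rid] for _, rid in gpa_index[start:]]
```

Calling `students_with_gpa_above(2.5)` skips the entries for 2.4 and 2.5 and follows only the remaining rids back to the records.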

Index Alternatives

• Alternatives for a data entry k* in an index:
  1. The actual data record with key value k
  2. <k, rid of data record with search key value k>
  3. <k, list of rids of data records with search key k>
• The first alternative is like a special file organization that can be used instead of sorting a file
• The second and third alternatives contain data entries that point to data records and are independent of the file organization of the indexed file
• Examples of indexing techniques: B+ trees, hash-based structures
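
The three alternatives can be pictured as three shapes of data entry over the same file. This is only a sketch: the tiny data file, the choice of age as search key, and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical data file: rid -> record (name, age). The search key is age.
data_file = {0: ("Ana", 21), 1: ("Ben", 19), 2: ("Cho", 21), 3: ("Dev", 19)}


def alternative_1(data_file):
    """Data entries are the records themselves, organized by search key."""
    return sorted(data_file.values(), key=lambda rec: rec[1])


def alternative_2(data_file):
    """One <k, rid> data entry per record."""
    return sorted((rec[1], rid) for rid, rec in data_file.items())


def alternative_3(data_file):
    """One <k, [rid, ...]> data entry per distinct search key value."""
    grouped = defaultdict(list)
    for rid, rec in sorted(data_file.items()):
        grouped[rec[1]].append(rid)
    return sorted(grouped.items())
```

Alternative 3 yields two entries here instead of four, but each entry's length now depends on how many records share the key.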
Index Alternatives Cont.

• Alternative 1:
  • If this is used, the index structure is a file organization for data records (instead of a heap file or sorted file)
  • At most one index on a given collection of data records can use Alternative 1 (otherwise, data records are duplicated, leading to redundant storage and potential inconsistency)

Index Alternatives Cont.

• Alternatives 2 and 3:
  • Data entries are typically much smaller than data records, so these alternatives are better than Alternative 1 when data records are large (the portion of the index structure used to direct the search, which depends on the size of the data entries, is much smaller than with Alternative 1)
  • Alternative 3 is more compact than Alternative 2, but leads to variable-sized data entries even if search keys are of fixed length

Index Classifications

• Primary vs. Secondary: if the search key contains the primary key, the index is called a primary index
  • Unique index: the search key contains a candidate key
• Clustered vs. Unclustered: if the order of data records in the file is the same as, or 'close to', the order of data entries in the index, the index is called a clustered index
  • Alternative 1 implies clustered; in practice, clustered also implies Alternative 1 (since sorted files are rare)
  • A file can be clustered on at most one search key
  • The cost of retrieving data records through an index varies greatly based on whether the index is clustered or not!
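
A rough way to see why the cost differs is to count data-page fetches for the same range query over a clustered and an unclustered layout. The sketch below assumes 4 records per page, assumes that only the most recently fetched page stays in memory, and uses made-up key layouts; none of these numbers come from the slides.

```python
PAGE_SIZE = 4  # records per data page (hypothetical)


def pages_touched(rids_in_index_order, page_size=PAGE_SIZE):
    """Count data-page fetches when following rids in index order,
    assuming only the most recently fetched page is kept in memory."""
    fetches = 0
    current = None
    for rid in rids_in_index_order:
        page = rid // page_size
        if page != current:
            fetches += 1
            current = page
    return fetches


def range_scan_cost(key_of_rid, lo, hi):
    """Data-page fetches needed to retrieve all records with lo <= key <= hi."""
    entries = sorted((key, rid) for rid, key in enumerate(key_of_rid))
    matching = [rid for key, rid in entries if lo <= key <= hi]
    return pages_touched(matching)


# Clustered: record order in the file matches key order.
clustered_keys = list(range(32))
# Unclustered: the same 32 keys, scattered across the file.
unclustered_keys = [(rid * 13) % 32 for rid in range(32)]

clustered_cost = range_scan_cost(clustered_keys, 8, 15)
unclustered_cost = range_scan_cost(unclustered_keys, 8, 15)
```

For these eight matching records the clustered layout needs 2 page fetches, while the unclustered layout needs 8, one per record.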
Index Classifications Cont.

• Suppose that Alternative 2 is used for data entries, and that the data records are stored in a heap file:
  • To build a clustered index, first sort the heap file (with some free space on each page for future inserts)

[Figure: CLUSTERED vs. UNCLUSTERED index. In both, index entries direct the search for data entries (index file), and the data entries point to data records (data file).]

Index Structures
• Single-level Indexes
  1. Primary Indexes
  2. Clustering Indexes
  3. Secondary Indexes
• Multi-level Indexes

Primary Index
• Defined on an ordered data file
• The data file is ordered on the primary key field
• A primary index is itself an ordered file
• Entry i in the primary index is:
  • Fixed length, with 2 fields: <K(i), P(i)>
  • K(i): the search key value, which is a primary key (PK) value
  • P(i): a pointer to a disk block (i.e., a block address)

Primary Index
Example: entry 1 = <K(1), P(1)>
<K(1) = “Aaron, Ed”, P(1) = block 1>

What is entry 2?

<K(2) = “Abbot, Diane”, P(2) = block 1> ✗
Or
<K(2) = “Adams, John”, P(2) = block 2> ✓

<K(2) = “Abbot, Diane”, P(2) = block 1> is redundant: since the file is sorted, “Abbot, Diane” has to fall between “Aaron, Ed” and “Adams, John” (i.e., in block 1). The first record of each block is its block anchor.

Index File
• Holds <K(i), P(i)> entries
• K(i) is the primary key value of a block anchor; P(i) is a pointer to a disk block (i.e., a block address)
• An index file might be stored in multiple blocks!

[Figure: an index file whose entries pair a block anchor's primary key value K(i) with a block pointer P(i) into the data file.]
Primary Index
• One index entry for each block in the data file
  • The index entry holds the key field value of the first record in the block (the block anchor)
• A primary index is nondense (sparse)
  • It includes an entry for each disk block of the data file, NOT for each search key value
• A primary index typically occupies multiple blocks
• “Create an index!”
  • But does a primary index make queries run faster?!
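
The block-anchor scheme can be sketched in a few lines: one entry per block, then a binary search over the anchors to pick the single block worth reading. Only the first three names appear in the slides; the other names, the two-records-per-block layout, and the function names are assumptions for illustration.

```python
from bisect import bisect_right

# Data file sorted on the primary key, split into blocks.
# Only the first three names come from the slides; the rest are hypothetical.
blocks = [
    ["Aaron, Ed", "Abbot, Diane"],
    ["Adams, John", "Adams, Robin"],
    ["Akers, Jan", "Alexander, Ed"],
]


def build_primary_index(blocks):
    """One <K(i), P(i)> entry per block: K(i) is the block anchor."""
    return [(block[0], block_no) for block_no, block in enumerate(blocks, start=1)]


def lookup(index, blocks, key):
    """Pick the block whose anchor is the largest one <= key, then search it.
    Returns the block number, or None if the key is not in the file."""
    anchors = [k for k, _ in index]
    pos = bisect_right(anchors, key) - 1
    if pos < 0:
        return None  # key sorts before the first anchor
    block_no = index[pos][1]
    return block_no if key in blocks[block_no - 1] else None


primary_index = build_primary_index(blocks)
```

Note that “Abbot, Diane” has no index entry of its own, yet the lookup still lands on block 1 because its key falls between the first two anchors.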

Advantages of a Primary Index
• A primary index might be stored in multiple blocks, but it occupies much less space than the data file, because:
  1. There are fewer index entries than records
  2. Each index entry is typically smaller than a data record
     • An index entry has only 2 fields
     • More index entries than data records fit in a block
• Binary search is more efficient on the index file!
  • Let size of data file = B_data blocks
  • Let size of index file = B_index blocks
  • Typically: B_index << B_data (much smaller)
  • Then: log2(B_index) < log2(B_data)
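
A worked example makes the inequality tangible. All sizes below (block size, record size, index entry size, record count) are hypothetical, and the cost uses the common ceil(log2 B) estimate for binary search over B blocks, plus one extra block read to fetch the data block once the index entry is found.

```python
import math

# Hypothetical sizes, chosen only to illustrate the arithmetic.
BLOCK_SIZE = 1024    # bytes per disk block
RECORD_SIZE = 100    # bytes per data record
ENTRY_SIZE = 15      # bytes per index entry <K(i), P(i)>
NUM_RECORDS = 30000


def blocks_needed(num_items, items_per_block):
    """Number of blocks required to hold num_items."""
    return math.ceil(num_items / items_per_block)


def binary_search_cost(num_blocks):
    """Approximate block reads for a binary search over num_blocks blocks."""
    return max(1, math.ceil(math.log2(num_blocks)))


records_per_block = BLOCK_SIZE // RECORD_SIZE             # 10
b_data = blocks_needed(NUM_RECORDS, records_per_block)    # data file size in blocks

entries_per_block = BLOCK_SIZE // ENTRY_SIZE              # 68
b_index = blocks_needed(b_data, entries_per_block)        # one index entry per data block

cost_without_index = binary_search_cost(b_data)
cost_with_index = binary_search_cost(b_index) + 1         # +1 to read the data block
```

Under these assumptions the data file takes 3000 blocks and the index 45, so a lookup drops from about 12 block reads to about 7.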
