Data Indexing Presentation

A data index is a data structure added to a file to provide faster access to data by reducing the number of blocks that must be checked. There are two main types of indices: ordered indices, which keep search keys in sorted order, and hash indices, which distribute search keys uniformly across buckets. B-tree indexing is commonly used because the structure can grow or shrink dynamically and supports efficient insertion and deletion of records.

Uploaded by

marvie123

Data Indexing

Herbert A. Evans
Purposes of Data Indexing

What is Data Indexing?

Why is it important?

Concept of File Systems

Stores and organizes data into computer
files.

Makes it easier to find and access data at
any given time.

Database Management Systems

The software system that manages a database.

Database - is an organized collection of
logically related data.

How a DBMS Accesses Data
The operations read, modify, update, and
delete are used to access data in a
database.

The DBMS must first transfer the data
temporarily to a buffer in main memory.

Data is transferred between disk and
main memory in units called blocks.
Time Factors

Transferring blocks between disk and main
memory is a very slow operation.

Access time is determined by the
physical storage device being used.

Physical Storage Devices
Random Access Memory - fastest to
access, but most expensive.

Direct Access Memory - in between in
both access speed and cost.

Sequential Access Memory - slowest to
access, and least expensive.
More Time Factors

Querying a database adds further time on
top of the block transfers.

DBMS must search among the blocks of
the database file to look for matching
tuples.

Purpose of Data Indexing

An index is a data structure that is added to a file
to provide faster access to the data.

It reduces the number of blocks that the
DBMS has to check.
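The effect on the number of blocks checked can be sketched with a toy example. This is only an illustration: the `blocks` layout and the helper names `scan_lookup`, `build_index`, and `indexed_lookup` are assumptions made here, not part of the presentation.

```python
# Hypothetical toy file: each block holds a few (key, value) records.
blocks = [
    [(1, "a"), (2, "b")],
    [(3, "c"), (4, "d")],
    [(5, "e"), (6, "f")],
]

def scan_lookup(blocks, key):
    """Without an index: read blocks one by one until the key is found."""
    blocks_read = 0
    for block in blocks:
        blocks_read += 1
        for k, value in block:
            if k == key:
                return value, blocks_read
    return None, blocks_read

def build_index(blocks):
    """Map each search key to the number of the block that holds it."""
    return {k: block_no for block_no, block in enumerate(blocks) for k, _ in block}

def indexed_lookup(blocks, index, key):
    """With an index: read only the one block the index points at."""
    block_no = index.get(key)
    if block_no is None:
        return None, 0  # the index alone shows the key is absent
    for k, value in blocks[block_no]:
        if k == key:
            return value, 1
    return None, 1
```

In this sketch, finding key 6 by scanning reads all three blocks, while the indexed lookup reads one.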
Properties of Data Index

Each index entry contains a search key and a pointer.

Search key - an attribute or set of attributes that
is used to look up the records in a file.

Pointer - contains the address of where the data
is stored on disk, typically a block identifier and
an offset within the block.

It can be compared to the card catalog system
used in public libraries of the past.

Two Types of Indices

Ordered index - used to access data
sorted by order of search-key values. An
ordered index is either a primary
(clustering) index or a secondary
(non-clustering) index.

Hash index - used to access data
that is distributed uniformly across a range
of buckets by a hash function.
Ordered Index

Hash Index

Definition of Bucket

Bucket - another form of a storage unit
that can store one or more records of
information.

Buckets are used if the search key value
cannot form a candidate key, or if the file
is not stored in search key order.
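A minimal sketch of a hash index built from buckets follows. The hash function (sum of the key's bytes modulo the bucket count), the default of four buckets, and the function names are all assumptions chosen for illustration.

```python
def bucket_number(key: str, num_buckets: int) -> int:
    """Toy hash function: sum of the key's bytes modulo the bucket count."""
    return sum(key.encode("utf-8")) % num_buckets

def build_hash_index(records, num_buckets=4):
    """records: list of (search_key, pointer) pairs. Returns a list of buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for key, pointer in records:
        buckets[bucket_number(key, num_buckets)].append((key, pointer))
    return buckets

def hash_lookup(buckets, key):
    """Return the pointers of all records whose search key equals key."""
    bucket = buckets[bucket_number(key, len(buckets))]
    # A bucket may hold several keys, so each entry is still compared.
    return [pointer for k, pointer in bucket if k == key]
```

Because a bucket can hold more than one record, a search key that is not a candidate key (here, one that repeats) simply yields several pointers from the same bucket.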
Choosing Indexing Technique
Five Factors involved when choosing the
indexing technique:
access type
access time
insertion time
deletion time
space overhead
Indexing Definitions
Access type - the kind of access supported, such as lookup by a specific value or by a range of values.
Access time - time required to locate the data.
Insertion time - time required to insert the new
data.
Deletion time - time required to delete the data.
Space overhead - the additional space occupied
by the added data structure.
Types of Ordered Indices

Dense index - an index record appears for
every search-key value in the file.

Sparse index - index records appear for
only some of the search-key values in the
file.
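The difference between the two can be sketched as follows, assuming a small sorted file with two records per block. The data and the names `dense`, `sparse_keys`, `dense_lookup`, and `sparse_lookup` are illustrative assumptions.

```python
from bisect import bisect_right

# Hypothetical sorted file, split into blocks of two records each.
blocks = [
    [(10, "r10"), (20, "r20")],
    [(30, "r30"), (40, "r40")],
    [(50, "r50"), (60, "r60")],
]

# Dense index: one entry (key -> block number, slot) for every search key.
dense = {key: (b, s)
         for b, block in enumerate(blocks)
         for s, (key, _) in enumerate(block)}

# Sparse index: one entry per block, holding the first key in that block.
sparse_keys = [block[0][0] for block in blocks]

def dense_lookup(key):
    loc = dense.get(key)
    if loc is None:
        return None
    b, s = loc
    return blocks[b][s][1]

def sparse_lookup(key):
    # Find the last index entry whose key is <= the target, then scan that block.
    i = bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None
    for k, value in blocks[i]:
        if k == key:
            return value
    return None
```

Here the dense index holds six entries and the sparse index three; the sparse lookup pays for the smaller index with a short scan inside the block.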
Dense Index

Dense Index Insertion
If the search-key value does not appear in the index, a
new index record for it is inserted at the appropriate position.

If the index record stores pointers to all records with the
same search-key value, a pointer to the new record is
added to the index record.

If the index record stores a pointer to only the first record
with the same search-key value, the record being
inserted is placed right after the other records with the
same search-key value.
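The first two cases can be sketched as below, assuming the variant in which each index record stores pointers to all records with its search-key value. The parallel-list representation and the name `dense_insert` are assumptions for illustration.

```python
from bisect import bisect_left

def dense_insert(index_keys, index_ptrs, key, pointer):
    """index_keys: sorted search keys; index_ptrs[i]: pointers for index_keys[i]."""
    i = bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        # Key already indexed: add a pointer to the new record.
        index_ptrs[i].append(pointer)
    else:
        # Key not indexed yet: insert a new index record in sorted position.
        index_keys.insert(i, key)
        index_ptrs.insert(i, [pointer])
```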
Dense Index Deletion
If the deleted record was the only record with its
search-key value, the corresponding index record is deleted.

If the index record stores pointers to all records with the
same search-key value, the pointer to the deleted
record is removed from the index record.

If the index record stores a pointer to only the first record
with the same search-key value, and the deleted
record was the first record, the index record is updated to
point to the next record.
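A matching sketch for deletion, again assuming index records that store pointers to all records with the same search-key value. The name `dense_delete` and the parallel-list layout are hypothetical.

```python
from bisect import bisect_left

def dense_delete(index_keys, index_ptrs, key, pointer):
    """Remove one record pointer; drop the index record if it was the last one."""
    i = bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return False  # key is not in the index
    if pointer not in index_ptrs[i]:
        return False  # no such record under this key
    index_ptrs[i].remove(pointer)
    if not index_ptrs[i]:
        # That was the only record with this key: delete the index record.
        del index_keys[i]
        del index_ptrs[i]
    return True
```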
Sparse Index

Sparse Index Insertion
The index is assumed to store one
entry for each block of the file.

If no new block is created, no change is
made to the index, unless the new record has
the smallest search-key value in its block, in
which case that block's index entry is updated.

If a new block is created, the first search-key
value in the new block is added to the
index.
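These rules can be sketched for a file stored as sorted blocks of keys. The block capacity of two, the split-in-half overflow policy, and the name `sparse_insert` are assumptions made for this example.

```python
from bisect import bisect_right, insort

BLOCK_CAPACITY = 2  # assumed number of records per block

def sparse_insert(blocks, sparse_keys, key):
    """blocks: list of sorted key lists; sparse_keys[i]: first key of blocks[i]."""
    if not blocks:
        blocks.append([key])
        sparse_keys.append(key)
        return
    # Pick the block whose index entry is the largest one <= key.
    i = max(bisect_right(sparse_keys, key) - 1, 0)
    insort(blocks[i], key)
    if len(blocks[i]) > BLOCK_CAPACITY:
        # A new block is created: index the first key of the new block.
        mid = len(blocks[i]) // 2
        new_block = blocks[i][mid:]
        blocks[i] = blocks[i][:mid]
        blocks.insert(i + 1, new_block)
        sparse_keys.insert(i + 1, new_block[0])
    # Keep the entry equal to the block's first key (changes only if the
    # new key became the smallest key in its block).
    sparse_keys[i] = blocks[i][0]
```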
Sparse Index Deletion
If the deleted record was the only record with its search
key, the corresponding index record is replaced with an
index record for the next search-key value.

If the next search-key value already has an index entry,
the index record is deleted instead of being
replaced.

If the record being deleted is one of several records
with the same search-key value, and the index record
points to it, the index record is updated to point to the
next record with the same search-key value.
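A rough sketch of the first two cases, assuming one index entry per block (the block's first key) and unique search keys. The name `sparse_delete` and the list-of-blocks layout are hypothetical.

```python
from bisect import bisect_right

def sparse_delete(blocks, sparse_keys, key):
    """blocks: list of sorted key lists; sparse_keys[i]: first key of blocks[i]."""
    i = bisect_right(sparse_keys, key) - 1
    if i < 0 or key not in blocks[i]:
        return False
    blocks[i].remove(key)
    if not blocks[i]:
        # The next key lives in the next block, which already has its own
        # entry, so this index record is deleted rather than replaced.
        del blocks[i]
        del sparse_keys[i]
    else:
        # Replace the entry with the next search-key value in the block
        # (a no-op unless the deleted key was the indexed one).
        sparse_keys[i] = blocks[i][0]
    return True
```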
Index Choice
A dense index requires more space overhead and
more memory than a sparse index.

Data can generally be located in a shorter time using
a dense index.

A dense index is required when the index is a
secondary index, and is preferable when the
index file is small compared to the size of
main memory.
Choosing Multi-Level Index
In some cases an index may be too large for efficient
processing.

In that case, use multi-level indexing.

In multi-level indexing, the primary index is treated as a
sequential file and a sparse index is created on it.

The outer index is a sparse index on the primary index,
whereas the inner index is the primary index itself.
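A two-level lookup can be sketched as follows. The inner index is stored in index blocks of two entries each, and the outer index holds the first key of each inner block; the data and the names `inner_blocks`, `outer_keys`, and `multilevel_lookup` are illustrative assumptions.

```python
from bisect import bisect_right

# Inner (primary) index: sorted (key, record_pointer) entries in index blocks.
inner_blocks = [
    [(10, "p10"), (20, "p20")],
    [(30, "p30"), (40, "p40")],
    [(50, "p50"), (60, "p60")],
]

# Outer index: a sparse index holding the first key of each inner index block.
outer_keys = [block[0][0] for block in inner_blocks]

def multilevel_lookup(key):
    # Step 1: the outer index selects a single inner index block.
    i = bisect_right(outer_keys, key) - 1
    if i < 0:
        return None
    # Step 2: only that inner block is searched for the record pointer.
    for k, pointer in inner_blocks[i]:
        if k == key:
            return pointer
    return None
```

Only one inner index block is read per lookup, however large the inner index grows.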
Multi-Level Index

B-Tree Index

The B-tree is the most commonly used data
structure for indexing.

It is fully dynamic; that is, it can grow and
shrink.
Three Types B-Tree Nodes
Root node - contains node pointers to
branch nodes.
Branch node - contains pointers to leaf
nodes or other branch nodes.
Leaf node - contains index items and
horizontal pointers to other leaf nodes.
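The node types can be sketched with two small classes and a search routine. This is a simplified model: the class names `Branch` and `Leaf`, the convention that `keys[i]` is the largest key under `children[i]`, and the sample tree (whose root is a single `Branch` pointing directly at leaves) are assumptions for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional

@dataclass
class Leaf:
    keys: list                      # index items (search keys)
    pointers: list                  # record pointers, parallel to keys
    next: Optional["Leaf"] = None   # horizontal pointer to the next leaf

@dataclass
class Branch:
    keys: list       # keys[i] = largest key reachable through children[i]
    children: list   # one more child than keys; the last has no key

def search(node, key):
    """Descend from the root to a leaf, then look for the key there."""
    while isinstance(node, Branch):
        node = node.children[bisect_left(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.pointers[i]
    return None

def all_keys(leaf):
    """Follow the horizontal pointers to read the keys in sorted order."""
    out = []
    while leaf is not None:
        out.extend(leaf.keys)
        leaf = leaf.next
    return out

leaf3 = Leaf([50, 60], ["r50", "r60"])
leaf2 = Leaf([30, 40], ["r30", "r40"], leaf3)
leaf1 = Leaf([10, 20], ["r10", "r20"], leaf2)
root = Branch(keys=[20, 40], children=[leaf1, leaf2, leaf3])
```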

Full B-Tree Structure

B-Tree Insertion

First, the DBMS looks up the search-key value.

If the search-key value exists in a leaf node, the record is added to
the file and, if necessary, a pointer to it is added to the bucket.

If the search-key value does not exist, it is inserted into the leaf
node, the new record is inserted into the file, and a new bucket is
added if necessary.

If the search-key value does not exist and there is no room in the node,
the node is split. The two resulting leaves are
adjusted to their new greatest and least search-key values. After a split,
an entry for the new node is inserted into the parent. If the parent
is itself full, it is split in turn, and the process repeats up the tree.
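The leaf-split step can be sketched on its own. A leaf is modeled as a sorted list of keys, the capacity of three keys is an assumption, and the separator handed to the parent is taken to be the largest key of the left leaf; `insert_into_leaf` is a hypothetical helper name.

```python
from bisect import insort

MAX_KEYS = 3  # assumed leaf capacity

def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf (modified in place).

    Returns (left, right, separator). right and separator are None when
    no split was needed; otherwise separator is the key the parent uses
    for the left leaf.
    """
    insort(leaf, key)
    if len(leaf) <= MAX_KEYS:
        return leaf, None, None
    # Overflow: split into two leaves of roughly equal size.
    mid = (len(leaf) + 1) // 2
    left, right = leaf[:mid], leaf[mid:]
    return left, right, left[-1]
```

Inserting the separator into the parent may overflow the parent in turn, which is where the split repeats one level up.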
B-Tree Root Node
Insertion into B-Tree
B-Tree Structure
This process results in a four-level tree,
with one root node, two branch levels, and
one leaf level.

The B-tree structure can continue to grow
in this way to a maximum of 20 levels.
Branch Node Example
Branch Nodes Pointing to Leaf
Nodes
The first item in the left branch node contains the same
key value as the largest item in the leftmost leaf node
and a node pointer to it.

The second item contains the largest item in the next
leaf node and a node pointer to it. The third item in the
branch node contains only a pointer to the next higher
leaf node.

Depending on the index growth, this third item can
contain the actual key value in addition to the pointer at a
later point during the lifespan of the index.
B-Tree Deletion
The DBMS first looks up the record and removes it from
the file.

If no bucket is associated with its search-key value, or the
bucket is empty, the search-key value is removed from the leaf node.

If a node is left with too few pointers, its pointers are
transferred to a sibling node, and the node is then
deleted.

If transferring the pointers would give the sibling too many pointers, the
pointers are redistributed between the two nodes instead.
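The last two cases can be sketched as a choice between merging and redistributing. Leaves are modeled as sorted lists, the limits of two and four keys are assumptions, and `fix_underflow` is a hypothetical helper that takes a leaf and its right-hand sibling.

```python
MIN_KEYS = 2  # assumed minimum number of keys per leaf
MAX_KEYS = 4  # assumed maximum number of keys per leaf

def fix_underflow(leaf, sibling):
    """leaf and its right sibling are sorted lists; returns the leaves that remain."""
    if len(leaf) >= MIN_KEYS:
        return [leaf, sibling]  # no underflow, nothing to do
    combined = leaf + sibling
    if len(combined) <= MAX_KEYS:
        # Everything fits in one node: transfer the entries and drop the leaf.
        return [combined]
    # Merging would overfill the sibling: share the entries evenly instead.
    mid = len(combined) // 2
    return [combined[:mid], combined[mid:]]
```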
Insertion/Deletion Examples