Chapter 7 Indexing

Uploaded by

Abdallah Saber Khalifa

Data and File Structures

Chapter 7: Indexing
CS2202 File Organization, 2021-2022

Overview
• An index is a table containing a list of keys, each associated with a reference field pointing to the corresponding record.
• A simple index is simply an array of (key, reference) pairs.
• An index lets you impose order on a file without rearranging the file.
- This makes record addition much less expensive than it is with a sorted file.
- Indexing provides multiple access paths.
- You can have different indexes for the same data file.
• Indexing also gives us keyed access to variable-length record files.
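A minimal Python sketch of the simple index described above, assuming the index is held in memory as a list of `(key, byte_offset)` tuples sorted by key and searched with the standard `bisect` module. The function names and the sample keys and offsets are hypothetical.

```python
from bisect import bisect_left

def build_index(pairs):
    """Return the (key, byte_offset) pairs sorted by key.

    Only the small index is sorted; the data file keeps its entry order.
    """
    return sorted(pairs)

def lookup(index, key):
    """Binary-search a sorted index; return the byte offset or None."""
    # (key,) compares less than every (key, offset) tuple with the same key,
    # so bisect_left lands on the first entry whose key is >= the search key.
    i = bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None

# Illustrative keys and byte offsets, listed in data-file (entry) order.
index = build_index([("LON2312", 0), ("DG188807", 41), ("RCA2626", 97)])
offset = lookup(index, "DG188807")   # 41
missing = lookup(index, "ANG3795")   # None
```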

A Simple Index for Entry-Sequenced Files I
• Suppose that we are looking at a collection of
recordings with the following information about each
of them:
– Identification Number
– Title
– Composer
– Artist
– Label (publisher)

Contents of Sample Recording File

A Simple Index for Entry-Sequenced Files II
• We choose to organize the file as a series of variable-length records, with a size field preceding each record.
- The fields within each record are also of variable length but are separated by delimiters.
• We form a primary key by concatenating the record company label code and the record’s ID number.
- This should form a unique identifier.
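The record layout above can be sketched as follows. This is an illustration under stated assumptions, not the course's file format: the 2-byte little-endian size field, the `|` delimiter, the upper-case key and the helper names are all hypothetical choices, and the field values are illustrative.

```python
import io
import struct

DELIM = "|"                  # assumed field delimiter (must not occur inside a field)
SIZE = struct.Struct("<H")   # assumed 2-byte little-endian size field (records < 64 KiB)

def make_primary_key(label, id_number):
    """Primary key = label code + ID number, upper-cased (assumed canonical form)."""
    return (label + id_number).upper()

def append_record(f, fields):
    """Append a size-prefixed, delimited record; return its byte offset."""
    data = (DELIM.join(fields) + DELIM).encode("utf-8")
    offset = f.seek(0, io.SEEK_END)   # this offset is what the index will store
    f.write(SIZE.pack(len(data)))
    f.write(data)
    return offset

def read_record(f, offset):
    """Seek to a record's first byte, read its size field, then split its fields."""
    f.seek(offset)
    (size,) = SIZE.unpack(f.read(SIZE.size))
    return f.read(size).decode("utf-8").split(DELIM)[:-1]

# Field order follows the slide: ID, title, composer, artist, label.
data_file = io.BytesIO()
first = append_record(data_file, ["188807", "Symphony No. 9", "Beethoven", "Karajan", "DG"])
second = append_record(data_file, ["2312", "Romeo and Juliet", "Prokofiev", "Maazel", "LON"])
key = make_primary_key("DG", "188807")   # "DG188807"
```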

A Simple Index for Entry-Sequenced Files III
• To provide rapid keyed access, we build a simple index in which each key field is associated with a reference field giving the address of the first byte of the corresponding data record.
• The index may be sorted while the file does not have to be.
- This means that the data file may be entry-sequenced: the records occur in the order in which they are entered in the file.

Index of Sample Recording File

A Simple Index for Entry-Sequenced Files IV
• A few comments about our index organization:
– The index is easier to use than the data file because
1. it uses fixed-length records, and
2. it is likely to be much smaller than the data file.
– By requiring fixed-length records in the index file, we impose a limit on the size of the primary key field.
– The index could carry more information than the key and reference fields.
– e.g., we could keep the length of each data file record in the index as well.
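To see why fixed-length index records limit the key size, here is a sketch that packs each entry with Python's `struct` module, assuming a hypothetical 12-byte key field and a 4-byte unsigned offset; the names are illustrative.

```python
import struct

KEY_LEN = 12   # assumed maximum primary-key length in bytes
# Fixed-length index record: KEY_LEN-byte key followed by a 4-byte unsigned offset.
INDEX_REC = struct.Struct(f"<{KEY_LEN}sI")

def pack_entry(key, offset):
    """Encode one index entry; shorter keys are NUL-padded by struct."""
    raw = key.encode("ascii")
    if len(raw) > KEY_LEN:
        # struct would silently truncate; the fixed key width is a hard limit.
        raise ValueError(f"key {key!r} is longer than {KEY_LEN} bytes")
    return INDEX_REC.pack(raw, offset)

def unpack_entry(buf):
    """Decode one fixed-length index entry back into (key, offset)."""
    raw, offset = INDEX_REC.unpack(buf)
    return raw.rstrip(b"\0").decode("ascii"), offset

entry = pack_entry("DG188807", 41)   # always INDEX_REC.size == 16 bytes
# The n-th entry of an index file therefore starts at byte n * INDEX_REC.size.
```

Keeping the record length in the index, as suggested above, would simply add one more fixed-size field to the format string.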

Basic Operations on an Indexed Entry-Sequenced File
• Assumption: the index is small enough to be held in memory. Later on, we will see what can be done when this is not the case.
• Operations required to maintain an indexed file:
– Create the original empty index and data files.
– Load the index into memory before using it.
– Rewrite the index file from memory after using it.
– Add records to the data file and index.
– Delete records from the data file.
– Update records in the data file.

Creating, Loading and Rewriting
• The index is represented as an array of records.
– Loading it into memory can be done sequentially, reading a large number of (short) index records at once.
• What happens if the index has changed in memory but is not rewritten, or is rewritten only partially?
– Use a mechanism for indicating whether or not the index is out of date (e.g., a status flag in the header of the index file).
– Have a procedure that reconstructs the index from the data file in case it is out of date.
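One way to implement the out-of-date mechanism is sketched below, assuming a hypothetical one-byte status flag at the start of the index file. The flag is set to "dirty" before the index is rewritten and reset to "clean" only after the rewrite finishes, so an interrupted rewrite leaves the flag dirty and triggers a rebuild from the data file. For brevity the index entries are stored as text lines rather than fixed-length records, and the data-file layout (size field, `|` delimiters, key as first field) is an assumption.

```python
import io
import struct

CLEAN, DIRTY = b"C", b"D"   # hypothetical 1-byte status flag at the start of the index file

def rebuild_index(data_file):
    """Reconstruct the sorted index by scanning the whole data file.

    Assumes each record is a 2-byte little-endian size field followed by
    '|'-delimited fields, the first field being the primary key.
    """
    entries = []
    data_file.seek(0)
    while True:
        offset = data_file.tell()
        header = data_file.read(2)
        if len(header) < 2:                     # end of the data file
            break
        (size,) = struct.unpack("<H", header)
        key = data_file.read(size).decode("utf-8").split("|", 1)[0]
        entries.append((key, offset))
    return sorted(entries)

def rewrite_index(index_file, entries):
    """Write the in-memory index back, flagged out of date until finished."""
    index_file.seek(0)
    index_file.write(DIRTY)
    index_file.truncate()                       # drop the old entries
    for key, offset in entries:                 # text lines, for brevity
        index_file.write(f"{key},{offset}\n".encode("ascii"))
    index_file.seek(0)
    index_file.write(CLEAN)                     # reached only after a full rewrite

def load_index(index_file, data_file):
    """Load the index, or rebuild it if the status flag is not CLEAN."""
    index_file.seek(0)
    if index_file.read(1) != CLEAN:
        return rebuild_index(data_file)
    entries = []
    for line in index_file.read().decode("ascii").splitlines():
        key, offset = line.split(",")
        entries.append((key, int(offset)))
    return entries

# Illustrative data file: two records whose first field is the key.
data = io.BytesIO()
for fields in (["LON2312", "Romeo and Juliet"], ["DG188807", "Symphony No. 9"]):
    rec = ("|".join(fields) + "|").encode("utf-8")
    data.write(struct.pack("<H", len(rec)) + rec)

crashed = io.BytesIO(DIRTY + b"LON2312,0\n")   # a rewrite that was interrupted
entries = load_index(crashed, data)             # rebuilt from the data file
```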

Record Addition
• When we add a record, both the data file and the index should be updated.
• In the data file, the record can be added anywhere.
- However, the byte offset of the new record must be saved.
• Since the index is sorted, the location of the new entry does matter.
- We have to shift all the entries that belong after the one we are inserting to open up space for the new entry.
- However, this operation is not too costly, as it is performed in memory.
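A sketch of record addition, assuming an in-memory index of `(key, byte_offset)` tuples sorted by key and a data file opened in binary mode; `add_record` and the sample keys are hypothetical. The record is simply appended to the data file, while the index entry is placed by binary search.

```python
import io
from bisect import bisect_left

def add_record(data_file, index, key, record_bytes):
    """Add a record to the data file and its (key, offset) entry to the sorted index."""
    i = bisect_left(index, (key,))           # binary search for the insertion point
    if i < len(index) and index[i][0] == key:
        raise KeyError(f"duplicate primary key {key!r}")
    offset = data_file.seek(0, io.SEEK_END)  # data file: any spot works; here, the end
    data_file.write(record_bytes)
    # list.insert shifts every later entry one slot to the right; this is
    # cheap only because the whole index lives in memory.
    index.insert(i, (key, offset))
    return offset

data, index = io.BytesIO(), []
add_record(data, index, "LON2312", b"record-A")    # stored at offset 0
add_record(data, index, "DG188807", b"record-B")   # stored at offset 8
```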

Record Deletion
• Record deletion can be done using the methods discussed in the previous lecture.
• In addition, the index record corresponding to the data record being deleted must also be deleted.
– Once again, since this deletion takes place in memory, the record shifting is not too costly.
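The index side of a deletion can be sketched as below, under the same assumption of an in-memory sorted list of `(key, byte_offset)` tuples; the function name and sample data are hypothetical.

```python
from bisect import bisect_left

def delete_from_index(index, key):
    """Remove key's entry from the sorted in-memory index; return its byte offset.

    Marking the data record itself as deleted (and reusing its space) is
    assumed to be handled by the deletion methods of the previous lecture.
    """
    i = bisect_left(index, (key,))
    if i == len(index) or index[i][0] != key:
        raise KeyError(key)
    _, offset = index.pop(i)   # pop shifts every later entry one slot to the left
    return offset

index = [("DG188807", 8), ("LON2312", 0), ("RCA2626", 16)]
freed = delete_from_index(index, "LON2312")   # 0
```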

Record Updating
• Record updating falls into two categories:
– The update changes the value of the key field.
– The update does not affect the key field.
• In the first case, both the index and the data file may need to be reordered.
- The update is easiest to deal with if it is conceptualized as a delete followed by an insert (but the user need not know about this).
• In the second case, the index does not need reordering, but the data record may need to be moved.
- If the updated record is no larger than the original one, it can be rewritten at the same location.
- However, if it is larger, then a new spot has to be found for it. Again, the delete/insert solution can be used.
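A sketch of the two update cases, assuming (as suggested earlier) that each index entry also carries the record length, i.e. `(key, offset, length)`. The names and sample data are hypothetical, the new key is assumed not to exist already, and reclaiming the old record's space is left out.

```python
import io
from bisect import bisect_left, insort

def update_record(data_file, index, old_key, new_key, new_bytes):
    """Update a record; index entries are (key, offset, length) tuples sorted by key.

    Same key and the new version fits: rewrite in place (index order unchanged).
    Otherwise: delete the old entry, then insert the new version at the end.
    """
    i = bisect_left(index, (old_key,))
    if i == len(index) or index[i][0] != old_key:
        raise KeyError(old_key)
    _, offset, length = index[i]
    if new_key == old_key and len(new_bytes) <= length:
        data_file.seek(offset)
        data_file.write(new_bytes)
        index[i] = (old_key, offset, len(new_bytes))   # any leftover bytes go unused
        return offset
    index.pop(i)                                       # delete ...
    # (the old record's space would go on the avail list; not shown)
    new_offset = data_file.seek(0, io.SEEK_END)        # ... then insert
    data_file.write(new_bytes)
    insort(index, (new_key, new_offset, len(new_bytes)))
    return new_offset

data = io.BytesIO(b"AAAAABBB")
index = [("K1", 0, 5), ("K2", 5, 3)]                 # hypothetical keys
update_record(data, index, "K1", "K1", b"aaa")      # fits: rewritten at offset 0
update_record(data, index, "K2", "K0", b"cccc")     # key changed: moved to offset 8
```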
Indexes that are too large to hold in memory I
• The indexes that we have considered so far could fit into main memory.
• If this is not the case, we have the following problems:
– Binary searching requires several seeks rather than being performed at memory speed.
– Index rearrangement (record addition or deletion) requires shifting records on secondary storage → extremely time-consuming.
• Solutions:
– Use a hashed organization (later).
– Use a tree-structured index such as B-trees and B+ trees (later).
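The cost of binary search on disk can be estimated with a little arithmetic. Assuming every probe reads a different block and costs one seek, a binary search over B blocks needs at most ⌊log₂ B⌋ + 1 probes, whereas the same probes cost no seeks when the index is in memory. The helper name below is hypothetical.

```python
def worst_case_seeks(n_entries, entries_per_block=1):
    """Worst-case block reads to binary-search a sorted index stored on disk.

    Assumes each probe touches a different block and costs one seek; once
    the right block is in memory, searching within it costs no further seeks.
    """
    blocks = -(-n_entries // entries_per_block)   # ceiling division
    # For blocks >= 1, floor(log2(blocks)) + 1 equals blocks.bit_length().
    return blocks.bit_length()

one_per_probe = worst_case_seeks(1_000_000)        # 20 seeks
blocked = worst_case_seeks(1_000_000, 100)         # 14 seeks with 100 entries per block
```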
Indexes that are too large to hold in memory II
• Nonetheless, simple indexes are still useful:
– They allow the use of binary search in a variable-length record file.
– Sorting and maintaining an index is less costly than sorting and maintaining the data file, because the index is smaller.
– If there are pinned records in the data file, the keys can be rearranged without moving the data records.
– They can provide access by multiple keys.

Indexing to provide access by multiple keys
• So far, our index only allows access by primary key, i.e., we can retrieve record DG188807, but we cannot retrieve a recording of Beethoven’s Symphony No. 9.
→ Not very useful!
• We need to use secondary key fields consisting of album titles, composers, and artists.
– Although it would be possible to relate a secondary key to an actual byte offset, this is usually not done.
– Instead, we relate the secondary key to a primary key, which then points to the actual byte offset.
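A sketch of the two-step lookup through a secondary index, assuming both indexes are in-memory lists of pairs sorted by key; the helper names and sample data are hypothetical.

```python
from bisect import bisect_left

def find_all(pairs, key):
    """All values paired with key in a list of (key, value) tuples sorted by key."""
    i = bisect_left(pairs, (key,))
    values = []
    while i < len(pairs) and pairs[i][0] == key:
        values.append(pairs[i][1])
        i += 1
    return values

def offsets_for(secondary, primary, sec_key):
    """Secondary key -> primary key(s) -> byte offset(s): two index searches."""
    return [off for pkey in find_all(secondary, sec_key)
                for off in find_all(primary, pkey)]

# Illustrative indexes: the primary index maps keys to offsets,
# the composer index maps canonical composer names to primary keys.
primary = [("DG188807", 41), ("LON2312", 0), ("RCA2626", 97)]
composer = [("BEETHOVEN", "DG188807"), ("BEETHOVEN", "RCA2626"),
            ("PROKOFIEV", "LON2312")]
beethoven = offsets_for(composer, primary, "BEETHOVEN")   # [41, 97]
```

Because the composer index stores primary keys rather than byte offsets, moving a record only requires changing its primary-index entry.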

Index of Sample Recording File

Composer Index & Title Index

Record Addition in multiple key access settings
• When a secondary index is used, adding a record involves updating the data file, the primary index and the secondary index.
– The secondary index update is similar to the primary index update.
• Secondary keys are entered in canonical form (e.g., all capitals).
– The original upper- and lower-case form must be obtained from the data file. Also, because of the length restriction on keys, secondary keys may sometimes be truncated.
• The secondary index may contain duplicates.
– The primary index cannot.
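Adding an entry to a secondary index could look like this, assuming upper case as the canonical form and a hypothetical 15-character key field; the helper names and sample keys are illustrative.

```python
from bisect import insort

SEC_KEY_LEN = 15   # hypothetical fixed width of a secondary-key field

def canonical(value):
    """Canonical form assumed here: all capitals, truncated to the field width."""
    return value.upper()[:SEC_KEY_LEN]

def add_secondary(sec_index, value, primary_key):
    """Insert a (secondary key, primary key) pair; duplicate secondary keys are fine.

    Sorting on the whole pair keeps equal secondary keys together,
    ordered by primary key.
    """
    insort(sec_index, (canonical(value), primary_key))

composer = []
add_secondary(composer, "Beethoven", "RCA2626")
add_secondary(composer, "Prokofiev", "LON2312")
add_secondary(composer, "beethoven", "DG188807")
# The original capitalization ("Beethoven") must be read from the data file.
```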
Record Deletion in multiple key access settings
• Removing a record from the data file means removing its corresponding entry in the primary index, and may mean removing all of the entries in the secondary indexes that refer to this primary index entry.
• This is too much rearrangement, especially if the indexes cannot fit into main memory.

Record Deletion in multiple key access settings
• However, it is also possible not to worry about the secondary index.
– Since, as mentioned before, secondary keys point to primary keys rather than to byte offsets, we do not modify the secondary index files.
– When the file is accessed through a secondary key, the primary index will be checked, and a deleted record can be identified there.
– Deleted records still occupy space in the secondary key indexes.
– If a lot of deletions occur, we can periodically clean up these deleted entries.
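The lazy approach to deletion can be sketched as follows. A dict stands in for the primary index and a plain list for the secondary index, both simplifying assumptions; the function names and sample data are hypothetical.

```python
def offsets_for(secondary, primary, sec_key):
    """Byte offsets for sec_key, skipping references to deleted records.

    secondary: (secondary_key, primary_key) pairs, left untouched on deletion.
    primary:   dict primary_key -> byte offset, standing in for the primary index.
    A linear scan is used for brevity; a real index would be binary-searched.
    """
    return [primary[pkey] for skey, pkey in secondary
            if skey == sec_key and pkey in primary]

def purge(secondary, primary):
    """Periodic cleanup: drop secondary entries whose record has been deleted."""
    return [(skey, pkey) for skey, pkey in secondary if pkey in primary]

primary = {"DG188807": 41, "RCA2626": 97}
composer = [("BEETHOVEN", "DG188807"), ("BEETHOVEN", "RCA2626")]
del primary["DG188807"]                              # delete: only the primary index changes
live = offsets_for(composer, primary, "BEETHOVEN")   # [97]
composer = purge(composer, primary)
```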
Record Updating in multiple key access settings
• There are three types of updates:
– The update changes the secondary key.
• We have to rearrange the secondary index to keep it in sorted order.
– The update changes the primary key.
• We have to update and reorder the primary index.
• We have to update the references to the primary key in the secondary indexes.
– The update is confined to other fields.
• No changes are necessary to either the primary or the secondary indexes.
Retrieval using combinations of secondary keys
• With secondary keys, we can now search for things like all the recordings of “Beethoven’s work” or all the recordings titled “Violin Concerto”.
• More importantly, we can use combinations of secondary keys.
– e.g., find all recordings of Beethoven’s Symphony No. 9.
• Without secondary indexes, this request requires a very expensive sequential search through the entire file.
– With secondary indexes, this query can be answered simply and quickly.
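Combining secondary keys amounts to intersecting the sorted primary-key lists returned by each secondary index. The sketch below does this with a cosequential, merge-style pass; the function name and the key lists are illustrative.

```python
def match(a, b):
    """Intersect two sorted lists of primary keys in a single sequential pass."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common

# Illustrative primary-key lists, as read from the composer and title indexes.
beethoven = ["ANG3795", "DG188807", "RCA2626"]
symphony_9 = ["ANG3795", "COL31809", "DG188807"]
both = match(beethoven, symphony_9)   # ["ANG3795", "DG188807"]
```

Each primary key in the result is then looked up in the primary index to reach the data records.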
Improving the secondary index structure I: The problem
• Secondary indexes lead to two difficulties:
– The secondary index file has to be rearranged every time a new record is added to the file.
– If there are duplicate secondary keys, the secondary key field is repeated for each entry.
• Space is wasted.

Improving the secondary index structure II: Solution 1
• Solution 1: Change the secondary index structure so that it associates an array of references with each secondary key.
• Advantage: it helps avoid the need to rearrange the secondary index file as often.
• Disadvantages:
– It may restrict the number of references that can be associated with each secondary key.
– It may cause internal fragmentation, i.e., waste of space.
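A sketch of Solution 1 that makes both disadvantages visible, assuming a hypothetical limit of four reference slots per secondary key. A dict stands in for the sorted secondary index file; the class name and sample keys are illustrative.

```python
MAX_REFS = 4   # hypothetical number of reference slots per secondary key

class ArrayRefIndex:
    """Secondary index whose entries hold a fixed-size array of primary keys.

    Adding a reference to an existing key fills a slot instead of
    inserting a new record into the (sorted) secondary index file.
    """

    def __init__(self):
        self.entries = {}   # secondary key -> list of MAX_REFS slots (None = empty)

    def add(self, sec_key, primary_key):
        slots = self.entries.setdefault(sec_key, [None] * MAX_REFS)
        if None not in slots:
            raise ValueError(f"{sec_key!r} already has {MAX_REFS} references")
        slots[slots.index(None)] = primary_key   # fill the first empty slot

    def wasted_slots(self):
        """Internal fragmentation: reserved slots that hold no reference."""
        return sum(slots.count(None) for slots in self.entries.values())

idx = ArrayRefIndex()
for pkey in ["ANG3795", "DG188807", "RCA2626", "COL31809"]:
    idx.add("BEETHOVEN", pkey)       # all four slots now used
idx.add("PROKOFIEV", "LON2312")      # one slot used, three wasted
```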
Array of References

Improving the secondary index structure III: Solution 2
• Solution 2: inverted lists.
– Each secondary key points to its own list of primary key references.
– Each of these lists can grow to be as long as it needs to be, and no space is lost to internal fragmentation.
• Advantages:
– The secondary index file needs to be rearranged only when a new secondary key value is added (or an existing one is changed), not every time a record is added.
– The rearranging is faster, since the secondary index file is smaller.
– Space from deleted records in the reference-list file can easily be reused, since its records are fixed-length.
• Disadvantage:
– Locality (in the secondary index) has been lost: the references for one secondary key may be scattered across the list file.
• More seeking may be necessary.
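Inverted lists can be sketched with a small secondary index whose entries hold the relative record number (RRN) of the first record of a linked list in a separate, entry-sequenced list file. The in-memory dict and list stand in for the two files, and inserting new references at the head of each list is an assumption rather than something the slides specify; the class name and keys are illustrative.

```python
class InvertedLists:
    """Secondary index pointing into an entry-sequenced file of list records.

    secondary: dict secondary key -> RRN of the first record of its list
               (stands in for the small, sorted secondary index file).
    list_file: fixed-length records [primary_key, next_rrn]; -1 ends a list.
    """

    def __init__(self):
        self.secondary = {}
        self.list_file = []
        self.avail = []     # RRNs of deleted list records, reused first

    def add(self, sec_key, primary_key):
        node = [primary_key, self.secondary.get(sec_key, -1)]
        if self.avail:                      # reuse a freed fixed-length slot
            rrn = self.avail.pop()
            self.list_file[rrn] = node
        else:                               # otherwise append; the file is never sorted
            rrn = len(self.list_file)
            self.list_file.append(node)
        # Updates a pointer field; only a brand-new key would need a sorted insert.
        self.secondary[sec_key] = rrn

    def references(self, sec_key):
        refs, rrn = [], self.secondary.get(sec_key, -1)
        while rrn != -1:                    # on disk, each hop may need a seek
            primary_key, rrn = self.list_file[rrn]
            refs.append(primary_key)
        return refs

    def remove(self, sec_key, primary_key):
        prev, rrn = None, self.secondary.get(sec_key, -1)
        while rrn != -1:
            pkey, nxt = self.list_file[rrn]
            if pkey == primary_key:
                if prev is None:
                    self.secondary[sec_key] = nxt
                else:
                    self.list_file[prev][1] = nxt
                self.avail.append(rrn)      # slot can be reused by a later add
                return True
            prev, rrn = rrn, nxt
        return False

inv = InvertedLists()
inv.add("BEETHOVEN", "ANG3795")
inv.add("PROKOFIEV", "LON2312")
inv.add("BEETHOVEN", "DG188807")
before = inv.references("BEETHOVEN")   # ["DG188807", "ANG3795"]
inv.remove("BEETHOVEN", "ANG3795")     # RRN 0 goes on the avail list
inv.add("RAVEL", "COL31809")           # reuses RRN 0
```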

Inverted Lists

Selective Indexes
• Using secondary keys, you can divide the file into parts and provide a selective view.
• For example, you can build a selective index that contains only the titles of classical recordings, or only recordings released before 1970 (and another for those released since 1970).
• A possible query could then be: “List all the recordings of Beethoven’s Symphony No. 9 released since 1970.”
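A selective index is an ordinary secondary index built over a filtered subset of the records. In the sketch below, the `year` field, the sample records and the function name are hypothetical.

```python
def build_selective_index(records, predicate, key_field):
    """Index only the records that satisfy predicate.

    records: iterable of (primary_key, fields) where fields is a dict.
    Returns sorted (canonical secondary key, primary key) pairs for that subset.
    """
    return sorted((fields[key_field].upper(), pkey)
                  for pkey, fields in records if predicate(fields))

# Illustrative records; "year" is a hypothetical release-year field.
records = [
    ("DG188807", {"title": "Symphony No. 9", "year": 1963}),
    ("ANG3795", {"title": "Symphony No. 9", "year": 1975}),
    ("RCA2626", {"title": "Quartet in C Sharp Minor", "year": 1972}),
]
since_1970 = build_selective_index(records, lambda f: f["year"] >= 1970, "title")
hits = [pkey for title, pkey in since_1970 if title == "SYMPHONY NO. 9"]   # ["ANG3795"]
```

Restricting the result to Beethoven's recordings would then intersect these primary keys with those from the composer index.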
