Organization and Maintenance of Large Ordered Indices
D1-82-0989
by
R. Bayer
and
E. McCreight
July 1970
ABSTRACT
Organization and maintenance of an index for a dynamic random access file is considered. It is assumed that the index must be kept on some pseudo random access backup store like a disc or a drum. The index organization described allows retrieval, insertion, and deletion of keys in time proportional to log_k I, where I is the size of the index and k is a device dependent natural number such that the performance of the scheme becomes near optimal. Storage utilization is at least 50% but generally much higher. The pages of the index are organized in a special data structure, so-called B-trees. The scheme is analyzed, performance bounds are obtained, and a near optimal k is computed. Experiments have been performed with indices up to 100,000 keys. An index of size 15,000 (100,000) can be maintained with an average of 9 (at least 4) transactions per second on an IBM 360/44 with a 2311 disc.
Key Words and Phrases: Data structures, random access files, dynamic
information retrieval.
1. Introduction
In this paper we consider the problem of organizing and maintaining an index for a dynamically changing random access file. By an index we mean a collection of index elements which are pairs (x, α) of fixed size physically adjacent data items, namely a key x and some associated information α. The index is assumed to be so voluminous that only small parts of it can be kept in main store at one time. Thus the bulk of the index must be kept on some backup store. The class of backup stores considered are pseudo random access devices which have a rather long access or wait time--as opposed to a true random access device like core store--and a rather high data rate once the transmission of physically sequential data has been initiated. Typical pseudo random access devices are: fixed and moving head discs, drums, and data cells.
Since the data file itself changes, it must be possible not only to search the index and to retrieve elements, but also to delete and to insert keys--more or less simultaneously. The scheme will be analyzed, and a device dependent parameter k will be derived which describes the page size such that the performance of the maintenance
in real time on an IBM 360/44 with a 2311 disc as backup store. Each page of the index can hold up to 2k keys, but pages need only be partially filled. Pages are the nodes of a tree, a so-called B-tree. B-trees grow and contract in only one way, namely nodes split off a brother,
or two brothers are merged or "catenated" into a single node. The splitting
and catenation processes are initiated at the leaves only and propagate
toward the root. If the root node splits, a new root must be introduced,
and this is the only way in which the height of the tree can increase.
ii) Storage is requested and released as the file grows and contracts.
iii) The natural order of the keys is maintained and allows processing based on that order, e.g., finding the next larger key of a given key.
i) Each path from the root to any leaf has the same length h, also called the height of the tree.
ii) Each node except the root and the leaves has at least k + 1 sons; the root is a leaf or has at least two sons; each node has at most 2k + 1 sons.
N_max = Σ_{i=0}^{h-1} (2k+1)^i = (1/(2k)) ((2k+1)^h − 1);  h ≥ 1.
Upper and lower bounds for the number N(T) of nodes of a tree T in τ(k,h) are given by:

N_min = 1 + 2 ((k+1)^{h−1} − 1)/k;  N_max = ((2k+1)^h − 1)/(2k);  h ≥ 1.
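These node counts are easy to check numerically. A small Python sketch (our own notation: n_max and n_min give the maximal and minimal number of nodes of a tree of height h for a given k):

```python
# Node-count bounds for B-trees of height h with parameter k.
# N_max sums the maximal fanout (2k+1)^i over the levels i = 0 .. h-1;
# N_min assumes the root has 2 sons and every other node k+1 sons.

def n_max(k, h):
    # closed form: ((2k+1)^h - 1) / (2k)
    return ((2 * k + 1) ** h - 1) // (2 * k)

def n_min(k, h):
    # 1 (root) + 2 * sum_{i=0}^{h-2} (k+1)^i
    return 1 + 2 * (((k + 1) ** (h - 1) - 1) // k)

# the closed forms agree with the level-by-level sums
for k in (1, 2, 60):
    for h in (1, 2, 3, 4):
        assert n_max(k, h) == sum((2 * k + 1) ** i for i in range(h))
        assert n_min(k, h) == 1 + 2 * sum((k + 1) ** i for i in range(h - 1))
```

Both divisions are exact, since (2k+1)^h ≡ 1 (mod 2k) and (k+1)^{h−1} ≡ 1 (mod k).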
To repeat, the pages on which the index is stored are the nodes of a B-tree, and each page can hold up to 2k keys.
which P(p_i) is the root. Then for the B-trees considered here these conditions hold. In the figure the α_i are not shown and the page pointers are represented graphically. The boxes represent pages and the numbers in the boxes are keys. The pointer s points to the root page and is u if the tree is empty; s does not serve any purpose for retrieval, but will be used in the insertion algorithm. Let P(p) be the page to which the pointer p points. The retrieval algorithm fetches and scans one page after another along a path from the root; thus at most h pages must be scanned and therefore fetched from backup store to retrieve a key y. We will now derive bounds for h for a given index size I.
[Figure: flowchart of the retrieval algorithm. Starting at the root page, each page is scanned for the key y; if y is not found and the relevant page pointer p = u, the search fails, otherwise the search continues in P(p).]
I_min = 2 (k+1)^{h−1} − 1;  I_max = (2k+1)^h − 1.
This is immediate from (2.1) for h ≥ 1. Thus we have as sharp bounds

(3.1)  log_{2k+1}(I+1) ≤ h ≤ 1 + log_{k+1}((I+1)/2)  for I ≥ 1;
       h = 0 for I = 0.
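A quick numerical check of these height bounds (a sketch; I_min and I_max as derived above):

```python
import math

# Sharp bounds (3.1) on the height h of a B-tree holding I keys:
#   log_{2k+1}(I+1) <= h <= 1 + log_{k+1}((I+1)/2)   for I >= 1.
def h_bounds(k, I):
    lo = math.log(I + 1, 2 * k + 1)
    hi = 1 + math.log((I + 1) / 2, k + 1)
    return lo, hi

# At the extreme index sizes the bounds are attained exactly.
for k in (2, 60):
    for h in (1, 2, 3):
        i_max = (2 * k + 1) ** h - 1          # fullest tree of height h
        i_min = 2 * (k + 1) ** (h - 1) - 1    # sparsest tree of height h
        lo, _ = h_bounds(k, i_max)
        _, hi = h_bounds(k, i_min)
        assert abs(lo - h) < 1e-9 and abs(hi - h) < 1e-9
```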
4. Key Insertion
The insertion algorithm uses the pointer s obtained from the retrieval algorithm pointing to the last page that was scanned, or having the value u if the tree is empty. If a key y must be inserted into a page which is already full, it will be split into two pages. Logically first insert y into the full page to obtain the sequence

p_0, (x_1, p_1), (x_2, p_2), ..., (x_{2k+1}, p_{2k+1}).

This sequence is split at the middle key x_{k+1}: the keys x_1, ..., x_k remain in the old page, the keys x_{k+2}, ..., x_{2k+1} go into a new page, and the entry (x_{k+1}, p'), with p' pointing to the new page, is inserted into the father page. Each of the two resulting pages contains k keys and satisfies conditions (3.2) and (3.3).
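The whole insertion procedure can be modeled in a few lines of Python. This is our own minimal in-core sketch (a Node plays the role of a page, child references the role of the p_i; keys are assumed distinct), not the authors' implementation:

```python
# Minimal in-core model of B-tree insertion with bottom-up page splitting.
# A node holds at most 2K keys; an overfull node splits at its middle key,
# which moves up into the father (a new root is created if the root splits).
K = 2  # pages hold at most 2K = 4 keys

class Node:
    def __init__(self, keys=None, kids=None):
        self.keys = keys or []
        self.kids = kids or []   # empty for leaves

def insert(root, x):
    """Insert key x, returning the (possibly new) root."""
    mid = _insert(root, x)
    if mid is None:
        return root
    key, right = mid             # root split: grow the tree at the top
    return Node([key], [root, right])

def _insert(node, x):
    if not node.kids:            # leaf: insert in key order
        node.keys.append(x)
        node.keys.sort()
    else:                        # internal: descend into the proper son
        i = sum(1 for y in node.keys if y < x)
        mid = _insert(node.kids[i], x)
        if mid is None:
            return None
        key, right = mid         # a son split: absorb the middle key
        node.keys.insert(i, key)
        node.kids.insert(i + 1, right)
    if len(node.keys) <= 2 * K:
        return None
    m = K                        # overflow: split 2K+1 keys at the middle
    right = Node(node.keys[m + 1:], node.kids[m + 1:])
    key_up = node.keys[m]
    node.keys, node.kids = node.keys[:m], node.kids[:m + 1]
    return key_up, right
```

After inserting any sequence of distinct keys into `Node()`, all leaves sit at the same depth and every node except the root holds between K and 2K keys.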
[Figure: flowchart of the insertion algorithm. Apply the retrieval algorithm for key y; if y is found, stop. If s = u, the tree is empty and a root page containing y is created. Otherwise, if P(s) is not full, the entry (y, u) is inserted in P(s); if P(s) is full, the split routine is invoked for P(s).]
into main store and how many pages must be written onto the backup store. For our analysis we make the following assumption: Any page whose content is examined or modified during a single retrieval, insertion, or deletion is fetched from, or written onto, the backup store exactly once. It will become clear during the course of this paper that a paging area large enough to hold h + 1 pages in main store suffices for no page to be fetched or written more than once.
Cost of Insertion: For inserting a single key the least work is required if no page must be split; then

f_min = h;  w_min = 1.
The most work is required if all pages in the retrieval path, including the root page, split into two. Since the retrieval path contains h pages, this leads to

f_max = h;  w_max = 2h + 1.
Note that h always denotes the height of the old tree. Although
this worst bound is sharp, it is not a good measure for the amount of work generally required. If in an index keys are only inserted, but no keys are deleted, then we can derive a bound for the average amount of work per single insertion.
Each page split causes one (or two if the root page splits) new pages in the tree. Let n(I) be the number of pages in a tree holding I keys. Since each page has at least k keys, except the root page which may have only 1, we get: n(I) ≤ (I − 1)/k + 1. Each single page split causes at most two additional page writes, so the average number of pages written per single key insertion due to page splitting is bounded by
(n(I) − 1) · 2/I < 2/k.
A page split does not require any additional page retrievals. Thus in
the average for an index without deletions we get for a single insertion:

f_a = h;  w_a < 1 + 2/k.
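The split bound is easy to sanity-check numerically; a small sketch using the page-count bound n(I) ≤ (I − 1)/k + 1 from above:

```python
# Average extra writes per insertion caused by page splits:
# at most n(I) - 1 splits occur while inserting I keys, each costing
# at most two extra page writes, so the average is (n(I)-1) * 2 / I < 2/k.
for k in (2, 10, 60):
    for I in (100, 10_000, 1_000_000):
        n = (I - 1) / k + 1            # upper bound on the page count n(I)
        assert (n - 1) * 2 / I < 2 / k
```

The inequality is strict for every I, since (n − 1) · 2/I = 2(I − 1)/(kI) < 2/k.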
6. Deletion Process
keys. The algorithm of Figure 6 deletes one key y from an index and maintains our data structure properly. It first locates the key, say in a page P(p_i). If this page is not a leaf, y is replaced by the next larger key in the index: one goes from P(p_i) along the p_0 pointers to the leaf page, say L, and takes the first key in L. Then this key, say x_1, is deleted from L. Deleting a key from a leaf may cause an underflow: the leaf then has fewer than k keys. To repair this, consider the underflowing page P and an adjacent brother page P'; they have the same father Q and are pointed to by adjacent pointers in Q.
Suppose Q has the form

[ ..., (y_{j−1}, p), (y_j, p'), (y_{j+1}, p_{j+1}), ... ]

where p points to P and p' points to P'. If P and P' together contain fewer than 2k keys, then P, of the form

[ p_0, (x_1, p_1), ..., (x_ℓ, p_ℓ) ],

the key y_j, and P' can be catenated into a single page of the form

[ p_0, (x_1, p_1), ..., (x_ℓ, p_ℓ), (y_j, p'_0), (x'_1, p'_1), ... ]

and the entry (y_j, p') is removed from Q.
Since the entry (y_j, p') is removed from Q, it is possible that Q contains fewer than k keys, and special action must be taken for Q in turn; this underflow condition may propagate up the tree. If, on the other hand, P and P' together contain more than 2k keys, they cannot be catenated. Instead the keys in P and P' are equally distributed: the catenated sequence is replaced by two pages, split "in the middle" as described in Section 4 with some obvious minor modifications.
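Both underflow repairs, catenation and equal distribution, can be sketched on bare key sequences (page pointers omitted; the function `repair` is our own illustration, not the authors' code):

```python
# Underflow repair for a page P with fewer than K keys and its adjacent
# brother P2, separated in the father by the key y_j (pointers omitted).
K = 3  # pages hold between K and 2K keys

def repair(p, y_j, p2):
    """Return (new_pages, key_to_father) after catenation or distribution."""
    merged = p + [y_j] + p2          # the logically catenated key sequence
    if len(merged) <= 2 * K:         # few keys: catenate into one page
        return [merged], None
    m = len(merged) // 2             # many keys: redistribute "in the middle";
    return [merged[:m], merged[m + 1:]], merged[m]  # middle key replaces y_j

# catenation: 2 + 1 + 3 = 6 <= 2K keys fit into a single page
pages, up = repair([1, 2], 5, [7, 8, 9])
assert pages == [[1, 2, 5, 7, 8, 9]] and up is None

# distribution: 9 keys split around the middle key, which moves to the father
pages, up = repair([1, 2], 5, [6, 7, 8, 9, 10, 11])
assert pages == [[1, 2, 5, 6], [8, 9, 10, 11]] and up == 7
```

In the catenation case the father loses an entry and may underflow itself; in the distribution case both resulting pages hold at least K keys, so the repair stops there.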
[Figure 6: flowchart of the deletion algorithm. Apply the retrieval algorithm for y; if y is not found, stop. If y is on a leaf page, delete it from the leaf. Otherwise retrieve the pages down to the leaf along the p_0 pointers, replace y by the first key on the leaf page, and delete that key from the leaf. If necessary, perform catenations and underflow repairs.]
7. Cost of Deletions
If the key to be deleted is on a leaf and no underflow occurs, the least work is required:

f_min = h;  w_min = 1.

If the key is not on a leaf, but no underflow occurs, then

f = h;  w = 2.

In the worst case, catenations propagate through the tree, and

f_max = 2h − 1;  w_max = h + 1.
To bound the average cost, observe that the number of catenations occurring during deletions cannot exceed the number of page splits that occurred while the keys were inserted; on the average this is at most 1/k per key. This yields for the additional work per deletion:

f_3 = γ < 1/k;  w_3 = 2γ < 2/k.

Thus for the average cost of a single deletion:

f_a = f_1 + f_2 + f_3 < h + 1 + 1/k;
w_a = w_1 + w_2 + w_3 < 2 + 2 + 2/k = 4 + 2/k.
8. Page Overflow and Storage Utilization
Assume that a key must be inserted in a page P which is already full, but an adjacent brother page P' is not full. Then the key is logically first inserted into the key sequence of P, and an equal distribution of the keys between the resulting sequence and P' is performed. This avoids the need to split P into two pages. Thus a page will be split only if it overflows and its adjacent brother pages are full too.

Bounds for the cost of insertions for a scheme with overflows are
f_min = h;  w_min = 1;
f_max = 3h − 2;  w_max = 2h + 1.
For a pure insertion process one obtains as bounds for the average cost:

f_a < h + 2 + 2/k;  w_a < 3 + 9/k.
overflow; thus these bounds cannot be improved very much without special assumptions about the transactions. In general, however, insertions and deletions must be possible in any order. We will now derive bounds on the cost in this general situation.
The derivation of bounds for retrieval cost did not make any assump-
tions about the order of insertions or deletions, so they are still valid.
Also, the minimal and maximal bounds for the cost of insertions and dele-
tions were derived without any such assumptions and are still valid. The
bounds derived for the average cost, however, are no longer valid if insertions and deletions are mixed.
The following example shows that the upper bounds for the average cost cannot be improved appreciably over the upper bounds for the cost of single transactions. In the example, a deletion requires

w = 2 = h − 1 = w_max − 2.

Thus

f = 2 = h = f_max;
w = 5 = 2h + 1 = w_max.

Inserting the deleted key again, page 5 must be split. Thus:

f = 4 = 3h − 2 = f_max;
w = 5 = 2h + 1 = w_max.

Repeating this pair of transactions arbitrarily often shows that the average cost can come within a small constant, at most a factor of 3, of the worst case bounds.
It is an open question how frequently such unfavorable sequences of transactions arise in actual applications and how relevant our worst case analysis is. Although the derivable cost bounds are worse, the scheme with overflows achieves better storage utilization and may therefore be preferable in many applications.
[Table: summary of the cost bounds (f_min, w_min, f_max, w_max, f_a, w_a) for retrieval, insertion, and deletion, for the schemes with and without overflow.]
10. Choice of k
possible.

i) The time spent for each page which is written or fetched can be expressed approximately in the form

α + β(2k+1) + γ ln(2k+1)

where
α: fixed time spent per page, e.g., average disc seek time;
β: transfer time per index element;
γ: constant of proportionality for processing the keys within a page.
We assume that modifying a page does not require moving keys within main store. This is the reason for our assumption that fetching and writing a page take the same amount of time. For k = 60 one obtains as the range of index sizes I that can be accommodated by trees of height h:
h      I_min        I_max
1          1          120
2        121        14640
3       7441      1771560
4     453961    214358880
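The table follows directly from the bounds I_min = 2(k+1)^{h−1} − 1 and I_max = (2k+1)^h − 1 with k = 60; a quick Python check:

```python
# Reproduce the index-size table for k = 60 (page size 2k = 120):
#   I_min = 2 (k+1)^(h-1) - 1,   I_max = (2k+1)^h - 1.
k = 60
table = {h: (2 * (k + 1) ** (h - 1) - 1, (2 * k + 1) ** h - 1)
         for h in range(1, 5)}
assert table[1] == (1, 120)
assert table[2] == (121, 14640)
assert table[3] == (7441, 1771560)
assert table[4] == (453961, 214358880)
```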
on an IBM 360/44 computer with a 2311 disc unit as a backup store. For the index element size chosen (14 8-bit characters) and index size generally used (about 10,000 index elements), the average access mechanism delay for this unit is about 50 ms, after which information transfer takes place at the rate of about 90 μs per index element. From these two parameters, our analysis predicts an optimal page size (2k) on the order of 120 index elements. Our program takes advantage of available core storage (about 1250 index elements' worth) to hold recently used pages, and thus needs fewer physical disc operations than the analysis assumes. Each experiment started from an empty index.
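The near optimal page size can be recovered from the two measured constants. A sketch that minimizes (α + β(2k+1))/ln(2k+1) over the page size; the in-core processing term γ ln(2k+1) is ignored here, so this is only a rough estimate:

```python
import math

# Time to fetch one page of 2k keys is roughly alpha + beta*(2k+1), and
# the number of levels scanned grows like 1/ln(2k+1); minimize their ratio.
alpha = 0.050      # average access mechanism delay, 50 ms
beta = 90e-6       # transfer time per index element, 90 us

def time_per_level(x):          # x = 2k + 1
    return (alpha + beta * x) / math.log(x)

best = min(range(3, 1000, 2), key=time_per_level)
print(best - 1)                 # near optimal page size 2k
```

The minimum is broad and flat, so any page size in this general region performs nearly as well; this is consistent with choosing 2k = 120 for the experiments.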
The experiments ran in several phases, and at the end of each of these the performance variables were measured, among them:

i) % storage utilization
*These numbers are somewhat misleading for deletions due to the way the deletions were programmed into the experiments. To find the necessary number of virtual reads, for sequential deletions subtract one from the number shown, and for random deletions subtract one and multiply the result by about 0.5.