Organization and Maintenance of Large Ordered Indices
D1-82-0989
by
R. Bayer
and
E. McCreight
July 1970
ABSTRACT
Organization and maintenance of an index for a dynamic random access file is considered. It is assumed that the index must be kept on some pseudo random access backup store like a disc or a drum. The index organization described allows retrieval, insertion, and deletion of keys in time proportional to log_k I, where I is the size of the index and k is a device dependent natural number such that the performance of the scheme becomes near optimal. Storage utilization is at least 50% but generally much higher. The pages of the index are organized in a special data structure, so-called B-trees. The scheme is analyzed, performance bounds are obtained, and a near optimal k is computed. Experiments have been performed with indices up to 100,000 keys. An index of size 15,000 (100,000) can be maintained with an average of 9 (at least 4) transactions per second on an IBM 360/44 with a 2311 disc.
Key Words and Phrases: Data structures, random access files, dynamic
information retrieval.
1. Introduction
In this paper we consider the problem of organizing and maintaining an index for a dynamically changing random access file. By an index we mean a collection of index elements which are pairs (x, α) of fixed size physically adjacent data items, namely a key x and some associated information α. The index is assumed to be so voluminous that only small parts of it can be kept in main store at one time. Thus the bulk of the index must be kept on some backup store. The class of backup stores considered are pseudo random access devices which have a rather long access or wait time--as opposed to a true random access device like core store--and a rather high data rate once the transmission of physically sequential data has been initiated. Typical pseudo random access devices are: fixed and moving head discs, drums, and data cells.
Since the data file itself changes, it must be possible not only to search the index and to retrieve elements, but also to delete and to insert keys--more or less simultaneously. The scheme will be analyzed, and a device dependent parameter k will be derived which describes the page size such that the performance of the maintenance
in real time on an IBM 360/44 with a 2311 disc as backup store. Each page of the index can hold up to 2k keys, but pages need only be partially filled. Pages are the nodes of a tree, a so-called B-tree. B-trees grow and contract in only one way, namely nodes split off a brother,
or two brothers are merged or "catenated" into a single node. The splitting
and catenation processes are initiated at the leaves only and propagate
toward the root. If the root node splits, a new root must be introduced,
and this is the only way in which the height of the tree can increase.
ii) Storage is requested and released as the file grows and contracts.
iii) The natural order of the keys is maintained and allows processing based on that order, e.g., finding the next larger key of a given key.
i) Each path from the root to any leaf has the same length h, also called the height of the tree.
ii) Each node except the root and the leaves has at least k + 1 sons; the root is a leaf or has at least two sons; each node has at most 2k + 1 sons.
N_max = Σ_{i=0}^{h-1} (2k+1)^i = (1/(2k)) ((2k+1)^h − 1);  h ≥ 1.
Upper and lower bounds for the number N(T) of nodes of a tree T in τ(k,h) are given by:

N_min = 1 + 2 ((k+1)^{h−1} − 1)/k;  N_max = ((2k+1)^h − 1)/(2k);  h ≥ 1.
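These node counts are easy to check numerically. A small Python sketch (our own notation: n_max and n_min give the maximal and minimal number of nodes of a tree of height h for a given k):

```python
# Node-count bounds for B-trees of height h with parameter k.
# N_max sums the maximal fanout (2k+1)^i over the levels i = 0 .. h-1;
# N_min assumes the root has 2 sons and every other node k+1 sons.

def n_max(k, h):
    # closed form: ((2k+1)^h - 1) / (2k)
    return ((2 * k + 1) ** h - 1) // (2 * k)

def n_min(k, h):
    # 1 (root) + 2 * sum_{i=0}^{h-2} (k+1)^i
    return 1 + 2 * (((k + 1) ** (h - 1) - 1) // k)

# the closed forms agree with the level-by-level sums
for k in (1, 2, 60):
    for h in (1, 2, 3, 4):
        assert n_max(k, h) == sum((2 * k + 1) ** i for i in range(h))
        assert n_min(k, h) == 1 + 2 * sum((k + 1) ** i for i in range(h - 1))
```

Both divisions are exact, since (2k+1)^h ≡ 1 (mod 2k) and (k+1)^{h−1} ≡ 1 (mod k).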
To repeat, the pages on which the index is stored are the nodes of a B-tree, and each page can hold up to 2k keys.
which P(p_i) is the root. Then for the B-trees considered here these conditions hold. In the figure the α_i are not shown and the page pointers are represented graphically. The boxes represent pages and the numbers in the boxes are keys. The pointer s points to the root page and is u if the tree is empty; s does not serve any purpose for retrieval, but will be used in the insertion algorithm. Let P(p) be the page to which the pointer p points. The retrieval algorithm fetches and scans one page after another along a path from the root; thus at most h pages must be scanned and therefore fetched from backup store to retrieve a key y. We will now derive bounds for h for a given index size I.
[Figure: flowchart of the retrieval algorithm. Starting at the root page, each page is scanned for the key y; if y is not found and the relevant page pointer p = u, the search fails, otherwise the search continues in P(p).]
I_min = 2 (k+1)^{h−1} − 1;  I_max = (2k+1)^h − 1.
This is immediate from (2.1) for h ≥ 1. Thus we have as sharp bounds

(3.1)  log_{2k+1}(I+1) ≤ h ≤ 1 + log_{k+1}((I+1)/2)  for I ≥ 1;
       h = 0 for I = 0.
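A quick numerical check of these height bounds (a sketch; I_min and I_max as derived above):

```python
import math

# Sharp bounds (3.1) on the height h of a B-tree holding I keys:
#   log_{2k+1}(I+1) <= h <= 1 + log_{k+1}((I+1)/2)   for I >= 1.
def h_bounds(k, I):
    lo = math.log(I + 1, 2 * k + 1)
    hi = 1 + math.log((I + 1) / 2, k + 1)
    return lo, hi

# At the extreme index sizes the bounds are attained exactly.
for k in (2, 60):
    for h in (1, 2, 3):
        i_max = (2 * k + 1) ** h - 1          # fullest tree of height h
        i_min = 2 * (k + 1) ** (h - 1) - 1    # sparsest tree of height h
        lo, _ = h_bounds(k, i_max)
        _, hi = h_bounds(k, i_min)
        assert abs(lo - h) < 1e-9 and abs(hi - h) < 1e-9
```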
4. Key Insertion
The insertion algorithm uses the pointer s obtained from the retrieval algorithm pointing to the last page that was scanned, or having the value u if the tree is empty. If a key y must be inserted into a page which is already full, it will be split into two pages. Logically first insert y into the full page to obtain the sequence

p_0, (x_1, p_1), (x_2, p_2), ..., (x_{2k+1}, p_{2k+1}).

This sequence is split at the middle key x_{k+1}: the keys x_1, ..., x_k remain in the old page, the keys x_{k+2}, ..., x_{2k+1} go into a new page, and the entry (x_{k+1}, p'), with p' pointing to the new page, is inserted into the father page. Each of the two resulting pages contains k keys and satisfies conditions (3.2) and (3.3).
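The whole insertion procedure can be modeled in a few lines of Python. This is our own minimal in-core sketch (a Node plays the role of a page, child references the role of the p_i; keys are assumed distinct), not the authors' implementation:

```python
# Minimal in-core model of B-tree insertion with bottom-up page splitting.
# A node holds at most 2K keys; an overfull node splits at its middle key,
# which moves up into the father (a new root is created if the root splits).
K = 2  # pages hold at most 2K = 4 keys

class Node:
    def __init__(self, keys=None, kids=None):
        self.keys = keys or []
        self.kids = kids or []   # empty for leaves

def insert(root, x):
    """Insert key x, returning the (possibly new) root."""
    mid = _insert(root, x)
    if mid is None:
        return root
    key, right = mid             # root split: grow the tree at the top
    return Node([key], [root, right])

def _insert(node, x):
    if not node.kids:            # leaf: insert in key order
        node.keys.append(x)
        node.keys.sort()
    else:                        # internal: descend into the proper son
        i = sum(1 for y in node.keys if y < x)
        mid = _insert(node.kids[i], x)
        if mid is None:
            return None
        key, right = mid         # a son split: absorb the middle key
        node.keys.insert(i, key)
        node.kids.insert(i + 1, right)
    if len(node.keys) <= 2 * K:
        return None
    m = K                        # overflow: split 2K+1 keys at the middle
    right = Node(node.keys[m + 1:], node.kids[m + 1:])
    key_up = node.keys[m]
    node.keys, node.kids = node.keys[:m], node.kids[:m + 1]
    return key_up, right
```

After inserting any sequence of distinct keys into `Node()`, all leaves sit at the same depth and every node except the root holds between K and 2K keys.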
[Figure: flowchart of the insertion algorithm. Apply the retrieval algorithm for key y; if y is found, stop. If s = u, the tree is empty and a root page containing y is created. Otherwise, if P(s) is not full, the entry (y, u) is inserted in P(s); if P(s) is full, the split routine is invoked for P(s).]
into main store and how many pages must be written onto the backup store. For our analysis we make the following assumption: Any page whose content is examined or modified during a single retrieval, insertion, or deletion is fetched from, or written onto, the backup store exactly once. It will become clear during the course of this paper that a paging area large enough to hold h + 1 pages in main store suffices for no page to be fetched or written more than once.
Cost of Insertion: For inserting a single key the least work is required if no page must be split; then

f_min = h;  w_min = 1.
The most work is required if all pages in the retrieval path, including the root page, split into two. Since the retrieval path contains h pages, this leads to

f_max = h;  w_max = 2h + 1.
Note that h always denotes the height of the old tree. Although
this worst bound is sharp, it is not a good measure for the amount of work generally required. If in an index keys are only inserted, but no keys are deleted, then we can derive a bound for the average amount of work per single insertion.
Each page split causes one (or two if the root page splits) new pages in the tree. Let n(I) be the number of pages in a tree holding I keys. Since each page has at least k keys, except the root page which may have only 1, we get: n(I) ≤ (I − 1)/k + 1. Each single page split causes at most two additional page writes, so the average number of pages written per single key insertion due to page splitting is bounded by
(n(I) − 1) · 2/I < 2/k.
A page split does not require any additional page retrievals. Thus in
the average for an index without deletions we get for a single insertion:

f_a = h;  w_a < 1 + 2/k.
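The split bound is easy to sanity-check numerically; a small sketch using the page-count bound n(I) ≤ (I − 1)/k + 1 from above:

```python
# Average extra writes per insertion caused by page splits:
# at most n(I) - 1 splits occur while inserting I keys, each costing
# at most two extra page writes, so the average is (n(I)-1) * 2 / I < 2/k.
for k in (2, 10, 60):
    for I in (100, 10_000, 1_000_000):
        n = (I - 1) / k + 1            # upper bound on the page count n(I)
        assert (n - 1) * 2 / I < 2 / k
```

The inequality is strict for every I, since (n − 1) · 2/I = 2(I − 1)/(kI) < 2/k.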
6. Deletion Process
keys. The algorithm of Figure 6 deletes one key y from an index and maintains our data structure properly. It first locates the key, say in a page P(p_i). If this page is not a leaf, y is replaced by the next larger key in the index: one goes from P(p_i) along the p_0 pointers to the leaf page, say L, and takes the first key in L. Then this key, say x_1, is deleted from L. Deleting a key from a leaf may cause an underflow: the leaf then has fewer than k keys. To repair this, consider the underflowing page P and an adjacent brother page P'; they have the same father Q and are pointed to by adjacent pointers in Q.
Suppose Q has the form

[ ..., (y_{j−1}, p), (y_j, p'), (y_{j+1}, p_{j+1}), ... ]

where p points to P and p' points to P'. If P and P' together contain fewer than 2k keys, then P, of the form

[ p_0, (x_1, p_1), ..., (x_ℓ, p_ℓ) ],

the key y_j, and P' can be catenated into a single page of the form

[ p_0, (x_1, p_1), ..., (x_ℓ, p_ℓ), (y_j, p'_0), (x'_1, p'_1), ... ]

and the entry (y_j, p') is removed from Q.
Since the entry (y_j, p') is removed from Q, it is possible that Q contains fewer than k keys, and special action must be taken for Q in turn; this underflow condition may propagate up the tree. If, on the other hand, P and P' together contain more than 2k keys, they cannot be catenated. Instead the keys in P and P' are equally distributed: the catenated sequence is replaced by two pages, split "in the middle" as described in Section 4 with some obvious minor modifications.
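Both underflow repairs, catenation and equal distribution, can be sketched on bare key sequences (page pointers omitted; the function `repair` is our own illustration, not the authors' code):

```python
# Underflow repair for a page P with fewer than K keys and its adjacent
# brother P2, separated in the father by the key y_j (pointers omitted).
K = 3  # pages hold between K and 2K keys

def repair(p, y_j, p2):
    """Return (new_pages, key_to_father) after catenation or distribution."""
    merged = p + [y_j] + p2          # the logically catenated key sequence
    if len(merged) <= 2 * K:         # few keys: catenate into one page
        return [merged], None
    m = len(merged) // 2             # many keys: redistribute "in the middle";
    return [merged[:m], merged[m + 1:]], merged[m]  # middle key replaces y_j

# catenation: 2 + 1 + 3 = 6 <= 2K keys fit into a single page
pages, up = repair([1, 2], 5, [7, 8, 9])
assert pages == [[1, 2, 5, 7, 8, 9]] and up is None

# distribution: 9 keys split around the middle key, which moves to the father
pages, up = repair([1, 2], 5, [6, 7, 8, 9, 10, 11])
assert pages == [[1, 2, 5, 6], [8, 9, 10, 11]] and up == 7
```

In the catenation case the father loses an entry and may underflow itself; in the distribution case both resulting pages hold at least K keys, so the repair stops there.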
[Figure 6: flowchart of the deletion algorithm. Apply the retrieval algorithm for y; if y is not found, stop. If y is on a leaf page, delete it from the leaf. Otherwise retrieve the pages down to the leaf along the p_0 pointers, replace y by the first key on the leaf page, and delete that key from the leaf. If necessary, perform catenations and underflow repairs.]
7. Cost of Deletions
If the key to be deleted is on a leaf and no underflow occurs, the least work is required:

f_min = h;  w_min = 1.

If the key is not on a leaf, but no underflow occurs, then

f = h;  w = 2.

In the worst case, catenations propagate through the tree, and

f_max = 2h − 1;  w_max = h + 1.
To bound the average cost, observe that the number of catenations occurring during deletions cannot exceed the number of page splits that occurred while the keys were inserted; on the average this is at most 1/k per key. This yields for the additional work per deletion:

f_3 = γ < 1/k;  w_3 = 2γ < 2/k.

Thus for the average cost of a single deletion:

f_a = f_1 + f_2 + f_3 < h + 1 + 1/k;
w_a = w_1 + w_2 + w_3 < 2 + 2 + 2/k = 4 + 2/k.
8. Page Overflow and Storage Utilization
Assume that a key must be inserted in a page P which is already full, but an adjacent brother page P' is not full. Then the key is logically first inserted into the key sequence of P, and an equal distribution of the keys between the resulting sequence and P' is performed. This avoids the need to split P into two pages. Thus a page will be split only if it overflows and its adjacent brother pages are full too.

Bounds for the cost of insertions for a scheme with overflows are
f_min = h;  w_min = 1;
f_max = 3h − 2;  w_max = 2h + 1.
For a pure insertion process one obtains as bounds for the average cost:

f_a < h + 2 + 2/k;  w_a < 3 + 9/k.
overflow; thus these bounds cannot be improved very much without special assumptions about the transactions. In general, however, insertions and deletions must be possible in any order. We will now derive bounds on the cost in this general situation.
The derivation of bounds for retrieval cost did not make any assump-
tions about the order of insertions or deletions, so they are still valid.
Also, the minimal and maximal bounds for the cost of insertions and dele-
tions were derived without any such assumptions and are still valid. The
bounds derived for the average cost, however, are no longer valid if insertions and deletions are mixed.
The following example shows that the upper bounds for the average cost cannot be improved appreciably over the upper bounds for the cost of single transactions. In the example, a deletion requires

w = 2 = h − 1 = w_max − 2.

Thus

f = 2 = h = f_max;
w = 5 = 2h + 1 = w_max.

Inserting the deleted key again, page 5 must be split. Thus:

f = 4 = 3h − 2 = f_max;
w = 5 = 2h + 1 = w_max.

Repeating this pair of transactions arbitrarily often shows that the average cost can come within a small constant, at most a factor of 3, of the worst case bounds.
It is an open question how frequently such unfavorable sequences of transactions arise in actual applications and how relevant our worst case analysis is. Although the derivable cost bounds are worse, the scheme with overflows achieves better storage utilization and may therefore be preferable in many applications.
[Table: summary of the cost bounds (f_min, w_min, f_max, w_max, f_a, w_a) for retrieval, insertion, and deletion, for the schemes with and without overflow.]
10. Choice of k
possible.

i) The time spent for each page which is written or fetched can be expressed approximately in the form

α + β(2k+1) + γ ln(2k+1)

where
α: fixed time spent per page, e.g., average disc seek time;
β: transfer time per index element;
γ: constant of proportionality for processing the keys within a page.
We assume that modifying a page does not require moving keys within main store. This is the reason for our assumption that fetching and writing a page take the same amount of time. For k = 60 one obtains as the range of index sizes I that can be accommodated by trees of height h:
h      I_min        I_max
1          1          120
2        121        14640
3       7441      1771560
4     453961    214358880
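The table follows directly from the bounds I_min = 2(k+1)^{h−1} − 1 and I_max = (2k+1)^h − 1 with k = 60; a quick Python check:

```python
# Reproduce the index-size table for k = 60 (page size 2k = 120):
#   I_min = 2 (k+1)^(h-1) - 1,   I_max = (2k+1)^h - 1.
k = 60
table = {h: (2 * (k + 1) ** (h - 1) - 1, (2 * k + 1) ** h - 1)
         for h in range(1, 5)}
assert table[1] == (1, 120)
assert table[2] == (121, 14640)
assert table[3] == (7441, 1771560)
assert table[4] == (453961, 214358880)
```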
on an IBM 360/44 computer with a 2311 disc unit as a backup store. For the index element size chosen (14 8-bit characters) and index size generally used (about 10,000 index elements), the average access mechanism delay for this unit is about 50 ms, after which information transfer takes place at the rate of about 90 μs per index element. From these two parameters, our analysis predicts an optimal page size (2k) on the order of 120 index elements. Our program takes advantage of available core storage (about 1250 index elements' worth) to hold recently used pages, and thus needs fewer physical disc operations than the analysis assumes. Each experiment started from an empty index.
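The near optimal page size can be recovered from the two measured constants. A sketch that minimizes (α + β(2k+1))/ln(2k+1) over the page size; the in-core processing term γ ln(2k+1) is ignored here, so this is only a rough estimate:

```python
import math

# Time to fetch one page of 2k keys is roughly alpha + beta*(2k+1), and
# the number of levels scanned grows like 1/ln(2k+1); minimize their ratio.
alpha = 0.050      # average access mechanism delay, 50 ms
beta = 90e-6       # transfer time per index element, 90 us

def time_per_level(x):          # x = 2k + 1
    return (alpha + beta * x) / math.log(x)

best = min(range(3, 1000, 2), key=time_per_level)
print(best - 1)                 # near optimal page size 2k
```

The minimum is broad and flat, so any page size in this general region performs nearly as well; this is consistent with choosing 2k = 120 for the experiments.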
The experiments ran in several phases, and at the end of each of these the performance variables were measured, among them:

i) % storage utilization
*These numbers are somewhat misleading for deletions due to the way the deletions were programmed into the experiments. To find the necessary number of virtual reads, for sequential deletions subtract one from the number shown, and for random deletions subtract one and multiply the result by about 0.5.