File Structures PDF
File Structures PDF
Folk
Bill
Zoellick
File
Structures
Second Edition
2011
http://www.archive.org/details/filestructuresOOfolk
File
SECOND
EDITION
Structures
MICHAEL
J.
FOLK
University of Illinois
BILL ZOELLICK
Avalanche Development Company
TV Addison-Wesley
Publishing
Company,
Inc.
San Juan
Milan
Paris
Illustrator
Roy Logan
Manufacturing Supervisor
S.
J.
File structures
Michael
J.
2nd
ed.
cm.
p.
ISBN 0-201-55713-4
1.
II.
File
I.
Zoellick, Bill.
Title.
QA76.9.F5F65
1992
005.74 dc20
91-16314
CIP
instructional value.
particular
it
Many
claimed
as
was aware
trademarks.
o\\\
Where
programs or
by manufacturers and
applications.
products are
in initial caps or
all
caps.
Copyright
<
No
system, or transmitted,
Inc.
Company.
a retrieval
12
3 4 5 6 7 8 9 10-DO-9594939291
To
and
Peter
Preface
story of
how
files. It
also
Knowing
Literacy implies
is
the
The first edition told the story of file structures up to about 1980. This
second edition continues the story, examining developments such as
extendible hashing and optical disc storage that have moved from being a
research topic at the start of the
last
decade to
mature technology by
its
end.
While the history of file structures provides the key organizing principle
for
much of this
text,
we
All
VI
PREFACE
The
last
we have
enough
instance,
the
decade
past
gigabytes of
RAM
that,
on
assuming
files
RAM
when
sorting
available
is
sorting of large
is
the
in
amount of
RAM.
When more
differently.
increase in
the
is
disk.
available, sorting
second edition
is
we
can approach
it is
that
on disk
is
reflects this
structures problems
with the
scarce, sorting
Now
file
RAM
is
on tape
much
is
computer hardware.
first
many
many
different kinds
of students in
different kinds
the fundamentals.
A word
issues
of caution:
presented in the
material.
The
It is
much time on
Move quickly
first
relatively large
six
chapters.
the low-level
through
this
to these matters
is
a reflection
PREFACE
VII
is
in B-tree design.
we
approximation of
the sequence of topics used in the book, especially through the first six
chapters. We have already stressed that we wrote the book so that it can be
read from cover to cover. It is not a reference work. Instead, we develop
Finally,
ideas as
makes
A Book
we
it
a close
in the
book
Computing Professionals
for
We
to teach, but
we now
living.
this
first six
building
file
structures.
We have
tried to present
them
programmer who
in a
way
that
is
is
both
you
UNIX
user, the
a feel for
first
files.
UNIX
why UNIX
Similarly, the
material in the
is
you may be
to
file
powerful
programs
with
in
files.
programs
as a
your needs.
professionals
CD-ROM.
Appendix
with the need
in
introduced
principles
design
A not only provides an example of how the
this text are applied to this important medium, but it also gives you a good
to understand
are confronted
introduction to the
medium
itself.
and use
VIII
PREFACE
Acknowledgements
There are a number of people we would like to thank for help in preparing
this second edition. Peter Shepard, our editor at Addison- Wesley, initiated
the idea of a new edition, kept after us to get it done, and saw the
production through to completion. We thank our reviewers; James
Canning, Jan Carroll, Suzanne Dietrich, Terry Johnson, Theodore Norman, Gregory Riccardi, and Cliff Shaffer. We also thank Deebak Khanna
for comments and suggestions for improving the code.
Since the publication of the first edition, we have received a great deal
of feedback from readers. Their suggestions and contributions have had a
major effect on this second edition, and in fact are largely responsible for
our completely rewriting several of the chapters.
Colleagues with whom we work have also contributed to the second
edition, many without knowing they were doing so. We are grateful to
them for information, explanations, and ideas that have improved our own
understanding of many of the topics covered in the book. These colleagues
include Chin Chau Low, Tim Krauskopf, Joseph Hardin, Quincey Koziol,
Carlos Donohue, S. Sukumar, Mike Page, and Lee Fife.
Thanks
are
still
outstanding to people
who
contributed to the
initial
Marilyn Aiken, Art Crotzer, Mark Dalton, Don Fisher, Huey Liu,
Gail Meinert, and Jim Van Doren.
We thank J. S. Bach; whose magnificent contribution of music to work
edition:
by makes
up with
all
fathers
and husbands
at
who
night to
up too early to
write some more. It's
get
Boulder, Colorado
B.Z.
Urbana,
M.F.
Illinois
Contents
The Heart of
1.2
A
A
1.3
File Structure
Short History of
Key Terms
Fundamental
File Processing
2.2
Opening
Operations
Files
2.3
Closing
2.4
13
Files
14
2.4.2 A Program
to Display the
Seeking
2.1
2.5
Design
File Structure
Conceptual Toolkit:
Summary
Design
14
Contents of a
File
15
18
18
2.5.1 Seeking
in
2.5.2 Seeking
in
Pascal
19
20
2.6
2.7
The
2.8
UNIX
21
Directory Structure
22
UNIX
23
23
IX
CONTENTS
2.9
File-related
2.10
UNIX
Summary
and Pipes
I/O Redirection
Header
27
Key Terms
Further Readings
26
Files
Commands
Filesystem
24
25
29
26
Exercises
31
33
Disks
37
37
38
41
45
47
49
Magnetic Tape
56
56
3.4
Storage as
3.5
A Journey
61
Byte
3.5.1 The
File
3.5.2 The
I/O Buffer
62
63
64
Manager
64
3.6
Buffer
Management
I/O
in
UNIX
I/O
68
3.7
57
59
60
Hierarchy
of
53
54
35
69
69
72
72
3.7.2 Linking
File
Names
3.7.3 Normal
Files,
to Files
76
78
66
CONTENTS
3.7.4 Block
78
I/O
79
Summary
80
Key Terms
Further Readings
Field
87
Exercises
91
Fundamental
4.1
82
79
80
File Structure
4.1.1 A Stream
Concepts
94
94
File
96
of Fields
99
101
Record Access
File
109
111
for Sequential
Processing
115
Structures
114
117
4.4
4.5
File
Access and
Beyond Record
File
120
Organization
Structures
122
123
124
4.5.3 Metadata
125
125
4.6
103
Dump
109
4.3
93
File
128
in
One
Access
File
132
133
134
134
136
129
117
107
XI
XJI
CONTENTS
Summary
142
Further Readings
Programs
Pascal
152
Programs
167
Files for
Data Compression
Performance
183
185
185
186
188
5.2
146
Exercises
144
153
Organizing
5.1
Key Terms
5.1.5 Compression
in
UNIX
189
Reclaiming Space
in Files
190
189
190
198
201
An
in
RAM
File in
Files
203
204
204
206
5.4
Keysorting
209
Why
213
Further Readings
207
208
Summary 214
Key Terms
223
192
196
211
217
Exercises
219
212
CONTENTS
Indexing
225
6.1
What
6.2
6.3
6.4
6.5
6.6
Retrieval
6.7
an Index?
Is
226
6.7.1 A
Too Large
to
Hold
227
Memory
First
Attempt
6.8
Selective Indexes
6.9
Binding
230
234
235
239
242
Lists
242
at a Solution
Summary
in
File
244
References
List of
248
249
Key Terms
250
Further Readings
252
253
Exercises
256
of Large
7.1
Files
A Model
for
257
7.1.1 Matching
Names
in
Summary
Lists
Application of the
Model
268
of the
to a General
Model
to the
Second Look
at
for
Model
266
Ledger Program
Ledger Program
276
276
Sorting in
268
271
7.4
259
263
7.2.2 Application
7.3
Two
259
RAM
Heapsort
I/O:
of Lists
279
in
File
the File
283
280
281
278
XIII
CONTENTS
XIV
7.5
Merging
7.5.1
Way
as a
of Sorting Large
7.5.2 Sorting a
File
That
on Disk
Number
of
292
File Size
285
287
290
Is
Files
293
304
7.5.10 Effects
of
Multiprogramming
312
314
Sort-Merge Packages
318
in
UNIX
Summary
Key Terms
322
Further Readings
310
315
310
7.8
307
309
311
7.7
295
298
UNIX
318
318
Utilities in
325
317
UNIX
Exercises
320
328
331
8.1
Introduction:
8.2
8.3
8.4
AVL
8.5
8.6
8.7
8.8
Splitting
Trees
as a
334
336
Solution
337
340
343
the
and Promoting
347
Bottom
347
Trees
345
XV
CONTENTS
8.9
8.10
B-Tree Nomenclature
8.11
8.12
8.13
364
364
A Way
to
Improve Storage
371
Utilization
8.15
B* Trees
8.16
372
373
375
376
of Virtual B-Trees
377
8.17
8.18
Summary
Key Terms
380
Further Readings
Programs
Pascal
366
370
8.13.1 Redistribution
8.14
352
362
379
383
Exercises
387
to Insert
Programs
382
377
Keys
into a
Keys
to Insert
B-Tree
into a
389
B-Tree
397
Sequential
9.2
406
Sequence Set
407
9.2.2 Choice
of
Block Size
9.3
Adding
9.4
407
410
9.5
The Simple
9.6
Simple Prefix
Prefix
B+
41
Tree
Keys
413
416
Tree Maintenance
417
in
in
9.7
9.8
417
418
421
Variable-order B-Tree
422
CONTENTS
XVI
9.9
Loading
+
9.10
9.11
B-Trees,
B+
10.1
425
Key Terms
436
B+
Trees in Perspective
437
Exercises
443
Further Readings
Hashing
Tree
429
Trees
Summary 434
B+
Simple Prefix
445
446
Introduction
10.1.1 What
is
Hashing?
447
448
10.1.2 Collisions
10.2
10.3
among Addresses
10.3.2
Some
450
455
456
10.4
How Much
Extra
Memory
How
Storing
for Different
of
Packing Densities
10.8
of
476
Tombstones
of Deletions
480
for Insertions
481
483
483
484
10.9
471
472
479
10.7.3 Effects
466
468
10.7.2 Implications
463
467
Buckets on Performance
Making Deletions
462
by Progressive Overflow
10.6.1 Effects
10.7
Should Be Used?
10.6
461
462
10.5
453
454
488
487
486
482
431
CONTENTS
Summary 489
Key Terms
Further Readings
492
Extendible Hashing
11.1
Introduction
11.2
How
503
504
505
505
11.2.1 Tries
11.3
495
Exercises
501
Implementation
507
into a Directory
Handle Overflow
to
508
510
510
11.4
513
514
519
520
Deletion
11.4.1 Overview
of the Deletion
11.4.2 A Procedure
520
Process
Buddy Buckets
for Finding
522
526
11.5.1 Space
526
11.6
11.6.3 Approaches
Further Readings
Appendix A:
Appendix
Using
A. 2.
Exercises
File Structures
Introduction to
535
533
537
539
A.2
A. 2.1
528
to Controlling Splitting
Key Terms
A.l
this
527
528
530
534
Buckets
Alternative Approaches
Summary
526
Utilization for
520
522
CD-ROM
as a File
CD-ROM
542
CD-ROM
Short History of
on
543
CD-ROM
543
Structure Problem
545
541
XVII
XVIII
CONTENTS
A.3
Physical Organization of
A. 3.1 Reading
A. 3. 2
CLV
Pits
CAV
Instead of
547
549
A. 3. 4 Structure of a Sector
A.4
CD-ROM
552
552
A. 4. 3 Storage Capacity
A. 4. 4
553
Read-Only Access
553
A.5
Tree Structures on
A.
Hashed
A.
Files
CD-ROM
on
Bucket Size
A. 6. 3
How
The
CD-ROM
System
Helps
558
Design Exercise
560
A. 7. 3
A Hybrid Design
562
559
563
566
Index
559
559
A. 7. 2
Summary
556
File
The Problem
CD-ROM
557
558
the Size of
CD-ROM
A. 7.1
555
556
557
A. 6. 2
A.6. 4 Advantages of
A.7
553
A. 5.
A.6
553
554
A. 5.3 Special
A.
CD-ROM
Design Exercises
A. 5.1
552
552
Seek Performance
A. 4.1
546
546
549
3 Addressing
A. 3.
CD-ROM
and Lands
581
575
567
572
567
Introduction to
File Structures
CHAPTER OBJECTIVES
Introduce the primary design issues that characterize
file
structure design.
file
to design
structures teaches us
file
our
file
own
file
structures.
CHAPTER OUTLINE
1.1
1.2
1.3
mm
Design
1.1
The Heart
Design
of File Structure
Disks are slow. They are also technological marvels, packing hundreds of
megabytes on disks that can fit into a notebook computer. Only a few years
ago, disks with that kind of capacity looked like small washing machines.
However, relative to the other parts of a computer, disks are slow.
How slow? The time it takes to get information back from even
relatively slow electronic random access memory (RAM) is about 120
nanoseconds, or 120 billionths of a second. Getting the same information
from a typical disk might take 30 milliseconds, or 30 thousandths of a
second. To understand the size of this difference, we need an analogy.
Assume that RAM access is like finding something in the index of this
book. Let's say that this local, book-in-hand access takes 20 seconds.
Assume that disk access is like sending to a library for the information you
cannot find here in this book. Given that our "RAM access" takes 20
seconds,
how
long does the "disk access" to the library take, keeping the
ratio the
same
as that
is
of a
real
from the
or
RAM.
On the other hand, disks provide enormous capacity at much less cost
RAM. They also keep the information stored on them when they are
Good
file
a disk's relatively
is
all
file
the capacity
without making our applications spend a lot of time waiting for the
how to develop such file designs.
its
structure
disk.
1.2
A Short
Put another way, our goal is to show you how to think creatively about file
structure design problems. Part of our approach to doing this is based on
history: After introducing basic principles of design in the first part of this
book, we devote the last part to studying some of the key developments in
file design over the last 30 years. The problems that researchers struggle
with
file
file
reflect the
design problem.
problems.
we would
be
information
of our analogy,
few accesses
studies of
record
among
But having
sons.
get everything
client's
would
to
prefer to get
all
It is
relatively easy to
these goals
when we have
it.
come up with
files
file
files
Designing
file
meet
structures
information
Early
file.
is
work with
INTRODUCTION TO
STRUCTURES
FILE
the data
files
requiring
many
In
AVL
tree,
an elegant, self-adjusting
RAM.
tree,
of records.
It took nearly 10 more years of design work before a solution emerged
in the form of the B-tree. Part of the reason that finding a solution took so
long was that the approach required for file structures was very different
from the approach that worked in RAM. Whereas AVL trees grow from
down
the top
as
grow from
the
longer could
a file
bottom up.
was a cost: No
this
called a
list is
Over
B + tree.
many commercial
file
entries
entry
among
trees
became
number of
B+
is
indexed in
means
the
number of entries
a single
in the file
grow
and k
is
in
the
you can
find
one
delete entries,
performance
Being able to get information back with just three or four accesses is
But how about our goal of being able to get what we want
with a single request? An approach called hashing is a good way to do that
with files that do not change size greatly over time. From early on, hashed
indexes were used to provide fast access to files. However, until recently,
pretty good.
svstems tor
A CONCEPTUAL TOOLKIT:
with one
We
most, two disk accesses no matter how big the file becomes.
book with a careful look at this work, which took place from
close this
A Conceptual
first
As we move through
decades, watching
first
STRUCTURE LITERACY
or, at
1.3
FILE
the developments in
file
file
it
addresses
dynamic
files
we
emerging.
see
We
that
the
decrease the
and addressing
shown
us that
the tools
and extendible hashing, we develop mastery and flexibility in our own use
of the tools. In other words, we acquire literacy with regard to file
This text
structures.
is
file
structure
literacy.
Chapters
dynamic hashed
files.
SUMMARY
The key design problem that shapes
large amount of time that is required
structure designs focus
file
structure design
is
the relatively
to get information
on minimizing disk
accesses
want
is
already in
RAM.
This text begins by introducing the basic concepts and issues associated
with file structures. The last half of the book tracks the development of file
structure design as
it
addressed throughout
this
evolution
is
last
INTRODUCTION TO
FILE
STRUCTURES
accesses for
KEY TERMS
AVL
tree.
B-tree.
good
RAM.
the tree
B+
tree.
on
variation
length in Chapter
We
9.
as disks.
Hashing. An
access
mechanism
key into
Fundamental
File
Processing Operations
CHAPTER OBJECTIVES
Describe the process of linking a logical file within
to an actual physical file or device.
program
files.
UNIX
operations and
view of
a file,
commands
and describe
based on
this
UNIX
view.
CHAPTER OUTLINE
2.1
2.6
2.2
Opening
2.7
The
2.3
Closing Files
2.8
2.4
Files
Directory Structure
UNIX
UNIX
2.8.1
Physical Devices as
2.8.2
The Console,
Contents of a File
2.4.3 Detecting End-of-File
2.4.1
2.5
UNIX
Seeking
Seeking in C
2.5.2 Seeking in Pascal
2.5.1
2.1
Files
the Keyboard,
2.9
File-related
2.10
UNIX
File
Header
Files
System Commands
When we
talk
about
on
a file
physically exists.
these physical
From
disk or tape,
we
when
word
file,
the
refer to a particular
is
used in
this sense,
files.
To
file is
the program, a
only about 20
The
file is
files.
application
program
relies
to take care
file,
down
or they might
come from
program
It
originate
from
some
other
the keyboard or
of
down
the line
OPENING FILES
from the
physical
files
on
making
phone
such
line)
IBM's OS/MVS,
as
file
hookup between
or device.
When
On
'
file named
make the hookup by assigning a logical file (phone
The number identifying the particular phone line that is assigned
and then
myfile.dat
line) to
is
it.
to
name
is
what we use
is
the
to refer to the
file's
file
logical
inside the
2.2
not physically
Opening
Files
Once we have
we need
to declare
statement. Creating a
file
also
opens the
file in
file
is
compilers vary widely with regard to I/O procedures, since standard Pasway of I/O definition. Throughout this book we use the term Pasdiscussing features common to most Pascal implementations. When we refer to
when
ready for
+ Different Pascal
cal
it is
Turbo
Pascal,
we
say so.
10
FUNDAMENTAL
FILE
PROCESSING OPERATIONS
Logical
files
Program
Limit of approximately
20 phone lines
Physical
files
Printer
and physical
files
relies
in
Turbo
logical
and devices.
statement
Pascal
is
statement
used to create
we might
le
le)
use
is
new
myfile.dat*);
as:
OPENING FILES
CLEAN UP
YOUR MESS
11
SAFETY
FIRST
Note
that
statement.
we use the
To create a
files
logical
file
in
or I/O devices
file
Turbo
'myfile.dat');
i 1 e
rewrite (inp_file)
FUNDAMENTAL
PROCESSING OPERATIONS
FILE
ments and
a third
argument
that
fd = open(f ilename,
The
is
optional:
flags,
[pmode]);
return value fd and the arguments filename, flags, and pmode have the
following meanings:
fd
The fde
descriptor.
Using our
is
the
phone
program. It is
open the file, this value
filename
flags
is
negative.
file
name. (Later
we
the
file's
The
argument can be
location. This
argument
pathname.)
is
"
OR
D_APPEND
Append every
the
0_CREAT
file.
no
This has
CUE X C L
CURDONLY
CURDWR
D_TRUNC
Open
a file for
reading only.
Open
a file for
zero, destroying
CUHRONLY
Open
its
it
is
specified
to a length
of
contents.
pmode
If
0_C RE AT
ment
pmode
is
specified,
pmode
a three-digit octal
is
number
that indicates
how
the
file
piler.
The name of
system.
the include
file is
file
it
UNIX
system or
C com-
CLOSING FILES
digit).
The
sion, the
first bit of each octal digit indicates read permissecond write permission, and the third execute per-
mission. So,
owner has
if pmode is
the octal
number
0751, the
file's
111
10
owner
group
PMODE = 0751=
world
would
fd = openCf ilename,
The following
already
fd
a file
new
is
new
file
only
0_TRUNC, 0751);
if there is
this
not already
name
exists,
it is
0_CREAT
If there is
with
in filename. If a file
and writing.
its
0_CREAT
fd
0_CREAT, 0751);
in filename,
for reading
file
Finally, here
file
call creates a
0_RDWR
D_EXCL,
0751);
File protection
specific language.
a file
2.3
when you
create
it.
Closing Files
file is like hanging up the
is available for taking
phone
line
phone,
the
the
hang
up
phone. When you
logical
file name or file
the
file,
close
a
or placing another call; when you
descriptor
is
file.
Closing
a file that
has been
used for output also ensures that everything has been written to the file. As
you will learn in a later chapter, it is more efficient to move data to and from
it
is
to
move
14
FUNDAMENTAL
FILE
PROCESSING OPERATIONS
Consequently, the operating system does not immediately send off the
we write, but saves them up in a buffer for transfer as a block of data.
Closing a file makes sure that the buffer for that file has been flushed of data
bytes
we have
file.
statement within
program
is
needed only
up
Some
VAX
work.
Now
2.4
you know
that
from physical
files
and receiving
data.
and
make
file
file
The
actual
form of
Some
provide access
explore
at a
much lower
some of these
level.
Our
allows us to
differences.^
Source_f i le
The RE AD(
We
to read.
name (phone
'To accentuate these differences and provide
systems level, we use the read( ) and write(
functions such as fgetc( ), jgets( ), and so on.
a
)
call
is
15
we must have
physical
file
where to place the inforfrom the input file. In this gefunction we specify the destination by
)
mation
neric
it
reads
giving the
address of the
first
where we want
Size
Finally,
READ(
memory
block
must know how much inin from the file. Here the
formation to bring
argument
A WRITE statement is
is
is
moves
in
NRITECDestinat ion_f i
Destinat i on_f
i 1
le
The
Source_addr
logical
name we
file
ze)
data.
Source_addr
WRITE(
mation that
it
Size
We
will send.
is
to be written
stored.
must be
supplied.
File
do some reading and writing to see how these functions are used. This
file processing program, which we call LIST, opens a file for
input and reads it, character by character, sending each character to the
screen after it is read from the file. LIST includes the following steps:
Let's
simple
first
2.
Read the
user's response
name of the
input
file.
name.
3.
Open
4.
While there
5.
the
file
for input.
are
still
characters to be read
a.
b.
file
file.
of
this
file,
and
program.
these implementations.
It is
FUNDAMENTAL
/*
**
*/
1 i
5 t
c
.
PROCESSING OPERATIONS
FILE
^include <stdio.h>
^include <fcntl.h>
ma i n(
char
fd;
/ *
file descriptor */
char filename[20];
int
"
!=
0)
1);
close(fd);
FIGURE 2.2 The LIST program
/*
/*
/*
Step
Step
Step
/*
/*
Step 4a */
Step 4b */
/*
Step
*/
*/
*/
*/
in C.
Steps 1 and 2 of the program involve writing and reading, but in each
of the implementations' this is accomplished through the usual functions for
handling the screen and keyboard. Step 4a, where we read from the input
the first instance of actual file I/O. Note that the read( ) call in the C
language parallels the low-level, generic READ( ) statement we described
file, is
earlier; in truth,
low-level
we
READ(
).
The arguments
information
name
file
at a
The
for
function's
first
the Pascal
read(
call
higher
level.
character variable used as a destination; given the name, Pascal can find the
address. Because of Pascal's strong emphasis
we
READ(
function
is
on variable
we must want
to
After a character
is
read,
I/O used
in
we
write
it
Once
17
the
write (
Using the
) call.
of
STDOUT
wnte( STDOUT,
&c
means: "Write to the screen the contents from memory starting at the
address Sec. Write only one byte." Beginning C programmers should pay
special attention to the use
particular
call,
as a
of the
&
symbol
very low-level
in
RAM
call,
in the write(
programmer
STDOUT,
1.
counterpart "standard
its
Files
in
UNIX."
Pascal.
in
it
to
screen
the terminal
>
VAR
c
char;
file of char;
<
infile
filename
BEGIN {main>
wnteCEnter
');
{
<
<
Step
Step
Step
>
>
>
BEGIN
readdnf ile,c)
write(c)
<
Step 4a
Step 4b
<
Step
<
>
>
END;
closednf lie)
END.
>
>
FUNDAMENTAL
FILE
PROCESSING OPERATIONS
name
is
specified in a writef
higher
at a
level.'*'
c is
of type
write(c)
As
in the read(
bytes; the
programmer need
name of the
variable
that
is
The programs
end-of-file detection.
Pascal supplies
end-of-file.
Boolean function,
As we read from
eof(
),
to test for
a file, the
location in the
next byte
As
file
read, the
is
byte. For an
empty
we
file,
eof(
) call
immediately returns
true
read.
In the
read(
2.5
) call
returns the
file.
run
as
long
function,
number of bytes
we
something
read. If
to read.
Seeking
In the preceding
sample programs
byte
we
is
system moves
file sequentially,
file.
Every time
""This is
library provides a
19
SEEKING
be able to
control the
The
action of
called seeking.
moving
two
often
file is
SEEK(
):
Source_f
Offset
The
le
logical
file
The number of
moved from
Now,
name
in
positions in the
file
will occur.
the pointer
is
to be
file.
SEEK( data,
2.5.1 Seeking
in
373
UNIX
One of
the features of
that has
any byte in
The
where the
called lseek(
j^^citz
K~
pos
fd
The
file
descriptor of the
file
to
which the
lseek(
) is
to
be applied.
byte_offset
),
a file.
lseek(
pos =
It lets
20
FUNDAMENTAL
FILE
PROCESSING OPERATIONS
origin
byte_offset
or
2-f
heek(
from
lseek(
2lseek(
^CzAL^
file;
move
to a position that
long pos
i
nt
f d
seek
is
( i
nt
2.5.2 Seeking
in
373L,
a file is a
byte-by-byte
long offset,
to
origin);
int
in Pascal differs
acter or integer, or
a file in
from the
view
basis.
When we
particular type.
within
Pascal
seek to a position,
some
use lseek(
a file.
0);
B.AJk^
pos=lseek(fd
In
& Ll 9^-
^C^? r* -
S^C^-
3fc?
it
Pascal
may
is
file is a
record can be
in
be
we
in at least
on
file is
sequence of "records" of
more complex
structure.
Addressing
if a
400.
Standard Pascal actually does not provide for seeking. The model for
is
magnetic
tape,
file
from beginning
end of
a file
requires
from the input file to a second, output file, and then adding the new
data to the end of the output file. However, many implementations
of Pascal such as VAX Pascal and Turbo Pascal have extended the
standard and do support seeking.
1, and 2 are almost always used here, they are not guaranteed to
implementations. Consult your documentation.
work
tor
all
SPECIAL CHARACTERS
There
ANSI/IEEE
is
It
may
21
IN FILES
Pascal
in
seeking:
beyond
Position(f)
may
file,
If
SeekRead(
then the
attempts to
positioned
file is
at
file.
EndPosition(f)
file
element.
Many
be examined.
file
element.
2.6
Special Characters
in Files
text,
that turn
with characters that disappear, and with numeric counts that are inserted
your files. Here are some examples of the kinds of things you might
into
encounter:
On many
value of 26)
is
appended
at
may
This
is
most
likely to
find that a
happen on
if
Control-Z (ASCII
Some
files.
applications
MS-DOS
it
there.
systems.
1"
value of 13)
""When
we
characters
as a pair
from
ASCII
character
set.
set
we
are referring to a
file
such as ASCII or
will be assumed. Appendix
set,
consisting entirely of
EBCDIC.
Unless other-
>
t
^~Q? (^
22
FUNDAMENTAL
FILE
PROCESSING OPERATIONS
LF
characters into
CR-LF
pairs.
most
likely to
encounter
phenomenon on
this
them with
as a line
a count
of
file
of the characters
in
you
are
systems.
formats under
file
from your
MS-DOS
VMS
remove car-
text.
VMS, may
as
CR
management systems
your files. You will find that they are usually associated with the concepts
of a line of text or end of a file. In general, these modifications to your files
are an attempt to make your life easier by doing things for you
automatically. This might, in fact, work out for users who want to do
nothing more than store some text in a file. Unfortunately, however,
programmers building sophisticated file structures must sometimes spend a
lot of time finding ways to disable this automatic assistance so they can have
complete control over what they are building. Forewarned is forearmed;
readers who encounter these kinds of difficulties as they build the file
structures described in this text can take some comfort from the knowledge
that the experience they gain in disabling automatic assistance will serve
them
2.7
well, over
and over,
in the future.
root
of the
programs and
data,
UNIX
and directories
references to devices, as
stored in a
In
tree-structured organization of
tree signified
name
files.
UNIX
files
shown
in
(Fig. 2.4).
UNIX,
files:
regular
directories,
directories,
files
with
directory corresponds to
what we
call its
The
file
physical
name.
Since every file in a UNIX system is part of the filesystem that begins
with root, any file can be uniquely identified by giving its absolute pathname.
For instance, the true, unambiguous name of the file "addr" in Fig. 2.4 is
23
(root)
adb
cc
console kbd
yacc
libc.a
TAPE
libm.a
'
'
The
file
2.8
mydir, addr
directory
is
is
in
Hence,
file
if
and ".."
your current
/usr6/mydir /addr.
UNIX
One of the most powerful ideas in UNIX is reflected in its notion of what
a file is. In UNIX, a file is a sequence of bytes, without any implication of
how or where the bytes are stored or where they originate. This simple
24
FUNDAMENTAL
PROCESSING OPERATIONS
FILE
conceptual view of
a file
makes
it
possible in
many
on
things
also files
But
disks.
in
because
file,
UNIX,
pressed;
the
console
accepts
corresponding symbols on
allows so
very few
sequence of bytes
a screen.
UNIX
operations on a
can
the situation
file is
logically
it is
do with
to
many
produces
How
we
UNIX
times as
the same. In
by an integer
its
the
simpler?
a file
The
may
simplest form, a
file
UNIX
trick in
UNIX
file is
is that no
view of a
represented
an index to an
is
array of
and
describes a
logical
file is
name of
whether the
file
a file,
We
see an
program
/*
/*
/*
Step
Step
Step
1)
while (readCfd, &c
writeCSTDOUT, &c
1);
/*
/*
Step 4a */
Step 4b */
>
0)
We
LIST
the
The
files in
in Fig. 2.2:
logical
file
is
some
STDOUT,
integer
defined as
earlier in the
3.
In Step 4b,
program,
we
*/
*/
*/
) call.
use the
to identify the
console as the
The statement
readCSTDIN,
k,
1);
STDOUT,
is
STDERR
25
UNIX
an error
is
When your
IN
file
which,
compiler detects an
error, it generally writes the error message to this file, which means
normally that the error message turns up on your screen. As with STDIN,
the values STDIX and STDERR are usually defined in stdio.h.
Steps 1 and 2 of the LIST program also involve reading and writing
from
STDIX
or
these devices,
2.8.3
I/O Redirection
file,
the output of
and Pipes
like to
rather than to
LIST
as
want
to
<
>
For example,
if
output from
STDOUT to
>
What
if,
you wanted
UNIX
pipes
for
file
file
list
for
file
the executable
myf
i 1
LIST program
a file called
let
programl
it
immediately
you do
I
called "list,"
we
redirect the
is
this.
in
from the
list
program
in a
file,
The notation
for a
UNIX
pipe
is '|\
Hence,
program2
Stnctly speaking, I/O redirection and pipes are part of a UNIX shell, which is the cominterpreter that sits on top of the core UNIX operating system, the kernel. For the
purpose of this discussion, this distinction is not important.
mand
26
FUNDAMENTAL
FILE
PROCESSING OPERATIONS
means take any S TDO UT output from program 1 and use it in place of any
STDIN input to program2. Since UNIX has a special program called sort,
which takes its input from STDIN, you can sort the output from the list
program, without using an intermediate file, by entering
list
Since
sort
writes
sort
its
output to
STDOUT,
on your
it
elsewhere.
2.9
File-related
UNIX,
Header
Files
file
beyond
file.
Three header files relevant to the material in this chapter are stdio.h,
and file.h. EOF, for instance, is defined on many UNIX systems in
/usr/include /stdio.h, as are the file pointers STDIN, STDOUT, and
STDERR. And the flags 0_RDONLY, 0_WRONLY, and O.RDWR
can usually be found in /usr/ include /sys/file.h or possibly one of the files that
fcntl.h,
it
includes.
It
would be
instructive for
you
to
files,
as well as
2.10
manipulating
files.
We
list a
few
manual
for
cat
tail
filenames
filename
that
to use them.
named
the text
text
file.
files,
UNIX
SUMMARY
cp
filel ftle2
mv
filel file2
rm
filenames
chmod
mode filename
files.
Is
mkdir
rmdir
name
Create
name
Remove
of the directory.
a directory
the
named
directory.
SUMMARY
This
chapter
OPEN(
introduces
CREATE(
),
fundamental operations of
the
CLOSE(
),
),
READ(
WRITE(
),
),
systems:
file
and SEEK( ).
link between a
The
six operations
for instance,
Before
we
can use
many
different
built-in
SEEK(
file.
forms.
all
logical file
all
six
are
Not
The operation
operations.
is
a physical file,
we must
link
it
to a logical
file.
In
in
or writing.
operates
CREATE(
on an already
causes a
between
a logical file
to the
is
new
physical
existing physical
and
its
file.
file,
file
to be created.
OPEN(
The CLOSE(
corresponding physical
file. It
also
is
makes sure
actually sent
file.
The
logical
and
WRITE(
),
when viewed
at a
items of information:
name of the
file
to be read
from or written
to;
low,
27
28
FUNDAMENTAL
An
address
of
computer"
An
PROCESSING OPERATIONS
FILE
memory
how much
indication of
data
is
and
to be read or written.
These three fundamental elements of the exchange are illustrated in Fig. 2.5.
READ( ) and WRITE( ) are sufficient for moving sequentially through
a file to any desired position, but this form of access is often very inefficient.
position in a
operation.
The
giving us
lseek(
file.
)
operation
great deal of
let a
program move
directly to
freedom
us view a
file as a
in deciding
how
to organize a
access, but
file
many
file.
dialects
of
Pascal do.
One
other useful
file
operation involves
is
knowing when
handled
the end of a
in different
file
ways by
different languages.
Much
files,
When we
that
to deal
little
with
details
The
try to
UNIX
file
files in a tree
with
all files
identified
Amount of data
to transfer
KEY TERMS
KEY TERMS
Access mode. Type of
access allowed.
file
The
modes
variety of access
When
Buffering.
input or output
destination immediately,
we
we
is
say that
it is
its
we
can dramatically improve the performance of proand write data if we buffer the I/O.
Byte offset. The distance, measured in bytes, from the beginning of the
file. The very first byte in the file has an offset of 0, the second byte
has an offset of 1, and so on.
CLOSE( ). A function or system call that breaks the link between a logical file name and the corresponding physical file name.
CREATE( ). A function or system call that causes a file to be created
on secondary storage and may also bind a logical name to the file's
find that
grams
that read
physical
name
see
OPEN(
).
CREATE(
call to
by
the system to
also results in
manage
the
file,
End-of-file (EOF).
An
file.
indicator within a
countered
tells if
file
that the
the end of a
file
is
a file
EOF
UNIX
UNIX).
in
File descriptor.
open(
A
)
or creat(
) call
that
is
used
as a logical
name
by
for the
UNIX
file
system calls.
Filesystem. The name used in UNIX to describe a collection of files
and directories organized into a tree-structured hierarchy.
Header file. A file in a UNIX environment that contains definitions and
declarations commonly shared among many other files and applications. In C, header files are included in other files by means of the
in later
"#include" statement
The header
29
30
FUNDAMENTAL
PROCESSING OPERATIONS
FILE
important declarations
in file processing.
may
bind
also
file
include information
Pathname.
name to a physical file. Its arguments inname and the physical file name and may also
a logical file
on how
the
file is
expected to be accessed.
rectory. If the
pathname
starts
with
a 7',
then
it
file
or di-
pathname
its file
directory.
A UNIX
Pipe.
number
that indicates
everyone
READ(
).
device.
(1) a
else.
function or system
When viewed
at
Source_file logical
call
name corresponding
to an
open
file; (2)
amount of data
SEEK(
).
to be read.
function or system
tions allow
(3)
the
the Size
programs
file.
Languages
file
from
file directly,
EXERCISES
tially)
call
By
ways
for input
(STDOUT),
standard output
default
STDERR
STDIN is
and
STDERR
(standard
STDOUT and
WRITE(
(1) a
UNIX,
(STDIN),
input
error).
ties.
In
).
function or system
When viewed
Destination_file
call
lowest
at the
level,
it
name corresponding
to an
open
file; (2)
(3)
the
the Size or
EXERCISES
1. Look up operations equivalent to OPEN( ), CEOSE( ), CREATE( ),
READ( ), WRITE( ), and SEEK in other high-level languages, such as
PL/I, COBOL, and Fortran. Compare them with the C or Pascal versions.
2.
If
a)
you use C:
Make
CREATE(
is
b)
How
OPEN(
),
Show how
lseek(
to
other:
scanfC
fscanf(
getc( )
)
)
Describe as
useful.
fgetcC
gets(
fgetsC
<
)
to the
many of these
Which belong
system?
read(
as
you
can,
and indicate
how
they might be
31
32
3.
FUNDAMENTAL
PROCESSING OPERATIONS
FILE
If
a)
file
operations
WRITE(
tell
)? If
why.
If
CREATE(
there
in
OPEN(
),
),
CLOSE(
is
an operation
is
missing,
how
to
do
are
its
),
to
perform the
), and
READ(
a certain
operation,
functions carried
out?
Implement
b)
SEEK(
if
it
does not
al-
One
new compiler
compiler.
that the
difference
when execution of a
What sorts of problems
files
did.
Look
the
at
Why
write.
of steps
in the
in the text.
loop
is test,
Each has
a while loop. In
read, write. In C,
it is
read,
the difference?
the
the
In Fig. 2.4:
a.
b.
7.
What
is
direct error
8.
the difference
Look up
the
9.
WC
Find
UNIX command
between
messages from
why
it
STDERR.
gives the
number of files
UNIX
in the directory.
stdio.h
end-of-file.
Programming Exercises
10. Make the LIST program we provide in
compiler on your operating system.
11.
Write
program
to
program
open the
to create a
file
this
chapter
a string in
it.
Write another
FURTHER READINGS
12.
Try
file
with an access
mode on
a file to read-only,
tail -n,
to
happens?
where
is
the
number of lines
STDOUT.
Change
file,
MS-DOS
environments.)
15.
Write
input,
write
b.
out, re-reversed, to
c.
the resulting
list
in
again,
sort.
FURTHER READINGS
Introductory textbooks on
only briefly,
if at all.
This
is
) and
do provide treatment of the
fundamental file operations are Bourne (1984), Kernighan and Pike (1984), and
Kernighan and Ritchie (1978, 1988). These books also provide discussions of
higher-level I/O functions that we omitted from our text.
As for UNIX specifically, as of this writing there are two dominant flavors of
UNIX: UNIX System V from AT&T, the originators of UNIX, and 4.3BSD
(Berkeley Software Distribution) UNIX from the University of California at
Berkeley. The two versions are close enough that learning about either will give you
fgetc(
).
UNIX
that
V Interface
Definition
(AT&T,
1986).
33
34
FUNDAMENTAL
your
as
specific
some
FILE
PROCESSING OPERATIONS
Problems: Files."
is
some
as well
on which all
with standard Pascal and
a Persistent
Source of
CHAPTER OBJECTIVES
Describe the organization of typical disk drives, including basic units of organization and their relationships.
Describe magnetic tapes, identify some tape applications, and investigate the implications of block size
on space requirements and transmission speeds.
Identify fundamental differences
criteria that
medium
to an application.
many of the
context of
UNIX.
35
CHAPTER OUTLINE
3.1
Disks
A Journey
3.5
3.5.1
3.5.2
Needs
3.5.3
by Sector
Organizing Tracks by Block
Nondata Overhead
The Cost of a Disk Access
Effect of Block Size on
Performance: A UNIX Example
3.1.4
Controller
3.1.5
3.1.6
3.1.7
Buffer
3.6
I/O in
3.7
Magnetic Tape
3.2.1
Tape Length
3.2.2 Estimating
The Kernel
3.7.
Linking
File
3.7.
Normal
3.7.
3.7.
3.7.
3.4
Storage as a Hierarchy
Good
design
is
Names
is
to Files
and
Block I/O
Device Drivers
The Kernel and File Systems
Magnetic Tape and UNIX
3.7.
Times
3.2.4 Tape Applications
UNIX
3.7.
Sockets
Requirements
3.2.3 Estimating Data Transmission
3.3
Management
Buffer Bottlenecks
3.6.2 Buffering Strategies
3.6.1
of a Byte
design as
medium and
it is
to
for designs
wood and
stone. Given the ability to create, open, and close files, and to
and write, we can perform the fundamental operations of file
construction. Now we need to look at the nature and limitations of the
devices and systems used to store and retrieve files, preparing ourselves for
in
seek, read,
file
design.
If files
the tools
we would
The
in
RAM,
need to build
from
file
RAM. An
is
applications.
RAM. One
impact,
would be no
there
separate discipline
would give
us
all
is
than do accesses to
all
Good
costs.
file
to arrange data in
ways
that
37
DISKS
we examine the
on the constraints
In this chapter
devices, focusing
We
of secondary storage
that shape our design work in the
characteristics
begin with
look
at
when
look
3.1
at
many
files,
byte
is
pieces of
sent
Disks
Compared to the time it takes to access an item in RAM, disk accesses are
always expensive. However, not all disk accesses are equally expensive. The
reason for this has to do with the way a disk drive works. Disk drives"
"
belong to
a class
because they
with
of devices
make
serial devices,
it
known
directly.
DASDs
(DASDs)
are contrasted
devices use media such as magnetic tape that permit only serial access
particular data item cannot be read or written until
it
in
many
all
in order.
low cost per bit. Hard disks are the most common disk used
everyday file processing. Floppy disks are inexpensive, but they are slow
and hold relatively little data. Floppies are good for backing up individual
and for transporting small amounts of data.
files or other floppies
Removable disk packs are hard disks that can be mounted on the same drive
at different times, providing a convenient form of backup storage that also
capacity and
in
directly.
becoming
Appendix A for a
its
infull
applications.)
The information
stored on a disk
is
more
platters (Fig. 3.1). The arrangement is such that the information is stored in
successive tracks on the surface of the disk (Fig. 3.2). Each track is often
""When
we
drives,
we
38
2>
2>
2>
2>
divided into
of a
disk.
file,
the
Boom
Read/write heads
Spindle
Platters
number of sectors.
When
READ(
sector
statement
is
byte from
disk
and
and
heads.
Moving
arm
arm movement
is
usually the
in storage
this
is
a disk.
from
less
bottom
one
surface,
a typical disk
and
all
other
Tracks
Sectors
Gaps
FIGURE 3.2 Surface of disk showing tracks and sectors.
of
seven cylinders.
Seven
cylinders
39
40
platters contribute
cylinder
is
two
number of
tracks per
number of platters.
function of the
on
track depends
on how densely
on the quality
can be stored on
of the recording medium and the size of the read/ write heads.) An
inexpensive, low-density disk can hold about 4 kilobytes on a track, and 35
the disk surface. (This in turn depends
bits
= number of tracks
to
we know
compute
the
for instance,
records
on
"typical"
following characteristics:
How many
Number
Number
Number
of cylinders = 1,331.
file
20,000
- =
40
=11
two
One
= 512
Number
records, the
1AAnn
10,000
sectors.
= 440
sectors
file
requires
so the
number of cylinders
1 1
required
10,000
is
approximately
22.7 cylinders.
440
Of course,
it
may
be that
a disk drive
41
DISKS
might
in fact
hundreds, of cylinders.
we
look
at
block organizations.
Sectors
a track.
The
that
one
good way
all
15\ 16 117/18X19
72
32 \ 31
(a)
23 \ 4
27/14/
17/30/11
i20\7
(b)
in the
same
adjacent sectors.
track,
That
is
42
it
it is
amount
also
physically adjacent,
were processing
now
controller
offer
speeds
an entire track in
Clusters
performance,
the
is
sectors.
cluster has
cluster
map
It
is
does
this
a fixed
been found on
by viewing the
file
manager
that
possible to read
file
to their corresponding
file as a series
number of contiguous
sectors.
it
making
means
a single
third
have improved so
interleaving. This
ties logical
still
"*"
of
clusters
Once
of
given
can be accessed
to
by using
clusters in a
On many
how many
by
VAX
on
a disk
when
the disk
is
initialized.
The
default
is three 512-byte sectors per cluster, but the cluster size may be set to
any value between 1 and 65,535 sectors. Since clusters represent physically
contiguous groups of sectors, larger clusters guarantee the ability to read
value
""It
is
is
determined by
43
DISKS
(FAT)
The
part of the
FAT
pertaining
to
our
Cluster
Cluster
number
location
file
^^^\
r^
^r
-~~\
~t^"
^^?
in
the
file
that
more
performance gains
substantial
when
a file is
When this is
of its sectors,
contiguous whole
tracks,
(Fig. 3.6a).
the case,
and
This
(if it is
is a
we say
of one
large enough) cylinders form one
good
that the
file
consists
is
to
file,
can be
If there is
the
divided into
file is
an extent.
When new
make them
is
file
clusters are
added to
Each part is
manager tries to
parts.
unavailable for
this, it
file,
but
if space
(Fig. 3.6b).
The
44
(a)
(b)
FIGURE 3.6
single
File extents
file).
file
increases, the
file
file is
that as the
number of
disk,
and
is
size
two ways
of a sector
is
to deal
is
no convenient
with
fit
between
only one record per sector, or allow records to span sectors, so the
beginning of
another (Fig.
The
first
it
in
3.7).
enormous amount of unused space within each sector. This loss of space
within a sector is called internal fragmentation. The second option has the
45
DISKS
advantage that
it
loses
no space from
may
it
has the
sectors.
allocated for a
file.
When
the
is
results
number of
bytes in a
file
is
not an exact
extent of the
file.
A disk that is expected to have mainly large files that will often be processed
would usually be given a large cluster size, since internal
fragmentation would not be a big problem and the performance gains
sequentially
might be great. A disk holding smaller files or files that are usually accessed
only randomly would normally be set up with small clusters.
are
not
blocks
whose
size
The word
FIGURE 3.7 Alternate record organization within sectors (shaded areas represent
data records, and unshaded areas represent unused space).
(a)
block
46
Sector
Sector 3
Sector 2
Sector 4
2 2
3 3 3
Sector 5
Sector 6
444 4444444444
4 4 15 5
5.
(a)
11111111111.
.111111111222.
.22',333;444444.
.4444 441555
(b)
meaning
of the
UNIX
in the context
when
it is
desirable to
have the physical allocation of space for records correspond to their logical
organization. (There are disk drives that allow both sector-addressing and
block-addressing, but
we do
In block-addressing
by one or more
Typically there
number of
is
usually accompanied
contains
(among other
things) the
key
may
DISKS
(Fig. 3.9b).
When
47
its
among
all
track for a block with a desired key. This approach can result in
searches
than
means
on a
much more
the blocks
normally
are
FIGURE 3.9 Block addressing requires that each physical data block be accompanied by one
more subblocks containing information about its contents.
or
III
I
i
(a)
(b)
mSM
48
there
generally
is
when block
can vary
addressing
is
sizes
is
example.
If there are 10
is
10,
or
if
it is
60?
when overhead
can
on
fit
is
L15.38J
as
15.
as
3.
6,300
track.
there
is
When
less
more
efficient use
space
consumed by
a file,
of
so
accompany
each block.
Can we conclude from this example that larger blocking factors always
more efficient storage utilization? Not necessarily. Since we can put
only an integral number of blocks on a track, and since tracks are fixed in
length, we almost always lose some space at the end of a track. Here we
lead to
have the internal fragmentation problem again, but this time it applies to
fragmentation within a track. The greater the block size, the greater
potential amount of internal track fragmentation. What would have
happened if we had chosen a blocking factor of 98 in the preceding example?
in
since
it
lets
the
programmer
49
DISKS
determine to
disk.
On
a large
extent
how
blocking schemes
the
require
programmer
of a Disk
Access
To give you a feel for the factors contributing to the total amount of time
needed to access a file on a fixed disk, we calculate some access times. A disk
access can be divided into three distinct physical operations, each with its
own cost: seek rime, rotational delay, and transfer tiiiie.
Seek
Time
Seek time
is
move
the access
arm
to the
disk access
accessing
from two
files
file is
innermost cylinder,
one
on
outermost cylinder),
the
at
seeking
is
very
expensive.
Seeking
is
likely to be
more
where
several processes are contending for use of the disk at one time, than in
single-user environment,
is
done
as
Since
it is
required for
file
usually impossible to
particular
we
file
random,
to
know
file.
exactly
how many
number of cylinders
"Derivations of
tracks will be
more
list
figure as the
(1982),
50
"
FIGURE 3.10 When a single file can span several tracks on a cylinder, we
can stagger the beginnings of the tracks to avoid rotational delay when
moving from track to track during sequential access.
average seek time for the drives. Most hard disks available today (1991)
have average seek times of less than 40 milliseconds (msec), and highperformance disks have average seek times as low as 10 msec.
Rotational Delay
we want
it
is
is
a sluggish
which
83.3 msec.
As in the case of seeking, these averages apply only when the read/ write
head moves from some random place on the disk surface to the target track.
In
many
much
less
of available tracks
the file to disk sequentially, with one write call. When the first track is
filled, the disk can immediately begin writing to the second track, without
any rotational delay. The "beginning" of the second track is effectively
staggered by just the amount of time it takes to switch from the read/write
head on the first track to the read/write head on the second. Rotational
delay, as it were, is virtually nonexistent. Furthermore, when you read the
file back, the position of data on the second track ensures that there is no
that there are plenty
from one
staggered arrangement.
track to
another.
Figure 3.10
51
DISKS
Time
Transfer
Transfer time
head,
it
rotation time.
of sectors on
a track.
situations that
times.
time
it
We
will
takes to access
we
Let's look at
different types
all
it
of
two
file
different
takes to access a
of the records
file
processing
former
The
use as
basis for
our calculations
is
is
typical
The
TABLE
Minimum
3.1
(track-to--track) seek
time
examples
in
in text
6 msec
msec
18
Rotational delay
8.3
Maximum
transfer rate
are
this disk,
512
40
11
msec
1,331
Interleave factor
Cluster size
8 sectors
5* cjtu/*ti/**
5 clusters
p^-^ "tJsoce^k-
52
is
a cluster size
5 clusters, space
is
files in
1,
one-track
so data on a
suppose that we wish to know how long it will take, using this
drive, to read a 2,048-K-byte file that is divided into 8,000 256-byte records.
First we need to know how the file is distributed on the disk. Since the
4,096-byte cluster holds 16 records, the file will be stored as a sequence of
500 4,096-byte clusters. Since the smallest extent size is 5 clusters, the 500
clusters are stored as 100 extents, occupying 100 tracks.
This means that the disk needs 100 tracks to hold the entire 2,048 K
bytes that we want to read. We assume a situation in which the 100 tracks
are randomly dispersed over the surface of the disk. (This is an extreme
situation chosen to dramatize the point we want to make. Still, it is not so
extreme that it could not easily occur on a typical overloaded disk that has
a large number of small files.)
Now we are ready to calculate the time it would take to read the
2,048-K-byte file from the disk. We first estimate the time it takes to read
the file sector by sector in sequence. This process involves the following
Let's
Average seek
msec
msec
16.7 msec
18
Rotational delay
Read one
8.3
track
Total
We
want
to find
Total time
Now
let's
msec.
43
tracks, so the
100 x 43 msec
it
4,300 msec
would
4.3 seconds.
same 8,000
records using random access rather than sequential access. In other words,
we have
we
after another,
some order
read a
new
that requires
we assume
that
jumping from
Read one
cluster 11
Total
Total time
msec
msec
3.3 msec
18
Rotational delay
8.3
16.7)
msec
29.6
236,800 msec
236.8 seconds.
53
DISKS
3.1.7 Effect
In deciding
of
how
The
standard
CSRG
at
the time
on
UNIX
minimum
systems,
w as
T
block
size
of 512 bytes,
in a typical
UNIX
of 2. But even with 1,024-byte blocks, they found that throughput was only
about 4% of the theoretical maximum. Eventually, they found that
4,096-byte blocks provided the fastest throughput, but this led to large
amounts of wasted space due to internal fragmentation. These results are
summarized in Table 3.2.
TABLE 3.2
The amount
of
Space
Percent
Used
Waste
wasted space as
Organization
(Mbyte)
775.2
0.0
807.8
4.2
828.7
6.9
866.5
11.8
948.5
22.4
1,128.3
45.6
+ modes,
Data + inodes,
Data
UNIX
file starts
files
on 512-byte boundary
2,048-byte
4,096-byte
al.,
p.
198.
54
To
blocks for
files
In the
new
allow the large blocks to be divided into one or more fragments. With a
fragment size of 512 bytes, as many as eight small files can be stored in one
block, greatly reducing internal fragmentation.
much
up disk I/O.
One technique that is now offered on many high-performance systems
called striping. Disk striping involves splitting the parts of a file on several
to speed
different drives, then letting the separate drives deliver parts of the
file
to the
network simultaneously.
For example, suppose we have a 10-megabyte file spread across 20
high-performance (3 megabytes per second) drives that hold 50 K per track.
The first drive has the first 50 K of the file, the second drive has the second
50 K, etc., through the twentieth drive. The first drive also holds the
twenty-first 50 K, and so forth until 10 megabytes are stored. Collectively,
the 20 drives can deliver to the network 250 K per revolution, a combined
rate of 60 megabytes per second.
Disk striping exemplifies an important concept that we see more and
more in system configurations parallelism. Whenever there is a bottleneck
at
some point
is
the source
in
parallel.
Another approach
the disk at
all.
As
the cost of
is
to avoid accessing
users
55
DISKS
RAM to hold data that a few years ago had to be kept on a disk.
Two effective ways in which RAM can be used to replace secondary storage
are RAM disks and disk caches.
A RAM disk a large part of RAM configured to simulate the behavior
are using
is
of a mechanical disk in every respect except speed and volatility. Since data
can be located in
without a seek or rotational delay,
disks can
provide much faster access than mechanical disks. Since RAM is normally
volatile, the contents of a
disk are lost when the computer is turned
off.
disks are often used in place of floppy disks because they are
much faster than floppies and because relatively little
is needed to
RAM
RAM
RAM
RAM
RAM
from
When
a disk.
data
is
first
data. If
if it
memory,
the
file
manager
it
the data
from
disk, replacing
file
some page
when
substantial
improvements
Locality exists in a
when
file
in
a
performance,
high degree of
temporal sequence are stored close to one another on the disk. When a
disk cache is used, blocks that are close to one another on the disk are much
more likely to belong to the page or pages that are read in with a single
read, diminishing the likelihood that extra reads are
cesses.
RAM
look
at
buffering,
We
very
take a closer
we
processing. With
RAM disks
RAM
file
design
f The
term
spect to
cache (as
mary memory
RAM
that
prire-
56
Magnetic Tape
3.2
to a class
accessing facility but that can provide very rapid sequential access to data.
byte within a
start
of the
file
file.
parallel tracks,
on
On
a tape.
corresponds directly to
We may
is
no need
is
sequence of
as a
one-bit-wide
of
bits. If
can be thought of
of
its
each of which
for addresses to
slice
a.
parity
of tape. Such
bit.
So
byte
a slice is called a
frame.
The
parity bit
is
is
is
is
set to
in the
FIGURE 3.11
Nine-track tape.
Track
Frame
J.
I
I
111
1
Gap
JU
Data block
JU
Gap
57
MAGNETIC TAPE
odd
tapes use
parity,
of consecutive
Tape
no
frames
is
used to
all
bits,
so
a large
When
number
fill
drives
differences
quantities:
commonly 800,
or 6,250
per inch
much
30,000
Tape speed commonly 30
200 inches per second
and
of interblock gap commonly between 0.3 inch and 0.75
Tape density
1,600,
(bpi) per
bits
bpi;
as
to
(ips);
Size
Note
inch.
that a 6,250-bpi nine-track tape contains 6,250 bits per inch per track,
and 6,250 bytes per inch when the full nine tracks are taken together. Thus,
in the computations that follow, 6,250 bpi is usually taken to mean 6,250
bytes of data per inch.
we want
Suppose
to store a
g =
n
number of data
We know
b is
we
blocks,
(b
file is
g).
are. In fact,
not know what b and
depends on our choice of b. Suppose
choose each data block to contain one 100-byte record. Then b, the
that
is
whatever we want
block
=
it
is
to be,
we do
and
;/
//
given by
jr
rr
tape density (bytes per inch)
:
- r~^- =
0.016 inch.
6,250
n,
the
58
file is
1,000,000 x (0.016
316,000 inches
26,333
0.3) inch
feet.
Magnetic tapes range in length from 300 feet to 3,600 feet, with 2,400
being the most common length. Clearly, we need quite a few
2,400-foot tapes to store the file. Or do we? You may have noticed that our
choice of block size was not a very smart one from the standpoint of space
feet
utilization.
The
up about 19
times as
much
Gap
Data
Clearly,
Gap
we
we
is
we were
file
take
to take a
like this:
Gap
Data
Data
not used!
Data
Most of the
tape.
it
if
we want
we
file
number of
^Q =
20,000,
and the space requirement for interblock gaps decreases from 300,000 inches
to 6,000 inches. The space requirement for the data is of course the same as
it was previously. What has changed is the relative amount of space occupied
by the gaps, as compared to the data. Now a snapshot of the tape would
look much different:
Data
Gap
Data
Gap
Data
Gap
Data
Gap
Data
59
MAGNETIC TAPE
We leave it to
when
you
show
to
that the
blocking factor of 50
When we compute
numbers
file
can
easily
fit
used.
is
A more
file.
we produce
file,
number of bytes
number of inches
When
block
is
per block
required to store
100,
100 bytes
is
= 316 4bpi
'
0.316 inches
is
a far cry
at
it,
space utilization
amount of
time
it
'
Either
sizes
block'
blocking factor of 1
which
bpi.
is
now
see
how
you understand
two
read/write head. If we
know
these
two
values,
we
Nominal
rate
x tape speed
(ips).
nominal transmission
1,250,000 bytes/sec
1,250 kilobytes/sec.
rate
of
This rate
example, that
we
gets dispersed
by
Suppose, for
file
and tape
60
discussed in
the
0.3-inch gap).
We
saw
organization
316.4 bpi.
drive
is
If the tape
is
moving
at a rate
of 200
ips,
then
its
is
316.4 x 200
63,280 bytes/sec
63.3 kilobytes/sec,
a rate that is
It
and that
result,
factor
improves on this
improves on it
substantially.
Although there
size
is
on space
utilization
if
the
an appropriate
is
files
medium
in applications
problem of updating
for a
list
list needs to be current only when mailing labels are printed, all
of the changes that occur during the course of a month can be collected in
one batch and put into a transaction file that is sorted in the same way that
the mailing list is sorted. Then a program that reads through the two files
simultaneously can be executed, making all the required changes in one pass
the mailing
through the
data.
Since tape
data offline.
megabytes
is
relatively inexpensive,
At current
prices,
it is
an excellent
medium
much
as a reel
is
of tape
for storing
that holds
that,
good medium
150
properly
for archival
storage and for transporting data, as long as the data does not have to be
available
on short notice
61
3.3
less
suited
for
files.
Over
that
for sequential
is
will occur,
RAM
cost of disk
RAM
RAM
RAM
most
files
to a level that
makes disk
quite
competitive with tape for sequential processing. This change, added to the
superior versatility and decreasing costs of disks, has resulted in use of disk
for
most
of tape.
This
sequential processing,
is
which
in the past
processing. If a
use
file
them
it
may
it
to
it
sequentially.
Although
tions, tape
Tape
is still
it
t Techniques for
RAM
62
tape has
emerged
CD-ROM)
as
Storage as a Hierarchy
3.4
we
for a
as a
at different levels in
Types of
Devices and
Access times
Capacities
Cost
memory
media
(sec)
(bytes)
(cents/bit)
- Primary
Core and
Registers
10
10
10-10 9
10- 1(T 3
RAM
RAM
disk
and
disk cache
Secondary
-_.
Direct-access
Magnetic disks
Serial
Tape and
10" 3 -10
-10 9
10- 2 -l(T o
10-10 n
10" 5 -10" 7
10
mass storage
_- Offline _,
Archival
and
backup
Removable
magnetic
disks,
optical discs,
and tapes
10- 10'
10
-10 12
5
1(T -10
A JOURNEY OF A BYTE
63
Operating system's
User's program:
WRITE
M
(
file
i/o system:
text M ,c,l)
Get one byte from variable c
in user
Write
it
area.
to current location
in text file.
c:
FIGURE 3.13 The WRITE( ) statement tells the operating system to send one
character to disk and gives the operating system the location of the character. The operating system takes over the job of doing the actual writing and
then returns control to the calling program.
and
3.5
A Journey
how
cost.
of a Byte
in a character variable
somewhere on
a disk.
WRITECTEXT,
but the journey
is
The WRITE(
to a file
named
in the variable
TEXT
c,
1)
longer than
this
statement results in
a call to the
computer's operating
system, which has the task of seeing that the rest of the journey
successfully (Fig.
stored
much
)
From
3.13).
is completed
Often our program can provide the operating
64
system with information that helps it carry out this task more effectively,
but once the operating system has taken over, the job of overseeing the rest
of the journey is largely beyond our program's control.
3.5.1 The
An
Manager
File
operating system
is
not
program, but
a single
a collection
of programs,
Among
these
devices.
We call
programs
and I/O
of programs the operating system's file manager.
The file manager may be thought of as several layers of procedures (Fig.
3.14), with the upper layers dealing mostly with symbolic, or logical,
aspects of file management, and the lower layers dealing more with the
physical aspects. Each layer calls the one below it, until, at the lowest level,
the byte
The
is
this subset
manager begins by finding out whether the logical characterare consistent with what we are asking it to do with the file.
It may look up the requested file in a table, where it finds out such things
as whether the file has been opened, what type of file the byte is being sent
to (a binary file, a text file, some other organization), who the file's owner
is, and whether WRITE( ) access is allowed for this particular user of the
file
of the
istics
file
file.
The
file
manager must
also
needs to
know where
sector in the
file.
is
This information
appended
file is
is
to the
file
TEXT the
the
file,
file
'P' is
manager
file
last
allocation table
(FAT) described earlier. From the FAT, the file manager locates the
and sector where the byte is to be stored.
drive,
cylinder, track,
3.5.2 The
Next, the
I/O Buffer
file
already in
'P' is
RAM
sector that
RAM.
is
to contain the
If the sector
needs
for
the
it,
then read
it
manager can deposit the 'P' into its proper position in the buffer
3.15). The system I/O buffer allows the file manager to read and write
file
(Fig.
manager
of data
in
RAM
it
enables the
file
conforms to the
organization
it
will
65
A JOURNEY OF A BYTE
Logical
1.
The program
2.
3.
The
file
about
it,
manager looks up
TEXT
The
file
The
file
to the file
manager.
file is
if
on
file
use,
what
the logical
to.
manager searches a
5.
the job
TEXT.
file
is to
RAM,
has been
file
into
its
6.
The
file
the byte
7.
finds a time
when
the drive
is
available to receive
the data and puts the data in proper format for the disk.
it
It
may
also
disk.
8.
9.
The
Physical
FIGURE 3.14 Layers of procedures involved in transmitting a byte from a program's data area to a file called TEXT on disk.
before actually transmitting anything. Even though the statement WRITE(TEXT,c,l) seems to imply that our character is being sent immediately to
the disk,
it
(There are
buffer
may in fact be
many situations
is filled
would have
TEXT
kept in
in
before transmitting
to flush
so the data
all
RAM
it.
for
it
is
sent.
which the
file
a
it
would not be
lost.)
66
User's program:
WRITE ("text",c,
If necessary,
sector
1)
2.
load
last
from "TEXT"
into
GJ
I/O
*"
system's
output buffer
P'
FIGURE 3.15 The file manager moves P from the program's data area to a system
output buffer, where it may join other bytes headed for the same place on the
disk. If necessary, the file manager may have to load the corresponding sector
from the disk into the system output buffer.
I/O
Processor
So far, all of our byte's activities have occurred within the computer's
primary memory and have probably been carried out by the computer's
central processing unit (CPU). The byte has travelled along data paths that
are designed to be very fast and that are relatively expensive. Now it is time
for the byte to travel along a data path that is likely to be slower and
narrower than the one in primary memory. (A typical computer might have
an internal data-path width of four bytes, whereas the width of the path
leading to the disk might be only two bytes.)
Because of bottlenecks created by these differences in speed and
data-path widths, our byte and its companions might have to wait for an
67
A JOURNEY OF A BYTE
CPU
simultaneously.
CPU
the
to
devices simultaneously.
it
takes
its
from
instructions
it
the
runs independently,
computing
In
to overlap."
typical
computer, the
is
file
is
tell
the
I/O
how much
instructed to
is
move
first
mechanical!
The
time, a device
is
its
read/write head to
companions are to
being asked to do something
its
it is
already there), and then wait until the disk has spun around so the desired
sector
is
under the head. Once the track and sector are located, the I/O
processor (or perhaps the controller) can send out bytes, one
the drive.
drive,
Our
where
it
probably
is
its
stored in
at a
time, to
it
waits to
On many systems the I/O processor can take data directly from RAM, without further
involvement from the CPU. This process is called direct memory access (DMA). On other
systems, the CPU must place the data in special I/O registers before the I/O processor can
have access to it.
t
68
File
Manager
Invoke I/O processor
User's program:
I/O
processor
program
^
!i
L___i>
I
,-a
System
buffer
I/O processor
FIGURE 3.16 The file manager sends the I/O processor instructions in the form of
an I/O processor program. The I/O processor gets the data from the system
buffer, prepares it for storing on the disk, and then sends it to the disk controller, which deposits it on the surface of the disk.
3.6
Buffer
Any
Management
travelling
between
RAM
69
BUFFER MANAGEMENT
We know
that a
hold incoming
file
manager
allocates
I/O buffers
enough
data, but
it is
to
buffers
buffers
performing I/O.
To understand the need for several system buffers, consider what
happens if a program is performing both input and output on one character
at a time, and only one I/O buffer is available. When the program asks for
its first character, the I/O buffer is loaded with the sector containing the
character, and the character is transmitted to the program. If the program
then decides to output a character, the I/O buffer is filled with the sector
into which the output character needs to go, destroying its original
for
contents.
Then when
is
have to be written to disk to make room for the (original) sector containing
the second input character, and so on.
Fortunately, there is a simple and generally effective solution to this
ridiculous state of affairs, and that is to use more than one system buffer.
For this reason, I/O systems almost always use at least two buffers
one
for input and one for output.
Even if a program transmits data in only one direction, the use of a
single system I/O buffer can slow it down considerably. We know, for
instance, that the operation of reading a sector from a disk is extremely slow
compared to the amount of time it takes to move data in RAM, so we can
guess that a program that reads many sectors from a file might have to
spend much of its time waiting for the I/O system to fill its buffer every
time a read operation is performed before it can begin processing. When this
happens, the program that is running is said to be I/O bound the CPU
spends much of its time just waiting for I/O to be performed. The solution
to this problem is to use more than one buffer and to have the I/O system
filling the
CPU
is
processing the
current one.
When
70
finished,
swapping the
called
roles
of two buffers
after
This technique of
is
(Fig. 3.17).
The
idea of
to allow processing
and I/O
to
number of
of ways. The
it.
FIGURE 3.17 Double buffering: (a) The contents of system I/O buffer 1 are sent to
is being filled; and (b) the contents of buffer 2 are sent to
disk while I/O buffer 1 is being filled.
disk while I/O buffer 2
To
disk
To
disk
(a)
(b)
71
BUFFER MANAGEMENT
Several different schemes are used to decide which buffer to take from
a
buffer pool.
buffer that
is
One
le ast recently
use d.
least-recently-used queue, so
less-recently-used
(LRU)
buffers
When
it is
to take
is
buffer
is
allowed to retain
new
its
The
accessed,
data has
is
it
data until
tha'j
put on
all
other
least-recently-used
many
applications
chapters.)
It is
ceases
to
more
Locate
in
Mode
Sometimes
it
is
not necessary
way of
is
called
place in
to a
the
to
When
data
program buffer ( or
vice
move
can be substantial.
it
involves
be accessed.
There
are
wo way s
that
move mode
file
manager
can perform I/O directly between secondary storage and the program's data
no extra move
is
file
72
identifies
block
is
a collection
a single
to be scattered.
The converse of
scatter input
gather oulgut,.
is
3.7
I/O in
UNIX
We see in the journey of a byte that we can view I/O as proceeding through
several layers. UNIX provides a good example of how these layers occur in
operating system, so we conclude this chapter with a look at UNIX.
of course beyond the scope of this text to describe the UNIX I/O layers
in detail. Rather, our objective here is just to pick a few features of UNIX
that illustrate points made in the text. A secondary objective is to familiarize
you with some of the important terminology used in describing UNIX
systems. For a comprehensive, detailed look at how UNIX works, plus a
thorough discussion of the design decisions involved in creating and
improving UNIX, see Leffler et al. (1989).
a real
It is
we
see
how
The topmost
name,
file a
file.
The
view
program
to
a series
of layers.
We
store in a
array of numbers, or
some other
what goes
from
proceeding through
into a
as
views on
files.
UNIX
on
a physical device.
UNIX
impose
certain
I/O IN
PROCESSES
user programs
shell
73
UNIX
commands
libraries
system
call
interface
KERNEL
I/O
block
character
network
system
(normal
I/O system
files)
printers, etc.)
(terminals,
TIT TTT
consoles
disk...
(sockets)
TT
disk
I/O system
printers...
..networks...
HARDWARE
FIGURE 3.18 Kernel
such
as
I/O structure.
Processes include
and
programs
library
files,
once
The
we
scanf(
numbers,
UNIX
kernel views
all
I/O
as
a file are
gone.
The
etc.
kernel,
the layers.^
3.18.
routines like
to read strings,
Below
file
which incorporates
all
the rest of
operating on
all
sequence of bytes, so
decision to design
UNIX
in this
way
to
make
all
of
a file
UNIX
""It
is
as a
is
unusual.
tion of the
It
is
also
UNIX
UNIX
al.
(1989).
a full
descrip-
74
file
a file,
beyond the
imposing no
it must be
fact that
on
restrictions
built
from
how we
sequence of
bytes.
Let's illustrate the journey
in this chapter
by tracing the
results
call
such
as
kernel
way from
is
The
refer to.
file
that they
descriptor table;
a file
an open
file table,
node; and
a table
Although these
a sense,
tables are
"owned" by
managed by
file
in use.
four tables are invoked in turn by the kernel to get the information
file
on
how
this
works by looking
it
at
The
of the
open
file
file
descriptors used
file table.
entries for
and
by
all files it
is
its
own
descriptor table,
table, the
which includes
STDIN, STDOUT,
STDERR.
^This should not be confused with a library call, such as fprintf( ), which invokes the standard library to perform some additional operations on the data, such as converting it to an
ASCII format, and then makes a corresponding system call.
I/O IN
75
UNIX
table
file
file
descriptor
entry
*-
to open file
table
Number of
inode
processes
Offset
of next
ptr to
R/W
write
table
mode
using
access
routine
entry
it
to inode
^^^
write
table
100
""
write() routine
of
The
file
file
table.
open file.
added to the open file
entries are called file structures, and they contain important
information about how the corresponding file is to be used, such as the
read/write mode used when it was opened, the number of processes
open
Every time a
table. These
file
file is
opened or
created, a
new
entry
is
and the offset within the file to be used for the next read
or write. The open file table also contains an array of pointers to generic
currently using
it,
76
file.
These functions
will differ
file.
It is
table entry, so
more commonly
An
inode
structure.
inode exists
inode
is
file
as
more permanent
long
as its
a file
is
opened,
structure than an
corresponding
is
When
file is
For
file exists.
RAM
it is
is
the
UNIX
counterpart to the
in this chapter.*
knows
all
that
Once
it
fi le
needs to
is
know
about the
is
kinds of
file
data that
3.7.2 Linking
It is
File
it
how
must
Names
instructive to look a
cess at the
that
you
to be written.
is
role
more
file.
closely at
All
UNIX,
its
buffer to
of device drivers in
how
references
It"
you
difficult to
moved from
In
among
this
its
UNIX,
the different
to Files
a file
to
name
files
is
actually
begin with
are writing to a
from the
file
determine.
dynamic,
it
It
same time
file.
deal with.
little
*Of course,
described earlier
it is
we
To accommodate both
tree-like structure.
files,
this
I/O IN
77
UNIX
device
permissions
owner's userid
file size
block count
w^
tile
allocation
table
FIGURE 3.20 An inode. The inode is the data structure used by UNIX to describe
file. It includes the device containing the file, permissions, owner and group
the
IDs,
and
allocation table,
file
directory, for
is
it is
just a small
pointer to the
inode of a
name
link
is
to
other things.
file's
It
name
together with
file.
RAM
When
and to
a directory to the
It is
file,
names
file
file is
all
among
a file is
opened,
file
hard
this
up the corresponding
set
file table.
file
names
to point to the
field in the
inode
tells
how many
file
hard
file
names
is
for the
just
^The
al.
is
a little
this,
es-
78
file.
directory or even to a
3.7.3 Normal
Files,
the
The
first
files.
three
file
At
a certain
UNIX
files
any
of them. For instance, you can establish access to all three types by opening
them, and you can write to them with the write( ) system call.
3.7.4 Block
I/O
In Fig.
we
3.18,
to access
respective devices via three different I/O systems, the block I/O system, the
I/O system, and the network I/O system. Henceforth we ignore the
second and third categories, since it is normal file I/O that we are most
character
It
data,
device like
UNIX
concerns
as a
itself
counterpart of the
with
how
on
file
to transmit
a
manager in
normal file
block-oriented
a disk, for
example,
it
It is
not entirely true. Sockets, for example, can be used to move normal
network systems bypass the normal
favor of sockets to squeeze every bit of performance out of the network.
iThis
is
files
file
from
system
in
I/O IN
with
we saw how
UNIX
79
UNIX
systems dealt
convention.)
this
driver, that
job
is
it
0,
from
to take a block
block
etc.,
1,
a buffer,
the device. This saves the block I/O part of the kernel
from having
to
on
know
it is
block device.
we
filesystem
is
the actual
files in
described the
a collection
the system.
files,
RAM
RAM
This separation of the filesystem from the kernel has many advantages. One
important advantage is that we can tune a filesystem to a particular device
or usage pattern independently of how the kernel views files. The
discussions in section 3.1.7 of
4.3BSD block
how
the kernel
works.
80
Important
the
as
UNIX
fit
category. Character devices read and write streams of data, not blocks, and
block
SUMMARY
In this chapter
we look
at
which
file
processing
programs must operate and at some of the hardware devices on which files
are commonly stored, hoping to understand how they influence the ways
we design and process files. We begin by looking at the two most common
storage media: magnetic disks and tapes.
A disk drive consists of a set of read/write heads that are interspersed
among one or more platters. Each platter contributes one or two surfaces,
each surface contains a set of concentric tracks, and each track is divided into
sectors or blocks.
read/write heads
is
The
set
called a cylinder.
by sector and by
block. Used in this context, the term block refers to a group of records that
are stored together on a disk and treated as a unit for I/O purposes. When
There are two basic ways
is
to address data
better able to
make
on
disks:
its
logical
organization,
RAM.
Three possible disadvantages of block-organized devices are the danger
of internal track fragmentation, the burden of dealing with the extra
SUMMARY
some of
loss
of opportunities to do
as sector interleaving)
that
The
used,
it is
physically
by one or more
sectors.
Although
by separating them
takes
it
much
less
time to
access a single record directly than sequentially, the extra seek time required
for
doing direct accesses makes it much slower than sequential access when
of records is to be accessed.
Despite increasing disk performance, network speeds have improved to
a series
system.
is
number of techniques
RAM
disks,
effect
BSD UNIX
shows
to 4,096 bytes,
for large
files,
as
A negative consequence
much
data
could be
of this reorganization
was
that
in file processing.
processing, compact, robust, and easy to store and transport. Data are
usually organized
on
space utilization,
it is
more
bytes.
When
estimating
file
organization.
as
RAM
RAM
to disk.
journey of a byte as it is sent from
and
programs
The journey involves the participation of many different
devices, including
a user's
tem;
initial call to
81
82
file manager, which maintains tables of information that it uses to translate between the program's logical view of
the file and the physical file where the byte is to be stored;
an I/O processor and its software, which transmit the byte, synchronizing the transmission of the byte between an I/O buffer in
and the disk;
the disk controller and its software, which instruct the drive about
how to find the proper track and sector, then send the byte; and
tne disk drive, which accepts the byte and deposits it on the disk sur-
RAM
face.
Next,
for
we
on techniques
improve performance. Some techniques include
managing
buffers to
We
on
conclude with
UNIX. We
its
work
and
how
file's
inode.
to access
it,
it
an open
Once
calls
file table,
on
file
which device
accessing.
describing
KEY TERMS
bpi. Bits per inch per track.
tracks.
On
a tape,
On
a disk, data
is
recorded serially on
on
several tracks, so a
when
all
nine
tracks are taken into account (one track being used for parity).
to the
amount of data
KEY TERMS
records, but
it
may
be
a collection
of sectors
sometimes
whose
(see cluster)
is
sometimes
block
size
is
called a
block.
Block device.
in
In
UNIX,
device such as
is
organized
the user to
define the size and organization of blocks, and then access a block by
giving
its
its
organization.)
UNIX,
device such as
is
keyboard or printer
(or
cedes each data block and contains information about the data block,
without first requiring the reading of the blocks that precede it.
Direct memory access (DMA). Transfer of data directly between
and peripheral devices, without significant involvement by the
RAM
CPU.
83
84
when
pack of disks
number of cylinders
If disk
is
disks
mounted on
the
same ver-
equivalent to the
number of tracks
per surface.
same drive
at different times,
providing
accompany
data.
after taking into ac-
count the time used to locate and transmit the block of data in which
a
Extent.
physical locations of
File
all
the clusters in
all files
on disk
storage.
open
file
table in a
UNIX
kernel,
the term file structure refers to a structure that holds information the
such things
rently using
as the file's
it,
and the
file.
File structure
read/write mode,
information includes
number of processes
cur-
read or write.
Filesystem. In
UNIX,
a hierarchical collection
of
files,
usually kept
on
CD-ROM.
Fixed disk. A disk drive with platters that may not be removed.
Formatting. The process of preparing a disk for data storage, involving
such things as laying out sectors, setting up the disk's file allocation
table, and checking for damage to the recording medium.
Fragmentation. Space that goes unused within a cluster, block, track,
or other unit of physical storage. For instance, track fragmentation
is
not
single byte.
KEY TERMS
Hard
link. In
to the
UNIX,
links to a single
deleted until
Index node.
its
all
hence
file;
a file
name
a file
file.
file
are deleted.
UNIX, a data structure associated with a file that deAn index node includes such information as a file's
In
scribes the
type,
an entry in
file.
comprise the
that
inode.
read/write heads to
when
On
tell
Interleaving factor. Since it is often not possible to read physically adjacent sectors of a disk, logically adjacent sectors are sometimes arranged so they are not physically adjacent. This is called interleaving.
The interleaving factor refers to the number of physical sectors the
next logically adjacent sector
is
located
sector being
read or written.
carries out
Key subblock. On
UNIX
I/O
tasks,
allowing the
to
operating system.
block-addressable drives,
it,
allowing the
drive to search
certain key,
CPU
mem-
ory.
capacity. Also applied to very high-capacity secondary storage systems that are capable of transmitting data between a disk and any of
few seconds.
Nominal recording
data subblocks.
Nominal transmission
open
file.
See
file structure.
85
86
An
Parity.
is
set in
such
way
bit
accom-
number of
RAM
RAM
chunk of
a single
data.
Sector.
The
make up
on
the tracks
on
a disk
medium
Sometimes
(e.g., tape)
Although
in
some ways
files,
we do
not
link.
UNIX,
ters
the term special file refers to a stream of characand control signals that drive some device, such as a line printer
or
graphics device.
Special
a
file.
In
Streaming tape drive. A tape drive whose primary purpose is dumping large amounts of data from disk to tape or from tape to disk.
Subblock. When blocking is used, there are often separate groupings of
information concerned with each individual block. For example,
count subblock,
all
be
present.
Symbolic link. In UNIX, an entry in a directory that gives the pathname of a file. Since a symbolic link is an indirect pointer to a file,
not
is
as closely associated
as a
hard
link.
Symbolic
in other filesystems.
it
links
EXERCISES
Track. The set of bytes on a single surface of a disk that can be accessed
without seeking (without moving the access arm). The surface of a
disk can be thought of as a series of concentric circles, with each circle corresponding to a particular position of the access arm and read/
write heads. Each of these circles is a track.
Transfer time. Once the data we want is under the read/write head, we
have to wait for it to pass under the head as we read it. The amount
of time required for this motion and reading is the transfer time.
EXERCISES
Determine as well as you can what the journey of a byte would be like
on your system. You may have to consult technical reference manuals that
describe your computer's file management system, operating system, and
peripheral devices. You may also want to talk to local gurus who have
experience using your system.
1.
2.
a list
of names to
write statement.
it
3.
for
When you
every write,
create or
open
Compared
utilization. If
a file in
file after
4.
a text file,
close the
file
COBOL,
the
COBOL
with
available to the
5.
Much is
way
file
said in section 3.
to store files.
every
Assume
that
must occupy
stored on
a file is
problems does
file
specifications
programmer.
it
create?
about
how
disk space
is
organized physically
a single
tape.
contiguous piece of
How
does
this
is
a disk,
somewhat the
What
87
88
6.
it
file
extra read
is
likely to occur?
We
have seen that some disk operating systems allocate storage space
in clusters and/or extents, rather than sectors, so the size of any file
a multiple of a cluster or extent.
a. What are some advantages and potential disadvantages of this
method of allocating disk space?
b. How appropriate would the use of large extents be for an application that mostly involves sequential access of very large files?
c. How appropriate would large extents be for a computing system
that serves a large number of C programmers? (C programs tend to
be small, so there are likely to be many small files that contain C
programs.)
d. The VAX record management system uses a default cluster size
of three 512-byte sectors but lets a user reformat a drive with any
cluster size from 1 to 65,535 sectors. When might a cluster size larger
than three sectors be desirable? When might a smaller cluster size be
on disks
must be
desirable?
8.
In early
disk,
UNIX
Later editions divided disk drives into groups of adjacent cylinders called
cylinder groups, in
corresponding data.
mance?
UNIX
In early
a cluster size
10.
Draw
the
numbers
11.
Table
Count-data, where the extra space used by count subblock and interis equivalent to 185 bytes; and
Count-key-data, where the extra space used by the count and key
subblocks and accompanying gaps is equivalent to 267 bytes, plus
block gaps
An IBM
cylinder,
3350 has 19,069 usable bytes available per track, 30 tracks per
and 555 cylinders per drive. Suppose you have a file with 350,000
EXERCISES
How many
How many
records?
b.
How many
Make
is
size
the count-key-
if
13 bytes?
is
graph that shows the effect of block size on storage utilization, assuming count-data subblocks. Use the graph to help predict
the best and worst possible blocking factor in terms of storage utilic.
zation.
e.
How many
How much
file
(blocking factor
record randomly.
g.
how
Explain
fected
retrieval
by increasing block
size.
af-
is
efficiency
table
h.
sort the
Since the
file.
sorted in place,
on the
random
It is
memory,
number of records
access. If
provide
disk.
We
file is
much
in the
of the preceding
you
will be
it
p.
380)
N repre-
file.
is
true,
how
is
not
7,
which
long does
it
take
quential processing.)
12.
there
less
of
correspondence between
block organization
the
logical
and
in that
physical
RM05
disk drive,
It
89
90
view, a
file is
knows nothing about where one record ends and another begins, a
record can span two or more sectors, tracks, or cylinders.
One common way that records are formatted on the RM05 is to place
a two-byte field at the beginning of each block, giving the number of bytes
drive
of
store a
a.
There
data,
this
itself.
organization
is
is
How many
How might
file?
being accessed?
What
of do-
ing this?
13. Suppose you have a collection of 500 large images stored in files, one
image per file, and you wish to "animate" these images by displaying them
in sequence on a workstation at a rate of at least 15 images per second over
a high-speed network. Your secondary storage consists of a disk farm with
30 disk drives, and your disk manager permits striping over as many as 30
drives, if you request it. Your drives are guaranteed to perform I/O at a
steady rate of 2 megabytes per second. Each image is 3 megabytes in size.
mation
b.
are.
not a problem.
Describe in broad terms the steps involved in doing such an aniin real
Consider the 1,000,000-record mailing list file discussed in the text. The
to be backed up on 2,400-foot reels of 6,250-bpi tape with 0.3-inch
interblock gaps. Tape speed is 200 inches per second.
a. Show that only one tape would be required to back up the file if a
blocking factor of 50 is used.
b. If a blocking factor of 50 is used, how many extra records could
be accommodated on a 2,400-foot tape?
c. What is the effective recording density when a blocking factor of
14.
file is
50
is
d.
How
used?
large does the blocking factor have to be to achieve the
maximum
What
large
FURTHER READINGS
What would be
e.
file
record in the
ample,
some
Suppose that the extra time it takes to start before reading a block
and to stop after reading the block totals 1 msec, and that the drive
must start before and stop after reading each block. How much will
the effective transmission rate be decreased due to starting and stop-
ping
if the
blocking factor
is
1?
What
if
it is
50?
15.
just
jam
16.
all
on
why do we not
tracks
FURTHER READINGS
Many
file
in this
found the operating system texts by Deitel (1984), Peterson and Silberschatz (1985),
and Madnick and Donovan (1974) useful. Hanson (1982) has a great deal of material
on blocking and buffering, secondary storage devices, and performance. Flores's
book
(1973)
bit dated,
but
it
contains a
com-
(1984) wrote a
I/O is handled in the UNIX operating system. The latter provides a good
of ways in which a filesystem can be altered to provide substantially taster
throughput for certain applications. A comprehensive coverage of UNIX I/O from
the design perspective can be found in Leffler et al. (1989).
how
file
case study
91
92
Information on specific systems and devices can often be found in manuals and
documentation published by manufacturers. (Unfortunately, information about
how software actually works is often proprietary and therefore not available.) If you
use a VAX, we recommend the manuals Introduction to the VAX Record Management
Services (Digital, 1978), VAX Software Handbook (Digital, 1982), and Peripherals
Handbook
(Digital,
Laboratories'
IBM PCs
useful.
1981).
monograph The
1983 or
later)
the Bell
Users ot
manual
Fundamental File
Structure Concepts
CHAPTER OBJECTIVES
Introduce
file
Stream files;
and record boundaries;
Fixed-length and variable-length
Field
fields
and
records;
and
file
organization.
structures in terms of
Examine
issues
93
CHAPTER OUTLINE
4.1
Field
4.4
4.1.1
4.5
Stream
File
File Access
Beyond Record
Structures
4.5.2
in
One
File
4.2.1
Record Keys
4.2.2
A Sequential Search
UNIX Tools for Sequential
4.6
Processing
4.2.4 Direct Access
4.3
Organization
4.5.1
4.5.5
Record Access
4.2.3
File
4.5.3 Metadata
4.2
and
Structures
4.1
Field
When we build file structures we are imposing order on data. In this chapter
we investigate the many forms that this ordering can take. We begin by
looking
at the
base case:
4.1.1 A Stream
Suppose the
program
out as
file
organized
as a
name and
OUTPUT,
is
stream of bytes.
File
we
to accept
a file
address information.
shown
them
FIELD
95
PROGRAM: writstrm
get output file name and open it with the logical name OUTPUT
get LAST name as input
while
LAST name has a length > 0)
get FIRST name, ADDRESS, CITY, STATE and ZIP as input
(
write
write
write
write
write
write
LAST
FIRST
ADDRESS
CITY
STATE
ZIP
to
to
to
to
to
to
the
the
the
the
the
the
file
file
file
file
file
file
OUTPUT
OUTPUT
OUTPUT
OUTPUT
OUTPUT
OUTPUT
endwhile
close OUTPUT
end PROGRAM
FIGURE 4.1
Program
the
file
to write out a
structures
we
are discussing if
file
as a stream of bytes.
you perform
self
When we
AmesJohnl
list
Map 1 eS
23
The program
a
the output
i 1 1
file
Alan Mason
Eastgate
Ada, OK 74820
90
on our terminal
screen, here
file
specifications, the
there
is
We
no way
we
program
put
to get
all
it
creates a kind
what we
e Ada
see:
OK 7482
precisely as specified: as
problem. Once
is
of "reverse
in
meeting our
Humpty-Dumpty"
apart again.
have
96
FUNDAMENTAL
When we
fields.
STRUCTURE CONCEPTS
FILE
working with
arc
files,
we
call
field
logical notion;
is
it
is
When we
structure.
of information
yet
name and
it
is
in a file.^
does not
important to the file's
conceptual tool.
field
of
Begin each
Place
field
with
a delimiter at
length indicator.
it
field.
Use
and
its
contents.
Method
1:
in their length. If
pull
we
field.
We
can define
Using
this
file
shown
our sample
file
way
to the
or a record in Pascal to
vary
we
can
end of the
hold these
in Fig. 4.2.
shown
in Fig. 4.3(a).
Simple arithmetic
One
fields in
a structure in
fixed-length fields, as
The
obvious disadvantage of
this
is
it
sufficient to let us
fields.
approach
is
that
adding
all
the
padding required to bring the fields up to a fixed length makes the file much
Rather than using 4 bytes to store the last name Ames, we use 10. We
can also encounter problems with data that is too long to fit into the
allocated amount of space. We could solve this second problem by fixing all
larger.
make
"'"Readers
the
first
some programming
record
record
is
an aggregate data
members of different
types,
FIELD
InC:
In Pascal:
struct {
char lastCIO];
char firstClO]
char addressCl5]
char city[15];
char stateC2]
char zipC9]
} set_of__fields;
TYPE
set _of_field; s = RECORD
last
packed array [1
first
packed array [1
address
city
state
zip
97
packed
packed
packed
packed
array
array
array
array
[1
[1
[1
CI
of
of
of
of
of
of
10]
10]
15]
15]
2]
9]
char
char
char
char
char
char
END;
FIGURE 4.2 Fixed-length
fields.
Because of these
is
of fields, such
is
lengths, using a
file
names and
fields are
if
there
is
a large
addresses.
amount of
But there
highly appropriate.
very
little
If
are
every
variation in field
as
fields is often a
Field with a
Length Indicator
Another way
to
it
less
it is
not
Method
3:
each
field.
The choice of a
it
must
a character that
console. Also,
by
that,
default,
blanks
often
occur
as
legitimate
characters
within
an
address
since
field.
98
FUNDAMENTAL
FILE
STRUCTURE CONCEPTS
Ames
John
123 Maple
Stillwater
OK74075377-1808
Mason
Alan
90 Eastgate
Ada
OK74820
(a) Field
go.
(b) Delimiters are used to indicate the end of a field. Place the delimiter for the "empty" field
immediately after the delimiter for the previous field.
is
...
...
(c) Place the field for business phone at the end of the record.
encountered, assume that the field is missing.
field. If the
keyword
is
If the
...
end-of-record mark
is
is
FIGURE 4.3 Four methods for organizing fields within records to account for possible missing
the examples, the second record is missing the phone number.
fields. In
file
we
original
in the
Method
4:
Use
to Identify Fields
This option, illustrated in Fig. 4.2(d), has an advantage that the others do
not: It is the first structure in which a field provides information about itself.
Such
self-describing structures
files
FIELD
in
even
contain.
It is
also a
fields are
contained in
99
a file,
supposed to
You may have noticed in Fig. 4.3(d) that this format is used in
combination with another format, a delimiter to separate fields. While this
may not always be necessary, in this case it is helpful because it shows the
division between each value and the keyword for the following field.
Unfortunately, for the address file this format also wastes a lot of space.
Fifty percent or more of the file's space could be taken up by the keywords.
But there are applications in which this format does not demand so much
overhead. We discuss some of these applications in section 4.5.
4.1.3 Reading a Stream
of Fields
Field
Field
Field
Field
Field
Field
Field
Field
Field
Field
Field
Field
lwat er
St
OK
74075
Mason
Alan
90 Eastgate
8
9
i 1
Ada
1
1
Ames
John
123 Maple
12
QK
74820
00
FUNDAMENTAL
FILE
STRUCTURE CONCEPTS
'!'
readstrm
PROGRAM:
initialize FIELD_C0UNT
endwhile
close INPUT
end PROGRAM
FUNCTION:
initialize I
initialize CH
while (not EOF (INPUT) and CH does not equal DELIMITER
read a character from INPUT into CH
increment I
FIELD_C0NTENT [I]
= CH
:
endwhile
return (length of field that was read)
end FUNCTION
FIGURE 4.4 Program to read fields from a
Clearly,
these data.
as a
we now
file
But something
stream of fields. In
is still
fact,
are a set
records.
of
fields associated
missing.
a field as
we
store
and retrieve
The
file
first
FIELD
record
in
can be defined as a
set
It is
file is
viewed
is
we impose
on the data
in
the
structure.
file's
Here
are
file
into
records:
a predictable
a
predictable
number of bytes
number of fields
in length.
in length.
it
from the
next record.
1: Make Records a Predictable Number of Bytes (Fixedlength Records) A fixed-length record file is one in which each record
contains the same number of bytes. This method of recognizing records is
analogous to the first method we discussed for making fields recognizable.
Method
'
As we
will see in the chapters that follow, fixed-length record structures are
important to
realize,
numbers of variable length fields. It is also possible to mix fixedand variable-length fields within a record. Figure 4.5(b) illustrates how
variable-length fields might be placed in a fixed-length record.
variable
Method
2:
Make Records
a Predictable
we
good way
it
file
Number
of Fields Rather
some fixed number of
number of fields. This is a
contain
name and
address
file
we have
been
102
FUNDAMENTAL
FILE
STRUCTURE CONCEPTS
Ames
John
123 Maple
Stillwater
0K74075
Mason
Alan
90 Eastgate
Ada
0K74820
(a)
Ames John
;
123 Maple
Stillwater OK 74075
Unused space
(b)
Ames John 123 Maple Stillwater OK 74075 Mason Alan 90 Eastgate Ada OK
|
(c)
FIGURE 4.5 Three ways of making the lengths of records constant and predictable, (a) Counting
bytes: fixed-length records with fixed-length fields, (b) Counting bytes: fixed-length records with
variable-length fields, (c) Counting fields: six fields per record.
looking
The
at.
writstrm
program
each record
boundary information
to the screen
starts over.
Method
This
4.6a).
variable-length records.
Method
4:
We
is
look
at
it
more
We
us
We
Method
at a
can use an
The byte
compute
in the
file.
5:
is
let
record
Figure 4.6(b)
This option,
mechanism.
record level,
we
FIELD
103
delimiter character
want
our console,
delimiter for
files
common
we
UNIX
on
choice of a record
is
Not one of
file is
appropriate for
all
FIGURE 4.6 Record structures for variable-length records, (a) Beginning each record with a length
indicator, (b) Using an index file to keep track of record addresses, (c) Placing the delimiter '#' at
the end of each record.
(a)
Data
file:
T
Index
file:
00
40
(b)
Ames John 123 Maple Stillwater 0K 74075 #Mason Alan 90 Eastgate Ada 0K
;
(c)
04
FUNDAMENTAL
the writstrm
addressing
If
FILE
STRUCTURE CONCEPTS
we want
of every record
of
to
We
file.
files.
into
is
one
we
and
simply be
field delimiters as
as
we work
character array
we
collect
them.
Resetting the buffer length to zero and adding information to the buffer can
is
a little
more
4.7.
difficult.
much
number of ASCII
by.tes
(e.g.,
while
concatenate: BUFFER
endwhile
FIELD
DELIMITER
endwhile
the
solution in C, since
the
we
It
is
we
can
FIELD
interesting,
we might
between
choose,
and
instead,
this
same solution
to account for
field
05
in
some important
differences
Pascal:
Unlike C, Pascal automatically converts binary integers into characof those integers if we are writing to a text file.
Consequently, it is no trouble at all to convert the record length into
a character form: It happens automatically.
In Pascal, a file is defined as a sequence of elements of a single type.
Since we have a file of variable-length strings of characters, the natural type for the file is that of a character.
ter representations
as fixed-length,
in
File
Given our file
by record-length fields, it is
character form.
40 Ames John
\
06
FUNDAMENTAL
FILE
STRUCTURE CONCEPTS
readrec
PROGRAM:
RECORD_LENGTH
endwhile
end PROGRAM
FUNCTION:
:=
if EOF (INP_FILE)
then return
FUNCTION
for readrec,
and get_fld(
).
FIELD
easy to write
program
that reads
through the
file,
record by record,
The program
shown
is
in Fig. 4.9.
calls
that reads records into a buffer; this call continues until get_rec(
Once get_rec(
value of 0.
is
(SCAN_POS)
get_Jld(
or
the
in the
reads characters
returns a
position
107
from
).
The
call to get_jld( )
argument
list.
includes
scanning
SCAN_POS,
Starting at the
is
reached.
Function get_Jld(
returns
the
SCAN_POS for use on the next call. Implementations ofwritrec and readrec
in both C and Pascal are included along with the other programs at the end
of
this chapter.
dumps
of a File
Dump
of the
stored
in
the file
implementation,
look
where we choose
to
4.10(a).
In the
as
C
a
two-byte integer, the bytes look like the representation in Fig. 4.10(b).
As you can see the number 40 is not the same as the set of characters '4'
and '0'. The hex value of the binary integer 40 is 0x28; the hex values of the
x 30. (We are using the C language
characters '4' and '0' are
x 34 and
convention of identifying hexadecimal numbers through the use of the
prefix Ox.) So, when we are storing a number in ASCII form, it is the hex
FIGURE 4.10 The number 40, stored as ASCII characters and as a short integer.
Decimal value
of number
Hex value
stored
ASCII
character form
in bytes
(a)
40
34
30
'4'
'0'
(b)
40
00
28
'\0'
"('
08
FUNDAMENTAL
ASCII
values of the
number
STRUCTURE CONCEPTS
FILE
characters that
go into the
file,
itself.
Figure
4.
shows
10(b)
an integer (this
is
number
number 40
stored as
in binary
Now
we
the hexadecimal
terminal screen:
(Ames
tt_
^0x28
John
is
Blank, since
123 Maple
code for
ascii
'\0' is
Stillwater
OK
74075
$Mason Alan
tf_
^ 0x28
'('
unprintable.
...
'*'
ascii
is
Blank:
'\0' is
code for
unprintable.
using the
od
UNIX dump
-xc
< f
i 1
this
file,
time
UNIX command
Entering the
utility od.
ename>
Values
Offset
0000000
\0
3037
^ASCII
^Hex
\0
0024
7374
4561
3020
7
736f
6761
2
I
6461
4d61
7c39
6e
6572
357c
61
7c41
6174
6c77
696c
5374
3734
a
416c
7465
3320
3132
6e7c
657c
6f68
7c4a
S
4b7c
6e7c
0000100
7c4f
0000060
6573
706c
4d61
0000040
416d
0028
0000020
7c4f
4b7c
3734
3832
307c
As you can see, the display is divided into three different kinds of data. The
column on the left labeled Offset gives the offset of the first byte of the row
that is being displayed. The byte offsets are given in octal form; since each
line contains 16 (decimal) bytes, moving from one line to the next adds 020
to the range. Every pair of lines in the printout contains interpretations of
the bytes in the file in hexadecimal and ASCII. These representations were
requested on the command line with the -xc flag (x = "hex;" c =
"character").
Let's look at the first
row of ASCII
values.
As you would
'('
expect, the
RECORD ACCESS
09
But there
the
file
file in
0000000 \035\315
like this:
\0
1dcd 6500
The only
handles
of the others by
all
file is
Od
ASCII
representation.
file
we have
an interesting mix of
represents
structure
organizational tools
the
encountered. In
a single
number of
record
fields.
is
how
the
we have both
field (the
byte
common
in real-world
structures.
DEC,
If the
such
as a
if this
dump were
executed on an
way we
IBM PC,
of the first two-byte value in the file would be 0x2800, rather than 0x0028.
This reverse order also applies to long, four-byte integers on these
machines. This is an aspect of files that you need to be aware of if you expect
to make sense out of dumps like this one. A more serious consequence of
the byte-order differences among machines occurs when we move files
from a machine with one type of byte ordering to one with a different byte
ordering. We discuss this problem and ways to deal with it in section 4.6,
"Portability and Standardization."
4.2
Record Access
4.2.1 Record Keys
Since our
new
file
is
as
to
think in terms of retrieving just one specific record rather than having to
read
all
the
way through
the
file,
displaying everything.
When
looking for
1 1
FUNDAMENTAL
FILE
STRUCTURE CONCEPTS
an individual record,
it is
a key
based
on the record's contents. For example, in our name and address file we
might want to access the "Ames record" or the "Mason record" rather than
thinking in terms of the "first record" or "second record." (Can you
remember which record comes first?) This notion of a ke_y is another
fundamental conceptual tool. We need to develop a more exact idea of what
a key is.
When we
want
name Ames, we
form "AMES",
"ames", or "Ames". To do this, we must define a standard form for keys,
along with associated rules and procedures for converting keys into this
standard form. A standard form of this kind is often called a canonical form
for the key. One meaning of the word canon is rule, and the word canonical
means conforming to the rule. A canonical form for a search key is the single
representation for that key that conforms to the rule.
As a simple example, we could state that the canonical form for a key
requires that the key consist solely of uppercase letters and have no extra
to recognize
blanks
it
even
at the
to the canonical
form
"AMES"
record. If there
a single record,
is
not
fits a
John Ames's
different people
finds.
The
simplest solution
is
file.
When
for several
it
provide
way of
The prevention
takes
new
file
program respond?
first John Ames that it
the
Certainly
place as
the key
it.
It is
a single
if
uniquely.
An
It is also possible, as we see later, to search on s econdary keys
example of a secondary key might be the city field in our name and address
file. If we wanted to find all the records in the file for people who live in
towns named Stillwater, we would use some canonical form of "Stillwater"
as a secondary key. Typically, secondary keys do not uniquely identify a
.
record.
RECORD ACCESS
uniqueness.
A name is
a perfectly fine
The reason
in a retrieval
two names
a
name
is
in the
a risky
think
we
are choosing a
same
file
choice for
if
it
is
often an
too great
is
will be identical.
a
primary key
unique key,
in fact
1 1 1
is
dataless.
that
it
contains
Even when we
is
danger that
personnel records.
It
represented in the
file,
a large
citizens
were included, and in a different part of the organization all of these people
had been assigned the Social Security number 999-99-9999!
Another reason, other than uniqueness, that a primary key should be
dataless is that a primary key should be unchanguw. If information that
a certain record changes, and that information is contained
primary key, what do you do about the primary key? You probably
cannot change the primary key itself, in most cases, because there are likely
to be reports, memos, indexes, or other sources of information that refer to
the record by its primary key. As soon as you change the key, those
corresponds to
in a
become useless.
A good rule of thumb is to avoid trying to put data into primary keys.
If we want to access records according to data content, we should assign this
content to secondary keys. We give a more detailed look at record access by
references
primary and secondary keys in Chapter 6. For the rest of this chapter, we
suspend our concern about whether a key is primary or secondary and
concentrate simply on finding things by key.
file,
you should be
able to write a
program
that
record with
particular key.
1 1
FUNDAMENTAL
FILE
STRUCTURE CONCEPTS
Developing
work
memory, we
number of comparisons required for the search as the measure of work. But,
given that the cost of a comparison in
is so small compared to the cost
of a disk access, comparisons do not fairly represent the performance
RAM
count low-level
READ(
file
storage. Instead,
calls.
we
on secondary
as
system buffering
in
Chapter 3 that
enough
to
correct to be useful.
Suppose
we have
file
read in only
makes 1,000
is
a single record. If
READ(
calls
the
it is
first
we want
one
in the
file,
to use a
)
calls are
program has to
file, the program
the
we
If
maximum number
of
in a
we
file,
READ(
records in the
calls.
is
also
required.
calls
Using
of 2,000 records,
In other words, the amount of work
file
number of
file.
In general, the
with n records
is
work
proportional to n;
it
takes at
most
record in
a file
comparisons; on
//
average
to be
We learned in
performance.
disk access
""If
is a
is
this
it
up.
Knuth
(1973a)
RECORD ACCESS
and reading
(Once
we
again,
are
assuming
two
call.) It
follows that
we
all
at
greater
seek
is
should be able
several records
is
successive records.
READ(
1 1
in a block
of
in
RAM.
We
fields,
block
size
usually related
is
more
than to the content of the data. For instance, on sector-oriented disks the
block size
is
Suppose we have
size.
down
a file
to 125.
transferred
There
number of reads.
from
this analysis
and discussion of
record blocking:
it
The
result in substantial
performance improve-
does not change the order of the sequential search operacost of searching
is still
RAM
RAM, and it probably increases the amount of data transbetween disk and RAM. (We always read a whole block, even
if the record we are seeking is the first one in the block.)
Blocking saves time because it decreases the amount of seeking. We
find, again and again, that this differential between the cost of seeking and the cost of other operations, such as data transfer or RAM
done
in
ferred
access,
is
file
structure design.
FUNDAMENTAL
1 1
FILE
STRUCTURE CONCEPTS
When
is
file
It is
two major
practical advantages
it
requires
structures.
to
ASCII
files in
which you
some
Files that
(e.g.,
10 records);
(e.g.,
tape
files
usually
Files in
value,
computing
for
so
this, as
we
UNIX is
many
utilities
for
Sequential Processing
-r-A%
possible, white space as the field delimiter. Practically all files that
using
we
create
UNIX editors use this structure. And since most of the built-in C and
it is
common
to
is
in length, so
record with
fields (a
blank
new
is
line.
RECORD ACCESS
is
UNIX
For example,
such
utilities,
cat
those
1 1
it
we
as
>cat myf i 1
Stillwater DK 74075
Ames
John 123 Maple
Alan 90 Eastgate Ada
DK 74820
Mason
Or we
wc and
files.
wc The command wc
>wc myf
i 1
grep
It is
common
character string in
sequentially,
(and
it.
recognize. In
a pattern,
its
if a text file
has a certain
word
or
provides an excellent
its
file
to
UNIX
regular expression,"
the
76
14
filter
The word
for
doing
that grep
is
(the console)
all
able to
file
for
the lines in
i 1
90 Eastgate Ada
on the
OK
>grep Ada
fly,
74820
word Ada:
wc
36
number of
files
other
The most
record
is
a retrieval
to a record
mechanism known
when we
as direct accesj.
1 1
FUNDAMENTAL
read
is
STRUCTURE CONCEPTS
in.
it
O(l);
with
FILE
a single seek.
Direct access
required record
predicated on
is
is.
Sometimes
We
file.
knowing where
this
moment, we assume
we know
we do
that
file.
The
first
record in
a file
has
RRN 0,
"
RRN
1,
and so
forth."
In
assigning
record.
far,
this
which
records, but
records as
we
we
still
given the
structures
file
The
we want
we want. An exercise
sequence of
in the
we
RRN
file,
counting
chapter explores
processing,
a record's
RRN
to the start
an
RRN
record,
of the
file.
For instance,
we
file
if
we
of the
as follows:
= 546 x
128
file
byte offset of
offset
record with an
RRN
Byte
offset
of
//
69,888.
t In
this
:av-lhised count. In
some
file
is
size
n X
v,
the
r.
differ
with regard to
and Turbo
is
is
start
Pascal,
at
we assume
1
that the
rather than
0.
to
whether
RRN
is
MS-DOS
command
a file
to
in
is
jump
files.
In
and
sequence of
program does
UNIX
(and the
1 1
is
RRN
movement
the
files;
wholly
within
to
a record's
the
determination
programmer's concern.
If
we
at all in
of
no seeking
we
is
said earlier,
is
many implementations of
Pascal extend the standard definition of the language to allow direct access
to
different
locations in a
file.
The nature of
these extensions
varies
according to the differences in the host operating systems around which the
extensions were developed. All the same, one feature that
across implementations
is
that a
in Pascal
file
is
consistent
a single type.
which
datarec
number
into the
4.3
is
this file
is
in
to say in multiples
3 (zero-based count),
of
I
65-byte entity.
am jumping
If
195 bytes
ask to
(3
jump
X 65 =
to
195)
file.
Once we
is
a record,
we
we
RRN
to
of the
fields
we want
to store in
1 1
STRUCTURE CONCEPTS
FUNDAMENTAL
FILE
the record.
of
is
easy.
Suppose we
are building a
file
sales
transaction:
six-digit account
number of the
purchaser;
A
A
A
five-character stock
number
These are all fixed-length fields; the sum of the field lengths is 30 bytes.
Normally, we would simply stick with this record size, but if performance
is so important that we need to squeeze every bit of speed out of our
retrieval system, we might try to fit the record size to the block
organization of our disk. For instance, if we intend to store the records on
a typical sectored disk (see Chapter 3) with a sector size of 512 bytes or some
other power of 2, we might decide to pad the record out to 32 bytes so we
can place an integral number of records in a sector. That way, records will
never span sectors.
The choice of a record length is more complicated when the lengths of
the fields can vary, as in our name and address file. If we choose a record
length that is the sum of our estimates of the largest possible values for all
the fields, we can be reasonably sure that we have enough space for
everything, but we also waste a lot of space. If, on the other hand, we are
conservative in our use of space and fix the lengths of fields at smaller
values, we may have to leave information out of a field. Fortunately, we can
avoid this problem to some degree through appropriate design of the field
structure within a record.
In our earlier discussion
approaches
we
can
fixed-length record.
The
first,
general
in Fig.
This
previously described.
is
the approach
we
illustrated
The
first
fields
from within
variable-length record.
It is
fixed-length record.
The second
field.
By
we
make
the two
can
119
Ames
John
123 Maple
Stillwater
0K74075
Mason
Alan
90 Eastgate
Ada
0K74820
(a)
Unused
space-
Unused space
(b)
FIGURE 4.1
length record, (a) Fixed-length records with fixed-length fields, (b) Fixed-length
length information,
two approaches.
The programs
programs
at
and
which
update. pas,
change
it,
it
of
user to
Given the
this
update. c
retrieve a record,
a file
we might
is
of the
fields in
file,
an appropriate choice.
One of the
kind of structure
convenient to use
tations. In the
is
that
version
we
fill
we
and
situation.
we
the
two implemen-
is
no
how many
is
in the
single right
of the bytes
20
FUNDAMENTAL
FILE
STRUCTURE CONCEPTS
dump
number of
The output
records,
introduces
which we discuss
at the structure
start
start
we do
the record.
It is
about
at the
beginning of the
file
use of the
file.
some
general information
header record
is
often placed
in
some
make
file
from having to know a priori everything about its structure, and hence
making the file-access software able to deal with more variation in file
structures.
The header record usually has a different structure than the data records
file. The output from update. c, for instance, uses a 32-byte header
in the
record, whereas the data records each contain 64 bytes. Furthermore, the
by
tells
how many
whereas the
file.
Pascal
some
file,
variant
record in Pascal
is
its
same
use as
header record
is
When
a variant
it
size,
must be the
file.
we
can use in
a file,
we
(Figure: hex dump of the file produced by update.c, showing the 32-byte header record followed by the 64-byte fixed-length data records.)
We do not need to resort to tricks to use such a header. We just use the initial integer field of the header record to hold the record count, reading it when we open the file and writing it back, with a different value, when we have added records. Later, when we look at the use of tree-structured indexes for files, we see that header records are often placed at the beginning of the index to keep track of matters such as the RRN of the record that is the root of the index. We investigate some more elaborate uses of header records later in this chapter and also in subsequent chapters.
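As a concrete illustration of the idea, here is a minimal sketch in C of a program maintaining a record count in a fixed-size header. The struct layout and the names are our own illustrative assumptions, not the exact ones used in update.c.

    /* A minimal sketch of header-record handling in C, assuming a
       32-byte header whose first field is a record count. The
       layout and names are illustrative, not those of update.c. */
    #include <unistd.h>

    #define HEADER_SIZE 32

    struct header {
        short rec_count;                          /* number of data records */
        char  pad[HEADER_SIZE - sizeof(short)];   /* reserved for later use */
    };

    /* read the header from the front of an open file */
    int read_header(int fd, struct header *h)
    {
        lseek(fd, 0L, 0);                 /* 0 == SEEK_SET */
        return read(fd, h, HEADER_SIZE) == HEADER_SIZE;
    }

    /* rewrite the header after the record count changes */
    int write_header(int fd, struct header *h)
    {
        lseek(fd, 0L, 0);
        return write(fd, h, HEADER_SIZE) == HEADER_SIZE;
    }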
4.4 File Access and File Organization

In the course of our discussions in this chapter, we have looked at
Variable-length records;
Fixed-length records;
Sequential access; and
Direct access.
The first two of these relate to aspects of file organization; the last two relate to aspects of file access. The distinction between organization and access is a useful one, and we need to look at the interaction between them more closely before closing this chapter. Most of what we have considered so far falls under the heading of file organization:

Can the file be divided into fields?
Is there a higher level of organization to the file that combines the fields into records?
Do all the records have the same number of bytes or fields?
How do we distinguish one record from another?
How do we organize the internal structure of a fixed-length record so we can distinguish between data and extra space?

We have seen that there are many possible answers to these questions and that the choice of a particular file organization depends on many things, including the file-handling facilities of the language you are using and the use you want to make of the file.

Using a file implies access.
We looked first at sequential access, which served us when we did not know where a particular record was located. When we wanted direct access to a record, we jumped directly to it. In other words, our desire for direct access caused us to choose a fixed-length record file organization.
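To make the arithmetic behind this kind of direct access concrete, here is a minimal sketch in C; the header size, record length, and function names are our own illustrative assumptions, not code from this chapter's programs.

    /* Sketch: computing a record's byte offset from its RRN in a
       fixed-length record file. HEADER_SIZE, REC_LGTH, and the
       function names are illustrative assumptions. */
    #include <unistd.h>

    #define HEADER_SIZE 32
    #define REC_LGTH    64

    long rrn_to_offset(int rrn)
    {
        return HEADER_SIZE + (long) rrn * REC_LGTH;
    }

    int read_by_rrn(int fd, int rrn, char recbuff[])
    {
        if (lseek(fd, rrn_to_offset(rrn), 0) < 0)   /* 0 == SEEK_SET */
            return 0;
        return read(fd, recbuff, REC_LGTH) == REC_LGTH;
    }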
Does this mean that we can equate fixed-length records with direct access and variable-length records with sequential access? No. There is nothing about our having fixed the length of the records in a file that precludes sequential access; we certainly could write a program that reads sequentially through a fixed-length record file. Not only can we elect to read a variable-length record file sequentially, but we can also provide direct access to variable-length records, simply by keeping a list of the byte offsets from the start of the file for the placement of each record. We chose a fixed-length record structure in update.c and update.pas because it is simple and adequate for the data that we want to store. Although the lengths of our names and addresses vary, the variation is not so great that we cannot accommodate it in a fixed-length record. Padding would be disastrously wasteful, however, for data in which record lengths vary greatly; for such data, variable-length records are the better choice.
4.5 Beyond Record Structures

Now that we have a grip on the basics of record structures and file access, we begin looking beyond them.

4.5.1 Abstract Data Models

The history of file structures parallels the history of computer hardware and software. Early file processing centered on fields and records shaped to fit a storage medium such as a disk. As computers grew more capable, they could process and transmit sound, and they could process and display images and documents (Fig. 4.13). Objects like these do not fit neatly into the notion that a file is a collection of fields arranged to suit a particular medium. One way to view a file is in terms of how it appears on the medium; this is a medium-oriented view. The term abstract data model describes the alternative: a view of data that encourages thinking in terms of what the data means to an application, rather than in terms of how it might physically be stored. The access methods of abstract data models are described in terms of the objects we manipulate, not in terms of how the data is stored.

4.5.2 Headers and Self-Describing Files

We have seen how a header record can be used to keep information about the objects in a file in the file itself. Files can carry many kinds of such information; a file that does so is, to that degree, self-describing. Suppose we store in a file the following information:

A name for each field;
The width of each field; and
The number of fields per record.
We can now write a program that can read and print a meaningful display of files with any number of fields per record and any variety of fixed-length field widths. As usual, there is a trade-off: the more file structure information we put into a file's header, the less our programs need to know a priori about the structure of an individual file, but the more complex the programs that read and interpret file headers become.
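A self-describing header of this kind might be sketched in C as follows; every name and limit here is an assumption for illustration, not a format defined in this chapter.

    /* Sketch of a self-describing header: the file itself records
       how many fields each record has, with a name and width for
       each. All names and limits here are assumptions. */
    #define MAX_FIELDS 20
    #define NAME_LGTH  12

    struct file_header {
        short num_fields;                        /* fields per record   */
        char  field_name[MAX_FIELDS][NAME_LGTH]; /* a name per field    */
        short field_width[MAX_FIELDS];           /* fixed width of each */
    };

A program that reads this header before touching any data records can label and size every field it prints, whatever file it is given.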
4.5.3 Metadata

Suppose you are an astronomer interested in studying images of the night sky, and you want to design a file structure for storing these images (Fig. 4.14). You expect to have many images, perhaps thousands, that you want to study, and you want to store one image per file. While you are primarily interested in studying the images themselves, you will certainly need information about each image: where in the sky the image is from, when it was made, what telescope was used, references to related images, and so forth.
This kind of information is called metadata: data that describes the primary data in the file. Metadata may be stored in the same file as the primary data itself. A common way to make this work is for a community of users of a particular kind of data to agree on a standard format. FITS (Flexible Image Transport System), a standard format developed under the auspices of the International Astronomers' Union, stores the metadata just described in blocks of 80-byte ASCII records at the front of the file, the file's header.† The image data itself is stored in binary form; since the numbers that make up an image are not read directly by people but rather are first processed into a picture and then displayed, binary format is appropriate for the image.

† For more details on FITS, see "Further Readings" at the end of this chapter.
FIGURE 4.15 Sample FITS header. On each line, the data to the left of the '/' is the actual metadata (data about the raw data that follows in the file). For example, the second line ("BITPIX = 16") indicates that the raw data in the file will be stored in 16-bit integer format. Everything to the right of a '/' is a comment, describing for the reader the meaning of the metadata that precedes it. Even a person uninformed about the FITS format can learn a great deal about this file just by reading through the header.
The FITS image itself is stored in binary form, following the header. Another example of an abstract data model is the color raster image, which we look at next.

4.5.4 Color Raster Images

From a user's point of view, a modern computer is as much a graphical device as it is a data processor. Whether we work with documents, spreadsheets, or numbers, we are likely to be viewing pictures on a screen as well, and those pictures are stored in files. A color raster image is a rectangular array of colored dots, or pixels, displayed on a screen. As an abstract data model, a raster image has at least the following attributes:

The dimensions of the image: the number of pixels per row and the number of rows.
The number of bits used to describe each pixel. This determines how many colors the image can display: a 1-bit image can display only two colors (2 to the first power), usually black and white; a 2-bit image can display four colors (2 squared); and so forth.
A color lookup table that indicates which color is to be assigned to each pixel value. A 2-bit image uses a table with four color entries, and so forth.

If we think of the image as an abstract data type, what are some methods that we might associate with images on a computer? There might be methods to display an image in a window on a console screen, to retrieve or set the color of a particular pixel, or "picture element," and so forth.
Keywords. The keyword = value technique described earlier provides one way of identifying fields. In the files we saw at the start of this chapter, the structure of the file demanded that fields appear in a fixed order; a field's position told us what it contained. A keyword structure removes that demand, because the keyword itself tells a program what the field contains; this is why keywords suit FITS files, whereas it was inappropriate to spend space on them in our simple name and address files. When the sizes of the objects in a file can vary, keywords can carry that information as well: we see in the keywords of a FITS file header that an image's dimensions can be recorded in the same file as the image itself, so a file can describe its own sizes (Fig. 4.16).

Now we may view a file the way our scientist learned to: as a mixture of object types, something quite different from our previous view of a file as a uniform collection of records. This calls for a new kind of file structure.
There are many ways to address this new file design problem. One would be simply to put each type of object into a variable-length record and write our file that way.

(FIGURE 4.16 Keywords such as NAXIS1 = 500 and NAXIS2 = 600, along with BSCALE and BZERO values, stored in the same file as the image data they describe.)
The first record might be the notebook, the second an image, and so on. The problem is that a program reading the file has no way to know what the first record is until it has read it; the structure works only if every program knows exactly what the file contains, and finding an object by scanning through large files is time consuming. We would also like to be able to change the notebook for some of the images (or in some cases leave out the notebook altogether) without rewriting all programs that access the file. What we need is a file structure from which a program can itself discover the file's structure.
A solution is hinted at in the FITS header, where every line begins with a keyword. Why not let keywords organize the whole file, the notes as well as the images? Place the keywords in an index table at the front of the file, and let each index entry be big enough to hold, along with the keyword, the byte offset of the actual metadata (or data) and a length indicator that indicates how many bytes the metadata (or data) occupies in the file. The term tag is commonly used for the keyword in this type of structure; in adopting it, we acquire a tagged file structure.
Tagged structures bring two important conceptual tools to file design: (1) the use of an index table to hold descriptive information about the primary data, and (2) the use of tags to distinguish different types of objects from one another. Together they allow us to store in one file a mixture of objects that can vary in structure and content.
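A minimal C sketch of one entry in such an index table might look like this; the layout is illustrative, not any published tagged format.

    /* Sketch of one entry in the index table of a tagged file:
       the tag says what kind of object it is, and the offset and
       length say where to find it. The layout is illustrative. */
    struct tag_entry {
        char tag[8];        /* object type, e.g. "header" or "image" */
        long offset;        /* byte offset of the object in the file */
        long length;        /* number of bytes the object occupies   */
    };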
Tagged file structures are common among standard file formats. For example, a structure called TIFF (Tagged Image File Format) is a very popular tagged format for storing images. HDF (Hierarchical Data Format) is a standard tagged structure used for storing many different kinds of scientific data, including images. In the world of document storage and retrieval, SGML (Standard Generalized Markup Language) is a language for describing document structures and for defining the tags used to mark up those structures.
(FIGURE 4.17 The same FITS data as in Fig. 4.16, preceded by an index table with tags (header, notes, image) describing the objects in the file.)
The idea of allowing a file to contain widely varying objects is compelling, but it raises some questions. When we want to read an object of a particular type, how do we search for it? When we want to store a new object, how and where do we store its tag, and where exactly do we put the object? And, given that different objects will have very different appearances within a file, how do we determine the correct method for storing or retrieving a particular object? The first two questions, which concern searching through tags and pointers to objects, are dealt with in detail in Chapter 6, so we defer them. The third question has implications that we briefly touch on here.
4.5.6 Object-oriented File Access

We have used the term abstract data model to describe an application-oriented view of the objects we store, and the third question raised above brings us back to it. An abstract data model is essentially an in-RAM, application-oriented view of an object, one that ignores the physical format of objects as they are stored in files. Taking this view buys our software two things: it delegates to separate modules the job of translating to and from the physical formats, and it lets an application work with any object that fits its in-RAM view, however that object is stored. File access of this kind is called object-oriented file access, emphasizing the parallels between it and the object-oriented programming paradigm.

As an example that illustrates object-oriented access, suppose we have a program called find_star that operates on 8-bit images in RAM, while the images themselves are stored in FITS files in a different format. A tagged file organization, accompanied by a specification of the ways objects are represented, makes this easy: any routine that can read a tag and implement the specification for the corresponding object type can deliver the object to find_star in its conceptual, in-RAM view (Fig. 4.18).
(Figure: find_star calls read_image("star1", image); read_image loads a FITS file from disk and delivers an 8-bit image in RAM.)
FIGURE 4.18 Example of object-oriented access. The program find_star knows nothing about the file format of the image that it wants to read. The routine read_image has methods to convert the image from whatever format it is stored in on disk into the 8-bit in-RAM format required by find_star.
Access methods of this kind can evolve with the needs of an application. Indeed, the tagged file format lends itself readily to the object-oriented approach.
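The dispatch that makes this work can be sketched in C as a table of reader routines indexed by format tag; all of the names here are illustrative assumptions, not the book's code.

    /* Sketch of object-oriented access: a table of reader routines,
       indexed by format tag, hides the on-disk format from the
       application. All names here are illustrative assumptions. */
    #include <string.h>

    typedef int (*image_reader)(const char *filename, unsigned char *image);

    struct format_entry {
        char         tag[8];    /* e.g. "FITS"                         */
        image_reader read;      /* converts that disk format to 8 bits */
    };

    int read_image(const char *filename, const char *tag,
                   unsigned char *image,
                   struct format_entry table[], int n)
    {
        int i;
        for (i = 0; i < n; i++)
            if (strcmp(table[i].tag, tag) == 0)
                return table[i].read(filename, image);  /* dispatch */
        return -1;     /* no method known for this format */
    }

Adding support for a new image format then means adding one entry to the table, not changing find_star.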
4.5.7 Extensibility

One of the advantages of using tags to identify objects within files is that we do not have to know a priori what all of the objects in our files will look like. We have just seen that if our program can find, from an object's tag, the methods for reading that object, then it can work with kinds of objects that were defined after the program was written. A file structure like that is extensible: we can extend it to accommodate new objects without rewriting existing software.
4.6 Portability and Standardization

A theme running through what we have just seen is that files need to be shared: we want files to conform to structures that many programs agree up on, and that they are somehow compatible with all of the different programs that will access them. In this final section, we look at two complementary topics that affect the sharability of files: portability and standardization.
Differences among Operating Systems. In Chapter 2 we saw that operating systems differ in the ways they handle files; MS-DOS, for instance, adds an extra character when it encounters an end-of-line in a text file. A file that a program creates on one system may therefore look different when another system encounters it; this is one source of portability problems.
Differences among Languages. Earlier in this chapter, when discussing header records, we chose to make our C header records 32 bytes, but we were forced to make our Pascal header records 64 bytes. C allows us to choose fixed record lengths according to our needs, but Pascal requires that all records in a nontext file be the same size. This illustrates a second factor affecting portability: the way you define structures within a file may be constrained by the language you use to create the file.
0000000
The
,
or
two bytes contain the number of records in the file, in this case
If the same C program is compiled and executed on an IBM
first
20 16 or 32 10
PC
first line:
VAX,
the hex
0000000
Why
dump
like this:
that in
by the high-order
byte.
IBM
dumps we saw
that the
hexadecimal
value of 500,000,000 10
an
file
0000000
this:
0065 cdld
RAM
struct
{
i
char
>
i t
cost;
i den t
em
write (fd,
&item,
sizeof (item));
and you want to write files using this code on two different machines, a
Cray 2 and a Sun 3. Because it likes to operate on 64-bit words, Cray's C
compiler allocates a minimum of 8 bytes for any member in a struct, so on the Cray the write() call sends 16 bytes to the file. The Sun's compiler allocates only the minimum space the members need, so the same source code writes a shorter record on the Sun. The two files are incompatible, even though the same program produced both.
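One defensive response, sketched below in C, is to write the members of a structure one at a time, in an agreed byte order, instead of writing the struct as a whole; the four-byte, high-order-first integer format chosen here is simply an assumed convention for illustration.

    /* Sketch: writing the members of a struct one at a time, in an
       agreed byte order, instead of writing the struct whole. This
       sidesteps both the padding differences and the byte-order
       differences just described. */
    #include <unistd.h>
    #include <string.h>

    struct item_rec { long cost; char ident[6]; };

    int write_item(int fd, struct item_rec *it)
    {
        unsigned char buf[10];
        unsigned long c = (unsigned long) it->cost;

        buf[0] = (c >> 24) & 0xff;    /* high-order byte first */
        buf[1] = (c >> 16) & 0xff;
        buf[2] = (c >>  8) & 0xff;
        buf[3] =  c        & 0xff;
        memcpy(buf + 4, it->ident, 6);
        return write(fd, buf, 10) == 10;
    }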
Text is also encoded differently on different machines. The two most common encodings are ASCII and EBCDIC; a text file written on a machine that uses one cannot be read directly on a machine that uses the other. Given all of these differences, how do we insure that files are portable? Here are some guidelines.
guidelines.
Agree on
physical standard
Unfortunately, once
"improve" on
a
it
standard
by changing
is
its
it
in
is
established,
it
is
very tempting to
it
no longer
few times
in its
all
37
One way to make sure a standard format keeps its power is to keep it simple enough that files in the format can be written and read on a wide variety of machines. FITS is such a standard: FITS headers are ASCII records stored in blocks of 36 records each, and FITS images are stored as one contiguous block of binary numbers.
Agree on a standard binary encoding for data elements. The two most common types of data elements are text and numbers; for numbers especially, standard encodings take us only partway, since each machine must still convert between its own representation and the standard one. XDR is an effort to go the rest of the way. XDR specifies not only a set of standard encodings for all files (the IEEE encodings), but provides for a set of routines for each machine for converting from its binary encoding when writing to a file, and vice versa (Fig. 4.19). Hence, when we want to store numbers in XDR, we can read or write them by replacing the read and write routines in our program with XDR routines.
For text, the agreement is easier: ASCII is the character set used for most applications, including the versions of both program suites in this text.†

†XDR is used for more than just number conversions. It allows a C programmer to describe arbitrary data structures in a machine-independent fashion. XDR originated as a Sun protocol for transmitting data that is accessed by more than one type of machine. For further information, see Sun (1986 or later).
(FIGURE 4.19 The routine XDR_float(&xdrs, &x) converts the in-RAM value of x, 234.5, to its standard XDR representation as it is written to a file.)

Sometimes, of course, a single standard encoding is not feasible. If you only need to exchange files between, say, IBM and VAX computers, you might simply write programs to convert each machine's numeric formats to the other's equivalents, as is illustrated in Fig. 4.20(a).
But what if, in addition to IBM and VAX computers, you find that
your data is likely to be shared among many different platforms that use
different numeric encodings? One way to solve this problem is to write a
program to convert from each of the representations to every other
representation. This solution, illustrated in Fig. 4.20(b), can get rather
complicated. In general,
if you have n different encodings, you need n(n - 1) different translators. If n is large, this can be very messy. Not only do you need many translators, but you need to keep track of where each file came from so you know which translator to use. It would cut the number of translators down from n(n - 1) to 2n (why?) to agree on a standard intermediate format such as XDR and translate a file into XDR whenever it moves to a different platform. This solution is illustrated in Fig. 4.20(c): each platform needs only one translator to XDR and one from XDR. One cost is that a file moving between two native formats is now translated twice.
File Structure Conversion. Conversion is not limited to numbers and text: sometimes whole file structures must be converted, as when X-ray raster images produced in one format must be used by software that expects a particular different format. For example, there are many software packages that deal with images, and very little agreement about a file format for storing them. When we look at this software, we find different solutions to this problem:
Require that the user supply images in a format that is compatible with the one used by the package. This places the responsibility on the user to convert from one format to another; for such situations, separate conversion utilities are usually needed.
FIGURE 4.20 Direct conversion between n native machine formats requires n(n - 1) conversion routines, as illustrated in (a) and (b). Conversion via an intermediate standard format requires 2n conversion routines, as illustrated in (c).
File System Differences and Portability. Finally, when you move files from one file system to another, such as from a UNIX system to a non-UNIX system, differences in the ways the systems physically organize files can cause problems. If, for example, one system writes files to tapes in 2,880-byte blocks, programs on the other system must be able to deal with that block size when transferring the files. UNIX provides a utility called dd for coping with this problem. Although dd is a UNIX utility, it gives you control over such matters as the block sizes used in reading and writing, and it can be a great help in moving files between systems.
UNIX and Portability. Of course, the greatest contribution to portability discussed here is UNIX itself. By its wide availability, UNIX makes many of the problems described above disappear: it encourages the use of the same operating system, the same file system, the same views of devices, and the same general views of file organization, no matter what particular hardware platform you happen to be using. UNIX runs on a wide range of computers, including machines from Apple, IBM, Silicon Graphics, and many others. Since they all run some flavor of UNIX, they all share the same view of files and external storage devices, and they all use the file organization that UNIX supports.
SUMMARY

The lowest level of organization that we impose on a file is a stream of bytes. When we store data in a file as a stream, we impose structure on the data to separate one field from another. There are several ways to do this:

Fix the length of each field;
Begin each field with a count of the number of bytes that it contains;
Use delimiters to mark the divisions between fields; and
Use a "keyword = value" structure, in which the field identifies itself.

Fields, in turn, are grouped into records, the aggregates that correspond to the entities we store information about. One useful device for records, as for fields, is a length indicator that tells a program how long the record is.
In this chapter we develop programs for writing and reading a simple file of variable-length records containing the names and addresses of individuals. We use buffering to collect all of the fields of a record before writing it, since the length indicator that precedes the record cannot be written until we know the complete record's length. The length indicator can be represented as a binary number or as a sequence of ASCII digits; in the former case it is useful to use a file dump to examine the contents of our file, since the binary fields are not directly readable.
In this chapter we also look at two ways to access the records in a file: sequentially and directly. To process a file sequentially, we read through it from the beginning, looking at each record in turn; some of the most useful UNIX utilities, such as wc and grep, process files sequentially. Direct access methods provide a mechanism for jumping to a single record without looking at the records that precede it. The simplest such mechanism is access by relative record number (RRN). This, in turn, works best with fixed-length records, since it is then easy to compute the byte offset of a record from its RRN. When it comes to fixed-length records, however, padding short records out to a fixed length can result in expensive waste of space; in such cases the designer should consider variable-length records. Sometimes it is useful to keep general information about a file, such as the number of records it contains, at the beginning of the file; a header record is used for this kind of information. A header makes a file, to some degree, self-describing, since the file itself tells programs something about its organization.
A file contains metadata when, in addition to its primary data, the file contains data that describe the primary data in the file. FITS files, used for storing astronomical images, contain extensive metadata in the file's header. The use of tags makes it possible to mix a variety of object types in one file, and object-oriented access methods let applications view those objects through abstract data models. One important way to foster the portability of files is standardization of file structures. If a standard is not possible and it becomes necessary to convert from one format to another, it is still often much simpler to have one standard format that all converters convert into and out of. UNIX provides a utility called dd that facilitates data conversion. The UNIX environment itself supports portability simply by being commonly available on a large number of platforms.
KEY TERMS

Block. A group of records that are stored and transferred together as a unit of input and output.

Byte count field. A field at the beginning of a variable-length record that gives the number of bytes used to store the record. The use of a byte count field allows a program to read a record directly, without scanning for a record delimiter.
Canonical form. A standard form for a key that can be derived, by the
application of well-defined rules, from the particular, nonstandard
form of the data found in a record's key field(s) or provided in a
search request supplied by a user.
Delimiter. One or more characters used to separate fields and records in a file.

Direct access. A file-accessing mode that involves jumping to the exact location of a record. Direct access to a fixed-length record is usually accomplished by using its relative record number to compute the record's byte offset within the file.

Extensibility. A characteristic of some file organizations that makes it possible for a file to accommodate new kinds of objects: new tags can be defined for new objects, and new methods for reading them added, without invalidating existing files or software.

Field. The smallest logically meaningful unit of information in a file. A record in a file is usually made up of several fields.

File-access method. The manner in which information in a file is accessed. In general, the two alternatives are sequential access and direct access.

File organization. The combination of conceptual and physical structures used to distinguish one record from another and one field from another. An example of one kind of file organization is fixed-length records containing variable numbers of variable-length delimited fields.
Fixed-length record. A file organization in which all records have the same length. Records are padded with blanks or other characters so they come out to the fixed length. Because all records are the same length, it is possible to compute a record's location from its position in the file, giving a way of performing retrieval based merely on a record's relative position.

Header record. A record placed at the beginning of a file that is used to store information about the file's contents and organization.
Metadata. Data in a file that is not the primary data, but describes the
primary data in a file. Metadata can be incorporated into any file
whose primary data requires supporting information. If a file is going to be shared by many users, some of whom might not otherwise
have easy access to its metadata, it may be most convenient to store
the metadata in the file itself.

Object-oriented file access. A form of file access in which applications access data objects in terms of an in-RAM, application-oriented view, leaving to separate modules the work of translating to and from the physical format of the object.
Record. A collection of fields that belong together and are treated as a unit; the information about one entity in a file makes up one record.
Relative record number (RRN). An index giving a record's position relative to the beginning of its file. If a file has fixed-length records, the RRN can be used to compute the byte offset of the record.
Self-describing files. Files that carry information about their own structure, such as the number of records in the file and the names and widths of the fields; the file's header is a good place for this information.
Sequential access. Sequential access to a file means reading the file from the beginning and continuing until you have read in everything that you need. The alternative is direct access.

Stream of bytes. A view of a file as an unstructured sequence of bytes; if we impose no other organization on a file, a stream of bytes is all we have.
EXERCISES
1. Find a situation for which each of the field structures described in the text might be appropriate, and a situation for which each would be a poor choice.
2. Suppose you use a delimiter such as a comma, period, or colon to separate fields. What problems arise if these characters can also occur within a field's data?

3. Can you think of a way, such as an escape character, that allows a delimiter character to appear as data within a field?
4. Suppose you need to keep a file in which every record has both fixedand variable-length fields. For example, suppose you want to create a file of
employee records, using fixed-length fields for each employee's ID
(primary key), sex, birthdate, and department, and using variable-length
fields for each name and address. What advantages might there be to using
such a structure? Should we put the variable-length portion first or last?
Either approach is possible; how can each be implemented?
5. One record structure not described in this chapter is called labeled. In a labeled record structure each field is preceded by a label describing its contents. In a name and address record, the end of a labeled record might appear as follows:

. . . CTStillwaterSTOKZP74075bbbb

Under what conditions might this be a reasonable, even desirable, record structure?
6. Define the terms stream of bytes, stream of fields, and stream of records.

7. Investigate the file-handling facilities available to you in a language such as PL/I or COBOL.

8. Report on the ways that field and record structures in a file can be described in PL/I or COBOL.
10. If you list the contents of a file containing binary fields on your screen, what do you see? Why?
11. If the key field in a record is the first field of the record, it is possible to find it without breaking out the rest of the fields. Explain.
13. It is possible to construct a file that is almost "dataless," consisting of little more than a header. How many uses can you think of for such a file?

14. In a sequential search, on average how many records must we read to find a record in the file? How many if the record is not in the file? How do the answers change if the file is blocked so 20 records are read at a time? What do we assume about the file and the machine when we answer these questions for a single-user machine? How do the assumptions change when the file is on a shared magnetic disk? How do these changed assumptions affect the cost of a seek and the cost of sequential searching?

15. Look up the UNIX commands egrep and fgrep. How do they differ from grep? Why might you choose each?
16. Give a formula for finding the byte offset of a fixed-length record in a file in which the RRN of the first record is 1 rather than 0.
17. Why is a record count kept in the header of the file used by the update program? Does it help when we add records to the end of the file?

18. The update program lets the user change records in the file. How must the program handle deletion if we do not want to reuse the freed space? How does this change if we do want to reuse the space?
19. In this chapter we compare fixed-length record structures with variable-length record structures. Summarize the trade-offs between them.

20. Given a hex dump of the first record in a variable-length record file in which not every field is filled in: how long is the record, and what are its contents? (The dump for this exercise is not reproduced here.)

21. Assume that we have a variable-length record file and that we are looking for the record with a particular RRN. Describe how the length indicator at the start of each record lets us skip from record to record without reading the bytes in between (a skip sequential search), and how we can use this to find the contents of the record.
22.

23. Why is it important to distinguish between file access and file organization?
24. What is an abstract data model? Why did the early file processing programs not deal with abstract data models? What are the advantages of using abstract data models in applications? In what way does the UNIX concept of standard input and standard output conform to the notion of an abstract data model? (See "Physical and Logical Files in UNIX" in Chapter 2.)
25. What is metadata?

26. In the FITS header in Fig. 4.15, some of the metadata tells us about the file's structure, and some tells us about the scientific context in which the image was recorded. Give examples of each.
27. In the FITS header in Fig. 4.15, there is enough information for a program to determine how big the file is; the size of a FITS file must be a multiple of 2,880 bytes. Show how a program could compute the file's size from the header.

28. List the "keyword = value" pairs in the FITS header in Fig. 4.15 and explain how a program reading the file could use each of them.
29. What is object-oriented file access? How do tagged file structures support object-oriented file access?

30. In what ways does a tagged file structure support extensibility?

31. What is XDR? XDR is actually much more extensive than what we have described in this chapter. If you have access to XDR documentation (see "Further Readings" at the end of this chapter), look up XDR and list the ways that it supports portability.
Programming Exercises

33. Rewrite writstrm so that the new version uses the following fixed field lengths rather than delimiters:

Last name:   15 characters
First name:  15 characters
Address:     30 characters
City:        20 characters
State:        2 characters
Zip:          5 characters

The output of writrec and readrec can be used to check your results.

36. Modify the program described in the preceding problem so it uses blocks.
37. Write a program that, given a record's RRN, can find the record's position in a variable-length record file: if asked for the 547th record in a file, it would read through the first 546 records, then print the contents of the 547th record. Use skip sequential search (see exercise 21) to avoid reading the contents of unwanted records.

38. Write a program that updates a file from a separate transaction file. First assume that the transactions come in no particular order; then assume that they are sorted by key. In the latter case, how can you make the program more efficient than the first version?
39. Make any or all of the following modifications to the update program:
a. Let the user identify the record to be changed by key rather than by RRN.
b. Let the user change individual fields rather than the entire record.
c. Let the user delete records.

40. Modify the update program so that, when an input record exceeds the fixed record size, it cuts the record down to an acceptable size before writing it. What other modifications does this require?

41. Given the changes in the transaction file of exercise 38, it might be desirable to sort the transaction file by RRN. Why?
42. Write a program that reads a file and produces a hex dump of it. The dump should have a form similar to the dumps used in this chapter (so that a full screen of output is easy to read).
43. Develop a set of rules for converting dates of common forms, such as "Aug. 7, 1949," into a single canonical form.
44. Write a program that:
a. . . .
b. . . .
c. . . .
d. . . .
e. . . .
FURTHER READINGS
Many textbooks cover basic material on field and record structure design, but only a few go into the options and design considerations in much detail. Teorey and Fry
(1982) and Wiederhold (1983) are two possible sources. Hanson's (1982) chapter,
"Choice of File Organization," is excellent but is more meaningful after you read
the material in the later chapters of this text. You can learn a lot about alternative
types of file organization and access by studying descriptions of options available in
certain languages and file management systems. PL/I offers a particularly rich set of
alternatives, and Pollack and Sterling (1980) describe them thoroughly.
Sweet (1985) is a short but stimulating article on key field design. A number of
interesting algorithms for improving performance in sequential searches are
described in Gonnet (1984) and, of course, Knuth (1973b). Lapin (1987) provides a
detailed coverage of portability in UNIX and C programming. For our coverage of
XDR, we used the documentation in Sun (1986).
Our primary source of information on FITS is not formally printed text, but
online materials. A good paper defining the original FITS format is Wells (1981).
The FITS image and FITS header shown in this chapter, as well as the
documentation of how FITS works, can (at the time of writing, at least) be found
on an anonymous ftp server at the INTERNET address 128.183.10.4.
C Programs

The C programs listed in the following pages create, read, and search files of name and address information.

writstrm.c   Writes out name and address information as a stream of consecutive bytes.
readstrm.c   Reads the stream file and writes it to the screen.
writrec.c    Writes name and address information as variable-length records preceded by length fields.
readrec.c    Reads through the record file, breaking out the fields.
getrf.c      Contains get_rec() and get_fld(), which read records and break out fields.
find.c       Searches sequentially through a file for a record with a particular key.
makekey.c    Combines the first and last names into a key in canonical form, using ucase() and other functions found in strfuncs.c.
strfuncs.c   String support functions.
update.c     Opens or creates a fixed-length record file in which records can be added or changed.

Fileio.h

All of the programs include fileio.h, a header file of useful definitions. The programs were written to be run on a UNIX system; on such a system fileio.h might look like this:
    /* fileio.h ...
       definitions included by all of the C programs */

    #include <stdio.h>
    #include <fcntl.h>

    #define PMODE         0755
    #define MAX_REC_SIZE  255     /* assumed value; the original is illegible */
    #define DELIM_STR     "|"
    #define DELIM_CHR     '|'

    #define out_str(fd,s)    write((fd),(s),strlen(s))
    #define out_delim(fd)    write((fd),DELIM_STR,1)

    #define fld_to_recbuff(rb,s)  { strcat((rb),(s)); strcat((rb),DELIM_STR); }
Writstrm.c

    /* writstrm.c ...
       creates name and address file that is strictly a stream of
       bytes (no delimiters, counts, or other information to
       distinguish fields and records) */

    #include "fileio.h"

    main()
    {
        int  fd;
        char filename[15];
        char last[30], first[30], address[30], city[30], state[30], zip[30];

        printf("Enter the name of the file: ");
        gets(filename);
        if ((fd = creat(filename, PMODE)) < 0) {
            printf("file opening error --- program stopped\n");
            exit(1);
        }
        printf("\n\nType in a last name -- or <CR> to exit ==> ");
        gets(last);
        while (strlen(last) > 0) {
            out_str(fd, last);
            printf("   First name: ");  gets(first);    out_str(fd, first);
            printf("   Address: ");     gets(address);  out_str(fd, address);
            printf("   City: ");        gets(city);     out_str(fd, city);
            printf("   State: ");       gets(state);    out_str(fd, state);
            printf("   Zip: ");         gets(zip);      out_str(fd, zip);
            printf("\n\nType in a last name -- or <CR> to exit ==> ");
            gets(last);
        }
        close(fd);
    }
Readstrm.c

    /* readstrm.c ...
       reads fields from the input file and writes them to the screen */

    #include "fileio.h"

    int readfield(int fd, char s[]);

    main()
    {
        int  fd;
        char filename[30];
        char s[MAX_REC_SIZE + 1];
        int  fld_count;

        printf("Enter the name of the file: ");
        gets(filename);
        if ((fd = open(filename, O_RDONLY)) < 0) {
            printf("file opening error --- program stopped\n");
            exit(1);
        }
        fld_count = 0;
        while (readfield(fd, s) > 0)
            printf("field # %d:  %s\n", ++fld_count, s);
        close(fd);
    }

    int readfield(int fd, char s[])
    {
        int  i;
        char c;

        i = 0;
        while (read(fd, &c, 1) > 0 && c != DELIM_CHR)
            s[i++] = c;
        s[i] = '\0';           /* terminate the field with a null */
        return (i);
    }
Writrec.c

    /* writrec.c ...
       creates name and address file using fixed length (2-byte)
       record length field ahead of each record */

    #include "fileio.h"

    char recbuff[MAX_REC_SIZE + 1];
    char *prompt[] = {
        "Enter Last Name -- or <CR> to exit: ",
        "   First name: ",
        "   Address: ",
        "   City: ",
        "   State: ",
        "   Zip: ",
        ""     /* null string to terminate the prompt loop */
    };

    main()
    {
        int   fd, i;
        short rec_lgth;
        char  response[50];
        char  filename[15];

        printf("Enter the name of the file: ");
        gets(filename);
        if ((fd = creat(filename, PMODE)) < 0) {
            printf("file opening error --- program stopped\n");
            exit(1);
        }

        printf("\n\n%s", prompt[0]);
        gets(response);
        while (strlen(response) > 0) {
            recbuff[0] = '\0';
            fld_to_recbuff(recbuff, response);
            for (i = 1; *prompt[i] != '\0'; i++) {
                printf("%s", prompt[i]);
                gets(response);
                fld_to_recbuff(recbuff, response);
            }
            rec_lgth = strlen(recbuff);
            write(fd, &rec_lgth, sizeof(rec_lgth));   /* length field    */
            write(fd, recbuff, rec_lgth);             /* then the record */
            printf("\n\n%s", prompt[0]);
            gets(response);
        }
        close(fd);
    }

    /* question:
       How does the termination condition work in the for loop:
           for (i=1; *prompt[i] != '\0'; i++)
       What does the "" in the prompt array refer to? */
Readrec.c

    /* readrec.c ...
       reads through the record file, breaking each record
       into its fields */

    #include "fileio.h"

    main()
    {
        int   fd, rec_count, fld_count, scan_pos;
        short rec_lgth;
        char  filename[15];
        char  recbuff[MAX_REC_SIZE + 1];
        char  field[MAX_REC_SIZE + 1];

        printf("Enter the name of the file: ");
        gets(filename);
        if ((fd = open(filename, O_RDONLY)) < 0) {
            printf("file opening error --- program stopped\n");
            exit(1);
        }

        rec_count = 0;
        while ((rec_lgth = get_rec(fd, recbuff)) > 0) {
            printf("Record %d\n", rec_count++);
            scan_pos = 0;
            fld_count = 1;
            while ((scan_pos = get_fld(field, recbuff, scan_pos, rec_lgth)) > 0)
                printf("\tField %d:  %s\n", fld_count++, field);
        }
        close(fd);
    }
Getrf.c

    /* getrf.c ...
       contains get_rec() and get_fld(), used in readrec.c,
       find.c, and update.c */

    #include "fileio.h"

    /* get_rec() reads a record and its length from the file fd.
       Returns the length of the record, or 0 at end of file */
    short get_rec(int fd, char recbuff[])
    {
        short rec_lgth;

        if (read(fd, &rec_lgth, sizeof(rec_lgth)) == 0)
            return (0);
        read(fd, recbuff, rec_lgth);
        return (rec_lgth);
    }

    /* get_fld() extracts the next delimited field from recbuff,
       starting at scan_pos. Returns the new scan position, or 0
       when there are no more fields */
    short get_fld(char field[], char recbuff[], short scan_pos, short rec_lgth)
    {
        short fpos = 0;

        if (scan_pos >= rec_lgth)        /* no more fields */
            return (0);

        /* scanning loop */
        while (scan_pos < rec_lgth &&
               (field[fpos++] = recbuff[scan_pos++]) != DELIM_CHR)
            ;

        if (field[fpos - 1] == DELIM_CHR)  /* if last char is a field       */
            field[fpos - 1] = '\0';        /* delimiter, replace with null  */
        else
            field[fpos] = '\0';
        return (scan_pos);
    }
Find.c

    /* find.c ...
       searches sequentially through a file for a record with a
       particular key */

    #include "fileio.h"
    #define TRUE  1
    #define FALSE 0

    main()
    {
        int   fd, scan_pos;
        short rec_lgth;
        int   matched;
        char  search_key[30], key_found[30];
        char  last[30], first[30];
        char  filename[15];
        char  recbuff[MAX_REC_SIZE + 1];
        char  field[MAX_REC_SIZE + 1];

        printf("Enter name of file to search: ");
        gets(filename);
        if ((fd = open(filename, O_RDONLY)) < 0) {
            printf("file opening error --- program stopped\n");
            exit(1);
        }

        /* get search key */
        printf("Enter last name:  ");
        gets(last);
        printf("Enter first name: ");
        gets(first);
        makekey(last, first, search_key);

        matched = FALSE;
        while (!matched && (rec_lgth = get_rec(fd, recbuff)) > 0) {
            scan_pos = 0;
            scan_pos = get_fld(last, recbuff, scan_pos, rec_lgth);
            scan_pos = get_fld(first, recbuff, scan_pos, rec_lgth);
            makekey(last, first, key_found);
            if (strcmp(key_found, search_key) == 0)
                matched = TRUE;
        }

        /* if the record was found, print its fields */
        if (matched) {
            printf("\n\nRecord found:\n");
            scan_pos = 0;
            while ((scan_pos = get_fld(field, recbuff, scan_pos, rec_lgth)) > 0)
                printf("\t%s\n", field);
        }
        else
            printf("\n\nRecord not found.\n");
        close(fd);
    }

    /* questions:
       - why does scan_pos get set to zero inside the while loop here?
       - what would happen if we wrote the loop that reads records
         like this:
             while (((rec_lgth = get_rec(fd, recbuff)) > 0) && !matched)
    */
Makekey.c

    /* makekey.c ...
       function to make a key from the first and last names passed
       through the function's arguments. Returns the key in
       canonical form through the address s passed through the
       arguments. Calling routine is responsible for ensuring
       that s is large enough to hold the return string.
       Value returned through the function name is the length of
       the string returned through s. */

    #include "fileio.h"

    int makekey(char last[], char first[], char s[])
    {
        int len;

        strcpy(s, last);
        len = strtrim(s);        /* trim trailing blanks from last name */
        strcat(s, " ");
        strcat(s, first);
        len = strtrim(s);        /* trim trailing blanks again          */
        ucase(s, s);             /* canonical form: all uppercase       */
        return (len);
    }
Strfuncs.c

    /* strfuncs.c ...
       string support functions */

    #include "fileio.h"

    /* strtrim(s) trims blanks from the end of the (null-terminated)
       string referenced by the string address s. When done, the
       parameter s points to the trimmed string. The function
       returns the length of the trimmed string. */
    int strtrim(char s[])
    {
        int i;

        for (i = strlen(s) - 1; i >= 0 && s[i] == ' '; i--)
            ;
        s[++i] = '\0';
        return (i);
    }

    /* ucase(si, so) copies the string si to so, converting any
       lowercase letters to uppercase along the way */
    void ucase(char si[], char so[])
    {
        for ( ; *si != '\0'; si++, so++)
            if (*si >= 'a' && *si <= 'z')
                *so = *si & 0x5f;     /* clear the lowercase bit */
            else
                *so = *si;
        *so = '\0';
    }
Update.c

    /* update.c ...
       program to open or create a fixed length record file for
       updating. Records may be added or changed. Records to be
       changed must be accessed by relative record number. */

    #include "fileio.h"
    #define REC_LGTH 64

    static int  menu();
    static void ask_info();
    static int  ask_rrn();
    static void read_and_show();
    static int  change();

    static int fd;                       /* file descriptor        */
    static struct {
        short rec_count;                 /* number of data records */
        char  fill[30];                  /* pad header to 32 bytes */
    } head;

    static char *prompt[] = {
        "   Last name: ", "   First name: ", "   Address: ",
        "   City: ", "   State: ", "   Zip: ",
        ""          /* null string terminates the prompt loop */
    };

    main()
    {
        int  menu_choice, rrn;
        long byte_pos;
        long lseek();
        char filename[15];
        char recbuff[MAX_REC_SIZE + 1];  /* buffer to hold record */

        printf("Enter the name of the file: ");
        gets(filename);
        if ((fd = open(filename, O_RDWR)) < 0) {
            fd = creat(filename, PMODE);       /* then CREAT          */
            close(fd);
            fd = open(filename, O_RDWR);
            head.rec_count = 0;                /* initialize header   */
            write(fd, &head, sizeof(head));    /* write header rec    */
        }
        else
            read(fd, &head, sizeof(head));

        while ((menu_choice = menu()) < 3) {
            switch (menu_choice) {
              case 1:                          /* add a record        */
                printf("Input the values for the new record:\n");
                ask_info(recbuff);
                byte_pos = head.rec_count * REC_LGTH + sizeof(head);
                lseek(fd, (long) byte_pos, 0);
                write(fd, recbuff, REC_LGTH);
                head.rec_count++;
                break;
              case 2:                          /* retrieve and update */
                rrn = ask_rrn();
                byte_pos = rrn * REC_LGTH + sizeof(head);
                lseek(fd, (long) byte_pos, 0);
                read_and_show();
                if (change()) {
                    printf("Input the revised values:\n");
                    ask_info(recbuff);
                    lseek(fd, (long) byte_pos, 0);
                    write(fd, recbuff, REC_LGTH);
                }
                break;
            } /* end switch */
        } /* end while */

        lseek(fd, 0L, 0);                      /* rewrite the header  */
        write(fd, &head, sizeof(head));
        close(fd);
    }

    /* menu() ... local function to ask user for next operation.
       Returns numeric value of user response */
    static int menu()
    {
        char response[10];

        printf("\n\n\n\n          FILE UPDATING PROGRAM\n");
        printf("\n\nYou May Choose to:\n\n");
        printf("\t1.  Add a record to the end of the file\n");
        printf("\t2.  Retrieve a record for Updating\n");
        printf("\t3.  Leave the Program\n\n");
        printf("Enter the number of your choice: ");
        gets(response);
        return (atoi(response));
    }

    /* ask_info() ... prompts for the fields of a record,
       collecting them in recbuff */
    static void ask_info(char recbuff[])
    {
        int  i;
        char response[50];

        recbuff[0] = '\0';
        for (i = 0; *prompt[i] != '\0'; i++) {   /* get the fields */
            printf("%s", prompt[i]);
            gets(response);
            fld_to_recbuff(recbuff, response);
        }
    }

    /* ask_rrn() ... asks for a relative record number */
    static int ask_rrn()
    {
        char response[10];

        printf("\n\nInput the Relative Record Number of the Record that\n");
        printf("\tyou want to update: ");
        gets(response);
        return (atoi(response));
    }

    /* read_and_show() ... local function to read and display a record.
       Note that this function does not include a seek -- reading starts
       at the current position in the file */
    static void read_and_show()
    {
        char recbuff[MAX_REC_SIZE + 1];
        char field[MAX_REC_SIZE + 1];
        int  scan_pos, data_lgth;

        scan_pos = 0;
        read(fd, recbuff, REC_LGTH);
        printf("\n\n\n\nExisting Record Contents\n");
        recbuff[REC_LGTH] = '\0';   /* ensure that record ends with null */
        data_lgth = strlen(recbuff);
        while ((scan_pos = get_fld(field, recbuff, scan_pos, data_lgth)) > 0)
            printf("\t%s\n", field);
    }

    /* change() ... asks whether the record should be changed.
       Returns 1 for yes, 0 for no */
    static int change()
    {
        char response[10];

        printf("\n\nDo you want to change this record?\n");
        printf("    Answer Y or N, followed by <CR> ==> ");
        gets(response);
        ucase(response, response);
        return ((response[0] == 'Y') ? 1 : 0);
    }
Pascal Programs

The Pascal versions of the programs are listed in the pages that follow.

writstrm.pas  Writes out name and address information as a stream of consecutive bytes.
readstrm.pas  Reads the stream file and writes it to the screen.
writrec.pas   Writes name and address information as variable-length records preceded by length fields.
readrec.pas   Reads through the record file, breaking out the fields.
get.prc       Contains get_rec() and get_fld(), used in readrec.pas.
find.pas      Searches sequentially through a file for a record with a particular key.
update.pas    Opens or creates a fixed-length record file; new records can be added to the file, or old records can be changed.
stod.prc      Converts a variable-length strng to the fixed-length datarec type used in update.pas.

In addition to these files there is a file called tools.prc, contained in an appendix, which holds the string-handling tools used by all of the programs. The files get.prc, stod.prc, and tools.prc do not contain main programs; they are included by the programs that use them.
Writstrm.pas

Some remarks about writstrm.pas are in order. The {$B-} directive near the top is a directive to the Turbo Pascal compiler telling it to handle keyboard input as a standard Pascal file; without this directive we could not use a WHILE loop to test for an empty response. Since we choose not to use Turbo Pascal's nonstandard string type, we come closer to conforming to standard Pascal by defining our own strng type, which is a packed array of characters. The length of the strng is stored in position 0 of the array, so ORD(X), where X is the character in position 0, is the length of the string; all further operations on strngs follow this convention. Another directive, {$I tools.prc}, includes the file tools.prc, which contains the string-handling routines used here.

    PROGRAM writstrm (INPUT,OUTPUT);

    { creates a name and address file that is strictly a stream
      of consecutive bytes }

    {$B-}

    CONST
        DELIM_CHR    = '|';
        MAX_REC_SIZE = 255;

    TYPE
        strng    = packed array [0..MAX_REC_SIZE] of char;
        inp_list = (last, first, address, city, state, zip);
        filetype = packed array [1..40] of char;

    VAR
        response  : array [inp_list] of strng;
        resp_type : inp_list;
        filename  : filetype;
        outfile   : text;

    {$I tools.prc}

    BEGIN {main}
        write('Enter the name of the file: ');
        readln(filename);
        assign(outfile, filename);
        rewrite(outfile);

        write('Enter Last Name -- or <CR> to exit: ');
        read_str(response[last]);
        while len_str(response[last]) > 0 DO
        BEGIN
            write('   First name: ');
            read_str(response[first]);
            write('   Address: ');
            read_str(response[address]);
            write('   City: ');
            read_str(response[city]);
            write('   State: ');
            read_str(response[state]);
            write('   Zip: ');
            read_str(response[zip]);

            { write the responses to the file }
            for resp_type := last TO zip DO
                fwrite_str(outfile, response[resp_type]);

            { prepare for next entry }
            write('Enter Last Name -- or <CR> to exit: ');
            read_str(response[last])
        END;
        close(outfile)
    END.
Readstrm.pas

    PROGRAM readstrm (INPUT,OUTPUT);

    {$B-}

    CONST
        DELIM_CHR    = '|';
        MAX_REC_SIZE = 255;

    TYPE
        strng    = packed array [0..MAX_REC_SIZE] of char;
        filetype = packed array [1..40] of char;

    VAR
        filename  : filetype;
        infile    : text;
        str       : strng;
        fld_count : integer;
        fld_len   : integer;

    {$I tools.prc}

    FUNCTION readfield (VAR fd : text; VAR str : strng) : integer;
    VAR
        i  : integer;
        ch : char;
    BEGIN
        i  := 0;
        ch := ' ';
        while (not EOF(fd)) and (ch <> DELIM_CHR) DO
        BEGIN
            read(fd, ch);
            if ch <> DELIM_CHR then
            BEGIN
                i := i + 1;
                str[i] := ch
            END
        END;
        str[0] := CHR(i);      { store the length in position 0 }
        readfield := i
    END;

    BEGIN {MAIN}
        write('Enter the name of the file that you wish to open: ');
        readln(filename);
        assign(infile, filename);
        reset(infile);

        fld_count := 0;
        fld_len := readfield(infile, str);
        while (fld_len > 0) DO
        BEGIN
            fld_count := fld_count + 1;
            write('field # ', fld_count:2, ':  ');
            write_str(str);    { write_str() is in tools.prc }
            fld_len := readfield(infile, str)
        END;
        close(infile)
    END.
Writrec.pas

Note about writrec.pas: when the record length and the record are written to the output file, the integer length and the characters that follow it must be separated by a blank, so that the length can be read back as an integer.

    PROGRAM writrec (INPUT,OUTPUT);

    { creates a name and address file using a record length
      field ahead of each record }

    {$B-}

    CONST
        DELIM_CHR    = '|';
        MAX_REC_SIZE = 255;

    TYPE
        strng    = packed array [0..MAX_REC_SIZE] of char;
        filetype = packed array [1..40] of char;

    VAR
        filename : filetype;
        outfile  : text;
        response : strng;
        buffer   : strng;
        rec_lgth : integer;

    {$I tools.prc}

    PROCEDURE fld_to_buffer (VAR buff : strng; VAR s : strng);
    { appends the field s and a delimiter to the end of buff }
    VAR
        d_str : strng;
    BEGIN
        cat_str(buff, s);
        d_str[0] := CHR(1);
        d_str[1] := DELIM_CHR;
        cat_str(buff, d_str)
    END;

    BEGIN {main}
        write('Enter the name of the file you wish to create: ');
        readln(filename);
        assign(outfile, filename);
        rewrite(outfile);

        clear_str(buffer);
        write('Enter Last Name -- or <CR> to exit: ');
        read_str(response);
        while len_str(response) > 0 DO
        BEGIN
            fld_to_buffer(buffer, response);
            write('   First name: ');
            read_str(response);
            fld_to_buffer(buffer, response);
            write('   Address: ');
            read_str(response);
            fld_to_buffer(buffer, response);
            write('   City: ');
            read_str(response);
            fld_to_buffer(buffer, response);
            write('   State: ');
            read_str(response);
            fld_to_buffer(buffer, response);
            write('   Zip: ');
            read_str(response);
            fld_to_buffer(buffer, response);

            { write the record length and the record; the blank
              separates the integer from the characters }
            rec_lgth := len_str(buffer);
            write(outfile, rec_lgth, ' ');
            fwrite_str(outfile, buffer);

            { prepare for next entry }
            clear_str(buffer);
            write('Enter Last Name -- or <CR> to exit: ');
            read_str(response)
        END;
        close(outfile)
    END.
Readrec.pas

    PROGRAM readrec (INPUT,OUTPUT);

    {$B-}

    CONST
        DELIM_CHR    = '|';
        MAX_REC_SIZE = 255;

    TYPE
        strng    = packed array [0..MAX_REC_SIZE] of char;
        filetype = packed array [1..40] of char;

    VAR
        filename  : filetype;
        outfile   : text;
        rec_count : integer;
        fld_count : integer;
        scan_pos  : integer;
        rec_lgth  : integer;
        buffer    : strng;
        field     : strng;

    {$I tools.prc}
    {$I get.prc}

    BEGIN {main}
        write('Enter name of file to read: ');
        readln(filename);
        assign(outfile, filename);
        reset(outfile);

        rec_count := 0;
        rec_lgth := get_rec(outfile, buffer);
        while rec_lgth > 0 DO
        BEGIN
            writeln('Record ', rec_count:3);
            rec_count := rec_count + 1;
            fld_count := 1;
            scan_pos := 0;
            scan_pos := get_fld(field, buffer, scan_pos, rec_lgth);
            while scan_pos > 0 DO
            BEGIN
                write('   Field ', fld_count:2, ':  ');
                write_str(field);
                fld_count := fld_count + 1;
                scan_pos := get_fld(field, buffer, scan_pos, rec_lgth)
            END;
            rec_lgth := get_rec(outfile, buffer)
        END;
        close(outfile)
    END.
Get.prc

    FUNCTION get_rec (VAR fd : text; VAR buffer : strng) : integer;
    { A function that reads a record and its length from file fd.
      The function returns the length of the record. If EOF is
      encountered get_rec() returns 0 }
    VAR
        rec_lgth : integer;
        space    : char;
    BEGIN
        if EOF(fd) then
            get_rec := 0
        else
        BEGIN
            read(fd, rec_lgth);
            read(fd, space);
            fread_str(fd, buffer, rec_lgth);
            get_rec := rec_lgth
        END
    END;

    FUNCTION get_fld (VAR field : strng; VAR buffer : strng;
                      scanpos : integer; rec_lgth : integer) : integer;
    { extracts the next delimited field from buffer, starting at
      scanpos. Returns the new scan position, or 0 when there are
      no more fields }
    VAR
        fpos : integer;
    BEGIN
        if scanpos >= rec_lgth then
            get_fld := 0
        else
        BEGIN
            fpos := 1;
            scanpos := scanpos + 1;
            field[fpos] := buffer[scanpos];
            while (field[fpos] <> DELIM_CHR) and (scanpos < rec_lgth) DO
            BEGIN
                fpos := fpos + 1;
                scanpos := scanpos + 1;
                field[fpos] := buffer[scanpos]
            END;
            if field[fpos] = DELIM_CHR then
                field[0] := CHR(fpos - 1)   { do not count the delimiter }
            else
                field[0] := CHR(fpos);
            get_fld := scanpos
        END
    END;
Find.pas

    PROGRAM find (INPUT,OUTPUT);

    {$B-}

    CONST
        MAX_REC_SIZE = 255;
        DELIM_CHR    = '|';

    TYPE
        strng    = packed array [0..MAX_REC_SIZE] of char;
        filetype = packed array [1..40] of char;

    VAR
        filename   : filetype;
        outfile    : text;
        last       : strng;
        first      : strng;
        search_key : strng;
        key_found  : strng;
        field      : strng;
        buffer     : strng;
        matched    : boolean;
        rec_lgth   : integer;
        scan_pos   : integer;

    {$I tools.prc}
    {$I get.prc}

    BEGIN {main}
        write('Enter name of file to search: ');
        readln(filename);
        assign(outfile, filename);
        reset(outfile);

        { get the search key }
        write('Enter last name:  ');
        read_str(last);
        write('Enter first name: ');
        read_str(first);
        makekey(last, first, search_key);

        matched := FALSE;
        rec_lgth := get_rec(outfile, buffer);
        while ((not matched) and (rec_lgth > 0)) DO
        BEGIN
            scan_pos := 0;
            scan_pos := get_fld(last, buffer, scan_pos, rec_lgth);
            scan_pos := get_fld(first, buffer, scan_pos, rec_lgth);
            makekey(last, first, key_found);
            if cmp_str(key_found, search_key) = 0 then
                matched := TRUE
            else
                rec_lgth := get_rec(outfile, buffer)
        END;
        close(outfile);

        if matched then
        BEGIN
            writeln('Record found:');
            { break out the fields }
            scan_pos := 0;
            scan_pos := get_fld(field, buffer, scan_pos, rec_lgth);
            while scan_pos > 0 DO
            BEGIN
                write_str(field);
                scan_pos := get_fld(field, buffer, scan_pos, rec_lgth)
            END
        END
        else
            writeln('Record not found.')
    END.
Update.pas

Some remarks about update.pas: since standard Pascal cannot mix record types within one file, the file is declared as a file of datarec, a fixed-length record type, and strngs are converted to and from the datarec type (see stod.prc). The seek() statements used to jump to a record by relative record number are a feature of Turbo Pascal, not of standard Pascal.

    PROGRAM update (INPUT,OUTPUT);

    {$B-}

    CONST
        MAX_REC_SIZE = 255;
        REC_LGTH     = 64;
        DELIM_CHR    = '|';

    TYPE
        strng    = packed array [0..MAX_REC_SIZE] of char;
        filetype = packed array [1..40] of char;
        datarec  = RECORD
                       len  : integer;
                       data : packed array [1..REC_LGTH] of char
                   END;

    VAR
        filename    : filetype;
        outfile     : file of datarec;
        response    : char;
        menu_choice : integer;
        strbuff     : strng;
        head        : datarec;
        drecbuff    : datarec;
        rrn         : integer;
        rec_count   : integer;
        i           : integer;

    {$I tools.prc}
    {$I stod.prc}
    {$I get.prc}

    PROCEDURE fld_to_buffer (VAR buff : strng; VAR s : strng);
    { appends the field s and a delimiter to the end of buff }
    VAR
        d_str : strng;
    BEGIN
        cat_str(buff, s);
        d_str[0] := CHR(1);
        d_str[1] := DELIM_CHR;
        cat_str(buff, d_str)
    END;

    FUNCTION menu : integer;
    { asks the user for the next operation.
      Returns numeric value of user response }
    VAR
        choice : integer;
    BEGIN
        writeln;
        writeln('          FILE UPDATING PROGRAM');
        writeln;
        writeln('You May Choose to:');
        writeln;
        writeln('   1.  Add a record to the end of the file');
        writeln('   2.  Retrieve a record for updating');
        writeln('   3.  Leave the program');
        writeln;
        write('Enter the number of your choice: ');
        readln(choice);
        writeln;
        menu := choice
    END;

    PROCEDURE ask_info (VAR buff : strng);
    { prompts for the fields of a record, collecting them in buff }
    VAR
        response : strng;
    BEGIN
        clear_str(buff);        { clear the record buffer }
        write('   Last name: ');
        read_str(response);
        fld_to_buffer(buff, response);
        write('   First name: ');
        read_str(response);
        fld_to_buffer(buff, response);
        write('   Address: ');
        read_str(response);
        fld_to_buffer(buff, response);
        write('   City: ');
        read_str(response);
        fld_to_buffer(buff, response);
        write('   State: ');
        read_str(response);
        fld_to_buffer(buff, response);
        write('   Zip: ');
        read_str(response);
        fld_to_buffer(buff, response)
    END;

    FUNCTION ask_rrn : integer;
    VAR
        rrn : integer;
    BEGIN
        writeln('Input the relative record number of the record that');
        write('   you want to update: ');
        readln(rrn);
        writeln;
        ask_rrn := rrn
    END;

    PROCEDURE read_and_show;
    { procedure to read and display a record. This procedure does not
      include a seek -- reading starts at the current file position }
    VAR
        scan_pos  : integer;
        drecbuff  : datarec;
        i         : integer;
        data_lgth : integer;
        field     : strng;
        strbuff   : strng;
    BEGIN
        scan_pos := 0;
        read(outfile, drecbuff);

        { convert drecbuff to type strng }
        strbuff[0] := CHR(drecbuff.len);
        for i := 1 to drecbuff.len DO
            strbuff[i] := drecbuff.data[i];

        writeln('Existing Record Contents');
        writeln;
        data_lgth := len_str(strbuff);
        scan_pos := get_fld(field, strbuff, scan_pos, data_lgth);
        while scan_pos > 0 DO
        BEGIN
            write_str(field);
            scan_pos := get_fld(field, strbuff, scan_pos, data_lgth)
        END
    END;

    FUNCTION change : integer;
    { function to ask the user whether or not to change the
      record. Returns 1 if the answer is yes, 0 otherwise. }
    VAR
        response : char;
    BEGIN
        writeln('Do you want to change this record?');
        write('   Answer Y or N, followed by <CR> ==> ');
        readln(response);
        writeln;
        if (response = 'Y') or (response = 'y') then
            change := 1
        else
            change := 0
    END;

    BEGIN {main}
        write('Enter the name of the file: ');
        readln(filename);
        assign(outfile, filename);

        write('Does this file already exist? (respond Y or N): ');
        readln(response);
        writeln;
        if (response = 'Y') OR (response = 'y') then
        BEGIN
            reset(outfile);                { open outfile            }
            read(outfile, head);           { get header              }
            rec_count := head.len          { read in record count    }
        END
        else
        BEGIN
            rewrite(outfile);              { create outfile          }
            rec_count := 0;                { initialize record count }
            head.len := rec_count;
            for i := 1 to REC_LGTH DO
                head.data[i] := CHR(0);
            write(outfile, head)           { write header record     }
        END;

        menu_choice := menu;
        while menu_choice < 3 DO
        BEGIN
            CASE menu_choice OF
              1 : BEGIN                    { add a record }
                      writeln('Input the values for the new record:');
                      ask_info(strbuff);
                      stod(drecbuff, strbuff);
                      seek(outfile, rec_count + 1);
                      write(outfile, drecbuff);
                      rec_count := rec_count + 1
                  END;
              2 : BEGIN                    { retrieve and update }
                      rrn := ask_rrn;
                      if rrn > rec_count then
                          writeln('There is no record with that RRN.')
                      else
                      BEGIN
                          seek(outfile, rrn);
                          read_and_show;
                          if change = 1 then
                          BEGIN
                              writeln('Input the revised Values: ');
                              ask_info(strbuff);
                              { convert strbuff to type datarec }
                              stod(drecbuff, strbuff);
                              seek(outfile, rrn);
                              write(outfile, drecbuff)
                          END
                      END
                  END
            END; { CASE }
            menu_choice := menu
        END; { while }

        head.len := rec_count;             { rewrite the header }
        seek(outfile, 0);
        write(outfile, head);
        close(outfile)
    END.
Stod.prc

    PROCEDURE stod (VAR drecbuff : datarec; VAR strbuff : strng);
    { converts a variable-length strng to the fixed-length
      datarec type used in update.pas }
    VAR
        i : integer;
    BEGIN
        drecbuff.len := min(REC_LGTH, len_str(strbuff));
        for i := 1 to drecbuff.len DO
            drecbuff.data[i] := strbuff[i];

        { Clear the rest of the buffer }
        i := drecbuff.len + 1;
        while i <= REC_LGTH DO
        BEGIN
            drecbuff.data[i] := ' ';
            i := i + 1
        END
    END;
Organizing Files for Performance

CHAPTER OBJECTIVES

Look at several approaches to data compression.
Look at storage compaction as a simple way of reusing space in a file.
Develop a procedure for deleting fixed-length records that allows vacated file space to be reused dynamically.
Illustrate the use of linked lists and stacks to manage an avail list.
Consider several approaches to the problem of deleting variable-length records.
Introduce the concepts associated with internal and external fragmentation.
Outline some placement strategies associated with the reuse of space in a variable-length record file.
Provide an introduction to the idea underlying a binary search.
Develop a keysort procedure for sorting larger files; investigate the costs associated with keysort.
Introduce the concept of a pinned record.
CHAPTER OUTLINE

5.1 Data Compression
    5.1.1 Using a Different Notation
    5.1.2 Suppressing Repeating Sequences
    5.1.3 Assigning Variable-Length Codes
    5.1.4 Irreversible Compression Techniques
    5.1.5 Compression in UNIX
5.2 Reclaiming Space in Files
    5.2.1 Record Deletion and Storage Compaction
    5.2.2 Deleting Fixed-Length Records for Reclaiming Space Dynamically
    5.2.3 Deleting Variable-Length Records
    5.2.4 Storage Fragmentation
    5.2.5 Placement Strategies
5.3 Finding Things Quickly: Internal Sorting and Binary Searching
    5.3.4 Sorting a Disk File in RAM
    5.3.5 The Limitations of Binary Searching and Internal Sorting
5.4 Keysorting
    5.4.1 Description of the Method
    5.4.2 Limitations of the Keysort Method
    5.4.3 Pinned Records
5.4.3
We
how
consider
how
a file
to
is
Method
important
be accessed
file
when
some
for the
is
cases reorganize,
file
deciding on
organize, or in
it is
we
a little different.
files in
system designer to
how
to create fields
continue to focus on
We look at
ways
to
need to
improve performance.
In the first section
smaller.
we
look
Compression techniques
at
how we
let
us
make
organize
files
files
to
make them
file.
sorting
them
we examine
the
DATA COMPRESSION
85
the
5.1
file.
Data Compression
In this section
reasons for
Use
we
look
making
at
files
some ways
to
make
smaller. Smaller
smaller.
files
Can be
Can be processed
as to take
up
al-
faster sequentially.
way
files
less space.
Many
in a file in
such
Some are very general and some are designed only for
of data, such as speech, pictures, text, or instrument data. The
variety of data compression techniques is so large that we can only touch on
the topic here, with a few examples.
compressing
data.
specific kinds
5.1.1 Using a Different Notation

Remember our name and address file from Chapter 4? Fields such as these are good candidates for compression. The state field, for example, was stored as two ASCII bytes, 16 bits. How many bits are really required for this field? Since there are only 50 states, we could represent all state names with as few as 6 bits; more conveniently, we could encode all state names in a single one-byte field, resulting in a space savings of one byte, or 50%, per occurrence of the state field.

This type of compression technique, in which we decrease the number of bits by finding a more compact notation, is one of many compression techniques classified as redundancy reduction. The 10 bits that we were able to throw away were redundant in the sense that having 16 bits instead of 6 provided no extra information. (The two-letter abbreviation is itself a compact notation for the full state name.)
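As a sketch of how such an encoding might be implemented in C (the table shown is abbreviated; a real one would list all 50 states):

    /* Sketch: compressing the two-character state field to a single
       byte by table lookup. The table is abbreviated for illustration. */
    #include <string.h>

    static const char *states[] = { "AK", "AL", "AR", "AZ", "CA" /* ... */ };
    #define NSTATES (sizeof(states) / sizeof(states[0]))

    /* encode: two ASCII bytes -> a one-byte code (the table index) */
    int state_code(const char *abbrev)
    {
        int i;
        for (i = 0; i < NSTATES; i++)
            if (strncmp(states[i], abbrev, 2) == 0)
                return i;
        return -1;    /* unknown state */
    }

    /* decode: one-byte code -> two ASCII bytes */
    const char *state_name(int code)
    {
        return (code >= 0 && code < NSTATES) ? states[code] : 0;
    }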
What are the costs of this compression scheme? In our example there are many:

By using a binary encoding, we have made the file unreadable by humans.
We incur some cost in encoding time whenever we add a new state-name field to our file, and a similar cost for decoding when we need to get a readable version of state name from the file.
We must also now incorporate the encoding and/or decoding modules in all software that will process our address file, increasing the complexity of the software.

5.1.2 Suppressing Repeating Sequences

Consider an image file in which long runs of pixels often have the same value. Sparse arrays of this sort are very good candidates for a kind of compression called run-length encoding. We choose one special, rarely occurring byte value as a run-length indicator; the pixels that make up the image are then copied in sequence, except that when the same value occurs several times in succession we substitute three bytes for the run: the run-length indicator, the repeated value, and the number of times the value occurs.
Suppose we use the value ff (hexadecimal) as our run-length indicator, and we find that ff does not otherwise occur in the image. How would we encode the following sequence of hexadecimal byte values?

22 23 24 24 24 24 24 24 24 25 26 26 26 26 26 26 25 24

The first two pixels are copied in sequence. The runs of 24 and 26 are both run-length encoded. The remaining pixels are copied in sequence. The resulting sequence is

22 23 ff 24 07 25 ff 26 06 25 24
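Here is a minimal C sketch of the encoding scheme just described; it assumes, as the example does, that 0xff never occurs in the input and that no run is longer than 255 bytes.

    /* Sketch of run-length encoding: 0xff is the run-length
       indicator, and runs become (0xff, value, count). */
    #define RUN_CODE 0xff

    /* returns the number of bytes placed in out[] */
    int rle_encode(const unsigned char *in, int n, unsigned char *out)
    {
        int i = 0, j = 0;
        while (i < n) {
            int run = 1;
            while (i + run < n && in[i + run] == in[i] && run < 255)
                run++;
            if (run >= 3) {               /* encode the run        */
                out[j++] = RUN_CODE;
                out[j++] = in[i];
                out[j++] = (unsigned char) run;
            } else {                      /* copy short runs as-is */
                int k;
                for (k = 0; k < run; k++)
                    out[j++] = in[i];
            }
            i += run;
        }
        return j;
    }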
Run-length encoding does not guarantee any particular amount of space savings; under some circumstances it can even make a file larger. (Why is this?)

5.1.3 Assigning Variable-Length Codes

Suppose you have two different symbols to use in an encoding scheme: a dot (".") and a dash ("-"). You know that some values occur more frequently than others, so the codes for those values should take the least amount of space; this is yet another form of redundancy reduction. One of the most successful codes of this kind is the Huffman code, which determines the probabilities of each value occurring in the data set and then builds a binary tree in which the search path for each value represents the code for that value. More frequently occurring values are given shorter search paths, and hence shorter codes. Suppose we have a set of seven letters occurring with the probabilities indicated below. The third row shows the Huffman codes that would be assigned to each letter:
Letter:        a     b     c     d     e     f     g
Probability:  0.4   0.1   0.1   0.1   0.1   0.1   0.1
Code:          1    010   011  0000  0001  0010  0011
Since the letter a occurs much more frequently than the others, it is assigned the 1-bit code 1; the rarer letters require as many as four bits. Using this code, the word "abde" would be encoded as the bit sequence 1 010 0000 0001. Because no code is the prefix of any other code, no delimiters are needed between the codes, and the individual codes can still be recognized.
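Because the codes have this prefix property, a decoder can match them greedily. Here is a minimal C sketch using the table above, with the bit string represented (purely for illustration) as a string of '0' and '1' characters.

    /* Sketch: decoding a prefix-free code by greedy matching. */
    #include <stdio.h>
    #include <string.h>

    static const char *codes[]   = { "1", "010", "011", "0000",
                                     "0001", "0010", "0011" };
    static const char  letters[] = "abcdefg";

    void decode(const char *bits)
    {
        while (*bits) {
            int i, matched = 0;
            for (i = 0; i < 7; i++) {
                int len = strlen(codes[i]);
                if (strncmp(bits, codes[i], len) == 0) {
                    putchar(letters[i]);   /* emit the decoded letter */
                    bits += len;
                    matched = 1;
                    break;
                }
            }
            if (!matched) break;           /* invalid bit sequence */
        }
        putchar('\n');
    }

For example, decode("101000000001") prints "abde".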
The compression techniques we have discussed so far preserve all of the information in the original data. In effect, they take advantage of the fact that the data, in its original form, contains redundant information that can be removed and then reinserted at a later time.

5.1.4 Irreversible Compression Techniques

Another type of compression, irreversible compression, is based on the assumption that some information can be sacrificed. An example is shrinking a raster image by storing only one pixel for each block of neighboring pixels; information is lost, but an image resembling the original can be synthesized from the pixels that remain. Irreversible compression is less common in data files than reversible compression, but there are times when the information that is lost is of little or no value. Speech compression by voice coding, for example, is often done with acceptable amounts of distortion.†

5.1.5 Compression in UNIX

System V UNIX provides utilities that compress files using variable-length codes. When such a utility compresses a file, it appends ".Z" to the end of the file's name, signalling to anyone using the file, and to the decompression utility, that the file is compressed.

†Irreversible compression is sometimes called entropy reduction, to emphasize that the amount of information in the data is actually reduced.
5.2 Reclaiming Space in Files

Suppose a record in a variable-length record file is modified in such a way that the new record is longer than the original. What do you do with the extra data? You could append it to the end of the file and put a pointer from the original record space to the extension of the record. Or you could rewrite the whole record at the end of the file (unless the file needs to be sorted), leaving a hole where the old version was stored. Each approach leaves unused space inside the file. There are three ways a file can change that raise this problem:

Record addition;
Record updating; and
Record deletion.

If the only kind of change is record addition, there is no problem: we simply append each new record at the end of the file. It is only when records are updated or deleted that space reclamation becomes interesting, and since an update can be treated as a deletion followed by an addition, we concentrate on the effects of record deletion.

5.2.1 Record Deletion and Storage Compaction

Storage compaction makes files smaller by looking for places in a file where there is no data at all and recovering that space.
Any record-deletion strategy must give us a way to recognize records as deleted. A simple and usually workable approach is to place a special mark in each deleted record. For example, in the name and address file with fixed-length records developed in Chapter 4, we might place an asterisk in the first field of a deleted record. Figures 5.3(a) and 5.3(b) show such a file before and after the second record is marked as deleted. (The dots at the ends of records 0 and 2 represent padding between the last field and the end of each record.)

Once we can recognize a record as deleted, compacting the file takes very little effort: a program reads through the file and writes the records out again, leaving out any record that is marked as deleted (Fig. 5.3c). It is also possible, as in our example, to put the deletion mark in a special field rather than over the record's data. Deletion of records happens over some period of time; the compaction of the file happens all at once, when a special compaction program is run.
FIGURE 5.3 Storage requirements of sample file using 64-byte fixed-length records. (a) Before deleting the second record. (b) After deleting the second record. (c) After compaction; the second record is gone.
The decision about how often to run the compaction program can be based on either the number of deleted records or on the calendar. In accounting programs, for example, it often makes sense to run the compaction program at the end of the fiscal year or at some other point associated with closing out the books.

5.2.2 Deleting Fixed-Length Records for Reclaiming Space Dynamically

Storage compaction is the simplest and most widely used of the storage reclamation methods we discuss. There are some applications, however, that are too volatile and interactive for storage compaction to be useful. In these situations we want to reuse the space from deleted records as soon as possible. We begin our discussion of such dynamic storage reclamation with a second look at fixed-length record deletion, since fixed-length records make the reclamation problem much simpler.

In general, to provide a mechanism for record deletion with subsequent reutilization of the freed space, we need to be able to guarantee two things:
discuss.
We
mark records
as deleted
take the
To make
record reuse
A way
A way
Linked
to
to
Lists
structure in
its
a linked
list
linked
list
file;
and
all
of the
is
a data
successor in the
list.
to
RECLAIMING SPACE
193
IN FILES
list.
If you have a head reference to the first node in the list, you can move
through the list by looking at each node, and then at the node's pointer
field, so you know where the next node is located. When you finally
encounter a pointer field with some special, predetermined end-of-list
value, you stop the traversal of the list. In Fig. 5.4 we use a -1 in the pointer field to mark the end of the list.
When a list is made up of deleted records that have become available for reuse within the file, the list is usually called an avail list. When we insert a new record into the file, we take a slot from the avail list.

Stacks

The simplest way to handle a list is as a stack. A stack is a list in which all insertions and removals of nodes take place at one end. So, if we have an avail list managed as a stack that contains RRNs 5 and 2, and record 3 is then deleted, RRN 3 is pushed onto the front of the stack, and the head pointer references it.

(Figure 5.4: a linked avail list of relative record numbers managed as a stack; the head pointer references the first available RRN, each node's pointer field references the next, and a pointer value of -1 marks the end of the list.)
This is all we need to manage the avail list as a stack: pushing a newly deleted record onto the list, and popping a slot from the top of the list when we need space for a new record, returns the list to a valid state in either direction. Now we need to decide where to keep the stack. Where do we keep the head reference, so we know where to find the top of the stack? Is it a separate list, perhaps maintained in a separate file, or is it somehow embedded within the data file? Once again, we need to think about where to put our data structures. The answer here is pleasantly simple: the deleted records themselves can serve as the nodes of the stack. They are not moved anywhere when they are pushed onto the stack; they stay right where they are located in the file, and we use space inside them to hold the links.
Since the deleted records are in the file itself, we can make one record point to the next. Suppose we are working with a fixed-length record file that once contained seven records (RRNs 0 through 6), that records 3 and 5 have been deleted, in that order, and that deleted records are marked by replacing the first field with an asterisk. We can then use the second field of a deleted record to hold the link to the next record on the avail list. Leaving out the details of the valid, in-use records, Fig. 5.5(a) shows how the file might look.

Record 5 is the first record on the avail list (top of the stack) since it is the record that is most recently deleted. Following the linked list, we see that record 5 points to record 3. Since the link field for record 3 contains -1, which is our end-of-list marker, we know that record 3 is the last slot available for reuse. Figure 5.5(b) shows the file after record 1 is also deleted. Note that the contents of all the other records on the avail list remain unchanged. Treating the list as a stack results in a minimal amount of list reorganization when we push and pop records to and from the list.

If we now add a new name to the file, it is placed in record 1, since RRN 1 is the first available record. The avail list then returns to the configuration shown in Fig. 5.5(a), with record 5 at the head.
RRN 1 is the first available record. The avail list would return to the
RECLAIMING SPACE
List
head
(first
available record) -* 5
2
Edwards
Bates
195
IN FILES
Wills
*-l
Masters
Masters
*3
Chavez
*3
Chavez
(a)
List
head
(first
available record)
*5
Edwards
Wills
*-l
(b)
List
head
(first
available record)
1st
new
Wills
rec
Edwards
Masters
2nd new
Chavez
rec
(c)
file
showing linked
new
lists of
deleted records,
records 3, 5, and
1,
in
(a)
records.
shown
configuration
the avail
list,
we
avail list
at
is
file.
name
empty and
still
another
is
added to the
that the
name
file,
file
two record
on
slots
without increasing
list
the
new
record
file.
linked avail
We
list
RRN
and
'
96
(or some other special mark) at the beginning of the record as a deletion
mark, followed by the RRN of the next record on the avail list.
Once we have a list of available records within a file, we can reuse the space: a routine can return the RRN of a reusable record slot when one exists, or signal that the new record must be appended at the end of the file if no reusable slots are available. In summary, to support this kind of reuse we need:
A way to link deleted records into a list (i.e., a place to put a link field);
An algorithm for adding newly deleted records to the avail list; and
An algorithm for finding and removing records from the avail list when we need them.
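Here is a minimal C sketch of stack-style avail-list maintenance for the fixed-length record case just summarized; the 32-byte header, the ASCII representation of the link, and the function names are illustrative assumptions.

    /* Sketch: push and pop on an avail list embedded in a fixed-
       length record file. The header (assumed 32 bytes) holds the
       RRN of the first available slot in *head (-1 if none); each
       deleted record holds '*' plus the RRN of the next free slot. */
    #include <stdio.h>

    #define REC_LGTH 64

    /* push: mark record rrn deleted and make it the list head */
    void push_avail(FILE *fp, long *head, long rrn)
    {
        char slot[REC_LGTH] = { '*' };
        sprintf(slot + 1, "%ld", *head);        /* link to old head */
        fseek(fp, 32 + rrn * REC_LGTH, SEEK_SET);
        fwrite(slot, REC_LGTH, 1, fp);
        *head = rrn;                            /* new top of stack */
    }

    /* pop: return the RRN of a reusable slot, or -1 if none */
    long pop_avail(FILE *fp, long *head)
    {
        char slot[REC_LGTH + 1];
        long rrn = *head;
        if (rrn == -1)
            return -1;                          /* avail list empty */
        fseek(fp, 32 + rrn * REC_LGTH, SEEK_SET);
        fread(slot, REC_LGTH, 1, fp);
        slot[REC_LGTH] = '\0';
        sscanf(slot + 1, "%ld", head);          /* next becomes head */
        return rrn;
    }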
5.2.3 Deleting Variable-Length Records

An avail list of variable-length records works the same way, but it differs from the fixed-length case in one regard: we cannot use RRNs for the links. We can still mark a deleted record by placing a single asterisk in its first field, followed by a binary link field; but instead of their RRNs, the links hold the byte offsets of the records themselves.
To illustrate, suppose we begin with the variable-length record file introduced earlier, containing the records for Ames, Morrison, and Brown. Figure 5.6(a) shows the file (without its header); Fig. 5.6(b) shows it after deletion of the second record, where the periods represent discarded characters.
HEAD.FIRST_AVAIL: -1
40 Ames|John|123 Maple|Stillwater|OK|74075| 64 Morrison|Sebastian|9035 South Hillcrest|Forest Village|OK|74820| 45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|
(a)

HEAD.FIRST_AVAIL: 43
40 Ames|John|123 Maple|Stillwater|OK|74075| 64 * -1 ........................................................ 45 Brown|Martha|625 Kimbark|Des Moines|IA|50311|
(b)

FIGURE 5.6 Sample file. (a) Original sample file stored in variable-length format with byte count (header record not included). (b) Sample file after deletion of the second record (periods show discarded characters).
of adding
moment we
we
find that
it
is
sometimes useful
It is
the
new
possible,
record
at
file.
98
Size
Removed record
72
(b)
FIGURE 5.7 Removal of a record from an avail list with variablelength records, (a) Before removal, (b) After removal.
Since this procedure for finding a reusable record looks through the
entire avail
list
if
necessary,
we do
not need
somewhere on
it.
list,
a sophisticated
If a
list,
just as
method
for
is
follows that
It
the
this
list.
is left
The
of our three-record
we
file
use as
padding between the last field and the end of the records. The padding is
wasted space; it is part of the cost of using fixed-length records. Wasted
space within
Clearly,
record
is
we want
to
we
records,
file
If
we
are
RECLAIMING SPACE
IN FILES
99
record.
a certain
file
is
as close as possible to
actual data
is
fixed in length,
One of the
choosing
in a fixed-length record
is
that they
file.
minimize
wasted space by doing away with internal fragmentation. The space set
aside for each record is exactly as long as it needs to be. Compare the
fixed-length example with the one in Fig. 5.9, which uses the variablelength record structure
a byte count followed by delimited data fields.
The only space (other than the delimiters) that is not used for holding data
in each record is the count field. If we assume that this field uses two bytes,
this amounts to only six bytes for the three-record file. The fixed-length
record file wastes 24 bytes in the very first record.
But before we start congratulating ourselves for solving the problem of
wasted space due to internal fragmentation, we should consider what
happens in a variable-length record file after a record is deleted and replaced
with a shorter record. If the shorter record takes less space than the original
record, internal fragmentation results. Figure 5.10 shows how the problem
FIGURE 5.10 Illustration of fragmentation with variable-length records, (a) After deletion of
the second record (unused characters in the deleted record are replaced by periods), (b) After
the subsequent addition of the record for Al Ham.
40 Ames
*
[
-1
45 Brown Martha 62
]
(a)
HEAD. FIRST_AVAIL: -1
40 Ames John; 123 Maple Stillwater OK 74075
OK J70332;
5 KimbarkiDes Moines IA 50311
!
(b)
200
HEAD. FIRST_AV'AIL: 43
FIGURE 5.1 1 Combatting internal fragmentation by putting the unused part of the deleted
back on the avail list.
slot
is
when
file
file is
deleted
added:
Ham|Al|28 E lm|Ada|OK|70332|
appears that escaping internal fragmentation
It
we
is
is
is
record. Since
37 bytes
list
Since
for the
Ham
record,
there
form
new
The 35
bytes
on the
still
Ham
avail
record.)
list
Figure 5.12 shows the effect of inserting the following 25-byte record:
Lee|Ed|Rt
2|Ada|OK|74820|
As we would expect, the new record is carved out of the 35-byte record that
is on the avail list. The data portion of the new record requires 25 bytes, and
slot originally
leted record.
HEAD,
FIRST AVAIL: 43
1
-1 ... 25 Lee Ed
40 Ames John 123 Maple Stillwater OK 74075 8 *
Rt 2 Ada OK 74820 26 Ham Al 28 Elm Ada OK 70332 45 Brown Martha 6
25 Kimbarkl Des Moines IA 50311
;
RECLAIMING SPACE
then
in the record
still
on the
avail
201
IN FILES
field.
list.
What are the chances of finding a record that can make use of these eight
Our guess would be that the probability is close to zero. These eight
bytes?
bytes are not usable, even though they are not trapped inside any other
The
record. This
is
the avail
list
space
is
actually
is
on
too
fragmented to be reused.
There are some interesting ways to combat external fragmentation.
One way, which we discussed at the beginning of this chapter, is storage
compaction. We could simply regenerate the file when external fragmentation becomes intolerable. Two other approaches are as follows:
If two record slots on the avail list are physically
them to make a single, larger record slot. This is
adjacent,
combine
Try
provides
on the
discussion
developing
is
of
if there are
no reason
avail
this
to
two
list is
presume
Exercise 15
list.
slot
at
framework
for
a solution.
The development of
warrants
matter.
It
among
alternative strategies
is
a topic that
is
not
however,
is
a different
obvious
as
it
might seem
at first
glance.
we
discussed
ways
to
We
list.
202
or whether
We
it is
a perfect
is
what
is
needed
fit.
records
looking for the place to insert the record to maintain the desired sequence.
If we order the avail list in ascending order by size, what is the effect on
the closeness of
fit
list?
Since the
between the
available slot
goes on.
The procedure
only
for
at the first
enough
list.
If the first
so
it
looks
record slot
is
not
large
to
203
What can you conclude from all of this? It should be clear that no one
placement strategy is superior for all circumstances. The best you can do is
formulate a series of general observations and then, given a particular design
seems most appropriate. Here are
have to be yours.
some
'
suggestions.
The judgment
will
files.
With fixed-length
to volatile, vari-
records, placement
is
sim-
&x
If space
first fit
5.3
accessing
magnify
that the
disk access
would
take 58 days.
So far we have not had to pay much attention to this cost. This section,
then, marks a kind of turning point. Once we move from fundamental
organizational issues to the matter of searching a file for a particular piece of
information, the cost of a seek becomes a major factor in determining our
And what is true for searching is all the more true for sorting. If
you have studied sorting algorithms, you know that even a good sort
involves making many comparisons. If each of these comparisons involves
approach.
a seek,
the sort
Our
is
agonizingly slow.
We
beyond simply
ways
to order
major
is
in
Files
any other
to retrieve or find
204
to look for
is
it
by
relative record
number (RRN). If the file has fixed-length records, knowing the RRN
us compute the record's byte offset and jump to it using direct access.
But what if we do not
want? How likely is it that
"What
RRN
of the record
we
RRN
is
much more
question
know
lets
is
know
likely to
more
"What
is
by
its
We
file is
of 1,000
KELLY BILL
by comparing
start
file
canonical form of the search key) with the middle key in the
file,
(the
which
is
whose RRN is 500. The result of the comparison tells us which half
of the file contains Bill Kelly's record. Next, we compare KELLY BILL with
the middle key among records in the selected half of the file to find out
the key
file Bill
Kelly's record
is
found or
we
is in.
This process
is
repeated
is
shown
comparisons to find
that
it is
not in the
is
in Fig. 5.13.
file.
Compare
this
w ith
T
An
algorithm for
in the
a sequential
at
most 10
or to determine
file,
of
a file
at
most
"
|_log FtJ
comparisons 1
"In this text, log x refers to the logarithm function to the base
intended,
it
is
so indicated.
2.
When
is
/*
205
*/
FUNCTION:
LOW :=0
HIGH := REC0RD_C0UNT
KEY_S0UGHT, REC0RD_C0UNT
/*
-
/*
find
/*
*/
1
*/
midpoint
*/
if
I*
/*
*/
*/
/
*/
else
return(GUESS)
endwh i
/*
*/
return (-1)
/*
if
loop completes,
)
function
in
found
*/
pseudocode.
A binary search is
wj
+-
comparisons.
therefore said to be
of the same
O (log n).
file
In contrast,
requires at
most
you may
recall
n comparisons,
and
on average
Vi n,
would
take at
most
1
-I-
[Jog 2,000J
=11
comparisons,
206
whereas
sequential search
-n
would average
=
1,000 comparisons,
File in
RAM
Consider the operation of any internal sorting algorithm with which you
The algorithm requires multiple passes over the list that is to be
sorted, comparing and reorganizing the elements. Some of the items in the
list are moved a long distance from their original positions in the list. If such
an algorithm were applied directly to data stored on a disk, it is clear that
there would be a lot of jumping around, seeking, and rereading of data.
This would be a very slow operation
unthinkably slow.
are familiar.
If the entire
alternative
is
contents of the
file
way we
can access
can be held in
sort.
it
RAM,
into
very attractive
having to incur the cost of a lot of seeking and the cost of multiple passes
over the disk.
This is one instance of a general class of solutions to the problem of
minimizing disk usage: Force your disk access into a sequential mode,
performing the more complex, direct accesses in RAM.
Unfortunately, it is often not possible to use this simple kind of
solution, but when you can, you should take advantage of it. In the case of
sorting, internal sorts are increasingly viable as the
increases.
which
in
good
sorts files in
Chapter
7.
illustration
RAM if
it
of an internal sort
is
UNIX
utility
sort utility,
is
described
207
Problem
Two
1:
Accesses
|_log
RRN
approach
performance,
retrieval
while
maintaining
still
the
Problem
we
2:
begin to look
Keeping
at
ways
to
a File Sorted Is
move toward
Very Expensive
often as
we
we
are
this ideal.
it:
working with
We must keep
a file to
we
Our
the
file
ability to
in sorted
leave the
in
file
unsorted
If,
as
an alternative,
substantially
on the
But we encounter
all
we
keep the
file
in sorted order,
difficulty
when we add
it
a record, since
new
we
can cut
to a handful
we want
down
of accesses.
file
to
keep
requires,
we
on the average,
that
actually
unsorted
The
file.
costs
of maintaining
a file that
it is
is
required
208
much more
frequently than
is
more than
As another example,
the
This can be an
circumstance, the
of keeping the
many
file
which record
additions can be accumulated in a transaction file and made in a batch mode.
By sorting the list of new records before adding them to the main file, it is
possible to merge them with the existing records. As we see in Chapter 7,
such merging is a sequential process, passing only once over each record in
file.
So, despite
appears to be
searching also
its
there are
efficient, attractive
us see
what
However, knowing
to the
file.
useful strategy.
lets
applications in
have to meet
will
at least
They
flr
y^iew record
is
file
They
tially
more
we
indexes.
They can
of the
file.
fall
Problem
3:
An
Internal Sort
works only
sort
if
is
we
is
we
An
file
ability
internal
into the
cannot do
that,
we
then
so large that
in order.
Our
file.
file
Files
into each of
a variation
large a
file it
its
limit
is
larger.
is
on
internal sorting
limited in terms of
how
new approach
to the
problem of finding
5.4
when
added; and
file.
Keysorting
Keysort,
sort a
keys;
sometimes referred
to as tag sort,
is
when we
RAM the only things that we really need to sort are the record
into RAM during the
therefore, we do not need to read the whole
file in
file
we
file
into
RAM,
sort
KEYSORTING
in
the
according to the
file
files
than
209
new
memory,
it
same amount of
RAM.
5.4.1 Description of the Method
To keep
record
things simple,
file
RAM
in
by internal sort
**
KEYNODES
array
Records
RRN
KEY
HARRISON SUSAN
Harrison Susan
KELLOG BILL
KelloglBilll 17 Maple...
HARRIS MARGARET
Harris
BELL ROBERT
Bell! Robert
In
RAM
387 Eastern...
8912
Hill...
On secondary store
210
%/
3 -^7*
.
KEYNODES array
Records
RRN
KEY
BELL ROBERT
Harrison Susan
HARRIS MARGARET
KelloglBUIl 17 Maple...
HARRISON SUSAN
Harris
KELLOG BILL
In
Bell
RAM
Robert 8912
I
Hill...
On secondary store
in Fig. 5.15.
way
387 Eastern..
KEYNODES
array and
file
file,
are
now
RAM
sequenced in such
according to
for
i:
to
number of records
file
to the record
whose
RRN
is
KEYNODES[i].RRN.
Read
this
record into
buffer in
RAM.
file.
cedure works
much
the
same way
that a
normal
internal sort
would work,
RAM
j card
it;
and
When we
read
them
all
we have
stored in
to
RAM.
211
KEYSORTING
PROGRAM: keysort
open input file as
N F LE
create output file as DUT_FILE
I
REC_CDUNT
/*
for
read in records;
1
:=
to
up KEYNODES array */
set
REC_C0UNT
KEY
KEY
thereby ordering RRNs correspondingly */
sort KEYNODESC
sort (KEYNODES, REC_C0UNT)
/*
/*
/*
for
out
i
:=
to
REC_C0UNT
close
end PROGRAM
*/
*/
].
RRN
for keysort.
fUUJ&
?/w>
file in
RAM
at
once.
But, while reading about the operation of writing the records out in
sorted order, even a casual reader probably senses a cloud on this apparently
before
we
desirable.
Look
them out
to the
input
sequentially. Instead,
file
is
worse than
that.
carefully at the for loop that reads in the records before writing
new
file.
You
KEYNODES[]
and read
we
are
to the
it
we
RRNs
through the
in sorted order,
moving
we have
working
in before writing
it
21 2
sorted
file
requires as
many random
As we have noted
records.
number of
difference
to
times,
read
all
What
is
worse,
we
file as
there
is
the records in a
same records
are
there are
an enormous
if
performing
file.
file
we must
of these
all
file,
The getting-something-for-nothing
by the
KEYNODES[]
array
not
is
at all a
file
Why
The fundamental
an entire record
when
is
File
an attractive one:
Back?
Why work
with
searching are concerned, are the fields used to form the key? There
is
all
interesting to ask
file
this
giving us trouble:
is
What
if
we just
skip the
two
This
5.17.
If
some
it
altogether.
KEYSORTING
Index
Original
file
file
BELL ROBERT
Harrison
HARRIS MARGARET
Kellogg
HARRISON SUSAN
Harris
KELLOGG BILL
file
Bill
I
213
17
Maple.
file.
talk
access
Much
record.
When a
we
file
particular choice
these
files
of terminology
containing an avail
that cannot
You
you consider
(such as an index
become what
file)
is
file
pinned record
or in
is
some other
one
file
moved,
these references
no longer lead
to the record;
file.
of deleted records.
they
list
if
file
if
can
make
we want
sorting
more
to support rapid
21
access
by key, while
deletion?
One
solution
still
is
to use an index
the
a
file
to
file in its
made
available
Once again,
we need to take
original order.
close look at the use of indexes, which, in turn, leads us to the next
chapter.
SUMMARY
we look at ways to organize or reorganize files to improve
performance in some way.
Data compression methods are used to make files smaller by re-encoding
data that goes into a file. Smaller files use less storage, take less time to
transmit, and can often be processed faster sequentially.
The notation used for representing information can often be made more
compact. For instance, if a two-byte field in a record can take on only 50
values, the field can be encoded using only 6 bits instead of 16. Another
form of compression called run-length encoding encodes sequences of
repeating values, rather than writing all of the values in the file.
A third form of compression assigns variable-length codes to values
depending on how frequently the values occur. Values that occur often are
given shorter codes, so they take up less space. Huffman codes are an example
of variable-length codes.
In this chapter
Some compression
techniques are
The
irreversible in that
UNIX utilities
In
fixed-length
record
file,
when
there
internal
is
variable-length record
file
when one
a record.
record
is
It
when
by record
SUMMARY
when
finding the
first field
of a fixed-length record
all
records in
is
mark
a special
fixed-length record
record
are the
file
same
size,
the reuse
by stringing together
all
form
a linked
list
of
other
added to the
slots are
file,
slot
is
slot;
avail
list is
list
removed from
to treat
it
as a stack.
Newly
way
to
available records
the avail
list.
Next,
we
form
still
linked
of available record
list
We
slots,
we need to be sure that a record slot is the right size to hold the new
Our initial definition of right size is simply in terms of being big
enough. Consequently, we need a procedure that can search through the
avail list until it finds a record slot that is big enough to hold the new record.
records
record.
Given such
on the
deleted records
and
a function,
avail
file
internally if the
develop
new
because
two or more
into
a
lost
is
it
reuse.
locked up inside
is
a record.
on the
much
avail
list.
space as
We
We
record slot
needed for
external fragmentation
There are
fragmentation.
compacting the
level
on the
a
number of
They include
avail
list
to
make
larger,
more
file
in a batch
mode when
reuse in a
way
adopting
minimizes
(3)
that
of the exercises
more
at
careful discussion.
the
record slots
is
left as
215
21 6
The placement
up
strategy used
it
is
to this point
is a first-fit
it."
by the variable-length
By
is
list
is
strategies:
big
available.
The
idea
is
to
slot
be
as
large as possible.
no firm rule for selecting a placement strategy; the best one can do
judgment based on a number of guidelines.
In the third major section of this chapter, we look at ways to find things
quickly in a file through the use of a key. In preceding chapters it was not
There
is
is
use informed
record number.
Now
This
key
develops
chapter
only
one
method of
finding
records
by
comparisons to
(log n)
file
Binary searching
searching, but
per record.
is
it still
The need
in applications
where
usually requires
or
becomes
number of records
two
accesses
especially acute
are to be accessed
by
key.
The requirement
that the file be kept in sorted order can be expenFor active files to which records are added frequently, the cost
of keeping the file in sorted order can outweigh the benefits of binary searching.
sive.
A RAM
files
that
we
on
relatively small
files.
This limits
KEY TERMS
The
third
partially
RAM
sort in
Instead,
most
it
from the
RAM
file.
list
requires
//
is
we merely w rite
T
the sorted
list
of keys off to
list
file.
records,
is
Chapter
6.
elsewhere
(in
same
the
position in the
file.
file
or in
some
other
file)
is
file
likely to
references to this
dangling pointers.
KEY TERMS
Avail
list.
list
chapter, this
list
new
is
linked
list
of deleted
records.
Best fit. A placement strategy for selecting the space on the avail list
used to hold a new record. Best-fit placement finds the available
record slot that is closest in size to what is needed to hold the new
record.
Coalescence.
If
is
is
is
sousht.
two
record space.
a larger
one
217
21
is
known
Coalescence
as coalescing holes.
is
way
to counteract the
of
rid
is
by sliding
between them.
external fragmentation
all
no space
lost
in a file in
such
way
as to
new
record.
a file.
Huffman
mentation.
Irreversible compression. Compression in which information
Keysort.
entire
tie
The keys
used to construct
new
sorted order.
less
cess
Linked
cific
RAM
file
new
file
from which
list
of keys
is
file
than does
of constructing
list.
are sorted,
version of the
that
is
it
requires
The
logical
order of
is lost.
linked
nodes
list is
in the
often
comput-
memory.
Pinned record.
record
is
pinned
it
by
its
when
physical location.
It is
file
pinned in the
to hold a
new
file.
EXERCISES
219
information.
Run-length encoding.
compression method
in
which runs of
number of
re-
repetitions
of
Stack.
kind of
list
same end.
Variable-length encoding. Any encoding scheme
place at
the
are
of different lengths.
More
in
ing
Worst
Huffman encod-
is
fit.
small the
new
record
is.
slot,
regardless of
list.
how
fit
fragmentation.
EXERCISES
our discussion of compression, we show how we can compress the
name" field from 16 bits to 6 bits, yet we say that this gives us a space
savings of 50%, rather than 62.5%, as we would expect. Why is this so?
What other measures might we take to achieve the full 62.5% savings?
1.
In
"state
3. What is the maximum run length that can be handled in the run-length
encoding described in the text? If much longer runs were common, how
might you handle them?
4.
Encode each of
results,
(a)
(b)
and indicate
01
01
01
01
the algorithm.
01 01 01 01 01 01 02 03 03 03 03 03 03 03 04 05 06 06 07
02 02 03 03 04 05 06 06 05 05 04 04
01
5.
From
Fig. 5.2,
6.
What
is
How
What about
external fragmentation?
a file?
220
7.
separate
new
file.
What
compaction compared
compaction
to
in
created?
8.
loss
9.
Why
if
there
is
significant
amount of fragmentation
in a
file.
11.
How
does
range of
in-place
at
12.
variable-length
record. Outline a procedure for handling such updating, accounting for the
containing the
list
deletion?
14. In
some
the record
is
files,
record?
to indicate that
Could
reactivation be
What
is
is
used?
16.
Why
record
do placement
files?
strategies
make
EXERCISES
17.
Compare
Make
a table
how
does this affect the performance of the binary and sequential searches?
19.
An
internal sort
files
small enough to
fit
in
RAM.
files
on systems
that use
virtual storage.
two primary
areas of difficulty:
jump around
Having
to
in the input
to
Design an approach to
this
problem
number of
less
RAM
than would
file;
is
a sort
to be
taking
memory.
Programming Exercises
21.
records to
fixed-length record
file
update. pas so
it
22. Write a
but that
23.
Develop
221
222
make
If there
is
record
Some
things to consider as
you
The
avail
list
avail
list,
you do
b.
possible to
it
merge
these
this,
will
list? If
you use?
low
the
newly deleted
How
record.
will
you look
for a deleted
as
links to
tions
would we encounter
if
list.
What
additional complica-
we were combining
the coalescing of
24.
25.
Modify
number
also
is
not in the
file, it
returns
not.
from
exercise 24 so
it
uses the
new
is
bin_search()
in the
file,
the
program should display the record contents. If the key is not found, the
program should display a list of the keys that surround the position that the
key would have occupied. You should be able to move backward or
forward through this list at will. Given this modification, you do not have
to remember an entire key to retrieve it. If, for example, you know that you
are looking for someone named Smith, but cannot remember the person's
first name, this new program lets you jump to the area where all the Smith
records are stored.
you recognize
You
the right
can then scroll back and forth through the keys until
first
name.
4.
file
of the
FURTHER READINGS
FURTHER READINGS
A
fragmentation and
storage.
garbage collection are considered in the context of reusing space within electronic
random
access
it is
used in electronic
RAM
Some
management
on secondary
in
RAM
when
storage.
are usually
allocation, including
223
Indexing
CHAPTER OBJECTIVES
Introduce concepts of indexing that have broad apfile systems.
file
maintenance.
to
list,
illustrating
bind an index
key
to an
files.
225
CHAPTER OUTLINE
6.1
What
6.2
6.3
Is
an Index?
6.6
Secondary Keys
6.7
Entry-Sequenced
6.7.1
File
6.7.2
6.4
Hold
6.5
in
Too Large
to
Memory
6.1
A
A
What
Is
6.8
Selective Indexes
6.9
Binding
an Index?
The
last
containing
fields. In later
chapters
of structures that
we
look
at
indexing
schemes that use more complex data structures, especially trees. In this
chapter, however, we want to emphasize that indexes can be very simple
and still provide powerful tools for file processing.
The index to a book provides a way to find a topic quickly. If you have
ever had to use a book without a good index, you already know that an
index is a desirable alternative to scanning through the book sequentially to
find a topic. In general, indexing is another way to handle the problem that
we explored in Chapter 5: An index is a way to find things.
Consider what would happen if we tried to apply the previous chapter's
methods, sorting and binary searching, to the problem of finding things in
a book. Rearranging all the words in the book so they were in alphabetical
order certainly would make finding any particular term easier but would
obviously have disastrous effects on the meaning of the book. In a sense, the
terms in the book are pinned records. This is an absurd example, but it
clearly underscores the power and importance of the index as a conceptual
tool. Since
it
works by
indirection, an index
file.
lets
file
Take,
as
as
227
FILE
much
record addition
less
a sorted file.
by
a library.
their titles, or
by
this is to
The
card catalog
paths to a
We
file.
to
variable-length record
files.
Let's begin
provides.
6.2
A Simple Index
to
We
There
are a
number of approaches
variable-length record
file
be preceded by
file
maintenance. This
is
permits
the structure
we
use.
Suppose
initials for
identification
number
Title
Composer or composers
Artist or artists
Label (publisher)
we formed
the record
company
label
combined with
the record
company's
228
Rec.
addr.
32t
77
INDEXING
ID
number
Label
LO\
RCA
2312
2626
Title
Composer(s)
Prokofiev
Maazel
Beethoven
Julliard
Corea
Corea
Beethoven
Giulini
Springsteen
Springsteen
Quartet in
Artist(s)
Sharp Minor
132
167
WAR
ANG
396
COL
DG
MER
COL
DG
442
FF
211
256
300
353
Touchstone
23699
Symphony No.
3795
38358
Nebraska
18807
Symphony No.
Coq d'or Suite
Symphony No.
75016
31809
139201
245
Beethoven
Karajan
Rimsky-Korsakov
Leinsdorf
Dvorak
Bernstein
Violin Concerto
Beethoven
Good News
Sweet Honev
the
tAssume there
is
Ferras
Sweet Honev in
in
Rock
the
Rock
ID number. This
will
make
since
it
should provide
form
canonical
field
For example,
LDN2312
How
we
keyed access to
and then use binary searching?
Unfortunately, binary searching depends on being able to jump to the
middle record in the file. This is not possible in a variable-length record file
there is no
because direct access by relative record number is not possible
way to know where the middle record is in any group of records.
could
organize the
Could we
individual records?
file
sort the
to provide rapid
file
An
alternative to sorting
illustrates
such an index.
is
On the right is
the data
file
file.
Figure 6.3
containing information
about our collection of recordings, with one variable-length data record per
recording. Only four fields are shown (Label, ID number, Title, and
Composer), but
it is
filling
out each
record.
(left justified,
file.
Each key
blank
is
filled)
corresponding
12-character
to a certain Label
ID
in the
Indexfile
Key
229
FILE
Datafile
Reference
Address of
field
record
ANG3795
167
32
LON
COL31809
353
77
RCA
COL38358
211
132
WAR
DG139201
396
167
ANG
DG18807
256
211
COL
FF245
442
256
DG
32
300
MER
300
353
COL
77
396
DG
132
442
LON2312
MER75016
RCA2626
WAR23699
2312
2626 Quartet in
C Sharp Minor
ANG3795
The
full
Violin Concerto
Beethoven
file.
containing the
field
first
139201
Prokofiev
18807
byte
at
file is
number 167
very simple.
It is
in the record
a
file.
fixed-length record
which each record has two fixed-length fields: a key field and a
byte-offset field. There is one record in the index file for every record in the
file
in
data
file.
Note
also
that
the index
is
first
sorted,
ANG3795
is
the
file.
first
file
is
not.
file is
entry
which means that the records occur in the order that they are
entered into the file. As we see soon, the use of an entry-sequenced file can
sequenced,
make record
with
addition and
file
maintenance
kept sorted by
some
much
key.
simpler than
is
the case
230
INDEXING
PROCEDURE retrieve_record(KEY)
find position of KEY in Indexfile /* Probably using binary search */
look up the BYTE_0FFSET of the corresponding record in Datafile
use SEEK() and the byte_offset to move to the data record
read the record from Datafile
end PROCEDURE
FIGURE 6.4 Retrieve _record():
Indexfile.
to the data
matter.
shown
Datafile are
in the
a single
file
procedure retrieve_record(
We
now
are
The index
because
with
it
By
dealing with
file is
two
files
contains
simple
KEY
in Fig. 6.4.
it
the index
from
Although
some
features
is
likely
it is
file
work with
considerably easier to
the data
is
comment:
that deserve
by Label ID
file.
file
file.
a limit
tity.
The
problems
key's uniqueness
is
truncated
away
as
it is
placed in the
6.3
case.
We
the keys
could, for
in Indexfile.
We
using
One of the
long
files
as the
index
is
much more
small
enough
record length
is
consisting of
no more than
short, this
is
not
a
to be held entirely in
a difficult
File
great advantages of
file
is
that record
sorted data
memory.
file as
If the index
files
231
is
met and
when
the index
is
INDEX[
is
].
read from
Later
too large to
fit
we
into
memory.
Keeping the index in memory as the program runs also lets us find
records by key more quickly with an indexed file than with a sorted one
since the binary searching can be performed entirely in memory. Once the
byte offset for the data record
is
hand, requires
found, then
The use of
a single
sorted data
is
file
that
is
coupled with
of different
all
on the other
file,
seek
number
the following:
Add
files;
into
file
file
file
and index;
file; and
Update records
in the data
file.
empty
files,
quite easily
Both the index file and the data file are created as
with header records and nothing else. This can be accomplished
by creating the files and writing headers to both files.
small
to
memory, so we define an array INDEX[ ] to
hold the index records. Each array element has the structure of an index
enough
fit
into primary
file
into
memory,
then,
is
simply
matter of
reading in and saving the index header record and then reading the records
file
into the
INDEX[
and since the records are short, the procedure should be written so it
reads a large number of index records at once, rather than one record at a
read,
time.
procedure rewrite_index(
232
INDEXING
PROCEDURE rewrite_index(
check a status flag that tells whether the INDEX [] array
has been changed in any way.
if there were changes, then
open the index file as a new empty file
update the header record and rewrite the header
write the index out to the newly created file
It is
if this
does not take place, or takes place incompletely. Programs do not always
run to completion.
failures, against the
to
wrong
One of the
time, and
memory and then writing it out when the program is over is that
copy of the index on'disk will be out of date and incorrect if the program
interrupted. It is imperative that a program contain at least the following
index into
the
is
two safeguards
Q'
There should be
when
the index
of
error:
mechanism
is
out of date.
One
program
to
know
soon
as the
set,
If a
program
know
is
out of date.
is
have access to a procedure that reconstructs the index from the data
file. This should happen automatically, taking place before any attempt is made to use the index.
Record Addition
Adding
new
file
requires that
we
add a record to the index file. Adding to the data file itself is easy. The
exact procedure depends, of course, on the kind of variable-length file
also
233
when we add
know
file
data record
we
should
location at
new
In a
Record Deletion
In
Chapter 5
we
describe a
files
number of approaches
to
of the
space occupied by these records. These approaches are completely viable for
our data file since, unlike a sorted data file, the records in this file need not
be moved around to maintain an ordering on the file. This is one of the great
advantages of an indexed file organization: We have rapid access to
individual records by key without disturbing pinned records. In fact, the
indexing
Of
when we
course,
delete a record
contained in
and shifting
we might mark
Record Updating
Q
we
Record updating
falls
into
two
categories:
by an
implemented while
that he or she
The update
is
still
way
to think
of
this
kind of change
is
as a
merely changing
a record.
file,
234
INDEXING
6.4
address of
in the byte_offset
in
Memory
too
following disadvantages:
Binary searching of the index requires several seeks rather than takmemory speeds. Binary searching of an index
on secondary storage is not substantially faster than the binary
searching of a sorted
file.
Index rearrangement due to record addition or deletion requires shifting or sorting records on secondary storage. This is literally millions
of times more expensive than the cost of these same operations when
performed in electronic memory.
Although these problems are no worse than those associated with the
file that is sorted by key, they are severe enough to warrant the
use of any
consideration of alternatives.
in
A
A
Any
time
simple index
consider using
such
is
as a B-tree, if
top priority; or
you need
These alternative
is
file
the flexibility of
access.
chapters that follow. But, before writing off the use of simpje indexes on
keyed access
it
file.
The index
provides the service of associating a fixed-length and therefore binary-searchable record with each variable-length data record.
235
If the
file
records, sorting and maintaining the index can be less expensive than
would be
,
cause there
is
less
information to
There
one
that
is
file.
move around
moving
file,
This
is
in the
simply be-
index
file.
we have
It,
in itself, can
simple indexes even if they do not fit into memory. Remember the analogy
between an index and a library card catalog? The card catalog provides
multiple views or arrangements of the library's collection, even though
there is only one set of books arranged in a single order. Similarly, we can
use multiple indexes to provide multiple views of a data
6.5
file.
One
business
using
is
key such
as
at this
who would
DG18807? What
want
is
point
is,
the
Symphony No.
9 record
by Beethoven."
our analogy between our index and a library card
Suppose we think of our primary key, the Label ID, as a kind of
catalog number. Like the catalog number assigned to a book, we have taken
care to make our Label ID unique. Now, in a library it is very unusual to
begin by looking for a book with a particular catalog number (e.g., "I am
looking for a book with a catalog number QA331T5 1959."). Instead, one
generally begins by looking for a book on a particular subject, with a
particular title, or by a particular author (e.g., "I am looking for a book on
functions," or "I am looking for The Theory of Functions by Titchmarsh.").
Given the subject, author, or title, one looks in the card catalog to find the
primary key, the catalog number.
Let's return to
catalog.
card catalog
Composer
number (primary
Along with
this
key), so can
we build
an index
is
that relates
library,
file
in a library.
In a
once you have the catalog number you can usually go directly to the
236
INDEXING
Composer index
Secondary key
Primary key
BEETHOVEN
ANG3795
BEETHOVEN
DG139201
BEETHOVEN
DG18807
key
BEETHOVEN
RCA2626
^4 5i^i9^ r/^.
COREA
WAR23699
DVORAK
COL31809
PROKOFIEV
LON2312
RIMSKY-KORSAKOV
MER75016
SPRINGSTEEN
COL38358
FF245
book
**>
p^;TF6r^'^t'
ja>
number.
since the
secondary indexes.
Record Addition
the
file
means adding
When
a
secondary index
is
The
237
end PROCEDURE
FIGURE 6.7 Search _on_secondary: an algorithm to retrieve a single record from Datafile
through a secondary key index.
is
changed
memory and
there.
Note that the key field in the secondary index file is stored in canonical
form (all of the composers' names are capitalized), since this is the form that
we want to use when we are consulting the secondary index. If we want to
print out the name in normal, mixed upper- and lowercase form, we can
pick up that form from the original data file. Also note that the secondary
keys are held to a fixed length, which means that sometimes they are
truncated. The definition of the canonical form should take this length
restriction into account if searching the index is to work properly.
One
primary
sample
index illustrated in Fig. 6.6, there are four records with the key
BEETHOVEN. Duplicate keys are, of course, grouped together. Within
this group, they should be ordered according to the values of the reference
index
is
fields. In this
Record Deletion
become
we
Deleting
file
all
refer-
238
INDEXING
Consequently, deleting
record
open by
left
the remaining
deletion.
were no longer
The
record problem.
would be pointing
and subsequent
be associated with different data records.
have carefully avoided referencing actual addresses in the
we
we do
another search,
record
file,
is
But we
This
valid.
that
this
has
condition. In a
check, protecting us
from trying
file is
to
open
is
to us
record-not-found
acts as a
kind of final
no longer
when we
exist.
delete a record
from
when we
index.
If there are a
is
of these indexes
all
especially important
is
waiting
at a
It is
also
important
in
an interactive
complete.
There
is,
of course,
a cost associated
with
Deleted records
files.
file
Record Updating
239
in the data
file.
file,
file
that result in
changing a record's physical location in the file also require updating the
secondary indexes. But, since we are confining such detailed information to
the primary index, data file updates affect the secondary index only when
they change either the primary or the secondary key. There are three
possible situations:
we may have
is
key index so
changed, then
it
stays in
a large
we
that
all
impact
update the
the secondary
indexes. This involves searching the secondary indexes (on the un-
6.6
title
all
poser);
and
can
as
Now we
240
INDEXING
Title index
Primary key
Secondary key
MER75016
GOOD NEWS
FF245
NEBRASKA
COL38358
QUARTET
IN C
SHARP M
RCA2626
LON2312
SYMPHONY NO.
ANG3795
SYMPHONY NO.
COL31809
SYMPHONY NO.
DG18807
TOUCHSTONE
WAR23699
VIOLIN CONCERTO
DG139201
Find
all
title).
What
is
more
interesting,
however,
is
that
we
request that combines retrieval on the composer index with retrieval on the
title index, such as: Find all recordings of Beethoven's Symphony No. 9.
Without the use of secondary indexes, this kind of request requires a
sequential search through the entire file. Given a file containing thousands,
or even just hundreds, of records, this is a very expensive process. But, with
the aid of secondary indexes, responding to this request is simple and quick.
We begin by recognizing that this request can be rephrased as a Boolean
AND operation,
two
"SYMPHONY NO.
file:
9'
241
We begin
list
ANG3795
DG139201
DG18807
RCA2626
Next we
that
search the
title
SYMPHONY NO.
have
9 as the
key:
title
ANG3795
COL31809
DG18807
Now we
combining the
in the
output
lists
which
is
match operation,
lists
are placed
list.
Composers
ANG3795
DG139201
DG 18807
RCA2626
We
AND,
members
so only the
Titles
ANG3795
CDL31809
>DG 18807
>
Matched list
>ANG379S
>DG18807
'
in
Chapter
7.
once
Finally,
we
file
we have
the
list
in
both
lists.
can proceed to the primary key index to look up the addresses of the data
records.
ANG
DG
',
',
This
3795
18807
is
useful in a
Then we can
!
Symphony No.
Symphony No.
',
Beethoven
Beethoven
Guilini
Karajan
secondary indexes,
in
way
them
9
9
order by
we have
title,
file
We can look at
242
INDEXING
Using the computer's ability to combine sorted lists rapidly, we can even
combine different views, retrieving intersections (Beethoven AND Symphony No. 9) or unions (Beethoven OR Prokofiev OR Symphony No. 9) of
these views. And since our data file is entry sequenced, we can do all of this
without having to sort data file records, confining our sorting to the smaller
index records which can often be held in electronic memory.
Now
indexes,
we have
that
we
can look
a general idea
at
ways
to
6.7
structures that
we have
developed so
far result in
two
distinct difficulties:
We
have to rearrange the index file every time a new record is added
to the file, even if the new record is for an existing secondary key.
For example, if we add another recording of Beethoven's Symphony
No. 9 to our collection, both the composer and title indexes would
have to be rearranged, even though both indexes already contain entries for secondary keys (but not the Label IDs) that are being added.
If there are duplicate secondary keys, the secondary key field is repeated for each entry. This wastes space, making the files larger than
necessary. Larger index files are less likely to be able to fit in elec-
tronic
6.7.1 A
One
memory.
First
Attempt
at a Solution
structure so
example,
it
we might
use
difficulties
associates an array
BEETHOVEN
ANG3795
a single
DG139201
secondary key,
DG18807
as in
RCA2626
Figure 6.9 provides a schematic example of how such an index would look
if
file.
243
Secondary key
BEETHOVEN
ANG3795
COREA
WAR23699
DVORAK
COL31809
PROKOFIEV
LON2312
RIMSKY-KORSAKOV
MER75016
SPRINGSTEEN
COL38358
FF245
DG139201
RCA2626
DG18807
each secondary
key.
The major
solution of our
is
toward the
file
every time a
new
record
is
added
to the data
file.
the recording
ANG
36193
we need
to
Piano Concertos
and
Prokofiev
Francois
PROKOFIEV
ANG36193
LON2312
Since we are not adding another record to the secondary index, there is no
need to rearrange any records. All that is required is a rearrangement of the
fields in the existing
244
INDEXING
Although
this
secondary index
it
new
file
so often,
it
a given key.
very likely case that more than four Label IDs will go with some key,
need a mechanism for keeping track of the extra Label IDs.
In the
we
does help avoid the waste of space due to the repetition of identical keys,
this
at a potentially
high
cost.
By
we might
easily lose
by not repeating
more
more
reference fields,
we
gained
identical keys.
we don't want to waste any more space than we have to, we need
whether we can improve on this record structure. Ideally, what we
would like to do is develop a new design, a revision of our revision, that
Since
to ask
new
file;
Allows more than four Label IDs to be associated with each secondary key; and
Does away w ith the waste of space due to internal fragmentation.
T
List of
References
important: that
Our
we
with
list
PROKOFIEV
collects together a
list
ANG36193
LON2312
245
of primary
key references
Lists
BEETHOVEN
ANG3795
COREA
DG139201
DVORAK
DG18807
PROKOFIEV
RCA2626
WAR23699
COL31809
LON2312
FIGURE 6.10 Conceptual view of the primary key reference fields as a series of
Similarly,
adding two
new Beethoven
list
recordings
lists.
enough space
IDs for each secondary key, the lists could contain hundreds of references,
if needed, while still requiring only one instance of a secondary key. On the
other hand,
if a list requires
internal fragmentation.
the
file
Most important of
all,
we need
can
we
set
way
is
index so
We
lost to
to rearrange only
of secondary keys
How
is
files?
file.
lists,
each of
The
simplest
fields
number of
secondary key
the
first
field,
and
corresponding
246
INDEXING
primary key reference (Label ID) in the inverted list. The actual primary key
references associated with each secondary key would be stored in a separate,
entry-sequenced file.
Given the sample data we have been working with, this new design
would result in a secondary key file for composers and an associated Label
ID file that are organized as illustrated in Fig. 6.11. Following the links for
the list of references associated with Beethoven helps us see how the Label
ID List file is organized. We begin, of course, by searching the secondary
key index of composers for Beethoven. The record that we find points us
to relative record
fixed-length
file,
looks like
this:
ANG3795
The
of 1. As
a value
DG18807
DG139201
in
our
earlier
RCA2626
programs,
we
36193
ANG
You
last
one
record
Piano Concertos
and
Prokofiev
and Label
Francois
can see (Fig. 6.11) that the Label ID for this new recording is the
ID List file, since this file is entry sequenced. Before this
in the Label
is
added, there
is
It
has a Label
ID of
lists
Since
The only time we need to rearrange the Secondary Index file is when
a new composer's name is added or an existing composer's name is
it was misspelled on input). Deleting or adding recomposer who is already in the index involves changing only the Label ID List file. Deleting all the recordings for a composer could be handled by modifying the Label ID List file, while
changed
(e.g.,
cordings for
file
247
Improved
revision of the
composer index
Label
BEETHOVEN
ID
LON2312
COREA
RCA2626
DVORAK
WAR23699
PROKOFIEV
ANG3795
RIMSKY-KORSAKOV
COL38358
SPRINGSTEEN
DG18807
MER75016
COL31809
DG139201
FF245
ANG36193
10
FIGURE 6.1
of
the task
is
list
of entries for
this
empty.
is
is
lists of
composer
List file
we do need
now since
quicker
to rearrange the
Secondary Index
file,
smaller.
Since there
is
less
it
in
RAM
less
is
files
off
of
on
sec-
tures.
[j
The Label ID
J
^
needs to be sorted.
List file
is
it
never
very easy to
Chapter
5.
*^T
7&
248
INDEXING
There
also at least
is
one potentially
such "togetherness"
such
as this,
it
is
is
with
locality;
a linked,
entry-sequenced structure
logical
file
structure.
One obvious
List file in
memory.
is
to
many
secondary indexes, except for the interesting possibility of using the same
Label ID List file to hold the lists for a number of Secondary Index files.
Even if the file of reference lists were too large to hold in memory, it might
be possible to obtain
the
file
in
memory
as
memory
at a time,
Several exercises
at
of
fundamental to the design of B-trees and
other methods for handling large indexes on secondary storage.
dividing the index into pages
6.8
is
Selective Indexes
Another interesting feature of secondary indexes
divide a
file
providing
into parts,
is
selective view.
For example,
titles
of
it
is
classical
we
combined
"List
all
into
Boolean
AND
as,
BINDING
6.9
249
Binding
systems that
utilize
physical address of
its
indexes
is:
At what point
in
time
is
the hey
bound
file
to the
associated record?
In the file
binding of our
constructed.
Binding at the time of the file construction results in faster access. Once
you have found the right index record, you have in hand theT>yte offset of
the data record you are seeking. If we elected to bind our secondary keys to
their associated records at the time of file construction, so when we find the
record in the composer index we would know immediately that
the data record begins at byte 353 in the data file, secondary key retrieval
would be simpler and faster. The improvement in performance is
particularly noticeable if both the primary and secondary index files are used
on secondary storage rather than in memory. Given the arrangement we
designed, we would have to perform a binary search of the composer index
and then a binary search of the primary key index before being able to jump
to the data record. Binding early, at file construction time, does away
at*?\Mflfi|
entirely with the need to search on the primary key.
The disadvantage of binding directly in the tile, of binding tightly, is that
reorganizations of the data file must result in modifications to all bound
DVORAK
index
deleted.
we set up, associating the secondary keys with reference fields consisting of
primary keys allows the primary key index to act as a kind of final check of
whether a record is really in the file. The secondary indexes can afford to be
wrong. This situation is very different if the secondary index keys are
bound, containing addresses. We would then be jumping directly
from the secondary key into the data file; the address would need to be
tightly
right.
250
INDEXING
binding
is
most
attractive
when
The
data
file is static
little
or no adding, de-
leting,
is
high priority.
For example, tight binding is desirable for file organization on a massproduced, read-only optical disk. The addresses will never change since no
new records can ever be added; consequently, there is no reason not to
obtain the extra performance associated with tight binding.
For
file
applications in
which record
is
additional
work
performance
is
file
more
SUMMARY
We
began
this
sorting as a
sorting,
way of structuring
indexing
permits
variable-length record
addition, deletion,
and
files.
perform binary
to
If the
file
an alternative to
searches
for
keys
in
retrieval can
indexed, entry-sequenced
be done
much more
quickly with an
file.
Indexes can do
methods based
us
is
a file
do
more desirable
usually makes these
usually the
SUMMARY
allows us to regard
a collection
records in a data
file.
files
We
lists
in
a library
author order,
card catalog
title
order, or
find that
of books
we
file,
we
but that
views.
In this chapter
indexes of
two
The need
The need
added
we
how
liabilities:
and
to the data
is
file.
of reference
fields
overly large
amount of
locality: Lists
among
most
the
lists,
files.
The concepts
of secondary indexes and inverted lists become even more powerful later, as
we develop index structures that are themselves more powerful than the
simple indexes that we consider here. But, even so, we already see that for
small files consisting of no more than a few thousand records, approaches
to inverted lists that rely merely on simple indexes can provide a user with
a great deal
of capability and
flexibility.
251
252
INDEXING
KEY TERMS
Binding. Binding takes place when
physical record in the data
key
is
file.
program execution.
In the
is
postponed
file.
in
file
An
index
a
is
key and
a partic-
actually retrieved
of program execution.
Entry-sequenced
Index.
between
is
a tool for
in the order
file.
finding records in
a file. It consists
of
key
this
Key
field.
The key
the canonical
field is the
lists.
that
is
being sought.
a file
in a
mance, since records that are in the same physical area can often be
brought into memory with a single read request to the disk.
Reference field. The reference field is the portion of an index record
that contains information about
where
taining the information listed in the associated key field of the index.
Selective index.
view of
file.
a specific subset
Simple index.
of the
file's
of
a
records.
all
built
linear
EXERCISES
EXERCISES
1.
Until now,
it
variable-length record
With
Does
file.
fixed-length record
mean
this
perform
to
binary search on
file it
that indexing
files?
2.
Why
is title
chapter? If
it
not used
were used
as a
as a
be considered in deciding on
3.
What
is
titles?
Explain
how
5.
When
record in
is
may
a data file
may
or
file.
whether the file has fixed- or variable-length records, and depending on the
type of change made to the data record. Make a list of the different updating
situations that can occur, and explain how each affects the indexes.
to the recordings
LON
1259
file,
Fidelio
Beethoven
and when
7.
What
is
8.
How
an inverted
list,
is it
Maazel
useful?
recording
LON
1259
Fidelio
Beethoven
Maazel
Suppose you have the data file described in this chapter, greatly
a primary key index and secondary key indexes organized
by composer, artist, and title. Suppose that an inverted list structure is used
to organize the secondary key indexes. Give step-by-step descriptions of
how a program might answer the following queries:
a. List all recordings of Bach or Beethoven; and
b. List all recordings by Perleman of pieces by Mozart or Joplin.
9.
expanded, with
253
254
10.
INDEXING
One
inverted
to use the
of the secondary index files. This increases the likelihood that the secondary
indexes can be kept in primary memory. Draw a diagram of a single Label
ID List file that can be used to hold references for both the secondary index
of composers and the secondary index of titles. How would you handle the
difficulties that this arrangement presents with regard to maintaining the
Label ID List
file?
of
Leave space for multiple references for each secondary key (Fig. 6.9).
Allocate variable-length records for each secondary key value, where
each record contains the secondary key value, followed by the Label
IDs, followed by free space for later addition of new Label IDs. The
amount of free space left could be fixed, or it could be a function of
the size of the original
12.
file
list
of Label IDs.
speed and
flexibility.
two important
affect
attributes
of a
and the effect of binding time on them, for a hospital patient information
system designed to provide information about current patients by patient
name, patient ID, location, medication, doctor or doctors, and illness.
Implement the
retrieve_record( )
deciding
how many
procedure outlined in
mechanism for
each record. At least
Fig. 6.4.
a
Jump
to the
byte_ofifiset,
field,
Build an index
file
true size
Datafile.
Use
many
c.
file
to decide
how
bytes to read.
as in
option
(b),
except use a
Jump
to the byte_offset
and read
Once
many
a fixed,
overly large
number of
memory
how
EXERCISES
Implement procedures
15.
to the
index
When
16.
some of
first
INDEXf
array
we do
we want to
the keys,
secondary key;
the
to read in
file.
given
all
of the
records for the given key. Write a variation of a binary search function that
returns the relative record
key.
17. If a Label
be held in
ID
List file
memory
in
the
first
as the
entirety,
if the
such
its
performance by retaining
number of
it
blocks are called pages. Since the records in the Label ID List
file
are each 16
function
would hold the most recently used eight pages in memory. Calls for a
specific record from the Label ID List file would be routed through this
function. It would check to see if the record exists in one of the pages that
is already in memory. If so, the function would return the values of the
record fields immediately. If not, the function would read in the page
containing the desired record, either writing out or dumping the page that
was used least recently. Clearly, if a page has been changed, it needs to be
written out rather than dumped. When the program is over, all pages still
in memory must be checked to see if they should be written out.
that
as
file is
file
entry sequenced,
is
there
19.
The Label ID
schemes
is
List file
is
files
than for
files
a sorted index,
such
the page in
which
as the
number of pages
How
it
belongs
is
full?
255
256
INDEXING
FURTHER READINGS
Wc
have
much more
where we take up
file
organizations.
the
The
The few
files,
texts that
we
list
by many other
file
Ullman (1980).
Tremblay and Sorenson
provides
users.
a similar discussion,
M.
list
structures
Loomis (1983)
along with some examples oriented toward COBOL
multilist
files.
lists
in
E.
S.
Cosequential Processing
and the Sorting of
Large Files
CHAPTER OBJECTIVES
Describe
tivities
a class
known
as cosequential processes.
all
vari-
number of
and matches.
I
in
RAM.
provides the basis for sorting
files.
disk and
selection.
files
associ-
disks.
Introduce
UNIX
merging, and
cosequential processing.
257
^^
CHAPTER OUTLINE
7.1
Model
for
Implementing
7.5.3
Cosequential Processes
7.5.4
Hardware-based Improvements
7.5.5
Merges
Model
7.5.6
7.2
7.5.7
Model
to the
7.5.8
7.5.9
More
Ledger Program
7.5.10 Effects of
A K-way Merge
A Selective Tree
Algorithm
7.3.2
for Merging
Large Numbers of Lists
7.4
Second Look
7.4.1
at
Sorting in
7.5.11
Reading
More
Multiprogramming
External Sorting
7.6
RAM
7.6.2
Drives?
Processors?
Multiway Merging
7.3.1
Multistep Merging
The Problem
7.3
the File
Size
7.1.1
Processing
7.6.3 Multiphase
Merges
Heap while
Sorting
in the File
7.7
Sort-Merge Packages
7.8
File
7.5
How Much
Time Does
7.8.1
Merge
Sort Take?
7.5.2 Sorting a File That
Is
UNIX
Ten
UNIX
Times Larger
the
sequential
lists to
in a merging, or union,
list.
259
ledger program.
an essential
chapter with
trade-offs,
7.1
A Model
for
information that
we
provide in
reality.
However,
it
is
appearance of simplicity
programs;
7.1.1 Matching
Names
Suppose
we want
Fig. 7.1.
This operation
We
a
in
Two
to output the
is
names common
lists
to the
two
lists
shown
in
list,
Lists
we
We begin by reading in the initial name from each list, and we find that
We output this first name as a member of the match set, or
intersection set. We then read in the next name from each list. This time the
they match.
name
lists
in List 2
is less
visually, as
we
than the
are
in List
1.
now, we remember
that
number of
well.
make
it
work reasonably
260
List
List 2
ADAMS
ANDERSON
ANDREWS
BECH
BURNS
CARTER
DAVIS
DEMPSEY
GRAY
JAMES
JOHNSON
KATZ
PETERS
ROSEWALD
SCHMIDT
THAYER
WALKER
WILLIS
ADAMS
CARTER
CHIN
DAVIS
FOSTER
GARWICK
JAMES
JOHNSON
KARNS
LAMBERT
MILLER
PETERS
RESTON
ROSEWALD
TURNER
lists for
cosequential operations.
Synchronizing:
list is
never so
far
end-of-file conditions:
When we
or
we need
program.
errors:
to halt the
When
list
that a
this
Handling
Recognizing
way
We
to detect
(e.g.,
it
file 1
duplicate
action.
Finally, we would like our algorithm to be reasonably efficient, simple,
and easy to alter to accommodate different kinds of data. The key to
accomplishing these objectives in the model we are about to present lies in
synchronization.
the way we deal with the second item in our list
At each step in the processing of the two lists, we can assume that we
have two names to compare: a current name from List 1 and a current name
from List 2. Let's call these two current names NAME_1 and NAME_2.
We can compare the two names to determine whether NAME_1 is less
NAME_2:
261
If
NAME_1
List
If
NAME_1
List 2;
than
NAME_2, we
name from
is
greater than
NAME_2, we
we
name and
name from
and
names
names from
If the
It
is less
1;
the
two
output the
lists.
turns out that this can be handled very cleanly with a single loop
containing
algorithm in Fig.
returns to
that
1
no extra
gets ahead
logic
of
is
ahead of List
1,
PROGRAM: match
call initialize!) procedure to:
- open input files LIST_1 and LIST_2
- create output file 0UT_FILE
- set MORE_NAMES_EXIST to TRUE
- initialize sequence checking variables
to get NAME_1 from LIST_1
to get NAME_2 from LIST_2
while (MORE_NAMES_EXIST)
if (NAME_1 < NAME_2)
call input () to get NAME_1 from LIST_1
/* match
names are the same */
write NAME_1 to 0UT_FILE
call input () to get NAME_1 from LIST_1
call input () to get NAME_2 from LIST_2
endif
endwhile
finish_up(
else
end PROGRAM
List
or the end-of-file
call input ()
call input()
when
262
condition
The
logic inside
the loop
them. Since
when
the
Note
as
we
are
names
implementing
is
a
a
equally simple.
name; the if
match process
'
Only
else
three
possible
logic handles
all
of
main loop would only obscure the main synchronization logic, they have
been relegated to subprocedures.
Since the end-of-file condition is detected during input, the setting of
the MORE_NAMES_EXIST flag is done in the inputf ) procedure. The
input( ) procedure can also be used to check the condition that the lists be in
strictly
list).
tasks.
The algorithm
in
PROCEDURE:
input
()
input arguments:
INP_FILE
PREVI0US_NAME
M0RE_NAMES_EXIST
:=
FALSE
PREVIOUS_NAME
end PROCEDURE
:=
NAME
*/
*/
263
PROCEDURE:
initialize(
PREV_1
PREV_2
L0W_VALUE
L0W_VALUE
:=
as LIST_1
as LIST_2
MORE_NAMES_EXIST
:=
TRUE
end PROCEDURE
FIGURE 7.4 Initialization procedure for cosequential processing.
of the
input()
would
use.
All
we need now
procedure
initialize(
that
)
to
begins
procedure,
the
shown
is
a description
in Fig. 7.4,
performs three
1.
It
2.
It
sets the
MORE_NAMES_EXIST
3.
It
sets the
guaranteed to be
PREV_1
and
less
to
special
The
two
lists
tasks:
LOW_VALUE
is
to a value that
list)
is
of setting
that the procedure ineffect
first
two records
in
way.
The
TRUE.
flag to
any
initialize(
files.
PREV_2
of the
provided in
Fig.
7.1,
following
to
work through
the pseudocode,
and
demonstrate to yourself that these simple procedures can handle the various
resynchronization problems that these sample lists present.
The three-way
test,
single-loop
model
lists
as
well as matching,
as
264
PROGRAM: merge
call initialize!) procedure to:
- open input files LIST_1 and LIST_2
- create output file 0UT_FILE
- set MORE_NAMES_EXIST to TRUE
- initialize sequence checking variables
call input
call input
()
()
while (MORE_NAMES_EXIST)
< NAME_2)
write NAME_1 to 0UT_FILE
call input () to get NAME_1 from LIST_1
if (NAME_1
/* match
names are the same */
write NAME_1 to 0UT_FILE
call input () to get NAME_1 from LIST_1
call input (J to get NAME_2 from LIST_2
endif
endwhile
finish_up(
else
end PROGRAM
FIGURE 7.5 Cosequential merge procedure based on a single loop.
the if
An
else
Note
that
we now produce
list contents.
merge is
between matching and merging is that with
construction since
important difference
a union of the
merging we must read completely through each of the lists. This necessitates a change in our input( ) procedure, since the version used for matching sets the MORE_NAMES_EXIST flag to FALSE as soon as we
detect end-of-file for one of the lists. We need to keep this flag set to
TRUE as long as there are records in either list. At the same time, we must
recognize that one of the lists has been read completely, and we should
avoid trying to read from it again. Both of these goals can be achieved if
265
we simply
NAME
set the
list
to
some
value
that
We
Fig. 7.6
files
file's
HIGH_VALUE
shows how
in-
ordered sequence.
OTHER_LIST_NAME
to
list
HIGH_VALUE. The
pseudocode
in
the
has reached
its
to
the
function
knows
end.
PROCEDURE:
input
()
input arguments
INP_FILE
PREVIOUS_NAME
OTHER_LIST_NAME
if (EOF)
else if (EOF)
NAME := HIGH_VALUE
else if (NAME <= PREVIOUS_NAME)
issue sequence check error
abort processing
endif
PREVIOUS_NAME
end PROCEDURE
NAME
*/
*/
266
Once
again, you should use this logic to work, step by step, through
provided in Fig. 7.1 to see how the resynchronization is handled
and how the use of the HIGH_VALUE forces the procedure to finish both
the
lists
lists
HIGH_VALUE
incorporating the
Summary
7.1.3
pieces of our
to a
it
more
Model
Generally speaking, the model can be applied to problems that involve the
processes)
files.
In this
summary of the
we assume
that
is
a list
of the
Assumptions
Comments
Two
In
Each
fields,
file is
sorted
and
all files
It is
the
some
cases, there
key value
that
is
the
same record
all files
have
structures.
fields.
must
exist a
high
in logical
The
is ir-
it
may
the
model
processing efficiency.
267
Assumptions
Comments
prohibit looking
ahead or looking back at records, but
such operations should be restricted to
subprocedures and should not be allowed to affect the structure of the
main synchronization loop.
nization loop.
in
memory.
in
place
components of the
model.
1.
cal
low
set to the
2.
One main
long
3.
Initialization.
all files
are read
the
logi-
first
all files
are
synchronization loop
as relevant records
is
remain.
as
if
cur r ent_f
i 1
two input
e1_k ey
cur rent_f
else if
else
/*
end
files,
cur r ent_f
>
e1_key
<
i 1
cur r ent_f
i 1
e2_k ey ) then
Input
read
form
e2_k ey ) then
files
and output
files
After
a successful
by comparing
when
sponding
5.
from
value.
4.
files.
record
is
is
set to
on the corre-
file.
High
curs.
occurred for
all
relevant input
terminates
files.
The
the need to add special code to deal with each end-of-file condition.
268
6.
All possible
activities are to
be relegated to
processes
that
is
is
than
7.2
we
problem of designing
general ledger
program
The
ledger contains
as
ledger
values
FIGURE 7.7 Sample ledger fragment containing checking and expense accounts.
Acct.
no.
Account
title
101
102
Checking account #1
Checking account #2
505
510
515
520
525
530
535
540
545
550
555
560
565
Advertising expense
Auto expenses
Bank charges
Books and publications
Interest expense
Legal expense
Miscellaneous expense
Office expense
Postage and shipping
Rent
Supplies
Travel and entertainment
Utilities
Jan
Feb
Mar
1032.57
543 78
2114.56
3094.17
5219.23
1321.20
25.00
195.40
0.00
27.95
103.50
25.00
307.92
5.00
27.95
255.20
25.00
501.12
5.00
87.40
380 27
12.45
57.50
21.00
500.00
112.00
62.76
84.89
17.87
105.25
27.63
1000.00
167.50
198.12
190.60
23.87
138.37
57.45
1500.00
241.80
307.74
278 48
Apr
269
Debit/
Acct.
no.
Check no
Date
Description
101
510
101
550
101
505
102
540
101
510
1271
1271
1272
1272
1273
1273
670
670
1274
1274
04/02/86
04/02/86
04/02/86
04/02/86
04/04/86
04/04/86
04/07/86
04/07/86
04/09/86
04/09/86
Auto expense
Tune up and minor repair
Rent
Rent for April
Advertising
Newspaper ad re: new product
Office expense
Printer ribbons (6)
Auto expense
Oil change
credit
78 70
78.70
500.00
500.00
87.50
87.50
32.78
32.78
12.50
12.50
.
is
illustrated in
Fig. 7.7.
The journal
file
contains the
file
is
because
at least
file as
this
journal
file
Once
it
the journal
contains
number
all
we
can
One
work
through the journal transactions, using the account number in each journal
entry to look up the correct ledger record. But this solution involves
seeking back and forth across the ledger file as we work through the
journal. Moreover, this solution does not really address the issue of creating
the output
list,
in
which
all
we
first
account, 101,
we would have
to
proceed
all
270
101
Checking Account #1
1271 04/02/86 Auto expense
1272 04/02/86 Rent
1273 04/04/86 Advertising
1274 04/09/86 Auto expense
- 78.70
- 500.00
- 87.50
12.50
Prev.
Prev.
510
5219.23
bal
1321.20
Checking account #2
670 04/07/86 Office expense
102
505
bal
32.78
New bal: 1288.42
Advertising expense
1273 04/04/86 Newspaper ad re: new product
25.00
Prev. bal:
Auto expenses
1271 04/02/86
1274 04/09/86
87.50
New bal:
112.50
78.70
12.50
New bal:
592.32
FIGURE 7.9 Sample ledger printout showing the effect of posting from the journal.
the
way through
account 101
as
the journal
we
collect
for
the
journal?
A much
better
solution
is
to
begin by collecting
all
the journal
transactions that relate to a given account. This involves sorting the journal
transactions
FIGURE 7.10
a list
ordered
by account number.
Debit/
Acct.
no.
Check no
Date
101
101
101
101
102
1271
1272
1273
1274
670
1273
1271
1274
670
1272
04/02/86
04/02/86
04/04/86
04/09/86
04/07/86
04/04/86
04/02/86
04/09/86
04/07/86
04/02/86
505
510
510
540
550
as in Fig. 7.10.
Description
Auto expense
Rent
Advertising
Auto expense
Office expense
Newspaper ad re: new product
Tune up and minor repair
Oil change
Printer ribbons (6)
Rent for April
credit
- 78.70
- 500.00
- 87.50
- 12.50
- 32.78
87.50
78.70
12.50
32.78
500.00
27
Ledger
Journal
list
101
Checking account #1
102
505
510
Checking account #2
Advertising expense
Auto expenses
FIGURE 7.1
Conceptual view
101
101
101
101
102
505
510
510
of cosequential
list
Auto expense
Rent
Advertising
Auto expense
Office expense
Newspaper ad re: new product
Tune up and minor repair
Oil change
1271
1272
1273
1274
670
1273
1271
1274
matching
of the ledger
and journal
files.
Now we can create our output list by working through both the ledger
and the sorted journal cosequentially meaning that we process the two lists
sequentially and in parallel. This concept is illustrated in Fig. 7.11. As we
start working through the two lists, we note that we have an initial match
on account number. We know that multiple entries are possible in the
journal file, but not in the ledger, so we move ahead to the next entry in the
journal. The account numbers still match. We continue doing this until the
,
We
file
tasks:
must produce
a printed
the beginning and current balance for each account, but also
lists all
We
it
is
the
most
difficult.
Let's
look
again at the form of the printed output, this time extending the output to
272
101
Checking account #1
1271
04/02/86 Auto expense
1272 04/02/86 Rent
1273 04/04/86 Advertising
1274 04/09/86 Auto expense
Prev.
102
515
520
bal:
1321.20
Advertising expense
1273 04/04/86 Newspaper ad
Auto expenses
1271
04/02/86
1274 04/09/86
32.78
New bal: 1288.42
new product
25.00
87.50
New bal:
78.70
12.50
New bal:
592.32
re:
Prev.
510
5219.23
Checking account #2
670 04/07/86 Office expense
Prev.
505
bal:
- 78.70
- 500.00
- 87.50
12.50
New bal 4540.53
bal:
112.50
Bank charges
Prev.
bal:
5.00
New Bal:
5.00
Prev.
bal:
87.40
New bal:
87.40
include a few
more accounts
as
shown
in Fig. 7.12.
As you can
see, the
all
From
is
a merge,
ledger
the
since even
not match
The
From
journal accounts,
straightforward
is
strictly
is
that
the
ledger
procedure must accept duplicate entries for account numbers in the journal
273
while
still
earlier
order, rejecting
The
as
functions that
variables that
ascending
duplicates.
all
our favor
in
in strict
test,
single-loop
let's
model works
look
files,
input
at the
identifying the
We
have
draw
from the
ledger.
would probably
return
the entire ledger record to the calling routine so that other procedures could
have access to things such as the account title as they print the ledger. We
are overlooking such matters here, focusing instead on the variables that are
PROCEDURE:
ledger_input
input arguments:
L_FILE
J_ACCT
file.
PREV_L_ACCT
read next record from L_FILE, assigning values to L_ACCT and L_BAL
if (EOF)
MORE_RECORDS_EXIST
*/
*/
/* sequence check
/* (permit no duplicates)
*/
*/
else if (EOF)
L_ACCT
:=
HIGH_VALUE
PREV_L_ACCT
end PROCEDURE
:=
L_ACCT
274
number
we
this
function
is
strictly
argument.
Figure 7.14 outlines the logic for the procedure used to accept input
file.
It is
including that
respects,
a
it
most
even
PROCEDURE:
ournal_input
as
we need
is
different in
have the
file.
input arguments:
J_FILE
L_ACCT
static,
MORE_RECORDS_EXIST
*/
*/
/* sequence check
/* (permit duplicates)
*/
*/
else if (EOF)
J_ACCT
HIGH_VALUE
:=
PREV_J_ACCT
end PROCEDURE
J_ACCT
275
PROGRAM:
ledger
PREV_L_BAL
L_BAL
call journal_input
*/
*/
while (MORE_RECORDS_EXIST)
if (L_ACCT
< J_ACCT)
*/
/* match
add journal transaction amount
/* to ledger balance for this account
else
*/
*/
end PROGRAM
FIGURE 7.15 Cosequential procedure to process ledger and journal
files to
produce printed
ledger output.
we
Fig. 7.15.
If the ledger
account
is
less
is
as follows:
<
276
HIGH_VALUE), we
PREV_BAL
date the
journal account
print the
title line
for the
new
variable.
2.
If the
3.
is
action
is
less
amount
We
it is
an un-
the
match
not read in
new
is a
reflection
a single
of
ledger
account.
The development of
cosequential processing
contributes to
its
this
model
adaptability.
We
it
model
7.3
how
illustrate this,
we now
of
extend the
to create a single,
as the order
of
more
which we want to merge K input
sequentially ordered output list. Kis often referred to
K-way
merge, in
K-way merge.
we
use to handle
lists of names (Fig. 7.5). This merging operation can be viewed as a process
of deciding which of two input names has the minimum value, outputting
name, and then moving ahead in the list from which that name is taken.
of duplicate input entries, we move ahead in each list.
Given a min() function that returns the name with the lowest collating
sequence value, there is no reason to restrict the number of input names to
two. The procedure could be extended to handle three (or more) input lists
that
In the event
as
shown
in Fig. 7.16.
which
lists
the
name
files
is
277
while (MORE_NAMES_EXIST)
NAME_3
...
NAME_K
if (NAME_1 == OUT_NAME)
call input () to get NAME_1 from LIST_1
if (NAME.2 == 0UT_NAME)
call input () to get NAME_2 from LIST_2
if (NAME_3 == 0UT_NAME)
call input () to get NAME_3 from LIST_3
if (NAME_K == 0UT_NAME)
call input () to get NAME_K from LIST_K
endwhile
FIGURE 7.16 /(-way merge loop, accounting
for duplicate
names.
Note that since the name can occur in several lists, every one of these //tests
must be executed on every cycle through the loop. However, it is often
possible to guarantee that a single name, or key, occurs in only one
we
C3 J
...
efficient.
list.
In
Suppose
listCK]
and suppose we reference the names (or keys) that are being used from these
lists at any given point in the cosequential process through another vector:
the procedure
shown
nameCK]
...
differs in
MORE_NAMES_EXIST
initial
flag.
three-way
test,
single-loop
move
as
it is
list,
is
as
simple
278
for
When we
in Fig. 7.17
begin merging
works nicely if K
number of
larger
noticeably expensive.
of Lists
no
larger than
lists,
the set of
is
minimum
reasons
value becomes
it is
rare to
want
eight
files
at
next
LOWEST
for
to K
if (nameti]
i
LOWEST
next
endwhile
IN
279
RAM
7, 10,
17
List
9, 19,
23
List
11, 13,
32
List 2
18, 22,
24.
List 3
List 4
List 5
15. 20,
30.
List 6
8, 16,
29.
List 7
Input
12, 14, 21
5,
in
6,
25
rio g2
for a
merge of K
lists.
7.4
a linear
In
Chapter 5
to
fit
of course, related to
is,
required to establish
this depth,
rather than
function of K.
A Second Look
enough
Ki
we
in
at Sorting in
RAM
RAM. The
operation
we
disk
file
that
is
small
steps:
from disk
RAM.
1.
Read the
2.
3.
Write the
entire
file
into
sort.
The
total
steps.
file
back to
disk.
file is
the
sum of the
We see that this procedure is much faster than sorting the file in place,
on the
disk, because
280
Can we improve on the time that it takes for this RAM sort? If we
assume that we are reading and writing the file as efficiently as possible, and
we have chosen the best internal sorting routine available, it would seem
not. Fortunately, there is one way that we might speed up an algorithm that
has several parts, and that is to perform some of those parts in parallel.
Of the three operations involved in sorting a file that is small enough to
fit into RAM, is there any way to perform some of them in parallel? If we
have only one disk drive, clearly we cannot overlap the reading and writing
operations, but
the
same time
whole
file in
memory
I/O:
Heapsort
internal sort
before
we
can
we have to
wait until
we have
an internal
is
reasonably
fast
time
key
a
it
new key
arrives,
it is
compared
because
it
means
that
we
tree.
is,
This
is
file is
and
if
it is
the largest
That
it
to the others,
as
they arrive in
loaded before
we
RAM,
start sorting.
Unfortunately, in the case of the selection tree, each time a new largest
key is found it is output to the file. We cannot allow this to happen if we
want to sort the whole file because we cannot begin outputting records until
we know which one comes first, second, etc., and we won't know this until
we have seen all of the keys.
Heapsort solves this problem by keeping all of the keys in a structure
called a heap. A heap is a binary tree with these properties:
Each node has a single key, and that key is less than or equal to the
key at its parent node.
i\2j It is a complete binary tree, which means that all of its leaves are on
no more than two levels, and all of the keys on the lowest level are
'*~\.)
^_
23456789
IT
/\
/\
281
RAM
P
ew^>
Q-
/\
FIGURE 7.19 A heap
in
both
Figure
its
tree form
and as
it
would be stored
in
an array.
7.
in an array.
of keys. In practice, each key has an associated record that is either stored
with the key or pointed to by a pointer stored with the key.
Property 3 is very useful for our purposes, because it means that a heap
is just an array of keys, where the positions of the keys in the array are
sufficient to impose an ordering on the entire set of keys. There is no need
for pointers or other dynamic data structuring overhead to create and
maintain the heap. (As we pointed out earlier, there may be pointers
associating each key with its corresponding record, but this has nothing to
do with maintaining the heap itself.)
set
in the array
The algorithm
we
two
parts. First
same time
that
we
The
shown
7.20.
Fig.
first
the File
we
essentially free.
in
in
it
comes
sample application of
this
algorithm.
This describes
how we
it
doesn't
To
tell
how
to
For
:=
to
REC0RD_C0UNT
make
282
FDCGHIBEA
New key to
be inserted
of the
Selected heaps
new key
in tree
form
12345678'
F
12
D F
12
C F D
^\
123456789
C F D G
123456789
C F D G H
123456789
C F D G H
123456789
B F C G H
123456789
BECFHIDG
123456789.
ABCEHIDGF
g'Nf
HI
we need
to look at
not going to do
a
block of records
F,
D, C,
all
of
RAM
the input buffer for each new block of keys can be part of the RAM
the records in the block before going on to the next block. In terms of
storage,
is set up for the heap itself. Each time we read in a new block, we
append it to the end of the heap (i.e., the input buffer "moves" as the
heap gets larger). The first new record is then at the end of the heap array,
as required by the algorithm (Fig. 7.20). Once that record is absorbed into
the heap, the next new record is at the end of the heap array, ready to be
absorbed into the heap, and so forth.
Use of an input buffer avoids doing an excessive number of seeks, but
it still doesn't let input occur at the same time that we build the heap. We
area that
just
IN
283
RAM
saw in Chapter 3 that the way to make processing overlap with I/O is to use
more than one buffer. With multiple buffering, as we process the keys in
one block from the file, we can simultaneously be reading in later blocks
from the file. If we use multiple buffers, how many should we use, and
where should we put them? We already answered these questions when we
decided to put each
new
a new
new
block
at the
by
we add
number of blocks
in the
file,
itself.
we append
employing
on
we
we have just
described,
where
RAM-sized
set
of input buffers.
Now we
read in
new
blocks
as fast as
new
new
each
can, never
if
speeds.
it is
First, let's
final step
File
look
at the
Again, there
is
nothing inherent in
this
algorithm that
(Fig. 7.23).
lets it
overlap
with I/O, but we can take advantage of certain features of the algorithm to
make overlapping happen. First, we see that we know immediately which
record will be written first in the sorted file; next, we know what will come
second; and so forth. So as soon as we have identified a block of records, we
can write out that block, and while we are writing out that block we can be
identifying the next block, and so forth.
Furthermore, each time we identify a block to write out, we make the
heap smaller by exactly the size of a block, freeing that space for a new
output buffer. So just as was the case when building the heap, we can have
as many output buffers as there are blocks in the file. Again, a little
coordination is required between processing and output, but the conditions
exist for the two to overlap almost completely.
A final point worth making about this algorithm is that all I/O that it
performs is essentially sequential. All records are read in in the order in
which they occur in the file to be sorted, and all records are written out in
sorted order. The technique could work equally well if the file were kept on
tape or disk.
More
importantly, since
all
minimum amount
I/O
is
sequential,
of seeking.
we know
that
it
284
Total
SZ
added
is
is
while heap
is
built here.
1
Fourth input buffer
heap
is
is
filled while
FIGURE 7.22 Illustration of the technique described in the text for overlapping input with
heap building in RAM. First read in a block into the first part of RAM. The first record is the
first record in the heap. Then extend the heap to include the second record, and incorporate
that record into the heap, and so forth. While the first block is being processed, read in the
second block. When the first block is a heap, extend it to include the first record in the second block, incorporate that record into the heap, and go on to the next record. Continue until
all blocks are read in and the heap is completed.
For
:=
to
contents of a heap
in
sorted order.
REC0RD_C0UNT
7.5
Way
Merging as a
In
Chapter 5
we
FILES
when we needed
RAM. The
285
ON DISK
on Disk
chapter offered
were
a partial,
but
which we needed
RAM,
Once the keys were sorted, we then had to bear the substantial cost
of seeking to each record in sorted order, reading each record in and
then writing it out into the new, sorted file.
With keysorting, the size of the file that can be sorted is limited by
the number of key/pointer pairs that can be contained in RAM.
Consequently, we still cannot sort really large files.
RAM
or a keysort, suppose
we
RAM
RAM
RAM.
The multiway merge algorithm
RAM
some temporary
amount of overhead
variables,
we
can create
a sorted subset
work
area,
We
is
of our
almost
and
full file
full,
by
sorting
call
1,000,000 bytes of
RAM
-7
-r
100 bytes per record
Once we
again filling
example,
we
10,000 records.
RAM,
286
iT
oo
<X~wOO
80 internal sorts
^^
iic^-i
schematic view of
this
file
and subsequent
containing
all
files,
the original
is
features:
files
of any
size.
Reading of the input file during the run creation step is sequential,
and hence is much faster than input that requires seeking for every
record individually
(as in a keysort).
Reading through each run during merging and writing out the sorted
records
is
also sequential.
Random
we
If a
heapsort
RAM
is
in section 7.4,
we
in-RAM
FILES
287
ON DISK
part does not add appreciably to the total time for the merge.
Since I/O
is
7.5.1
To compare
takes.
long
We do this
it
files
this
do
takes to
merge
sort
looks promising.
how much
file
time
and seeing
it
how
specifications are listed in Table 3.2. (Please note that our intention here
mean anything
is
sorting external
We
the
we have
in
posited.
files.)
computing environment:
Entire
files
and
a single
seek
is
We
(extents),
such
on disk
way
is
into
when I/O
is
performed:
of these
1:
Since
we
sort the
file
file.
in
In
RAM
for merging;
and
out to disk.
in order.
RAM
Step
a
file
288
we
fill
up 80 times
to
form the 80
runs. In
computing the
total
time to input
calculations, the role that each plays can vary significantly depending on the
approach used.
From Table 3.2 we see that seek and rotational delay times are 18 msec"
and 8.3 msec, respectively, so total time per seek is 26.3 msec* The
transmission rate is approximately 1,229 bytes per msec. Total input time
for the sort phase consists of the time required for 80 seeks, plus the time
required to transfer 80 megabytes:
"
Access:
80 megabytes
Transfer:
1,229 bytes/msec
=
=
67 seconds.
Total:
Step
2:
2 seconds
65 seconds
Writing Sorted Runs out to Disk In this case, writing is just the
the same number of seeks and the same amount of data
reverse of reading
to transfer.
So
it
RAM
RAM
RAM
80 runs x 80 seeks
80 megabytes
is still
is
6,400 seeks.
is still
=168
seconds. Since
65 seconds.
computing environment has many active users pulling the read/write head to
other parts of the disk, seek time is actually likely to be less than the average, since many
of the blocks that make up the file are probably going to be physically adjacent to one another on the disk. Many will be on the same cylinder, requiring no seeks at all. However,
for simplicity we assume the average seek time.
""Unless the
*For simplicity, we use the term seek even though we really mean seek and rotational delay.
Hence, the time we give for a seek is the time that it takes to perform an average seek followed by an average rotational delay.
1st
ii
FILES
289
ON DISK
mi mi
ii
ii ii
ii
ii
H Ml Ml
1
II
III
II
II
II
800,000
sorted records
ii
ii
ii
FIGURE 7.25 Effect of buffering on the number of seeks required, where each run
large as the available work area in RAM.
Step
4:
Unlike steps
buffer,
before
we
it is
are
file,
and
now
actually
2,
we need
to
know how
To compute
is
as
using that
merged.
To keep
matters simple,
let
us assume that
we
we
need to make
80,000,000 bytes
4,000 seeks.
Transfer time
is still
is
=105
seconds.
65 seconds.
The time estimates for the four steps are summarized in the first row in
7.1. The total time for this merge sort is 537 seconds, or 8 minutes,
57 seconds. The sort phase takes 134 seconds, and the merge phase takes 403
Table
seconds.
We
is
use
two
approximately the
size
of
a track
we
drive.
290
TABLE
7.1
Time estimates
for
merge
is
80-megabyte
sort of
in
Number
Amount
Seek + Rotation
Transfer
of
Transferred
(Megabytes)
Time
Time
(Seconds)
(Seconds)
Seeks
Total Time
(Seconds)
Sort: reading
80
80
65
Sort: writing
80
80
65
67
6,400
80
168
65
233
Merge: reading
Merge: writing
Totals
67
4,000
80
105
65
170
10,560
320
277
260
537
Chapter
5.
The
last
for loop:
/*
/*
for
out
i
:=
to
*/
*/
REC_CDUNT
RRN
file.
That is 800,000 seeks. At 26.3 msec per seek, the total time required to
perform that one operation works out to 21,040 seconds, or 5 hours, 50
minutes, 40 seconds!
Clearly, for large files the merge sort approach in general is the best
option of any that we have seen. Does this mean that we have found the best
technique for sorting large files? If sorting is a relatively rare event and files
are not too large, the particular approach to merge sorting that we have just
looked at produces acceptable results. Let's see how those results stand up
as we change some of the parameters of our sorting example.
7.5.2 Sorting a
The
first
File
That
Is
applicability of a
computing technique
to ask
how
this
approach stands
Before
we
look
at
how
bigger
file affects
291
ON DISK
FILES
merge sort, it will help to examine the kinds of I/O that are being done in
the two different phases, the sort phase and the merge phase. We will see
that for the purposes of finding ways to improve on our original approach,
we need pay attention only to one of the two phases.
A major difference between the sort phase and the merge phase is in the
amount of sequential (vs. random) access that each performs. By using
we
guarantee that
follows.
the
these buffers get loaded and reloaded at unpredictable times, the read step of
the merge phase is to a large extent one in which random accesses are the
norm. Furthermore, the number and size of the RAM buffers that we read
the run data into determine the number of times we have to do random
accesses. If we can somehow reconfigure these buffers in ways that reduce
the number of random accesses, we can speed up I/O correspondingly. So,
if we are going to look for ways to improve performance in a merge sort
algorithm, our best hope is to look for ways to cut down on the number of random
accesses that occur while reading runs during the
What about
is
Improvements
in the
On
merge phase.
when we measure
way we
organize runs.
merge
sort.
To sum up, since the merge phase is the only one in which we can
improve performance by improving the method, we concentrate on it from
the
we started
file is
for instance,
is
800,000?
"'"It
is
environment there
pulling the read/write head to other parts of the disk between reads and writes, possibly
it
the
r^
292
TABLE 7.2
Time estimates
merge
phase
800-megabyte
for
sort of
in
Number
Amount
Seek + Rotation
Transfer
of
Transferred
(Megabytes)
Time
Time
(Seconds)
(Seconds)
Seeks
Total Time
(Seconds)
Merge: Reading
640,000
800
16,832
651
Merge: Writing
40,000
800
1,050
651
1,703
680,000
1,600
17,882
1,302
19,186
Totals
we increase
space, we
17,483
If
RAM
RAM
RAM
The times
for the
merge phase
are
summarized
in
Table
7.2.
Note
that
process
its
corresponding run.
we want
File Size
Obviously, the big difference between the time it took to merge the
8-megabyte file and the 800-megabyte file was due to the difference in total
seek and rotational delay times. You probably noticed that the number of
100
is
file is
number of seeks
of the runs
so
K seeks
is
two
files.
file,
and
We
can
K-way merge of K
as large as the
293
ON DISK
FILES
is
of
size
RAM
space
K runs
all
= I x
size
of each run,
of the records
This
we
brief,
altogether, the
as files
grow
large,
can expect the time required for our merge sort to increase rapidly.
would be very
nice if
we
this
It
time.
Allocate
as disk drives,
RAM,
nels;
Perform the merge in more than one step, reducing the order of each
merge and increasing the buffer size for each run;
Algorithmically increase the lengths of the initial sorted runs; and
Find ways to overlap I/O operations.
In the following sections
with the
first:
Invest in
we
look
at
each of these in
detail,
beginning
more hardware.
We
Increasing the
Increasing the
Increasing the
amount of RAM;
number of disk drives; and
number of I/O channels.
RAM
294
Roughly speaking,
of
substantial effect
on
fewer initial runs during the sort phase, and it means fewer seeks per run
during the merge phase. The product of fewer runs and fewer seeks per run
means
a substantial
RAM
would
runs
increase
Number
Increasing the
Number
of I/O Channels
If there
is
at
the
it is
unlikely that 800 channels and 800 disk drives are available, and
even
all
if
FILES
295
ON DISK
they were,
buffers
increasing
substantially.
So
we
control over
how
ways
are likely to
at least
improve performance
is
a large
some such
system
the case,
this
is
we have some
to
our hardware
if
control.
we
specifically to
that
we
algorithmic ways to
One of the
hallmarks of
problem,
as
opposed
enormous
difference in cost
RAM
this
K-way merge
Each record
is
If a selection tree
is
Since
is
K-way merge of N
records
(total) is a
func-
log K.
directly proportional to
N,
this is
an
0(N
tion (measured in
is
it is
RAM,
all
296
We
have seen that one of the keys to reducing seeks is to reduce the
that we have to merge, thereby giving each run a bigger
number of runs
we accomplished
this
On
buffer space
per run.
scheme
is
is
When
all
When compared
to our original
runs.
32 runs
32 runs
VV V
800-way merge,
this
32 runs
800
we
FILES
is
and avoid
files at a
number of
a large
297
ON DISK
But, since
file.
we are able
seeks. When we
time,
disk
analyzed the seeking required for the 800-way merge, disregarding seeking
for the output
we
file,
files.
Let's
multistep merge.
First
Merge Step
initial
runs, each
of the
total buffer
is
Hence, in this
800 = 20,000 seeks, and
The
So,
total
number of seeks
by accepting the
number of
for the
two
steps
25,600
from 640,000
20,000
to 45,600,
we
45,600.
reduce the
we
and
haven't
RAM.
total
We now
have to transmit
all
of the
we
merge
is
When we add
Table
If
summarized
7.3.
we have done is
trade
time
We
to find a
way
we
do even
7.3 that
we have
reduced
total seek
it is
file,
we
where
three-step merge would
we may
to the point
have reached
point of
diminishing returns.
298
Time estimates
TABLE 7.3
for two-step
merge
sort of
in
800-megabyte
total
file,
time
assuming use
is 1 hour, 31
of
minutes.
Number
Amount
Seek + Rotation
Transfer
of
Transferred
(Megabytes)
Time
Time
(Seconds)
(Seconds)
Seeks
Total Time
(Seconds)
1st
Merge: Reading
25,600
800
673
651
1,324
1st
Merge: Writing
40,000
800
1,052
651
1,703
20,000
800
526
651
1,177
40,000
800
1,052
651
1,703
125,600
3,200
3,303
2,604
5,907
Totals
if
we
could
somehow
runs? Consider, for example, our earlier sort of 8,000,000 records in which
Our
Suppose
we
are
somehow
initial
able
to
one megabyte.
to
do only
the
number of seeks
is
available
RAM
number of seeks
is
""For
number
320,000 seeks,
end of
(1988, 1990).
In general, if
we
can
somehow
FILES
299
ON DISK
we
known
as replacement selection.
Replacement selection
from memory
replacing
it
with
implemented
1.
is
key
and then
Replacement selection can be
selecting the
the input
list.
as follows:
Read
in a collection
creates a
2.
3.
4.
Repeat step 3 as long as there are records left in the primary heap
and there are records to be read in. When the primary heap is empty,
make the secondary heap into the primary heap and repeat steps 2
and 3.
To
see
how
this
works,
input
list
of only
keys.
As
six
keys and
let's
begin with
We
only three
three keys
fit
select the
300
Input:
67,
21,
12,
47,
5,
16
Remaining input
21,
67,
21,
67
12
(P
47
16
12
47
16
21
67
47
16
67
47
21
47
-
67
67
Output run
3)
67
12,
16,
12,
21,
16,
12,
47
21,
16,
12,
47
21,
16,
12,
list.
member of the
memory
in
set
locations.
In this
happens
The new
example the entire file is created using only one heap, but what
fourth key in the input list is 2 rather than 12? This key arrives
if the
memory
The
too
late to
be output into
its
keys:
During the
5 has already
first
run,
when
RAM
keys.
runs.
The replacement
expense,
replacement selection's
ability
to
Two
questions emerge
memory
locations,
at this point:
merge can be
create longer,
two
major
and therefore
FILES
301
ON DISK
Input:
33,
18,
24,
58,
14,
17,
67,
21,
7,
12,
47,
5,
16
Memory
Remaining input
33,
18,
24,
58,
14,
17,
7,
21,
67,
33,
18,
24,
58,
14,
17,
7,
21,
67
33,
18,
24,
58,
14,
17,
7,
21
33,
18,
24,
58,
14,
17,
33,
18,
24,
58,
14,
17
33,
18,
24,
58,
14
33,
18,
24,
58
12
(P
47
16
12
47
16
67
47
16
67
47
21
67
47
7)
67
(1?)
7)
(14)
(17)
7)
33,
18,
24,
33,
18,
24
33,
18
58
33
Output run
3)
67,
12,
16,
12,
21,
16,
12,
47,
21,
16,
12,
47,
21,
16,
12,
second
14
17
14
17
58
24
17
58
24
18
58
24
33
58
33
58
58
14,
58,
17,
14,
18,
17,
14,
17,
14,
24,
18,
33,
24,
18,
17,
14,
33,
24,
18,
17,
14,
FIGURE 7.28 Step-by-step operation of replacement selection working to form two sorted runs.
1.
2.
Given P locations
question
is
that,
clever
way
to
discovered by E.
a circular track
situation
shown
we
expect re-
The answer
to the
show
F.
a circular
From Donald Knuth, The Art of Computer Programming, 1973, Addison-Wesley, Reading,
Mass. Pages 254-55 and Figs. 64 and 65. Reprinted with permission.
+
302
snow
snowplow
is
it
Once
the
it
is
is
snow
A new run is
P.
it is
formed
in the
it
will
approach
snow
is
at
constant height
linearly in front
a stable situation in
of the plow
amount
is
twice the
lllHHHIHil
Falling
snow
Future snow
Existing
snow/=fc(o^==!
|
^(Op~
So,
given
random ordering of
we
keys,
FILES
303
ON DISK
hold in
half as
many
selection to
assuming
the
runs as does
RAM
of
sorts
and the
see in a
make do with
RAM
memory
contents,
have access to
moment, the replacement
less
sort
memory
sort.)
It is
than 2P. In
many
applications,
produce runs
is
not
wholly
2P.
RAM
of
a series
if
that,
the input
is
already sorted.)
files.
Selection
it
does to so
many
other
which means,
is
prohibitive. Instead,
in turn, that
we
To
is
cost,
we want
Some of it
and the
affect
it
all
of
this
heapsort area
(a)
In-RAM
i/o buffer
(b)
Replacement
RAM
sort.
heapsort area
selection:
some of
available space
is
used for
i/o.
304
RAM
For the
records into
sorting
memory
10,000 records
methods such
until
full,
is
it
we
as heapsort,
at a time, until
the sort step requires 1,600 seeks: 800 for reading and 800 for writing.
we might
2,500 records,
so
we
at a
time,
takes 8,000,000/2,500
about 8,000,000/15,000
will be
we
step
we end up making
which hold an
15,000/18.73
801 seeks
Table 7.4 compares the access times required to sort the 8 million
records using both a
sort and replacement selection. The table
RAM
800-way merge and two replacement selection examples. The second replacement selection example, which produces runs of
40,000 records while using only 7,500 record storage locations in memory,
assumes that there is already a good deal of sequential ordering within the
includes our
initial
input records.
It
is
clear that,
one third
as
many
seeks as
RAM
sorting.
RAM
we would
sorting,
probably not in
reality
>s
to
c Z
09
Mo
cd
o B
C/5
"O
ed
to
ed
to
b*XH
^
<
OT
XV
CO
_C
CD
o E
.a o
M
GO
C CO
CO
3 =3
CO V|_
o
"O
i_
o CD
o -O
CD
E
c 3
o C
33
E
00
HZ
,i
(ft
0P
5 c
**
CD
fi
O 3
o
^m
03
Z3
cr
CD
o
D
CO
cd
[/)
-C
o
CO
o
u M
_*
o
u
i
C o
3
^_
CD
Z3
cr
CD
i_
CO
CD
E
'-
o
CD
s?
CD
S/3
d
o
o
o o
CD
CO
CO
CD
o c s
N 3 5Cfl
Un
03
CD
o CO
c c
o
CD
CO
E
03 CD
Q. O
E
o
03
(m
5-
a
5/3
Q.
CD
^t
r-
LU
E O
CN
<U
~-
_o
Uh
rs
O
-
t/3
i_
00
ri
<
hh
2^
Q,^o
c2
^
as
gJS
u 12
o C
^-n
305
>s
ed
73 .3
c
CD
E
CD
O
LO
LD
00
"*
CM
00
en
<*
"*
vC
2 =
C/3
TO
p<
A 03 ^
o o .5
PN -w
Q.
CD
Im
mC
HH
TO
o
CO
.>
<
ce
<~
HZ
^
CM
K
CM
ScJS
o
^
'
'
'
.c
*->
<+-
.Q
GO
r
C
C
CO
3
cz^
s S 2
co
O
O
SO O
lO O
CM CM
o
o
o
oo
q
^C CM
r- ^c
00 rH
cn in
CM
oo"
so
L.
o
o
CD
> *
* %
00
Cl
cd
las
X C
x c
8-3
-5
S
CM
LO
00
l<sl
*-*
TO
8
CO
CM
"5
o
2 CM
^
00 J,
Lj
4-1
-a
"^
fi
*5
CD
8|
o c
CD
*~
CD
>
8J
1
c
.3 J o
y. a u-
o
o
O
8
LO
o
LO
o
o
LO
cm'
cm'
TO
n_ -C
C
O
._2
&
TO
CD
fe.2
OO
CD
CD
CO
s o S
tf
c*o
t2
If)
P^
LU
1
<
1
306
J_,
4->
T3
r^
s-
u
ed
2
<
1m
&
Cm
<
o
,
00
'
-S
Cm
o
vi
es:
-MJ
*m
(/>
c/5
S o c
CJ
<~>
^^
-r
'->
^ C
8 OT!
-- u O rt o
Cm <y U
o c3 J
IU -c- S > Cm O
From Table
used.
less in
7.5
we
it
was
FILES
307
ON DISK
number of seeks
dramatically
is
method
as
important
as the use
of multistep, rather
Furthermore, since the number of seeks required for the merge steps is
smaller in all cases, while the number of seeks required to form runs
much
latter
RAM
replacement selection
still
less
dramatic.
RAM
two very
It
50%; and
If
(2)
seeking
we have two
to take advantage
two
means
as
that
much
as
virtually eliminated.
is
of them.
We
we should
memory
configure
also configure
as follows:
memory
We
allocate
buffers each for input and output, permitting double buffering, and
tree.
This arrangement
might proceed
to take advantage
is
Let's see
of
how
the
merge
sort process
this configuration.
First,
We
we
fill
up
move
replace those
records with records from one of the input buffers, adjusting the tree in the
usual manner. While
filling the
we empty one
tree,
we
can be
other one from the input disk. This permits processing and input
c 5
co .5
.2
a C
co
OS
Hhhi
c 5 S c
.22 *S
0)
5 I &
ho* oft
tt
c
2
*
tN
00
J*
03
0)
LO
<T)
C
St
0:
co
CO
>N
03
s?
<N
r
^
4-
^
S
.,
04
c/5
lo
CN
o
LO
o
O
OC
CN G\
i
O
V
O U
'J
o
c
E
o
On
^H
||3
*-H
v.
<N
X o
C
i
CN
LT5
o
in
CN
CN
w
O C
CO
<si
s g
s -8
S ? S
o
h
.,
>*
O O
PC _Q
>-
T o
So.
ji
oilC
too
X R O
j~.
rt
J-c
? o
p5c OC
<u
<u
co
&
a
308
C
O
u
O
u
CO
ft
X>-
-a
>.
>,
CN
in
<N
>^
CO
X ,
v x U
ScSD
*-*
'E
CO
CO
Ctf
-Q .S
4-'
CO
CO
FILES
ON DISK
309
input
buffers
output
buffers
buffers
from the
tree,
we
same time
we
that
are filling
this
two
Isn't
it
drives can
course the
we
more
drives
we have
to
Up
more?
hold runs during the merge
three, or four, or
to a point this
is
true,
but of
up with the data streaming in and out. And there will also be a point at
which I/O becomes so fast that processing can't keep up with it.
But who is to say that we can use only one processor? A decade ago, it
would have been far-fetched to imagine doing sorting with more than one
processor, but it is very common now to be able to dedicate more than one
processor to a single job. Possibilities include the following:
310
Massively
these
not appropriate, in
newer
this text, to
But just
many more
we
look
at
as the
changes over
external sorting,
commonplace.
7.5.10 Effects
of
Multiprogramming
We
taking place.
in
dedicated
I/O.
On
multiprogramming
is
to
allow the operating system to find ways to increase the efficiency of the
overall system
it
different jobs.
was doing
So
CPU
CPU
much
what
real
performance will be
like
on
that can
should be our goal to add these various tools to our conceptual toolkit for
designing external sorts and to pull them out and use them whenever they
It
are appropriate.
following:
full listing
of our
new
set
311
For
in-RAM
lap input
Use
as
time
With
RAM
as possible.
more
much
It
it
much
is
merge.
the
initial
can decrease
is
on
the system.
Keep
mind
relative costs,
7.6
faster to
we would
sorts
be remiss
if
we
did not
Theorem A.
It is
difficult to decide
is
best in a
given situation.
because of the
is
a starting point.
312
Viewed from
we
1.
2.
Merge
Replacement selection
is
and
file.
almost always
good choice
as a
method
for
amount of seeking
selection
more than
offsets the
Given
of
how
to
such
it is
clear that
it is
in the
7.31.
is
The numbers
expressed in an alternate,
By
all
initial
At the start of
Tl contains one run consisting of four initial runs
run consisting of two initial runs. This method of illustration
step 3,
tape drive
followed by
more compact
Tape
Step
313
Contains runs
Tl
Rl
R3
R5
R7
T2
T3
T4
R2
R4
R6
R8
R1-R2
R3-R4
R5-R6
R7-R8
R9-R10
Tl
T2
R1-R4
R5-R8
R9-R10
T3
T4
R9
RIO
Tl
Step 2
Step 3
T2
T3
T4
Tl
Step 4
Step 5
T2
T3
T4
R1-R8
R9-R10
R1-R10
Tl
T2
T3
T4
FIGURE 7.31
more
grow
shows
of
10
runs.
way some of
combine and
one run that is copied
again and again stays at length 2 until the end. The form used in this
illustration is used throughout the following discussions on tape merging.
Since there is no seeking, the cost associated with balanced merging on
tape is measured in terms of how much time is spent transmitting the data.
In the example, we passed over all of the data four times during the merge
phase. In general, given some number of initial runs, how many passes over
the data will a two-way balanced merge take? That is, if we start with
runs, how many passes are required to reduce the number of runs to 1?
clearly
the
314
XI
T2
T3
T4
11111
11111
Step 2
Step 3
4 2
Step 4
Step 5
Step
22222
10
in
more compact
table notation.
number
two
runs, the
is
from which
it
can be
shown
N=
N<
1,
that
P =
In our simple example,
Hog, N~l
file
there
writing overlap perfectly, each pass takes about 11 minutes," so the total
1"
time
is 1
merges, even
when
a single
is
disk drive
used.
far
in seek times.
we want
to
tells
us that
improve on
this
approach,
it is
clear that
we must
to reduce the
find
ways
the formula
the order of
assumes the 6,250 bpi tape used in the examples in Chapter 3. If the transports speed
200 inches per second, the transmission rate is 1,250 Kbytes per second, assuming no
blocking. At this rate an 800-megabyte file takes 640 seconds, or 10.67 minutes to read.
""This
is
we have
315
and 10 for output, at each step. Since each step combines 10 runs, the
number of runs
after
each step
is
step.
we have
Hence,
(Vxof
N<
and
p = Rogio
In general,
at
~l
last)
is
k.
N initial
runs
is
v = r~iog N~i.
fe
For
|~logio
The balanced merging algorithm has the advantage of being very simple; it
easy to write a program to perform this algorithm. Unfortunately, one
is
reason
it
is
simple
is
that
it
is
We
4,
we
shows how we can dramatically reduce the amount of work that has to be
done by simply not copying the extra run during step 3. Instead of merging
this run with a dummy run, we simply stop tape T3 where it is. Tapes Tl
and T2 now each contains a single run made up of four of the initial runs.
We rewind all the tapes but T3 and then perform a three-way merge of the
runs on tapes Tl, T2, and T3, writing the final result on T4. Adding this
intelligence to the merging procedure reduces the number of initial runs that
must be read and written from 40 down to 28.
The example in Fig. 7.33 clearly indicates that there are ways to
improve on the performance of balanced merging. It is important to be able
to state, in general terms, what it is about this second merging pattern that
saves work:
We
316
T2
Tl
Step
11111
Step 2
Step 3
T3
T4
2 2 2
2 2
Merge
ten runs
Merge
eight runs
Step 4
10
FIGURE 7.33 Modification of balanced four-tape merge that does not rewind
between steps 2 and 3 to avoid copying runs.
We
we merge some
Specifically,
in step 4.
We
we merge
the runs
steps.
and some
in step 3
from T3
in
two
phases.
These
runs from a tape in phases, are the basis for two well-known approaches to
The
a
initial
distribution of runs
The
is
is
the
such that
merging.
at least
the initial
number of available
In general,
is
merge
initial
numbers of runs.
how
two-way merge)
to 25.
is
tape drives.
runs distributed on four tape drives. This merge pattern reduces the
of
these
It is
is
number
balanced
consequence
what happens
than 5-3-2.
We
T3, but
Tl. Obviously,
second
if
is
4-3-3
rather
we
as a
step.
How
efficient
merge
pattern?
initial
T2
Tl
1
11111
Step 2
..111
Step 3
...
Step 4
....
Step
T4
33
10
Step 5
T3
Merge
six
Merge
five
317
runs
runs
2.
patterns, given an
initial distribution?
3.
N runs
Given
optimal
to
particular, the
consult
Knuth
(1973b).
RAM
memory
to allocate to
limited.
we want
to sort
RAM
we
can expect 5,334 runs of 1,500 records each, versus 534 when there is a
RAM. For a one-step merge, this 10-fold increase in the
megabyte of
number of runs
No wonder
no seeking, were
tapes,
preferred.
which
now
and require
31
now
RAM
is
order of the merge. Since disks are random-access devices, very large order
merges can be performed, even if there is only one drive. Tapes, however,
are not random-access devices; we need an extra tape drive for every extra
run we want to merge. Unless a large number of drives is available, we can
only perform low-order merges, and that means large numbers of passes
over the data. Disks are better.
7.7
Sort-Merge Packages
Many
large
very good
files.
7.8
UNIX
has a
number of utilities
for
in
UNIX
nothing
introduce
some of
these
utilities.
For
full
details,
consult the
we
UNIX
documentation.
UNIX
sorting of large
is
in
UNIX
files
of the type
we
sort-merge packages are not generally available on UNIX systems. Still, the
sort routines you find in UNIX are quick and flexible and quite adequate for
the types of applications that are common in a UNIX environment. We can
divide
UNIX
two
sorting into
IN
319
UNIX
command, and
(2)
UNIX
The
Command
sort
(A
lexical order.
character
'\n'.)
command
sorted
one
is
line
line
By
is
sort
command
named on
has
ASCII
many
file in
different
ascending
too large to
file is
is
The
fit
in
RAM,
file
sort
its
input
file
name from
performs
merge
sort. If
file
the
to be
more than
files.
To
sort the
file,
enter
$
sort team
Chris Mason Junior 9.6
Jean Smith Senior 7.8
Leslie Brown Sophomore 18.2
Pat Jones Freshman 11.4
Pat Jones Junior 3.2
Notice that by default sort considers an entire line as the sort key.
Hence, of the two players named "Pat Jones," the freshman occurs first in
+po5
where posl
tells which
of the
$
-pos2
tells
how many
sort
file
+1
If pos2
is
-2 team
to start a
fields to skip
end with.
Hence, entering
field to
line.
causes the
a
key with.)
that allows
you
last
names. (There
is
also
320
Use "dictionary"
ordering:
others, allow
Only
you
letters, digits,
signifi-
cant in comparisons.
-f
the canonical
is
form
that
we
-r
Notice that
and within
in
compares groups of
Chapter 4, records
are lines, and fields are groups of characters delimited by white space. This
is consistent with the most common UNIX view of fields and records
sort sorts lines,
characters delimited
within
The
UNIX
text
by white
it
files.
Library Routine
qsort
lines
UNIX
The
table are
its
nel,
int
of a
is
file,
loaded into
records. In C, qsort ()
RAM,
defined as
is
follows:
qsortCchar *ba5e,
The argument
base
is
int
a pointer to the
argument, compar(
),
is
the
is
name of
width,
int
*compar (
is
the
) )
number of
The last
that qsort(
uses to
UNIX
utility,
UNIX
provides a
cmp
Utilities in
difj]
The
sort
In this section
we
processing.
Suppose you find in your computer that you have two team files,
one called team and the other called my team. You think that the two files are
the same, but you are not sure. You can use the command cmp to find out.
IN
321
UNIX
cmp compares two files. If they differ, it prints the byte and line number
where they differ; otherwise it does nothing. If all of one file is identical to
the first part of another, it reports that end-of-file was reached on the
shorter file before any differences were found.
For example, suppose the
file
contents:
team
myteam
cmp
tells
differ:
char 23 line
it
files.
useful if you just want to know if two files are different, but it
you much about how they differ. The command diff gives fuller
information, diff telh what lines must be changed in two files to bring them
cmp
diff
doesn't
is
tell
team myteam
diff
1a2
>
Stacy Fox Senior
.6
3c4
<
Pat Jones Junior 3.2
1
Pat
>
The "la2"
in the first
file,
we need
to add line 2
file.
the second
This
leading
is
"<"
indicates that
file
to
followed by
a listing
is
from the
first
lines,
file,
where the
and the
">"
file.
means
file
322
work with
lines
of text.
file.
Notice that
diff,
like sort,
is
designed to
It
text
files.
comm
Whereas diff tells what is different about two files, comm compares
which must be ordered in ASCII collating sequence, to see what
they have in common. The syntax for comm is the following:
two
files,
file2
both
files.
lists lines
that are
For example,
$
$
you
1, 2,
The
sort, diff,
representative of
what
is
available in
UNIX
for sorting
and cosequential
that
SUMMARY
In the first half
and apply
merge
it
of this chapter,
to
sorting.
we
develop
cosequential processing
updating
we
model
most
files.
We
we
don't
SUMMARY
common
to
to
two
and
lists,
merge of two
lists.
all
model.
certain assumptions
enumerate these assumptions in our
formal description of the model. Given these assumptions, we can describe
the processing components of the model.
The real value of the cosequential model is that it can be adapted to
more substantial problems than simple matches or merges without too
In
its
much
alteration.
We
We
illustrate this
by using
the
model
to design a general
input
files.
model
to a
model might be extended to deal with more than two input lists. The
problem of finding the minimum key value during each pass through the
main loop becomes more complex as the number of input files increases. Its
solution involves replacing the three-way selection statement with either a
multiway
We
more
more
k,
list
structure
conveniently.
minimum key
so,
tree.
to a
problem
we
that
RAM
sorts
1.
Break the
file
nal sorting
2.
Merge
into
solution
sort.
when
merge
two or more
a file
is
sort involves
two
in-RAM
steps:
methods; and
the runs.
Ideally,
the
we would like
merge
step with
323
324
the
amount of internal
We
and/or
both cases, the order of each merge step can be reduced, increasing the
of the internal buffers and allowing more data to be processed per seek.
sizes
Looking
means
that
total data
we need
we
see
how
number of seeks
dramatically, though
it
in
also
to read
transmission time).
The second
realized
is
replacement selection.
total
number of seeks
required by the
two
different approaches,
only
when
there
is a
it
we find
on
that
performs substanfile.
I/O with
tapes does not involve seeking, the problems and solutions associated with
tape sorting can differ from those associated with disk sorting, although the
fundamental goal of working with fewer, longer runs remains. With tape
sorting, the primary measure of performance is the number of times each
record must be transmitted. (Other factors, such as tape rewind time, can
also be important, but
attention to
we do
file
sorting
always
good choice
the
files
on the
tapes. In
is
most
cases,
it is
is
how
drives
to distribute
more other
almost
is
number of
Two
number of runs
approaches to doing
this are
KEY TERMS
balanced merges
number of output
a fe-way
balanced merge,
same number of
all
input
same
tapes as there are input tapes, and the input tapes are read
step.
decreased by a
is
factor
as
polyphase merge or
cascade merge)
among
merge and
but one of
all
as a result
can
number of times each record has to be read. It turns out that the
distribution of runs among the first set of input tapes has a major
on the number of times each record has to be read.
decrease the
initial
effect
Next,
available
we
a listing
of
UNIX
utilities,
flexible
utilities
and
which
effective.
are
We
cosequential processing.
KEY TERMS
Balanced merge. A multistep merging technique that uses the same
number of input devices as output devices. A two-way balanced
merge uses two input tapes, each with approximately the same number of runs on it, and produces two output tapes, each with approximately half as many runs as the input tapes. A balanced merge is
suitable for merge sorting with tapes, though it is not generally the
best method (see multiphase merging).
cmp. A UNIX utility for determining whether two files are identical.
Given two files, it reports the first byte where the two files differ, if
they
comtn.
differ.
A UNIX
utility for
determining what
files, it
lines
two
files
have
first file
and not
in the second,
in
common,
and the
lines
325
326
diff.
A UNIX
determining
utility for
two files. It
make it like
all
between
first file to
from the
file to make it like the first, and the lines that need to be
changed in the first file to make it like the second.
heapsort. A sorting algorithm especially well suited for sorting large
second
files
that
fit
in
RAM
because
variation of heapsort
is
its
selection algorithm.
HIGH_VALUE. A
By
assigning
HIGH_VALUE
is
greater
as the
current key value for files for which an end-of-file condition has
been encountered, extra logic for dealing with end-of-file conditions
can be simplified.
fc-way merge. A merge in which k input files are merged to produce
one output
file.
LOW_VALUE. A
Multiphase merge.
merge
efficiently at
is
every
Multistep merge.
multistep tape
such that
at least
the
in
initial
merge
in
which not
all
of runs are merged separately, each set producing one long run consisting of the records from all of its runs.
These new, longer sets are then merged, either all together or in several sets. After each step, the number of runs is decreased and the
step. Rather, several sets
is
increased.
file.
is
is
a single
merge
final step
Although
multistep
theoretically
merge, it
and it may be the only reasonable way
the number of tape drives is limited.
to
perform
merge on tape
if
KEY TERMS
or runs, being
files,
is
maximized
ploys
every step.
at
general-purpose
UNIX
em-
Replacement
selection.
with
new
list.
When new
it
memory
in
records are
brought in whose keys are greater than those of the most recently
output records, they eventually become part of the run being created. When new records have keys that are less than those of the
most recently output records, they are held over for the next run.
Replacement selection generally produces runs that are substantially
longer than runs that can be created by in-RAM sorts, and hence can
help improve performance in merge sorting. When using replacement selection with merge sorts on disk, however, one must be careful that the extra seeking required for replacement selection does not
outweigh the benefits of having longer runs to merge.
Run. A sorted subset of a file resulting from the sort step of a sort
merge or one of the steps of a multistep merge.
Selection tree. A binary tree in which each higher-level node represents
the winner of the comparison between the two descendent keys. The
minimum (or maximum) value in a selection tree is always at the
root node,
making
ing several
lists.
It is
also a
good
key structure
in
replacement selection
(Tournament
sort,
an internal
merg-
sort, is also
merge
selection tree.)
expected
in a cosequential opera-
sort.
within
a single
objective
is
to
This
is
done by
ble.
as
second
simple as possi-
ing) to subprocedures.
as
much
327
328
Theorem
(Knuth).
It is
difficult to decide
is
EXERCISES
1.
section
in the
same
PREV_1
and
not
set to
4.
it
example
5.
Use
the /e-way
in section 7.2,
with the
merge example
new
procedure that
is
/e-way match.
6.
are
Figure 7. 17 shows a loop for doing a /e-way merge, assuming that there
no duplicate names. If duplicate names are allowed, one could add to the
In section 7.3,
keys
at
two methods
list
Compare
this.
a linear
in terms of numbers of
comparisons for k = 2, 4, 8, 16, 32, and 100. Why do you think the linear
approach is recommended for values of k less than 8?
8.
two approaches
800,000-record
file
RAM
EXERCISES
b.
How
long does
it
file
described in Chapter 5?
c.
Why
work
if
there
is
one megabyte of
RAM
9.
How much
seek time
is
amount of available
10.
Performance
7.5 if the
why
is
50 msec and
500 K? 100 K?
is
is
the
number of comparisons
in sorting
comparisons. Explain
required to perform
is
number of
files.
sorts,
we made
the simpli-
fying assumption that only one seek and one rotational delay are required
for
access. If this
more
time would be required to perform I/O. For example, for the 80-megabyte
file
used in the example in section 7.5.1, for the input step of the sort phase
all records into
for sorting and forming runs"), each
RAM
("reading
many
accesses.
and that
all files
b.
c.
Now
let's
assume
that the
must be accessed
total
12.
is
of 10
now
affect the
Derive two formulas for the number of seeks required to perform the
step of a one-step /e-way sort merge of a file with r records divided
merge
into k runs,
If
an internal sort
of each run is M,
the length of each run
Assume
RAM
is
equivalent to
M records.
used for the sort phase, you can assume that the length
but if replacement selection is used, you can assume that
is
is
about 2M.
Why?
a quiet
each of which
is
329
330
15.
so 10 cylinders
may be
files,
move
the
Assume we need
16.
patterns starting
c.
8-4-2
7-4-3
6-5-3
d.
5-5-4.
a.
b.
17.
25 16 45 29 38 23 50 22 19 43 30
runs are of length
following runs
(a
1.
After
1 1
initial sorting,
1:
24/36/13/25
Tape
2:
16
45
29
38
23
50
Tape
3:
22
19
43
30
11
27
b.
1,
24 36 13
list is
2,
on tape
4. Initial
Tape
a.
tapes
list
sort the
47
fourth
phases.
c.
Comment on
4-6-7
distribu-
Programming Exercises
19.
in
20.
in
in section 7.1
or Pascal.
or Pascal.
in section 7.
FURTHER READINGS
21.
Implement
Examine
the contents of
COMMON
files
FURTHER READINGS
The
presentation of a
a bit
algorithms to do sequential
Knuth
some
Knuth
book in
summary of the
subject.
his
this chapter.
VAX
sort utility
is
331
CHAPTER OBJECTIVES
Place the development of B-trees in the historical
Look
might be
paged AVL
as
trees.
I
Provide an understanding of the important properties possessed by B-trees, and show how these
properties are especially well suited to secondary
storage applications.
Describe variations of the fundamental B-tree algorithms, such as those used to build B * trees and
B-trees with variable-length records.
333
CHAPTER OUTLINE
8.1
Introduction:
B-Tree
Concatenation
8.2
8.13.1
8.3
8.4
AVL
8.5
8.6
8.7
as a
Solution
and
Redistribution
Trees
Utilization
8.15
the
Top-Down
B*
Trees
B-Trees
8.16.1
Bottom
LRU
Replacement
Height
and Promoting
8.8
Splitting
8.9
8.10
B-Tree Nomenclature
8.18 Variable-length
8.11
C Program
Keys into
Properties
B-Tree
Depth
8.17
Placement of Information
Associated with the Key
Pascal
to Insert
Program
to Insert
Keys into
B-Tree
8.1
Introduction:
Computer
science
The Invention
is a
young
of 1970,
of the B-Tree
discipline.
after astronauts
major, general-purpose
file
system that
is
later, it is
hard to think of
a
B-tree design.
335
By
Comer was
able
article,
to
that
state
We
become
de facto,
is,
when Comer
standard
the
database system."
"the B-tree
1979,
first
back
in the
work goes
of the issues
we
raise
indexing chapter.
In this paper
index for
we
a collection
The key x
information
random
namely
identifies
file.
(x, a)
By
an index
of fixed
we mean
size physically
unique element
in
the
index,
the
a.
associated
is
access
file.
For
this
is
of no
further interest.
We assume that the index itself is so voluminous that only rather small
parts
store.
time
as
opposed
The
class
which have
random
to a true
at
and
rather high data rate once the transmission of physically sequential data has
been
initiated.
moving head
disks,
keys
to
file itself
changes,
it
elements,
retrieve
are:
fixed and
cells.
to search
or better, where
/ is
index, and k
is
a device
"From
Ada-Informatica, 1:173-189,
permission.
New
the notion
they have
where k
is
of a B-tree
336
FILE
ORGANIZATIONS
page
size
One
provides
matter before
last
we
begin:
Why
the
name
B-tree?
Comer
(1979)
this footnote:
'
lie origin
of 'B-tree
As we
Creight].
'
'
shall see,
Mc-
Others suggest that the "B" stands for Boeing. Because of his contribuhowever, it seems appropriate to think of B-trees as "Bayer"-trees.
tions,
8.2
Statement
of the
Problem
many
seeks.
is
slow.
specific
storage
is,
This fundamental
problems:
Searching for
key on
disk
often involves seeking to different disk tracks. Since seeks are expensive, a search that has to
we
is
ate
number of the other keys in the index, index maintevery nearly impractical on secondary storage for indexes
consisting of only a few hundred keys, much less thousands of keys.
We need to find a way to make insertions and deletions that have
moving
nance
a large
is
only local effects in the index, rather than requiring massive reorganization.
337
AX
CL
DE
list
HN
FT
FB
NR
KF
JD
PA
RF
WS
TK
SD
YJ
of keys.
8.3
critical
cost of keeping a
list
in Fig. 8.1,
shown
we
can express
can be constructed as
representation of the
8.2. In each
and
two
left
list
it
is
simple matter to
first
node, the
in Fig. 8.2.
levels
and right
tree
shown
and right
linked
in Fig.
children
of the node.
If
each node
is
(RRNs) pointing
which the
link fields
it is
Note
KF.
list
of keys.
338
FILE
ORGANIZATIONS
tree.
reached the leaf level and that there are no more nodes on the search
We
But
to focus
important
new
on the
costs
a tree.
file
to be able to
perform
binary search.
random
to
is
file
noticeable,
We
no longer
Note that the
Key
Left
Right
child
child
Left
Key
FB
HN
JD
KF
RF
10
CL
SD
11
NR
AX
12
DE
YJ
13
WS
PA
14
TK
FT
Right
child child
in Fig.
8.2.
339
order.
that if
we add
records in the
new key
to the
such
file,
file
The very
all
as
L V, we
list.
The
tree
with
carried
is
from
is
LV
this is
it
to the
as
good
added
is
Search performance on
balanced state.
By
a leaf
level.
to complete balance,
this tree
we mean
is
still
good because
the tree
is
in a
balanced
where
the paths
of one
from root
is
we
as close as
can get
same
length.
NP MB TM LA UF ND TS NK
Just searching
down through
The
tree
is
now
shown
its
correct
in Fig. 8.6.
is
undesirable
any binary search tree, but is especially troublesome if the nodes of the
tree are being kept on secondary storage. There are now keys that require
seven, eight, or nine seeks for retrieval. A binary search on a sorted list of
these 24 keys requires only five seeks in the worst case. Although the use of
a tree lets us avoid sorting, we are paying for this convenience in terms of
extra seeks at retrieval time. For trees with hundreds of keys, in which an
in
too high.
more
340
HN
CL
/ \DE
AX
FILE
ORGANIZATIONS
PA
/ \JD
FT
/
NR
/
LV
WS
RF
/X
TK
YT
\TM
'\
NP
UF
/
MB
\ND
TS
NK
FIGURE 8.6 Binary search tree showing the effect
8.4
of
added
keys.
AVL Trees
we
Earlier
which keys
are entered
can result in
some very
letters
A-G,
and that
\
\D
\G
tree.
as
Suppose,
for
we receive these
we receive them
341
AVL TREES
produces
is,
in fact,
linked
list,
The
tree as
we
receive
elegant
method
known
as
new
A VL trees,
in
a height-balanarftree^
of difference that
a
common
AVL
tree
is
honor of the
that
which
M.
Landis
who
first
is
An AVL
defined them.
a limit
placed on the
M^
tree
amount
AVL trees.
not in balance
By
mance
is
tree.
known
as
1.
It
An
is
HB(k)
marked with an X.
make AVL
setting a
subtrees,
maximum
AVL
trees
important are
as follows:
in searching;
Maintaining
minimum
level
AVL
two
of perfor-
and
a tree in
trees
The trees illustrated in Fig. 8.8 have the AVL, or HB(1) property. Note
no two subtrees of any root differ by more than one level. The trees in
The two
/\
One
of
nodes of the
member of a more
trees,
to reorganize the
root. In an
is
somehow
Add'son-Vel'skii and E.
is
is
/\
X
/
m m
342
tions
confined to
is
of the
tree.
tree.
AVL
to build
large to
fit
in
memory,
AVL
it is
The
fact that an
AVL
tree
is
BCGEFDA
is
For
given
and the
AVL
same sequence,
completely balanced
N possible keys,
looks
of the
tree.
For an
So,
is
tree, the
AVL
(N +
tree, the
1.44 log 2
levels.
key,
at
log 2
levels
tree resulting
1)
(N+
at
2)
an
AVL
tree, the
FIGURE 8.1
structed using
343
is
very interesting
AVL
result,
no more than
shown
among
others,
have
other insertion into the tree and for approximately every fourth deletion. So
height balancing using
AVL
methods guarantees
that
we
will obtain a
at a cost that
When we
memory.
more
key
is less
two problems
that
we
is
identified earlier in
this chapter:
Keeping an index
we
is
seeks;
and
expensive,
second problem.
8.5
many
in sorted order
Now we
first
problem.
again
we
are confronting
It
is
critical feature
of
what
is
seek and fast data transfer leads naturally to the notion of paging. In
a paged
few bytes.
of the disk, you read
in
take as
many
as 12 seeks.
344
ORGANIZATIONS
FILE
A A
mi
nnnn nnnn
A A
a
/\
/\
9 9 *
-3,
n n n
A ^A
S
l\ l\ l\ l\
/\ /\
3 #
/\
/\
nn
ft
A A
A A
/wwwi
S:
/\ /\ l\ l\
/i
/in
tree.
Clearly, breaking the tree into pages has the potential to result in faster
we
much
faster retrieval
have considered up to
this
is
(N +
log 2
where
is
the
number of keys
full,
log fe+1
1)
number of seeks
balanced tree
(N +
required for
is
1)
where
is,
number of keys
=!
345
binary tree
is
It is
makes
the
log 511 +
(134,217,727
1)
27 seeks
1)
3 seeks.
8.6
from this sorted list. Most importantly, if we plan to start building the tree
from the root, we know that the middle key in the sorted list of keys should
be the root key within the root page of the tree. In short, we know where to
begin and are assured that
a
this
set
of keys
in
balanced manner.
Unfortunately,
receiving keys in
the
problem
is
we must
much more
inserting
complicated
them
we
as
soon
as
we
if
we
are
receive
CSDTAMPIBWNGURKEHOLJYQZFXV
We
will build a
paged binary
we
we
as
is
it
clearly illustrates
root. In this
of three
When you
maximum
346
ORGANIZATIONS
FILE
A A
H
\O
A
Y
we want
there.
beginning of the
They
in
are
total set
balance.
Once the wrong keys are placed in the root of the tree (or in the root
of any subtree further down the tree), what can you do about it?
Unfortunately, there is no easy answer to this. We cannot simply rotate
entire pages of the tree in the same way that we would rotate individual
keys in an unpaged tree. If we rotate the tree so the initial root page moves
down to the left, moving the C and D keys into a better position, then the
S key is out of place. So we must break up the pages. This opens up a whole
world of possibilities and difficulties. Breaking up the pages implies
rearranging them to create new pages that are both internally balanced and
well arranged relative to other pages. Try creating a page rearrangement
algorithm for the simple, three-keys-per-page tree from Fig. 8.13. You will
find it very difficult to create an algorithm that has only local effects,
rearranging just a few pages. The tendency is for rearrangements and
adjustments to spread out through a large part of the tree. This situation
grows even more complex with larger page sizes.
So, although we have determined that the idea of collecting keys into
pages is a very good one from the standpoint of reducing seeks to the disk,
347
we have
confronting
CK
at least
way
two unresolved
How
do we ensure
good
We
are
still
questions:
up the
set
less
evenly?
B^
How
ple, that
There
is,
size
page?
as
we have
of our sample
tree:
How
which
a large
article
8.7
A number
8.8
Splitting
at a
and Promoting
of pointers. There
is
no
348
FIGURE 8.14
Initial ieaf of
seven.
one.
we have
at
node
is
called
an order-eight B-tree.
eight pointers.
Our
initial leaf of
the tree might have a structure like that illustrated in Fig. 8.14 after the
insertion of the letters
B C G E
The
DA
starred (*) fields are the pointer fields. In this leaf, as in any other
of
all
do not lead
pages
the pointers
no children
usually
contain
an
is
invalid
By
such
value,
as
1.
Note,
also usually
is
stored with the key, such as a reference to a record containing data that are
associated with the key. Consequently, additional pointer fields in each
page might actually lead to some associated data records that are stored
But,
elsewhere.
Building the
a single
page
first
is
our present
easy enough.
insert the
key into
memory,
its
for
of no further interest."
is
we
come
keys
in?
Suppose
we
leaf to
when
shown
a
to
add
is
in Fig. 8.15
searching. In short,
J key.
we want
can between the old leaf node and the new one, as
Si nre wgjiow have two lea ves, we need to create
we
tree to enable us to
as additional
we
349
need to create
leaves. In this
In this
in
two
8.16.
steps to
make
how
paged binary
splitting
The sequence
is
CSDTAMPIBWNGURKEHOLJYQZFXV
We use an order-four B -tree
page), since this corresponds to the page size of the paged binary tree.
such
more
Using
promotion.
We
split
omit
explicit indication
of the pointer
fields so
we
can
fit
larger tree
Figure
remaining keys in the sequence are added. We number each of the tree's
pages (upper left corner of each node) so you can distinguish the newly
added pages from the ones already in the tree.
Note
is
leaf.
as the
Also note that the keys that are promoted upward into the tree
of keys we want in a root: keys that are good
separators.
pages
fill
up,
Insertion of C,
S,
and
D
c D
Insertion of
5:
c D
A added without
incident:
A C D
Insertion of
split
\<^r
M forces another
'3
A C
P,
I,
B,
and
^cx/vCtX*^
W inserted
D N
A B C
N causes another
followed by the promotion of N. G, U, and R are
Insertion of
split,
added
to existing pages:
A B C
FIGURE 8.17 Growth of a B-tree, part
is imminent.
350
I.
The
which
split-
m^ o<? 33
root.
is
added
to a leaf:
D K
r
E G
A B C
Insertion of
H causes
a leaf to split.
/
M
T U
P R
H is
A B C
T U
Insertion of
splits
are added:
B C
II.
M
The
root splits to
T L V
add
new
level;
X Y
inserted.
351
352
8.9
Now
that
we
have had
look
a brief
at
how
work on paper,
make them work
B-trees
let's
in a
Page Structure
We
used by
As you
a B-tree.
and
in the following
many
different
and
structures ex-
pressed in
In C:
struct BTPAGE {
short
KEYCDUNT;
char
KEYCMAXKEYS]
CHILDCMAXKEYS+1
short
>
PAGE;
;
in
PAGE */
*/
3SS^S
* /
In Pascal:
TYPE
BTPAGE
RECORD
KEYCDUNT
KEY
CHILD
nt eger
ar ray [
ar ray [
1
1
MAXKEYS] of char;
MAXCHILDREN] of integer
END;
VAR
PAGE
BTPAGE;
Given this page structure, the file containing the B-tree consists of a set
of fixed-length records. Each record contains one page of the tree. Since the
keys in the tree are single letters, this structure uses an array of characters to
hold the keys. More typically, the key array is a vector of strings rather than
in a B-tree
of order four.
353
Part of a B-tree:
2
H K
/
A B C
'NT
(a)
Contents of
PAGE
-
for pages 2
and
KEYCOUNT f ^
{
KEY
3:
array
CHILD
p*
array
*>*
Page 2
Page 3
NIL
NIL
NIL
(b)
FIGURE 8.19 A B-tree of order four, (a) An internal node and some leaf
nodes, (b) Nodes 2 and 3, as we might envision them in the structure
PAGE.
Searching
The
first
procedure. Searching
yet
-^n
^-~*n
still
is
B-tree algorithms
a
good
we examine
are a tree-searching
it is
relatively simple
alternatively
on
entire pages
and
at successively lower
of the tree until it either finds the key or finds that it cannot descend
further, having reached beyond the leaf level. Figure 8.20 contains a
description of the searching procedure in pseudocode.
354
FUNCTION:
FILE
ORGANIZATIONS
FOUND_POS)
end FUNCTION
FIGURE 8.20 Function search (RRN, KEY,
F0UND_RRN, FOUND_POS)
searches
re-
cursively through the B-tree to find KEY. Each invocation searches the page refer-
enced by RRN. The arguments FOUND_RRN and FOUND_POS identify the page
and position of the key, if it is found. If searchO finds the key, it returns FOUND.
it goes beyond the leaf level without finding the key, it returns NOT FOUND.
Let's
work through
the function
by hand, searching
for the
If
RRN
RRN
not
NIL,
so
the
(2).
function reads the root into PAGE, then searches for K among the elements
of PAGE.KEY[]. The K
not found. Since K should go between D and
tree illustrated in Fig. 8.21.
argument equal
We begin by
RRN
to the
of the root
This
with the
is
is
RRN
stored in
On
PAGE.CHILD[1]. The
the next
is
searches
not
found.
RRN
RRN
is 3.
for
This
and
stored in
levels
of
return
I,
PAGE.CHILD[2].
Since this call is from a leaf node, PAGE.CHILD[2] is NIL, so the call
search() fails immediately. The value NOT FOUND is passed back
function
value of this
call, search()
We
PAGE.KEY[0], and
the
RRN
is
key
PAGE.CHILDfO].
is
in a
not found.
page
is
355
D N
/
G
A B C
/.XT
M
P R
T U
FIGURE 8.21
FOUND_POS,
2 of page
3,
3.
to look for
that
It
it
M, which
is
in the tree.
and 2
in
it
It
follows the
finds the
M in
FOUND_RRN
and
FOUND.
and Promotion There ar e two important obsermake abo ut the insertion, splitting, and promotion proc ess.
Insertion, Splitting,
vations
*a
It
we
level;
&r
can
all
the
way down
to the leaf
and
at the leaf level, the work of inand promotion proceeds upward from the bottom.
Consequently,
we
as
having three
phases:
1.
search-page step
the recursive
2.
The
recursive call
the tree as
3.
it
itself,
before
Insertion, splitting,
recursive
call;
call,
it;
and
after the
fol-
We
356
Before inserting
FILE
ORGANIZATIONS
$:
After inserting
$:
H N
in Fig.
8.18.
Now let's see how the insert!) function performs this splitting and
promotion. Since the function operates recursively, it is important to
understand
insert()
how
function that
we
CURRENT_RRN
on successive
calls.
The
The
RRN
use.
As
is
currently in
cends the
RRNs
in-
KEY
The key
PROMO_KEY
that
is
to be inserted.
to carry
PROMO_R_CHILD
This
is
is
a split,
357
When
split.
is
is
inserted with
it.
and
PROMO_R_CHILD,
makes
NO PROMOTION
PROMO_KEY
PROMOTION if
arguments
it
done and
ERROR if the insertion cannot be made.
Figure 8.23 illustrates the way the values of these arguments change as
the insert() function is called and calls itself to perform the insertion of the
$ character. The figure makes a number of important points:
promotion,
nothing is promoted, and
a
as the
fected
by
an insertion
function
path of successive
The
if
calls
CURRENT_RRN
descending the
calls itself,
is
This search
tree.
splitting
As each
recursive
call returns,
we
PROOtherwise, we
MOTION,
therefore return
that the
NO PROMOTION
PROMO_KEY
and
from
we
are able to
splitting,
and
That means
from this level
this level.
PROMO_R_CHILD
have no meaning.
Given
this
introduction to the
insert()
function's operation,
shown in
Fig. 8.24.
we
are ready
We have already
PAGE
The page
NEWPAGE
New
POS
The position in
or would occur
P_B_RRN
The
that insert()
page created
currently examining.
if a split occurs.
PAGE
(if it is
present)
(if inserted).
relative record
level. If a split
is
number promoted
occurs
at the
from below up
next lower
level,
to this
P_B_RRN
P_B_KEY
P_B_RRN,
is
inserted into
PAGE.
358
KEY =
CURRENT RRN =
NO PROMOTION
PROMO_KEY = <undefined>
PROMO_R_CHILD = <undefined>
Return value:
fc
Search
step
Recursive
call
Insertion and
KEY =
splitting logic
mm
CURRENT RRN
Return value: PROMOTION
PROMO_KEY = H
PROMO R CHILD = 12
Search
step
Recursive
call
Insertion and
KEY =
splitting logic
CURRENT RRN
PROMOTION
PROMO_KEY = B
PROMO R CHILD = 11
Return value:
Search
fet
step
Recursive
Insertion
KEY =
call
and
4n
splitting logic
PROMOTION
PROMO_KEY = $
PROMO R CHILD = NIL
Return value:
Search
step
Recursive
Insertion
call
and
splitting logic
FIGURE 8.23 Pattern of recursive calls to insert $ into the B-tree as illustrated
in Fig.
8.22.
insert
FUNCTION:
if CURRENT_RRN
NIL then
PROMO_KEY := KEY
PR0M0_R_CHILD := NIL
return PROMOTION
else
read page at CURRENT_RRN into PAGE
search for KEY in PAGE.
let POS := the position where KEY occurs or should occur.
if KEY found then
RETURN_VALUE
KEY,
P_B_RRN, P_B_KEY)
if RETURN_VALUE
KEY
in a
B-tree.
number CURRENT_RRN.
sively until
it
finds
KEY
(CURRENT_RRN,
The
If
in a
KEY,
PROMO_R_CHILD, PROMO_KEY)
page is not a
page or reaches a
this
If
it
finds KEY,
it
issues an error
ERROR. If there is space for KEY in PAGE, KEY is inserted. Otherwise, PAGE is split. A split assigns the value of the middle key to
PROMO_KEY and the relative record number of the newly created page to PROMO_R_CHILD so insertion can continue on the recursive ascent back up the tree. If a promotion does occur, insertO indicates this by returning PROMOTION. Otherwise, it returns
NO PROMOTION.
message and
quits, returning
359
360
PROCEDURE:
split (I_KEY,
I_RRN,
FILE
ORGANIZATIONS
PAGE,
copy all keys and pointers from PAGE into a working page that
can hold one extra key and child.
insert I_KEY and I_RRN into their proper places in the working page.
allocate and initialize a new page in the B-tree file to hold NEWPAGE.
set PR0M0_KEY to value of middle key, which will be promoted after
the split.
set PR0M0_R_CHILD to RRN of NEWPAGE.
copy keys and child pointers preceding PR0M0_KEY from the working
page to PAGE.
copy keys and child pointers following PR0M0_KEY from the working
page to NEWPAGE.
end PROCEDURE
FIGURE 8.25 Split (l_KEY, l_RRN, PAGE, PROMO_KEY, PROMO_R_CHILD, NEWPAGE), a
procedure that inserts l_KEY and l_RRN, causing overflow, creates a new page called
NEWPAGE, distributes the keys between the original PAGE and NEWPAGE, and determines
which key and RRN to promote. The promoted key and RRN are returned via the arguments
PROMO_KEY
and
PROMO_R_CHILD.
When
functions.
coded
in a real language,
between the
insertf)
uses a
number of support
split(),
original
RRN
split()
is
to
procedure, which
You
is
promoted from
the
how
split()
working page
moves
all
of the
data.
Note
that
CHILD RRNs
are transferred
is
the
RRN
promoted key. Figure 8.26 illustrates the working page activity among
the working page, and the function arguments.
The version of splitf) described here is less efficient than might
sometimes be desirable, since it moves more data than it needs to. In
Exercise 17 you are asked to implement a more efficient version of split().
the
PAGE, NEWPAGE,
361
We
need
routine to
tie
together our
insert!
and
split(
procedures and to do some things that are not done by the lower-level
Our
routines.
Open
Read
driver
Create
It is
new
PAGE
root node
that the
file,
do the following:
to put the
tree.
routine driver
assumed
able to
keys in the
The
must be
when
shown
RRN
insert(
splits the
is
file itself.
of data in splitO.
PAGE
7D
K
Working page
^
w
I_KEY
into
(B)
and
I_RRN
working page.
^f
RRN
(12) of
t
PROMO RRN
NEWPAGE.
NEWPAGE
PAGE
PROMO KEY
7*-
362
FILE
ORGANIZATIONS
does
exist, driver
opens
it
the
first
key
these to create a
8.10
new
root.
B-Tree Nomenclature
Before moving on to discuss B-tree performance and variations on the basic
B-tree algorithms, we need to formalize our B-tree terminology. Providing
careful definitions
of terms such
must be present
as order
and
terms relatin g to
Barges.
Reading
is
that literature
not uniform in
its
us e_ojl
363
B-TREE NOMENCLATURE
Comer
(1979),
and
number of keys
few
that can
be in a page of a tree. So, our initial sample B-tree (Fig. 8.16), which can
hold a maximum of seven keys per page, has an order of three, using Bayer
full
when
it
when
it
contains seven
keys?
by
references a
maximum, not
minimum, and
it
keys.
Use of Knuth's
number of keys in a
definition
B-tree page
is
always one
Consequently,
less
fact
than the
that the
number of
of order m, the
are divided as
between the new page and the old page. Conseq uently,
every page except the root and the leaves has at east ml 2 descendents
Expressed in terms of a ceiling function, we can say that the minimum
It follows that the minimum numnumber of descendents is \ m/2
as possible
.__
1.
\J&pL
(The
notion of leaf
as the
364
8.1
FILE
ORGANIZATIONS
2.
3.
The
4.
1.
at least [
precise
m/2~\ de-
scendents.
6.
8.12
at least
it is
a leaf).
A
A
5.
root has
m -
keys.
is
might
know
that
you need
and
of order 512
(maximum of 511
that,
given the
it is
reasonable
Given these two facts, you need to be able to answer the question, "In the
worst case, what will be the maximum number of disk accesses required to
locate a key in the tree?" This is the same as asking how deep the tree
will be.
We can answer this question by beginning with the observation that the
number of descendents from any level of a B-tree is one greater than the
number of keys contained at that level and all the levels above it. Figure 8.28
illustrates this relation for the tree
we
T his tree contains 27 kevs fall the letters of the alphabet and S). If you co unt
me number of potential descendents trailing from the leaf level, you see that
there are 28 of them.
Next we need
to observe that
from any
of
level
we
B-tree of
some given
order. This
is
of interest because
we are interested in the worst-case depth of the tree. The worst case occurs
when every page of the tree has only the minimum number of descendents.
In such a case
iimal breadth.
it
tree
and
365
H N
ddd
dd
Fo r
root page
(A/
1)
T U V
X Y
dddddddd
dd
leaf level.
mil 1 de scendents.
he third
level,
then, co ntains
2
X [ ml2~\
minimum of [ m/2~\
descendents, the general pattern of the relation between depth and the
Minimum number
Level
1
0?
of descendents
(root)
x \ mll\
x fm/21 x [~m/2~|or 2 x
3
2 X \ml2~}
tf
\
'
mil'}
4?
d-\
2 x fro/21
So.
in
general,
for
any
level
d of a B-tree,
thaf level
2
\^
x fm/2l d-\
the
minimum numbe r of
366
Wejcnow
Let
level.
that
s call
ORGANIZATIONS
N keys
free with.
descendents from
N+
we know
than the
at
that the
number
We
d.
of height d
a tree
its
leaf
minimum number
of
as
N+1>2X
since
has
between the
relationship
FILE
[ ml2~]
d-
cannot be
tree
d,
we
less
arrive
+ logrw2l
((N
l)/2).
a B-tree
A/keys. Let's find the upper bound for the hypothetical tree that
at the start
of
numbers
d
<
with
we describe
we
find that
or
d
<
3.37.
8.13
performance
is
is
describe earlier; in
we
state the
follow ing:
at least
\~
m/2~\ de-
scendents;
A
A
We
page contains
at
least [ ml2~\
keys; and
keys.
We
jJiese
367
The
simplest situation
is
illustrated in case
Consequently, deletion involves nothing more than removing the key from
the page and rearranging the keys within the page to close up the space.
M (case
Deleting the
is
swap
its
it
not in a
lea f, there
is
an easy
wav
to get
it
into a leaf:
We
wit h
r6/2~|Therefore,
we
2.
this
more than
the
minimum number
it
underflow
has the
same
full
page.
Concatenation
is
of splitting. Like
splitting,
it
can
underflow
is
just
what happens
in
our example.
Our concatenation of pages 3 and 4 pulls the key D from the parent page
down to the leaf level, leading to case 5: The loss of the D from the parent
page causes it, in turn, to underflow. Once again, redistribution does not
solve the problem, so concatenation
must be used.
1: No action.
Delete J from page 5. Since page 5 has more
than the minimum number of keys,
Case
J can
Case
2:
Swap
X Y
X Y
successor.
6),
6.
Case
3:
Redistribution.
among pages
2, 7,
and
8 to restore balance
between leaves.
Promote
move U and V
into page
7.
u V
368
,i
Case
4:
Concatenation.
it cannot be
addressed by redistribution. Concatenate the
keys from pages 3 and 4, plus the D from
page 1 into one page.
Underflow
u V
X Y
X Y
New
page
3:
Now
5:
page
Underflow
moves up to
here
C D E F
Case
6:
it is
c D E F
U V
X Y
369
370
Note
ORGANIZATIONS
the
that
FILE
(Q and W) had
Case 6 shows what happens when concatenation propagates all the way
The concatenation of pages 1 and 2 absorbs the only key in the
root page, decreasing the height of the tree by one level.
The steps involve d in deleting keys from a B-tree can be su mmarized as
to the root.
follow
If the
key
successor,
to be deleted
which
Q1
fuA
If the leaf
>~v
(Aj
now
further action
If the leaf
swap
a leaf,
it
with
its
immediate
required.
is
now
not in
is
in a leaf.
is
left
and right
siblings.
a.
If a sibling
has
more than
the
minimum number
of keys, redis-
tribute.
b.
two
more than
the
minimum,
is
3-6
concatenate the
into
one
leaf.
to the parent.
tree
decreases.
8.13.1 Redistribution
Unlike concatenation, which is a kind of reverse split, redistribution is a
idea. Our insertion algorithm does not involve operations analogous to
new
redistribution.
It is
guaranteed to have
Note
it
that
of Fig. 8.29),
these nodes are not siblings. Redistribution algorithms are generally written
this restriction?
how
single deletion in a
WAY
37
one key from a sibling into the page that has underflowed, even if the
distribution of the keys between the pages is very uneven. Suppose, for
example, that we are managing a B-tree of order 101. The minimum
number of keys that can be in a page is 50, the maximum is 100. Suppose
we have one page that contains the minimum and a sibling that contains the
maximum. If a key is deleted from the page containing 50 keys, an
underflow condition occurs. We can correct the condition through
redistribution by moving one key, 50 keys, or any number of keys that falls
between 1 and 50. The usual strategy is to divide the keys as evenly as
possible between the pages. In this instance that means moving 25 keys.
8.14
recall,
to redistribution; splitting
as
to
Improve
during insertion
Way
it is
not
all
instances of overflow.
desirable to
a set
use redistribution
of B-tree maintenance
deletion.
is
way of
avoiding,
two approximately
or
a full
at
least
page and
some
of the overflowing keys into another page. The use of redistribution in place
of splitting should therefore tend to make a B-tree more efficient in terms
of its utilization of space.
It is possible to quantify this efficiency of space utilization by viewing
the amount of space used to store information as a percentage of the total
amount of space required to hold the B-tree. After a node splits, each of the
two resulting pages is about half full. So, in the worst case, space utilization
in a B-tree using two-way splitting is around 50%. Of course, the actual
degree of space utilization
lias
shown
approaches
is
Yao
(1978)
372
includes
some experimental
of
in a space utilization
insertions.
possible,
testing
When
ORGANIZATIONS
results that
67%
show
that
two-way
splitting results
random
the experiment
space utilization
by Davis
FILE
85% when
redistri-
8.15
B* Trees
and amplification of work on B-trees
In his review
Knuth (1973b)
in 1973,
new
He calls
afi* tree.
Consider
system
in
which we
rules
are postponing
splitting
we
through
are considering
any page other than the root, we know that when it finally is time to split,
the page has at least one sibling that is also full. This opens up the possibility
of a two-to-three split rather than the usual one-to-two or two-way split.
Figure 8.30 illustrates such a
The important
split.
is
that
that are each about two-thirds full rather than just half
possible to define a
new
B*
it
results in pages
full.
tree,
This makes it
which has the
following properties:
maximum
1.
2.
the root
of m descendents.
and the leaves has
at least
(2m
l)/3 de-
scendents.
3.
The
4.
5.
6.
A
A
root has
at least
it is
a leaf).
keys.
The
critical
changes between
this se t
affects
373
Original tree:
A C D F
H K
T V X
Two-to-three-split:
H K M
key B.
C D
T V X
split.
To impleme nt B * tree proced ures, o ne must also deal with the question
of sj>h tting. the root, which, bv definition, never has a sibling. If there is no
sibling, no two-to-three split is possible. Knu th suggests allowing the roo t
grow to a size larger than the other pages so. when it does split, it can
producetwo pages that are each about tw o-thirds full. This sugges tion has
to
However,
all
it
pa,g e s
level
adhere to
B*
8.16
We
have seen
very
that,
efficient, flexible
ties after
its
balanced proper-
374
few disk
we have
so
FILE
accesses.
far,
full
the structural
ORGANIZATIONS
at all
from pages
mean
that
we
B-tree has
than that.
we
we
cannot hold
we
we have
it
ALL
of an index in
RAM does
there.
megabyte of
RAM
of
for
any given time. Given a page size of 4 K, holding around
64 keys per page, our B-tree can be contained in three levels. We can reach
any one of our keys in no more than three disk accesses. That is certainly
acceptable, but why should we settle for this kind of performance? Why not
try to find a way to bring the average number of disk accesses per search
down to one disk access or less?
Thinking of the problem strictly in terms of physical storage structures,
retrieval averaging one disk access or less sounds impossible. But,
remember, our objective was to find a way to manage our megabyte of
index within 256 K of RAM, not within the 4 K required to hold a single
page of our tree.
We know that every search through the tree requires access to the root
page. Rather than accessing the root page again and again at the start of
and just keep it there.
every search, we could read the root page into
requirement from 4 K to 8 K, since we
This strategy increases our
need 4 K for the root and 4 K for whatever other page we read in, but this
is still much less than the 256 K that are available. This very simple strategy
reduces our worst-case search to two disk accesses, and the average search
index storage
at
RAM
RAM
to
first level
This simple,
an important,
ho 1rl
cnt1 ir
number of B-tree
more
RAM, we
pages, perhaps
can
5, 1CL
or
more. As we read pages in from the disk in response to user requests, we fill
if we
"up trie buffer. Then, when a page is requested, we access it from
read
we
then
in
RAM,
can, thereby avoiding a disk access. If the page is not
RAM
375
into the buffer from secondary storage, replacing one of the pages that
was previously there. A B-tree that uses a RAM buffer in this way is
sometimes referred to as a virtual B-tree.
it
is
is
We
2.
It
is
called
faults:
was once
in the buffer
new
page.
The first cause of page faults is unavoidable: If we have not yet read in
and used a page, there is no way it can already be in the buffer. But the
second cause is one we can try to minimize through buffer management.
The critical management decision arises when we need to read a new page
into a buffer that is already full: Which page do we decide to replace?
One common approach is to replace the page that was least recently
used; this
is
called
LRU
page
is
was
always read in
first,
which
different
is
from
is
LRU
method keeps track of the actual requests for pages. Since the root is
requested on every search, it seldom, if ever, is selected for replacement.
The page to be replaced is the one that has gone the longest time without a
request for use.
Some
research
number of pages
by Webster
that
(1980)
shows the
TABLE
8.1
Effect of using
a simple
of increasing the
3.00
1.71
LRU
LRU replacement
Buffer Count
effect
strategy
10
1.42
20
0.97
376
LRU
using a simple
FILE
It lists
ORGANIZATIONS
the average
numbers of page
number of disk
accesses per
height.
B+
as
an illustration of the
15% of
the tree in
RAM (20 pages out of the total 140) reduces the average number of accesses
per search to less than one.
Note
that
all
The
results are
decision to use
the
we
LRU
replacement
is
based on the
recently than
how
it is
we
are
are to
is
another,
more
direct
way
of the
Our
Always retain
larger amount
of buffer space, it might be possible to retain not only the root, but also all
of the pages at the second level of a tree.
Let's explore this notion by returning to a previous example in which
and a 1-megabyte index. Since our page
we have access to 256 K of
size is 4 K, we could build a buffer area that holds 64 pages within the
area. Assume that our 1 megabyte worth of index requires around 1.2
megabytes of storage on disk (storage utilization = 83%). Given the 4 K
page size, this 1.2 megabytes requires slightly more than 300 pages. We
RAM
RAM
377
assume
It
that,
followed by 9 or 10 pages
level,
Using
at
page
a single
all
at the
root
the remaining
remaining buffer slots are used to hold leaf-level pages. Decisions about
which of these pages to replace can be handled through an LRU strategy.
For many searches, all of the pages required are already in the buffer; the
search requires no disk accesses.
to a
number
that
is less
It is
it is
than one.
when
it
comes time
Augmenting the LRU
to
to
keep
from 1.42
8.16.3 Importance
It is
difficult to
scheme
down
the
buffers.
10-page buffer,
of Virtual B-Trees
page buffering
the
in
we have
secondary storage. As
amount of memory
is itself a
way
We
easy to
fall
into the
sufficient solution to
must be maintained on
that
emphasized, to
it is
fall
to reduce the
is
to lose
amount of memory
to the
8.17
Placement
of Information
we
itself,
setting aside
any
We
paraphrased Bayer and McCreight and stated that "the associated information
is
of no further interest."
of interest. Rarely do
we ever want
is,
378
themselves.
we
ORGANIZATIONS
FILE
It
want
really
is
file:
number
we
The
couple
reduced,
is
found, no
and the
tree
more
first
tends
become
to
since
taller
there
fewer
are
descendents from each page. So, the advantage of the second method
that,
is
we need
to index
is
key and
if
Given
a B-tree
we
a
its
store
pointer
512 bytes available for keys and associated information, the two fundamental storage alternatives translate into the following orders of B-trees:
Information stored with key: four keys per page
order
order 33
worst-case
developed
for
finding
w/key)
d(info elsewhere)
we
and
tree.
depth
of B-trees
earlier:
d (in fo
So, if
the
five tree;
** +
500.5
6.66
k)g 17 500. 5
3.
log3
19
store the information with the keys, the tree has a worst-case
depth of six
second method
record in the worst case.
access, the
a
still
disk
find
379
general,
In
then,
where
key/record
tion
8.18
from
pair,
it is
it
in a separate
file.
many
One way
this.
information in
to
a
handle
separate,
this
lists
variability
is
to
variable-length record
to allow a variable
Up
example of
are an excellent
place
associated
the
would
file;
the B-tree
file.
Another approach
to this point
legally hold.
keys,
we might
as
much
implementing
a structure
in a
internal fragmentation. If
we
fields
larger
fewer
levels.
Accommodating
As we saw
with variable-length
more keys
it
in earlier
can allow us
does
in a page, then
away with
we have
a tree
with
380
FILE
ORGANIZATIONS
The
idea
is
that
we want
promoted upward
to
in
level.
SUMMARY
We begin this chapter by picking up the problem we left unsolved at the end
of Chapter
6:
RAM memory,
that
work
well
if
secondary storage
is
most evident
in
two
if
We
first
it
can be kept
in order
without sorting.
we need
a balanced tree to
after repeated
random
insertions.
We
see that
to
do
AVL
this,
trees
discovering that
provide
way of
amount of overhead.
Next we turn to the problem of reducing the number of disk accesses
required to search a tree. The solution to this problem involves dividing the
of the
tree can
be retrieved with
SUMMARY
work on
B-trees,
Our
splitting,
The formal
Once
we
work
when
begin
We
find
full
Trees using
Next we turn
to the matter
this
B*
combination of
trees.
a virtual B-tree.
fit
secondary storage.
then
We
we
memory do
If we hold pages
into
rather
full
Indexes that
entirely
can save the expense of reading these pages in from the disk again.
method
from
RAM,
which pages
One
to keep.
Keeping the root has the highest priority, the root's descendents have the
The second method for selecting pages to keep in
RAM is based on recentness of use: We always replace the least-recentlyused (LRU) page, retaining the pages used most recently. We see that it is
possible to combine these methods, and that doing so can result in the
ability to find keys while using an average of less than one disk access per
next priority, and so on.
search.
We
it
is
attractive
is
the
it
it is
file.
tree,
often advantageous to
381
382
We
records within the pages of a B-tree, noting that significant savings in space
in the height
variable-length records.
The modification of
many
variations
of the
of the
on B-trees
tions.
KEY TERMS
AVL
2.
at
least
|~w/2l
descendents.
3.
The
4.
5.
6.
A
A
root has
at least
it is
page contains
at least
[ ml2~\
a leaf).
keys.
keys.
pages always
upward from
new
lies in
overly long branches); they are shallow (requiring few seeks); they
accommodate random
at least
50%
low
cost
storage
utilization.
B*
B*
special B-tree in
trees generally
Height-balanced tree.
each node there is a
tree structure
limit to the
with
a special
property: For
amount of difference
that
is
allowed
EXERCISES
among the heights of any of the node's subtrees. An HB(k) tree allows subtrees to be k levels out of balance. (See AVL tree.)
Leaf of a B-tree. A page at the lowest level in a B-tree. All leaves in a
B-tree occur at the same level.
Order of a B-tree. The maximum number of descendents that a node
in the B-tree
can have.
few disk
ac-
cesses.
into a
RAM
RAM
EXERCISES
1.
part or
all
when
RAM-based
they
questions should help bring these drawbacks into focus, and thus reinforce
the need for an alternative structure such as the B-tree.
a.
There are two major problems with using binary search to search
simple sorted index on secondary storage: The number of disk ac-
cesses
is
larger than
index sorted
is
we would
substantial.
like;
it
binary
383
384
b.
Why
c.
In
is it
what way
tree?
d.
a file
completely
full,
what
maximum number
the
is
the tree
is
paged
in the
manner
in a
not paged,
key?
If
with each
page able to hold 15 keys and to branch to 16 new pages, what is the
maximum number of accesses required to find a key? If the page size
is increased to hold 511 keys with branches to 512 nodes, how does
the maximum number of accesses change?
e. Consider the problem of balancing the three-key-per-page tree in
Fig. 8.13
Why
is it
difficult to create a
more
g.
Although B-trees
downward from
3.
Why
from an
is
a leaf
still
commonly
this so?
node of a B-tree.
How
does
a leaf
internal node?
the top.
contains at least
f.
node
512 keys),
it
might be possible
to use the
Why? What
are
time?
4.
sets
Show
from loading
the following
of keys in order.
a.
b.
c.
d.
C GJ X
CGJXNSUOAEBHI
CGJXNSUOAEBHIF
CGJXNSUOAEBHIFKLQRTVUWZ
ASCII code
for Z.)
Draw
385
EXERCISES
6.
Given
What is
What is
a.
b.
the
the
maximum number
minimum number
of descendents from
of descendents from
page?
page (ex-
d.
What
What
e.
How many
c.
is
the
is
the
the root?
a leaf?
dents?
f.
What
the
is
maximum
if
it
contains 100,000
keys?
Using
method
similar
records,
a.
Retrieve
b.
Add
c.
Delete
d.
Retrieve
Assume
arrived at
9.
Show
deleted
record;
a record;
a
record; and
all
page buffering
your answer.
that
is
file
in sorted order.
how you
is
five.
D H
A B C
r
F
K L
N O
\v
386
10.
A common
unless
11.
100%
it is
full.
Discuss
FILE
ORGANIZATIONS
is
grow deeper
this.
key from
node
in a B-tree.
You
look
at
the right sibling and find that redistribution does not work; concatenation
would be
12.
What
Do you
it
introduce?
compare with
13.
What
is
difference
the
is
improvement does
does
You
necessary.
option here.
that
B*
between
tree offer
over
B*
How
can
it
an
a B-tree,
How
a virtual
is
keys in
15.
We
noted
that,
it is
a separate file.
possible to optimize a
Programming Exercises
16.
at the
traversal
of the
tree
shown
recursive
a
B-tree
in Fig. 8.18:
(((A,B,C)D(E,F,G)H(I,J)K(L,M))N((0,P)Q(R)S(T,U,V)W(X,Y,Z)))
17.
it
program
key
is
18.
Write
19.
delete kevs
from
a B-tree.
not very
efficient.
Rewrite
in a B-tree.
a
FURTHER READINGS
program
characters.
Write
a data file in
which
FURTHER READINGS
Currently available textbooks on
discussions
on
B-trees.
access to B-tree.
Uses of B-trees for secondary key access are covered in many of the previously
cited references. There is also a growing literature on multidimensional dynamic
indexes, including a B-tree- like structure called a k-d B-tree. K-d B-trees are
387
388
described in papers by Ouskel and Scheuermann (1981) and Robinson (1981). Other
tries and grid files. Tries are
and data structures, including Knuth (1973b) and
Loomis (1983). Grid files are covered thoroughly in Nievergelt et al. (1984).
An interesting early paper on the use of dynamic tree structures for processing
files is "The Use of Tree Structures for Processing Files," by Sussenguth (1963).
Wagner (1973) and Keehn and Lacy (1974) examine the index design considerations
that led to the development of VSAM. VSAM uses an index structure very similar
to a B-tree, but appears to have been developed independently of Bayer and
McCreight's work. Readers interested in learning more about AVL trees will find a
good, approachable discussion of the algorithms associated with these trees in
covered
in
many
Standish (1980).
tree operations
texts
on
files
Knuth (1973b)
and properties.
takes a
more
AVL
C Programs
to Insert
Keys
389
into a B-Tree
The C program that follows implements the insert program described in the
text. The only difference between this program and the one in the text is
that this program builds a B-tree of order five, whereas the one in the text
builds a B-tree of order four. Input characters are taken from standard I/O,
with
q indicating
end of
The program
data.
from
driver, c
insert, c
Contains
several
files:
program
insertf),
it,
and supervises
splitting
and promotions.
btio.c
btio.c.
Contains the
btutil.c
split ()
All the
/*
rest
file
called bt.h.
bt.h.
header file for btree programs
.
*/
#def
#def
#def
#def
#def
#def
MAXKEYS
MINKEYS
ne
ine
i ne
i ne
ine
ine
i
NIL
NDKEY
MAXKEYS/2
(-1
/@/
NO
YES
typedef s t rue t {
shor t keycoun t;
char
keyCMAXKEYS]
shor t childEMAXKEYS+1
>
BTPAGE
;
/*
/*
/*
*/
* /
the ac t ua 1 keys
ptrs to rrns of descendants*/
(continued)
390
#define PAGESIZE
FILE
ORGANIZATIONS
zeof (BTPAGE
key);
Driver.c
/*
driver.c...
Driver for btree tests:
Opens or creates b-tree file.
Gets next key and calls insert to insert
If necessary, creates a new root.
key in tr
*/
#include <stdio.h>
^include "bt .h"
ma i n(
promoted; /* boolean:
int
/*
/*
if
e
(btopenO)
root
get root ()
root
c r
5e
ea t e_t
/*
/*
if
ee(
root
*/
it
*/
391
B-TREE: INSERT.C
>
btclose();
>
Insert.
/ *
insert .c.
...
rrn:
*pr omo_r_ch
key:
i 1
*promo_key:
*/
char *promo_key)
/* current page
BTPAGE page,
/* new page created
newpage;
int found, promoted; /* boolean values
short
pos
char
p_b_rrn;
p_b_key;
if
*/
if
split occurs
*/
*/
/*
/*
*/
*/
(rrn == NIL)
>
btread(rrn, &page);
found = search_node( k ey
&page, &pos);
if (found) {
printfC Error: attempt to insert duplicate key: %c \n\007", key);
return (0);
,
>
(continued)
392
>
else
i t(p_b_key
p_b_rrn &page ,promo_key promo_r_chi
btwrite(rrn, &page);
btwrite( *promo_r_chi Id, &newpage);
return (YES);
/* promotion */
spl
Id
Anewpage);
>
Btio.c
/*
btio.c...
Contains btree functions that directly involve file i/o:
^include "stdio.h"
#include "bt .h"
^include "fileio.h"
int
btfd;
bt open(
/*
btclose(
close(btfd)
>
( )
*/
short root
long lseekC
B-TREE: BTIO.C
393
exitd
return (root);
>
put root
shor
root)
lseekCbtfd,
teCbtf d
wr
OL,
0);
&root
2)
char key
creatC"btree.dat M ,PMODE);
/* Have to close and reopen to insure
closeCbtfd);
/* read/write access on many systems.
btopenC);
*/
/* Get first key.
key = getcharC);
return ( c r ea t e_r oo t ( k ey NIL, NIL));
btfd
long
lseekC), addr;
>
btwr
i t
eC shor
rrn,
BTPAGE *page_ptr)
>
);
*/
*/
394
Btutil.c
btut ll.c...
/*
--
c r
ea
pageinitO
( )
--
get and
put
*/
'include "bt.h"
c r
ea
short
left,
short
right)
BTPAGE page;
short rrn;
rrn = getpageC);
page i n i t ( &page )
page.key[01 = key;
page.childtO] = left;
=
right;
page ch i 1 d
page.keycount = 1;
btwriteCrrn Apage)
putroot(rrn)
return(rrn)
;
[ 1
page i
i t
(BTPAGE *p_page)
/*
p_page: pointer to
page
*/
int
for
))
<
MAXKEYS;
(j = 0;
= NDKEY;
p_page->key[
=
NIL;
p_page- >ch i Id
j
>
p_page->childtMAXKEYS]
NIL;
int
for
(i
0;
*po5
<
&&
key
>
p_page- > k ey
i 3
if
395
B-TREE: BTUTIL.C
> k
ey *pos
[
else
return (NO);
key
/ *
not
is
page */
in
short
key,
{
i
for
=
p_page- > k eycoun t
key < p_page=
p_page- > k ey i
p_page - > k ey i =
p_page- >ch i 1 d i +
p_page- >c h i 1 d i
(1
>
ey
&&
>
0;
i--)
>
p_page- >keycount++
=
p_page- > k ey i
key;
=
r_child;
p_page- >ch i 1 d i +
;
/* split ()
Argument s
inserted
promoted up from here
to be inserted
promoted up from here
pointer to old page structure
pointer to new page structure
key to be
key to be
child rrn
rrn to be
key:
promo key:
r_child:
promo r child:
p_oldpage:
p_newpage:
*/
{
i
short mid;
char
wor k k eys [MAXKEYS+
short wor k c h MAX KE YS+ 2
t
*/
tells where split is to occur
*/
temporarily holds keys, before split
/* temporarily holds children, before split*/
/*
1
/ *
/*
&&
for
/*
[
i ]
*/
*/
>
l
[
>
0;
- - ) { / *
>
workkeystil
workchti+1]
key;
r_child;
*promo_r_chi Id = getpageC);
i n i t
p_newpage )
(
page
/*
/*
*/
*/
(continued)
396
FILE
ORGANIZATIONS
for
*/
*/
*/
*/
*/
*/
397
The
Pascal
program
that follows
in the text.
The only
difference
{$B->
{$1
{$1
btutil.prc}
insert .pre}
The $B
as a
instructs the
standard Pascal
The
Turbo
file.
files btutil.prc
and
in the
driver. pas
driver
program described
parallels the
in the text.
insert. pre
btutil.prc
Contains
all
Driver.pas
PROGRAM btree
NPUT OUTPUT)
,
398
FILE
ORGANIZATIONS
{$B->
CONST
MAXKEYS = 4;
MAXCHLD = 5;
MAXWKEYS = 5;
MAXWCHLD = 6;
NOKEY = '@'
NO = FALSE;
YES = TRUE;
NULL = -1
TYPE
BTPAGE = RECORD
keycount
integer;
{number of keys in page
}
key
array [1.. MAXKEYS] of char;
{the actual keys
}
child
array
MAXCHLD of integer; {ptrs to RRNs of descendents}
:
END;
VAR
promoted
boo 1 ean
oo t
pr omo_r
integer
promo_k ey
btfd
char
file of BTPAGE
MINKEYS
PAGESIZE
integer
i nt eger
{$1
{$1
ey
btutil.prc}
insert. pre}
BEGIN {main}
MINKEYS
PAGESIZE
if
MAXKEYS DIV 2;
sizeof (BTPAGE)
{try to open btree.dat and get root}
btopen then
root
root
:=
get root
{if btree.dat
else
ead( k ey )
create_tree
q'
DO
not
there,
create it}
399
promoted
if
i nser t ( roo t
k ey
promo_r rn promo_k ey )
then
c r eat e_root ( promo_k ey root promo_r rn
:=
pr omo ted
root
ead( key)
END;
:=
btclose
END
Insert.prc
FUNCTION insert (rrn: integer;key: char;VAR pr omo_r_ch i
VAR promo_key: char): boolean;
Function to insert
integer;
VAR
{current page
{new page created if split occurs
{tells if key is already in B-tree
{tells if key is promoted
{position that key is to go in
{RRN promoted from below
{key promoted from below
page,
newpage
found
promoted
BTPAGE;
pos
b_rrn
p_b_key
boolean
i n t eger
char
GIN
{past bottom of tree... "promote"
(rrn = NULL) th
{original key so that it will be
BEGIN
{inserted at leaf level
promo key := key;
promo r child := NULL;
insert := YES
END
else
BEGIN
btreadCrrn ,page)
found := search n ode ( k ey page pos )
if (found ) then
BEGIN
key);
attempt to insert duplicate key:
wr telnC Error
insert := NO
END
if
'
}
>
(continued)
400
else
BEGIN
promoted := 1 nser t ( page ch 1 1 d pos
k ey
p_b_r rn p_b_k ey )
if (NOT promoted) then
insert := ND
{no promotion}
else
BEGIN
if (page, keycount < MAXKEYS) then
BEGIN
{OK to insert key
p_b_rrn page ) {and pointer in this
i ns_i n_page( p_b_key
btwrite(rrn,page);
{page.
insert := NO
{no promotion}
END
else
BEGIN
spl i t ( p_b_k ey p_b_r r n page promo_k ey
promo_r_ch ild,newpage);
btwrite(rrn,page)
btwrite( promo_r ch ild,newpage);
insert := YES
{promotion}
END
END
END
]
END
END:
Btutil.prc
FUNCTION btopen
BOOLEAN;
{Function to open "btree.dat"
it returns false}
:
if
it
VAR
response
char;
BEGIN
assign(btfd, 'btree.dat );
write('Does btree.dat already exist? (respond
readln(response)
writeln;
if (response = 'Y') OR (response = 'y') then
BEGIN
reset(btfd)
btopen := TRUE
END
else
btopen := FALSE
:
'
END;
or
N):
*);
}
}
401
PROCEDURE btclose;
{Procedure to close "btree.dat"}
BEGIN
close (btfd);
END;
FUNCTION getroot
integer;
{Function to get the RRN of the root node from first record of btree.dat)
:
VAR
root
BTPAGE;
BEGIN
seek(btfd,0);
if (not EOF) then
BEGIN
r ead( btfd, root);
getroot := r oo t
eycoun t
END
else
wr i t e 1 n( Er r or
Unable to get root.')
:
'
END;
FUNCTION getpage
integer;
{Function that gets the next available block in "btree.dat" for
BEGIN
getpage := f i 1 es i ze( b t f d )
:
new page)
END;
VAR
j
eger
BEGIN
for
:=
to MAXKEYS
DO
BEGIN
:= NOKEY;
p_page. key[
:= NULL;
p_page ch i 1 d
1
END;
p_page.child[MAXKEYS+1
:=
NULL
END;
>
VAR
rootrrn
BTPAGE;
BEGIN
seek(btf d,0)
rootrrn. keycount
:
:=
root;
(continued)
402
FILE
ORGANIZATIONS
pageinit (rootrrn);
write(btfd,rootrrn)
END;
BTPAGE)
END;
BTPAGE)
END;
btwrite(rrn,page)
putroot(rrn)
create_root := rrn
;
END;
integer;
FUNCTION create_tree
{creates "btree.dat" and the root node}
VAR
rootrrn
int eger
BEGIN
:
rewnteCbtf d)
r
ead( key);
END;
( k
ey NULL NULL
,
integer;
403
In
VAR
1
eger
BEGIN
i
:=
while ((i
l
pos
<=
page
eye oun t
AND (key
>
p_page
ey
l ] )
DO
AND (key
p_page
ey pos
[
) )
then
END;
Id:
BTPAGE);
VAR
i
integer;
BEGIN
:= p_page
i
eycount
while ((key < p_page
BEGIN
:=
p_page k ey i
p_page ch i 1 d i +
1;
ey
l
[
page
:=
ey
AND
i
[
p_page ch i
.
>
1)) DO
l ]
END;
p
page
eye oun t
p_page k ey
p_page ch i 1
[
p_page
:=
eye oun
key;
:=
r_child
:=
END;
{split node by creating new node and moving half of keys to new node.
Promote middle key and RRN of new node.)
VAR
i
integer;
{temporarily holds keys,}
of char;
workkeys
array
MA XNKE YS
{
before split}
of integer; {temporarily holds children
workch
MA XNCHLD
array
before split }
{
:
(continued)
404
ORGANIZATIONS
FILE
BEGIN
:=
to MAXKEYS
DO
1
BEGIN
workkeyslil := p_o 1 dpage k ey
workchEi] := p_o dpage ch i 1 d
for
}
}
END;
workchCMAXKEYS+1
:=
p_o 1 dpage ch
.
i 1
i
MAXKEYS
=
while ((key < wor k k ey s i )
AND
BEGIN
workkeystil := wor k k ey 5 i workchEi+1] := workchtil;
:
>
:=
1)) DO
}
key;
r_child;
i 1
page
omo_r_ch d := getpage;
i n i t
p_newpage )
(
for i
TO MINKEYS DO
=
pr
workchti+11
END;
wor k k ey s
MAXKEYS-
BEGIN
:= workkeystil;
p_o 1 dpage k ey[ i
:= workchtil;
p_o 1 dpage ch i 1 d i
:= wor k k ey s i + + M NKEYS
p_newpage k ey i
:= wor k ch i + + M NKEYS
p_newpage. child! i
p_oldpage. key[ i+MINKEYSl
{mark second half of old
=
NOKEY;
p_oldpage.child[ i+1 +MINKEYS] := NULL
<page as empty
]
>
>
>
>
>
>
>
END;
p_oldpage.child[MINKEYS+1
if
wor k ch M NKEYS+
:=
odd(MAXKEYS)
t
hen beg i
:= wor k k ey s MAXNKEYS
p_newpage. k ey M NKEYS+
p_newpage.child[MINKEYS+2] := wor k ch MAXNCHLD
:= wor k ch MA XWCHLD=
p_newpage.chi ld[MINKEYS+1
[
end
else
:= wor k ch MA XNCHLD
p_newpage.chi 1 d M NKEYS+
p_newpage. keycount := MAXKEYS - MINKEYS;
p_oldpage. keycount := MINKEYS;
{promote middle key
promo_key := wor k k eys M NKE Y S+1]
1
END;
>
4
The B Tree Family
Access
CHAPTER OBJECTIVES
Introduce indexed sequential
files.
Show how
sequence
an index
set to
structure.
B+
trees.
Illustrate
fix
B+
how
tree can
variable
number of separators.
Compare
B+
405
CHAPTER OUTLINE
9.1
9.2
9.6.2
Blocks
Adding
9.5
The Simple
9.6
Simple Prefix
Maintenance
9.6.1
9.8
9.9
9.10
B+
Prefix
B+
Sequence Set
Blocks:
in the
9.7
Sequence Set
9.4
Loading
Trees
9.11 B-Trees,
Tree
Variable-order B-Tree
Prefix
B"
B+
Trees in Perspective
Tree
Changes Localized
to Single
9.1
The
Indexed:
file
file
can be seen
as a set
between two
of records that
is
alternative
indexed
by key;
or
Sequential:
The
ous records
The
file
no seeking),
method
these views
is
new
one.
Up
to this point
we
by
separate B-tree. This structure can provide excellent indexed access to any
suppose that
we
also
want
cosequential processing
to use this
we want
file as
to retrieve
file
Now let's
system are
by key.
407
so much so
any situation
On
in
is
that
a
it is
unacceptable
frequent occurrence.
show
us that a
file
processing,
an unacceptable structure
is
by key
delete records
What
in
random
when we want
to access, insert,
and
order.
if
keyed access
batch processing,
when
when
amount of
during
both batch processing of
charge slips and interactive checks of account status. Indexed sequential
access methods were developed in response to these kinds of needs.
as
9.2
We set aside,
for the
moment,
as
a
a set
We
refer to this
We
can immediately rule out the idea of sorting and resorting the entire
sequence
entire
set as
the changes.
deletion
One of the
to just
part
best
ways
When we
We
the buffers
4:
we know
instead to find a
of the sequence
We need
an expensive process.
file is
We
set
that sorting an
way
of an insertion or
involves
tool
we
first
blocks.
block records, the block becomes the basic unit of input and
at once. Consequently, the size of
we
use in
After reading in
program
block,
all
is
the records in
block are in
RAM,
to localize
in a block.
We
where we
us keep a
last
name
also include
each block that point to the preceding block and the following
408
block.
We need
FILE
ACCESS
you
As with B-trees, the insertion of new records into a block can cause the
block to overflow. The overflow condition can be handled by a blocksplitting process that is analogous to, but not the same as, the blocksplitting process used in a B-tree. For example, Fig. 9.1(a) shows what our
blocked sequence set looks like before any insertions or deletions take place.
We show only the forward links. In Fig. 9.1(b) we have inserted a new
record with the key CARTER. This insertion causes block 2 to split. The
Note
we
is
found
in
split.
encountered in B-trees. In
of
Here things are simpler: We just divide the records between two
blocks and rearrange the links so we can still move through the file in order
by key, block after block.
Deletion of records can cause a block to be less than half full and
therefore to underflow. Once again, this problem and its solutions are
analogous to what we encounter when working with B-trees. Underflow in
record.
neighboring node
is
two
solutions:
we
If the
make
full,
we
the distribution
can
redistribute
more
nearly
even.
the
set is
and there are therefore no keys and records in a parent node. In Fig. 9.1(c)
we show the effects of deleting the record for DAVIS. Block 4 underflows
and is then concatenated with its successor in logical sequence, which is
block 3. The concatenation process frees up block 3 for reuse. We do not
show an example in which underflow leads to redistribution, rather than
concatenation, since it is easy to see how the redistribution process works.
Records are simply moved between logically adjacent blocks.
Given the separation of records into blocks, along with these fundamental block-splitting, concatenation, and redistribution operations, we
can keep a sequence set in order by key without ever having to sort the
entire set
free;
Once
made, our file takes up more space than an unof sorted records because of internal fragmentation
insertions are
blocked
file
Block
ADAMS
Block 2
w BYNUM
Block 3
W DENVER
BAIRD
fc
ELLIS
BIXBY
CARSON
COLE
BOONE
DAVIS
(a)
Block
ADAMS
Block 2
fc
w
BYNUM
DENVER
Block 3
Block 4
fe
w
COLE
BAIRD
ELLIS
DAVIS
BIXBY
CARSON
BOONE
CARTER
(b)
Block
ADAMS
fe
Block 2
BYNUM
BAIRD
BIXBY
CARSON
CARTER
BOONE
Block 3
Available
for reuse
Block 4
w COLE
DENVER
ELLIS
(c)
FIGURE 9.1
409
41
FILE
ACCESS
This
last
is
within
a block.
block
size.
set,
block
is
When we read data from the disk, we never read less than a
block; when we write data, we always write at least one block. A block is
also, as we have said, the maximum guaranteed extent of physical
sequentiality. It follows that we should think in terms of large blocks, with
operations.
many
on block
size:
Why
not
make
RAM
our
limits
first
becomes
sort
on
a file:
Consideration
We
The block
1:
RAM
RAM
RAM
a time.
Although we
sequence
are
presently
set sequentially
randomly accessing
in an entire
we
a single
block to get
at
Consideration
2:
Reading
in or writing out a
if
we had
41
We
very long?
knowledge of
not
interested in a
so
block because
And where is
When we discussed
it
is
it
a sensible
is
still
that
it
We
are
at
that?
cluster.
uses
cluster
up eight
guarantees a
As we move from
one:
the
is
minimum number of
long
adjacency.
clustering
How
The block
let's
imprecise:
(redefined):
is
a little
Consideration 2
This
more than
is
sectors
on the
minimum amount
disk.
file
we
sectors
containing
The reason
for
of physical sequentiality.
file,
we may
incur a disk
without seeking.
block size, then, is to make
One reasonable
want
to hold in
a cluster.
RAM
RAM at once. As
is
a guess.
you
that allows
point
is
to
revise this
9.3
and other
factors.
Sequence Set
to the
We
have created
access
them
mechanism
sequentially in order
by key.
It is
we
can
the records into blocks and then maintaining the blocks, as records are
412
12
ADAMS-BERNEE
BOLEN-CAGE
CAMP-DUTTON
ACCESS
FILE
\
/
EMBRY-EVANS
FABER-FOLK Ni FOLKS-GADDIS
each block.
in
We
actually read
it is
a particular record,
a range
of records,
as
know
we
the record
are seeking.
We
key
can
see,
BURNS, we
want
to retrieve
second block.
It is
these
easy to see
blocks.
We
how we
could construct
might choose,
for
a simple, single-level
example,
to
last
build
index for
an index of
we
consult the index and then retrieve the correct block; if we need
we start at the first block and read through the linked list
we have read them all. As simple as this approach is, it is in
sequential access
of blocks until
fact a very workable one
as
long
RAM
as the entire
we
is
for the
sequence
Key
Block number
BERNE
CAGE
DUTTON
EVANS
FOLK
GADDIS
4
5
by means of
set illustrated
that
in
we
RAM
discussed in Chapter
6,
Fig. 9.2.
is
41 3
many
As
we saw
in the
seeks if the
if
file is
on
RAM,
it
requires too
the blocks in the sequence set are changed through splitting, con-
Updating
dex
and contained
RAM.
works well
if
the in-
however, the updating requires seeking to individual index records on disk, the prorelatively small
is
cess can
in
If,
this
is
we
point
discussed
What do we
found
we
blocks
in
that
we
contains so
into
RAM?
many
much
we
like the
RAM at a time.
file
file
fit
More
specifically,
we found
we might
fit
entirely in
RAM.
B-tree.
The use of
which
is
appropriate since
is,
known
it is
we need
to
The purpose of
the index
we
The index
Keys
is to assist us when we
The index must guide us to
are building
all.
tree,
set at
B+
The Content
in fact, a
as a
it is
9.4
is
it
can
assist
are
the
sequence
set.
We are
us in getting
to the correct block in the sequence set; the index set does not itself contain
answers,
it
Given this view of the index set as a roadmap, we can take the very
important step of recognizing that we do not need to have actual keys in the
index set. Our real need is for separators. Figure 9.4 shows one possible set of
separators for the sequence set in Fig. 9.2.
Note
many
between two blocks. For example, all of the strings shown between blocks
3 and 4 in Fig. 9.5 are capable of guiding us in our choice between the blocks
as we search for a particular key. If a string comparison between the key and
414
ACCESS
CAM
BO
Separators:
FILE
12
ADAMS-BERNE
FOLKS
"7
CAMP-DUTTON
3
in
any of these separators shows that the key precedes the separator, we look
for the key in block 3. If the key follows the separator, we look in block 4.
If
we
by placing the
as the separator to
about
how
to
we
4.
we
Note
use
that
is
not always
function
Note
shown
in Fig. 9.6
blocks 5 and
6,
follows that, as
must decide
one that
is
by using
in the Pascal
the logic
procedure
embodied
in the
produce
a separator that
to the left
Relation of Search
Key <
Key =
separator
Key >
separator
list
and
FIGURE 9.5 A
this later),
do
there
BO
as
talk
Key and
Separator
separator
Decision
Go
Go
Go
left
right
right
of potential separators.
DUTU
CAMP-DUTTON
DVXGHESJF
DZ
E
EMBRY-EVANS
EBQX
ELEEMOSYNARY
^-n
41 5
...
V
f ind_sep
sep[];
while
(*sep++
*sep='\0';
*key2++)
== *keyl++)
>
VAR
i, minlgth
integer;
BEGIN
minlgth := min( len_str (keyl
:
:=
len_str(key2)
1;
(i
<= minlgth) DO
:=
END;
sepCi]
sepCO]
END;
:=
:=
key2Ci];
CHR(i)
41
FILE
ACCESS
E
I
Index
9.5
The Simple
for the
Prefix B
sequence
how we
set,
tree.
Tree
index
set.
called a
index
Our
They are actually just the initial letters within the keys.
More complicated (not simple) methods of creating separators from key
simply, prefixes:
prefixes
well as
discussion of prefix
Note
B+
"
1
trees.)"
branches to
N+
EMBRY, we
separator E.
retrieving the
children. If
start at the
is
we
a B-tree, a
node containing
N separators
set,
comparing
EMBRY
key
to the
on B trees and simple prefix B trees is remarkably inconsistent in the nomenclature used for these structures. B + trees are sometimes called B* trees; simple prefix
B + trees are sometimes called simple prefix B-trees. Comer's important article in Computing
Surveys in 1979 has reduced some of the confusion by providing a consistent, standard nomenclature which we use here.
""The literature
SIMPLE PREFIX B
TREE MAINTENANCE
417
to the left
9.6
Simple Prefix B
Tree Maintenance
in
suppose that we want to delete the records for EMBRY and FOLKS,
suppose that neither of these deletions results in any concatenation
or redistribution within the sequence set. Since there is no concatenation or
redistribution, the effect of these deletions on the sequence set is limited to
changes within blocks 4 and 6. The record that was formerly the second
record in block 4 (let's say that its key is ERVIN) is now the first record.
Similarly, the former second record in block 6 (we assume it has a key of
FROST) now starts that block. These changes can be seen in Fig. 9.9.
Let's
and
let's
The more
interesting question
is
what
effect,
if
have on the index set. The answer is that since the number of sequence set
blocks is unchanged, and since no records are moved between blocks, the
index set can also remain unchanged. This is easy to see in the case of the
EMBRY deletion: E is still a perfectly good separator for sequence set
blocks 3 and 4, so there is no reason to change it in the index set. The case
E
!
EMBRY
41 8
FOLKS
of the
deletion
appears both as
To
index
set.
these
two
key
more confusing
a little
is
ACCESS
FILE
and
FOLKS
we
separator,
a shorter
FOLKS
within the
as a separator
that although
we do
record
is
deleted.
FOLKS
it is
now
possible to construct
separator.
set usually
space.)
The
effect
is
much
the
same
example, that
we
by the separators
index
room
in
4,
we
set
We
set
EATON.
we
new
find that
we will insert
the
new
record
This
the
is
is
record
first
not surprising
decided to insert the record into block 4 on the basis of the existing
set.
block
since
set,
set
as the effect
is
set. It
in
sequence
set does
are stored.
B+
tree
is
actually just a
normal
B-tree, the changes to the index set are handled according to the familiar
few
separators in a
""As
we
you study
set.
to refer
back to Chapter
8,
where
SIMPLE PREFIX B
We
assume
that there
is
block
set
A new block
shown
419
TREE MAINTENANCE
Specifically, let's
in to
'
set,
block, and
brought
block. This new
(block 7)
first
in Fig. 9.9.
first
is
following block
and preceding block 2 (these are the physical block numbers). These
changes to the sequence set are illustrated in Fig. 9.10.
Note that the separator that formerly distinguished between blocks 1
and 2, the string BO, is now the separator for blocks 7 and 2. We need a
new separator, with a value of AY, to distinguish between blocks 1 and 7.
As we go to place this separator into the index set, we find that the node into
which we want to insert it, containing BO and CAM, is already full.
Consequently, insertion of the new separator causes a split and promotion,
1
BO,
is
set.
Now let's
FIGURE 9.10 An insertion into block 1 causes a split and the consequent addition of block 7.
of a block in the sequence set requires a new separator in the index set. Insertion
of the AY separator into the node containing BO and CAM causes a node split in the index set
B-tree and consequent promotion of BO to the root.
The addition
420
THE
FILE
ACCESS
FIGURE 9.1 1 A deletion from block 2 causes underflow and the consequent concatenation of
blocks 2 and 3. After the concatenation, block 3 is no longer needed and can be placed on
an avail list. Consequently, the separator CAM is no longer needed. Removing CAM from its
node in the index set forces a concatenation of index set nodes, bringing BO back down from
the root.
another concatenation,
this
that
set,
results
in
the
demotion of the BO separator from the root, bringing it back down into a
node with the AY separator. Once these changes are complete, the simple
prefix
B+
in a concatenation in the
index
block
set,
split in the
there
is
sequence
not always
this
set results in
set results
correspondence
of action. Insertions and deletions in the index set are handled as standard
B-tree operations; whether there is splitting or a simple insertion,
concatenation or a simple deletion, depends entirely on how full the index
set node is.
Writing procedures to handle these kinds of operations is a straightfor-
ward
task if you remember that the changes take place from the bottom up.
Record insertion and deletion always take place in the sequence set, since
that is where the records are. If splitting, concatenation, or redistribution is
necessary, perform the operation just as you would if there were no index set
at all. Then, after the record operations in the sequence set are complete,
make changes
If
as
blocks are
sequence
set:
set, a
new
separator
must be
in-
set;
421
If
moved from
set, a
separator
must be
and
If records are redistributed between blocks in the sequence
value of a separator in the index set must be changed.
Index
set
the index
re-
set;
set,
the
This means that node splitting and concatenation propagate up through the
set. We see this in our examples as the BO
and out of the root. Note that the operations on the
sequence set do not involve this kind of propagation. That is because the
sequence set is a linear, linked list, whereas the index set is a tree. It is easy
to lose sight of this distinction and think of an insertion or deletion in terms
+
of a single operation on the entire simple prefix B tree. This is a good way
to become confused. Remember: Insertions and deletions happen in the
sequence set since that is where the records are. Changes to the index set are
secondary; they are a byproduct of the fundamental operations on the
sequence set.
9.7
moves
in
Up
we
The block size for the sequence set is usually chosen because there is
a good fit between this block size, the characteristics of the disk
drive, and the amount of memory available. The choice of an index
set
block
fore, the
size
block
the index
size that
is
is
factors; there-
set.
A common
scheme
is
block
size
makes
it
easier to
implement
B+
buffering
The index
set
file
set
to avoid seeking
422
FILE
ACCESS
while accessing the simple prefix B tree. Use of one file for both
kinds of blocks is simpler if the block sizes are the same.
9.8
A Variable-order B-Tree
Given
a large,
separators within
how do we
set,
store the
it?
such that it can contain only a fixed number of separators. The entire
motivation behind the use of shortest separators is the possibility of packing
more of them into a node. This motivation disappears completely if the
is
index
set uses a
number of
a fixed
is
We
variable-length separators.
set
How
block
hold
to
should
we go
variable
number of
these separators? Since the blocks are probably large, any single block can
hold
a large
we want
to be able to
separators.
We
do
it
its list
can support
of
a
binary search, despite the fact that the separators are of variable length.
Chapter
In
6,
we
ences,
we
are
set
(We
more
easily
all
when we
Fie.
uppercase
letters,
concatenate them.)
as
so
you can
We
shown
could
in Fig.
9.12.
If
we
which
starts
separator
Our
in position
by looking
10.
Note
at the starting
first
that
roadmap
we perform
to help us find
binary search on
we
this
tells
us
that
"Beck"
falls
between the
separators
B+
tree,
us
downward through
423
AsBaBroCChCraDeleEdiErrFaFle
00 02 04 07 08 10 13 17 20 23 25
Index
Concatenated
to separators
separators
we want
store references to
lower
level
a relative
of the
its
We
tree.
assume
number except
If there are
that
it
from
is
analogous to
some way
it
made
to
in the next
in
terms of
a relative
record
We
need
we
middle element
in
end of
so
we
this variable-length
we need
list,
to
know how
long the
Let's suppose,
we
key
"Beck" and that the search has brought us to the index set block pictured in
Fig. 9.13. The total length of the separators and the separator count allows
Separator count
r
r
2H
AsBaBroCChCraDeleEdiErrFaFle
Separators
00 02 04 07 08 10 13 17 20 23 25
*U Index
to separators
BOO B01 B02 B03 B04 B05 B06 B07 BOS B09 B10 Bll
*U
Relative block
numbers
424
Separator
subscript:
BOO
FILE
ACCESS
01234 56789
As
B01
Ba
B02
Bro B03
Ch
B04
Edi
B08
Err
B09
Fa
10
BIO
Bll
Fie
us to find the beginning, the end, and consequently the middle of the index
to the separators.
As
in the preceding
example,
we perform
binary search
falls between the separators "Ba" and "Bro". Conceptually, the relation
between the keys and the RBNs is as illustrated in Fig. 9.14. (Why isn't this
be another index
set block,
set
block that
we
it
is
conduct our binary search within the index block and then
+
proceed to the next block in the simple prefix B tree.
There are many alternate ways to arrange the fundamental components
of this index block. (For example, would it be easier to build the block if the
vector of keys were placed at the end of the block? How would you handle
sufficient to let us
the fact that the block consists of both character and integer entities with
no
constant, fixed dividing point between them?) For our purposes here, the
have
own, including
its
own internal
and so
forth.
size increases.
efficient
With very
large blocks
way of processing
all
The second
425
TREE
prefix
B+
tree
point
is
mum
depth, that
form
the separators.
is
of
is
when
is full,
block
comparing
mum.
or half
full,
are
Decisions about
come more
The
when
such
no longer
some
as
simple matter of
fixed
determining
maximum
or mini-
complicated.
9.9
Tree
B*
tree,
we
on
something
focus
first
building
that
is
next record
we
encounter
Working from
blocks, one
fills
up.
by one,
sorted
file,
starting a
As we make
is
we
new
the transition
we need
that the
to load.
block
426
FILE
ACCESS
Next separator:
CAT
Next
CATCH-CHECK
sequence
block
set
first
is
loaded.
collect
these
RAM until
it is
full.
To
sets
the disk and one index set block that has been built in
shortest separators derived
set
RAM
from the
FIGURE 9.16 Simultaneous building of two index set levels as the sequence set continues to
grow.
CAT
00 -1
-1
1
Index block
containing no
separators
see,
427
CATCH
through
CHECK,
suppose that the index set block is now full. We write it out to disk. Now
what do we do with the separator CAT?
Clearly, we need to start a new index block. But we cannot place CAT
into another index block at the same level as the one containing the
separators ALW, ASP, and BET since we cannot have two blocks at the
same level without having a parent block. Instead, we promote the CAT
separator to a higher-level block.
means
that
RAM
we
build
in
as
we
set; it
will
now
sequence
the
set.
Figure
9.16
illustrates
set
this
RAM.)
more sequence
set
can
set?
It
is
instructive to ask
CHECK,
if
were
would stop
+
with the configuration shown in Fig. 9.16. The resulting simple prefix B
tree would contain an index set node that holds no separators. This is not an
isolated, one-time possibility. If we use this sequential loading method to
build the tree, there will be many points during the loading process at which
there is an empty or nearly empty index set node. If the index set grows to
more than two levels, this empty node problem can occur at even higher
levels of the tree, creating a potentially severe out-of-balance problem.
Clearly, these empty node and nearly empty node conditions violate the
B-tree rules that apply to the index set. However, once a tree is loaded and
goes into regular use, the very fact that a node is violating B-tree conditions
can be used to guarantee that the node will be corrected through the action
of normal B-tree maintenance operations. It is easy to write the procedures
for insertion and deletion so a redistribution procedure is invoked when an
underfull node
is
encountered.
operation
following
simple prefix
a
sort
B+
tree in this
of the records,
way,
as a
almost always
possibility
of creating
428
FILE
ACCESS
ALHASPBET
ACCESS-ALSO
set.
advantage
is
more quickly
No
as
we
proceed.
The
principal
since
the
many
passes
429
TREES
But,
more
newly loaded
in the
tree.
example presented
In the loading
in Fig. 9.16,
we
first
four sequence set blocks, then write out the index set block containing the separators for these sequence set blocks. If
file
we
for both sequence set and index set blocks, this process guaran-
tees that
quence
an index
set
set
block
starts
its
9.10
Trees
+
Our
discussions
prefix
B+
tree
up
and
to this point
a plain
B+
tree
is
the use of prefixes as separators. Instead, the separators in the index set are
which
block that
B+
is
where we
set
shown
block
tree,
in Fig.
tree.
The
operations performed on
of
a set
B+
trees.
Both B
of records arranged
in
same
as
those
B+
set,
430
FILE
ACCESS
Next separator:
CATCH
Next
CATCH-CHECK
sequence
set
first
in
block
separators.
coupled with an index set that provides rapid access to the block containing
any particular key/record combination. The only difference is that in the
simple prefix B tree we build an index set of shortest separators formed
from key prefixes.
One of the reasons behind our decision to focus first on simple prefix
B trees, rather than on the more general notion of a B + tree, is that we
want to distinguish between the role of the separators in the index set and
keys in the sequence
set. It is
much more
difficult to
make
this distinction
when
the separators are exact copies of the keys. By beginning with simple
+
prefix B
trees, we have the pedagogical advantage of working with
separators that are clearly different than the keys in the sequence set.
+
But another reason for starting with simple prefix B trees revolves
around the
plain
B+
implies that
we
can.
fact that
tree.
We
Why use anything longer than the simple prefix in the index set?
general, the
answer
to this question
is
B+
factors that
keys
might argue
good
that
not, in fact,
as a separator;
solution.
in favor
we do
of using
There
a
B+
are,
want
In
to use
consequently, simple
however,
at least
two
of
as separators:
The reason
we
have already
is
to
the use of variable-length fields within the index set blocks. For
some
and use
B-TREES, B
straightforward
B+
TREES
IN
431
PERSPECTIVE
Some key
fix
sets
method
is
as
9.1
common
features.
We
need
way
we
can
choose the most appropriate one for a given file structure job.
Before addressing this problem of differentiation, however, we should
point out that these are not the only tools in the toolbox. Because B-trees,
reliably
B+
trees,
and
it is
easy to
fall
This
a serious mistake.
is
RAM
RAM
432
B+
FILE
ACCESS
trees,
characteristics:
They
tire
are
all
RAM
at
once.
many
As
consequence,
it is
disk storage.
In
grow from
all
the
is
main-
With
all
three structures
it is
of block
splitting
scribed in Chapter
when
possible.
8.
which the most recently used blocks are held in RAM. The advantages of virtual trees were described in Chapter 8.
Any
For
of
all
some important
differences.
These
differences are brought into focus through a review of the strengths and
B-Trees
member of
each pair
file
is
is
structures.
grouped
member
as a set
is
of pairs.
One
These pairs are distributed over all the nodes of the B-tree. Consequently, we might find the information we are seeking at any level of the
+
+
B-tree. This differs from B trees and simple prefix B trees, which require
mation.
all
searches to proceed
all
the
way down
to the lowest,
sequence
set level
of
the tree. Because the B-tree itself contains the actual keys and associated
is
therefore
Given
a large
no need
B^
up
less
enough block
size
and an implementation
it is
tree.
The ordered
sequential access
is
B-TREES, B
tree.
433
PERSPECTIVE
IN
The implementation
as a
is
it
returns to the next highest level of the tree. This use of a B-tree for indexed
sequential access
is
actually stored
within the B-tree. If the B-tree merely contains pointers to records that are
in entry
workable because of
all
file,
is
not
information.
B-trees are most attractive
when
When
the key
is
it
tree.
is
only
tree
methods.
B+
that in the
set
The primary
Trees
tree
all
known
of blocks
difference
between the B
sequence
set
provided through
is
tree
contained in a linked
is
B+
of the
tree.
is
Indexed access to
is
this
B+
set.
In a
consists
The sequence
set
can be processed in
way,
The
ten
use of separators, rather than entire records, in the index set of-
means
index
set
that the
block in
number of separators
a
B+
records that
number of
a B-tree.
Sepa-
rators (copies of keys) are simply smaller than the key/record pairs
proach.
In
practice,
is
first
two advantages
advantage
is
is
often the
more
traversal
B-tree.
of
a virtual
434
Simple Prefix
using
aB
B+
ACCESS
Trees
tree instead
FILE
of a B-tree
farther.
this price is
is
to be considered
on
case-by-case basis.
SUMMARY
We
begin
this
chapter by presenting a
new problem.
In previous chapters
we
The sequence
Since
all
set
holds
all
of the
we
start
changes.
file's
The fundamental
on
file
by key.
set
file
set
still
we
encountered in
Chapter
redistribution of records
In this chapter,
sequence
set
we
blocks. There
is
no
precise
answer
we
how
large to
can give to
this
make
question
SUMMARY
RAM
or cannot read in
of
a single
Once we
a seek. In
disks) or
disk track.
and maintain
sequence
set,
enough
In general a
the size
we
we
turn to the
set. If
the index
RAM,
small
If the
index
to
fit
in
set turns
fit
in
RAM, we recommend
a
a
AB
B+
tree.
is
set,
since that
index
is
The
size
variable-length
To
separators
same
as the size
while
at
the
chosen
numbers of
435
436
searching,
we
block header
FILE
ACCESS
(RBNs)
from
the index set block. This illustrates an important general principle about
large blocks within
homogeneous
set
structure of their
We
file
structures:
They
are
own,
apart
from the
out of a
a slice
a sophisticated internal
We
we
start
a set
are:
They support
The index set
records, so
than
We
it is
a B-tree.
important
many
circumstances.
often the
The simple
is
more
this
prefix
B+
tree
compressing the
tree.
The
we must
price for
deal with
KEY TERMS
A B+
tree.
sequentially
set, is
tree consists
of
a sequence set
managed
as a B-tree.
index
EXERCISES
Index
set.
The index
set consists
corresponding to
a certain
key.
sequential access
is
not actually
file.
When
separators consists of those separators that take the least space, given
a particular
compression strategy.
from the
that can
rear
still
Simple prefix
serve as a separator.
tree.
B + tree in which the index set
B+
simple prefixes,
as
is
made up of
Variable order.
B-tree
is
in a block.
m
EXERCISES
1.
Describe
access:
(a)
file
sequential
sequential access.
access
only;
(b)
direct
access
only;
(c)
indexed
437
438
2.
A B+
tree
structure
whenever
B+
is
FILE
ACCESS
is
called for?
Consider the sequence set shown in Fig. 9.1(b). Show the sequence set
keys DOVER and EARNEST are added; then show the sequence
set after the key DAVIS is deleted. Did you use concatenation or
redistribution for handling the underflow?
3.
after the
4.
What
sequence
set? If
choice of
5.
It
block size?
is
tree-structured index.
could be used.
index?
6.
Under what
Under what
(such as an
AVL
The index
of
discussed in Chapter
a
8,
without using
conditions might
tree) rather
set
file
B+
than
tree
is
it
be reasonable to use
binary tree
Why
the
difference?
7.
How
differ
from block
splitting in the
index
set
of a simple prefix
B+
tree
set?
If the
affected?
+
Consider the simple prefix B tree shown in Fig. 9.8. Suppose a key
added to block 5 results in a split of block 5 and the consequent addition of
block 8, so blocks 5 and 8 appear as follows:
9.
FABER-FINGER
FINNEY-FOLK
3
a.
b.
What does
Suppose
that,
a deletion causes
under-
EXERCISES
5.
What does
Describe
a case in
which
10.
Why
often a
is it
good
show
the effect
it
has
on the
tree.
idea to use the same block size for the index set
+
and the sequence set in a simple prefix B tree? Why should the index
nodes and the sequence set nodes usually be kept in the same file?
11.
Show
Ab Arch
Also show
Astron
more
is
set block,
similar to the
set
one
B Bea
detailed
as illustrated in Fig.
9.13.
13.
Show how
B+
ITEMIZE-JAR
Assume
Use
room
for the
new
room
in the root.
B+
tree.
Compare
is
Assume
Suggest
criteria for
deciding
when
B-tree?
splitting, concatenation,
and
What
maximum number
16.
in
Make
a table
terms of the
simple prefix
B+
tree height,
comparing
criteria listed
B+
B-trees,
trees,
439
440
ACCESS
FILE
RRNs of data
answers based on a
some
records. In
tree's
will
cases
height or the
depend on unknown
factors,
such
specific
of access or average
as patterns
separator length.
b.
worst
c.
a tree
cases).
worst
required to delete
cases).
to process a file of n keys seassuming that each node can hold a maximum of k keys
and a minimum of k/2 keys (best and worst cases).
e. The number of accesses required to process a file of n keys sequentially, assuming that there are h + 1 node-sized buffers availd.
quentially,
able.
B+
trees.
which
is
VSAM
called
key-sequenced access
+
much
organized
how
IBM's
like a
tree.
and which
Look up
results
a description
modes, one of
in
file
being
on
its
Although
methods now
18.
B+
trees
in use, this was not always the case. A method called ISAM
Readings for this chapter) was once very common, especially
on large computers. ISAM uses a rigid tree-structured index consisting of at
least two and at most three levels. Indexes at these levels are tailored to the
specific disk drive being used. Data records are organized by track, so the
lowest level of an ISAM index is called the track index. Since the track index
points to the track on which a data record can be found, there is one track
(see Further
is
When
not
split.
separate overflow area and chained together in logical order. Hence, every
The
its
may
pointer to the
essential
difference
is
contain
home
track.
between the
in the
ISAM
way overflow
organization
and
B+
EXERCISES
records
index structure
Can you
accommodate
altered to
more
index
Why do you
rigid
ISAM, with
structure of
records?
is
two approaches
as well as addition
and deletion
of records.
Programming Exercises
We
begin
this
just a linked
you
list
to write a
first
program
and
to the sequence
Write
program
set,
in either
that accepts a
file
then to write
finally to write
creating a
set,
programs and
tree. These
B+
or Pascal.
of strings
as input.
The input
file
should be sorted so the strings are in ascending order. Your program should
use this input
The
file
sequence
Sequence
The
first
set
set
block
is
other things,
file is a
a reference to the
RRN
of the
first
among
quence set;
Sequence set blocks are loaded so they are as full as possible; and
Sequence set blocks contain other fields (other than the actual records
containing the strings) as needed.
Write an update program that accepts strings input from the keyboard,
along with an instruction either to search, add, or delete the string from the
20.
sequence
set.
either
found or not
found;
added
if
it is
set;
441
442
set
FILE
ACCESS
less
than
and
full;
Splitting, redistribution,
gram development.
21. Write a
assume
program
file
form of
a B-tree.
B+
two
You may
levels.
The
The index
Do
set in the
resulting
exercises
set
set,
tree;
as
you form
set;
Index
and
Index
set
blocks.
sequence
set blocks,
file as
the index set as well as the already existing reference to the begin-
new
set.
B+
you created
capabilities
full.
to separators
as:
Where should
Given the data types permitted by the language you are using, how
can you handle the fact that the block consists of both character and
integer data with no fixed dividing point between them?
As items are added to a block, how do you decide when a block is
too
full to insert
another separator?
FURTHER READINGS
FURTHER READINGS
The
initial
IP
(1973b), although he did not name or develop the approach. Most of the literature
+
that discusses B
trees in detail (as opposed to describing specific implementations
VSAM)
such as
provides what
is
form of
in the
articles rather
than textbooks.
B^
Comer
(1979)
promoted
ones, are
up
is
in the tree
shallower
have
as
trees.
blocks
a greater
tree.
Rosenberg and Snyder (1981) study the effects of initializing a compact B-tree
on later insertions and deletions. The use of batch insertions and deletions to B-trees,
+
rather than individual updates, is proposed and analyzed in Lang et al. (1985). B
trees are
file
organizations (such as
ISAM)
in
B+
An
exception to
(VSAM), one of
the
sequential access.
Wagner
this
is
tree
file
key maintenance, key compression, secondary indexes, and indexes to multiple data
sets. Good descriptions of VSAM can be found in several sources, and from a
variety of perspectives, in
IBM's
(VSAM
in a
B+
Bohl
(1981),
Comer
(1979)
an example of
tree),
VAX-11 Record Management Services (RMS). Digital's file and record access
subsystem of the VAX/ VMS operating system, uses a B"" tree- like structure to
support indexed sequential access (Digital, 1979). Many microcomputer implementations
of B
trees can
(Borland, 1984).
III
443
Hashing
10
CHAPTER OBJECTIVES
Introduce the concept of hashing.
Examine
the
problem of choosing
one
describe
some
good
hashing
in detail,
and
others.
Explore three approaches for reducing collisions: randomization of addresses, use of extra memory, and
storage of several records per address.
effects
of patterns of record
deterioration
access
on perfor-
mance.
445
CHAPTER OUTLINE
10.1
Introduction
10.1.1
What
Hashing?
is
10.6 Storing
10.1.2 Collisions
10.2
10.6.1
10.3
on
Performance
10.6.2 Implementation Issues
Effects of Buckets
Distributions
10.3.1
10.3.2
Distributing Records
10.7
among
Making Deletions
Tombstones
10.7.1
Deletions
10.7.2 Implications of Tombstones
for Insertions
Records
Additions on Performance
File
10.4
Handling
for
Addresses
How Much
Memory
Be Used?
10.8.1
Extra
Should
10.8
Double Hashing
Overflow
Overflow Area
Resolution by
Progressive Overflow
10.5 Collision
10.5.1
How
Revisited
Progressive Overflow
10.9 Patterns
Works
10.1
of Record Access
Introduction
O(l) access to
a
files
means
that
how
no matter
grows
O(N)
number of
access,
of the
big the
file.
file
grows, access to
seeks.
By
contrast,
As we saw
in the preceding
chapters, B-trees
improve on
N)
access; the
number of seeks
number of
records,
where k
is
this greatly,
measure of the
providing 0(\og k
leaf size.
files,
but
it is still
not
O(l) access.
Everyone agrees
is
~_i
447
INTRODUCTION
was not
it
clear that
that
files
general class of
change greatly
in size.
In this chapter
They provide
in size.
we
begin with
Static
following chapter
we show how
begun to
dynamic and
ways
has
find
hash Junction
is
art until
and O(l)
file
increases
about 1980.
work during
In
the
the 1980s
Hashing?
like a black
is
The
box
like
that
it is
resulting address
10.1, the
Hashing
of the
state
to extend hashing,
into an address.
to
10.1.1 What
drop
4.
That
key
is.
used
key
LOWELL
is
//(LOWELL) =
and
transformed by the
4.
Address 4
is
said
LOWELL.
indexing in that
is
Hashing
it
differs
involves associating
from indexing
in
key with
two important
ways:
as randomizing.
to deal
with
it.
names
addresses.
They appear
to be in random order.
is
no apparent order
to the
448
HASHING
Address
Record
Key
K = LOWE
Address
FIGURE 10.1
LOWELL.
r:
LOWELL
to
address
LOWELL'S
home address
4.
10.1.2 Collisions
Now
suppose there
Since the
is
key
name OLIVIER
LOWELL,
in the
starts
as
we must
record for
LOWELL. We refer to
keys that
synonyms.
We
resolve collisions.
We
do
this in
hashing algorithms partly on the basis of how few collisions they are likely
to produce,
TABLE 10.1
Name
tricks
we
store records.
ASCII Code
First
Two
for
Letters
Product
Home
Address
BALL
66
65
66 x 65
LOWELL
76
79
76 x 79
TREE
84
82
84 x 82
=
=
-
4,290
290
6,004
004
6,888
888
449
INTRODUCTION
The
turns out to be
is
Such an algorithm
much more
is
hashing
algorithm than one might expect, however. Suppose, for example, that you
algorithm.
want
It
among
shown (Hanson,
A more
1982)
that
practical solution
is
to reduce the
if
number of
collisions to an
record
number of
reduce the
number of collisions,
compete
records.
for the
Collisions occur
same
address. If
we would
we
when two
could find
or
more records
hashing algorithm
randomly among
It is
we have only a few records to distribute among many adwe have about the same number of records as adOur sample hashing algorithm is very good on this account
lisions if
dresses than if
dresses.
since there are 1,000 possible addresses and only 75 addresses (corre-
sponding to the 75 records) will be generated. The obvious disadvantage to spreading out the records
is
is
wasted.
(In
the example, 7.5% of the available record space is used, and the remaining 92.5% is wasted.) There is no simple answer to the question
of how much empty space should be tolerated to get the best hash-
some techniques
not unreasonable to try to generate perfect hashing functions for small (less than 500).
of keys, such as might be used to look up reserved words in a programming language. But files generally contain more than a few hundred keys, or they contain sets of
keys that change frequently, so they are not normally considered candidates for perfect
hashing functions. See Knuth (1973b), Sager (1985), Chang (1984), and Chichelli (1980) for
more on perfect hashing functions.
'''It
is
stable sets
450
HASHING
amounts of
free space.
sumed
tacitly that
file
in
Up
at a single address.
to
now we
such
way
is
usually
that every
no reason
file
a file
why we
address
is
cannot
big enough to
and
we
create a
file
methods, and
as
we do
so
we
we
elaborate
present
on these collision-reducing
for managing hashed
some programs
files.
10.2
Our
is
done
The
to achieve this.
pieces
It is
is
not too
work
well.
2.
3.
Divide by
as the address.
key is already a
string of characters,
If the
is
451
we
take the
it
to
form
number. For
example,
76 79 87 69 76 76 32 32 32 32 32 32
nlir
=
LOWELL
L
D
W
L
L
L
Blanks
,
In this algorithm
letters.
By
among
differences
we
more
using
we
parts of a key,
first
two
The
when
compared
>t
<
to the potential
improvement
is
usually insignificant
in performance.
Step 2. Fold and Add Folding and adding means chopping off pieces of the
number and adding them together. In our algorithm we chop off pieces
with two ASCII numbers each:
76 79
These number
87 69
76 76
',
32 32
thought of
pairs can be
32 32
32 32
as integer variables
(rather than
In Pascal,
we
we add
Before
32,767
the numbers,
most
we
set.
have to mention
a problem caused by
numbers we can add together are
On some microcomputers,
(15 bits)
adding the
first
Adding
maximum
32,767.
We
can do this by
first
sum
is
less
we
than
will
ever add in our summation, and then making sure after each step that our
assume
ZZ.
Suppose we choose 19,937 as our largest allowable intermediate result. This
differs from 32,767 by much more than 9,090, so we can be confident (in
this example) that no new addition will cause overflow. We can ensure in
our algorithm that no intermediate sum exceeds 19,937 by using the mod
alphabetic characters, so the largest addend
is
9,090, corresponding to
452
HASHING
which returns
operator,
the remainder
when one
integer
is
divided by
another:
+
+
4187 +
7419 +
10651 +
7679
16448
Why
did
we
8769 -* 16448
7676 -> 24124
3232^
7419
7419
3232 - 10651
3232 -> 13883
10651
is
16448
24124
13883
16448
4187
7419
10651
13883
bound
Because the division and subtraction operations associated with the mod
operator are more than just a way of keeping the number small; they are
part of the transformation work of the hash function. As we see in the
number
more random distribution than does transformation by
number 19,937 is prime.
discussion for the next step, division by a prime
usually produces
nonprime. The
step
3.
is
to cut
We
the
file.
mod
mod
n.
operator will be
number between
1.
our
we
0-99
for
= 13820 mod
100
20.
Since the number of addresses allocated for the file does not have to be
any specific size (as long as it is big enough to hold all of the actual records
to be stored in the file), we have a great deal of freedom in choosing the
divisor n. It is a good thing that we do, because the choice of n can have a
how
major
effect
on
prime
number
distribute
nonprime can
remainders
453
FUNCTION hash(KEY,MAXAD)
set
set
SUM to
to
while (J
<
12)
set
i
endwh i
KEYCJ
to
prime divisors less than 20 (Hanson, 1982). Since the remainder is going to
be the address of a record, we choose a number as close as possible to the
desired size of the address space. This number actually determines the size
of the address space. For a file with 75 records, a good choice might be 101,
file
74.3%
Hence,
the record
whose key
is
=
=
full
space, the
13820
= 0.743).
home address of the
(74/101
mod
record in
101
84.
LOWELL
is
assigned
record
to
number 84
in the
file.
that
The procedure
we
call hash(),
hash() takes
at least
two
inputs:
returned by hash()
10.3
12 characters, and
is
MAXAD,
the address.
look
at
ways
distributions
makes
it
easier
we
Understanding
to discuss other hashing methods.
files.
454
HASHING
no
by distribution
among
file
so
Such a distribution
is called uniform because the records are spread out uniformly among the
addresses. We pointed out earlier that completely uniform distributions are
so hard to find that it is generally not considered worth trying to find them.
Distribution (b) illustrates the worst possible kind of distribution. All
records share the same home address, resulting in the maximum number of
there are
collisions.
will be a
collisions, as illustrated
The more
a distribution
(a).
more
collisions
problem.
Distribution
somewhat spread
(c)
illustrates
out, but
with
a
a
distribution in
few
collisions.
This
are
case
if
chosen as every other address. The fact that a certain address is chosen for
one key neither diminishes nor increases the likelihood that the same
address will be chosen for another key.
It should be clear that if a random hash function is used to generate a
large number of addresses from a large number of keys, then simply by
chance some addresses are going to be generated more often than others. If
you have, for example, a random hash function that generates addresses
between
and 99, and you give the function 100 keys, you would expect
(a)
Worst
Best
Record
Address
(a)
UAfuI
No synonyms
Record
Address
(b)
synonyms
Acceptable
Record
Address
(c)
455
some of
ideal,
it is
random
among
distribution of records
may
to be
at all.
Although
not
chosen not
it is
available addresses
is
practically impossible
when we
random
can find
while they do
generate a
addresses
10.3.2
It
Some
would be
nice
better-than-random
if
were
there
distribution
by
in
hash
function
cases,
all
but
that
there
guaranteed
is
not.
The
of keys that
are actually hashed. Therefore, the choice of a proper hashing function
should involve some intelligent consideration of the keys to be hashed, and
perhaps some experimentation. The approaches to choosing a reasonable
hashing function covered in this section are ones that have been found to
work well, given the right circumstances. Further details on these and other
methods can be found in Knuth (1973b), Maurer (1975), Hanson (1982),
and Sorenson et al. (1978).
Here are some methods that are potentially better than random:
distribution generated
a pattern.
Sometimes keys
is
more
fall
set
likely to
be true of numeric
key can
also be used.
Fold parts of the key. Folding is one stage in the method discussed earlier. It involves extracting digits from part of a key and adding the
extracted parts together. This
terns but in
method destroys
Divide the key by a number. Division by the address size and use of
the remainder usually
is
involved somewhere in
is
to
456
HASHING
has
vision
by
nonprime
from
different con-
secutive sequences.
randomization
the goal.
is
method
(often called
a single large
produces
method
fairly
that
is
it
random
results.
One
key
99. If the
382; 382
mod
is
number
the decimal
99
85, so 85
is
453,
its
base 11 equivalent
is
method
that
it
records
among
predict
how
that a large
is
of
Records
file, it is
number of addresses
to
of
collisions.
is
hold, then
likely to
we know
have
important to be able to
we know,
far
more
for example,
records assigned
going to be
a lot
457
Although there
among
collisions
are
no nice mathematical
distributions
are
that
than random,
better
there
are
(knowing
it
how
to behave.
The Poisson
We
a
We want to
Distribution"''
predict the
hash function
is
When
questions.
applied to a key.
of the keys
all
We
would
number of collisions
one record
file
a single
like to
at
an address.
given address
when
what
is
the likelihood
that
None
file
A The
B
address
is
not chosen; or
The address
is
chosen.
How do we express
bothp(A) and a stand for
p(B) and
"''This
uted
one chance
in
p(A) = a
addresses in a
file if a
is
is
If
we
let
chosen, then
= ^>
N of being
N- =
-^1
among
two outcomes?
random hashing
ways
chosen, and
in
which records
function
is
used.
The
will be distrib-
discussion assumes
knowledge of some elementary concepts of probability and combinatorics. You may want
to skip the development and go straight to the formula, which is introduced in the next
section.
458
HASHING
(N =
chances in
N of not being
10 addresses
1/10 =
0.1 = 0.9.
is
Now suppose two keys are hashed. What is the probability that both
keys hash to our given address? Since the two applications of the hashing
function are independent of one another, the probability that both will
a product:
is
--
tor
N=
10: b
0.1
0.1
0.01.
Of course, other outcomes are possible when two keys are hashed. For
example, the second key could hash to an address other than the given
address.
The
p(BA) =
probability of this
In general,
is
- -M
when we want
the product
for
to
know
N=
10: b
xbxbxa
how
2 3
a b
0.9
0.09.
p(BABBA) = bx
0.1
for
N=
B by
and
2 3
10: a b
and
h,
(0.9) (0.1)
As occur
in the order
that exactly
all
six
Outcome
Probability
BBAA
BABA
BAAB
ABBA
ABAB
AABB
bbaa
bV
baba = bV
baab = bV
abba = bV
abab = bV
aabb = bV
=
For
N=
10
(0.1) (0.9)
2
(0.1) (0.9)
2
(0.1) (0.9)
2
(0.1) (0.9)
2
(0.1) (0.9)
2
(0.1) (0.9)
=
=
=
=
=
0.0036
0.0036
0.0036
0.0036
0.0036
0.0036
Since these six sequences are independent of one another, the probability of two Bs and two As is the sum of the probabilities of the individual
outcomes:
459
p(BBAA) + p(BABA) +
+ p(AABB) =
2 2
6b a
x 0.0036 = 0.0216.
The 6 in the expression 6b a~ represents the number of ways two s and two
As can be distributed among four places.
In general, the event "r trials result in r x As and x Bs" can happen
in as many ways as r x letters A can be distributed among r places. The
probability of each such
way
is
is
(r
This
is
the
items out of
well-known formula
of
a set
items.
It
x)\x
for the
follows that
x times
can be expressed as
p(x)
Furthermore,
we know
if
= Ca'~ x b x
N addresses
available, we can be
and B, and the formula
becomes
p(x)
where
= C
What does
this
mean?
It
means
that
for example,
if,
x =
0,
we
can
records assigned to
it
*) =
If
1,
this
c (' h\-
(h)-
to a given address:
p(\)
CI
(Try
it
of
and
r,
there
is
it
is
N=
awkward
to
compute.
1,000.) Fortunately,
function that
is
very good
460
HASHING
much
is
easier to
compute.
It
is
called the
Poisson function.
where N,
(r/N) e-
P(*)
The Poisson
function,
(r/x >
Tj
N=
r
x,
r,
the
number of available
number of records
the
addresses;
to be stored;
*s
in the
and
to a given address,
to all
n records.
Suppose, for example, that there are 1,000 addresses (N = 1,000) and
whose keys are to be hashed to the addresses (r = 1,000).
1,000 records
Since r/N
to
it
(x
=
0)
1,
becomes
=1jr
P(0)
The
0-368.
it
are
= ]~
0.368
p(2)=-^j- =
0.184
p(\)
[r
I'e
p(3)
If
we
0.061.
a certain
number of addresses
assigned.
For example, suppose there are 1,000 addresses (N = 1,000) and 1,000
(r = 1,000). Multiplying 1,000 by the probability that a given
address will have x records assigned to it gives the expected total number of
records
is,
461
N addresses,
them
number of
is
Np(x).
about p(x)
as a
random hashing
function,
we
can apply
numbers of
collisions.
Suppose you have a hashing function that you believe will distribute records
randomly, and you want to store 10,000 records in 10,000 addresses. How
many addresses do you expect to have no records assigned to them?
= 10,000, r/N = 1. Hence the proportion of
Since r = 10,000 and
addresses with
records assigned should be
= LIV =
1
P(0)
ir
0-3679.
records assigned
10,000 x p (0)
is
3,679.
respectively?
10,000 x ^(l)
0.3679 x 10,000
3,679
10,000 x p (2)
0.1839 x 10,000
1,839
10,000 x p (3)
0.0613 x 10,000
613.
two records
The 1,839
apiece,
overflow records.
Each of the 613 addresses with three records apiece has an even bigger
problem. If each address has space for only one record, there will be two
overflow records per address. Corresponding to these addresses will be a
462
HASHING
records.
10.4
But
first, let's
How Much
We
Extra
overflow records.
reduce collisions.
to use extra
is
memory. The
The term
stored
to the
(r)
For example,
number of available
spaces (N):^
Number of records _
Number of spaces
if there are
r_
75 records (n
75)
75o/o.
number of records
to be
100),
is
1=
0.75
The packing
actually used,
more
""We
assume here
that only
we
So
it is
with records
see later.
at
in a
file.
The more
each address. In
fact, that is
records
not nec-
the
when
new
file
record
is
likely
it is
that a collision
added.
circumstances.
example,
at
We
want
to
have
need
few
as
We
more
space, the
463
BE USED?
to reduce
particular
to use
for Different
of the
two
Packing Densities
density. In particular,
we need
effects
P(x)
(r/N) e~
r/N
distributed
among
1,000,000 addresses.
file is
r_
= 500 =
1,000
among
How many
How many
file:
synonyms)?
How many
onyms?
Assuming
more syn-
one record can be assigned to each home adoverflow records can be expected?
What percentage of records should be overflow records?
dress,
that only
how many
464
HASHING
1.
How many
Np(0)
2.
How many
num-
is
1,000 x
=
-
607.
5 )" g
(-
1,000 x 0.607
onyms)?
Np(\)
3.
How many
1,000 x
=
=
303.
^2L
1,000 x 0.303
more synonyms?
The
values o p(2), p(3), p(4), and so on give the proportions of addresses with one, two, three, and so on synonyms assigned to them.
Hence
the
sum
p(2)
all
may
appear to require
p(3)
a great deal
p(4)
grow
synonym. This
of computation, but
it
doesn't
file is
50%
only
loaded, one
would not expect very many keys to hash to any one address.
Therefore, the number of addresses with more than about three keys
hashed to them should be quite small. We need only compute the results up to p(5) before they become insignificantly small:
p(2)
p(2>)
p(4)
p(5)
=
=
N and this
Assuming
many
0.0002
or
more synonyms
is
just the
result:
N\p{2)
4.
0.0016
0.0902.
+ 0.0126 +
0.0758
p{3)
=
-
1,000
x 0.0902
90.
home
address,
how
resented by p(2), one record can be stored at the address and one
must be an overflow record. For each address represented by p{2>),
at
465
BE USED?
is
given by
1
N
=
=
5.
+ 2 X
x p (2)
NX
[1
1,000 x
N X p(3)
[1
+ 3 x iV x p(4) + 4 X
X p (5)
x p (3) + 3 x p (4) + 4 x p(5)]
x 0.0758 + 2 x 0.0126 + 3x 0.0016 + 4 x 0.0002]
p(2)
107.
jjgConclusion:
0.214
in
all,
= 21.4%.
If the
packing density
we
is
home
if
is
5%
The
of the time
table
we
shows
that
try to access
about
TABLE 10.2
37%
home addresses
Packing
Density (%)
Synonyms
as
of Records
10
4.8
20
9.4
30
13.6
40
17.6
50
21.4
60
24.8
70
28.1
80
31.2
90
34.1
100
36.8
466
HASHING
in
your
file
there will be
The 36.8%
terms of
10.5
0%
if a
hashing algorithm
is
very good,
is
number of techniques
it
fit
into their
home
for
Novak
Rosen
York's
home
address (busy)
Jaspei
2nd
Moreley
try (busy)
York's actual
address
467
Key
Blue
"1
98
Hash
Address
routine
"U
99
99
Jello
Wrapping around
for
works
and
well.
we
The technique
a file.
a lively area
concentrate on
of research.
We examine
linear probing.
How
10.5.1
An example
of a situation
In the example,
home
address,
we want
it is
in
which
a collision
an overflow record.
If
progressive overflow
used, the
is
The
first free
address 9
is
is
the
Eventually
6,
found.
first
stored in address
hashes to
is
York
9.
we need
to find
file.
it
proceeds to look
at
Since
6. It
York
still
it
gets
to address 9,
or for
record
at the
end of the
file.
This
there
is
is
which
468
HASHING
assumed that the file can hold 100 records in addresses 0-99. Blue is
hashed to record number 99, which is already occupied by Jello. Since the
file holds only 100 records, it is not possible to use 100 as the next address.
The way this is handled in progressive overflow is to wrap around the
address space of the file by choosing address
as the next address. Since, in
this case, address
is not occupied, Blue gets stored in address 0.
What happens if there is a search for a record but the record was never
it is
placed in the
file?
The
home
look for
to
it
in successive locations.
Two
address,
things can
happen:
If
an open address
sume
this
If the file
is it
means
is full,
is
when we approach
is
filling
is
not in the
comes back
the search
not in the
our
file,
to
file.
When
searching can
The
cases,
it
a perfectly
or
file;
where
is
it
this occurs,
become
in the
or even
intolerably
file.
is its
simplicity. In
are,
however,
many
collision-
we
The reason
to avoid
overflow
when
of collisions,
taking up spaces where they ought not to be. Clusters of records can form,
resulting in the placement of records a long
so
many
disk
Key
Home
Address
Adams
20
Bates
21
Cole
21
Dean
22
Evans
20
469
Number
of
Actual
Home
accesses needed
address
address
to retrieve
20
Adams
20
21
Bates
21
22
Cole
21
23
Dean
22
24
Evans
20
25
used to resolve
collisions,
file,
at their
is
The term
number of
long
accesses required to
on how many
it.
retrieve a record
home
Figure 10.6
is
a collision. If a
may
record
be unacceptable.
can expect to have to access the disk to retrieve a record. A rough estimate
of average search length may be computed by finding the total search length
(the sum of the search lengths of the individual records) and dividing this by
the number of records:
Average search length
number of records'
470
HASHING
In the
1+2
five records
is
= ?2
With no
access
is
collisions at
needed
to
all,
retrieve
later section.
It
turns out that, using progressive overflow, the average search length
goes up very rapidly as the packing density increases. The curve in Fig.
10.7,
density
much more
80%
or more,
it
appears that
it is
in a
is
Average
search
length
20
40
40%
we can
60
Packing density
80
100
471
improve on
to
our
hashing program. The change involves putting more than one record
at a
single address.
10.6
Storing
about
Therefore,
ally.
why
The word
bucket
record address in
is
sometimes used
is
On
a file to
an
to describe
when
those
sector-addressing disks, a
set
of keys, which
is
file.
Home
Key
Address
Green
30
Hall
30
Jenks
32
King
33
Land
33
Marx
33
Nutt
33
Each address
a file into
in a
Only
home
address.
much
less
often
when
when
472
HASHING
Bucket
address
30
Bucket contents
Green
Hall
31
32
Jenks
33
King
(Nutt
.
Land
Marks
...
is
an overflow
.
record)
When
changed
To compute how
densely packed a
file is,
we need
to consider
is
record.
both the
number of addresses
address (bucket
size). If
the
is
Packing density
Suppose
we
the following
We
have
a file in
bN
the
among
The packing
750
is
75%.
1,000
We
among 500
There are
still
locations,
bN
0.75
= 75%
where each
loca-
473
way
not changed,
is
to
although there are fewer addresses, each individual address has more
for variation in the number of records assigned to it.
room
storing the
is
structure.
with
Buckets
without
Buckets
File
File
Number of records
Number of addresses
Bucket
= 750
N=
size
= 750
N = 500
b = 2
r
1,000
1
Packing density
0.75
0.75
r/N = 0.75
r/N =
1.5
To
,
p(x)
(r/N) e-"
xi
two
different
file
organizations,
We
see
from
when
make
many
22.3% of
intuitive sense
since
in the
it
shown
in
Table 10.3.
we
when two-record
buckets are
as
474
HASHING
TABLE 10.3
p(x)
without
Buckets
File with
Buckets
(r/N = 0.75)
(r/N
0.472
0.223
p(l)
0.354
0.335
p(2)
0.133
0.251
p(3)
0.033
0.126
P(4)
0.006
0.047
P(5)
0.001
0.014
0.001
P(7)
organizations
1.5)
p(0)
P(6)
file
0.004
file
with bucket
size one,
Any
is
more
number of
than one record does have overflow. Recall that the expected
overflow records
is
given by
+2x
[1
x p (2)
0.75 and
[1
assigned
address with
N=
1,000,
is
approximately
records represent
29.6% overflow.
the bucket
file is
[1
x p (3) + 2 x p (4) + 3 x
p(S)
4 x p(6)
+ ...],
475
which
for
500 x
[1
= 500
is
approximately
TABLE 10.4
Synonyms causing
densities
for different
packing
Bucket Size
Packing
Density
<%)
10
100
10
4.8
0.6
0.0
0.0
0.0
20
9.4
2.2
0.1
0.0
0.0
30
13.6
4.5
0.4
0.0
0.0
40
17.6
7.3
1.1
0.1
0.0
50
21.3
10.4
2.5
0.4
0.0
60
24.8
13.7
4.5
1.3
0.0
70
28.1
17.0
7.1
2.9
0.0
75
29.6
18.7
8.6
4.0
0.0
80
31.2
20.4
10.3
5.3
0.1
90
34.1
23.8
13.8
8.6
0.8
100
36.8
27.1
17.6
12.5
4.0
476
HASHING
the sizes of buffers the operating system can manage, sector and track
on
capacities
disks,
(seek, rotation,
and
As
a rule, it is
probably not
be too large
entire track,
per search, any extra transmission time resulting from the use of extra large
buckets
essentially wasted.
is
many
In
suppose that
A less
is
a file
we
introduced
to retrieve a record
of buckets,
average search length represents the average number of buckets that must
be accessed to retrieve
search lengths for
The bigger
a record.
files
involved
in
Since a hashed
RRN, you
Hashed
respects,
file is a
fixed-length record
file
whose records
are accessed
by
should already
files
differ
however:
477
TABLE 10.5
in
a successful search by
progressive overflow
Bucket Sizes
Packing
Density
(%)
10
50
10
1.06
1.01
1.00
1.00
1.00
30
1.21
1.06
1.00
1.00
1.00
40
1.33
1.10
1.01
1.00
1.00
50
1.50
1.18
1.03
1.00
1.00
60
1.75
1.29
1.07
1.01
1.00
70
2.17
1.49
1.14
1.04
1.00
80
3.00
1.90
1.29
1.11
1.01
90
5.50
3.15
1.78
1.35
1.04
95
10.50
5.6
2.7
1.8
1.1
5 1973,
3,
Addison-Wesley, Read-
ing,
1.
fore the
as
long
size to
file
file
it
number of
needed.)
2.
Since the
to
its
home
RRN
of
record in
hashed
file is
uniquely related
bond
is
is
record and
no longer
its
must
home
accessible
ad-
by
hashing.
We
special needs in
to
files.
in
478
HASHING
Here
An empty
Two
we want
many
to store as
bucket:
entries:
full bucket:
JONES
ARNSWORTH
JONES
it.
a counter that
The counter
tells
us
when
many records it
new record
the addition of a
number of records
how many
THROOP
tell us which slots are used and which are not. We need a way
whether or not a record slot is empty. One simple way to do this is
to use a special marker to indicate an empty record, just as we did with
deleted records earlier.. We use the key value ///// to mark empty records in
does not
it
to
tell
remain
fixed,
it
Initializing a File for Hashing. Since the logical size of a hashed file must remain fixed, it makes sense in most cases to allocate physical space for the file before we begin storing records. Creating a file of empty record slots before loading increases the likelihood that records will be stored close to one another on the disk, avoids the error that occurs when an attempt is made to read a missing record, and makes it easy to treat empty slots and deleted slots uniformly.
Loading a Hash File. A program that loads a hash file is similar in many ways to earlier programs we used for populating fixed-length record files, with two differences. First, the program uses the function hash() to produce a home address for each record. Second, the program looks for a free space for the record beginning at its home address; if the home bucket is full, the new record is stored, by progressive overflow, somewhere after its home address in the file. A sketch of this insertion loop appears below.
10.7 Making Deletions

Deleting a record from a hashed file is more complicated than adding a record for two reasons:

1. The slot freed by the deletion must not be allowed to hinder later searches; and
2. It should be possible to reuse the freed slot for later additions.

When progressive overflow is used, a search for a record terminates if an open address is encountered. Because of this, we do not want to leave open addresses in the middle of overflow-searching sequences. Consider, for example, the keys Adams, Jones, Morris, and Smith. If they are stored in alphabetical order using progressive overflow for collisions, they are stored in the locations shown in Fig. 10.9. If Morris is deleted, leaving its slot empty, a later search for Smith halts at the empty slot, and the program concludes, mistakenly, that Smith is not in the file.
Record     Home address    Actual address
Adams      5               5
Jones      6               6
Morris     6               7
Smith      5               8

FIGURE 10.9 Adams, Jones, Morris, and Smith stored in alphabetical order using progressive overflow.
10.7.1 Tombstones for Handling Deletions

In Chapter 5 we discussed techniques for handling deletions in fixed-length record files. One simple technique is to replace the deleted record (or just its key) with a special marker, called a tombstone.

FIGURE 10.10 The file after Morris is deleted, leaving an empty slot between Jones and Smith.
FIGURE 10.11 The same file, with the tombstone ###### inserted in the slot formerly occupied by Morris.
The use of tombstones addresses both of the problems that deletions cause: a tombstone does not hinder a later search for a record; and the freed space is clearly marked, so it may be reclaimed for later additions.
Figure 10.11 illustrates how the sample file might look after the tombstone ###### is inserted for the deleted record. Now a search for Smith does not halt at the empty record number 7. Instead, it uses the ###### as an indication that it should continue the search.

It is not necessary to insert a tombstone every time a deletion occurs. For example, suppose in the preceding example that the record for Smith is to be deleted. Since the slot following the Smith record is empty, nothing is lost by marking Smith's slot as empty rather than inserting a tombstone. Indeed, it is actually unwise to insert a tombstone where it is not needed. (If, after putting an unnecessary tombstone in Smith's slot, a new record is added at address 9, how would a subsequent unsuccessful search for Smith be affected?)
10.7.2 Implications of Tombstones for Insertions

With the addition of tombstones, inserting records becomes slightly more complicated. It is permissible to insert a new record into any slot in which ###### occurs as the key.
This new feature is desirable because it allows freed slots to be reused, but it introduces a danger: duplicate keys. Consider the example shown in Fig. 10.11, in which Morris is deleted. If a record whose key already occurs later in the same cluster is inserted into the tombstone slot, the file ends up with two records carrying the same key. To prevent this from occurring, the program must examine the entire cluster of contiguous keys and tombstones to ensure that no duplicate key exists, and then go back and insert the record in the first available tombstone, if there is one.
10.7.3 Effects of Deletions and Additions on Performance

Tombstones allow our search and insertion algorithms to work and help in recovering space, but we can still expect some deterioration in performance after a large number of deletions and additions. Consider, for example, our little four-record file of Adams, Jones, Morris, and Smith. Even if, after original loading, the average search length is as good as 1.2, a long series of deletions and additions will tend to lengthen search paths until some level of deterioration is reached. One remedy is complete reorganization: the file is rehashed and reloaded. A more local remedy is to examine the records that follow a tombstone to see if the search length can be shortened by moving a record back toward its home address.
10.8 Other Collision Resolution Techniques

Despite its simplicity, progressive overflow applied to buckets performs well in many cases. Its chief drawback is clustering: overflow records pile up in the same part of the file, lengthening searches for some records.

Double Hashing. One method of avoiding clustering is to store overflow records a long way from their home addresses by double hashing. With double hashing, when a collision occurs, a second hash function is applied to the key to produce a number c that is relatively prime to the number of addresses.† The value c is added to the home address to produce the overflow address. If the overflow address is already occupied, c is added to it to produce another overflow address. This procedure continues until a free overflow address is found.
Double hashing does tend to spread out the records in a file, but it suffers from a potential problem that is encountered in several improved overflow methods: it violates locality by deliberately moving overflow records some distance from their home addresses, increasing the likelihood that the disk will need extra time to get to the new overflow address. If the file covers more than one cylinder, this extra time is likely to mean an extra seek.

†If N is the number of addresses, then c and N are relatively prime if they have no common divisors.
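As a C sketch (our code, not the text's) of the probe sequence, with hash1(), hash2(), and occupied() as hypothetical helpers: hash2() must yield a step c that is relatively prime to the number of addresses (with a prime number of addresses, any 1 <= c < N works).

    int hash1(const char *key);    /* assumed: primary hash    */
    int hash2(const char *key);    /* assumed: step generator  */
    int occupied(int address);     /* assumed: slot test       */

    int double_hash_probe(const char *key, int n_addresses)
    {
        int home = hash1(key) % n_addresses;
        int c    = 1 + hash2(key) % (n_addresses - 1);   /* never zero */
        int addr = home;
        while (occupied(addr)) {
            addr = (addr + c) % n_addresses;   /* step by c, wrapping */
            if (addr == home)
                return -1;                     /* file is full        */
        }
        return addr;
    }

Because c and N are relatively prime, the probe sequence visits every address before returning to the home address.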
Chained Progressive Overflow. Another technique for shortening searches is chained progressive overflow. It works like progressive overflow, except that synonyms are linked together with pointers: each home address points to the first of its overflow records, and each overflow record points to the next. When a record is sought, only the linked list of synonyms for its home address is searched, so the average search length decreases. The technique is illustrated in Fig. 10.13, where the average search length decreases from the 2.5 of plain progressive overflow (Fig. 10.12) to

    (1 + 1 + 2 + 2 + 1 + 3)/6 = 1.7.

To gain this improvement, a pointer field must be added to each record, and we must attend to a couple of problems, described after the figures.
Key      Home address    Actual address    Search length
Adams    20              20                1
Bates    21              21                1
Cole     20              22                3
Dean     21              23                3
Evans    24              24                1
Flint    20              25                6

Average search length = (1 + 1 + 3 + 3 + 1 + 6)/6 = 2.5

FIGURE 10.12 Hashing with plain progressive overflow. Adams, Cole, and Flint are synonyms; Bates and Dean are synonyms.
Key      Home address    Actual address    Address of next synonym    Search length
Adams    20              20                22                         1
Bates    21              21                23                         1
Cole     20              22                25                         2
Dean     21              23                -1                         2
Evans    24              24                -1                         1
Flint    20              25                -1                         3

Average search length = (1 + 1 + 2 + 2 + 1 + 3)/6 = 1.7

FIGURE 10.13 Hashing with chained progressive overflow. Adams, Cole, and Flint are synonyms; Bates and Dean are synonyms.
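A C sketch of the chained search (ours, not the text's); each slot holds a key and the RRN of the next synonym, with -1 marking the end of a chain, and read_slot() standing in for the fixed-length record I/O:

    #include <string.h>

    struct slot { char key[12]; int next; };

    void read_slot(int rrn, struct slot *s);   /* assumed I/O helper */
    int  hash(const char *key);                /* assumed hash       */

    int chained_search(const char *key, int n_addresses)
    {
        struct slot s;
        int rrn = hash(key) % n_addresses;     /* home address        */
        while (rrn != -1) {
            read_slot(rrn, &s);
            if (strcmp(s.key, key) == 0)
                return rrn;                    /* found               */
            rrn = s.next;                      /* follow synonym chain */
        }
        return -1;                             /* not in file         */
    }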
First, we must guarantee that it is possible to get from a record's home address to the start of its chain. Suppose, for example, that Dean's home address, 21, were occupied by a record from a different chain; Dean still ends up at actual address 23, placed there by progressive overflow. Does this mean the record occupying address 21 must carry the pointer to Dean? The problem of a home address occupied by some record that does not belong there is handled easily only if home addresses hold nothing but genuine synonyms. If the pointer fields of records that are not synonyms are pressed into service for joining a list, a chain can be broken: if the pointer leading to a synonym such as Flint, at address 25, is missing or wrong, Flint is lost.

One solution to this potential problem is to guarantee that no home address is occupied by a record from a different chain when the file is loaded, by using a technique called two-pass loading. Two-pass loading, as the name implies, loads the file in two passes. On the first pass, only records that hash directly to their home addresses are loaded; overflow records are set aside. On the second pass, the overflow records are loaded and chained from their home addresses, so every home address holds a record that actually belongs there. Two-pass loading solves the problem illustrated in the example. It does not, however, protect the chains from the effects of later deletions and additions.

The methods used for handling these problems after initial loading are somewhat complicated and can, in a very volatile file, require many extra disk accesses. (For more information on techniques for maintaining pointers, see Knuth, 1973b and Bradley, 1982.) It would be nice if we could somehow altogether avoid this problem of overflow lists bumping into one another, and that is what the next method does.
Chaining with a Separate Overflow Area. One way to keep overflow records from occupying home addresses where they do not belong is to move them all to a separate overflow area. Many collision-handling schemes are variations of this basic approach. The set of home addresses is called the primary data area, and the set of overflow addresses is called the overflow area. An advantage of this approach is that it keeps all home addresses free for home records: overflow records can simply be added to an entry-sequenced overflow area as they occur. A disadvantage is that if the overflow area is on a different cylinder than the home address, every search for an overflow record involves a head movement. Studies show that chaining with a separate overflow area is generally worse than progressive overflow when overflow records are few. One situation in which a separate overflow area is definitely required occurs when there are more records than home addresses -- that is, when the packing density is greater than one.
FIGURE 10.14 Chaining to a separate overflow area. Adams (20), Bates (21), and Evans (24) occupy their home addresses in the primary data area; Cole and Flint are chained from address 20, and Dean from address 21, in the separate overflow area.
If, for example, it is anticipated that a file will grow beyond the capacity of the initial set of home addresses, and rehashing the file into a larger address space is not an option, then a separate overflow area must be used.

Scatter Tables: Indexing Revisited. Suppose you have a hash file that contains no records at all, only pointers to records. The file is then simply an index that is searched by hashing rather than by some other method; such an organization of a file is called a scatter table. (Hashed access of this kind requires one extra disk access per search, one more than other forms of hashing require, unless the scatter table can be kept in primary memory.) The data file can be implemented in many different ways. For example, it can be a set of linked lists of synonyms (as shown in Fig. 10.15), a sorted file, or an entry-sequenced file. For more information on scatter tables, see the readings listed at the end of this chapter.
FIGURE 10.15 Example of a scatter table structure. Because the hashed part is an index, the data file may be organized in any way that is appropriate.
10.9 Patterns of Record Access

"Twenty percent of the fishermen catch eighty percent of the fish." -- M. Boyd

The pattern of record access can have an important effect on performance in a hashed file. If we know something about the patterns of access, we can use that knowledge when the file is hashed. Consider, for example, the inventory file of the items a grocery company handles. Every time an item is sold, its record is accessed. Is it reasonable to assume that accesses are distributed randomly among the records? Probably not. There is a principle, related to the idea of the vital few and the trivial many, known as the 80/20 Rule of Thumb: 80% of the accesses are performed on just 20% of the records in a file, and those few records account for a large share of the activity. In the grocery file, milk would be among the 20% of high-activity items; brie among the rest.
the rest.
way
If,
when
that the
loaded
at
20% (more or
home
or near their
less)
a file,
that are
as
few accesses
as
addresses, then
access records that have short search lengths, so the effective average search
length will be shorter than the nominal average search length that
defined
we
earlier.
For example, suppose our grocery store's file-handling program keeps track of the number of times each item is accessed during a one-month period. It might do this by storing with each record a counter that starts at zero and is incremented every time the item is accessed. At the end of the month the records for all the items in the inventory are dumped onto a file that is sorted in descending order according to the number of times they have been accessed. When the sorted file is rehashed and reloaded, the first records to be loaded are the ones that, according to the previous month's experience, are most likely to be accessed. Since they are the first ones loaded, they are also the ones most likely to be loaded into their home addresses. If reasonably sized buckets are used, there will be very few, if any, high-activity items that are not in their home addresses and therefore retrievable in one access.
SUMMARY
There are three major modes for accessing files: sequentially, which provides O(N) performance; through tree structures, which can produce O(log_k N) performance; and directly. Direct access provides O(1) performance, which means that the number of accesses required to retrieve a record is constant and independent of the size of the file. Hashing is the primary form of organization used to provide direct access.
Hashed access has its costs as well: the ideal of exactly one access per record is rarely achieved, and hashed files may not be sorted by key, so we give up ordered access when we choose hashing, and hashing is not readily adaptable to files that change greatly in size.
Hashing applies a function to a record key K to produce an address. The distribution of the resulting addresses matters: a uniform distribution, in which synonyms never occur, is extremely rare. A random or nearly random distribution is much easier to achieve and is usually considered acceptable.
In this chapter a simple hashing algorithm is developed to demonstrate the kinds of operations that hash functions perform. The algorithm has three steps:

1. Represent the key in numerical form;
2. Fold and add; and
3. Divide by the size of the address space, producing a valid address.

When we examine the distributions produced by different hashing algorithms, we see that some algorithms, applied to some key sets, produce better-than-random distributions. Failing this, we suggest some algorithms that generally produce distributions which are approximately random.
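Here is a minimal C rendering of the three-step algorithm (our sketch, not the text's code); the pairwise fold width and the prime modulus 19937, used to keep the running sum in range, are illustrative choices:

    #include <string.h>

    int hash(const char *key, int address_space)
    {
        int sum = 0;
        size_t len = strlen(key);
        /* steps 1 and 2: treat each pair of characters as a number
         * and fold it into the running sum */
        for (size_t j = 0; j + 1 < len; j += 2)
            sum = (sum + 100 * key[j] + key[j + 1]) % 19937;
        /* step 3: divide by the size of the address space and keep
         * the remainder, which is a valid address */
        return sum % address_space;
    }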
The Poisson distribution provides a mathematical tool for examining in
detail the effects of a random distribution. Poisson functions can be used to
predict the numbers of addresses likely to be assigned 0, 1, 2, and so on,
records, given the number of records to be hashed and the number of
available addresses. This allows us to predict the number of collisions likely
to occur when a file is hashed, the number of overflow records likely to
occur, and so on. In short, the Poisson distribution lets us reason quantitatively about synonyms and overflow records.
Using buckets is the third method for avoiding collisions. With buckets, file addresses hold one or more records, depending on how the file is organized by the designer. The number of records that can be stored at a given address, combined with the packing density, determines the expected overflow and search lengths; the Poisson distribution again applies.
Progressive overflow is a simple way to deal with overflow: if an attempt to store a new record results in a collision, successive addresses after the home address are searched until a free slot is found. Likewise, if a record that is sought is not found in its home address, successive addresses are searched until the record turns up or an empty slot shows that it is absent.

Deletions in hashed files are complicated by three factors:

1. The possibility that empty slots created by deletions will hinder later searches for records whose home addresses lie before the slots;
2. The need to mark deleted slots, with tombstones, so searches can proceed past them; and
3. The need to recover space made available when records are deleted.
Deterioration caused by many deletions and additions can be repaired by complete file reorganization. Several other collision-handling techniques are covered briefly:

1. Double hashing places overflow records far from their home addresses, avoiding clustering.
2. Chained progressive overflow links each home address to its synonyms, so a search touches only records that may actually match; some mechanisms for maintaining the chains are required.
3. Chaining with a separate overflow area keeps overflow records out of the home addresses altogether, at the possible cost of extra head movement.
4. Scatter tables hold no records at all in the hashed file, only pointers to records, giving much more flexibility in this respect: the data file can be organized in any convenient way. This approach is essentially an index searched by hashing, and it requires an extra access per search unless the table is held in RAM.

Since in many files a small fraction of the records is accessed much more frequently than the rest, measures that favor those records can make the effective average search length less than the nominal one. One such measure is to load the most frequently accessed records first, so they occupy their home addresses.
KEY TERMS

Average search length. We define average search length as the sum of the number of accesses required for each record in the file divided by the number of records in the file. This definition does not take into account the number of accesses required for unsuccessful searches, nor does it account for the fact that some records are accessed much more often than others.
Bucket. An area of a hashed file that can hold more than one record and that is treated as a unit for storage and retrieval. By reading and writing buckets rather than individual records, performance can, in many cases, be improved substantially.
Collision. Situation in which a record is hashed to an address that does not have room to store the record. When a collision occurs, it must be handled by some means of collision resolution.

Double hashing. A collision resolution scheme in which a collision is handled by applying a second hash function to the key to produce a number c (which is relatively prime to the number of addresses); c is added to the home address, as many times as necessary, until either the desired record is located or an empty space is found.
80/20 rule of thumb. An assumption that a large percentage (e.g., 80%) of the accesses are performed on a small percentage (e.g., 20%) of the records in a file. When the 80/20 rule applies, the effective average search length is determined largely by the search lengths of the more active records.
Fold and add. A method of hashing in which the numeric encodings of fixed-size parts of a key (e.g., every two characters) are folded and added together, resulting in a sum that can be transformed into a home address.

Hashing. A technique for generating an address for a record directly from its key. In this chapter applications of hashing involve direct access to records in a file, but hashing is also often used to access items in arrays in RAM. An index, for example, might be organized for hashing rather than for binary search if extremely fast searching of the index is desired.
Home address. The address generated by a hash function for a given key. If a record is stored at its home address, only one access is required to retrieve the record; a record not stored at its home address requires more than one access to retrieve.

Indexed hash. An organization in which the hashed file holds not the records themselves but references to records stored elsewhere; in this way it makes it possible to organize the data records in any manner that is appropriate.
Mid-square method. A hashing method in which a numeric representation of the key is squared and some digits from the middle of the result are used to produce the address.

Minimum hashing. Hashing in which the number of addresses is exactly equal to the number of records, so no space is wasted.
Open addressing. See progressive overflow.

Overflow. The situation that occurs when a record cannot be stored at its home address.

Packing density. The proportion of allocated file space that actually holds records. Because the packing density determines the probability of a collision occurring, it strongly influences the cost of searching for a record in a file; when a file is half full, for example, collisions are already common.

Prime division. Transforming a key into an address by dividing the key by a prime number and using the remainder.
Progressive overflow. An overflow handling technique in which collisions are resolved by storing a record in the next available address
after its home address. Progressive overflow is not the most efficient
overflow handling technique, but it is one of the simplest and is adequate for many applications.
Randomize. To produce a number (e.g., by hashing) that appears to be random.
Synonyms. Two or more different keys that hash to the same address. When each file address can hold only one record, synonyms always collide; when buckets are used, several synonyms may be stored at an address without collision.

Tombstone. A special marker placed in a record slot when a record is deleted. A tombstone keeps searches from halting at the slot, is easily recognized, and marks space that may be reclaimed for later additions.

Uniform. Term applied to a distribution in which records are spread out evenly among the available addresses; a uniform distribution is the ideal sought by a randomizing algorithm.
EXERCISES
1. Use the function hash(KEY, MAXAD) described in this chapter.
   a. What address is produced for your own name?
   b. Find two different words of more than four characters that are synonyms.
   c. It is assumed that we have ways to get a key from each record; consider what would need to change if we did not.

2. In understanding hashing, it is important to understand the relationships between the size of the available memory, the number of keys to be hashed, the range of possible keys, and the nature of the keys. Let us give names to these quantities:

   M = the number of memory spaces available;
   r = the number of records to be stored in the memory spaces; and
   n = the number of possible keys, where a key may be any combination of characters.

   Suppose h(K) is a hash function that maps keys into addresses between 0 and M - 1.
   a. How are n and M related?
495
496
HASHING
How
If the function h
would
3.
M related?
were minimum perfect hashing function,
and M be related?
c.
d.
are
n,
r,
and
The following
table
how
Function
Function
Function
d(0)      0.71          0.25          0.40
d(1)      0.05          0.50          0.36
d(2)      0.05          0.25          0.15
d(3)      0.05          0.00          0.05
d(4)      0.05          0.00          0.02
d(5)      0.04          0.00          0.01
d(6)      0.05          0.00          0.01
d(7)      0.00          0.00          0.00
   a. Which of the three functions (if any) distributes records in a way that is approximately random?
   b. Which distribution is nearest to uniform?
   c. Which of the three functions is better for hashing, and why?

4. Using the Poisson function, compute:
   a. the number of addresses with no records assigned to them;
   b. the number of addresses with exactly one record assigned (no synonyms); and
   c. the number of addresses with one record plus one or more synonyms.
6. Consider the file described in the preceding exercise. What is the expected number of overflow records if the file is stored
   a. without buckets?
   b. with buckets?

7. Make a table of expected overflow for bucket sizes 1, 2, 5, and 10 at a packing density of 0.8.
8. There is a hashing technique that works on block-addressable disks as follows. From a key, the hash function derives three numbers that together constitute the home block address. The corresponding drive hardware can direct the disk drive to search a track for the desired record, and it can even recognize an empty record slot in passing, effectively searching a whole track in one rotation.
   a. What is it about this technique that makes it superior to progressive overflow on conventionally organized drives?
   b. Why is it an advantage that the block size can be kept small, and why is it also a disadvantage?
9. In discussing implementation issues, we described a file that is initialized with empty records before any actual data is stored. Show, by an example, the effect on such a file of a number of additions and deletions made in the order shown in the following table. Tombstones are to be used where necessary.
Operation      Home Address
Add Alan
Add Bates
Add Cole
Add Dean
Add Evans
Del Bates
Del Cole
Add Finch
Del Alan
Add Gates
Add Hart

   a. Show what the file looks like after the operations are applied in the order given.
   b. How many of the records end up at their home addresses? What would cause the average search length of the file to deteriorate?
11. Suppose you have a file in which 20% of the records account for 80% of the accesses, and that you want to store the file with a bucket size of 5. When the file is loaded, you load the active 20% of the records first. After the active 20% of the records are loaded, and before the rest are loaded, what is the expected average search length when overflow is handled by
   a. Progressive overflow; or
   b. Chaining to a separate overflow area?
If the file were sorted (rather than hashed), these transactions would normally be carried out by some kind of batch process. (Consider the implications of these issues.)

A hashed structure can sometimes fail to detect a misspelled word, reporting that a key exists when in fact it does not. One designer chose an address space of one address for every 4,000 words because he observed that drafts of papers rarely contained more than 20 errors, so one could expect the program to fail only rarely. In what cases might it be reasonable for a program to report that a key exists when in fact it does not?
Programming Exercises

15. Implement and test a version of the function hash described in this chapter.

16. Create a hashed file in which the key in each record is to be the name of a city or town. For the purposes of this exercise, begin by creating a sorted list of the names of all of the cities and towns in California. (If time or space is limited, just make a list of names starting with the letter 'S'.)
Examine how the records are distributed among the addresses after each run: count the addresses with 0, 1, 2, ..., 10, and 10-or-more records assigned to them.

17. Using some set of keys, such as the one from the preceding exercise, do the following:
   a. Write and test a program for loading hash files with bucket sizes of 1, 2, and 5, respectively, reporting the average search length, the maximum search length, the packing density, and the percentage of records that are overflow records.
   b. Assuming a Poisson distribution, compare your results with the expected values for average search length and the percentage of records that are overflow records.
18. Repeat exercise 17, using a different overflow-handling technique.

19. Repeat exercise 17, varying the packing density -- that is, the ratio of the number of keys to available home addresses.
FURTHER READINGS

There are a number of general surveys related to hashing, including Maurer and Lewis (1975) and Sorenson, Tremblay, and Deutscher (1978). Textbooks concerned with file design generally contain substantial amounts of material on hashing, and they often provide extensive references for further study. Each of the following can be useful: Hanson (1982) treats many of the issues we introduce and also contains a good chapter on comparing different file organizations; Knuth (1973b) is filled with results exploring hashing in depth and contains much information of value to the programming practitioner. One of the most thorough treatments of hashing performance is the one found there.
11
Extendible Hashing
CHAPTER OBJECTIVES

Describe the problem solved by extendible hashing and related approaches.
Explain how extendible hashing works and show how it combines tries with conventional, static hashing.
Show how to implement extendible hashing, including deletion.
Review studies of extendible hashing performance.
Examine alternative approaches to the same problem, including dynamic hashing, linear hashing, and hashing schemes that control splitting by allowing for overflow buckets.
CHAPTER OUTLINE

11.1 Introduction
11.2 How Extendible Hashing Works
     11.2.1 Tries
     11.2.2 Turning the Trie into a Directory
     11.2.3 Splitting to Handle Overflow
11.3 Implementation
     11.3.4 Implementation Summary
11.4 Deletion
     11.4.1 Overview of the Deletion Process
     11.4.2 A Procedure for Finding Buddy Buckets
     11.4.5 Summary of the Deletion Operation
11.5 Extendible Hashing Performance
11.6 Alternative Approaches
     11.6.1 Dynamic Hashing
11.1 Introduction

In Chapter 8 we began our discussion of B-trees with a historical question: why did it take so long for B-trees, and the file structures that emerge from them, to appear? The key feature of both AVL trees and B-trees is that they are dynamic: as we add and delete records, the trees use local mechanisms to maintain themselves, making changes at a local level rather than rebuilding the whole structure. Self-maintaining structures of this kind are tremendously useful for dynamic data storage and retrieval. Judging from the historical record, they are also hard to develop. It was not until 1963 that Adel'son-Vel'skii and Landis developed a self-adjusting structure for tree storage in memory, and it took another decade of work before B-trees brought the same dynamic behavior to files on disk. This chapter describes extendible hashing, which does for hashed files what AVL trees and B-trees do for tree structures: it allows the address space to grow and shrink along with the file.
11.2 How Extendible Hashing Works
11.2.1 Tries
The key idea behind extendible hashing is to combine conventional hashing with another retrieval approach called the trie. (The word trie is pronounced so that it rhymes with sky.) Tries are also sometimes referred to as radix searching because the branching factor of the search tree is equal to the number of alternative symbols (the radix of the alphabet) that can occur in each position of the key. A few examples will illustrate how this works.

Suppose we want to build a trie that stores keys including anderson, andrews, and baird. A trie for these keys is shown in Fig. 11.1. As you can see, the trie indexes the names letter by letter. Since there are 26 symbols in the alphabet, the potential branching factor at every node of the search is 26. If we used the digits 0-9 as our search alphabet, rather than the letters a-z, the radix of the search would be reduced to 10. A radix 10 trie is the one shown in Fig. 11.2.
FIGURE 11.1 Radix 26 trie that indexes names according to the letters of the alphabet.

FIGURE 11.2 Radix 10 trie that indexes numbers according to the digits they contain.
Notice that in searching a trie we sometimes use only a portion of the key: we consume more symbols of the key only as we need them to complete the search.

11.2.2 Turning the Trie into a Directory. We use tries with a radix of two in extendible hashing: decisions about which branch to take are made on a bit-by-bit basis. Figure 11.3 shows a trie that allows us to distinguish among three buckets by at most the first two bits of a key's hashed address. How should we represent the trie? If we represent it as a tree structure, we are forced to follow pointers from node to node during a search. Instead, we flatten it into an array of contiguous cells, as shown in Fig. 11.4(b).
FIGURE 11.4 The trie from Fig. 11.3 transformed first into a complete binary tree (a), and then flattened into a directory array (b) with cells 00, 01, 10, and 11.
11.2.3 Splitting to Handle Overflow. A key issue in extendible hashing is what happens when a bucket overflows. The goal is to find a way to split the overflowing bucket and extend the address space without creating chains of buckets that have to be searched linearly.

Suppose we add keys until bucket A in Fig. 11.4(b) overflows. In this case the split is simple: keys beginning with 00 stay in bucket A, while keys beginning with 01 go to a new bucket attached to the 01 addresses in the directory. Now suppose bucket B, holding the keys that begin with 10, overflows -- a more complex case. How do we split bucket B and where do we attach the new bucket after the split? Unlike the earlier split, we do not have unused bits of address space that we can press into duty as we split the bucket. We now need to use three bits of the hash address in order to divide up the records that hash to bucket B. The trie illustrated in Fig. 11.6(a) makes the distinctions required to complete the split. Figure 11.6(b) shows what this trie looks like when it is extended into a completely full binary tree, with all leaves at the same level, and Fig. 11.6(c) shows the directory form of the same trie.
FIGURE 11.6 The results of the split of bucket B, shown first as a trie (a), then as a completely full binary tree (b), and finally as a directory (c).
By building on the trie's ability to grow, we have doubled the address space used in the search (from four directory cells to eight) at the cost of splitting only the one bucket that overflowed. So far we have talked about tries as the basis for extendible hashing; one might well ask where the actual hashing comes into play. Why not just use the tries on the bits in the key itself, splitting buckets and extending the address space as necessary? The answer to this question grows out of hashing's most fundamental characteristic: A good hash function produces a nearly uniform distribution of keys across an address space. Notice that the trie shown in Fig. 11.6 is poorly balanced, resulting in a directory that is twice as big as it actually needs to be. If we had an uneven distribution of addresses that placed even more records in buckets B and D without using other parts of the address space, the situation would get even worse. By using a good hash function to create addresses with a nearly uniform distribution, we avoid this problem.
11.3 Implementation

11.3.1 Creating the Addresses

The place to start is with a look at pseudocode for the lowest-level operations. The hash function itself is the same fold-and-add hashing algorithm we used in Chapter 10. The only difference is that we do not conclude the operation by returning the remainder of the folded address divided by the address space; we retain the entire folded sum as an integer, extracting bits from it as we need them.
FUNCTION hash(KEY)
    set SUM to 0
    set J to 0
    if the key length is odd, pad the key with a blank
    while (J < length of the key)
        SUM := (SUM + 100 * KEY[J] + KEY[J + 1]) mod 19937
        J := J + 2
    return SUM
end FUNCTION

FIGURE 11.8 Function hash(KEY) folds and adds pairs of characters of KEY, retaining the full folded sum for later bit extraction.
The hashed values need one more transformation before they are used as directory addresses: we reverse the order of the bits, working from right to left, taking the low-order bit values first. For a short key, the high-order bits of the folded sum may all turn out to be zero, while the low-order bits vary; reversal puts the most variable bits where the address will use them first. The DEPTH argument of the address function tells it how many bits of the reversed value to return.

FIGURE A sample set of keys (bill, lee, pauline, alan, julie, mike, elizabeth, mark), their hashed bit patterns, and the extracted, reversed addresses.
FUNCTION make_address(KEY, DEPTH)
    set RETVAL to 0       /* accumulate reversed bit string        */
    set MASK to 1         /* 0...001 mask to extract the low bit   */
    HASH_VAL := hash(KEY)
    for J := 1 to DEPTH
        RETVAL   := RETVAL left shifted one position
        LOWBIT   := HASH_VAL bitwise ANDed with MASK
        RETVAL   := RETVAL bitwise ORed with LOWBIT
        HASH_VAL := HASH_VAL right shifted one position
    next J
    return RETVAL
end FUNCTION

FIGURE 11.9 Function make_address(KEY, DEPTH) gets a hashed address, reverses the order of the bits, and returns an address of DEPTH bits.
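A direct C rendering of the same bit-reversal extraction (our own sketch of the pseudocode above):

    /* Take the DEPTH low-order bits of the hashed value, in
     * reversed order, as the directory address. */
    unsigned make_address(unsigned hash_val, int depth)
    {
        unsigned retval = 0;
        for (int j = 0; j < depth; j++) {
            retval   = retval << 1;        /* make room for next bit */
            retval  |= hash_val & 1u;      /* take current low bit   */
            hash_val = hash_val >> 1;      /* move to the next bit   */
        }
        return retval;
    }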
FIGURE 11.10 Bucket and directory record structures. Each bucket holds KEY[1 .. MAX_BUCKET_SIZE], an array of strings to hold keys.
Our buckets are stored on disk; the directory that references them is read into RAM and written back as necessary. Each cell in the directory consists of a reference to a BUCKET record. Because we keep the directory records in RAM, we implement each reference as a simple cell in an array, with one cell per address in the current address space size.

FIGURE 11.11 The driver, ex_init, and ex_close functions provide a high-level view of the extendible hashing program operation.
FUNCTION driver()
    ex_init()
    call op_add() and op_find() as directed by the user
    ex_close()
end FUNCTION

FUNCTION ex_init()
    open (or create, as necessary) the directory
        and bucket files
    if the hash file already exists
        read directory records into the array DIRECTORY
        DIR_DEPTH := log2(size of DIRECTORY)
    else
        allocate an initial directory consisting of a
            single cell
        set DIR_DEPTH to 0
        allocate an initial bucket and assign its address
            to the directory cell
    endif
end FUNCTION

FUNCTION ex_close()
    write the directory back to disk
    close files
end FUNCTION
Note that the DIR_DEPTH is directly related to the number of cells in the directory, since

    2^DIR_DEPTH = number of cells in DIRECTORY.

If we are starting a new hash file, the initial DIR_DEPTH is zero, which means that we are using zero bits of the hashed value to get the address of a key: all the keys go into the single, initial cell.
Given a way to open and close the file, we are ready to add records. The
op_add and op_find functions are outlined in Fig. 11.12.
The op_find function turns the key into a directory address. Given this address, we do a direct lookup in the directory and assign the referenced bucket to FOUND_BUCKET. If the key is in that bucket, we return SUCCESS; if not, we return FAILURE. The op_add function calls op_find to see whether the key is already in the hash file; if it is not, op_add calls bk_add_key, passing it the FOUND_BUCKET, to insert the key. When the bucket is not full, bk_add_key simply inserts the key. When the bucket is full, however, bk_add_key requires a split (Fig. 11.13), which is where things start to get interesting.
interesting.
look
at Fig. 11.6(a).
The keys
The
its
address
is
cells).
in
bucket
are distinguished
bit.
first
two
bits;
they
all
begin
them.
If
we
directory,
one of the buckets that is using fewer address bits than the
which
therefore is referenced from more than one directory
and
split
FUNCTION op_add(KEY)
    if (op_find(KEY) == SUCCESS)
        return FAILURE         /* do not add a second copy */
    /* add the key to the bucket found by op_find */
    bk_add_key(FOUND_BUCKET, KEY)
    return SUCCESS
end FUNCTION

FUNCTION op_find(KEY)
    /* turn the key into a directory address, then look
    ** for the key in the referenced bucket
    */
    ADDRESS := make_address(KEY, DIR_DEPTH)
    FOUND_BUCKET := the bucket referenced by
        DIRECTORY[ADDRESS].BUCKET_REF
    if the key is in FOUND_BUCKET
        return SUCCESS
    else
        return FAILURE
end FUNCTION

FIGURE 11.12 op_add and op_find functions.

FUNCTION bk_add_key(BUCKET, KEY)
    if (number of keys in BUCKET < MAX_BUCKET_SIZE)
        add the key to the bucket
    else
        bk_split(BUCKET)
        op_add(KEY)
    endif
end FUNCTION

FIGURE 11.13 Function bk_add_key adds the key if the bucket has room. If the bucket is full, it splits the bucket and then adds the key.
FUNCTION bk_split(BUCKET)
    /* if the depth used for the BUCKET addresses is
    ** already the same as the address depth in the
    ** directory, we must first split the directory
    ** to double the directory address space
    */
    if (depth of BUCKET == DIR_DEPTH)
        dir_double()
    allocate NEW_BUCKET
    /* find the directory cells that should point to the
    ** new bucket, insert references to it there, and
    ** redistribute the keys between the two buckets
    */
    find_new_range(BUCKET, NEW_START, NEW_END)
    insert references to NEW_BUCKET in the directory
        cells from NEW_START through NEW_END
    redistribute the keys between BUCKET and NEW_BUCKET
end FUNCTION

FIGURE 11.14 Function bk_split divides keys between an existing bucket and a new bucket, doubling the directory first if necessary.

To see how splitting works when the directory does not have to be doubled, consider what happens when we split bucket A in Fig. 11.6(c).
Before the split only one bit, the initial zero, is used to identify keys that
belong in bucket A. After the split, we use two bits. Keys starting with 00
(directory cells 000 and 001) go in bucket A; keys starting with 01 (directory
cells 010 and Oil) go in the new bucket. We do not have to expand the
directory because the directory already has the capacity to keep track of the
additional address information required for the
If, on the other hand, we split a bucket that is using as many address bits as the directory, such as bucket B or D in Fig. 11.6(c), then before we can split the bucket and create a new bucket, we must double the directory so its cells can hold the new address information.
In bk_split we compare the number of address bits used by the bucket with the number used by the directory; if they are the same, we double the directory before creating the new bucket. Next we get the addresses we need for the new bucket by finding the range of directory cells it will use. Then we attach references to the new bucket to the directory over this range, adjust the bucket address depth information in both buckets to reflect the use of an additional address bit, and redistribute the keys from the old bucket between the two buckets. The procedure for finding the range of directory cells is described in the find_new_range pseudocode in Fig. 11.15.
FIGURE 11.15 The find_new_range function finds the start and end directory addresses for the new bucket by using information from the old bucket.

FUNCTION find_new_range(OLD_BUCKET, NEW_START, NEW_END)
    /* Form the shared address bits of the new bucket by
    ** extending the shared address of the old bucket by
    ** one bit and setting that bit to one.  Then fill out
    ** the remaining directory address bits with zeroes
    ** for NEW_START and with ones for NEW_END.
    */
    NEW_SHARED := shared address of OLD_BUCKET left shifted one place
    NEW_SHARED := NEW_SHARED bitwise ORed with 1
    BITS_TO_FILL := DIR_DEPTH - (depth of OLD_BUCKET + 1)
    set NEW_START and NEW_END to the NEW_SHARED value
    for J := 1 to BITS_TO_FILL
        NEW_START := NEW_START left shifted one place
        NEW_END := NEW_END left shifted one place
        NEW_END := NEW_END bitwise ORed with 1
    next J
end FUNCTION
The function first computes NEW_SHARED, the address bits that all cells pointing to the new bucket will share, and then fills out the low-order bits to find the first and last cells in the range for the two buckets. For an example, look again at Fig. 11.6(c). When we split bucket A there, we
FUNCTION dir_double()
    /* calculate the current size and new size */
    CURRENT_SIZE := 2 ** DIR_DEPTH
    NEW_SIZE := 2 * CURRENT_SIZE
    allocate NEW_DIR, a directory of NEW_SIZE cells
    /* each cell of the old directory maps to a pair of
    ** adjacent cells in the new directory
    */
    for J := 0 to CURRENT_SIZE - 1
        NEW_DIR[2*J].BUCKET_REF := DIRECTORY[J].BUCKET_REF
        NEW_DIR[2*J + 1].BUCKET_REF := DIRECTORY[J].BUCKET_REF
    next J
    DIR_DEPTH := DIR_DEPTH + 1
    replace DIRECTORY with NEW_DIR
end FUNCTION
use three bits for the new bucket: its cells share 01 as their first two bits, the new bucket is attached to the directory cells starting with 010 and ending with 011.

Suppose that the directory used a five-bit address instead of a three-bit address. Then the range for the new bucket would start with 01000 and would end with 01111. This range covers all five-bit addresses that share 01 as the first two bits. The logic for finding the range of directory addresses, then, is to form the new shared address and fill out the remaining bits: zeroes give the start of the range and ones give the end. Once we have the range, attaching the new bucket is simply a matter of updating the directory cells in the range to make the change.
11.3.4 Implementation Summary

Now that we have assembled all of the pieces necessary to add records to an extendible hashing system, let's see how the pieces work together. The op_add function first calls op_find; if the key already exists, op_add returns immediately. If the key does not exist, op_add calls bk_add_key, passing it the bucket into which the key is to be added. If bk_add_key finds that there is still room in the bucket, it adds the key and the operation is complete. If the bucket is full, bk_add_key calls bk_split to make room. The bk_split function first makes sure the directory is large enough to accommodate the new address information, doubling it if necessary; it then allocates a new bucket, attaches it to the proper directory cells, and redistributes the keys. The new key is then added through a fresh call to op_add.
11.4 Deletion
11.4.1 Overview of the Deletion Process

If extendible hashing is to be a truly dynamic system, like B-trees or AVL trees, it must be able to shrink files gracefully as well as grow them. When we delete a key, we need a way to see if we can decrease the size of the file system by combining buckets and, if possible, decreasing the size of the directory.
When do we combine buckets, and which buckets can be combined? To answer these questions, look again at the trie in Fig. 11.6(b). Bucket A cannot absorb another bucket unless other buckets are combined first. Similarly, there is no single bucket that can be combined with bucket C. But buckets B and D are in the same configuration as buckets that have just split. They are ready to be combined; they are buddy buckets. We will take a closer look at the question of finding buddy buckets as we consider implementation of the deletion procedure; for now let's assume that we combine buckets B and D.

After combining buckets, we examine the directory to see if we can make changes there. Looking at the directory form of the trie in Fig. 11.6(c), we see that once we combine buckets B and D, directory entries 100 and 101 both point to the same bucket. In fact, each of the buckets has at least a pair of directory entries pointing to it. In other words, none of the buckets requires the depth of address information that is currently available in the directory. That means that we can shrink the directory and reduce the address space to half its size.

Reducing the size of the address space restores the directory and bucket structure to the arrangement shown in Fig. 11.4, before the additions and splits that produced the structure in Fig. 11.6(c). Reduction consists of collapsing each adjacent pair of directory cells into a single cell. This is easy, since both cells in each pair point to the
same bucket.

11.4.2 A Procedure for Finding Buddy Buckets

Given a bucket, how do we find its buddy? The pseudocode below gives the procedure that we use.
FUNCTION bk_find_buddy(BUCKET)
    /* there is no buddy if the directory is at its
    ** minimum size, or if the bucket does not use all
    ** of the address bits in the directory
    */
    if (DIR_DEPTH == 0)
        return NO_BUDDY
    if (depth of BUCKET < DIR_DEPTH)
        return NO_BUDDY
    /* the buddy's address is the bucket's shared address
    ** with the last bit flipped
    */
    BUDDY_ADDRESS := shared address of BUCKET with the
        low-order bit exclusive-ORed with 1
    return the address of the buddy bucket
end FUNCTION

The signal NO_BUDDY tells the caller that no combination is possible.
Once we determine that there is a buddy bucket, we need to find its address. First we find the address used to reach the bucket we have at hand; this is the shared address of the keys in the bucket. Since we know that the buddy bucket is the other bucket that was formed from a split, we know that the buddy has the same address in all but the last bit. This is illustrated by buckets B and D in Fig. 11.6(c). So, once we have the shared address, we flip the last bit and return the result as the address of the buddy bucket.
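In C (our sketch), flipping the last bit is a one-line XOR:

    /* The buddy address is the shared address with its
     * low-order bit flipped. */
    unsigned buddy_address(unsigned shared_address)
    {
        return shared_address ^ 1u;
    }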
The third critical deletion operation is collapsing the directory, which is one of the principal potential benefits of deleting records. In our implementation we use one function to check to see whether downsizing is possible and, if it is, to actually collapse the directory. Figure 11.18 shows pseudocode for this function, called dir_try_collapse().

The function begins by making sure that we are not at the lower limit of directory size. By treating the special case of a directory with a single cell here, at the start of the function, we simplify subsequent processing: with the exception of this case, all directory sizes are evenly divisible by two.

The actual test for the COLLAPSE_CONDITION consists of examining each pair of directory entries. We assume at the outset that we can collapse the directory and then look for a pair of directory cells that do not both point to the same bucket. As soon as we find such a pair, we know that we cannot collapse, and we set the COLLAPSE_CONDITION to false. If the collapse condition holds, the directory is collapsed by creating a new directory that is half the size of the original and then copying the bucket references shared by each pair of cells into single cells of the new directory.

Now that we have procedures for combining buckets, finding buddies, and collapsing the directory, we are ready
FUNCTION dir_try_collapse()
    /* the directory is already at minimum size when
    ** the depth is zero
    */
    if (DIR_DEPTH == 0)
        return FAILURE
    /* check each pair of directory cells to see whether
    ** each member references the same bucket -- if so,
    ** we can collapse the directory.
    */
    COLLAPSE_CONDITION := TRUE
    for J := 0 to DIR_SIZE - 1 by 2
        if (DIRECTORY[J].BUCKET_REF !=
                DIRECTORY[J + 1].BUCKET_REF)
            COLLAPSE_CONDITION := FALSE
    next J
    /* if the collapse condition holds, build a new
    ** directory half the size of the original and copy
    ** the shared bucket references into it
    */
    if (COLLAPSE_CONDITION)
        NEW_DIR_SIZE := DIR_SIZE / 2
        allocate memory for NEW_DIR
        for J := 0 to NEW_DIR_SIZE - 1
            NEW_DIR[J].BUCKET_REF :=
                DIRECTORY[2*J].BUCKET_REF
        next J
        DIR_DEPTH := DIR_DEPTH - 1
        replace DIRECTORY with NEW_DIR
    endif
    return COLLAPSE_CONDITION
end FUNCTION

FIGURE 11.18 The dir_try_collapse function first tests to see whether the directory can be collapsed. If the test succeeds, the directory is collapsed.
FUNCTION op_del(KEY)
    if (op_find(KEY) == FAILURE)
        return FAILURE
    return (bk_del_key(FOUND_BUCKET, KEY))
end FUNCTION

FUNCTION bk_del_key(BUCKET, KEY)
    KEY_REMOVED := remove the KEY from the BUCKET
    /* if the key was removed, see whether the bucket can
    ** now be combined with its buddy
    */
    if (KEY_REMOVED)
        bk_try_combine(BUCKET)
        return SUCCESS
    else
        return FAILURE
    endif
end FUNCTION

FIGURE 11.19 The op_del and bk_del_key functions.
The op_del function locates the bucket holding the key and then lets a service function do the rest, returning the value reported back from the service function. Figure 11.19 describes op_del and the service function, bk_del_key, in pseudocode.

The bk_del_key function does its work in two steps. The first step removes the key; the second step, taken only when a key is actually removed, calls bk_try_combine to see if deleting the key has made it possible to combine the bucket with its buddy.

Note that bk_try_combine (Fig. 11.20) can call itself recursively: combining a bucket with its buddy may have created a new buddy for the combined BUCKET, so it may be possible to do even more recursive combining, and we try again.
FIGURE 11.20 The bk_try_combine function tests to see whether a bucket can be combined with its buddy. If the test succeeds, bk_try_combine calls bk_combine to do the actual combination.

FUNCTION bk_try_combine(BUCKET)
    /* if there is no buddy, return right away */
    BUDDY := bk_find_buddy(BUCKET)
    if (BUDDY == NO_BUDDY)
        return
    /* if the keys in the bucket and its buddy fit in a
    ** single bucket, combine them; if so, combining may
    ** make further combining possible, so try again
    */
    if (keys in BUCKET and BUDDY fit in a single bucket)
        bk_combine(BUCKET, BUDDY)
        dir_try_collapse()
        bk_try_combine(BUCKET)
    endif
end FUNCTION
11.4.5 Summary of the Deletion Operation

Deletion begins with a call to op_del, which finds the bucket containing the key and asks bk_del_key to remove it. If a key is removed, bk_try_combine checks whether the bucket can be absorbed into its buddy, and dir_try_collapse checks whether the directory can shrink. In this way the file system grows and shrinks gracefully as keys come and go.

11.5 Extendible Hashing Performance

How well does extendible hashing work? As always, the answer involves a trade-off between time and space. If the directory can be kept in RAM, finding a key never requires more than a single disk access. If the directory is so large that it must be paged in and out of RAM, two accesses may be required. Because there is no overflow, these access time values are truly independent of the size of the file.

Questions about space utilization are more complicated than questions about access time. We need to be concerned about two uses of space: the space for the buckets and the space for the directory.

Space Utilization for Buckets.
If r is the number of records, b the bucket size, and N the number of buckets allocated, then space utilization, or packing density, is the ratio of the actual number of records to the total number of records the allocated space could hold:

    Utilization = r / (bN).

Averaged over the cycles of filling and splitting as the file grows, this works out to

    Utilization = ln 2 = 0.69,

the same kind of figure we found in Chapter 8, where we looked at B-tree utilization. Even so, B-trees tend to use less space than simple extendible hashing, while extendible hashing buys back the difference in access time, at the cost of at most a few extra seeks for the B-tree.

The average utilization is only part of the story; the other part relates to the periodic nature of the variations in space utilization. Because a good hash function distributes keys evenly, the buckets in an extendible hashing file tend to fill up and split at about the same time. As the buckets fill up, space utilization can reach past 90%. This is followed by a concentrated series of splits that reduce the utilization to below 50%. As these now nearly half-full buckets fill up again, the cycle repeats itself.
Space Utilization for the Directory. The directory also takes space, even when the number of keys is quite large. Just how large a directory should we expect to have, given an expected number of keys? Flajolet (1983) addressed this question in a careful analysis, developed in a number of different ways, and derived the expected value for the directory size for different numbers of keys and different bucket sizes, shown in Table 11.1.
TABLE 11.1 Expected directory size for a given total number of records and bucket size (bucket sizes from 10 to 200, record counts up to 10^8). From Flajolet, 1983.

Flajolet also provides the following formula for making rough estimates of the directory size for values that are not in this table:

    Estimated directory size = (3.92 / b) * r^(1 + 1/b)

where b is the bucket size and r the number of records. He notes that this estimate can be off by a factor of 2 to 4.
11.6 Alternative Approaches

11.6.1 Dynamic Hashing

In 1978, before Fagin, Nievergelt, Pippenger, and Strong produced their paper on extendible hashing, Larson published a paper describing a scheme called dynamic hashing. Functionally, dynamic hashing and extendible hashing are very similar. Both use a directory to track the addresses of the buckets, and both extend the directory through the use of tries.

The key difference is that dynamic hashing starts from a directory of fixed size and grows it gradually as a trie. As buckets within the original address space overflow, they split,
extending the trie downward from the directory node.

FIGURE 11.21 The growth of index nodes in dynamic hashing.

In Fig. 11.21(b) we have split the bucket at directory address 4. We address the two buckets resulting from the split as 40 and 41. We change the shape of node 4 in the figure from a square to a circle because it has changed from an external node, referencing a bucket, to an internal node that has child nodes.
In Fig. 11.21(c) we split the bucket addressed by node 41, extending the trie downward to include the two new external nodes 410 and 411; we also split the bucket at address 2, creating nodes 20 and 21. Finding a key starts just as it does in extendible hashing: the first bits of the hashed address select a node of the original directory. However, if the directory node is an internal node, the search is not complete; there must be additional address information to guide you through the ones and zeroes that form the
trie.
Larson suggests using a second hash function on the key and using the result of this hashing as the seed for a random-number generator that produces a sequence of ones and zeroes for the key. This sequence describes the path through the trie.
It is interesting to compare the two approaches. In brief, the structure through which the buckets are addressed amounts to the same underlying trie in each case; despite this similarity being expressed differently, the space utilization for the buckets is the same (69%) for both approaches. The primary difference is that dynamic hashing allows for slower, more gradual growth of the directory, whereas extendible hashing extends the directory by doubling it. However, because a directory node in dynamic hashing must be capable of holding pointers to children, the actual size of a node in dynamic hashing is larger than a directory cell in extendible hashing, probably by at least a factor of two. So, the directory for dynamic hashing will usually require more space in memory. Moreover, if the directory becomes so large that it requires use of virtual memory, extendible hashing need incur no more than a single page fault to move through its directory, since the directory is a simple contiguous array. Since dynamic hashing must follow pointers down a trie, it may be necessary to incur more than one page fault to move through the directory.
11.6.2 Linear Hashing

The key feature shared by extendible hashing and dynamic hashing is that they use a directory to direct access to the actual buckets containing the key records. The directory is what makes it possible to grow and shrink the number of buckets gracefully; linear hashing, proposed by Litwin (1980), does away with the directory entirely.
FIGURE 11.22 The growth of the address space in linear hashing: (a) four buckets addressed 00 through 11; (b)-(d) buckets are split one at a time, in a fixed order, extending the address space toward three bits; (e) the extension is complete, with buckets 000 through 111.
This example is adapted from a description of linear hashing by Enbody and Du (1988). Linear hashing, like the other approaches, uses more bits of the hashed value as the address space grows; the address function is essentially the make_address function we developed earlier in this chapter, called with a second argument of 2 for the two-bit address space of Fig. 11.22(a). We begin with four buckets. When a bucket overflows, linear hashing does not split the overflowing bucket; it splits the buckets in a fixed, linear order, extending the address space one bucket at a time, and the overflow record is placed in an overflow bucket chained to its home bucket until the split sequence reaches it.

Because there are two hash functions at work -- an h_d(k) function for the buckets still at the original depth and an h_d+1(k) function for the expansion buckets -- finding a record requires knowing which function to use. If p is the address of the next bucket to be split and extended, then the procedure for finding the address of the bucket containing key k is as follows:

    if (h_d(k) >= p)
        address := h_d(k)
    else
        address := h_d+1(k)
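In C, the address decision is a few lines (our sketch; h_d() is a hypothetical helper returning a d-bit address):

    unsigned h_d(unsigned hash_val, int d);   /* assumed: d-bit address */

    /* p is the next bucket in the split sequence; buckets below p
     * have already been split, so they use one more address bit. */
    unsigned linear_hash_address(unsigned hash_val, int d, unsigned p)
    {
        unsigned addr = h_d(hash_val, d);
        if (addr >= p)
            return addr;                  /* bucket not yet split   */
        else
            return h_d(hash_val, d + 1);  /* use the expanded space */
    }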
Litwin (1980) shows that the access time performance of linear hashing is good. There is no directory to access or maintain, and since we extend the address space through splitting every time there is overflow, the overflow chains do not become very large. Given a bucket size of 50, the average number of disk accesses per search approaches very close to one. Space utilization, on the other hand, is lower than it is for extendible hashing or dynamic hashing, averaging around only 60%.
11.6.3 Approaches to Controlling Splitting

We know that splitting at the moment of overflow is not the only possibility; since overflow buckets are available, the choice of a triggering event for splitting is somewhat arbitrary, particularly where space utilization is concerned. Suppose, as an alternative, we let the file keep filling until its utilization reaches some desired figure, such as 75%, and only then split a bucket. We can still keep the average number of accesses, even for unsuccessful searches, below a chosen bound, such as 2, because the overflow chains stay short. This choice of utilization, rather than overflow, as the triggering event gives the designer direct control over the time/space trade-off.
We can also use overflow buckets to defer splitting and increase space utilization for schemes that use a directory, such as extendible hashing. The attraction of these methods is that it is possible to avoid a split when the split would cause the directory to double in size. Consider the example that we used early in this chapter, where we split the bucket B in Fig. 11.4(b), producing the expanded directory and bucket structure shown in Fig. 11.6(c). If we had allowed bucket B to overflow instead, we could have retained the smaller directory. Depending on how much space we allocated for the overflow buckets, we might also have
improved space utilization among the buckets. The cost of these improvements, of course, is the added search length caused by the overflow chains. Scholl (1981) developed a refinement of this idea in which overflow buckets are shared; subsequent research reported space utilization of about 81% with an average of about 1.1 seeks per search. Veklerov (1985) suggested using buddy buckets for overflow rather than allocating chains of new buckets. This is an attractive suggestion, since the buddy bucket is already allocated and the scheme still improves utilization.
SUMMARY

Conventional, static hashing does not adapt well to file structures that are dynamic, that grow and shrink over time. Extendible hashing is one of several hashing systems that allow the address space for hashing to grow and shrink along with the file. Because the size of the address space can grow as the file grows, it is possible for extendible hashing to provide hashed access without the need for overflow handling, even as files grow many times beyond their original expected size.

The key to extendible hashing is using more bits of the hashed value as we need to cover more address space. The model for extending the use of the hashed value is the trie: every time we use another bit of the hashed value, we have added another level to the depth of a trie with a radix of two.
In extendible hashing we fill out all the leaves of the trie until we have a perfect tree, and then we collapse that tree into an array. The array forms a directory to the buckets, kept on disk, that actually hold the keys. The directory is small enough, in many cases, to be held in RAM.

When we add keys and a bucket splits, the new bucket needs an additional bit of address information. If the address space represented in the directory can already cover the use of this new bit, no more changes are necessary. If, however, the splitting bucket is already using as many bits as the directory, the directory itself must double in size before the split.

It is this scheme that gives extendible hashing its performance guarantee: if the directory is held in RAM, retrieving a record takes a single seek; if the directory must be paged to disk, worst-case performance is two seeks -- for any bucket size and any total number of records.

There are a number of other approaches to the problem solved by extendible hashing. Dynamic hashing uses a directory that is more cumbersome but grows more smoothly. Space utilization is the same as for extendible hashing. Linear hashing does away with the directory altogether, extending the address space by splitting buckets in a linear sequence and accepting some overflow chaining in exchange. Although there is no directory to maintain, space utilization for linear hashing is lower, averaging around 60%.
KEY TERMS

Buddy bucket. A bucket whose shared address differs from that of a given bucket in only the last bit; the buddy address is found by taking the XOR of the bucket's shared address with 1. Buddy buckets matter in deletion for extendible hashing since, if a bucket and its buddy together hold few enough keys, they can be combined into a single bucket.

Deferred splitting. Postponing a split, allowing a bucket to overflow, until some space utilization target is reached. It is possible to improve space utilization this way; deferred splitting is a classic example of trading search time for more compact storage.

Directory. The array of cells, indexed by the leading bits of a key's hashed address, that references the buckets in extendible hashing. The directory makes it possible to extend the address space by doubling the directory, rather than reorganizing the whole file.

Dynamic hashing. A hashing scheme, described by Larson before extendible hashing appeared, in which the directory grows gradually, node by node, as buckets split.

Extendible hashing. Like dynamic hashing, extendible hashing is sometimes used to refer to any hashing scheme that allows the address space to grow and shrink so it can be used in dynamic file systems. Used more precisely, as it is used in this chapter, the term refers to the approach first described by Fagin, Nievergelt, Pippenger, and Strong.

Linear hashing. Unlike extendible hashing and dynamic hashing, linear hashing does not use a directory. Instead, the actual address space is extended one bucket at a time as buckets overflow. Because the extension of the address space does not necessarily correspond to the bucket that is overflowing, linear hashing necessarily involves the use of overflow buckets, even as the address space expands.
Splitting. The process of dividing the keys in an overflowing bucket between that bucket and a new bucket, to make room for new keys.

Trie. A search tree in which the key is examined level by level; the branching factor at each level is the number of values that the symbol examined at that level can take.
EXERCISES

1. What is the problem solved by extendible hashing and its related approaches?

2. The tries that are the basis for the extendible hashing procedure have a radix of 2. How does performance change if we use a larger radix?

3. Consider the procedure for redistributing keys in bk_split. How are keys moved between the two buckets?

7. Deletion can produce a situation in which there are empty buckets that can be eliminated only by a series of recursive calls to
bk_try_combine. Describe two situations in the hash structure in which this can happen.

8. In the deletion procedure described in this chapter, certain conditions must be met before a bucket can be combined with its buddy, and empty buckets can therefore persist in the structure. Why is an empty bucket wasteful, and why is a bucket holding only a few records not much less wasteful than a nearly empty one? How could we modify the procedures to minimize the number of empty and nearly empty buckets?
12. Linear hashing makes use of overflow records. Assuming an uncontrolled splitting implementation, where we split and extend the address space as soon as we have an overflow, what is the effect of using different bucket sizes for the overflow buckets? For example, consider overflow buckets that are as large as the original buckets. Now consider overflow buckets that can only hold one record. How does this choice affect performance in terms of space utilization and access time?
13. In section 11.6.3 we said that the average number of accesses can be bounded even for unsuccessful searches. Why is the number of accesses for unsuccessful searches the more demanding measure?
14. Because linear hashing splits one bucket at a time, in order, until it has reached the end of the sequence, the overflow chains for the last buckets in the sequence can become much longer than those for the earlier buckets. Read about Larson's approach to solving this problem through the use of "partial expansions," originally described in Larson (1980) and subsequently summarized in Enbody and Du (1988). Write a pseudocode description of linear hashing with partial expansions, paying particular attention to how the addressing of buckets is handled.

15. How is storage utilization affected by using smaller overflow buckets rather than larger ones?
Programming Exercises

16. Write a version of the make_address function that prints out the input key, the hash value, and the extracted, reversed address. Build a driver that allows you to enter keys interactively for this function and see the results.

17. Implement a simplified version of the extendible hashing pseudocode given in this chapter. Keep the entire structure in RAM rather than on disk: hold the directory and all the buckets in RAM. Once you have the RAM-based version working, use it as the basis for a disk-based implementation.
FURTHER READINGS

For information about hashing for dynamic files that goes beyond what is presented here, you must turn to journal articles. The papers that describe the best-known approaches, including Litwin (1980) on linear hashing, are especially recommended.

Michel Scholl's 1981 paper titled "New File Organizations Based on Dynamic Hashing" provides another readable introduction to dynamic hashing. It also investigates implementations that defer splitting by allowing buckets to overflow.

Papers analyzing the performance of dynamic or extendible hashing often derive results that apply to either of the two methods. Flajolet (1983) presents a careful analysis of directory size. Mendelson arrives at similar results and goes on to discuss the costs of retrieval and deletion as different design parameters are changed. Veklerov (1985) analyzes the performance of dynamic hashing when splitting is deferred by allowing records to overflow into a buddy bucket.

Larson has published a number of papers building on the ideas associated with linear hashing. His 1980 paper titled "Linear Hashing with Partial Expansions" introduces an approach to linear hashing that can avoid the uneven distribution of the lengths of overflow chains across the cells in the address space.
Appendix A
File Structures on CD-ROM

OBJECTIVES

Introduce the commercially important characteristics of CD-ROM storage.
Examine the physical organization of CD-ROM storage and the effect it has on CD-ROM file structure design.
OUTLINE

A.1 Using This Appendix
A.2 Introduction to CD-ROM
    A.2.1 A Short History of CD-ROM
    A.2.2 CD-ROM as a File Structure Problem
A.3 Physical Organization of CD-ROM
A.4 CD-ROM Strengths and Weaknesses
    A.4.1 Seek Performance
    A.4.2 Data Transfer Rate
    A.4.3 Storage Capacity
    A.4.4 Read-Only Access
    A.4.5 Asymmetric Writing and Reading
A.5 Tree Structures on CD-ROM
    A.5.2 Block Size
    A.5.3 Special Loading Procedures and Other Considerations
    A.5.4 Virtual Trees and Buffering Blocks
    A.5.5 Trees as Secondary Indexes on CD-ROM
A.6 Hashed Files on CD-ROM
    A.6.4 Advantages of CD-ROM's Read-Only Status
A.7 The CD-ROM File System
A.1 Using This Appendix

This appendix has two purposes. The first is to tell you about CD-ROM, a commercially important storage medium, and its performance characteristics. The second is to use the problem of file structure design for CD-ROM as a way to review many of the important design issues introduced in the text, looking at how this medium's performance affects the application of the principles we have developed. We introduce exercises and questions throughout this discussion, rather than holding them to the end. We encourage you to stop at these blocks of questions, think carefully about the answers, and then compare results with the discussion that follows.
A.2 Introduction to CD-ROM

CD-ROM is an acronym for Compact Disc, Read-Only Memory. It is a CD audio disc that contains digital data rather than digital sound. A single disc can hold more than 600 megabytes of data, and that is a lot of data. CD-ROM is read-only: as with a CD audio disc, you cannot record on it. It is, in many respects, a publishing medium, a way of distributing information, much like a book or an audio recording.
A.2.1 A Short History of CD-ROM

CD-ROM is the offspring of the videodisc, a technology developed for storing movies on a disc.* A number of competing methods of recording the signal in a disc were developed, some of them mechanical, relying, much like a vinyl LP record, on a needle. The industry spent a great deal of money pursuing these approaches. After a number of costly

*We usually spell disk with a k; when the disk is a compact disc, the convention is to spell it with a c.
standards battles, the competing developers had not only spent enormous
sums of money, but had also lost important market opportunities. These
hard lessons were put to use in the subsequent development of CD audio
and
CD-ROM.
Videodiscs emerged in both a CLV format, which maximized storage capacity, and a CAV format, which provided fast seek performance. Because CAV discs can find and display individual frames quickly, a number of organizations, including the MIT Media Lab, used them to build interactive video applications that inform and entertain.
In the early 1980s, a number of firms began looking at the possibility of storing digital data on videodiscs; a recorded signal is, after all, just information. Different firms developed different ways of encoding data in the disc. The LaserVision disc has characteristics that make it a technically more desirable medium than the CD-ROM; in particular, one can build drives that seek quickly and deliver information from the disc at a high rate of speed. But, reminiscent of the earlier disputes over the physical format of the videodisc, each of these pioneers in the use of LaserVision discs as computer peripherals had incompatible encoding schemes and error correction techniques. There was no standard format, and none of the firms was large enough to impose their format over the others through sheer marketing muscle. Potential buyers were frightened by the lack of a standard; consequently, the market never grew.

During this same period the Philips and Sony companies began work on a way to store music on optical discs. Rather than storing the music in the kind of analog form used on videodiscs, they developed a digital data format. Philips and Sony had learned hard lessons from the expensive videodisc standards battles; this time they established a single standard from the start.
CD audio players reached the market in 1985. Not surprisingly, the first firms that saw CD-ROM's potential were those that were already delivering digital data on LaserVision discs. They also recognized, however, that CD-ROM promised to provide what had been missing in the past: a standard physical format. A disc manufactured by any firm was guaranteed to be readable in a drive made by any other firm. Anyone with a CD-ROM drive was a potential customer, and the market for such a medium, to be used in publishing of all kinds, was large enough to attract serious work.
Standardization had to move beyond the physical format of the disc, however. Computer applications do not work in terms of raw sectors; they need the sectors organized into files. Having sectors without files is like having letters without words: the relatively small pieces mean little until they are grouped together into a system to be read. In a rare display of cooperation, all of the larger firms in the young industry worked out a standard file system, built on top of the basic CD-ROM sector format, which has now emerged as an official international standard for organizing files on CD-ROM.

The CD-ROM industry is still young, though it has begun to show signs of maturity: the main concern has shifted from the medium itself to building CD-ROM applications, with CD-ROM treated as a given.
A.2.2 CD-ROM as a File Structure Problem

CD-ROM presents interesting file structure problems because it is a medium with great strengths and great weaknesses. The strengths of CD-ROM include its high storage capacity and the fact that it is inexpensive to reproduce; the key weakness is that seek performance on a CD-ROM is very slow, often taking from a half second to a second per seek. In the introduction to this textbook we compared RAM access and magnetic disk access and showed that, if RAM access is likened to looking something up in the index of a book, a disk access is like fetching the book from far away. For a CD-ROM, the analogy stretches the disc access to over two and a half years! This kind of performance, or lack of it, makes intelligent file structure design a critical concern for CD-ROM applications. CD-ROM provides an excellent test of our ability to integrate and adapt the principles we have developed in the preceding chapters of this book.
A.3 Physical Organization of CD-ROM

CD-ROM is the child of CD audio, and its audio parentage shaped both the medium and the market. CD audio was designed to deliver music in one long, continuous stream, not to provide fast, random access to data. That design objective biases the medium toward high storage capacity and steady transfer rates, and away from the fast seeking that our file structure work usually requires. It is possible to make a CD drive that seeks faster, but the mass market for audio players did not demand it. If an application requires better random-access performance than CD-ROM offers, the file designer must supply it through the organization of the data on the disc rather than expecting it from the medium itself.
547
this signal is
not simply
matter of calling
by the
transitions
a pit a
from
and
pit to
land
a 0. Instead,
Every time the light intensity changes, we get a 1. The zeroes arc
represented by the amount of time between transitions; the longer between
and 0s has
of
Is
and
()s
that
form the bytes of the original data. This translation scheme, which is done
through a lookup table, turns the original 8 bits of data into 14 expanded
bits that can be represented in the pits and lands on the disc; the reading
process reverses this translation. Figure A.l shows a portion of the lookup
table values. Readers
players
may have
who
EFM
at
encoding.
EFM
CD
stands for
important to
we
EFM-
encoded data by
data is dependent on moving the pits and lands under the optical pickup at
a precise and constant speed. As we will see, this affects the CD-ROM
drive's ability to seek quickly.
A.3.2 CLV Instead of CAV. Data on a CD-ROM is stored in a single spiral track that winds for almost three miles from the center to the outer edge of the disc. The spiral is part of the CD-ROM's heritage from CD audio, and it is a good fit for audio: since audio requires a lot of storage space, we want to pack the data on the disc as tightly as possible, and since we "play" audio data, often from start to finish, one continuous track serves well.

FIGURE A.1 A portion of the EFM encoding table.
FIGURE A.2 Constant angular velocity (CAV) and constant linear velocity (CLV) recording.

With CAV recording, the disk spins at a constant rate, so every track holds the same number of sectors -- eight in the figure -- no matter where the track is on the disc. Furthermore, a timing mark placed on the disk makes it easy to find the start of a sector, no matter where we are when we start reading.

The CLV format is responsible, in large part, for the poor seeking performance of CD-ROM drives. The CAV format provides definite track boundaries and a timing mark to find the start of a sector. The CLV format,
by contrast, provides no such arrangement. To jump to a specific sector, we need to know where we are; yet to read anything at all, we need to be moving the disc at the correct speed for that position, and we cannot know how to change the speed until we know where we are. How do we get out of this loop? In practice, the answer often involves making guesses, finding the sector address actually reached, and then refining the guess by trial and error. All of this slows down seek performance.

Why, then, does CD-ROM use CLV? The arrangement contributes to the medium's storage capacity: with a CAV arrangement, the disc would hold only a little better than half its present capacity.
A.3.3 Addressing

The use of the CLV format means that the familiar cylinder, track, and sector scheme for identifying a sector address will not work on a CD-ROM. Instead, we use a sector-addressing scheme that is related to the CD-ROM's way of identifying positions in audio playing time: an address is expressed as a minute, a second, and a sector number, with 75 sectors in each second of playing time. A CD disc, whether used for audio or CD-ROM, contains at least one hour of playing time. That means that the disc is capable of holding at least 60 x 60 x 75 = 270,000 sectors.
A.3.4 Structure of a Sector

It is interesting to see the way that the CD sector format grew out of the requirements of storing audio. When we want to store sound, we convert the continuous sound wave into discrete, digital samples, measuring the amplitude of the wave at a fixed sampling frequency. Figure A.3 shows a wave and the samples taken from it.

The question of how much storage space sound turns into depends on two other things besides the sampling frequency: the resolution of each amplitude sample and the number of channels. CD audio uses 16-bit samples, which means that each amplitude sample distinguishes 65,536 different gradations, and a sampling rate of 44,100 samples per second. At 2 bytes per sample, we need to store 88,200 bytes per second. Since we want to store stereo sound, we need double this, storing 176,400 bytes per second. You can see why storing an hour of Roy Orbison takes so much space.

FIGURE A.3 Digital sampling of a sound wave: the wave is reconstructed from amplitude samples (in the range -32767 to 32767) taken at the sampling frequency.
FIGURE A.4 The effect of sampling at less than twice the frequency of the wave: the wave reconstructed from the sample data does not match the actual wave.
If you divide the 176,400 bytes per second of audio by the 75 sectors in a second, you find that a "raw" sector holds 2,352 bytes. CD-ROM divides up this raw sector into 2,048 bytes of user data storage, along with addressing information, error detection, and error correction information. The additional error correction is necessary because, although the audio error correction built into every CD is good, it is not good enough for computer data: error rates that would be inaudible in music would result in an average of many incorrect bytes on every disc. The resulting CD-ROM sector layout is:

    12 bytes      synch
    4 bytes       sector ID
    2,048 bytes   user data
    4 bytes       error detection
    8 bytes       null
    276 bytes     error correction
A.4 CD-ROM Strengths and Weaknesses

As we say throughout this book, good file design is responsive to the nature of the medium, making use of strengths and minimizing weaknesses. We begin, then, by cataloging the strengths and weaknesses of CD-ROM.
A.4.1 Seek Performance. The chief weakness of CD-ROM is its random-access performance. On a typical magnetic disk, the average time for a random access is about 30 msec. On a CD-ROM, the same operation takes from a half second to a second or more.

A.4.2 Data Transfer Rate. A CD-ROM drive reads 75 sectors per second -- about 150 kilobytes of user data per second -- a rate fixed by the CD audio standard; the drive can't change it without abandoning that heritage. It is a modest rate: faster than the transfer rate for floppy disks, and an order of magnitude slower than the rate for good Winchester disks.
The modest transfer rate makes itself felt when we must move large files, such as those associated with images. Relative to the seek performance, though, it is a strength: when we can read sequentially, the drive delivers data steadily, so good designs arrange data to avoid as much seeking as possible.
A.4.3 Storage Capacity

A CD-ROM holds more than 600 megabytes of data, which is very big indeed when it is used to store text. To get a feel for how big: to move even a fraction of a disc's contents over a 2,400-baud modem, it will take about three days of constant data transmission, assuming errorless transmission conditions. Many typical text databases and document collections published on CD-ROM use only a fraction of the disc's capacity. The design benefit arising from such large capacity is that it enables us to build indexes and other support structures that help compensate for the medium's slow seeking.
A.4.4 Read-Only Access

From a design standpoint, the fact that CD-ROM is a publishing medium, a storage device that cannot be changed after manufacture, provides significant advantages. We never have to worry about updating the file organization in place. This not only simplifies some of the file structures but also means that it is worthwhile to invest heavily in organization before the disc is made: we know that the organization will not be disturbed by later additions or deletions.
A.4.5 Asymmetric Writing and Reading

For most media, files are written and read using the same kinds of computer systems. Often, reading and writing are both interactive and are therefore constrained by the need to provide quick response to the user. CD-ROM is different: the disc is written once, in the course of preparing it for manufacture, and read many times afterward, often on small machines. We are therefore in a position to bring substantial computing power to the task of organizing the files. We make the investment in the file structures only once; users can enjoy the benefits of this investment every time they use the disc.
A.5 Tree Structures on CD-ROM

A.5.1 Design Exercises

Tree structures are a good way to organize indexes and data on CD-ROM. Chapters 8 and 9 took a close look at B-trees and B+ trees. Before we discuss the effective use of trees on CD-ROM, think through these design questions:

1. What block size would you use for a B-tree or B+ tree on CD-ROM?
2. How far should you go in the direction of using virtual tree structures? How much memory should you set aside for buffering blocks?
3. How could you use special loading procedures to advantage in a B+ tree that provides access to a set of records?
4. How would trees work as secondary indexes on CD-ROM?

Consider the characteristics of CD-ROM described earlier in your reply.
A.5.2 Block Size

Avoiding seeks is the central concern in B-tree and B+ tree design for CD-ROM, just as it is in CD-ROM file structure design generally. Tree structures are attractive precisely because it is possible to provide access to many records in only a few seeks; the remaining question is how big the block read with each seek should be. Since the CD-ROM sector holds 2 Kbytes, a block should certainly not be less than 2 Kbytes, and there is good reason to make it larger, composed of several sectors. The CD-ROM's transfer rate is moderately fast, especially when viewed relative to its very slow seeking. Once we have spent the better part of a second seeking for a sector and reading it, spending about 40 msec more reading an additional 6 Kbytes to fill out an 8-Kbyte block costs almost nothing. If the larger block lets us reach the same records with two- and three-level trees instead of deeper ones, the trade is an excellent one. As Table A.1 shows, on CD-ROM it usually makes sense to use at least an 8-Kbyte block.
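The 40-msec figure is simply the transfer time for the extra data (our arithmetic, using the 150-Kbyte-per-second rate given earlier):

$$\frac{8\ \text{Kbytes} - 2\ \text{Kbytes}}{150\ \text{Kbytes/sec}} = 0.04\ \text{sec} = 40\ \text{msec},$$

a trivial addition to a seek that may have cost half a second or more.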
TABLE A.1 Number of records that can be contained in a B-tree of a given height.

                                      Tree Height
                       One Level    Two Levels    Three Levels
    Block size = 2 K        64         4,224          274,624
    Block size = 4 K       128        16,640        2,146,688
    Block size = 8 K       256        66,048       16,974,592
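The table's entries follow from the fan-out of the tree: a block holding n keys has n + 1 descendants, so a tree of height h reaches n + n(n+1) + ... + n(n+1)^(h-1) records. The short C function below, a sketch of our own (the key counts of 64, 128, and 256 per block are the assumptions behind the table), reproduces the three-level figures:

    #include <stdio.h>

    /* Records reachable in a B-tree of the given height when each
       block holds n keys, so each block has n + 1 descendants. */
    long reachable(long n, int height)
    {
        long total = 0;     /* records indexed so far        */
        long level = n;     /* records indexed at this level */
        int  i;

        for (i = 0; i < height; i++) {
            total += level;
            level *= (n + 1);   /* fan-out of n + 1 per block */
        }
        return total;
    }

    int main(void)
    {
        printf("%ld\n", reachable(64, 3));    /* 274,624:    2-Kbyte blocks */
        printf("%ld\n", reachable(128, 3));   /* 2,146,688:  4-Kbyte blocks */
        printf("%ld\n", reachable(256, 3));   /* 16,974,592: 8-Kbyte blocks */
        return 0;
    }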
555
A. 5.3 Special
appear on the
You
B+
CD-ROM
of sequenced records. As we
+
the content of the index part of a B tree can consist
showed
a set
in Chapter 9,
of nothing more than the shortest separators required to provide access to
lower levels of the tree and, ultimately, to the target records. If these
shortest separators are only a few bytes long, as is frequently the case, it is
often possible to provide access to millions of records with an index that is
only tw o levels deep. An application can keep the root of this index in
RAM, reducing the cost of searching the index part of the tree to a single
seek. With one additional seek we are at the record in the sequence set.
+
Another attractive feature of B trees is that it is easy to build a
two-level index above the sequence set with a separate loading procedure
T
level
manage more
levels
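The search path this arrangement produces can be sketched in a few lines of C. Everything here is hypothetical: the block types and the helpers find_child(), read_index_block(), read_seq_block(), and search_block() are ours, not part of the book's programs. The point is the seek count:

    typedef struct indexblock IndexBlock;   /* hypothetical block types */
    typedef struct seqblock   SeqBlock;

    long        find_child(IndexBlock *blk, const char *key);
    IndexBlock *read_index_block(long addr);   /* one seek plus a read  */
    SeqBlock   *read_seq_block(long addr);     /* one seek plus a read  */
    long        search_block(SeqBlock *blk, const char *key);

    /* At most two seeks: the root stays pinned in RAM. */
    long find_record(IndexBlock *root, const char *key)
    {
        long        addr, seqaddr;
        IndexBlock *ix;
        SeqBlock   *seq;

        addr    = find_child(root, key);     /* no seek: root is in RAM  */
        ix      = read_index_block(addr);    /* seek 1: index level      */
        seqaddr = find_child(ix, key);
        seq     = read_seq_block(seqaddr);   /* seek 2: sequence set     */
        return search_block(seq, key);       /* position of the record   */
    }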
CD-ROM
The
CD-ROM
large
storage
how we
design.
disc
has
relatively
on the disc; a few bytes here or there usually doesn't matter much
when you have 600 Mbytes of capacity. But when we design the index
store data
556
APPENDIX
A: FILE
STRUCTURES ON CD-ROM
structures for
CD-ROM, we
even counting
bits as
is
we pack
not, in
most
information into
cases, that
we
a single
are
sometimes
byte or integer.
The
but because packing the index tightly can often save us from making
file design the cost of seeks adds up very
quickly; the designer needs to get as much information out of every seek as
disc,
CD-ROM
an additional seek. In
possible.
A.5.4 Virtual Trees and Buffering Blocks

Because seeking dominates the cost of using a CD-ROM, intelligent buffering of blocks in RAM can improve performance significantly. Buffering is most useful when we are selective about what we keep in memory: at a minimum we want to keep the tree's root node in RAM, and it can pay to hold other upper-level nodes as well, particularly when the same parts of the tree are used again and again. Note that this is the virtual tree idea we discussed earlier, applied here as a way of cutting the number of seeks that stand between the user and the data.
A.5.5 Trees as Secondary Indexes

Typically, the data on a CD-ROM can be reached by more than one route. CD-ROM applications often provide a primary access route that gives direct access to the documents, so you can page through them in order, along with secondary indexes of the kind described in Chapter 6 that provide other ways into the data. Secondary indexes usually raise the problem of pinned records, but the files on a CD-ROM are read-only: they are never reorganized once they are written to the disc, whether the disc stays in service for years or for a much shorter time, so records can be pinned freely. The key issue from the file designer's standpoint is to recognize that the alternatives exist, and then to be able to quantify the costs and benefits of each.
A.6 Hashed Files on CD-ROM

A.6.1 Design Exercises

Hashing, with its promise of single-seek retrieval, is an excellent way to organize indexes on CD-ROM. We begin, once again, with design exercises. The variables we can manipulate are:

    Bucket size;
    Packing density for the hashed index; and
    The hash function itself.
The following questions will help you think through how to apply these variables to build efficient hashed indexes for CD-ROM applications.

1. What bucket size makes sense, given the CD-ROM's sector size and seek performance?
2. How loosely can you afford to pack the hashed index on a CD-ROM?
3. Since a CD-ROM is read-only, you have the complete list of the keys in hand before the disc is made. How can this assist in the design of the hash function?
A.6.2 Bucket Size

In CD-ROM applications, you will want to use a hashed index with buckets, since buckets cut down on the overflow chains that cause additional seeking. A bucket should consist of one or more full sectors; the drive reads whole sectors in any case, so a bucket of ten records costs no more to retrieve than a small one.

A.6.3 Packing Density: How the Size of the CD-ROM Helps

Packing the file loosely is another way to keep overflow down. A good rule of thumb is that, with a bucket size of 10 and a moderate packing density of 60%, a hash function that distributes keys randomly brings the average search length down to about 1.01 seeks, and overflow is virtually eliminated. When there is room on the disc to pack the index this loosely, there is little disadvantage to doing so.
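Where does a figure like 1.01 come from? Chapter 11 predicts overflow from the Poisson distribution, and it is easy to check the rule of thumb with a few lines of C. This is a sketch of our own; poisson() is our helper, not part of the book's toolset:

    #include <stdio.h>
    #include <math.h>

    /* Poisson probability of exactly x keys hashing to one bucket
       when the expected number of keys per bucket is a. */
    double poisson(double a, int x)
    {
        double p = exp(-a);
        int i;
        for (i = 1; i <= x; i++)
            p *= a / i;
        return p;
    }

    int main(void)
    {
        int    b = 10;             /* bucket size (records per bucket) */
        double density = 0.60;     /* packing density of the index    */
        double a = b * density;    /* expected keys per bucket        */
        double excess = 0.0;
        int    x;

        /* expected number of records overflowing each bucket */
        for (x = b + 1; x <= 50; x++)
            excess += (x - b) * poisson(a, x);

        printf("fraction of records overflowing: %.4f\n", excess / a);
        return 0;
    }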
A.6.4 Advantages of CD-ROM's Read-Only Access

What happens when we cannot afford a 60% packing density, when the disc is, say, 90% full? The situation is fairly common: large collections of digitized images or text can consume most of the disc, and when the file storing the data must compete with the index for space, it is harder to sell a design that gives away so much of the index's space. Here the read-only environment comes to the rescue. When we do the overflow calculations, we assume a hash function that distributes the keys randomly. If, instead, we could find a hash function that distributed the keys perfectly, we could achieve 100% packing density and no overflow.
In a read-write environment, where new keys keep arriving after the hash function is chosen, we have to settle for a function that we expect to produce a reasonably random distribution. On CD-ROM we can do better: before the disc is made, we know the complete set of keys we have to hash, so we can select a hash function fitted to that particular set. If our performance and space constraints require it, we can develop a hash function that produces no overflow even at very high packing densities. We identify the selected hash function on the disc, along with the data, so the retrieval software knows how to locate the keys. This relatively expensive and time-consuming function-fitting effort is worthwhile because of the asymmetric nature of writing and reading CD-ROMs; the one-time effort spent in making the disc is paid back many times as the disc is distributed to many users.
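A minimal sketch of such a function-fitting loop appears below. The multiplicative hash, the seed, and the constants are our assumptions; the book does not prescribe a particular function. Only the strategy of trying candidates against the full key set until one produces no overflow comes from the text:

    #include <string.h>

    #define N           1000      /* buckets on the disc   */
    #define BUCKET_SIZE 10        /* records per bucket    */

    static unsigned long hash(const char *key, unsigned long seed)
    {
        unsigned long h = seed;
        while (*key)
            h = h * 31 + (unsigned char)*key++;
        return h % N;
    }

    /* Returns the first seed that scatters every key with no bucket
       overflowing, or 0 if none of the first 'tries' seeds works. */
    unsigned long fit_function(char **keys, int nkeys, int tries)
    {
        static int count[N];
        unsigned long seed;
        int i, ok;

        for (seed = 1; seed <= (unsigned long)tries; seed++) {
            memset(count, 0, sizeof count);
            ok = 1;
            for (i = 0; i < nkeys && ok; i++)
                if (++count[hash(keys[i], seed)] > BUCKET_SIZE)
                    ok = 0;            /* overflow: reject this seed    */
            if (ok)
                return seed;           /* record this seed on the disc  */
        }
        return 0;
    }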
A.7 The CD-ROM File System

A.7.1 The Problem

When the companies developing the early CD-ROM applications came together to begin work on a common file system, they agreed that it had to:

    Support hierarchical directory structures;
    Provide access to thousands of files with a minimum of seeks; and
    Support the use of generic file names, as in "file*.c", during directory access.
The usual way to provide a directory structure is to treat directories as nothing more than a special kind of file. To find a file, you work down the path, opening and reading one directory file after another. If, using UNIX path notation, you want to open a file such as /cdrom/mydir/myfile, you first read the root directory (/) to find cdrom, then seek to and read the directory file for cdrom to find mydir, and so on; a long path means many seeks. From the standpoint of a CD-ROM developer, who knows that before a user sees the first byte of the file each of those directory reads will cost a seek of up to a second, such an approach produces a very unresponsive file system.
A.7.2 Design Exercise

Before looking at the design that was adopted for the initial standard CD-ROM file system, consider the directory systems that were commercially available on CD-ROM at the time. One kept the entire directory structure in a single file, using a left child, right sibling tree to express the hierarchy, as shown in Fig. A.7. Another built a hashed index of the full pathnames of all the files, as shown in Fig. A.8.

Considering what you know about CD-ROM (slow seeking, read-only, and so on), think about these alternative file systems and try to answer the following questions. Keep in mind the design goals and constraints that were facing the committee (hierarchical structure, fast access to thousands of files, use of generic file names).

1. How many seeks does it take to find a file in each system?
2. How well does each system support generic file names and directory listings?
FIGURE A.7 Left child, right sibling tree to express directory structure. The example hierarchy has a ROOT directory containing REPORTS, LETTERS, and the file CALLS.LOG; REPORTS contains SCHOOL (S1.RPT, S2.RPT) and WORK (W1.RPT); LETTERS contains PERSONAL (P1.LTR, P2.LTR), WORK (W1.LTR), and the files ABC.LTR and XYZ.LTR.
FIGURE A.8 Hashed index of file pathnames. A hash function maps each full pathname (/REPORTS/SCHOOL/S1.RPT, /REPORTS/SCHOOL/S2.RPT, /LETTERS/PERSONAL/P1.LTR, /LETTERS/PERSONAL/P2.LTR, /LETTERS/ABC.LTR, /LETTERS/XYZ.LTR, /LETTERS/WORK/W1.LTR, /CALLS.LOG) into a table of file locations.
The single-file tree works well as long as the whole directory file fits in RAM: once it is read, moving about the hierarchy and expanding generic names requires no further seeking. It does much worse when the file does not fit in RAM, or when each directory is a separate file. Hashing the path names, on the other hand, provides single-seek access to any file but does a very poor job of supporting generic file and directory names, such as prog*.c, or even a simple command such as ls or dir to list all the files in a given subdirectory. By definition, hashing randomizes the distribution of the keys, scattering the names of the files in a subdirectory across the whole index, so commands such as ls would have to examine every entry to find the few they need. In short, we want the directory information gathered together in one compact structure, and we solve the access problem by building an index for the subdirectories.
FIGURE A.9 Index of the directory structure. Each directory (Root, Reports, Letters, School, Work, Personal, Work) is a record carrying a Parent RRN field that refers back to the relative record number of its parent; the Root record carries a parent RRN of -1.
This is essentially the design that was settled on. Rather than hashing the path names of the files, or even of the subdirectories, it builds a compact index of the directory structure itself. Figure A.9 shows what this index structure looks like; with the index held in RAM, there is no need to work through the directory files at all. The directories are ordered in the index so parents always appear before their children. Each child is associated with an integer that is a backward reference to the relative record number (RRN) of the parent. This allows us to distinguish between the WORK directory under REPORTS and the WORK directory under LETTERS. It also allows us to traverse the directory structure, moving both up and down with a command such as the cd command in DOS or UNIX, without having to access the actual directory files on the CD-ROM. It is a good example of a specialized index structure that makes use of the organization inherent in the data to produce a very compact, highly functional access mechanism.
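A sketch in C shows how small such an index record can be and how a path component is resolved entirely in RAM. The field names and sizes here are our assumptions, not the committee's actual record layout:

    #include <string.h>

    /* One record per directory, ordered so parents precede children. */
    struct dir_entry {
        char name[32];       /* directory name                        */
        int  parent_rrn;     /* RRN of the parent; -1 marks the root  */
        long location;       /* disc address of the directory file    */
    };

    /* Resolve one path component: find the child of 'parent' named
       'name'.  The whole index fits in RAM, so no disc access occurs. */
    int find_child_dir(struct dir_entry *dir_index, int n,
                       int parent, const char *name)
    {
        int i;
        for (i = 0; i < n; i++)
            if (dir_index[i].parent_rrn == parent &&
                strcmp(dir_index[i].name, name) == 0)
                return i;                /* RRN of the subdirectory */
        return -1;                       /* no such directory       */
    }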
Summary
Anyone who wants to build fast, responsive retrieval software for CD-ROM must look to careful file structure design; the disc's slow seeking leaves no other choice, and its large capacity and read-only nature supply the means.
B-tree and B+ tree structures work well on CD-ROM because of their ability to provide access to many records in just a few seeks. Because CD-ROM drives seek so slowly, block size should be generous: a multiple of the 2-Kbyte sector size, and usually several sectors. Because CD-ROM is read-only, trees can be built with special loading procedures that leave every block completely filled. When RAM is available, it makes sense to keep the root, and perhaps other upper levels of the tree, in memory, and to bind the indexes tightly to the target data, pinning the index records to reduce seeking and increase performance.

Hashed indexes are often a good choice on CD-ROM because, like trees, they can deliver a record in very few seeks. The bucket size should be one or more full sectors. Since CD-ROMs are large, there is often enough space on the disc to permit use of packing densities of 60% or less for the hashed index. Use of packing densities of less than 60%, combined with a bucket size of 10 or more records, results in single-seek access for almost all records. But it is not always possible to pack the index this loosely. Higher packing densities can be accommodated without loss of performance if we tailor the hash function to the records in the index, using a function that we know will produce little or no overflow. Since all the keys are in hand before the disc is made, it is often worthwhile to evaluate a number of candidate hash functions and keep the best of several. This is especially true when we need packing densities of 90% or more.

In 1985 the companies trying to build the CD-ROM publishing market faced an interesting file structure problem: they needed a common file system for CD-ROM. At the time, the approaches in use on CD-ROM either treated the directory structure as a series of directory files, so that moving from one subdirectory to another meant seeking to and reading file after file, which could result in a wait of several seconds, or hashed the pathnames into a single file, giving up generic names and directory listings. The design that emerged instead builds a compact, single-file index, or map, of the directory structure. It illustrates, as do the tree and hashing examples, how the characteristics of the medium drive good file structure design.
Appendix B
ASCII Table
[A table of the 128 ASCII character codes, giving the decimal, octal, and hexadecimal value for each control character (nul, soh, stx, etx, ..., esc, fs, gs, rs, us, del) and for each printable character.]
Appendix C
String Functions in Pascal: tools.prc

The functions and procedures in this appendix make up the string-handling tools used by the Pascal programs in the text. Strings are stored in the type

    TYPE
        strng = packed array [0..MAX_REC_LGTH] of char;

where position 0 of the strng holds a character representative of the length. Note that the Pascal functions CHR() and ORD() are used to convert integers to characters and vice versa.

Functions include:

    len_str(str)              Returns the length of str.
    clear_str(str)            Clears str by setting its length to 0.
    copy_str(str1,str2)       Copies str2 into str1.
    cat_str(str1,str2)        Concatenates str2 to the end of str1. Puts result in str1.
    read_str(str)             Reads a string from the keyboard into str.
    write_str(str)            Writes contents of str to the screen.
    fread_str(fd,str,lgth)    Reads a str with length lgth from file fd.
    fwrite_str(fd,str)        Writes contents of str to file fd.
    trim_str(str)             Trims trailing blanks from str. Returns length of str.
    ucase(str1,str2)          Converts str1 to uppercase, placing the result in str2.
    makekey(last,first,key)   Combines last and first into key in canonical form.
    min(int1,int2)            Returns the minimum of two integers.
    cmp_str(str1,str2)        Compares str1 to str2:
                              If str1 < str2, cmp_str returns a negative number.
                              If str1 = str2, cmp_str returns 0.
                              If str1 > str2, cmp_str returns a positive number.
    { tools.prc : string-handling tools }

    FUNCTION len_str (VAR str: strng): integer;
    BEGIN
        len_str := ORD(str[0])
    END;

    PROCEDURE clear_str (VAR str: strng);
    BEGIN
        str[0] := CHR(0)
    END;

    PROCEDURE copy_str (VAR str1: strng; str2: strng);
    VAR
        i: integer;
    BEGIN
        for i := 0 to len_str(str2) DO
            str1[i] := str2[i]
    END;

    PROCEDURE cat_str (VAR str1: strng; str2: strng);
    VAR
        i: integer;
    BEGIN
        for i := 1 to len_str(str2) DO
            str1[len_str(str1) + i] := str2[i];
        str1[0] := CHR(len_str(str1) + len_str(str2))
    END;

    PROCEDURE read_str (VAR str: strng);
    VAR
        lgth: integer;
    BEGIN
        lgth := 0;
        while not eoln DO
        BEGIN
            lgth := lgth + 1;
            read(str[lgth])
        END;
        readln;
        str[0] := CHR(lgth)
    END;

    PROCEDURE write_str (VAR str: strng);
    VAR
        i: integer;
    BEGIN
        for i := 1 to len_str(str) DO
            write(str[i]);
        writeln
    END;

    PROCEDURE fread_str (VAR fd: text; VAR str: strng; lgth: integer);
    VAR
        i: integer;
    BEGIN
        for i := 1 to lgth DO
            read(fd, str[i]);
        str[0] := CHR(lgth)
    END;

    PROCEDURE fwrite_str (VAR fd: text; VAR str: strng);
    VAR
        i: integer;
    BEGIN
        for i := 1 to len_str(str) DO
            write(fd, str[i])
    END;

    FUNCTION trim_str (VAR str: strng): integer;
    VAR
        lgth: integer;
    BEGIN
        lgth := len_str(str);
        while (lgth > 0) and (str[lgth] = ' ') DO
            lgth := lgth - 1;
        str[0] := CHR(lgth);
        trim_str := lgth
    END;

    PROCEDURE ucase (str1: strng; VAR str2: strng);
    VAR
        i: integer;
    BEGIN
        for i := 1 to len_str(str1) DO
            if (str1[i] >= 'a') and (str1[i] <= 'z') then
                str2[i] := CHR(ORD(str1[i]) - ORD('a') + ORD('A'))
            else
                str2[i] := str1[i];
        str2[0] := str1[0]
    END;

    PROCEDURE makekey (last: strng; first: strng; VAR key: strng);
    { makekey() trims the blanks off the ends of the strngs last and
      first, concatenates last and first together with a space
      separating them, and converts the letters to uppercase }
    VAR
        lenl: integer;
        lenf: integer;
        blank_str: strng;
    BEGIN
        lenl := trim_str(last);
        copy_str(key, last);
        blank_str[0] := CHR(1);
        blank_str[1] := ' ';
        cat_str(key, blank_str);
        lenf := trim_str(first);
        cat_str(key, first);
        ucase(key, key)
    END;

    FUNCTION min (int1, int2: integer): integer;
    BEGIN
        if int1 <= int2 then
            min := int1
        else
            min := int2
    END;

    FUNCTION cmp_str (VAR str1, str2: strng): integer;
    { Returns a negative number if str1 < str2, 0 if str1 = str2,
      and a positive number if str1 > str2 }
    VAR
        i: integer;
        length: integer;
    BEGIN
        if len_str(str1) = len_str(str2) then
        BEGIN
            i := 1;
            while (str1[i] = str2[i]) and (i < len_str(str1)) DO
                i := i + 1;
            cmp_str := ORD(str1[i]) - ORD(str2[i])
        END
        else
        BEGIN
            length := min(len_str(str1), len_str(str2));
            i := 1;
            while (str1[i] = str2[i]) and (i <= length) DO
                i := i + 1;
            if i > length then
                cmp_str := len_str(str1) - len_str(str2)
            else
                cmp_str := ORD(str1[i]) - ORD(str2[i])
        END
    END;
Appendix D
Comparing Disk Drives
Seek time is the time required to move the access arm to the target cylinder and let it settle to a stop. Manufacturers usually quote a minimum figure, such as the time to move one track from a standstill, a maximum figure, and an average figure computed on the assumption that a request is as likely to fall on any one cylinder as it is on any other. Fixed head drives provide one or more read/write heads per track, so there is no need to move the heads at all; access on such drives is very fast.

There is little variation in rotation speed among similar drives. Most floppy disk drives rotate between 300 and 600 rpm. Hard disk drives generally rotate at approximately 3600 rpm, though this will increase as disks decrease in physical size. There is at least one drive that rotates at 5400 rpm, and speeds of 7200 rpm are possible. Floppy disks usually do not spin continuously, so intermittent accessing of floppy drives might involve an extra delay due to startup of a second or more. Strategies such as sector interleaving can reduce the cost of rotational delay under some circumstances.
The volume of data moved to and from disks has grown enormously in recent years, thereby focusing much attention on the data transfer rate from a single drive. Data transfer rate depends on the recording density, on the rotation speed of the drive itself, and on the speed at which the controller can pass data through to or from RAM. Since rotation speeds vary little, the main differences among drives are due to differences in recording density. In recent years there have been tremendous advances in improving recording densities on disks of all types. Differences in recording densities are usually expressed in terms of the number of tracks per surface and the number of bytes per track. If data are organized by sector on a disk, and more than one sector is transferred at a time, the effective data transfer rate depends also on the method of sector interleaving used. The effect of interleaving can be substantial, of course, since logically adjacent sectors are often widely separated physically on the disk.
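Rotation speed also fixes the rotational delay, which averages half a revolution (our arithmetic, for the 3600-rpm figure quoted above):

$$\frac{60\ \text{sec}}{3600\ \text{rev}} = 16.7\ \text{msec per revolution}, \qquad \text{average delay} \approx 8.3\ \text{msec}.$$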
A different approach to increasing the rate at which one can access data is the PTD (parallel transfer disk), which transfers data to and from several surfaces at once. Of course, there are other important differences among drives. The IBM 3380 drive, for example, has many built-in features, including separate actuator arms that allow it to perform two accesses simultaneously. It also has large local buffers and a controller that allow it to optimize many operations that, on simpler drives, have to be monitored by the central computer.
TABLE D.1 Characteristics of representative disk drives, comparing capacity, seek times, rotation speed, and data transfer rates.
Bibliography
AT&T. System V Interface Definition. AT&T, 1986.

Bayer, R., and E. McCreight. "Organization and maintenance of large ordered indexes." Acta Informatica 1, no. 3 (1972): 173-189.

Bayer, R., and K. Unterauer. "Prefix B-trees." ACM Transactions on Database Systems 2, no. 1 (March 1977): 11-26.

Bohl, M. Introduction to IBM Direct Access Storage Devices. Chicago: Science Research Associates, 1981.

Borland. Turbo Toolbox Reference Manual. Scotts Valley, Calif.: Borland International, Inc., 1984.

Cichelli, R.J. "Minimal Perfect Hash Functions Made Simple." Communications of the ACM 23, no. 1 (January 1980): 17-19.

Comer, D. "The Ubiquitous B-Tree." ACM Computing Surveys 11, no. 2 (June 1979): 121-137.

Digital. D024A-TE. Digital Equipment Corporation, 1984.

Enbody, R.J., and H.C. Du. "Dynamic Hashing Schemes." ACM Computing Surveys 20, no. 2 (June 1988): 85-113.

Fagin, R., J. Nievergelt, N. Pippenger, and H.R. Strong. "Extendible hashing: a fast access method for dynamic files." ACM Transactions on Database Systems 4, no. 3 (September 1979): 315-344.

Faloutsos, C. "Access methods for text." ACM Computing Surveys 17, no. 1 (March 1985): 49-74.

Flajolet, P. "On the Performance Evaluation of Extendible Hashing and Trie Searching." Acta Informatica 20 (1983): 345-369.

Flores, I. Peripheral Devices. Englewood Cliffs, N.J.: Prentice-Hall, 1973.

Gonnet, G.H. Handbook of Algorithms and Data Structures. Reading, Mass.: Addison-Wesley, 1984.

Hanson, O. Design of Computer Data Files. Rockville, Md.: Computer Science Press, 1982.

Keehn, D.G., and J.O. Lacy. "VSAM data set design parameters." IBM Systems Journal 13, no. 3 (1974): 186-212.

Kernighan, B., and R. Pike. The UNIX Programming Environment. Englewood Cliffs, N.J.: Prentice-Hall, 1984.

Knuth, D. The Art of Computer Programming. Vol. 1, Fundamental Algorithms. 2d ed. Reading, Mass.: Addison-Wesley, 1973a.

Knuth, D. The Art of Computer Programming. Vol. 3, Searching and Sorting. Reading, Mass.: Addison-Wesley, 1973b.

Lang, S.D., J.R. Driscoll, and J.H. Jou. "Batch insertion for tree structured file organizations."

Larson, P. "Dynamic Hashing." BIT 18 (1978): 184-201.

Larson, P. "Linear Hashing with Overflow-Handling by Linear Probing." ACM Transactions on Database Systems 10, no. 1 (March 1985): 75-89.

Larson, P. "Linear Hashing with Partial Expansions." Proceedings of the 6th Conference on Very Large Databases (Montreal, Canada, Oct. 1-3, 1980). New York: ACM/IEEE: 224-233.

Laub, L. "What is CD-ROM?" In CD ROM: The New Papyrus, S. Lambert and S. Ropiequet, eds. Redmond, Wash.: Microsoft Press, 1986.

Leffler, S., M.K. McKusick, M. Karels, and J. Quarterman. The Design and Implementation of the 4.3BSD UNIX Operating System. Reading, Mass.: Addison-Wesley, 1989.

Levy, M.R. "Modularity and the sequential file update problem." Communications of the ACM 25, no. 12 (December 1982): 566-587.

Litwin, W. "Linear Hashing: A New Tool for File and Table Addressing." Proceedings of the 6th Conference on Very Large Databases (Montreal, Canada, Oct. 1-3, 1980). New York: ACM/IEEE: 212-223.

Litwin, W. "Virtual Hashing: A Dynamically Changing Hashing." Proceedings of the 4th Conference on Very Large Databases (West Berlin, 1978). New York: ACM/IEEE: 517-523.

Loomis, M. Data Management and File Processing. Englewood Cliffs, N.J.: Prentice-Hall, 1983.

Lum, V.Y., P.S.T. Yuen, and M. Dodd. "Key-to-address transform techniques: a fundamental performance study on large existing formatted files." Communications of the ACM 14, no. 4 (April 1971): 228-239.

Madnick, S.E., and J.J. Donovan. Operating Systems. New York: McGraw-Hill, 1974.

McCreight, E. "Pagination of B* trees with variable-length records." Communications of the ACM 20, no. 9 (September 1977): 670-674.

McKusick, M.K., W.M. Joy, S.J. Leffler, and R.S. Fabry. "A fast file system for UNIX." ACM Transactions on Computer Systems 2, no. 3 (August 1984): 181-197.

Mendelson, H. "Analysis of Extendible Hashing." IEEE Transactions on Software Engineering SE-8, no. 6 (November 1982): 611-619.

Microsoft, Inc. Disk Operating System, Version 2.00. IBM Personal Computer documentation, 1983.

Murayama, K., and S.E. Smith. "Analysis of design alternatives for virtual memory indexes." Communications of the ACM 20, no. 4 (April 1977): 245-254.

Nievergelt, J., H. Hinterberger, and K.C. Sevcik. "The grid file: an adaptive, symmetric, multikey file structure." ACM Transactions on Database Systems 9, no. 1 (March 1984): 38-71.

Ouskel, M., and P. Scheuermann. "Multidimensional B-trees: Analysis of dynamic behavior." BIT 21 (1981): 401-418.

Pechura, M.A., and J.D. Schoeffler. "Estimating file access time of floppy disks." Communications of the ACM 26, no. 10 (October 1983): 754-763.

Peterson, J.L., and A. Silberschatz. Operating System Concepts, 2d ed. Reading, Mass.: Addison-Wesley, 1985.

Peterson, W.W. "Addressing for random-access storage." IBM Journal of Research and Development 1, no. 2 (1957): 130-146.

Ritchie, D.M., and K. Thompson. "The UNIX Time-Sharing System." Murray Hill, N.J.: AT&T Bell Laboratories, 1979.

Robinson, J.T. "The K-D-B-tree: a search structure for large multidimensional dynamic indexes." Proceedings of the ACM SIGMOD International Conference on Management of Data, 1981.

Salzberg, B., et al. "FastSort: A Distributed Single-Input Single-Output External Sort." Proceedings of the ACM SIGMOD International Conference on Management of Data, 1990.

Scholl, M. "New file organizations based on dynamic hashing." ACM Transactions on Database Systems 6, no. 1 (March 1981): 194-211.

Snyder, L. "On B-trees re-examined." Communications of the ACM 21, no. 7 (July 1978): 594.

Sorenson, P.G., J.P. Tremblay, and R.F. Deutscher. "Key-to-Address Transformation Techniques." INFOR (Canada) 16, no. 1 (1978): 397-409.

Spector, A., and D. Gifford. "Case study: The space shuttle primary computer system." Communications of the ACM 27, no. 9 (September 1984): 872-900.

Standish, T.A. Data Structure Techniques. Reading, Mass.: Addison-Wesley, 1980.

Sun Microsystems. Networking on the Sun Workstation. Mountain View, Calif.: Sun Microsystems, Inc., 1986.

Sussenguth, E.H. "The use of tree structures for processing files." Communications of the ACM 6, no. 5 (May 1963): 272-279.

Sweet, F. "Keyfield design." Datamation (October 1, 1985): 119-120.

Teory, T.J., and J.P. Fry. Design of Database Structures. Englewood Cliffs, N.J.: Prentice-Hall, 1982.

Tremblay, J.P., and P.G. Sorenson. An Introduction to Data Structures with Applications. New York: McGraw-Hill, 1984.

Ullman, J.D. Principles of Database Systems, 2d ed. Rockville, Md.: Computer Science Press, 1982.

U.C. Berkeley. UNIX Programmer's Manual. Berkeley: University of California, 1986.

VanDoren, J. "Some empirical results on generalized AVL trees." Proceedings of the NSF-CBMS Regional Conference, University of Missouri at Columbia (July 1973): 46-62.

VanDoren, J., and J. Gray. "An algorithm for maintaining dynamic AVL trees." In Information Systems, COINS IV. New York: Plenum Press, 1974: 161-180.

Veklerov, E. "Analysis of Dynamic Hashing with Deferred Splitting." ACM Transactions on Database Systems 10, no. 1 (March 1985): 90-96.

Wagner, R.E. "Indexing design considerations." IBM Systems Journal 12, no. 4 (1973): 351-367.

Wang, P. An Introduction to Berkeley UNIX. Belmont, Calif.: Wadsworth Publishing Co., 1988.

Webster, R.E. "B+ trees." Unpublished Master's thesis, Oklahoma State University, 1980.

Welch, T. "A Technique for High Performance Data Compression." IEEE Computer 17, no. 6 (June 1984): 8-19.

Wells, D.C., E.W. Greisen, and R.H. Harten. "FITS: A Flexible Image Transport System." Astronomy and Astrophysics Supplement Series 44 (1981): 363-370.

Wirth, N. "An Assessment of the Programming Language Pascal." IEEE Transactions on Software Engineering SE-1, no. 2 (June 1975): 192-198.

Yao, A. Chi-Chih. "On random 2-3 trees." Acta Informatica 9 (1978): 159-170.

Zoellick, B. "CD-ROM software development." Byte 11, no. 5 (May 1986): 173-188.

Zoellick, B. "File System Support for CD-ROM." In CD ROM: The New Papyrus, S. Lambert and S. Ropiequet, eds. Redmond, Wash.: Microsoft Press, 1986: 103-128.

Zoellick, B. "Selecting an Approach to Document Retrieval." In CD ROM, Volume 2: Optical Publishing, S. Ropiequet, ed. Redmond, Wash.: Microsoft Press, 1987: 63-82.
Index
[An alphabetical index of the book's topics, authors, and programs, from access modes, addresses, ASCII, avail lists, and average search length through B-trees, B+ trees, buffering, CD-ROM, collisions, cosequential processing, extendible hashing, file systems, FITS, hashing, indexes, keys, keysort, merging, packing density, pinned records, Poisson distribution, records, redistribution, replacement selection, secondary indexes, seeks, separators, sequence sets, simple prefix B+ trees, sorting, tombstones, trees, UNIX, and variable-length records to XDR, with page references into the text.]
Computer Science/File Structures

File Structures, Second Edition
Michael J. Folk
Bill Zoellick

This book develops a framework for approaching the design of systems to store and retrieve information on magnetic disks and other mass storage devices. It provides a fundamental collection of tools for finding appropriate solutions to file structure problems.

Highlights

    Discusses a "toolkit" of approaches to retrieve file records: simple indexes, paged indexes (e.g. B-trees), variations on paged indexes (e.g. B+ trees, B* trees), and hashing.
    Includes new example programs in both ANSI C and Pascal.
    Treats the file management issues inherent in using file and data structures.

Bill Zoellick is a scientist at the Avalanche Development Company. He is a frequent lecturer and writer on CD-ROM and optical publishing issues. Michael J. Folk was a Professor of Computer Science and is now at the National Center for Supercomputing Applications at the University of Illinois, where for the last three years he has been responsible for developing general purpose scientific data file formats.

ISBN 0-201-55713-4