Files Chapter Seven
7.1 Chapter Overview
In this chapter you will learn about the file, a persistent data type. In most assembly languages, file I/O is a major headache. Not so in HLA with the HLA Standard Library. File I/O is no more difficult than writing data to the standard output device or reading data from the standard input device. In this chapter you will learn how to create and manipulate sequential and random-access files.
7.2 File Organization
A file is a collection of data that the system maintains in persistent storage. Persistent means that the storage is non-volatile; that is, the system maintains the data even after the program terminates, indeed, even if you shut off system power. For this reason, plus the fact that different programs can access the data in a file, applications typically use files to maintain data across executions of the application and to share data with other applications.
The operating system typically saves file data on a disk drive or some other form of secondary storage device. As you may recall from the chapter on the memory hierarchy (see "The Memory Hierarchy" on page 303), secondary storage (disk drives) is much slower than main memory. Therefore, you generally do not store data that a program commonly accesses in files during program execution unless that data is far too large to fit into main memory (e.g., a large database).
Under Linux and Windows, a standard file is simply a stream of bytes that the operating system does not interpret in any way. It is the responsibility of the application to interpret this information, much the same as it is your application's responsibility to interpret data in memory. The stream of bytes in a file could be a sequence of ASCII characters (e.g., a text file) or they could be pixel values that form a 24-bit color photograph.
Files generally take one of two different forms: sequential files or random access files. Sequential files are great for data you read or write all at once; random access files work best for data you read and write in pieces (or rewrite, as the case may be). For example, a typical text file (like an HLA source file) is usually a sequential file. Usually your text editor will read or write the entire file at once. Similarly, the HLA compiler will read the data from the file in a sequential fashion without skipping around in the file. A database file, on the other hand, requires random access since the application can read data from anywhere in the file in response to a query.
Volume Three
ment of this list in the file as you would use to locate an element of an array in memory; the only difference is that a file doesn't have a base address in memory, you simply compute the zero-based offset of the record in the file. This calculation is quite simple, and using some file I/O functions you will learn about a little later, you can quickly locate and manipulate any record in a random access file.
Sequential files also consist of a list of records. However, these records do not all have to be the same length.¹ If a sequential file does not use fixed length records then we say that the file uses variable-length records. If a sequential file uses variable-length records, then the file must contain some kind of marker or other mechanism to separate the records in the file. Typical sequential files use one of two mechanisms: a length prefix or some special terminating value. These two schemes should sound quite familiar to those who have read the chapter on strings. Character strings use a similar scheme to determine the bounds of a string in memory.
A text file is the best example of a sequential file that uses variable-length records. Text files use a special marker at the end of each record to delineate the records. In a text file, a record corresponds to a single line of text. Under Windows, the carriage return/line feed character sequence marks the end of each record. Other operating systems may use a different sequence; e.g., Linux uses a single line feed character while the classic Mac OS uses a single carriage return. Since we're working with Windows or Linux here, we'll adopt the carriage return/line feed or single line feed convention.
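To make the two terminator conventions concrete, here is a minimal Python sketch (not HLA; the function name `split_records` is hypothetical) that splits the raw bytes of a text file into records. It treats either a CR/LF pair or a lone LF as the end-of-record marker:

```python
def split_records(data):
    """Split raw text-file bytes into records (lines).

    A CR/LF pair (Windows) or a lone LF (Linux) ends a record.
    A final record with no terminator is kept as-is.
    """
    records = []
    start = 0
    while start < len(data):
        end = data.find(b"\n", start)   # next line feed, or -1
        if end == -1:                    # last record lacks a terminator
            records.append(data[start:])
            break
        record = data[start:end]
        if record.endswith(b"\r"):       # drop the CR of a CR/LF pair
            record = record[:-1]
        records.append(record)
        start = end + 1                  # continue after the line feed
    return records


windows_text = b"Hyde, Randall\r\n45\r\n"
linux_text = b"Hyde, Randall\n45\n"
print(split_records(windows_text))  # [b'Hyde, Randall', b'45']
print(split_records(linux_text))    # [b'Hyde, Randall', b'45']
```

The function must examine every byte to find the record boundaries. It cannot reach a record without first locating all the records that precede it.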
Accessing records in a file containing variable-length records is problematic. Unless you have an array of offsets to each record in a variable-length file, the only practical way to locate record n in a file is to read the first n-1 records. This is why variable-length files are sequential-access: you have to read the file sequentially from the start in order to locate a specific record in the file. This will be much slower than accessing the file in a random access fashion. Generally, you would not use a variable-length record organization for files you need to access in a random fashion.
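Here is a minimal Python sketch of this linear search. `read_record` is a hypothetical helper, and an in-memory `io.BytesIO` object stands in for a disk file. Finding record n takes time proportional to n, because every earlier record must be read first:

```python
import io


def read_record(f, n):
    """Return record n (1-based) from a text file opened in binary mode.

    Records have no fixed size, so the only way to reach record n is
    to read, and discard, records 1 through n-1.
    """
    f.seek(0)                            # always start from the beginning
    for _ in range(n - 1):
        if not f.readline():             # hit end of file too soon
            raise IndexError(f"fewer than {n} records")
    line = f.readline()
    if not line:
        raise IndexError(f"fewer than {n} records")
    return line.rstrip(b"\r\n")          # strip LF or CR/LF


sample = io.BytesIO(b"first\nsecond\r\nthird\n")
print(read_record(sample, 3))  # b'third'
```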
At first blush it would seem that fixed-length random access files offer all the advantages here. After all, you can access records in a file with fixed-length records much more rapidly than files using the variable-length record organization. However, there is a cost to this: your fixed-length records have to be large enough to hold the largest possible data object you want to store in a record. To store a sequence of lines in a text file, for example, your record sizes would have to be large enough to hold the longest possible input line. This could be quite large (for example, HLA allows lines up to 256 characters). Each record in the file will consume this many bytes even if the record uses substantially less data. For example, an empty line only requires one or two bytes (for the line feed [Linux] or carriage return/line feed [Windows] sequence). If your record size is 256 bytes, then you're wasting 255 or 254 bytes for that blank line in your file. If the average line length is around 60 characters, then each line wastes an average of about 200 characters. This problem, known as internal fragmentation, can waste a tremendous amount of space on your disk, especially as your files get larger or you create lots of files. File organizations that use variable-length records generally don't suffer from this problem.
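The arithmetic is easy to check. This short Python sketch (the helper name `wasted_bytes` is hypothetical) computes the space wasted when variable-length lines are stored in 256-byte fixed-length records:

```python
RECORD_SIZE = 256  # fixed record size used in the example above


def wasted_bytes(line_lengths, eol_len):
    """Total bytes lost to internal fragmentation when each line plus
    its end-of-line marker is stored in a RECORD_SIZE-byte record."""
    return sum(RECORD_SIZE - (n + eol_len) for n in line_lengths)


print(wasted_bytes([0], eol_len=1))   # 255: blank line with an LF
print(wasted_bytes([0], eol_len=2))   # 254: blank line with CR/LF
print(wasted_bytes([60], eol_len=2))  # 194: a 60-character line
```

A 60-character line wastes 194 or 195 bytes, depending on the end-of-line convention. That is where the "about 200" figure comes from.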
1. There is nothing preventing a sequential file from using fixed length records. However, they don't require fixed length records.
1
87
0
As a text file, this file consumes at least 34 bytes (assuming a two-byte end of line marker on each line). However, were we to store the data in a fixed-record-length binary file, with two bytes per integer value, this file would only consume 14 bytes, less than half the space. Furthermore, since the file now uses fixed-length records (two bytes per record) we can efficiently access it in a random fashion. Finally, there is one additional, though hidden, efficiency aspect to the binary format: when a program reads and writes binary data it doesn't have to convert between the binary and string formats. This conversion is an expensive process (with respect to computer time). If a human being isn't going to read this file with a separate program (like a text editor) then converting to and from text format on every I/O operation is a wasted effort.
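The size comparison can be reproduced with a small Python sketch. The seven values below are hypothetical sample data, chosen so that the text form comes to 34 bytes like the example above. The binary form uses Python's `struct` module to store each value as a little-endian 16-bit integer:

```python
import struct

values = [1000, 2000, 300, 87, 12345, 1, 0]   # hypothetical sample values

# Text form: decimal digits plus a two-byte CR/LF marker per value.
text_form = b"".join(str(v).encode("ascii") + b"\r\n" for v in values)

# Binary form: each value as a little-endian 16-bit signed integer,
# i.e., a fixed-length two-byte record.
binary_form = b"".join(struct.pack("<h", v) for v in values)

print(len(text_form), len(binary_form))  # 34 14

# Fixed-length records permit direct access: record i starts at 2*i.
(fourth,) = struct.unpack_from("<h", binary_form, 2 * 3)
print(fourth)  # 87
```

The binary read uses no digit-to-integer conversion. `unpack_from` simply reinterprets two bytes.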
Consider the following HLA record type:
type
    person:
        record
            name:string;
            age:int16;
            ssn:char[11];
            salary:real64;
        endrecord;
If we were to write this record as text to a text file, a typical record would take the following form (<nl> indicates the end of line marker, a line feed or carriage return/line feed pair):
Hyde, Randall<nl>
45<nl>
555-55-5555<nl>
123456.78<nl>
Presumably, the next person record in the file would begin with the next line of text in the text file.
The binary version of this file (using a fixed length record, reserving 64 bytes for the name string) would look, schematically, like the following:
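Figure 7.1 (schematic): the fixed-length binary record, with "Hyde, Randall" at the start of a 64-byte name field, followed by the remaining fields.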
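One way to produce such a fixed-length record is sketched below in Python with the standard `struct` module. The exact byte layout is an assumption for illustration: no padding between fields, little-endian integers, and a 64-byte zero-padded name. The figure only shows the general shape.

```python
import struct

# Assumed on-disk layout: fields back to back, no padding, little-endian.
#   name   : 64 bytes, zero-padded  (offset 0)
#   age    : int16                  (offset 64)
#   ssn    : 11 characters          (offset 66)
#   salary : real64 (IEEE double)   (offset 77)
PERSON = struct.Struct("<64sh11sd")


def pack_person(name, age, ssn, salary):
    # The "64s" format pads the name with zero bytes to 64 bytes.
    return PERSON.pack(name.encode("ascii"), age, ssn.encode("ascii"), salary)


def unpack_person(raw):
    name, age, ssn, salary = PERSON.unpack(raw)
    return name.rstrip(b"\0").decode("ascii"), age, ssn.decode("ascii"), salary


rec = pack_person("Hyde, Randall", 45, "555-55-5555", 123456.78)
print(PERSON.size)          # 85: every record has the same size
print(unpack_person(rec))
```

Because every record is 85 bytes under this layout, record i of a file of these records begins at byte 85 * i.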
Don't get the impression that binary files must use fixed length record sizes. We could create a variable-length version of this record by using a zero byte to terminate the string, as follows:
Figure 7.2 (schematic): "Hyde, Randall" followed by a terminating zero byte, then two bytes for the age field, followed by the remaining fields.
In this particular record format the age field starts at offset 14 in the record (since the name field and the end of field marker [the zero byte] consume 14 bytes). If a different name were chosen, then the age field would begin at a different offset in the record. In order to locate the age, ssn, and salary fields of this record, the program would have to scan past the name and find the zero terminating byte. The remaining fields would follow at fixed offsets from the zero terminating byte. As you can see, it's a bit more work to process this variable-length record than the fixed-length record. Once again, this demonstrates the performance difference between random access (fixed-length) and sequential access (variable length, in this case) files.
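Here is a Python sketch of the scan described above. The layout of the fields after the name is an assumption: age as an int16, then 11 ssn characters, then a double, with no padding. The helper names are hypothetical.

```python
import struct

TAIL = struct.Struct("<h11sd")   # age, ssn, salary (assumed layout, 21 bytes)


def pack_var_person(name, age, ssn, salary):
    # Zero-terminated name followed by the fixed-size fields.
    return name.encode("ascii") + b"\0" + TAIL.pack(age, ssn.encode("ascii"), salary)


def age_offset(raw):
    # Scan for the zero byte; the age field starts right after it.
    return raw.index(b"\0") + 1


def unpack_var_person(raw):
    start = age_offset(raw)
    name = raw[:start - 1].decode("ascii")
    age, ssn, salary = TAIL.unpack_from(raw, start)   # fixed offsets from here
    return name, age, ssn.decode("ascii"), salary


rec = pack_var_person("Hyde, Randall", 45, "555-55-5555", 123456.78)
print(age_offset(rec))  # 14
print(unpack_var_person(rec))
```

A shorter name moves every later field. For example, the record for "Ng, Al" would have its age field at offset 7.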
Although binary files are often more compact and more efficient to access, they do have their drawbacks. In particular, only applications that are aware of the binary file's record format can easily access the file. If you're handed an arbitrary binary file and asked to decipher its contents, this could be very difficult. Text files, on the other hand, can be read by just about any text editor or filter program out there. Hence, your data files will be more interchangeable with other programs if you use text files. Furthermore, it is easier to debug the output of your programs if they produce text files since you can load a text file into the same editor you use to edit your source files.
7.3 Sequential Files
Sequential files are perfect for three types of persistent data: ASCII text files, memory dumps, and stream data. Since you're probably familiar with ASCII text files, we'll skip their discussion. The other two methods of writing sequential files deserve more explanation.
A memory dump is a file that consists of data you transfer from data structures in memory directly to a file. Although the term "memory dump" suggests that you sequentially transfer data from consecutive memory locations to the file, this isn't necessarily the case. Memory access can, and often does, occur in a random access fashion. However, once the application constructs a record to write to the file, it writes that record in a sequential fashion (i.e., each record is written in order to the file). A memory dump is what most applications do when you request that they save the program's current data to a file or read data from a file into application memory. When writing, they gather all the important data from memory and write it to the file in a sequential fashion; when reading (loading) data from a file, they read the data from the file in a sequential fashion and store the data into appropriate memory-based data structures. Generally, when loading or saving file data in this manner, the program opens a file, reads/writes data from/to the file, and then it closes the file. Very little processing takes place during the data transfer and the application does not leave the file open for any length of time beyond what is necessary to read or write the file's data.
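The open, transfer, close pattern of a memory dump can be sketched in Python as follows. The record type (a pair of 32-bit integers) and the helper names are hypothetical. `save` writes the in-memory list out in order, and `load` rebuilds it from the file:

```python
import os
import struct
import tempfile

POINT = struct.Struct("<ii")   # hypothetical record: two 32-bit integers


def save(path, points):
    """Memory dump: open the file, write every record in order, close."""
    with open(path, "wb") as f:
        for x, y in points:
            f.write(POINT.pack(x, y))


def load(path):
    """Read the records back, in order, into a list."""
    points = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(POINT.size)
            if not chunk:               # end of file
                break
            points.append(POINT.unpack(chunk))
    return points


data = [(1, 2), (-3, 4), (5, -6)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "points.bin")
    save(path, data)
    size = os.path.getsize(path)
    restored = load(path)

print(size, restored)  # 24 [(1, 2), (-3, 4), (5, -6)]
```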
Stream data on input is like data coming from a keyboard. The program reads the data at various points in the application where it needs new input to continue. Similarly, stream data on output is like a write to the console device. The application writes data to the file at various points in the program after important computations have taken place and the program wishes to report the results of the calculation. Note that when reading data from a sequential file, once the program reads a particular piece of data, that data is no longer available in future reads (unless, of course, the program closes and reopens the file). When writing data to a
sequential file, once data is written, it becomes a permanent part of the output file. When processing this kind of data the program typically opens a file and then continues execution. As program execution continues, the application can read or write data in the file. At some point, typically towards the end of the application's execution, the program closes the file and commits the data to disk.
Although disk drives are generally thought of as random access devices, the truth is that they are only pseudo-random access; in fact, they perform much better when writing data sequentially on the disk surface. Therefore, sequential access files tend to provide the highest performance (for sequential data) since they match the highest performance access mode of the disk drive.
Working with sequential files in HLA is very easy. In fact, you already know most of the functions you need in order to read or write sequential files. All that's left to learn is how to open and close files and perform some simple tests (like "have we reached the end of a file when reading data from the file?").
The file I/O functions are nearly identical to the stdin and stdout functions. Indeed, stdin and stdout are really nothing more than special file I/O functions that read data from the standard input device (a file) or write data to the standard output device (which is also a file). You use the file I/O functions in a manner analogous to stdin and stdout except you use the fileio prefix rather than stdin or stdout. For example, to write a string to an output file, you could use the fileio.puts function almost the same way you use the stdout.puts routine. Similarly, if you wanted to read a string from a file, you would use fileio.gets. The only real difference between these function calls and their stdin and stdout counterparts is that you must supply an extra parameter to tell the function what file to use for the transfer. This is a double word value known as the file handle. You'll see how to initialize this file handle in a moment, but assuming you have a dword variable that holds a file handle value, you can use calls like the following to read and write data to sequential files:
fileio.get( inputHandle, i, j, k ); // Reads i, j, k, from file inputHandle.
fileio.put( outputHandle, "I = ", i, " J = ", j, " K = ", k, nl );
Although this example only demonstrates the use of get and put, be aware that almost all of the stdin and stdout functions are available as fileio functions, as well (in fact, most of the stdin and stdout functions simply call the appropriate fileio function to do the real work).
There is, of course, the issue of this file handle variable. You're probably wondering what a file handle is and how you tell the fileio routines to work with data in a specific file on your disk. Well, the definition of the file handle object is the easiest to explain: it's just a dword variable that the operating system initializes and uses to keep track of your file. To declare a file handle, you'd just create a dword variable, e.g.,
static
    myFileHandle:dword;
You should never explicitly manipulate the value of a file handle variable. The operating system will initialize this variable for you (via some calls you'll see in a moment) and the OS expects you to leave this value alone as long as you're working with the file the OS associates with that handle. If you're curious, both Linux and Windows store small integer values into the handle variable. Internally, the OS uses this value as an index into an array that contains pertinent information about open files. If you mess with the file handle's value, you will confuse the OS greatly the next time you attempt to access the file. Moral of the story: leave this value alone while the file is open.
Before you can read or write a file you must open that file and associate a filename with it. The HLA Standard Library provides a couple of functions that provide this service: fileio.open and fileio.openNew. The fileio.open function opens an existing file for reading, writing, or both. Generally, you open sequential files for reading or writing, but not both (though there are some special cases where you can open a sequential file for reading and writing). The syntax for the call to this function is
fileio.open( filename, access );
The first parameter is a string value that specifies the filename of the file to open. This can be a string constant, a register that contains the address of a string value, or a string variable. The second parameter is a constant that specifies how you want to open the file. You may use any of the three predefined constants for the second parameter:
fileio.r
fileio.w
fileio.rw
fileio.r obviously specifies that you want to open an existing file in order to read the data from that file; likewise, fileio.w says that you want to open an existing file and overwrite the data in that file. The fileio.rw option lets you open a file for both reading and writing.
The fileio.open routine, if successful, returns a file handle in the EAX register. Generally, you will want to save the return value into a double word variable for use by the other HLA fileio routines (e.g., the myFileHandle variable in the earlier example).
If the OS cannot open the file, fileio.open will raise an ex.FileOpenFailure exception. This usually means that it could not find the specified file on the disk.
The fileio.open routine requires that the file exist on the disk or it will raise an exception. If you want to create a new file (one that might not already exist), the fileio.openNew function will do the job for you. This function uses the following syntax:
fileio.openNew( filename );
Note that this call has only a single parameter, a string specifying the filename. When you open a file with fileio.openNew, the file is always opened for writing. If a file by the specified filename already exists, then this function will delete the existing file and the new data will be written over the top of the old file (so be careful!).
Like fileio.open, fileio.openNew returns a file handle in the EAX register if it successfully opens the file. You should save this value in a file handle variable. This function raises the ex.FileOpenFailure exception if it cannot open the file.
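For comparison, the same open, create and close life cycle can be sketched in Python with the low-level `os.open` call. That call returns a small-integer file descriptor much like the handle described above. The correspondence between HLA access modes and POSIX-style flags given in the comments is an assumption for illustration, not a statement about how the HLA Standard Library is implemented:

```python
import os
import tempfile

# Assumed rough correspondence with the HLA calls (POSIX-style flags):
#   fileio.open( name, fileio.r )   ~ os.O_RDONLY
#   fileio.open( name, fileio.w )   ~ os.O_WRONLY   (file must exist)
#   fileio.open( name, fileio.rw )  ~ os.O_RDWR     (file must exist)
#   fileio.openNew( name )          ~ os.O_WRONLY | os.O_CREAT | os.O_TRUNC
BINARY = getattr(os, "O_BINARY", 0)   # Windows-only flag; 0 elsewhere

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.txt")

    # Without O_CREAT, opening a missing file fails, much as
    # fileio.open raises ex.FileOpenFailure.
    try:
        os.open(path, os.O_RDONLY | BINARY)
        open_failed = False
    except FileNotFoundError:
        open_failed = True

    # Create (or truncate) the file for writing, then close it.
    handle = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | BINARY, 0o644)
    print("handle:", handle)   # typically a small integer
    os.write(handle, b"hello\n")
    os.close(handle)

    # Reopen the existing file for reading.
    handle = os.open(path, os.O_RDONLY | BINARY)
    contents = os.read(handle, 100)
    os.close(handle)
```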
Once you open a sequential file with fileio.open or fileio.openNew and you save the file handle value away, you can begin reading data from an input file (fileio.r) or writing data to an output file (fileio.w). To do this, you would use functions like fileio.put as noted above.
When the file I/O is complete, you must close the file to commit the file data to the disk. You should always close all files you open as soon as you are through with them so that the program doesn't consume excess system resources. The syntax for fileio.close is very simple; it takes a single parameter, the file handle value returned by fileio.open or fileio.openNew:
fileio.close( file_handle );
If there is an error closing the file, fileio.close will raise the ex.FileCloseError exception. Note that Linux and Windows automatically close all open files when an application terminates; however, it is very bad programming style to depend on this feature. If the system crashes (or the user turns off the power) before the application terminates, file data may be lost. So you should always close your files as soon as you are done accessing the data in that file.
The last function of interest to us right now is the fileio.eof function. This function returns true (1) or false (0) in the AL register depending on whether the current file pointer is at the end of the file. Generally you would use this function when reading data from an input file to determine if there is more data to read from the file. You would not normally call this function for output files; it always returns false.² Since the fileio routines will raise an exception if the disk is full, there is no need to waste time checking for end of file (EOF) when writing data to a file. The syntax for fileio.eof is
fileio.eof( file_handle );
The following program example demonstrates a complete program that opens and writes a simple text file:
program SimpleFileOutput;
#include( "stdlib.hhf" )

static
    outputHandle:dword;

begin SimpleFileOutput;

    fileio.openNew( "myfile.txt" );     // Create the file; handle comes back in EAX.
    mov( eax, outputHandle );
    for( mov( 0, ebx ); ebx < 10; inc( ebx )) do

        fileio.put( outputHandle, (type uns32 ebx ), nl );

    endfor;
    fileio.close( outputHandle );

end SimpleFileOutput;
Program 7.1
The following sample program reads the data that Program 7.1 produces and writes the data to the standard output device:
program SimpleFileInput;
#include( "stdlib.hhf" )

static
    inputHandle:dword;
    u:uns32;

begin SimpleFileInput;

    fileio.open( "myfile.txt", fileio.r );  // Open the existing file for reading.
    mov( eax, inputHandle );
    for( mov( 0, ebx ); ebx < 10; inc( ebx )) do

        fileio.get( inputHandle, u );
        stdout.put( "ebx=", ebx, " u=", u, nl );

    endfor;
    fileio.close( inputHandle );

end SimpleFileInput;
Program 7.2
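For readers comparing with a high-level language, here is the same write-then-read pair as a Python sketch. The file name matches the HLA programs, the file is placed in a temporary directory, and Python's `with` statement plays the role of the explicit `fileio.close` calls:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.txt")

    # Like Program 7.1: create the file and write 0..9, one per line.
    with open(path, "w") as output_file:
        for ebx in range(10):
            output_file.write(f"{ebx}\n")

    # Like Program 7.2: read exactly ten values back and display them.
    values = []
    with open(path) as input_file:
        for ebx in range(10):
            u = int(input_file.readline())
            values.append(u)
            print(f"ebx={ebx} u={u}")
```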
There are a couple of interesting functions that you can use when working with sequential files. They are the following:
fileio.rewind( fileHandle );
fileio.append( fileHandle );
The fileio.rewind function resets the file pointer (the cursor into the file where the next read or write will take place) back to the beginning of the file. This name is a carry-over from the days of files on tape drives when the system would rewind the tape on the tape drive to move the read/write head back to the beginning of the file.
If you've opened a file for reading, then fileio.rewind lets you begin reading the file from the start (i.e., make a second pass over the data). If you've opened the file for writing, then fileio.rewind will cause future writes to overwrite the data you've previously written; you won't normally use this function with files you've opened only for writing. If you've opened the file for reading and writing (using the fileio.rw option) then you can write the data after you've first opened the file and then rewind the file and read the data you've written. The following is a modification to Program 7.2 that reads the data file twice. This program also demonstrates the use of fileio.eof to test for the end of the file (rather than just counting the records).
program SimpleFileInput2;
#include( "stdlib.hhf" )

static
    inputHandle:dword;
    u:uns32;

begin SimpleFileInput2;

    fileio.open( "myfile.txt", fileio.r );
    mov( eax, inputHandle );
    for( mov( 0, ebx ); ebx < 10; inc( ebx )) do

        fileio.get( inputHandle, u );
        stdout.put( "ebx=", ebx, " u=", u, nl );

    endfor;
    stdout.newln();

    // Rewind the file and reread the data from the beginning.
    // This time, use fileio.eof() to determine when we've
    // reached the end of the file.

    fileio.rewind( inputHandle );
    while( fileio.eof( inputHandle ) = false ) do

        // Read and display the next item from the file:

        fileio.get( inputHandle, u );
        stdout.put( "u=", u, nl );

        // fileio.get leaves the end of line sequence that follows
        // each number in the file. Skip it with fileio.readLn so
        // that, after the last number, the file pointer sits at the
        // end of the file and fileio.eof returns true; otherwise
        // the loop would attempt one more read.

        fileio.readLn( inputHandle );

    endwhile;
    fileio.close( inputHandle );

end SimpleFileInput2;
Program 7.3
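Here is a Python sketch of the same two-pass read. An in-memory `io.StringIO` object stands in for the file, and `f.seek(0)` plays the role of `fileio.rewind`. The helper name `two_passes` is hypothetical:

```python
import io


def two_passes(f):
    """Read ten numbers, rewind, then read again until end of file."""
    first = [int(f.readline()) for _ in range(10)]   # pass 1: count records
    f.seek(0)                                        # like fileio.rewind
    second = []
    while True:
        line = f.readline()      # whole line, including its newline
        if line == "":           # empty string means end of file
            break
        second.append(int(line))
    return first, second


data = io.StringIO("".join(f"{i}\n" for i in range(10)))
first, second = two_passes(data)
print(first == second)  # True
```

`readline` consumes the end-of-line sequence along with the number, so this version needs no counterpart to `fileio.readLn`. It also detects end of file after a read returns nothing, rather than testing beforehand as `fileio.eof` does.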
The fileio.append function moves the file pointer to the end of the file. This function is really only useful for files you've opened for writing (or reading and writing). After executing fileio.append, all data you write to the file will be written after the data that already exists in the file (i.e., you use this call to append data to the end of a file you've opened). The following program demonstrates how to use this function to append data to the file created by Program 7.1:
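program AppendDemo;
#include( "stdlib.hhf" )

static
    fileHandle:dword;
    u:uns32;

begin AppendDemo;

    fileio.open( "myfile.txt", fileio.rw );
    mov( eax, fileHandle );
    fileio.append( eax );               // Move the file pointer to the end of the file.
    for( mov( 10, ecx ); ecx < 20; inc( ecx )) do
        fileio.put( fileHandle, (type uns32 ecx), nl );
    endfor;

    // Rewind and display the whole file, including the new data:
    fileio.rewind( fileHandle );
    while( fileio.eof( fileHandle ) = false ) do
        fileio.get( fileHandle, u );
        stdout.put( "u=", u, nl );
        fileio.readLn( fileHandle );    // Skip the end of line sequence.
    endwhile;
    fileio.close( fileHandle );

end AppendDemo;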
Program 7.4
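In Python, the same effect comes from opening the existing file for reading and writing (`"r+"`) and seeking to the end before writing, as in this sketch. The file contents mirror Programs 7.1 and 7.4:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.txt")
    with open(path, "w") as f:                 # the file Program 7.1 creates
        f.write("".join(f"{i}\n" for i in range(10)))

    with open(path, "r+") as f:                # existing file, read/write
        f.seek(0, os.SEEK_END)                 # like fileio.append
        for i in range(10, 20):
            f.write(f"{i}\n")
        f.seek(0)                              # rewind and read it all back
        numbers = [int(line) for line in f]

print(numbers[:3], numbers[-3:])
```

Python's `"a"` mode is a related choice. It opens for appending at the end of the file, and on some Unix systems it forces every write to the end regardless of the current seek position.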
Another function, similar to fileio.eof, that will prove useful when reading data from a file is the fileio.eoln function. This function returns true if the next character(s) to be read from the file are the end of line sequence (a carriage return, a line feed, or the sequence of these two characters under Windows; just a line feed under Linux). It returns true or false in the EAX register, depending on whether it detects an end of line sequence. The calling sequence for this function is
fileio.eoln( fileHandle );
If fileio.eoln detects an end of line sequence, it will read those characters from the file (so the next read from the file will not read the end of line characters). If fileio.eoln does not detect the end of line sequence, it does not modify the file pointer position. The following sample program demonstrates the use of fileio.eoln in the AppendDemo program, replacing the call to fileio.readLn (since fileio.eoln reads the end of line sequence, there is no need for the call to fileio.readLn):
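program EolnDemo;
#include( "stdlib.hhf" )

static
    fileHandle:dword;
    u:uns32;

begin EolnDemo;

    fileio.open( "myfile.txt", fileio.rw );
    mov( eax, fileHandle );
    fileio.append( eax );
    for( mov( 10, ecx ); ecx < 20; inc( ecx )) do
        fileio.put( fileHandle, (type uns32 ecx), nl );
    endfor;

    // Rewind and display the whole file, including the new data:
    fileio.rewind( fileHandle );
    while( fileio.eof( fileHandle ) = false ) do
        fileio.get( fileHandle, u );
        stdout.put( "u=", u, nl );
        fileio.eoln( fileHandle );      // Consumes the end of line sequence.
    endwhile;
    fileio.close( fileHandle );

end EolnDemo;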
Program 7.5
7.4
The fileHandle parameter is the usual file handle value (a dword variable). The count parameter is an uns32 object that specifies how many bytes to read or write. The buffer parameter must be an array object with at least count bytes. This parameter supplies the address of the first byte in memory where the I/O transfer will take place. These functions return the number of bytes read or written in the EAX register. For fileio.read, if the return value in EAX does not equal count's value, then you've reached the end of the file. For fileio.write, if EAX does not equal count then the disk is full.
Here is a typical call to the fileio.read function that will read a record from a file:
fileio.read( myHandle, (type byte myRecord), @size( myRecord ) );
If the return value in EAX does not equal @size( myRecord ) and it does not equal zero (indicating end of file) then there is something seriously wrong with the file, since the file should contain an integral number of records.
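A record read has three possible outcomes: a full record, end of file, or a partial record that signals a damaged file. They can be sketched in Python as follows. The record layout and the function name are hypothetical. Python's `read` returns the bytes it actually got, and its length plays the role of the count returned in EAX:

```python
import io
import struct

REC = struct.Struct("<hhB")  # hypothetical record: int16, int16, uns8 (5 bytes)


def read_fixed_record(f):
    """Return the next record, None at a clean end of file, or raise
    ValueError if the file ends part-way through a record."""
    raw = f.read(REC.size)
    if len(raw) == REC.size:
        return REC.unpack(raw)
    if len(raw) == 0:
        return None                         # end of file
    raise ValueError(f"truncated record: {len(raw)} of {REC.size} bytes")


good = io.BytesIO(REC.pack(1, 2, 3) + REC.pack(-4, 5, 6))
results = [read_fixed_record(good) for _ in range(3)]
print(results)  # [(1, 2, 3), (-4, 5, 6), None]

bad = io.BytesIO(REC.pack(1, 2, 3)[:4])     # last byte missing
try:
    read_fixed_record(bad)
    truncated_detected = False
except ValueError:
    truncated_detected = True
```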
Writing data to a file with fileio.write uses a similar syntax to fileio.read.
You can use fileio.read and fileio.write to read and write data from/to a sequential file, just as you can use routines like fileio.get and fileio.put to read/write data from/to a random access file. You'd typically use these routines to read and write data from/to a binary sequential file.
The functions we've discussed to this point don't let you randomly access records in a file. If you call fileio.read several times in a row, the program will read those records sequentially from the file. To do true random access I/O we need the ability to jump around in the file. Fortunately, the HLA Standard Library's file module provides several functions you can use to accomplish this.
The fileio.position function returns the current offset into the file in the EAX register. If you call this function immediately before reading or writing a record to a file, then this function will tell you the exact
position of that record. You can use this value to quickly locate that record for a future access. The calling
sequence for this function is
fileio.position( fileHandle ); // Returns current file position in EAX.
The fileio.seek function repositions the file pointer to the offset you specify as a parameter. The following is the calling sequence for this function:
fileio.seek( fileHandle, offset ); // Repositions file to specified offset.
The function call above will reposition the file pointer to the byte offset specified by the offset parameter. If you feed this function the value returned by fileio.position, then the next read or write operation will access the record written (or read) immediately after the fileio.position call.
You can pass any arbitrary offset value as a parameter to the fileio.seek routine; this value does not have to be one that the fileio.position function returns. For random access file I/O you would normally compute this offset by multiplying the index of the record you wish to access by the size of the record. For example, the following code computes the byte offset of record index in the file, repositions the file pointer to that record, and then reads the record:
intmul( @size( myRecord ), index, ebx );
fileio.seek( fileHandle, ebx );
fileio.read( fileHandle, (type byte myRecord), @size( myRecord ) );
You can use essentially this same code sequence to select a specific record in the file for writing.
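The same index-times-size computation is sketched below in Python for both reading and writing a record in place. The 5-byte record layout, the file name and the helper names are hypothetical. `f.seek` plays the role of `fileio.seek`:

```python
import os
import struct
import tempfile

REC = struct.Struct("<hhB")  # hypothetical 5-byte fixed-length record


def read_at(f, index):
    f.seek(index * REC.size)             # offset = index * record size
    return REC.unpack(f.read(REC.size))


def write_at(f, index, values):
    f.seek(index * REC.size)
    f.write(REC.pack(*values))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.bin")
    with open(path, "w+b") as f:         # create, open for read/write
        for i in range(5):
            f.write(REC.pack(i, 10 * i, i))   # sequential initial write
        write_at(f, 2, (-1, -1, 255))         # overwrite record 2 in place
        rec2 = read_at(f, 2)
        rec4 = read_at(f, 4)
        size = f.seek(0, os.SEEK_END)         # file length in bytes

print(rec2, rec4, size)
```

Overwriting record 2 leaves the file at 25 bytes, because a fixed-length record always occupies exactly the same slot.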
Note that it is not an error to seek beyond the current end of file and then write data. If you do this, the OS will automatically fill in the intervening records with uninitialized data. Generally, this isn't a great way to create files, but it is perfectly legal. On the other hand, be aware that if you do this by accident, you may wind up with garbage in the file and no error to indicate that this has happened.
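On POSIX systems such as Linux, the bytes in such a gap are guaranteed to read back as zeros. From the application's point of view this is still garbage, because no record was ever written there. A small Python sketch, using a hypothetical 4-byte record, makes the effect visible:

```python
import os
import tempfile

RECORD_SIZE = 4  # hypothetical fixed record size

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gap.bin")
    with open(path, "wb") as f:
        f.write(b"AAAA")                 # record 0
        f.seek(3 * RECORD_SIZE)          # seek past the end of the file
        f.write(b"DDDD")                 # record 3; records 1 and 2 never written
    with open(path, "rb") as f:
        data = f.read()

print(len(data), data)
```

No error is reported. The file simply grows to 16 bytes, and records 1 and 2 contain bytes the program never wrote.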
The fileio module provides another routine for repositioning the file pointer: fileio.rSeek. This function's calling sequence is very similar to fileio.seek; it is
fileio.rSeek( fileHandle, offset );
The difference between this function and the regular fileio.seek function is that this function repositions the file pointer offset bytes from the end of the file (rather than offset bytes from the start of the file). The "r" in "rSeek" stands for "reverse" seek.
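Python expresses the same idea with a negative offset and `os.SEEK_END`. The sketch below is an analogy, not the HLA routine. It fetches the last two fixed-length records of an in-memory file:

```python
import io
import os

RECORD_SIZE = 4  # hypothetical fixed record size

f = io.BytesIO(b"AAAABBBBCCCC")          # three 4-byte records
f.seek(-RECORD_SIZE, os.SEEK_END)        # 4 bytes back from the end
last = f.read(RECORD_SIZE)
f.seek(-2 * RECORD_SIZE, os.SEEK_END)    # 8 bytes back from the end
second_last = f.read(RECORD_SIZE)
print(last, second_last)  # b'CCCC' b'BBBB'
```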
Repositioning the file pointer, especially if you reposition it a fair distance from its current location, can be a time-consuming process. If you reposition the file pointer and then attempt to read a record from the file, the system may need to reposition a disk arm (a very slow process) and wait for the data to rotate underneath the disk read/write head. This is why random access I/O is much less efficient than sequential I/O.
The following program demonstrates random access I/O by writing and reading a le of records:
program RandomAccessDemo;
#include( "stdlib.hhf" )

type
    fileRec:
        record
            x:int16;
            y:int16;
            magnitude:uns8;
        endrecord;

const

    // Some arbitrary data we can use to initialize the file:

    fileData:=
        [
            fileRec:[ 2000, 1, 1 ],
            fileRec:[ 1000, 10, 2 ],
            fileRec:[ 750, 100, 3 ],
            fileRec:[ 500, 500, 4 ],
            fileRec:[ 100, 1000, 5 ],
            fileRec:[ 62, 2000, 6 ],
            fileRec:[ 32, 2500, 7 ],
            fileRec:[ 10, 3000, 8 ]
        ];

static
    fileHandle:         dword;
    RecordFromFile:     fileRec;
    InitialFileData:    fileRec[ 8 ] := fileData;

begin RandomAccessDemo;

    fileio.openNew( "fileRec.bin" );
    mov( eax, fileHandle );

    // Okay, write the initial data to the file in a sequential fashion:

    for( mov( 0, ebx ); ebx < 8; inc( ebx )) do

        intmul( @size( fileRec ), ebx, ecx );   // Byte index into InitialFileData.
        fileio.write
        (
            fileHandle,
            (type byte InitialFileData[ecx]),
            @size( fileRec )
        );

    endfor;

    // fileio.openNew opens the file for writing only, so close it
    // and reopen it for reading before reading the records back.

    fileio.close( fileHandle );
    fileio.open( "fileRec.bin", fileio.r );
    mov( eax, fileHandle );

    // Okay, now let's demonstrate a random access of this file
    // by reading the records from the file backwards.

    stdout.put( "Reading the records, backwards:" nl );
    for( mov( 7, ebx ); (type int32 ebx) >= 0; dec( ebx )) do

        intmul( @size( fileRec ), ebx, ecx );   // Byte offset of record EBX.
        fileio.seek( fileHandle, ecx );
        fileio.read
        (
            fileHandle,
            (type byte RecordFromFile),
            @size( fileRec )
        );
        if( eax = @size( fileRec )) then

            stdout.put
            (
                "Read record #",
                (type uns32 ebx),
                ", values:" nl
                "  x: ", RecordFromFile.x, nl
                "  y: ", RecordFromFile.y, nl
                "  magnitude: ", RecordFromFile.magnitude, nl nl
            );

        else

            stdout.put( "Error reading record number ", (type uns32 ebx), nl );

        endif;

    endfor;
    fileio.close( fileHandle );

end RandomAccessDemo;
Program 7.6
7.5
3. Redundant array of inexpensive disks. RAID is a mechanism for combining lots of cheap disk drives together to form the
equivalent of a really large disk drive.
4. This assumes, of course, that your files have a maximum size of four gigabytes.
        mov( eax, ISAMarray[ecx*4] );               // Save away current record offset.
        fileio.put( fileHandle, outputLine, nl );   // Write the record.
        inc( ecx );                                 // Advance to next element of ISAMarray.

        << determine if we're done and BREAK if we are >>

    endfor;

    << At this point, ECX contains the number of records and >>
    << ISAMarray[0]..ISAMarray[ecx-1] contain the offsets to >>
    << each of the records in the file.                      >>
After building the file using the code above, you can quickly jump to an arbitrary line of text by fetching the index for that line from the ISAMarray list. The following code demonstrates how you could read line recordNumber from the file:
mov( recordNumber, ebx );
fileio.seek( fileHandle, ISAMarray[ ebx*4 ] );
fileio.a_gets( fileHandle, inputString );
As long as you've precalculated the ISAMarray list, accessing an arbitrary line in this text file is a trivial matter.
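The in-memory index can be sketched in Python in a few lines. `build_index` records each line's starting offset (the counterpart of filling `ISAMarray` with `fileio.position` values), and `read_line` seeks directly to a chosen line. Both names are hypothetical, and an `io.BytesIO` object stands in for the text file:

```python
import io


def build_index(f):
    """Return the starting byte offset of every line: an in-memory
    counterpart of the ISAMarray list."""
    offsets = []
    f.seek(0)
    while True:
        pos = f.tell()               # like fileio.position
        if not f.readline():         # end of file
            break
        offsets.append(pos)
    return offsets


def read_line(f, offsets, n):
    f.seek(offsets[n])               # like fileio.seek: jump to line n (0-based)
    return f.readline().rstrip(b"\r\n")


text = io.BytesIO(b"alpha\nbravo\r\ncharlie\n")
isam = build_index(text)
print(isam)                          # [0, 6, 13]
print(read_line(text, isam, 2))      # b'charlie'
```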
Of course, back in the days when IBM programmers were trying to squeeze every possible byte from their databases so they would fit on a five-megabyte disk drive, they didn't have 512 kilobytes of RAM to hold 128K entries in the ISAMarray list. Although half a megabyte is no big deal today, there are a couple of reasons why keeping the ISAMarray list in a memory-based array might not be such a good idea. First, databases are much larger these days. Some databases have hundreds of millions of entries. While setting aside half a megabyte for an ISAM table might not be a bad thing, few people are willing to set aside half a gigabyte for this purpose. Even if your database isn't amazingly big, there is another reason why you might not want to keep your ISAMarray in main memory: it's the same reason you don't keep the file in memory. Memory is volatile, and the data is lost whenever the application quits or the user removes power from the system. The solution is exactly the same as for the file data: you store the ISAMarray data in its own file. A program that builds the ISAM table while writing the file is a simple modification to the previous ISAM generation program. The trick is to open two files concurrently and write the ISAM data to one file while you're writing the text to the other file:
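static
    fileHandle:    dword;
    isamHandle:    dword;
    outputLine:    string;
    CurrentOffset: dword;
        .
        .
        .
forever
    << save the text file's current offset in CurrentOffset >>
    << write CurrentOffset (four bytes) to the ISAM file >>
    << write outputLine to the text file >>
    << determine if we're done and BREAK if we are >>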
endfor;
If necessary, you can count the number of records as before. You might write this value to the first record of
the ISAM file. Since you know the first record of the text file is always at offset zero, you can use the first
element of the ISAM list to hold the count of ISAM/text file records.
Since the ISAM file is just a sequence of four-byte integers, each record in the file (i.e., an integer) has
the same length. Therefore, we can easily access any value in the ISAM file using the random access file I/O
mechanism. To read a particular line of text from the text file, the first task is to read the offset from
the ISAM file and then use that offset to read the desired line from the text file. The code to accomplish this
is as follows:
// Assume we want to read the line specified by the lineNumber variable.

if( lineNumber <> 0 ) then

    // If not record number zero, then fetch the offset to the desired
    // line from the ISAM file:

    intmul( 4, lineNumber, eax );                             // Compute the index into the ISAM file.
    fileio.seek( isamHandle, eax );
    fileio.read( isamHandle, (type byte CurrentOffset), 4 );  // Read offset

else

    mov( 0, CurrentOffset );    // Record zero always begins at offset zero.

endif;
fileio.seek( fileHandle, CurrentOffset );   // Set text file position.
fileio.a_gets( fileHandle, inputLine );     // Read the line of text.
This operation runs at about half the speed of having the ISAM array in memory, since it takes four file
accesses rather than two to read the line of text from the file. In exchange, the data is non-volatile and is not limited
by the amount of available RAM.
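The same two-file arrangement can be sketched in C as follows. This is an illustration under stated assumptions, not a translation of any HLA library routine:

* build_isam and isam_read_line are hypothetical names.
* Offsets are stored as raw four-byte unsigned integers in the machine's native byte order.
* Following the convention suggested earlier, element zero of the ISAM file holds the record count rather than the (always zero) offset of line zero.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write each line to text and one four-byte entry per line to isam.
   Entry 0 holds the record count; entry i (i > 0) holds the byte offset
   of line i. Returns 0 on success, or -1 on error. */
int build_isam(FILE *text, FILE *isam, const char *lines[], uint32_t count)
{
    for (uint32_t i = 0; i < count; i++) {
        long pos = ftell(text);
        if (pos < 0) return -1;
        uint32_t entry = (i == 0) ? count : (uint32_t)pos;
        if (fwrite(&entry, sizeof entry, 1, isam) != 1) return -1;
        if (fprintf(text, "%s\n", lines[i]) < 0) return -1;
    }
    return 0;
}

/* Read line n: seek and read in the ISAM file to get the offset, then
   seek and read in the text file. Returns 0 on success, or -1 on error. */
int isam_read_line(FILE *text, FILE *isam, uint32_t n, char *buf, int size)
{
    uint32_t offset = 0;               /* line 0 always starts at offset 0 */
    if (n != 0) {
        if (fseek(isam, (long)n * 4L, SEEK_SET) != 0) return -1;
        if (fread(&offset, sizeof offset, 1, isam) != 1) return -1;
    }
    if (fseek(text, (long)offset, SEEK_SET) != 0) return -1;
    if (fgets(buf, size, text) == NULL) return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

For line zero only the text file is touched. Every other line costs a seek and a read in each file, which is where the "four accesses rather than two" figure comes from.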
If you decide to use a memory-based array for your ISAM table, it's still a good idea to keep that data in
a file somewhere so you don't have to recompute it (by reading the entire file) every time your application
starts. If the data is present in a file, all you've got to do is read that file data into your ISAMarray list.
Assuming you've stored the number of records in element number zero of the ISAM array, you could use the
following code to read your ISAM data into the ISAMarray variable:
static
    isamSize:   uns32;
    isamHandle: dword;
    fileHandle: dword;
    ISAMarray:  dword[ 128*1024 ];
        .
        .
        .
// Read the first record of the ISAM file into the isamSize variable:

fileio.read( isamHandle, (type byte isamSize), 4 );

// Now read the ISAM data into the ISAMarray variable, provided
// it will fit:

if( isamSize > 128*1024 ) then
    raise( ex.ValueOutOfRange );
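endif;
fileio.seek( isamHandle, 0 );        // Rewind so ISAMarray[i] lines up with record i.
intmul( 4, isamSize, ecx );          // #records * 4 is number of bytes to read.
fileio.read( isamHandle, (type byte ISAMarray), ecx );
mov( 0, ISAMarray[0] );              // Element 0 held the count; record 0 is at offset 0.

// At this point, ISAMarray[0]..ISAMarray[isamSize-1] contain the indexes
// into the text file for each line of text.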
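A C sketch of loading such an ISAM file into memory follows. It assumes the same layout: the count in element zero and four-byte native-order entries. The function load_isam is hypothetical. Rewinding before the bulk read makes table[i] line up with text record i. Element zero is then reset to zero, the offset of the first line:

```c
#include <stdint.h>
#include <stdio.h>

#define ISAM_MAX (128u * 1024u)  /* matches the ISAMarray capacity above */

/* Load an ISAM file whose entry 0 holds the record count into table.
   Returns the count, with table[i] holding the text-file offset of
   record i, or -1 on error. */
long load_isam(FILE *isam, uint32_t table[])
{
    uint32_t count;
    if (fseek(isam, 0, SEEK_SET) != 0) return -1;
    if (fread(&count, sizeof count, 1, isam) != 1) return -1;
    if (count == 0 || count > ISAM_MAX) return -1;   /* empty, or too big for table */

    /* Rewind and read all count entries with a single call. */
    if (fseek(isam, 0, SEEK_SET) != 0) return -1;
    if (fread(table, sizeof table[0], count, isam) != count) return -1;
    table[0] = 0;        /* entry 0 held the count; record 0 is at offset 0 */
    return (long)count;
}
```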
7.6
Truncating a File
If you open an existing file (using fileio.open) for output and write data to that file, it overwrites the
existing data from the start of the file. However, if the new data you write to the file is shorter than the data
originally appearing in the file, the excess data from the original file will still appear after the end of the
new data you've written. Sometimes this might be desirable, but most of the time
you'll want to delete the old data after writing the new data.
One way to delete the old data is to use the fileio.openNew function to open the file. The fileio.openNew
function automatically deletes any existing file, so only the data you write to the file will be present in the
file. However, there may be times when you want to read the old data first, rewind the file, and then
overwrite the data. In this situation, you'll need a function that will truncate the old data at the end of the file
after you've written the new data. The fileio.truncate function accomplishes this task. This function uses the
following calling syntax:
fileio.truncate( fileHandle );
Note that this function does not close the file. You still have to call fileio.close to commit the data to the disk.
The following sample program demonstrates the use of the fileio.truncate function:
program TruncateDemo;
#include( "stdlib.hhf" )

static
    fileHandle: dword;
    u:          uns32;

begin TruncateDemo;

    fileio.openNew( "myfile.txt" );
    mov( eax, fileHandle );
    for( mov( 0, ecx ); ecx < 20; inc( ecx )) do
        fileio.put( fileHandle, (type uns32 ecx), nl );
    endfor;
endfor;
fileio.truncate( fileHandle );
Program 7.7
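Standard C has no counterpart to fileio.truncate, but POSIX systems provide ftruncate. The following sketch assumes a POSIX environment; it will not compile with a Windows-only C runtime. truncate_here is a hypothetical helper that cuts the file off at the current write position, mirroring the rewrite-then-truncate pattern described above:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Discard everything past the current position of fp, keeping only the
   data up to the point just written. Returns 0 on success, or -1. */
int truncate_here(FILE *fp)
{
    if (fflush(fp) != 0) return -1;    /* hand buffered writes to the OS first */
    long pos = ftell(fp);
    if (pos < 0) return -1;
    return ftruncate(fileno(fp), (off_t)pos);
}
```

Flushing first ensures that every byte written so far has reached the operating system before the file size changes. As with fileio.truncate, the file stays open afterwards.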
7.7
The sourceFileName and destFileName parameters are strings that specify the pathnames of the source and
destination files. These can be string constants or variables. The last parameter is a boolean variable that
specifies what should happen if the destination file exists. If this parameter contains true and the file already
exists, then the function will fail; if failIfExists is false, the fileio.copy routine will replace the existing destination
file with a copy of the source file. In either case, of course, the source file must exist or this function
will fail. This function returns a boolean success/failure result in the EAX register. It returns true if the
copy succeeds and false if it fails.
Program 7.8 demonstrates the use of this function to copy a file:
program CopyDemo;
#include( "stdlib.hhf" )

begin CopyDemo;

    // Make a copy of myfile.txt to itself to demonstrate
    // a true failIfExists parameter.

    if( !fileio.copy( "myfile.txt", "myfile.txt", true )) then
        stdout.put( "Did not copy myfile.txt over itself" nl );
    else
        stdout.put( "Whoa!" nl );
    endif;

    // Okay, make a copy of the file to a different file, to verify
    // that this works properly:

    if( fileio.copy( "myfile.txt", "copyOfMyFile.txt", false )) then
        stdout.put( "Successfully copied the file" nl );
    else
        stdout.put( "Failed to copy the file (maybe it doesn't exist?)" nl );
    endif;

end CopyDemo;
Program 7.8
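The fail-if-exists behavior can be approximated in portable C11 with fopen's exclusive "x" mode flag, which refuses to open a file that already exists. This is a sketch, not how fileio.copy is implemented. copy_file is a hypothetical name, and it copies bytes through a buffer rather than calling an OS copy service:

```c
#include <stdbool.h>
#include <stdio.h>

/* Copy src to dst. When fail_if_exists is true, the C11 "x" flag makes
   fopen fail if dst already exists. Returns true on success. */
bool copy_file(const char *src, const char *dst, bool fail_if_exists)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL) return false;                  /* the source must exist */
    FILE *out = fopen(dst, fail_if_exists ? "wbx" : "wb");
    if (out == NULL) {
        fclose(in);
        return false;
    }

    char buf[4096];
    size_t n;
    bool ok = true;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = false;
            break;
        }
    }
    if (ferror(in)) ok = false;
    fclose(in);
    if (fclose(out) != 0) ok = false;
    return ok;
}
```

This naive copier has a hazard. If the same name is passed for both files with fail_if_exists set to false, the "wb" open truncates the source before it is read. The true case is safe because the exclusive open fails first, and that is the situation Program 7.8 tests.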
Moving a file from one location to another might seem like another trivial task: all you've got to do is
copy the file to the destination and then delete the original file. However, this scheme is quite inefficient in
most situations. Copying the file can be an expensive process if the file is large. Worse, the move operation
may fail if you're moving the file to a new location on the same disk and there is insufficient space for a second
copy of the file. A much better solution is to simply move the file's directory entry from one location to
another on the disk. Win32's disk directory entries are quite small, so moving a file to a different location on
the same disk by simply moving its directory entry is very fast and efficient. Unfortunately, if you move a
file from one file system (disk) to another, you will have to first copy the file and then delete the original file.
Once again, you don't have to bother with the complexities of this operation because Windows has a built-in
function that automatically moves files for you. The HLA Standard Library's fileio.move procedure provides
a direct interface to this function (available only under Windows). The calling sequence is
fileio.move( source, dest );
The two parameters are strings providing the source and destination filenames. This function returns true or
false in EAX to denote the success or failure of the operation.
Not only can the fileio.move procedure move a file around on the disk, it can also move subdirectories
around. The only catch is that you cannot move a subdirectory from one volume (file system/disk) to
another.
If both the destination and source filenames are simple filenames, not pathnames, then the fileio.move
function moves the source file from the current directory back to the current directory. Although this seems
rather weird, it is a very common operation: this is how you rename a file. The HLA Standard Library
does not have a separate fileio.rename function. Instead, you use the fileio.move function to rename files
by moving them to the same directory but with a different filename. Program 7.9 demonstrates how to use
fileio.move in this capacity.
program FileMoveDemo;
#include( "stdlib.hhf" )

begin FileMoveDemo;

    // Rename the myfile.txt file to the name renamed.txt.

    if( !fileio.move( "myfile.txt", "renamed.txt" )) then
        stdout.put
        (
            "Could not rename myfile.txt (maybe it doesn't exist?)" nl
        );
    else
        stdout.put( "Successfully renamed the file" nl );
    endif;

end FileMoveDemo;
Program 7.9
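In C, the standard rename function plays the role of fileio.move for renames and same-volume moves. The sketch below uses the hypothetical names move_file and copy_bytes. It also shows the copy-then-delete fallback the text describes for moves between file systems. That fallback relies on the EXDEV error code, which POSIX defines but the C standard itself does not guarantee:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Plain byte-for-byte copy, used only as a fallback below. */
static bool copy_bytes(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL) return false;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return false;
    }
    int c;
    while ((c = getc(in)) != EOF)
        if (putc(c, out) == EOF) break;
    bool ok = !ferror(in) && !ferror(out);
    fclose(in);
    if (fclose(out) != 0) ok = false;
    return ok;
}

/* Move or rename src to dst. rename() only updates directory entries,
   so it is fast; if the OS reports a cross-device move, fall back to
   copying the file and deleting the original. Returns true on success. */
bool move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0) return true;
#ifdef EXDEV
    if (errno == EXDEV && copy_bytes(src, dst))
        return remove(src) == 0;
#endif
    return false;
}
```

Standard C leaves it implementation-defined whether rename replaces an existing destination. POSIX systems replace it, while the Windows C runtime reports an error.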
The first form above expects you to pass the filename as a string parameter. The second form expects a handle to a
file you've opened with fileio.open or fileio.openNew. These two calls return the size of the file in
EAX. If an error occurs, these functions return -1 ($FFFF_FFFF) in EAX. Note that the files must be less
than four gigabytes in length when using this function. If you need to check the size of larger files, you will
have to call the appropriate OS function instead. However, since files larger than four
gigabytes are rather rare, you probably won't have to worry about this problem.
One interesting use for this function is to determine the number of records in a fixed-length-record random
access file. By getting the size of the file and dividing by the size of a record, you can determine the
number of records in the file.
Another use for this function is to allow you to determine the size of a (smaller) file, allocate sufficient
storage to hold the entire file in memory (by using malloc), and then read the entire file into memory using
the fileio.read function. This is generally the fastest way to read data from a file into memory.
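Both uses can be sketched in C, with fseek and ftell standing in for fileio.size. The helper names file_size, record_count, and read_whole_file are hypothetical. The C standard does not strictly guarantee that seeking to the end of a binary stream works, but it does on Windows and POSIX systems. Because ftell returns a long, the size is limited on platforms where long is 32 bits:

```c
#include <stdio.h>
#include <stdlib.h>

/* Size of an open binary stream in bytes, or -1 on error. Leaves the
   file position at the start of the file. */
long file_size(FILE *fp)
{
    if (fseek(fp, 0, SEEK_END) != 0) return -1;
    long size = ftell(fp);
    rewind(fp);
    return size;
}

/* Number of whole fixed-length records in the file, or -1 on error. */
long record_count(FILE *fp, size_t record_size)
{
    long size = file_size(fp);
    return (size < 0 || record_size == 0) ? -1 : size / (long)record_size;
}

/* Read the entire file into a malloc'd buffer (the caller frees it) and
   store its length in *out_size. Returns NULL on error. */
char *read_whole_file(FILE *fp, long *out_size)
{
    long size = file_size(fp);
    if (size < 0) return NULL;
    char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) return NULL;
    if (fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);
        return NULL;
    }
    *out_size = size;
    return buf;
}
```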
Program 7.10 demonstrates the use of the two forms of the fileio.size function by displaying the size of
the program's own source file, FileSizeDemo.hla.
program FileSizeDemo;
#include( "stdlib.hhf" )

static
    handle: dword;

begin FileSizeDemo;

    // Display the size of the FileSizeDemo.hla file:

    fileio.size( "FileSizeDemo.hla" );
    if( eax <> -1 ) then
        stdout.put( "Size of file: ", (type uns32 eax), nl );
    else
        stdout.put( "Error calculating file size" nl );
    endif;

    // Same thing, using the file handle as a parameter:

    fileio.open( "FileSizeDemo.hla", fileio.r );
    mov( eax, handle );
    fileio.size( handle );
    if( eax <> -1 ) then
        stdout.put( "Size of file(2): ", (type uns32 eax), nl );
    else
        stdout.put( "Error calculating file size" nl );
    endif;
    fileio.close( handle );

end FileSizeDemo;
Program 7.10
The single parameter is a string containing the pathname of the file you wish to delete. This function returns
true/false in the EAX register to denote success/failure.
Program 7.11 provides an example of the use of the fileio.delete function.
program DeleteFileDemo;
#include( "stdlib.hhf" )

static
    handle: dword;

begin DeleteFileDemo;

    // Delete the file named xyz:

    fileio.delete( "xyz" );
    if( eax ) then
        stdout.put( "Deleted the file", nl );
    else
        stdout.put( "Error deleting the file" nl );
    endif;

end DeleteFileDemo;
Program 7.11
7.8
Directory Operations
In addition to manipulating files, you can also manipulate directories with some of the fileio functions.
The HLA Standard Library includes several functions that let you create and use subdirectories. These functions
are fileio.cd (change directory), fileio.gwd (get working directory), and fileio.mkdir (make directory).
Their calling sequences are
fileio.cd( pathnameString );
fileio.gwd( stringToHoldPathname );
fileio.mkdir( newDirectoryName );
The fileio.cd and fileio.mkdir functions return success or failure (true or false, respectively) in the EAX register.
For the fileio.gwd function, the string parameter is a destination string where the system will store the
pathname to the current directory. You must allocate sufficient storage for the string prior to passing the
string to this function (260 characters[6] is a good default amount if you're unsure how long the pathname
could be). If the actual pathname is too long to fit in the destination string you supply as a parameter, the
fileio.gwd function will raise the ex.StringOverflow exception.
The fileio.cd function sets the current working directory to the pathname you specify. After calling this
function, the OS will assume that all future unadorned file references (those without any "\" or "/" characters
in the pathname) will default to the directory you specify as the fileio.cd parameter. Proper use of this
function can make your program much more convenient for its users, since they
won't have to enter full pathnames for every file they manipulate.
The fileio.gwd function lets you query the system to determine the current working directory. After a
call to fileio.cd, the string that fileio.gwd returns should name the same directory as fileio.cd's parameter,
although it may be spelled out as a full pathname. Typically, you would use this function to record the default
directory when your program first starts running. Your program will exhibit good manners by switching back to
this default directory when it terminates.
The fileio.mkdir function lets your program create a new subdirectory. If your program creates data files
and stores them in a default directory somewhere, it's good etiquette to let the user specify the subdirectory
where your program should put these files. If you do this, you should give your users the option to create a
new directory (in case they want the data placed in a brand-new directory). You can use fileio.mkdir for this
purpose.
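On POSIX systems, the counterparts of fileio.gwd, fileio.mkdir, and fileio.cd are getcwd, mkdir, and chdir. The following sketch assumes a POSIX environment; Windows C runtimes spell these _getcwd, _mkdir, and _chdir. enter_new_dir, restore_dir, path_ends_with, and PATH_BUF are hypothetical names. Following the good-manners advice above, the sketch saves the starting directory first so the program can restore it:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define PATH_BUF 4096  /* generous; Windows' MAX_PATH is 260 */

/* Save the current directory in saved, create dir if it does not exist,
   make it the working directory, and store the new working directory
   in now. Returns true on success. */
bool enter_new_dir(const char *dir, char saved[PATH_BUF], char now[PATH_BUF])
{
    if (getcwd(saved, PATH_BUF) == NULL) return false;           /* like fileio.gwd */
    if (mkdir(dir, 0755) != 0 && errno != EEXIST) return false;  /* like fileio.mkdir */
    if (chdir(dir) != 0) return false;                           /* like fileio.cd */
    return getcwd(now, PATH_BUF) != NULL;
}

/* Switch back to a directory saved earlier. */
bool restore_dir(const char saved[PATH_BUF])
{
    return chdir(saved) == 0;
}

/* True if path ends with name (a quick sanity check on getcwd output). */
bool path_ends_with(const char *path, const char *name)
{
    size_t p = strlen(path), n = strlen(name);
    return p >= n && strcmp(path + p - n, name) == 0;
}
```

If the buffer is too small, getcwd returns NULL and sets errno to ERANGE. It does not raise an exception the way fileio.gwd does.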
7.9
6. This is the default MAX_PATH value in Windows. This is probably sufficient for most Linux applications, too.