File I/O

Files
Chapter Seven

7.1 Chapter Overview

In this chapter you will learn about the file persistent data type. In most assembly languages, file I/O is a major headache. Not so in HLA with the HLA Standard Library. File I/O is no more difficult than writing data to the standard output device or reading data from the standard input device. In this chapter you will learn how to create and manipulate sequential and random-access files.

7.2 File Organization

A file is a collection of data that the system maintains in persistent storage. "Persistent" means that the storage is non-volatile; that is, the system maintains the data even after the program terminates, indeed, even if you shut off system power. For this reason, plus the fact that different programs can access the data in a file, applications typically use files to maintain data across executions of the application and to share data with other applications.

The operating system typically saves file data on a disk drive or some other form of secondary storage device. As you may recall from the chapter on the memory hierarchy (see "The Memory Hierarchy" on page 303), secondary storage (disk drives) is much slower than main memory. Therefore, you generally do not store data that a program commonly accesses in files during program execution unless that data is far too large to fit into main memory (e.g., a large database).

Under Linux and Windows, a standard file is simply a stream of bytes that the operating system does not interpret in any way. It is the responsibility of the application to interpret this information, much the same as it is your application's responsibility to interpret data in memory. The stream of bytes in a file could be a sequence of ASCII characters (e.g., a text file) or they could be pixel values that form a 24-bit color photograph.

Files generally take one of two different forms: sequential files or random access files. Sequential files are great for data you read or write all at once; random access files work best for data you read and write in pieces (or rewrite, as the case may be). For example, a typical text file (like an HLA source file) is usually a sequential file. Usually your text editor will read or write the entire file at once. Similarly, the HLA compiler will read the data from the file in a sequential fashion without skipping around in the file. A database file, on the other hand, requires random access since the application can read data from anywhere in the file in response to a query.

7.2.1 Files as Lists of Records


A good view of a file is as a list of records. That is, the file is broken down into a sequential string of records that share a common structure. A list is simply an open-ended single dimensional array of items, so we can view a file as an array of records. As such, we can index into the file and select record number zero, record number one, record number two, etc. Using common file access operations, it is quite possible to skip around to different records in a file. Under Windows and Linux, the principal difference between a sequential file and a random access file is the organization of the records and how easy it is to locate a specific record within the file. In this section we'll take a look at the issues that differentiate these two types of files.

Beta Draft - Do not distribute
2001, By Randall Hyde
Volume Three

The easiest file organization to understand is the random access file. A random access file is a list of records whose lengths are all identical (i.e., random access files require fixed length records). If the record length is n bytes, then the first record appears at byte offset zero in the file, the second record appears at byte offset n in the file, the third record appears at byte offset n*2 in the file, etc. This organization is virtually identical to that of an array of records in main memory; you use the same computation to locate an element of this list in the file as you would use to locate an element of an array in memory. The only difference is that a file doesn't have a "base address" in memory; you simply compute the zero-based offset of the record in the file. This calculation is quite simple, and using some file I/O functions you will learn about a little later, you can quickly locate and manipulate any record in a random access file.
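
The offset computation for fixed-length records can be sketched in a few lines of HLA. This is only an illustration, not code from the original text: the program name and the recSize constant (a hypothetical 32-byte record length) are assumptions. The sketch prints the starting offset of records zero through four using offset = index * recSize.

```hla
program RecordOffsetSketch;
#include( "stdlib.hhf" )

const
    recSize := 32;      // Hypothetical fixed record length, in bytes.

begin RecordOffsetSketch;

    // For each record index in EBX, compute its byte offset in EAX.
    for( mov( 0, ebx ); ebx < 5; inc( ebx )) do

        intmul( recSize, ebx, eax );    // eax := ebx * recSize
        stdout.put
        (
            "Record ", (type uns32 ebx),
            " begins at byte offset ", (type uns32 eax), nl
        );

    endfor;

end RecordOffsetSketch;
```

With a 32-byte record the offsets are 0, 32, 64, 96, and 128. This is the same index-times-size computation you would use for an array in memory, minus the base address.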
Sequential files also consist of a list of records. However, these records do not all have to be the same length [1]. If a sequential file does not use fixed length records then we say that the file uses variable-length records. If a sequential file uses variable-length records, then the file must contain some kind of marker or other mechanism to separate the records in the file. Typical sequential files use one of two mechanisms: a length prefix or some special terminating value. These two schemes should sound quite familiar to those who have read the chapter on strings. Character strings use a similar scheme to determine the bounds of a string in memory.

[1] There is nothing preventing a sequential file from using fixed length records. However, they don't require fixed length records.

A text file is the best example of a sequential file that uses variable-length records. Text files use a special marker at the end of each record to delineate the records. In a text file, a record corresponds to a single line of text. Under Windows, the carriage return/line feed character sequence marks the end of each record. Other operating systems may use a different sequence; e.g., Linux uses a single line feed character while the Mac OS uses a single carriage return. Since we're working with Windows or Linux here, we'll adopt the carriage return/line feed or single line feed convention.

Accessing records in a file containing variable-length records is problematic. Unless you have an array of offsets to each record in a variable-length file, the only practical way to locate record n in a file is to read the first n-1 records. This is why variable-length files are sequential-access: you have to read the file sequentially from the start in order to locate a specific record in the file. This will be much slower than accessing the file in a random access fashion. Generally, you would not use a variable-length record organization for files you need to access in a random fashion.

At first blush it would seem that fixed-length random access files offer all the advantages here. After all, you can access records in a file with fixed-length records much more rapidly than files using the variable-length record organization. However, there is a cost to this: your fixed-length records have to be large enough to hold the largest possible data object you want to store in a record. To store a sequence of lines in a text file, for example, your record sizes would have to be large enough to hold the longest possible input line. This could be quite large (for example, HLA allows lines up to 256 characters). Each record in the file will consume this many bytes even if the record uses substantially less data. For example, an empty line only requires one or two bytes (for the line feed [Linux] or carriage return/line feed [Windows] sequence). If your record size is 256 bytes, then you're wasting 255 or 254 bytes for that blank line in your file. If the average line length is around 60 characters, then each line wastes an average of about 200 characters. This problem, known as internal fragmentation, can waste a tremendous amount of space on your disk, especially as your files get larger or you create lots of files. File organizations that use variable-length records generally don't suffer from this problem.

7.2.2 Binary vs. Text Files

Another important thing to realize about files is that they don't all contain human readable text. Object and executable files are good examples of files that contain binary information rather than text. A text file is a very special kind of variable-length sequential file that uses special end of line markers (carriage returns/line feeds) at the end of each record (line) in the file. Binary files are everything else.

Binary files are often more compact than text files and they are usually more efficient to access. Consider a text file that contains the following set of two-byte integer values:

    1234
    543
    3645
    32000
    1
    87
    0

As a text file, this file consumes at least 34 bytes (assuming a two-byte end of line marker on each line). However, were we to store the data in a fixed-record length binary file, with two bytes per integer value, this file would only consume 14 bytes, less than half the space. Furthermore, since the file now uses fixed-length records (two bytes per record) we can efficiently access it in a random fashion. Finally, there is one additional, though hidden, efficiency aspect to the binary format: when a program reads and writes binary data it doesn't have to convert between the binary and string formats. This conversion is an expensive process (with respect to computer time). If a human being isn't going to read this file with a separate program (like a text editor) then converting to and from text format on every I/O operation is a wasted effort.
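
The two sizes quoted for the seven-value file follow from a short count, written out here as a worked calculation. The grouping of terms is only a sketch; the numbers come from the list of values and the two-byte end of line marker assumed above.

```latex
\begin{align*}
\text{text size}   &= \underbrace{(4 + 3 + 4 + 5 + 1 + 2 + 1)}_{\text{digit characters}}
                      + \underbrace{7 \times 2}_{\text{end of line markers}}
                    = 20 + 14 = 34 \text{ bytes} \\
\text{binary size} &= 7 \text{ values} \times 2 \text{ bytes per value} = 14 \text{ bytes}
\end{align*}
```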
Consider the following HLA record type:

    type
        person:
            record
                name:string;
                age:int16;
                ssn:char[11];
                salary:real64;
            endrecord;

If we were to write this record as text to a text file, a typical record would take the following form (<nl> indicates the end of line marker, a line feed or carriage return/line feed pair):

    Hyde, Randall<nl>
    45<nl>
    555-55-5555<nl>
    123456.78<nl>

Presumably, the next person record in the file would begin with the next line of text in the text file.

The binary version of this file (using a fixed length record, reserving 64 bytes for the name string) would look, schematically, like the following:

[Figure 7.1: Fixed-length Format for Person Record. The name field ("Hyde, Randall") occupies 64 bytes, followed by two bytes for the age field, 11 bytes for the SSN field, and eight bytes for the salary field.]

Don't get the impression that binary files must use fixed length record sizes. We could create a variable-length version of this record by using a zero byte to terminate the string, as follows:

[Figure 7.2: Variable-length Format for Person Record. The name field ("Hyde, Randall" followed by a zero byte) occupies 14 bytes, followed by two bytes for the age field, 11 bytes for the SSN field, and eight bytes for the salary field.]

In this particular record format the age field starts at offset 14 in the record (since the name field and the "end of field" marker [the zero byte] consume 14 bytes). If a different name were chosen, then the age field would begin at a different offset in the record. In order to locate the age, ssn, and salary fields of this record, the program would have to scan past the name and find the zero terminating byte. The remaining fields would follow at fixed offsets from the zero terminating byte. As you can see, it's a bit more work to process this variable-length record than the fixed-length record. Once again, this demonstrates the performance difference between random access (fixed-length) and sequential access (variable length, in this case) files.

Although binary files are often more compact and more efficient to access, they do have their drawbacks. In particular, only applications that are aware of the binary file's record format can easily access the file. If you're handed an arbitrary binary file and asked to decipher its contents, this could be very difficult. Text files, on the other hand, can be read by just about any text editor or filter program out there. Hence, your data files will be more interchangeable with other programs if you use text files. Furthermore, it is easier to debug the output of your programs if they produce text files since you can load a text file into the same editor you use to edit your source files.

7.3 Sequential Files

Sequential files are perfect for three types of persistent data: ASCII text files, "memory dumps", and stream data. Since you're probably familiar with ASCII text files, we'll skip their discussion. The other two methods of writing sequential files deserve more explanation.

A "memory dump" is a file that consists of data you transfer from data structures in memory directly to a file. Although the term "memory dump" suggests that you sequentially transfer data from consecutive memory locations to the file, this isn't necessarily the case. Memory access can, and often does, occur in a random access fashion. However, once the application constructs a record to write to the file, it writes that record in a sequential fashion (i.e., each record is written in order to the file). A "memory dump" is what most applications do when you request that they save the program's current data to a file or read data from a file into application memory. When writing, they gather all the important data from memory and write it to the file in a sequential fashion; when reading (loading) data from a file, they read the data from the file in a sequential fashion and store the data into appropriate memory-based data structures. Generally, when loading or saving file data in this manner, the program opens a file, reads/writes data from/to the file, and then it closes the file. Very little processing takes place during the data transfer and the application does not leave the file open for any length of time beyond what is necessary to read or write the file's data.

Stream data on input is like data coming from a keyboard. The program reads the data at various points in the application where it needs new input to continue. Similarly, stream data on output is like a write to the console device. The application writes data to the file at various points in the program after important computations have taken place and the program wishes to report the results of the calculation. Note that when reading data from a sequential file, once the program reads a particular piece of data, that data is no longer available in future reads (unless, of course, the program closes and reopens the file). When writing data to a sequential file, once data is written, it becomes a permanent part of the output file. When processing this kind of data the program typically opens a file and then continues execution. As program execution continues, the application can read or write data in the file. At some point, typically towards the end of the application's execution, the program closes the file and commits the data to disk.

Although disk drives are generally thought of as random access devices, the truth is that they are only pseudo-random access; in fact, they perform much better when writing data sequentially on the disk surface. Therefore, sequential access files tend to provide the highest performance (for sequential data) since they match the highest performance access mode of the disk drive.

Working with sequential files in HLA is very easy. In fact, you already know most of the functions you need in order to read or write sequential files. All that's left to learn is how to open and close files and perform some simple tests (like "have we reached the end of a file when reading data from the file?").

The file I/O functions are nearly identical to the stdin and stdout functions. Indeed, stdin and stdout are really nothing more than special file I/O functions that read data from the standard input device (a file) or write data to the standard output device (which is also a file). You use the file I/O functions in a manner analogous to stdin and stdout except you use the fileio prefix rather than stdin or stdout. For example, to write a string to an output file, you could use the fileio.puts function almost the same way you use the stdout.puts routine. Similarly, if you wanted to read a string from a file, you would use fileio.gets. The only real difference between these function calls and their stdin and stdout counterparts is that you must supply an extra parameter to tell the function what file to use for the transfer. This is a double word value known as the file handle. You'll see how to initialize this file handle in a moment, but assuming you have a dword variable that holds a file handle value, you can use calls like the following to read and write data to sequential files:
    fileio.get( inputHandle, i, j, k );   // Reads i, j, k, from file inputHandle.
    fileio.put( outputHandle, "I = ", i, " J = ", j, " K = ", k, nl );

Although this example only demonstrates the use of get and put, be aware that almost all of the stdin and stdout functions are available as fileio functions, as well (in fact, most of the stdin and stdout functions simply call the appropriate fileio function to do the real work).
There is, of course, the issue of this file handle variable. You're probably wondering what a file handle is and how you tell the fileio routines to work with data in a specific file on your disk. Well, the definition of the file handle object is the easiest to explain: it's just a dword variable that the operating system initializes and uses to keep track of your file. To declare a file handle, you'd just create a dword variable, e.g.,

    static
        myFileHandle:dword;

You should never explicitly manipulate the value of a file handle variable. The operating system will initialize this variable for you (via some calls you'll see in a moment) and the OS expects you to leave this value alone as long as you're working with the file the OS associates with that handle. If you're curious, both Linux and Windows store small integer values into the handle variable. Internally, the OS uses this value as an index into an array that contains pertinent information about open files. If you mess with the file handle's value, you will confuse the OS greatly the next time you attempt to access the file. Moral of the story: leave this value alone while the file is open.

Before you can read or write a file you must open that file and associate a filename with it. The HLA Standard Library provides a couple of functions that provide this service: fileio.open and fileio.openNew. The fileio.open function opens an existing file for reading, writing, or both. Generally, you open sequential files for reading or writing, but not both (though there are some special cases where you can open a sequential file for reading and writing). The syntax for the call to this function is

    fileio.open( filename, access );

The first parameter is a string value that specifies the filename of the file to open. This can be a string constant, a register that contains the address of a string value, or a string variable. The second parameter is a constant that specifies how you want to open the file. You may use any of the three predefined constants for the second parameter:

    fileio.r
    fileio.w
    fileio.rw

fileio.r obviously specifies that you want to open an existing file in order to read the data from that file; likewise, fileio.w says that you want to open an existing file and overwrite the data in that file. The fileio.rw option lets you open a file for both reading and writing.
The fileio.open routine, if successful, returns a file handle in the EAX register. Generally, you will want to save the return value into a double word variable for use by the other HLA fileio routines (i.e., the myFileHandle variable in the earlier example).

If the OS cannot open the file, fileio.open will raise an ex.FileOpenFailure exception. This usually means that it could not find the specified file on the disk.

The fileio.open routine requires that the file exist on the disk or it will raise an exception. If you want to create a new file that might not already exist, the fileio.openNew function will do the job for you. This function uses the following syntax:

    fileio.openNew( filename );

Note that this call has only a single parameter, a string specifying the filename. When you open a file with fileio.openNew, the file is always opened for writing. If a file by the specified filename already exists, then this function will delete the existing file and the new data will be written over the top of the old file (so be careful!).

Like fileio.open, fileio.openNew returns a file handle in the EAX register if it successfully opens the file. You should save this value in a file handle variable. This function raises the ex.FileOpenFailure exception if it cannot open the file.
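
Because both open functions report failure by raising ex.FileOpenFailure, a program that wants to recover can wrap the call in HLA's try..endtry statement. The following is a minimal sketch under that assumption; the program name, the file name nosuchfile.txt, and the messages are hypothetical and chosen only for illustration.

```hla
program OpenFailureSketch;
#include( "stdlib.hhf" )

static
    inputHandle: dword;

begin OpenFailureSketch;

    try

        // "nosuchfile.txt" is a hypothetical filename; if it does not
        // exist, fileio.open should raise ex.FileOpenFailure.
        fileio.open( "nosuchfile.txt", fileio.r );
        mov( eax, inputHandle );
        stdout.put( "File opened successfully", nl );
        fileio.close( inputHandle );

      exception( ex.FileOpenFailure )

        // Control transfers here if the open fails.
        stdout.put( "Could not open the file", nl );

    endtry;

end OpenFailureSketch;
```

Without the try..endtry block, the same failure would propagate to HLA's default exception handler and stop the program.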
Once you open a sequential file with fileio.open or fileio.openNew and you save the file handle value away, you can begin reading data from an input file (fileio.r) or writing data to an output file (fileio.w). To do this, you would use functions like fileio.get and fileio.put as noted above.

When the file I/O is complete, you must close the file to commit the file data to the disk. You should always close all files you open as soon as you are through with them so that the program doesn't consume excess system resources. The syntax for fileio.close is very simple; it takes a single parameter, the file handle value returned by fileio.open or fileio.openNew:

    fileio.close( file_handle );

If there is an error closing the file, fileio.close will raise the ex.FileCloseError exception. Note that Linux and Windows automatically close all open files when an application terminates; however, it is very bad programming style to depend on this feature. If the system crashes (or the user turns off the power) before the application terminates, file data may be lost. So you should always close your files as soon as you are done accessing the data in that file.

The last function of interest to us right now is the fileio.eof function. This function returns true (1) or false (0) in the AL register depending on whether the current file pointer is at the end of the file. Generally you would use this function when reading data from an input file to determine if there is more data to read from the file. You would not normally call this function for output files; it always returns false [2]. Since the fileio routines will raise an exception if the disk is full, there is no need to waste time checking for end of file (EOF) when writing data to a file. The syntax for fileio.eof is

    fileio.eof( file_handle );

[2] Actually, it will return true under Windows if the disk is full.

The following program example demonstrates a complete program that opens and writes a simple text file:

    program SimpleFileOutput;
    #include( "stdlib.hhf" )

    static
        outputHandle:dword;

    begin SimpleFileOutput;

        fileio.openNew( "myfile.txt" );
        mov( eax, outputHandle );
        for( mov( 0, ebx ); ebx < 10; inc( ebx )) do

            fileio.put( outputHandle, (type uns32 ebx ), nl );

        endfor;
        fileio.close( outputHandle );

    end SimpleFileOutput;

Program 7.1 A Simple File Output Program

The following sample program reads the data that Program 7.1 produces and writes the data to the standard output device:

    program SimpleFileInput;
    #include( "stdlib.hhf" )

    static
        inputHandle:dword;
        u:uns32;

    begin SimpleFileInput;

        fileio.open( "myfile.txt", fileio.r );
        mov( eax, inputHandle );
        for( mov( 0, ebx ); ebx < 10; inc( ebx )) do

            fileio.get( inputHandle, u );
            stdout.put( "ebx=", ebx, " u=", u, nl );

        endfor;
        fileio.close( inputHandle );

    end SimpleFileInput;

Program 7.2 A Sample File Input Program

There are a couple of interesting functions that you can use when working with sequential files. They are the following:

    fileio.rewind( fileHandle );
    fileio.append( fileHandle );

The fileio.rewind function resets the file pointer (the cursor into the file where the next read or write will take place) back to the beginning of the file. This name is a carry-over from the days of files on tape drives when the system would rewind the tape on the tape drive to move the read/write head back to the beginning of the file.

If you've opened a file for reading, then fileio.rewind lets you begin reading the file from the start (i.e., make a second pass over the data). If you've opened the file for writing, then fileio.rewind will cause future writes to overwrite the data you've previously written; you won't normally use this function with files you've opened only for writing. If you've opened the file for reading and writing (using the fileio.rw option) then you can write the data after you've first opened the file and then rewind the file and read the data you've written. The following is a modification to Program 7.2 that reads the data file twice. This program also demonstrates the use of fileio.eof to test for the end of the file (rather than just counting the records).

    program SimpleFileInput2;
    #include( "stdlib.hhf" )

    static
        inputHandle:dword;
        u:uns32;

    begin SimpleFileInput2;

        fileio.open( "myfile.txt", fileio.r );
        mov( eax, inputHandle );
        for( mov( 0, ebx ); ebx < 10; inc( ebx )) do

            fileio.get( inputHandle, u );
            stdout.put( "ebx=", ebx, " u=", u, nl );

        endfor;
        stdout.newln();

        // Rewind the file and reread the data from the beginning.
        // This time, use fileio.eof() to determine when we've
        // reached the end of the file.

        fileio.rewind( inputHandle );
        while( fileio.eof( inputHandle ) = false ) do

            // Read and display the next item from the file:

            fileio.get( inputHandle, u );
            stdout.put( "u=", u, nl );

            // Note: after we read the last numeric value, there is still
            // a newline sequence left in the file. If we don't read the
            // newline sequence after each number then EOF will be false
            // at the start of the loop and we'll get an EOF exception
            // when we try to read the next value. Calling fileio.readLn
            // eats the newline after each number and solves this problem.

            fileio.readLn( inputHandle );

        endwhile;
        fileio.close( inputHandle );

    end SimpleFileInput2;

Program 7.3 Another Sample File Input Program

The fileio.append function moves the file pointer to the end of the file. This function is really only useful for files you've opened for writing (or reading and writing). After executing fileio.append, all data you write to the file will be written after the data that already exists in the file (i.e., you use this call to append data to the end of a file you've opened). The following program demonstrates how to use this function to append data to the file created by Program 7.1:

    program AppendDemo;
    #include( "stdlib.hhf" )

    static
        fileHandle:dword;
        u:uns32;

    begin AppendDemo;

        fileio.open( "myfile.txt", fileio.rw );
        mov( eax, fileHandle );
        fileio.append( eax );
        for( mov( 10, ecx ); ecx < 20; inc( ecx )) do

            fileio.put( fileHandle, (type uns32 ecx), nl );

        endfor;

        // Okay, let's rewind to the beginning of the file and
        // display all the data from the file, including the
        // new data we just wrote to it:

        fileio.rewind( fileHandle );
        while( !fileio.eof( fileHandle )) do

            // Read and display the next item from the file:

            fileio.get( fileHandle, u );
            stdout.put( "u=", u, nl );
            fileio.readLn( fileHandle );

        endwhile;
        fileio.close( fileHandle );

    end AppendDemo;

Program 7.4 Demonstration of the fileio.append Routine

Another function, similar to fileio.eof, that will prove useful when reading data from a file is the fileio.eoln function. This function returns true if the next character(s) to be read from the file are the end of line sequence (carriage return, linefeed, or the sequence of these two characters under Windows, just a line feed under Linux). This function returns true or false in the EAX register depending on whether it detects an end of line sequence. The calling sequence for this function is

    fileio.eoln( fileHandle );

If fileio.eoln detects an end of line sequence, it will read those characters from the file (so the next read from the file will not read the end of line characters). If fileio.eoln does not detect the end of line sequence, it does not modify the file pointer position. The following sample program demonstrates the use of fileio.eoln in the AppendDemo program, replacing the call to fileio.readLn (since fileio.eoln reads the end of line sequence, there is no need for the call to fileio.readLn):

    program EolnDemo;
    #include( "stdlib.hhf" )

    static
        fileHandle:dword;
        u:uns32;

    begin EolnDemo;

        fileio.open( "myfile.txt", fileio.rw );
        mov( eax, fileHandle );
        fileio.append( eax );
        for( mov( 10, ecx ); ecx < 20; inc( ecx )) do

            fileio.put( fileHandle, (type uns32 ecx), nl );

        endfor;

        // Okay, let's rewind to the beginning of the file and
        // display all the data from the file, including the
        // new data we just wrote to it:

        fileio.rewind( fileHandle );
        while( !fileio.eof( fileHandle )) do

            // Read and display the next item from the file:

            fileio.get( fileHandle, u );
            stdout.put( "u=", u, nl );
            if( !fileio.eoln( fileHandle )) then

                stdout.put( "Hmmm, expected the end of the line", nl );

            endif;

        endwhile;
        fileio.close( fileHandle );

    end EolnDemo;

Program 7.5 fileio.eoln Demonstration Program

7.4 Random Access Files


The problem with sequential les is that they are, well, sequential. They are great for dumping and
retrieving large blocks of data all at once, but they are not suitable for applications that need to read, write,
and rewrite the same data in a le multiple times. In those situations random access les provide the only
reasonable alternative.
Windows and Linux dont differentiate sequential and random access les anymore than the CPU differentiates byte and character values in memory; its up to your application to treat the les as sequential or
random access. As such, you use many of the same functions to manipulate random access les as you use
to manipulate sequential access les; you just use them differently is all.
You still open les with leio.open and leio.openNew. Random access les are generally opened for
reading or reading and writing. You rarely open a random access le as write-only since a program typically
needs to read data if its jumping around in the le.
You still close the les with leio.close.
You can read and write the les with leio.get and leio.put, although you would not normally use these
functions for random access le I/O because each record you read or write has to be exactly the same length
and these functions arent particularly suited for xed-length record I/O. Most of the time you will use one
of the following functions to read and write xed-length data:
fileio.write( fileHandle, buffer, count );
fileio.read( fileHandle, buffer, count );

The leHandle parameter is the usual le handle value (a dword variable). The count parameter is an uns32
object that species how many bytes to read or write. The buffer parameter must be an array object with at
least count bytes. This parameter supplies the address of the rst byte in memory where the I/O transfer will
take place. These functions return the number of bytes read or written in the EAX register. For leio.read, if
the return value in EAX does not equal counts value, then youve reached the end of the le. For
leio.write, if EAX does not equal count then the disk is full.
Here is a typical call to the leio.read function that will read a record from a le:
fileio.read( myHandle, myRecord, @size( myRecord ) );

If the return value in EAX does not equal @size( myRecord ) and it does not equal zero (indicating end of
file) then there is something seriously wrong with the file since the file should contain an integral number of
records.
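The three possible outcomes of a fixed-length read can be sketched as follows. This is only an illustration under stated assumptions: the file name points.bin and the point record type are made up for the example, and the program assumes that the file already exists and holds point records.

```hla
program ReadCheckDemo;
#include( "stdlib.hhf" )

type
    // Hypothetical fixed-length record used only for this sketch.
    point:
        record
            x:int32;
            y:int32;
        endrecord;

static
    myHandle: dword;
    myRecord: point;

begin ReadCheckDemo;

    // "points.bin" is an assumed, already existing file of point records.
    fileio.open( "points.bin", fileio.r );
    mov( eax, myHandle );

    fileio.read( myHandle, (type byte myRecord), @size( myRecord ) );
    if( eax = @size( myRecord )) then

        // A complete record arrived.
        stdout.put( "x=", myRecord.x, " y=", myRecord.y, nl );

    elseif( eax = 0 ) then

        // Nothing was read: we are at the end of the file.
        stdout.put( "End of file" nl );

    else

        // Only part of a record was read, so the file does not
        // contain an integral number of records.
        stdout.put( "Partial record: the file is damaged" nl );

    endif;
    fileio.close( myHandle );

end ReadCheckDemo;
```

In a real program you would normally wrap the read in a loop and leave the loop on either of the last two cases.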
Writing data to a file with fileio.write uses a similar syntax to fileio.read.
You can use fileio.read and fileio.write to read and write data from/to a sequential file, just as you can
use routines like fileio.get and fileio.put to read/write data from/to a random access file. You'd typically use
these routines to read and write data from/to a binary sequential file.
The functions we've discussed to this point don't let you randomly access records in a file. If you call
fileio.read several times in a row, the program will read those records sequentially from the file. To do
true random access I/O we need the ability to jump around in the file. Fortunately, the HLA Standard
Library's file module provides several functions you can use to accomplish this.
The fileio.position function returns the current offset into the file in the EAX register. If you call this
function immediately before reading or writing a record to a file, then this function will tell you the exact
position of that record. You can use this value to quickly locate that record for a future access. The calling
sequence for this function is
fileio.position( fileHandle ); // Returns current file position in EAX.

Beta Draft - Do not distribute

2001, By Randall Hyde

The fileio.seek function repositions the file pointer to the offset you specify as a parameter. The following is the calling sequence for this function:
fileio.seek( fileHandle, offset ); // Repositions file to specified offset.

The function call above will reposition the file pointer to the byte offset specified by the offset parameter. If
you feed this function the value returned by fileio.position, then the next read or write operation will access
the record written (or read) immediately after the fileio.position call.
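As a rough sketch of this save-and-return pattern, the following program writes five four-byte records, remembers the offset of one of them with fileio.position, and later jumps back to it with fileio.seek. The file name position.bin and the variable names are invented for the example; like the other sample programs in this chapter, it assumes that a file created with fileio.openNew can also be read back.

```hla
program PositionDemo;
#include( "stdlib.hhf" )

static
    fileHandle: dword;
    savedPos:   dword;
    value:      uns32;

begin PositionDemo;

    // "position.bin" is a hypothetical scratch file.
    fileio.openNew( "position.bin" );
    mov( eax, fileHandle );

    for( mov( 0, ebx ); ebx < 5; inc( ebx )) do

        // Remember where record #3 starts, just before writing it.
        if( ebx = 3 ) then

            fileio.position( fileHandle );
            mov( eax, savedPos );

        endif;

        // Each record holds its index multiplied by 100.
        intmul( 100, ebx, eax );
        mov( eax, value );
        fileio.write( fileHandle, (type byte value), @size( value ) );

    endfor;

    // Jump back to the saved offset and reread that record.
    fileio.seek( fileHandle, savedPos );
    fileio.read( fileHandle, (type byte value), @size( value ) );
    stdout.put( "Record at saved position: ", value, nl );

    fileio.close( fileHandle );

end PositionDemo;
```

Saving offsets this way works even when the records have different lengths, which is the idea behind the ISAM technique later in this chapter.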
You can pass any arbitrary offset value as a parameter to the fileio.seek routine; this value does not have
to be one that the fileio.position function returns. For random access file I/O you would normally compute
this file offset by multiplying the index of the record you wish to access by the size of the record.
For example, the following code computes the byte offset of record index in the file, repositions the file
pointer to that record, and then reads the record:
intmul( @size( myRecord ), index, ebx );
fileio.seek( fileHandle, ebx );
fileio.read( fileHandle, (type byte myRecord), @size( myRecord ) );

You can use essentially this same code sequence to select a specific record in the file for writing.
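For instance, updating one record in place might look like the sketch below. The file name update.bin, the record contents, and the choice of record #3 are all arbitrary; the program simply reuses the index-times-size computation with fileio.write in place of fileio.read, then reads the record back to show the change.

```hla
program UpdateRecordDemo;
#include( "stdlib.hhf" )

static
    fileHandle: dword;
    value:      uns32;

begin UpdateRecordDemo;

    // Build a hypothetical file of eight four-byte records (0..7).
    fileio.openNew( "update.bin" );
    mov( eax, fileHandle );
    for( mov( 0, ebx ); ebx < 8; inc( ebx )) do

        mov( ebx, value );
        fileio.write( fileHandle, (type byte value), @size( value ) );

    endfor;

    // Overwrite record #3 in place: offset = index * record size.
    mov( 3, ebx );
    intmul( @size( value ), ebx, ecx );
    fileio.seek( fileHandle, ecx );
    mov( 9999, value );
    fileio.write( fileHandle, (type byte value), @size( value ) );

    // Seek to the same record again and read it back.
    intmul( @size( value ), ebx, ecx );
    fileio.seek( fileHandle, ecx );
    mov( 0, value );
    fileio.read( fileHandle, (type byte value), @size( value ) );
    stdout.put( "Record #3 now holds ", value, nl );

    fileio.close( fileHandle );

end UpdateRecordDemo;
```

Note that the write moves the file pointer past the record, which is why the sketch seeks a second time before reading.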
Note that it is not an error to seek beyond the current end of file and then write data. If you do this, the
OS will automatically fill in the intervening records with uninitialized data. Generally, this isn't a great way
to create files, but it is perfectly legal. On the other hand, be aware that if you do this by accident, you may
wind up with garbage in the file and no error to indicate that this has happened.
The fileio module provides another routine for repositioning the file pointer: fileio.rSeek. This function's calling sequence is very similar to that of fileio.seek; it is
fileio.rSeek( fileHandle, offset );

The difference between this function and the regular fileio.seek function is that this function repositions the
file pointer offset bytes from the end of the file (rather than offset bytes from the start of the file). The "r" in
"rSeek" stands for "reverse" seek.
Repositioning the file pointer, especially if you reposition it a fair distance from its current location, can
be a time-consuming process. If you reposition the file pointer and then attempt to read a record from the
file, the system may need to reposition a disk arm (a very slow process) and wait for the data to rotate underneath the disk read/write head. This is why random access I/O is much less efficient than sequential I/O.
The following program demonstrates random access I/O by writing and reading a file of records:

program RandomAccessDemo;
#include( "stdlib.hhf" )

type
    fileRec:
        record
            x:int16;
            y:int16;
            magnitude:uns8;
        endrecord;

const
    // Some arbitrary data we can use to initialize the file:

    fileData:=
        [
            fileRec:[ 2000, 1, 1 ],
            fileRec:[ 1000, 10, 2 ],
            fileRec:[ 750, 100, 3 ],
            fileRec:[ 500, 500, 4 ],
            fileRec:[ 100, 1000, 5 ],
            fileRec:[ 62, 2000, 6 ],
            fileRec:[ 32, 2500, 7 ],
            fileRec:[ 10, 3000, 8 ]
        ];

static
    fileHandle:      dword;
    RecordFromFile:  fileRec;
    InitialFileData: fileRec[ 8 ] := fileData;

begin RandomAccessDemo;

    fileio.openNew( "fileRec.bin" );
    mov( eax, fileHandle );

    // Okay, write the initial data to the file in a sequential fashion:

    for( mov( 0, ebx ); ebx < 8; inc( ebx )) do

        intmul( @size( fileRec ), ebx, ecx ); // Compute index into fileData
        fileio.write
        (
            fileHandle,
            (type byte InitialFileData[ecx]),
            @size( fileRec )
        );

    endfor;

    // Okay, now let's demonstrate a random access of this file
    // by reading the records from the file backwards.

    stdout.put( "Reading the records, backwards:" nl );
    for( mov( 7, ebx ); (type int32 ebx) >= 0; dec( ebx )) do

        intmul( @size( fileRec ), ebx, ecx ); // Compute file offset
        fileio.seek( fileHandle, ecx );
        fileio.read
        (
            fileHandle,
            (type byte RecordFromFile),
            @size( fileRec )
        );
        if( eax = @size( fileRec )) then

            stdout.put
            (
                "Read record #",
                (type uns32 ebx),
                ", values:" nl
                "  x: ", RecordFromFile.x, nl
                "  y: ", RecordFromFile.y, nl
                "  magnitude: ", RecordFromFile.magnitude, nl nl
            );

        else

            stdout.put( "Error reading record number ", (type uns32 ebx), nl );

        endif;

    endfor;
    fileio.close( fileHandle );

end RandomAccessDemo;

Program 7.6 Random Access File I/O Example

7.5

ISAM (Indexed Sequential Access Method) Files


ISAM is a trick that attempts to allow random access to variable-length records in a sequential file. This
is a technique employed by IBM on their mainframe databases in the 1960s and 1970s. Back then, disk
space was very precious (remember why we wound up with the Y2K problem?) and IBM's engineers did
everything they could to save space. At that time disks held about five megabytes or so, were the size of
washing machines, and cost tens of thousands of dollars. You can appreciate why they wanted to make every
byte count. Today, database designers have disk drives with hundreds of gigabytes per drive and RAID [3]
devices with dozens of these drives installed. They don't bother trying to conserve space at all ("Heck, I
don't know how big the person's name can get, so I'll allocate 256 bytes for it!"). Nevertheless, even with
large disk arrays, saving space is often a wise idea. Not everyone has a terabyte (1,000 gigabytes) at their
disposal and a user of your application may not appreciate your decision to waste their disk space. Therefore, techniques like ISAM that can reduce disk storage requirements are still important today.
ISAM is actually a very simple concept. Somewhere, the program saves the offset to the start of every
record in a file. Since offsets are four bytes long, an array of dwords will work quite nicely [4]. Generally, as
you construct the file you fill in the list (array) of offsets and keep track of the number of records in the file.
For example, if you were creating a text file and you wanted to be able to quickly locate any line in the file,
you would save the offset into the file of each line you wrote to the file. The following code fragment shows
how you could do this:
static
    outputLine: string;
    ISAMarray:  dword[ 128*1024 ]; // allow up to 128K records.
        .
        .
        .
    mov( 0, ecx ); // Keep record count here.
    forever

        << create a line of text in "outputLine" >>

        fileio.position( fileHandle );
        mov( eax, ISAMarray[ecx*4] ); // Save away current record offset.
        fileio.put( fileHandle, outputLine, nl ); // Write the record.
        inc( ecx ); // Advance to next element of ISAMarray.

        << determine if we're done and BREAK if we are >>

    endfor;

    << At this point, ECX contains the number of records and >>
    << ISAMarray[0]..ISAMarray[ecx-1] contain the offsets to >>
    << each of the records in the file.                      >>

3. Redundant array of inexpensive disks. RAID is a mechanism for combining lots of cheap disk drives together to form the
equivalent of a really large disk drive.
4. This assumes, of course, that your files have a maximum size of four gigabytes.

After building the file using the code above, you can quickly jump to an arbitrary line of text by fetching
the offset for that line from the ISAMarray list. The following code demonstrates how you could read line
recordNumber from the file:
mov( recordNumber, ebx );
fileio.seek( fileHandle, ISAMarray[ ebx*4 ] );
fileio.a_gets( fileHandle, inputString );

As long as you've precalculated the ISAMarray list, accessing an arbitrary line in this text file is a trivial
matter.
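To see the two fragments working together, here is a small self-contained sketch. It is not taken from a real application: the file name isamdemo.txt, the eight-entry table, and the idea of writing the values 1, 10, 100, and so on (so that every line has a different length) are assumptions made purely for the demonstration. It uses fileio.get rather than fileio.a_gets only because the lines hold numbers.

```hla
program IsamDemo;
#include( "stdlib.hhf" )

static
    fileHandle: dword;
    value:      uns32 := 1;
    u:          uns32;
    ISAMarray:  dword[ 8 ];     // one offset per line of text

begin IsamDemo;

    // "isamdemo.txt" is a hypothetical scratch file.
    fileio.openNew( "isamdemo.txt" );
    mov( eax, fileHandle );

    // Write 1, 10, 100, ... one value per line. Each line is one
    // character longer than the previous one, so a fixed-size record
    // computation could not locate the lines. Save each line's offset
    // just before writing the line.
    for( mov( 0, ecx ); ecx < 8; inc( ecx )) do

        fileio.position( fileHandle );
        mov( eax, ISAMarray[ ecx*4 ] );
        fileio.put( fileHandle, value, nl );

        intmul( 10, value, eax );   // next value = value * 10
        mov( eax, value );

    endfor;

    // Random access: jump straight to line #5 via the saved offset.
    mov( 5, ebx );
    fileio.seek( fileHandle, ISAMarray[ ebx*4 ] );
    fileio.get( fileHandle, u );
    stdout.put( "Line 5 holds ", u, nl );

    fileio.close( fileHandle );

end IsamDemo;
```

The lookup costs one seek and one read no matter how far into the file the line lies; without the table you would have to read and discard every line in front of it.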
Of course, back in the days when IBM programmers were trying to squeeze every byte they could from their databases so they would fit on a five megabyte disk drive, they didn't have 512 kilobytes of RAM to
hold 128K entries in the ISAMarray list. Although a half a megabyte is no big deal today, there are a couple
of reasons why keeping the ISAMarray list in a memory-based array might not be such a good idea. First,
databases are much larger these days. Some databases have hundreds of millions of entries. While setting
aside a half a megabyte for an ISAM table might not be a bad thing, few people are willing to set aside a half
a gigabyte for this purpose. Even if your database isn't amazingly big, there is another reason why you
might not want to keep your ISAMarray in main memory, and it's the same reason you don't keep the file in
memory: memory is volatile and the data is lost whenever the application quits or the user removes power
from the system. The solution is exactly the same as for the file data: you store the ISAMarray data in its
own file. A program that builds the ISAM table while writing the file is a simple modification to the previous ISAM generation program. The trick is to open two files concurrently and write the ISAM data to one
file while you're writing the text to the other file:
static
    fileHandle:    dword;  // file handle for the text file.
    isamHandle:    dword;  // file handle for the ISAM file.
    outputLine:    string; // Holds the line of text to write.
    CurrentOffset: dword;  // Holds the current offset into the text file.
        .
        .
        .
    forever

        << create a line of text in "outputLine" >>

        // Get the offset of the next record in the text file
        // and write this offset (sequentially) to the ISAM file.

        fileio.position( fileHandle );
        mov( eax, CurrentOffset );
        fileio.write( isamHandle, (type byte CurrentOffset), 4 );

        // Okay, write the actual text data to the text file:

        fileio.put( fileHandle, outputLine, nl ); // Write the record.

        << determine if we're done and BREAK if we are >>

    endfor;

If necessary, you can count the number of records as before. You might write this value to the first record of
the ISAM file (since you know the first record of the text file is always at offset zero, you can use the first
element of the ISAM list to hold the count of ISAM/text file records).
Since the ISAM file is just a sequence of four-byte integers, each record in the file (i.e., an integer) has
the same length. Therefore, we can easily access any value in the ISAM file using the random access file I/O
mechanism. In order to read a particular line of text from the text file, the first task is to read the offset from
the ISAM file and then use that offset to read the desired line from the text file. The code to accomplish this
is as follows:
// Assume we want to read the line specified by the "lineNumber" variable.

if( lineNumber <> 0 ) then

    // If not record number zero, then fetch the offset to the desired
    // line from the ISAM file:

    intmul( 4, lineNumber, eax ); // Compute the index into the ISAM file.
    fileio.seek( isamHandle, eax );
    fileio.read( isamHandle, (type byte CurrentOffset), 4 ); // Read offset

else

    mov( 0, CurrentOffset ); // Special case for record zero because the file
                             // contains the record count in this position.

endif;
fileio.seek( fileHandle, CurrentOffset ); // Set text file position.
fileio.a_gets( fileHandle, inputLine );   // Read the line of text.

This operation runs at about half the speed of having the ISAM array in memory (since it takes four file
accesses rather than two to read the line of text from the file), but the data is non-volatile and is not limited
by the amount of available RAM.
If you decide to use a memory-based array for your ISAM table, it's still a good idea to keep that data in
a file somewhere so you don't have to recompute it (by reading the entire file) every time your application
starts. If the data is present in a file, all you've got to do is read that file data into your ISAMarray list.
Assuming you've stored the number of records in element number zero of the ISAM array, you could use the
following code to read your ISAM data into the ISAMarray variable:
static
    isamSize:   uns32;
    isamHandle: dword;
    fileHandle: dword;
    ISAMarray:  dword[ 128*1024 ];
        .
        .
        .
    // Read the first record of the ISAM file into the isamSize variable:

    fileio.read( isamHandle, (type byte isamSize), 4 );

    // Now read the remaining data from the ISAM file into the ISAMarray
    // variable:

    if( isamSize >= 128*1024 ) then

        raise( ex.ValueOutOfRange );

    endif;
    intmul( 4, isamSize, ecx ); // #records * 4 is number of bytes to read.
    fileio.read( isamHandle, (type byte ISAMarray), ecx );

    // At this point, ISAMarray[0]..ISAMarray[isamSize-1] contain the indexes
    // into the text file for each line of text.

7.6

Truncating a File
If you open an existing file (using fileio.open) for output and write data to that file, it overwrites the
existing data from the start of the file. However, if the new data you write to the file is shorter than the data
originally appearing in the file, the excess data from the original file, beyond the end of the new data you've
written, will still appear at the end of the new data. Sometimes this might be desirable, but most of the time
you'll want to delete the old data after writing the new data.
One way to delete the old data is to use the fileio.openNew function to open the file. The fileio.openNew
function automatically deletes any existing file so only the data you write to the file will be present in the
file. However, there may be times when you want to read the old data first, rewind the file, and then
overwrite the data. In this situation, you'll need a function that will truncate the old data at the end of the file
after you've written the new data. The fileio.truncate function accomplishes this task. This function uses the
following calling syntax:
fileio.truncate( fileHandle );

Note that this function does not close the file. You still have to call fileio.close to commit the data to the disk.
The following sample program demonstrates the use of the fileio.truncate function:

program TruncateDemo;
#include( "stdlib.hhf" )

static
    fileHandle:dword;
    u:uns32;

begin TruncateDemo;

    fileio.openNew( "myfile.txt" );
    mov( eax, fileHandle );
    for( mov( 0, ecx ); ecx < 20; inc( ecx )) do

        fileio.put( fileHandle, (type uns32 ecx), nl );

    endfor;

    // Okay, let's rewind to the beginning of the file and
    // rewrite the first ten lines and then truncate the
    // file at that point.

    fileio.rewind( fileHandle );
    for( mov( 0, ecx ); ecx < 10; inc( ecx )) do

        fileio.put( fileHandle, (type uns32 ecx), nl );

    endfor;
    fileio.truncate( fileHandle );

    // Rewind and display the file contents to ensure that
    // the file truncation has worked.

    fileio.rewind( fileHandle );
    while( !fileio.eof( fileHandle )) do

        // Read and display the next item from the file:

        fileio.get( fileHandle, u );
        stdout.put( "u=", u, nl );
        fileio.readLn( fileHandle );

    endwhile;
    fileio.close( fileHandle );

end TruncateDemo;

Program 7.7 Using fileio.truncate to Eliminate Old Data From a File

7.7

File Utility Routines


The following subsections describe fileio functions that manipulate files or return meta-information
about files (e.g., the file size and attributes).

7.7.1 Copying, Moving, and Renaming Files


Some very useful file utilities are copying, moving, and renaming files. For example, you might want to
copy a file an application has created in order to make a backup copy. Moving files from one subdirectory to
another, or even from one disk to another, is another common operation. Likewise, the need to change the
name of a file arises all the time. In this section we'll take a look at the HLA Standard Library routines that
accomplish these operations.
Copying a file is a nearly trivial process under Windows [5]. All you've got to do is open a source file,
open a destination file, then read the bytes from the source file and write them to the destination file until you
hit end of file. Unfortunately, this simple approach to copying a file can suffer from performance problems.
Windows provides an internal function to copy files using a high performance algorithm (Linux does not
provide this call). The HLA Standard Library fileio.copy function provides an interface to this copy operation. To copy a file using the fileio.copy procedure, you'd use the following call sequence:
fileio.copy( sourceFileName, destFileName, failIfExists );

The sourceFileName and destFileName parameters are strings that specify the pathnames of the source and
destination files. These can be string constants or variables. The last parameter is a boolean value that
specifies what should happen if the destination file exists. If this parameter contains true and the file already
exists, then the function will fail; if failIfExists is false, the fileio.copy routine will replace the existing destination file with a copy of the source file. In either case, of course, the source file must exist or this function
will fail. This function returns a boolean success/failure result in the EAX register: true if the copy
succeeded and false if it failed.

5. Sorry, the fileio.copy function is not available under Linux.

Program 7.8 demonstrates the use of this function to copy a file:

program CopyDemo;
#include( "stdlib.hhf" )

begin CopyDemo;

    // Make a copy of myfile.txt to itself to demonstrate
    // a true "failIfExists" parameter.

    if( !fileio.copy( "myfile.txt", "myfile.txt", true )) then

        stdout.put( "Did not copy 'myfile.txt' over itself" nl );

    else

        stdout.put( "Whoa! The failIfExists parameter didn't work." nl );

    endif;

    // Okay, make a copy of the file to a different file, to verify
    // that this works properly:

    if( fileio.copy( "myfile.txt", "copyOfMyFile.txt", false )) then

        stdout.put( "Successfully copied the file" nl );

    else

        stdout.put( "Failed to copy the file (maybe it doesn't exist?)" nl );

    endif;

end CopyDemo;

Program 7.8 Demonstration of a fileio.copy Operation

To move a file from one location to another might seem like another trivial task: all you've got to do is
copy the file to the destination and then delete the original file. However, this scheme is quite inefficient in
most situations. Copying the file can be an expensive process if the file is large; worse, the move operation
may fail if you're moving the file to a new location on the same disk and there is insufficient space for a second copy of the file. A much better solution is to simply move the file's directory entry from one location to
another on the disk. Win32's disk directory entries are quite small, so moving a file to a different location on
the same disk by simply moving its directory entry is very fast and efficient. Unfortunately, if you move a
file from one file system (disk) to another, you will have to first copy the file and then delete the original file.
Once again, you don't have to bother with the complexities of this operation because Windows has a built-in
function that automatically moves files for you. The HLA Standard Library's fileio.move procedure provides
a direct interface to this function (available only under Windows). The calling sequence is
fileio.move( source, dest );


The two parameters are strings providing the source and destination filenames. This function returns true or
false in EAX to denote the success or failure of the operation.
Not only can the fileio.move procedure move a file around on the disk, it can also move subdirectories
around. The only catch is that you cannot move a subdirectory from one volume (file system/disk) to
another.
If both the destination and source filenames are simple filenames, not pathnames, then the fileio.move
function moves the source file from the current directory back to the current directory. Although this seems
rather weird, this is a very common operation; this is how you rename a file. The HLA Standard Library
does not have a separate "fileio.rename" function. Instead, you use the fileio.move function to rename files
by moving them to the same directory but with a different filename. Program 7.9 demonstrates how to use
fileio.move in this capacity.

program FileMoveDemo;
#include( "stdlib.hhf" )

begin FileMoveDemo;

    // Rename the "myfile.txt" file to the name "renamed.txt".

    if( !fileio.move( "myfile.txt", "renamed.txt" )) then

        stdout.put
        (
            "Could not rename 'myfile.txt' (maybe it doesn't exist?)" nl
        );

    else

        stdout.put( "Successfully renamed the file" nl );

    endif;

end FileMoveDemo;

Program 7.9 Using fileio.move to Rename a File

7.7.2 Computing the File Size


Another useful function to have is one that computes the size of an existing file on the disk. The
fileio.size function provides this capability. The calling sequences for this function are
fileio.size( filenameString );
fileio.size( fileHandle );

The first form above expects you to pass the filename as a string parameter. The second form expects a handle to a file you've opened with fileio.open or fileio.openNew. These two calls return the size of the file in
EAX. If an error occurs, these functions return -1 ($FFFF_FFFF) in EAX. Note that the files must be less
than four gigabytes in length when using this function (if you need to check the size of larger files, you will
have to call the appropriate OS function rather than these functions; however, since files larger than four
gigabytes are rather rare, you probably won't have to worry about this problem).

One interesting use for this function is to determine the number of records in a fixed-length-record random access file. By getting the size of the file and dividing by the size of a record, you can determine the
number of records in the file.
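The following sketch shows the computation for the simplest case, a file of four-byte records, where the division reduces to a shift right by two bits. The file name count.bin and the record contents are invented for the example; for a record size that is not a power of two you would divide by the record size instead of shifting.

```hla
program RecordCountDemo;
#include( "stdlib.hhf" )

static
    fileHandle: dword;
    value:      dword;

begin RecordCountDemo;

    // Build a small (hypothetical) file of ten four-byte records.
    fileio.openNew( "count.bin" );
    mov( eax, fileHandle );
    for( mov( 0, ebx ); ebx < 10; inc( ebx )) do

        mov( ebx, value );
        fileio.write( fileHandle, (type byte value), 4 );

    endfor;
    fileio.close( fileHandle );

    // File size in bytes divided by the record size (4) gives the
    // record count. Check for the -1 error value before dividing.
    fileio.size( "count.bin" );
    if( eax <> -1 ) then

        shr( 2, eax );      // eax := size / 4
        stdout.put( "Records in file: ", (type uns32 eax), nl );

    else

        stdout.put( "Error calculating file size" nl );

    endif;

end RecordCountDemo;
```

A program that appends records can use the same value as the index of the next record to write.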
Another use for this function is to allow you to determine the size of a (smaller) file, allocate sufficient
storage to hold the entire file in memory (by using malloc), and then read the entire file into memory using
the fileio.read function. This is generally the fastest way to read data from a file into memory.
Program 7.10 demonstrates the use of the two forms of the fileio.size function by displaying the size of
its own source file, FileSizeDemo.hla.

program FileSizeDemo;
#include( "stdlib.hhf" )

static
    handle:dword;

begin FileSizeDemo;

    // Display the size of the "FileSizeDemo.hla" file:

    fileio.size( "FileSizeDemo.hla" );
    if( eax <> -1 ) then

        stdout.put( "Size of file: ", (type uns32 eax), nl );

    else

        stdout.put( "Error calculating file size" nl );

    endif;

    // Same thing, using the file handle as a parameter:

    fileio.open( "FileSizeDemo.hla", fileio.r );
    mov( eax, handle );
    fileio.size( handle );
    if( eax <> -1 ) then

        stdout.put( "Size of file(2): ", (type uns32 eax), nl );

    else

        stdout.put( "Error calculating file size" nl );

    endif;
    fileio.close( handle );

end FileSizeDemo;

Program 7.10 Sample Program That Demonstrates the fileio.size Function


7.7.3 Deleting Files


Another useful file utility function is the fileio.delete function. As its name suggests, this function
deletes a file that you specify as the function's parameter. The calling sequence for this function is
fileio.delete( filenameToDelete );

The single parameter is a string containing the pathname of the file you wish to delete. This function returns
true/false in the EAX register to denote success/failure.
Program 7.11 provides an example of the use of the fileio.delete function.

program DeleteFileDemo;
#include( "stdlib.hhf" )

static
    handle:dword;

begin DeleteFileDemo;

    // Delete the file named "xyz":

    fileio.delete( "xyz" );
    if( eax ) then

        stdout.put( "Deleted the file", nl );

    else

        stdout.put( "Error deleting the file" nl );

    endif;

end DeleteFileDemo;

Program 7.11 Example Usage of the fileio.delete Procedure

7.8

Directory Operations
In addition to manipulating files, you can also manipulate directories with some of the fileio functions.
The HLA Standard Library includes several functions that let you create and use subdirectories. These functions are fileio.cd (change directory), fileio.gwd (get working directory), and fileio.mkdir (make directory).
Their calling sequences are
fileio.cd( pathnameString );
fileio.gwd( stringToHoldPathname );
fileio.mkdir( newDirectoryName );

The fileio.cd and fileio.mkdir functions return success or failure (true or false, respectively) in the EAX register. For the fileio.gwd function, the string parameter is a destination string where the system will store the
pathname to the current directory. You must allocate sufficient storage for the string prior to passing the
string to this function (260 characters [6] is a good default amount if you're unsure how long the pathname
could be). If the actual pathname is too long to fit in the destination string you supply as a parameter, the
fileio.gwd function will raise the ex.StringOverflow exception.
The fileio.cd function sets the current working directory to the pathname you specify. After calling this
function, the OS will assume that all future unadorned file references (those without any "\" or "/" characters in the pathname) will default to the directory you specify as the fileio.cd parameter. Proper use of this
function can help make your program much more convenient to use by your program's users since they
won't have to enter full pathnames for every file they manipulate.
The fileio.gwd function lets you query the system to determine the current working directory. After a
call to fileio.cd, the string that fileio.gwd returns should be the same as fileio.cd's parameter. Typically, you
would use this function to keep track of the default directory when your program first starts running. Your
program will exhibit good manners by switching back to this default directory when your program terminates.
The fileio.mkdir function lets your program create a new subdirectory. If your program creates data files
and stores them in a default directory somewhere, it's good etiquette to let the user specify the subdirectory
where your program should put these files. If you do this, you should give your users the option to create a
new directory (in case they want the data placed in a brand-new directory). You can use fileio.mkdir for this
purpose.
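The three directory functions are typically used together, roughly as in the sketch below. Several details here are assumptions rather than anything this chapter states: the directory name demoData is invented, and the startDir declaration relies on the Standard Library's str.strvar macro to reserve static storage for a 260-character string. If your version of the library lacks that macro, allocate the string some other way before calling fileio.gwd.

```hla
program WorkingDirDemo;
#include( "stdlib.hhf" )

static
    // Assumption: str.strvar reserves static storage for a string
    // of up to 260 characters.
    startDir: str.strvar( 260 );

begin WorkingDirDemo;

    // Remember the directory the program started in.
    fileio.gwd( startDir );
    stdout.put( "Started in: ", startDir, nl );

    // Create the (hypothetical) data directory. The result is ignored
    // because the directory may already exist from an earlier run.
    fileio.mkdir( "demoData" );

    fileio.cd( "demoData" );
    if( eax ) then

        stdout.put( "Now working in demoData" nl );

        // ... create or open data files here using simple filenames ...

        // Good manners: restore the original directory before quitting.
        fileio.cd( startDir );

    else

        stdout.put( "Could not change to demoData" nl );

    endif;

end WorkingDirDemo;
```

Saving the starting directory first means the program can always get back to it, whatever directory the user asks it to work in.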

7.9

Putting It All Together


This chapter began with a discussion of the basic file operations. That section was rather short because
you've already learned most of what you need to know about file I/O when learning the stdout and stdin
functions. So the introductory material concentrated on a few general file concepts (like the differences
between sequential and random access files and the differences between binary and text files). After teaching you the few extra routines you need in order to open and close files, the remainder of this chapter simply
concentrated on providing a few examples (like ISAM) of file access and a discussion of the fileio routines
available in the HLA Standard Library.
While this chapter demonstrates the mechanics of file I/O, how you efficiently use files is well beyond
the scope of this chapter. In future volumes you will see how to search for data in files, sort data in files, and
even create databases. So keep on reading if you're interested in more information about file operations.

6. This is the default MAX_PATH value in Windows. This is probably sufficient for most Linux applications, too.
