18IS61 FSmodule1 Notes

The document discusses file structures and fundamental file operations. It covers topics like file structure design goals, a history of file structure design, physical and logical files, opening, closing, reading and writing files. It also discusses file access modes, protection modes, and input/output in UNIX.

Uploaded by

Sybil
Copyright
© © All Rights Reserved

FILE STRUCTURES 18IS61

TABLE OF CONTENTS
Module 1- Chapter 1, 2,3,4,5
INTRODUCTION: FILE STRUCTURES, FUNDAMENTAL FILE OPERATIONS, SECONDARY
STORAGE AND SYSTEM SOFTWARE, FUNDAMENTAL FILE STRUCTURE CONCEPTS,
MANAGING FILES OF RECORDS
Chapter1: INTRODUCTION
1.1 The Heart of the file structure Design
1.2 A Short History of File Structure Design
1.3 A Conceptual Toolkit
2.1 Physical Files and Logical Files
2.2 Opening Files
2.3 Closing Files
2.4 Reading and Writing
2.5 Seeking
2.6 Special Characters in Files
2.7 The Unix Directory Structure
2.8 Physical devices and Logical Files
2.9 File-related Header Files
2.10 UNIX file System Commands
3.1 Disks
3.2 Magnetic Tape
3.3 Disk versus Tape
3.4 Introduction to CD-ROM
3.5 Physical Organization of CD-ROM
3.6 CD-ROM Strengths and Weaknesses
3.7 Storage as Hierarchy
3.8 A journey of a Byte
3.9 Buffer Management
3.10 Input /Output in UNIX.
4.1 Field and Record Organization
4.2 Using Classes to Manipulate Buffers
4.3 Using Inheritance for Record Buffer Classes
4.4 Managing Fixed Length, Fixed Field Buffers
4.5 An Object-Oriented Class for Record Files
5.1 Record Access
5.2 More about Record Structures
5.3 Encapsulating Record Operations in a Single Class
5.4 File Access and File Organization.

DEPT OF ISE

Chapter: Introduction to the Design and Specification of File Structures

1.1 The Heart of the file structure Design


File : A data structure on secondary storage which acts as a non-volatile
container for data. File is a name given to any kind of document stored in
any type of storage device which can be read by the computer. A file is
identified by a name followed by a filename extension.

File Structure : A pattern for arranging data in a file. It is a combination of representations
for data in files and of operations for accessing the data.
Primary Goals for Design of File Structures and Algorithms
1) Minimize the number of disk accesses.
• If possible, transfer all information needed in one access.
• Group related information physically so it can be accessed together.
2) Maximize the space utilization
• Use compression techniques wherever possible
• Apply defragmentation procedures
• Avoid data redundancy
Problems and Concerns
• File data is frequently dynamic - that is, it changes from time to time.
• Designing file structures for changes adds complexity.
• Typical file sizes are growing.
• Solutions which work for small files may be inadequate for large files.
• File structures, algorithms, and data structures must work together.

1.2 A Short History of File Structure Design


• Earlier, the file access was sequential, and the cost of access grew in direct
proportion to the size of the file. So, Indexes were added to files.
• Indexes made it possible to keep a list of keys and pointers in a smaller file that could
be searched more quickly.
• Simple indexes became difficult to manage for dynamic files in which the set of keys
changes. Hence tree structures were introduced.
• Trees grew unevenly as records were added and deleted, resulting in long searches
requiring multiple disk accesses to find a record. Hence an elegant, self-adjusting
binary tree structure called an AVL tree was developed for data in memory.


• Even with a balanced binary tree, dozens of accesses were required to find a record
in moderate-sized files.
• A method was needed to keep a tree balanced when each node of the tree was not a
single record, as in a binary tree, but a file block containing hundreds of records.
Hence, B-Trees were introduced.
• AVL trees grow from top down as records are added, B-Trees grow from the bottom
up.
• B-Trees provided excellent access performance but, a file could not be accessed
sequentially with efficiency.
• The above problem was solved using B+ tree which is a combination of a B-Tree and
a sequential linked list added at the bottom level of the B-Tree.
• To further reduce the number of disk accesses, hashing was introduced for files that
do not change size greatly over time.
• Extendible, dynamic hashing was introduced for volatile, dynamic files which
change.

CHAPTER 2: FUNDAMENTAL FILE OPERATIONS

2. Fundamental File Processing Operations


2.1 Physical Files and Logical Files
Physical file
A file as seen by the operating system, and which actually exists on secondary
storage.
Logical file
A file as seen by a program.

• Programs read and write data from logical files.


• Before a logical file can be used, it must be associated with a physical file.
• This act of connection is called "opening" the file.
• Data in a physical file is persistent.
• Data in a logical file is temporary.
• A logical file is identified (within the program) by a program variable or constant.
• The name and form of the physical file are dependent on the operating system, not
on the programming language.


2.2 Opening Files


Open: To associate a logical program file with a physical system file.
We have two options: 1) open an existing file or 2) Create a new file, deleting any
existing contents in the physical file.
Opening a file makes it ready for use by the program

The open function (a UNIX system call available from C and C++) is used to open a file.


The open function must be supplied with (as arguments):
o The name of the physical file
o The access mode
o For new files, the protection mode
The value returned by open is the file descriptor (fd), which is assigned to the file variable.
Function to open a file:
fd = open(filename, flags [, pmode]);
fd (file descriptor)
A cardinal number used as the identifier for a logical file by operating systems such as
UNIX and PC-DOS.
For handle level access, the logical file is declared as an int.
The handle is also known as a file descriptor.
Prototypes:
int open (const char* Filename, int Access);
int open (const char* Filename, int Access, int Protection);


Example:
int Input;
Input = open ("Daily.txt", O_RDONLY);
The following flags can be bitwise ORed together for the access mode:
O_RDONLY : Read only
O_WRONLY : Write only
O_RDWR : Read or write
O_CREAT : Create file if it does not exist
O_EXCL : Return an error if the file already exists (used only with O_CREAT)
O_APPEND : Append every write operation to the end of the file
O_TRUNC : If the file exists, truncate it to a length of zero, destroying its prior contents
Pmode- protection mode
The security status of a file, defining who is allowed to access a file, and which access
modes are allowed.

• Supported protection modes depend on the operating system, not on the
programming language.
• DOS supports protection modes of:
o Read only
o Hidden
o System
for all users of the system.
• UNIX supports protection modes of:
o Readable
o Writable
o Executable
for users in three categories:
o Owner (Usually the user who created the file)
o Group (members of the same group as the owner)
o World (all valid users of the system)
• Windows supports protection modes of:
o Readable
o Modifiable
o Writable
o Executable

for users who can be designated individually or in groups.


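In Unix, the pmode is a three digit octal number that indicates how the file can be used by
the owner (first digit), by members of the owner's group (second digit), and by everyone
else (third digit). For example, if pmode is 0751, it is interpreted as read, write and execute
permission for the owner (7 = 111), read and execute permission for the group (5 = 101), and
execute permission only for everyone else (1 = 001), where the three bits of each digit stand
for read, write and execute.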

Example:
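As a minimal sketch of how a protection mode is passed to open, assuming a POSIX system: the helper name create_with_pmode is hypothetical and not part of the notes, and the process umask may remove some of the requested permission bits.

```cpp
#include <cassert>
#include <fcntl.h>     // open and the O_* flags
#include <sys/stat.h>  // mode_t
#include <unistd.h>    // close

// Hypothetical helper (not from the notes): create or truncate a file for
// reading and writing, requesting protection mode 0751.
// Returns a file descriptor, or -1 if the open fails.
int create_with_pmode(const char* filename)
{
    assert(filename != nullptr);
    // 0751 is octal: owner rwx (7), group r-x (5), everyone else --x (1).
    // The process umask may clear some of the requested bits.
    return open(filename, O_RDWR | O_CREAT | O_TRUNC, 0751);
}
```

The leading 0 marks the constant as octal in C and C++; writing 751 without it would request a different set of permission bits.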

2.3 Closing Files


close
To disassociate a logical program file from a physical system file.

• Closing a file frees system resources for reuse.


• Data may not be actually written to the physical file until a logical file is closed.
• A program should close a file when it is no longer needed.
The C++ close function is used to close a file for handle level access.
The handle close function must be supplied with (as an argument):
o The handle of the logical file
The value returned by the close is 0 if the close succeeds, and -1 if the close fails.
Prototypes:
int close (int Handle);
Example:
close (Input);

2.4 Reading and Writing


read
To transfer data from a file to program variable(s).
write
To transfer data to a file from program variable(s) or constant(s).
• The read and write operations are performed on the logical file with calls to library
functions.
• For read, one or more variables must be supplied to the read function, to receive the
data from the file.


• For write, one or more values (as variables or constants) must be supplied to the
write function, to provide the data for the file.
• For unformatted transfers, the amount of data to be transferred must also be
supplied.
2.4.1 Read and Write Functions
Reading
• The C++ read function is used to read data from a file for handle level access.
• The read function must be supplied with (as arguments):
o The handle of the source file to read from
o The address of the memory block into which the data will be stored
o The number of bytes to be read(byte count)
• The value returned by the read function is the number of bytes read.
Read function:

Prototypes:
int read (int Handle, void* Buffer, unsigned Length);
Example:
read (Input, &C, 1);
Writing
• The C++ write function is used to write data to a file for handle level access.
• The handle write function must be supplied with (as arguments):
o The handle of the logical file used for sending data
o The address of the memory block from which the data will be written
o The number of bytes to be written
• The value returned by the write function is the number of bytes written.
Write function:

Prototypes:
int write (int Handle, void* Buffer, unsigned Length);
Example:
write (Output, &C, 1);
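The handle level calls above can be combined into a small file copy. This is only a sketch under the assumption of a POSIX system; the function name copy_file, the 4096-byte buffer and the 0644 protection mode are illustrative choices, not taken from the notes.

```cpp
#include <cassert>
#include <fcntl.h>   // open and the O_* flags
#include <unistd.h>  // read, write, close

// Hypothetical helper: copy src to dst one block at a time using
// handle level I/O. Returns the number of bytes copied, or -1 on error.
long copy_file(const char* src, const char* dst)
{
    assert(src != nullptr && dst != nullptr);
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return -1; }

    char buffer[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(in, buffer, sizeof buffer)) > 0) {
        ssize_t done = 0;
        while (done < n) {  // write may transfer fewer bytes than requested
            ssize_t w = write(out, buffer + done, n - done);
            if (w < 0) { close(in); close(out); return -1; }
            done += w;
        }
        total += n;
    }
    close(in);
    close(out);
    return n < 0 ? -1 : total;  // read returns 0 at end of file, -1 on error
}
```

Note that read returning 0 is how end of file shows up at this level; there is no separate end-of-file test.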
2.4.2 Files with C Streams and C++ Stream Classes
• For FILE level access, the logical file is declared as a pointer to a FILE (FILE*)
• The FILE structure is defined in the stdio.h header file.
Opening

The C++ fopen function is used to open a file for FILE level access.
• The FILE fopen function must be supplied with (as arguments):
o The name of the physical file
o The access mode
• The value returned by the fopen is a pointer to an open FILE, and is assigned to the
file variable.
fopen function:

Prototypes:
FILE* fopen (const char* Filename, const char* Access);
Example:
FILE * Input;
Input= fopen ("Daily.txt", "r");
The access mode should be one of the following strings:
r : Open for reading (existing file only) in text mode
r+ : Open for update (existing file only)
w : Open (or create) for writing (and delete any previous data)
w+ : Open (or create) for update (and delete any previous data)
a : Open (or create) for append with file pointer at current EOF (and keep any previous data) in text mode
a+ : Open (or create) for append update (and keep any previous data)
Closing
The C++ fclose function is used to close a file for FILE level access.
The FILE fclose function must be supplied with (as an argument):
o A pointer to the FILE structure of the logical file
The value returned by the fclose is 0 if the close succeeds, and nonzero (EOF) if the close fails.
Prototypes:
int fclose (FILE * Stream);
Example:
fclose (Input);

Reading
The C++ fread function is used to read data from a file for FILE level access.
The FILE fread function must be supplied with (as arguments, in prototype order):
o The address of the buffer into which the data will be read
o The size of each item to be read, in bytes
o The number of items to be read
o A pointer to the FILE structure of the logical file
The value returned by the fread function is the number of items read.
Prototypes:
size_t fread (void* Buffer, size_t Size, size_t Count, FILE* Stream);
Example:
fread (&C, 1, 1, Input);
Writing
The C++ fwrite function is used to write data to a file for FILE level access.
The FILE fwrite function must be supplied with (as arguments, in prototype order):
o The address of the buffer from which the data will be written
o The size of each item to be written, in bytes
o The number of items to be written
o A pointer to the FILE structure of the logical file
The value returned by the fwrite function is the number of items written.
Prototypes:
size_t fwrite (void* Buffer, size_t Size, size_t Count, FILE* Stream);
Example:
fwrite (&C, 1, 1, Output);
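A short sketch of FILE level access that writes and reads whole items rather than single characters. The Reading record type and the two function names are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical fixed-size record used only for this illustration.
struct Reading { int id; double value; };

// Write count records to filename; true if every item was written.
bool save_readings(const char* filename, const Reading* items, std::size_t count)
{
    assert(filename != nullptr);
    std::FILE* out = std::fopen(filename, "wb");
    if (!out) return false;
    // fwrite returns the number of items (not bytes) actually written.
    std::size_t written = std::fwrite(items, sizeof(Reading), count, out);
    bool closed = (std::fclose(out) == 0);
    return closed && written == count;
}

// Read up to max_count records; returns the number of items read.
std::size_t load_readings(const char* filename, Reading* items, std::size_t max_count)
{
    assert(filename != nullptr);
    std::FILE* in = std::fopen(filename, "rb");
    if (!in) return 0;
    std::size_t n = std::fread(items, sizeof(Reading), max_count, in);
    std::fclose(in);
    return n;
}
```

Because the return values count items, asking for four records from a file that holds two yields 2, which is how a short file is detected.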
2.4.3 Programs in C++ to Display the contents of a File
The first simple file processing program opens a file for input and reads it, character by
character, sending each character to the screen after it is read from the file. This program
includes the following steps
1. Display a prompt for the name of the input file.
2. Read the user's response from the keyboard into a variable called filename.
3. Open the file for input.
4. While there are still characters to be read from the input file,
• Read a character from the file;
• Write the character to the terminal screen.
5. Close the input file.
Figures 2.2 and 2.3 are C++ implementations of this program using C streams and C++ stream
classes, respectively.

In the C++ version, the call file.unsetf(ios::skipws) causes operator>> to include white
space (blanks, end-of-line, tabs, and so on).
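The figures are not reproduced in these notes, so the following is only a sketch of steps 3 to 5 of the listing program using C++ stream classes. The function name list_file and its return convention are assumptions made for illustration; prompting for the file name (steps 1 and 2) is left to the caller.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>

// Hypothetical helper: copy a file to the given output stream character by
// character. Returns the number of characters displayed, or -1 if the file
// cannot be opened.
long list_file(const char* filename, std::ostream& screen)
{
    assert(filename != nullptr);
    std::ifstream file(filename);      // step 3: open the file for input
    if (!file) return -1;
    file.unsetf(std::ios::skipws);     // make operator>> deliver white space too
    long count = 0;
    char ch;
    while (file >> ch) {               // step 4: loop while a character was read
        screen << ch;
        ++count;
    }
    file.close();                      // step 5: close the input file
    return count;
}
```

A caller would typically read the file name with std::cin and pass std::cout as the screen.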
2.4.4 Detecting End of File
end-of-file
A physical location just beyond the last datum in a file.


• The acronym for end-of-file is EOF.


• When a file reaches EOF, no more data can be read.
• Data can be written at or past EOF.
• Some access methods set the end-of-file flag after a read reaches the end-of-file
position.
• Other access methods set the end-of-file flag after a read attempts to read beyond the
end-of-file position.
Detecting End of File
The C++ feof function is used to detect when the file pointer of a FILE stream is past end of
file.
The FILE feof function has one argument:
o A pointer to the FILE structure of the logical file
The value returned by the feof function is nonzero if end of file is true and 0 if end of file is
false.
Prototypes:
int feof (FILE * Stream);
Example:
if (feof (Input))
cout << "End of File\n";
In some languages, a function end_of_file can be used to test for end-of-file. The OS
keeps track of the read/write pointer. The end_of_file function queries the system to see whether the
read/write pointer has moved past the last element in the file.
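With C streams the end-of-file flag is set only after a read has tried to go past the last byte, so a common pattern is to loop on the read itself and call feof afterwards to tell end of file apart from an error. A minimal sketch, with the hypothetical helper name count_bytes:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: count the bytes in a file. Returns -1 if the file
// cannot be opened or if reading stopped for a reason other than end of file.
long count_bytes(const char* filename)
{
    assert(filename != nullptr);
    std::FILE* in = std::fopen(filename, "rb");
    if (!in) return -1;
    long count = 0;
    int c;
    while ((c = std::fgetc(in)) != EOF)   // EOF is returned at end of file or on error
        ++count;
    bool reached_eof = (std::feof(in) != 0);  // nonzero only after a read hit end of file
    std::fclose(in);
    return reached_eof ? count : -1;
}
```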

2.5 Seeking
The action of moving directly to a certain position in a file is called seeking.
seek
To move to a specified location in a file.
byte offset
The distance, measured in bytes, from the beginning of the file.
• Seeking moves an attribute in the file called the file pointer.
• C++ library functions allow seeking.
• In DOS, Windows, and UNIX, files are organized as streams of bytes, and locations
are in terms of byte count.
• Seeking can be specified from one of three reference points:
o The beginning of the file.
o The end of the file.

o The current file pointer position.


A seek requires two arguments

Example

2.5.1 Seeking with C Streams


fseek function:

The C++ fseek function is used to move the file pointer of a file identified by its FILE
structure.
The FILE fseek function must be supplied with (as arguments):
o A pointer to the FILE structure of the file (file)
o The number of bytes to move from some origin in the file (byte_offset)
o The starting point from which the byte_offset is to be taken (origin)
The Origin argument should be one of the following, to designate the reference point:
SEEK_SET: Beginning of file
SEEK_CUR: Current file position
SEEK_END: End of file
The value returned by the fseek function is 0 if the seek succeeds and nonzero if it fails. The
position of the read/write pointer from the beginning of the file after it has moved can be
obtained with ftell.
Prototypes:
int fseek (FILE * file, long Offset, int Origin);
long ftell (FILE * file);
Example:
long pos;
fseek (Output, 100, SEEK_SET);
pos = ftell (Output);
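A small sketch of seeking with C streams, using two hypothetical helpers (file_size and byte_at) that are not part of the notes. It relies only on the standard fseek, ftell and fgetc functions.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: size of an open file in bytes, found by seeking to
// the end and asking for the position. Returns -1 if the seek fails.
long file_size(std::FILE* file)
{
    assert(file != nullptr);
    if (std::fseek(file, 0L, SEEK_END) != 0) return -1;
    return std::ftell(file);
}

// Hypothetical helper: the byte stored at a given offset from the beginning
// of the file, or EOF if the seek fails or the offset is past the last byte.
int byte_at(std::FILE* file, long offset)
{
    assert(file != nullptr);
    if (std::fseek(file, offset, SEEK_SET) != 0) return EOF;
    return std::fgetc(file);
}
```

Seeking to an offset equal to the file size succeeds, but the following read reports end of file, which matches the definition of end-of-file as the position just beyond the last datum.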

2.5.2 Seeking with C++ Stream Classes


In C++, an object of type fstream has 2 file pointers:a get pointer for input and a put
pointer for output. Two functions for seeking are
seekg: moves get pointer
seekp: moves put pointer


syntax for seek operations:
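The original syntax listing is missing from these notes. As a hedged sketch, the standard forms are file.seekg(byte_offset, origin) and file.seekp(byte_offset, origin), where origin is ios::beg, ios::cur or ios::end. The helper replace_byte below is hypothetical and only illustrates the two calls.

```cpp
#include <cassert>
#include <fstream>

// Hypothetical helper: overwrite one byte of an existing file at the given
// offset, then read it back. Returns true if the byte was stored.
bool replace_byte(const char* filename, std::streamoff offset, char value)
{
    assert(filename != nullptr);
    std::fstream file(filename, std::ios::in | std::ios::out | std::ios::binary);
    if (!file) return false;            // the file must already exist

    file.seekp(offset, std::ios::beg);  // move the put pointer
    file.put(value);

    file.seekg(offset, std::ios::beg);  // move the get pointer back to the same byte
    char check = 0;
    file.get(check);
    return file.good() && check == value;
}
```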

2.6 Special Characters in Files


• When DOS files are opened in text mode, the internal separator ('\n') is translated to
the external separator (<CR><LF>) during read and write (CR: carriage return,
LF: line feed).
• When DOS files are opened in binary mode, the internal separator ('\n')
is not translated to the external separator (<CR><LF>) during read and write.
• In DOS (Windows) files, end-of-file can be marked by a "control-Z" character
(ASCII SUB).
• In C++ implementations for DOS, a control-Z in a file is interpreted as end-of-file.

2.7 The Unix Directory Structure


• The Unix file system is a tree-structured organization of directories,with the root of the
tree signified by the character /.
• In UNIX, the directory structure is a single tree for the entire file system.

• In UNIX, separate disks appear as subdirectories of the root(/).


• In UNIX, the subdirectories of a pathname are separated by the forward slash character
(/).
• Example: /usr/bin/perl
• The directory structure of UNIX is actually a graph, since symbolic links allow entries to
appear at more than one location in the directory structure.


2.8 Physical Devices and Logical Files


2.8.1 Physical devices as files
In Unix, devices like the keyboard and the console are also files. The keyboard produces a
sequence of bytes that are sent to the computer when keys are pressed. The console accepts a
sequence of bytes and displays the symbols on screen.
A Unix file is represented logically by an integer, the file descriptor.
A keyboard, a disk file, and a magnetic tape are all represented by integers.
This view of a file makes it possible for Unix to get by with very few operations compared
to other operating systems.

2.8.2 The console, the keyboard and standard error


In C streams, the keyboard is called stdin (standard input), the console is called
stdout (standard output), and the error file is called stderr (standard error).
Handle   FILE     iostream   Description
0        stdin    cin        Standard Input
1        stdout   cout       Standard Output
2        stderr   cerr       Standard Error

2.8.3 I/O redirection and Pipes


Operating systems provide shortcuts for switching between standard I/O (stdin and stdout)
and regular file I/O.
I/O redirection is used to change a program so it writes its output to a regular file rather
than to stdout.
• In both DOS and UNIX, the standard output of a program can be redirected to a file
with the > symbol.
• In both DOS and UNIX, the standard input of a program can be redirected from a file
with the < symbol.
The notations for input and output redirection on the command line in Unix are
< file (redirect stdin to "file")
> file (redirect stdout to "file")

Example:

The output of the executable file is redirected to a file called "myfile"
pipe
A connection between standard output of one process and standard input of a second
process.
Piping: using the output of one program as input to another program.
• In both DOS and UNIX, the standard output of one program can be piped
(connected) to the standard input of another program with the | symbol.
• Example:
program1 | program2
Output of program1 is used as input for program2

2.9 File-Related Header Files


• Header files can vary with the C++ implementation.
• stdio.h, iostream.h, fstream.h, fcntl.h and file.h are some of the header files used in
different operating systems.

2.10 Unix File System Commands


UNIX command          Description

cat filename          Type the contents of a file
tail filename         Type the last ten lines of a file
cp file1 file2        Copy file1 to file2
mv file1 file2        Move (rename) file1 to file2
rm filenames          Delete files
chmod mode filename   Change the protection mode
ls                    List contents of a directory
mkdir                 Create directory
rmdir                 Remove directory

CHAPTER 3: SECONDARY STORAGE AND SYSTEM SOFTWARE


3. Secondary Storage and Systems Software
3.1 Disks
• Compared with time for memory access, disk access is always expensive.
• Disk drives belong to a class of devices called direct access storage
devices(DASDs).
• Hard-disks offer high capacity and low cost per bit(commonly used).

• Floppy disks are inexpensive, slow and hold little data.
• Removable disks use disk cartridges that can be mounted on the same drive at different
times. Data can be accessed directly.
3.1.1 Organization of Disks
• The information on disk is stored on the surface of 1 or more platters.(Fig 3.1)
• The information is stored in successive tracks on the surface of the disk.(Fig 3.2)
• Each track is divided into sectors.
• A sector is the smallest addressable portion of a disk.
• Disk drives have a number of platters.
• The tracks directly above one another form a cylinder(Fig 3.3)
• All information on a single cylinder can be accessed without moving the arm that holds
the read/write heads.
• Moving this arm is called seeking.


3.1.2 Estimating Capacities and space needs


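In a disk, each platter can have 2 surfaces, and the number of cylinders is the same as the
number of tracks on a single surface.
Since a cylinder consists of a group of tracks, a track consists of a group of sectors, and a
sector has a group of bytes, track, cylinder and drive capacities can be computed as follows:
Track capacity = number of sectors per track × bytes per sector
Cylinder capacity = number of tracks per cylinder × track capacity
Drive capacity = number of cylinders × cylinder capacity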
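The capacity relations follow directly from the nesting of sectors, tracks and cylinders, and can be sketched in code. The DiskGeometry structure and function names are hypothetical, and any numbers passed in are illustrative rather than the characteristics of a particular drive.

```cpp
#include <cassert>

// Hypothetical description of a drive's geometry.
struct DiskGeometry {
    long long bytes_per_sector;
    long long sectors_per_track;
    long long tracks_per_cylinder;
    long long cylinders;
};

// Track capacity = sectors per track x bytes per sector.
long long track_capacity(const DiskGeometry& g)
{
    assert(g.bytes_per_sector > 0 && g.sectors_per_track > 0);
    return g.sectors_per_track * g.bytes_per_sector;
}

// Cylinder capacity = tracks per cylinder x track capacity.
long long cylinder_capacity(const DiskGeometry& g)
{
    return g.tracks_per_cylinder * track_capacity(g);
}

// Drive capacity = number of cylinders x cylinder capacity.
long long drive_capacity(const DiskGeometry& g)
{
    return g.cylinders * cylinder_capacity(g);
}
```

For instance, an assumed geometry of 512 bytes per sector, 10 sectors per track, 4 tracks per cylinder and 100 cylinders gives 5,120 bytes per track, 20,480 bytes per cylinder and 2,048,000 bytes for the drive.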

Given a disk with following characteristics

3.1.3 Organizing Tracks by Sector


Two ways to organize data on disk: by sector and by user defined block.


The physical placement of sectors


Different views of sectors on a track:
• Sectors that are adjacent, fixed size segments of a track that happen to hold a file (Fig
3.4a). When you want to read a series of sectors that are all in the same track, one right
after the other, you often cannot read adjacent sectors, because the disk controller needs
time to process each sector before it is ready to accept the next. In Fig 3.4a, it takes
thirty-two revolutions to read the entire 32 sectors of a track.
• Interleaving sectors: leaving an interval of several physical sectors between logically
adjacent sectors. Fig 3.4(b) illustrates the assignment of logical sector content to the
thirty-two physical sectors in a track with interleaving factor of 5. In Fig 3.4b, it takes
five revolutions to read the entire 32 sectors of a track.
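One way to picture interleaving is to compute which logical sector lands in each physical slot. The sketch below is an assumption about how such a layout could be generated (the exact numbering in Fig 3.4b may differ): logical sectors are placed a fixed number of physical positions apart, and the revolution count assumes the controller is ready again by the time the next logical sector arrives under the head.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: layout[p] is the logical sector stored at physical
// position p when logically adjacent sectors are placed `factor` slots apart.
std::vector<int> interleave(int sectors, int factor)
{
    assert(sectors > 0 && factor > 0);
    std::vector<int> layout(sectors, -1);
    int pos = 0;
    for (int logical = 0; logical < sectors; ++logical) {
        while (layout[pos] != -1)          // skip slots that are already taken
            pos = (pos + 1) % sectors;
        layout[pos] = logical;
        pos = (pos + factor) % sectors;
    }
    return layout;
}

// Revolutions needed to read all sectors in logical order, assuming each
// sector can be read as soon as it passes under the head.
int revolutions_to_read(const std::vector<int>& layout)
{
    const int n = static_cast<int>(layout.size());
    std::vector<int> where(n);
    for (int p = 0; p < n; ++p) where[layout[p]] = p;
    long travelled = 1;                    // slots passed, counting the first sector
    for (int i = 1; i < n; ++i) {
        int gap = (where[i] - where[i - 1] + n) % n;
        travelled += (gap == 0 ? n : gap);
    }
    return static_cast<int>((travelled + n - 1) / n);  // round up to whole revolutions
}
```

With 32 sectors and a factor of 5, this layout is read in five revolutions, which agrees with the count quoted for Fig 3.4b.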

cluster
A group of sectors handled as a unit of file allocation. A cluster is a fixed number of
contiguous sectors
extent
A physical section of a file occupying adjacent clusters.
fragmentation
Unused space within a file.
• Clusters are also referred to as allocation units (ALUs).
• Space is allocated to files as integral numbers of clusters.
• A file can have a single extent, or be scattered in several extents.


• Access time for a file increases as the number of separate extents increases, because
of seeking.
• Defragmentation utilities physically move files on a disk so that each file has a
single extent.
• Allocation of space in clusters produces fragmentation.
• A file of one byte is allocated the space of one cluster.
• On average, fragmentation is one-half cluster per file.
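The effect of allocating space in whole clusters can be shown with a little arithmetic. The function names and the 4096-byte cluster size used in the example are assumptions for illustration only.

```cpp
#include <cassert>

// Hypothetical helper: clusters allocated to a file, rounding up because
// space is handed out in whole clusters.
long clusters_needed(long file_bytes, long cluster_bytes)
{
    assert(file_bytes >= 0 && cluster_bytes > 0);
    return (file_bytes + cluster_bytes - 1) / cluster_bytes;
}

// Hypothetical helper: bytes allocated to the file but not used by it.
long wasted_bytes(long file_bytes, long cluster_bytes)
{
    return clusters_needed(file_bytes, cluster_bytes) * cluster_bytes - file_bytes;
}
```

With an assumed cluster size of 4096 bytes, a one-byte file occupies one cluster and wastes 4095 bytes, while a file of exactly 8192 bytes occupies two clusters and wastes nothing.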
3.1.5 Organizing Tracks by Block
• Mainframe computers typically use variable size physical blocks for disk drives.
• Track capacity is dependent on block size, due to fixed overhead (gap and address
block) per block.
3.1.6 The Cost of a Disk Access
direct access device
A data storage device which supports direct access.
direct access
Accessing data from a file by record position with the file, without accessing intervening
records.
access time
The total time required to store or retrieve data.
transfer time
The time required to transfer the data from a sector, once the transfer has begun.
seek time
The time required for the head of a disk drive to be positioned to a designated cylinder.
rotational delay
The time required for a designated sector to rotate to the head of a disk drive.
• Access time of a disk is related to physical movement of the disk parts.
• Disk access time has three components: seek time, rotational delay, and transfer
time.
• Seek time is affected by the size of the drive, the number of cylinders in the drive,
and the mechanical responsiveness of the access arm.
• Average seek time is approximately the time to move across 1/3 of the cylinders.
• Rotational delay is also referred to as latency.
• Rotational delay is inversely proportional to the rotational speed of the drive.
• Average rotational delay is the time for the disk to rotate 180°.
• Transfer time is inversely proportional to the rotational speed of the drive.
• Transfer time is inversely proportional to the physical length of a sector.
• Transfer time is roughly inversely proportional to the number of sectors per track.

• Actual transfer time may be limited by the disk interface.
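The three components of access time can be combined into a rough estimate. This is a sketch under simplifying assumptions: the DriveSpec fields and their values are hypothetical, the average rotational delay is taken as half a revolution, and the transfer time is the fraction of a track being transferred multiplied by the time of one revolution (ignoring any limit imposed by the disk interface).

```cpp
#include <cassert>

// Hypothetical drive parameters used only for this estimate.
struct DriveSpec {
    double avg_seek_ms;      // average seek time in milliseconds
    double rpm;              // rotational speed in revolutions per minute
    double bytes_per_track;  // track capacity in bytes
};

// Time for one full revolution, in milliseconds.
double rotation_ms(const DriveSpec& d)
{
    assert(d.rpm > 0);
    return 60000.0 / d.rpm;
}

// Average rotational delay: the time for half a revolution (180 degrees).
double avg_rotational_delay_ms(const DriveSpec& d)
{
    return rotation_ms(d) / 2.0;
}

// Transfer time: fraction of a track transferred x time of one revolution.
double transfer_ms(const DriveSpec& d, double bytes)
{
    assert(d.bytes_per_track > 0);
    return (bytes / d.bytes_per_track) * rotation_ms(d);
}

// Access time = seek time + rotational delay + transfer time.
double avg_access_ms(const DriveSpec& d, double bytes)
{
    return d.avg_seek_ms + avg_rotational_delay_ms(d) + transfer_ms(d, bytes);
}
```

For an assumed drive with an 8 ms average seek, 6000 rpm and 50,000 bytes per track, reading 25,000 bytes is estimated at 8 + 5 + 5 = 18 ms; the seek and rotational delay dominate for small transfers.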


3.1.8 Effect of Block Size
• Fragmentation waste increases as cluster size increases.
• Average access time decreases as cluster size increases.
3.1.9 Disk as a bottleneck
striping
The distribution of single files to two or more physical disk drives.
Redundant Array of Inexpensive Disks
An array of multiple disk drives which appears as a single drive to the system.
RAM disk
A virtual disk drive which actually exists in main memory.
solid state disk
A solid state memory array with an interface which responds as a disk drive.
cache
Solid state memory used to buffer and store data temporarily.
• Several techniques have been developed to improve disk access time.
• Striping allows disk transfers to be made in parallel.
• There are 6 versions, or levels, of RAID technology.
• RAID-0 uses striping.
• RAID-0 improves access time, but does not provide redundancy.
• RAID-1 uses mirroring, in which two drives are written with the same data.
• RAID-1 provides complete redundancy. If one drive fails, the other provides data
backup.
• RAID-1 improves read access time, but slows write access time.
• RAM disks appear to programs as fast disk drives.
• RAM disks are volatile.
• Solid state disks appear to computer systems as fast disk drives.
• Solid state disks are used on high performance data base systems.
• Caching improves average access time.
• Disk caching can occur at three levels: in the computer main memory, in the disk
controller, and in the disk drive.
• Windows operating systems use main memory caching.
• Disk controller caching requires special hardware.
• Most disk drives now contain caching memory.
• With caching, writes are typically reported as complete when the data is in the
cache.

• The physical write is delayed until later.


• With caching, reads typically read more data than is requested, storing the
unrequested data in the cache.
• If a read can be satisfied from data already in the cache, no additional physical read
is needed.
• Read caching works on average because of program locality.

3.2 Magnetic Tape


sequential access device
A device which supports sequential access.

3.3 Disk vs Tape


Tape-based data backup infrastructures have inherent weaknesses: Tape is not a random
access medium. Backed up data must be accessed as it was written to tape. Recovering a single
file from a tape often requires reading a substantial portion of the tape and can be very time
consuming. The recovery time of restoring from tape can be very costly. Recent studies have
shown most IT administrators do not feel comfortable with their tape backups today.
The Solution
Disk-to-disk backup can help by complementing tape backup. Within the data center, data
loss is most likely to occur as a result of file corruption or inadvertent deletion. In these
scenarios, disk-to-disk backup allows a much faster and far more reliable restore process than is
possible with a tape device, greatly reducing the demands on the tape infrastructure and on the
manpower required to maintain it. Disk-to-disk backup is quickly becoming the standard for
backup since data can be backed up more quickly than with tape, and restore times are
dramatically reduced.

3.4 Introduction to CD-ROM


A CD-ROM (an acronym of "Compact Disc Read-only memory") is a pre-pressed
compact disc that contains data accessible to, but not writable by, a computer for data storage
and music playback.
CD-ROMs are popularly used to distribute computer software, including video games
and multimedia applications, though any data can be stored (up to the capacity limit of a disc).
Some CDs hold both computer data and audio with the latter capable of being played on a CD
player, while data (such as software or digital video) is only usable on a computer (such as ISO
9660 format PC CD-ROMs). These are called enhanced CDs.
• A single disc can hold more than 600 megabytes of data (400 books of the textbook's
size).
• CD-ROM is read only, i.e., it is a publishing medium rather than a data storage and
retrieval medium like magnetic disks.

3.5 Physical Organization of CD-ROM


Tracks and sectors
• There is only one track, a long spiral, much like the groove in a vinyl LP.
• The track is divided up into 2352 byte sectors.
• Each sector contains 2048 bytes of user data and 304 bytes of non-data overhead.
• Since there is only 1 track, sectors are not addressed as they are on disks.
Sectors and the audio ancestors
• 75 sectors create 1 second of audio. 60 seconds of audio create 1 minute of audio.
• We provide a sector address in this format: mm:ss:cc, mm is minutes, ss is seconds,
cc is sector #
• Humans are said to be capable of hearing sounds approximately in the range from
20 Hz to 20 kHz.
• You need to sample a signal at twice the rate at which you want to reproduce it.
• Therefore if we want to reproduce sound, we must sample at approximately twice
this frequency -- about 40 kHz.
• Specifically, the designers of audio CDs sampled at 44.1 kHz.
• Each sample on an audio CD is two bytes, allowing for 65,536 distinct audio levels.
• For stereo sound, we sample twice each time -- left and right channels.
• The implication is that we must store (2 x 2 x 44,100) bytes per second of audio.
(This is 176,400 bytes per second).
• Divide this 176,400 bytes by the 75 sectors per second, and discover the sector
capacity of 2,352 bytes.
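The arithmetic behind the sector size, and the conversion of an mm:ss:cc address into a sector count, can be checked with a short sketch. The constant and function names are hypothetical, and the address conversion simply counts sectors from 00:00:00, ignoring any offset a real disc format may add.

```cpp
#include <cassert>

constexpr long kSamplesPerSecond = 44100;  // 44.1 kHz sampling rate
constexpr long kBytesPerSample   = 2;      // 16-bit samples
constexpr long kChannels         = 2;      // stereo: left and right
constexpr long kSectorsPerSecond = 75;

// Bytes of audio stored for each second of playing time.
constexpr long bytes_per_second()
{
    return kChannels * kBytesPerSample * kSamplesPerSecond;
}

// Capacity of one sector, obtained by dividing one second of audio by 75.
constexpr long sector_bytes()
{
    return bytes_per_second() / kSectorsPerSecond;
}

// Hypothetical helper: number of sectors preceding address mm:ss:cc.
long sector_number(int mm, int ss, int cc)
{
    assert(mm >= 0 && ss >= 0 && ss < 60 && cc >= 0 && cc < kSectorsPerSecond);
    return (static_cast<long>(mm) * 60 + ss) * kSectorsPerSecond + cc;
}
```

The constants reproduce the figures in the list: 2 x 2 x 44,100 = 176,400 bytes per second, and 176,400 / 75 = 2,352 bytes per sector.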
Sector format on data disks
• 12 bytes synch
• 4 bytes sector ID
• 2,048 bytes user data
• 4 bytes error detection
• 8 bytes null
• 276 bytes error correction
Data is stored on the disc as a series of microscopic indentations ("pits", with the gaps
between them referred to as "lands"). A laser is shone onto the reflective surface of the disc
to read the pattern of pits and lands. Because the depth of the pits is approximately one-quarter
to one-sixth of the wavelength of the laser light used to read the disc, the reflected beam's phase
is shifted in relation to the incoming beam, causing destructive interference and reducing the
reflected beam's intensity. This pattern of changing intensity of the reflected beam is converted
into binary data.

3.6 Strengths and Weaknesses


• CD-ROM strengths: high storage capacity, low price, durability.
• CD-ROM weaknesses: extremely slow seek performance (between half a second and a
second) ==> intelligent file structures are critical.

3.7 Storage as a Hierarchy

3.8 A Journey of a Byte

3.9 Buffer Management


Buffer Bottlenecks
• Assume that the system has a single buffer and is performing both input and output
one character at a time, alternately.
• In this case, the sector containing the character to be read is constantly overwritten by
the sector containing the spot where the character will be written, and vice versa.
• In such a case, the system needs more than one buffer: at least one for input and another
for output.
• Moving data to or from disk is very slow, and programs may become I/O bound ==> find
better strategies to avoid this problem.
Buffering strategies
Multiple buffering
-Double Buffering
-Buffer Pooling
Move mode and Locate mode
Scatter/Gather I/O

3.10 I/O in UNIX


block device
A device which transfers data in blocks (as opposed to character by character).
block I/O
Input or output performed in blocks.
character device
A device which transfers data character by character (as opposed to in blocks).
character I/O
Input or output performed character by character.
• Disks are block devices.
• Keyboards, displays, and terminals are character devices.

CHAPTER 4: FUNDAMENTAL FILE STRUCTURE CONCEPTS
4.1 Field and Record Organization
4.1.1 A Stream File
• In the Windows, DOS, UNIX, and LINUX operating systems, files are not internally
structured; they are streams of individual bytes.

• The only file structure recognized by these operating systems is the separation of a
text file into lines.
o For Windows and DOS, two characters are used between lines: a carriage
return (ASCII 13) and a line feed (ASCII 10).
o For UNIX and LINUX, one character is used between lines: a line feed
(ASCII 10).
• The code in application programs can, however, impose internal organization on
stream files.
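As a small illustration of a program imposing structure on a byte stream, the sketch below splits raw bytes into lines and accepts either line-ending convention. The function name splitLines is hypothetical, chosen only for this example.

```cpp
#include <string>
#include <vector>

// Split a raw byte stream into lines, accepting both LF and CR LF endings.
std::vector<std::string> splitLines(const std::string& bytes) {
    std::vector<std::string> lines;
    std::string current;
    for (char ch : bytes) {
        if (ch == '\n') {
            // Drop the carriage return that precedes the line feed on Windows/DOS.
            if (!current.empty() && current.back() == '\r') current.pop_back();
            lines.push_back(current);
            current.clear();
        } else {
            current.push_back(ch);
        }
    }
    if (!current.empty()) lines.push_back(current); // last line without a terminator
    return lines;
}
```

The operating system sees only bytes; the notion of a "line" exists entirely in the code that interprets them.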
Record Structures
record
A subdivision of a file, containing data related to a single entity.
field
A subdivision of a record containing a single attribute of the entity which the record
describes.
stream of bytes
A file which is regarded as being without structure beyond separation into a sequential set
of bytes.

4.2 Using Classes to Manipulate Buffers


• Within a program, data is temporarily stored in variables.
• Individual values can be aggregated into structures, which can be treated as a single
variable with parts.
• In C++, classes are typically used as an aggregate structure.
• C++ Person class (version 0.1):
class Person {
public:
    // Each size includes room for the terminating '\0'.
    char FirstName [11];
    char LastName [11];
    char Address [31];
    char City [21];
    char State [3];
    char ZIP [6];    // five digits plus '\0'
};
• With this class declaration, variables can be declared to be of type Person. The
individual fields within a Person can be referred to as the name of the variable and
the name of the field, separated by a period(.).
• C++ Program:
#include <cstring>
#include <iostream>
using namespace std;

class Person {
public:
    // Each size includes room for the terminating '\0'.
    char FirstName [11];
    char LastName [11];
    char Address [31];
    char City [21];
    char State [3];
    char ZIP [6];    // five digits plus '\0'
};

void Display (Person);

int main() {
    Person Clerk;
    Person Customer;
    strcpy (Clerk.FirstName, "Fred");
    strcpy (Clerk.LastName, "Flintstone");
    strcpy (Clerk.Address, "4444 Granite Place");
    strcpy (Clerk.City, "Rockville");
    strcpy (Clerk.State, "MD");
    strcpy (Clerk.ZIP, "00001");
    strcpy (Customer.FirstName, "Lily");
    strcpy (Customer.LastName, "Munster");
    strcpy (Customer.Address, "1313 Mockingbird Lane");
    strcpy (Customer.City, "Hollywood");
    strcpy (Customer.State, "CA");
    strcpy (Customer.ZIP, "90210");
    Display (Clerk);
    Display (Customer);
}

// Writes every field back to back, with no separators and no newline.
void Display (Person Someone) {
    cout << Someone.FirstName << Someone.LastName
         << Someone.Address << Someone.City
         << Someone.State << Someone.ZIP;
}
• In memory, each Person will appear as an aggregate, with the individual values
being parts of the aggregate.

• The output of this program will be:


FredFlintstone4444 Granite PlaceRockvilleMD00001LilyMunster1313 Mockingbird LaneHollywoodCA90210
• Obviously, this output could be improved. It is marginally readable by people, and
it would be difficult to program a computer to read and correctly interpret this
output.

4.3 Using Inheritance for Record Buffer Classes


• Inheritance
The implicit inclusion of members of a parent class in a child class.
Delineation of Records in a File
fixed length record
A record which is predetermined to be the same length as the other records in the file.

• The file is divided into records of equal size.


• All records within a file have the same size.
• Different files can have different length records.
• Programs which access the file must know the record length.
• Offset, or position, of the nth record of a file can be calculated.

• There is no external overhead for record separation.


• There may be internal fragmentation (unused space within records.)
• There will be no external fragmentation (unused space outside of records) except for
deleted records.
• Individual records can always be updated in place.
• Example (80 byte records; offsets and bytes in hexadecimal):
 00  66 69 72 73 74 20 6C 69 6E 65 00 00 01 00 00 00  first line......
 10  00 00 00 00 00 00 00 00 FF FF FF FF 00 00 00 00  ................
 20  68 FB 12 00 DC E0 40 00 3C BA 42 00 78 FB 12 00  h.....@.<.B.x...
 30  CD E3 40 00 3C BA 42 00 08 BB 42 00 E4 FB 12 00  ..@.<.B...B.....
 40  3C 18 41 00 C4 FB 12 00 02 00 00 00 FC 3A 7C 00  <.A..........:|.
 50  73 65 63 6F 6E 64 20 6C 69 6E 65 00 01 00 00 00  second line.....
 60  00 00 00 00 00 00 00 00 FF FF FF FF 00 00 00 00  ................
 70  68 FB 12 00 DC E0 40 00 3C BA 42 00 78 FB 12 00  h.....@.<.B.x...
 80  CD E3 40 00 3C BA 42 00 08 BB 42 00 E4 FB 12 00  ..@.<.B...B.....
 90  3C 18 41 00 C4 FB 12 00 02 00 00 00 FC 3A 7C 00  <.A..........:|.
• Advantage: the offset of each record can be calculated from its record number. This
makes direct access possible.
• Advantage: there is no space overhead.
• Disadvantage: there will probably be internal fragmentation (unusable space within
records.)
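A minimal sketch of fixed length records held in memory, assuming unused space is padded with '\0' bytes (the dump above leaves leftover bytes after the terminator instead). The names packFixed and unpackFixed are illustrative only.

```cpp
#include <cstddef>
#include <string>

// Pad (or truncate) a value to exactly recLen bytes, filling with '\0'.
std::string packFixed(const std::string& value, std::size_t recLen) {
    std::string rec = value.substr(0, recLen);
    rec.resize(recLen, '\0');
    return rec;
}

// Return record n (0-based) from a buffer of fixed length records,
// with the padding removed. Assumes record n exists in the buffer.
std::string unpackFixed(const std::string& file, std::size_t recLen, std::size_t n) {
    const std::string rec = file.substr(n * recLen, recLen); // offset is n * recLen
    return rec.substr(0, rec.find('\0'));                    // whole record if no padding
}
```

With 80 byte records, "first line" uses 10 bytes and leaves 70 unused: this is the internal fragmentation described above.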
Delimited Variable Length Records
variable length record
A record which can differ in length from the other records of the file.
delimited record
A variable length record which is terminated by a special character or sequence of
characters.
delimiter
A special character or group of characters stored after a field or record, which indicates
the end of the preceding unit.

• The records within a file are followed by a delimiting byte or series of bytes.
• The delimiter cannot occur within the records.
• Records within a file can have different sizes.

• Different files can have different length records.


• Programs which access the file must know the delimiter.
• Offset, or position, of the nth record of a file cannot be calculated.
• There is external overhead for record separation equal to the size of the delimiter per
record.
• There should be no internal fragmentation (unused space within records.)
• There may be external fragmentation (unused space outside of records) after file
updating.
• Individual records cannot always be updated in place.
• Algorithms for Accessing Delimited Variable Length Records
• Code for Accessing Delimited Variable Length Records
• Code for Accessing Variable Length Line Records
• Example (delimiter = ASCII 30 (1E hex) = RS character):
 00  66 69 72 73 74 20 6C 69 6E 65 1E 73 65 63 6F 6E  first line.secon
 10  64 20 6C 69 6E 65 1E                             d line.
• Example (delimiter = '\n', stored here as CR LF, 0D 0A):
 00  46 69 72 73 74 20 28 31 73 74 29 20 4C 69 6E 65  First (1st) Line
 10  0D 0A 53 65 63 6F 6E 64 20 28 32 6E 64 29 20 6C  ..Second (2nd) l
 20  69 6E 65 0D 0A                                   ine..
• Disadvantage: the offset of each record cannot be calculated from its record
number. This makes direct access impossible.
• Disadvantage: there is space overhead for the delimiter suffix.
• Advantage: there will probably be no internal fragmentation (unusable space within
records.)
Length Prefixed Variable Length Records

• The records within a file are prefixed by a length byte or bytes.


• Records within a file can have different sizes.
• Different files can have different length records.
• Programs which access the file must know the size and format of the length prefix.
• Offset, or position, of the nth record of a file cannot be calculated.
• There is external overhead for record separation equal to the size of the length prefix
per record.
• There should be no internal fragmentation (unused space within records.)

• There may be external fragmentation (unused space outside of records) after file
updating.
• Individual records cannot always be updated in place.
• Algorithms for Accessing Prefixed Variable Length Records
• Code for Accessing Prefixed Variable Length Records
• Example (two byte length prefix, least significant byte first):
 00  0A 00 46 69 72 73 74 20 4C 69 6E 65 0B 00 53 65  ..First Line..Se
 10  63 6F 6E 64 20 4C 69 6E 65 1F 00 54 68 69 72 64  cond Line..Third
 20  20 4C 69 6E 65 20 77 69 74 68 20 6D 6F 72 65 20   Line with more
 30  63 68 61 72 61 63 74 65 72 73                    characters
• Disadvantage: the offset of each record cannot be calculated from its record number.
This makes direct access impossible.
• Disadvantage: there is space overhead for the length prefix.
• Advantage: there will probably be no internal fragmentation (unusable space within
records.)
Indexed Variable Length Records

• An auxiliary file can be used to point to the beginning of each record.


• In this case, the data records can be contiguous.
• If the records are contiguous, the only access is through the index file.
• Code for Accessing Indexed Variable Length Records
• Example:
Index File:
 00  12 00 00 00 25 00 00 00 47 00 00 00              ....%...G...

Data File:
 00  46 69 72 73 74 20 28 31 73 74 29 20 53 74 72 69  First (1st) Stri
 10  6E 67 53 65 63 6F 6E 64 20 28 32 6E 64 29 20 53  ngSecond (2nd) S
 20  74 72 69 6E 67 54 68 69 72 64 20 28 33 72 64 29  tringThird (3rd)
 30  20 53 74 72 69 6E 67 20 77 68 69 63 68 20 69 73   String which is
 40  20 6C 6F 6E 67 65 72                              longer

• Advantage: the offset of each record is contained in the index, and can be looked
up from its record number. This makes direct access possible.
• Disadvantage: there is space overhead for the index file.
• Disadvantage: there is time overhead for the index file.
• Advantage: there will probably be no internal fragmentation (unusable space within
records.)
• The time overhead for accessing the index file can be minimized by reading the
entire index file into memory when the files are opened.
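A sketch of lookup through an in-memory index. It assumes, from reading the example dump, that each four byte index entry (12, 25, 47 hex) holds the offset just past the end of a record, so a record begins where the previous one ends. The function name recordAt is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// ends[i] is the offset just past record i, so record i starts at ends[i - 1]
// (or at 0 for the first record). Assumes n < ends.size().
std::string recordAt(const std::string& data,
                     const std::vector<std::uint32_t>& ends,
                     std::size_t n) {
    const std::size_t begin = (n == 0) ? 0 : ends[n - 1];
    return data.substr(begin, ends[n] - begin);
}
```

Because the index is small and held in memory, finding a record costs one index lookup plus one read of the data file.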
Fixed Field Count Records
• Records can be recognized if they always contain the same (predetermined) number
of fields.
Delineation of Fields in a Record
Fixed Length Fields

• Each record is divided into fields of correspondingly equal size.
• Different fields within a record can have different sizes.
• Different records can have different length fields.
• Programs which access the record must know the field lengths.
• There is no external overhead for field separation.
• There may be internal fragmentation (unused space within fields.)
Delimited Variable Length Fields

• The fields within a record are followed by a delimiting byte or series of bytes.
• Fields within a record can have different sizes.
• Different records can have different length fields.
• Programs which access the record must know the delimiter.
• The delimiter cannot occur within the data.
• If used with delimited records, the field delimiter must be different from the record
delimiter.
• There is external overhead for field separation equal to the size of the delimiter per
field.
• There should be no internal fragmentation (unused space within fields.)
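A short sketch of separating delimited fields, assuming '|' as the field delimiter and that every field, including the last, is followed by it. The name splitFields is illustrative.

```cpp
#include <string>
#include <vector>

// Split one record into fields; each field is followed by the delimiter.
// Any text after the final delimiter is ignored.
std::vector<std::string> splitFields(const std::string& record, char delim = '|') {
    std::vector<std::string> fields;
    std::string::size_type start = 0, pos;
    while ((pos = record.find(delim, start)) != std::string::npos) {
        fields.push_back(record.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}
```

Applied to "Fred|Flintstone|4444 Granite Place|Rockville|MD|00001|", this yields six fields, which addresses the readability problem of the run-together output shown earlier.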
Length Prefixed Variable Length Fields

• The fields within a record are prefixed by a length byte or bytes.

• Fields within a record can have different sizes.


• Different records can have different length fields.
• Programs which access the record must know the size and format of the length
prefix.
• There is external overhead for field separation equal to the size of the length prefix
per field.
• There should be no internal fragmentation (unused space within fields.)
Representing Record or Field Length
• Record or field length can be represented in either binary or character form.
• The length can be considered as another hidden field within the record.
• This length field can be either fixed length or delimited.
• When character form is used, a space can be used to delimit the length field.
• A two byte fixed length field could be used to hold lengths of 0 to 65535 bytes in
binary form.
• A two byte fixed length field could be used to hold lengths of 0 to 99 bytes in
decimal character form.
• A variable length field delimited by a space could be used to hold effectively any
length.
• In some languages, such as strict Pascal, it is difficult to mix binary values and
character values in the same file.
• The C++ language is flexible enough so that the use of either binary or character
format is easy.
Tagged Fields
• Tags, in the form "Keyword=Value", can be used in fields.
• Use of tags does not in itself allow separation of fields, which must be done with
another method.
• Use of tags adds significant space overhead to the file.
• Use of tags does add flexibility to the file structure.
• Fields can be added without affecting the basic structure of the file.
• Tags can be useful when records have sparse fields - that is, when a significant
number of the possible attributes are absent.
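Tagged fields still need a separate way to be told apart. The sketch below assumes '|' delimits the fields and '=' separates keyword from value; parseTagged is a hypothetical name.

```cpp
#include <map>
#include <string>

// Parse "Keyword=Value" fields separated by a delimiter into a map.
// Fields without an '=' are skipped.
std::map<std::string, std::string> parseTagged(const std::string& record, char delim = '|') {
    std::map<std::string, std::string> fields;
    std::string::size_type start = 0;
    while (start < record.size()) {
        std::string::size_type end = record.find(delim, start);
        if (end == std::string::npos) end = record.size();
        const std::string field = record.substr(start, end - start);
        const std::string::size_type eq = field.find('=');
        if (eq != std::string::npos)
            fields[field.substr(0, eq)] = field.substr(eq + 1);
        start = end + 1;
    }
    return fields;
}
```

A record such as "First=Fred|Last=Flintstone|State=MD|" simply omits the attributes it does not have, which is where tags help with sparse fields.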
Byte Order
• The byte order of integers (and floating point numbers) is not the same on all
computers.
• This is hardware dependent (CPU), not software dependent.
• Many computers store numbers as might be expected (big-endian): 40 decimal = 28 hex
is stored in a four byte integer as 00 00 00 28.
• PCs reverse the byte order, and store numbers with the least significant byte first
(little-endian): 40 decimal = 28 hex is stored in a four byte integer as 28 00 00 00.
• On most computers, the number 40 would be stored in character form in its ASCII
values: 34 30.
• IBM mainframe computers use EBCDIC instead of ASCII, and would store "40" as
F4 F0.
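One common way to keep a file independent of the CPU's byte order is to convert integers to an agreed byte sequence explicitly. The following is a sketch with illustrative function names, not a standard library facility.

```cpp
#include <array>
#include <cstdint>

// Most significant byte first: 40 becomes 00 00 00 28.
std::array<std::uint8_t, 4> toBigEndian(std::uint32_t v) {
    return {{ static_cast<std::uint8_t>(v >> 24), static_cast<std::uint8_t>(v >> 16),
              static_cast<std::uint8_t>(v >> 8),  static_cast<std::uint8_t>(v) }};
}

// Least significant byte first: 40 becomes 28 00 00 00.
std::array<std::uint8_t, 4> toLittleEndian(std::uint32_t v) {
    return {{ static_cast<std::uint8_t>(v),       static_cast<std::uint8_t>(v >> 8),
              static_cast<std::uint8_t>(v >> 16), static_cast<std::uint8_t>(v >> 24) }};
}

// Rebuild the integer from little-endian bytes, whatever the host's byte order.
std::uint32_t fromLittleEndian(const std::array<std::uint8_t, 4>& b) {
    return static_cast<std::uint32_t>(b[0]) |
           (static_cast<std::uint32_t>(b[1]) << 8) |
           (static_cast<std::uint32_t>(b[2]) << 16) |
           (static_cast<std::uint32_t>(b[3]) << 24);
}
```

Because the shifts operate on values rather than on memory, the same code produces the same bytes on any machine.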

CHAPTER 5: MANAGING FILES OF RECORDS


5.1 Record Access
5.1.1 Record Keys
key
A value which is contained within or associated with a record and which can be used to
identify the record.
canonical form
A standard form for a key into which a nonstandard form of the key can be transformed
algorithmically, for comparison purposes.
primary key
A key which uniquely identifies the records within a file.
secondary key
A search key other than the primary key.
• Search keys should be converted to canonical form before being used for a search.
• Examples: All upper case, all lower case, no dashes in phone number or SSN, etc.
• Primary keys should uniquely identify a single record.
• Ideally, the primary key should identify the entity to which a record corresponds, but
not be an attribute of the entity. That is, a primary key should be dataless.
• Secondary keys are typically not unique.
• Algorithms and structures for handling secondary keys should not assume
uniqueness.
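A possible canonical form, sketched below, converts letters to upper case and drops dashes and spaces. Which transformations apply is a design decision for each file; canonicalKey is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// Convert a key to canonical form: upper case, with dashes and spaces removed.
std::string canonicalKey(const std::string& key) {
    std::string out;
    for (unsigned char ch : key) {
        if (ch == '-' || ch == ' ') continue;
        out.push_back(static_cast<char>(std::toupper(ch)));
    }
    return out;
}
```

Both the stored key and the search key should pass through the same function before they are compared.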
5.1.2 A Sequential Search
sequential access
Accessing data from a file whose records are organized on the basis of their
successive physical positions.
sequential search

A search which reads each record sequentially from the beginning until the record
or records being sought are found.
• A sequential search is O(n); that is, the search time is proportional to the number of
items being searched.
• For a file of 1000 records and unique random search keys, an average of 500 records
must be read to find the desired record.
• For an unsuccessful search, the entire file must be examined.
• Sequential search is unsatisfactory for most file searches.
• Sequential search is satisfactory for certain special cases:
o Sequential search is satisfactory for small files.
o Sequential search is satisfactory for files that are searched only infrequently.
o Sequential search is satisfactory when a high percentage of the records in a
file will match.
o Sequential search is required for unstructured text files.
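The cost of a sequential search can be made visible by counting how many records are examined. This sketch searches an in-memory list standing in for a file; the function name and the reads counter are illustrative.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the first record equal to key, or records.size() if absent.
// On return, reads holds the number of records that were examined.
std::size_t sequentialSearch(const std::vector<std::string>& records,
                             const std::string& key,
                             std::size_t& reads) {
    reads = 0;
    for (std::size_t i = 0; i < records.size(); ++i) {
        ++reads;
        if (records[i] == key) return i;
    }
    return records.size();
}
```

A successful search stops at the match, while an unsuccessful one examines every record, in line with the O(n) behavior described above.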
5.1.3 Unix Tools for Sequential Processing
• Unix style tools for MS-DOS are available from the KU chat BBS website.
• Linux style tools for MS-DOS are available from the Cygwin website.
• The cat (concatenate) utility can be used to copy files to standard output.
• The cat (concatenate) utility can be used to combine (concatenate) two or more files
into one.
• The grep (global regular expression print) utility prints lines matching a pattern.
• The grep manual (man page) is available online (Shadow Island).
• The wc (word count) utility counts characters, words, and lines in a file.
• wc and other utilities are also available from National Taiwan University.
5.1.4 Direct Access
direct access
Accessing data from a file by record position within the file, without accessing
intervening records.
relative record number
An ordinal number indicating the position of a record within a file.
• Direct access allows individual records to be read from different locations in the file
without reading intervening records.

5.2 More about Record Structures


5.2.1 Choosing a Record Structure and Record Length
5.2.2 Header Records
header record
A record placed at the beginning of a file which contains information about the
organization or the data of the file.

self-describing file
A file which contains metadata describing the data within the file and its organization.

• Header records can be used to make a file self describing.


 00  10 00 56 02 00 44 1C 00 00 00 00 00 00 00 00 00  ..V..D..........
 10  2C 00 05 4E 61 6E 63 79 05 4A 6F 6E 65 73 0D 31  ,..Nancy.Jones.1
 20  32 33 20 45 6C 6D 20 50 6C 61 63 65 08 4C 61 6E  23 Elm Place.Lan
 30  67 73 74 6F 6E 02 4F 4B 05 37 32 30 33 32 34 00  gston.OK.720324.
 40  06 48 65 72 6D 61 6E 07 4D 75 6E 73 74 65 72 15  .Herman.Munster.
 50  31 33 31 33 20 4D 6F 63 6B 69 6E 67 62 69 72 64  1313 Mockingbird
 60  20 4C 61 6E 65 05 54 75 6C 73 61 02 4F 4B 05 37   Lane.Tulsa.OK.7
 70  34 31 31 34 34 00 05 55 68 75 72 61 05 53 6D 69  41144..Uhura.Smi
 80  74 68 13 31 32 33 20 54 65 6C 65 76 69 73 69 6F  th.123 Televisio
 90  6E 20 4C 61 6E 65 0A 45 6E 74 65 72 70 72 69 73  n Lane.Enterpris
 A0  65 02 43 41 05 39 30 32 31 30                    e.CA.90210
• The above dump represents a file with a 16 byte (10 00) header, variable length
records with a 2 byte length prefix, and fields delimited by ASCII code 28.
5.2.3 Adding Header Records to C++ Buffer Classes
file organization method
The arrangement and differentiation of fields and records within a file.
file-access method
The approach used to locate information in a file.
• File organization is static.
• Design decisions such as record format (fixed, variable, etc.) and field format (fixed,
variable, etc.) determine file organization.
• File access is dynamic.
• File access methods include sequential and direct.
• File organization and file access are not functionally independent; for example,
some file organizations make direct access impractical.

5.3 Beyond Record Structures


5.3.1 Abstract Data Models for File Access
5.3.2 Headers and Self-Describing Files
header record
A record placed at the beginning of a file which contains information about the
organization or the data of the file.

self describing file


A file which contains metadata describing the data within the file and its
organization.

• Files often begin with headers, which describe the data in the file and the
organization of the file.

• The header can contain such information as:


o The record format (fixed, prefixed, delimited, etc.)
o The field format (fixed, prefixed, delimited, etc.)
o The names of each field.
o The type of data in each field.
o The size of each field.
o Etc.
• Header records can be used to make a file self describing.

 00  10 00 56 02 00 44 1C 00 00 00 00 00 00 00 00 00  ..V..D..........
 10  2C 00 05 4E 61 6E 63 79 05 4A 6F 6E 65 73 0D 31  ,..Nancy.Jones.1
 20  32 33 20 45 6C 6D 20 50 6C 61 63 65 08 4C 61 6E  23 Elm Place.Lan
 30  67 73 74 6F 6E 02 4F 4B 05 37 32 30 33 32 34 00  gston.OK.720324.
 40  06 48 65 72 6D 61 6E 07 4D 75 6E 73 74 65 72 15  .Herman.Munster.
 50  31 33 31 33 20 4D 6F 63 6B 69 6E 67 62 69 72 64  1313 Mockingbird
 60  20 4C 61 6E 65 05 54 75 6C 73 61 02 4F 4B 05 37   Lane.Tulsa.OK.7
 70  34 31 31 34 34 00 05 55 68 75 72 61 05 53 6D 69  41144..Uhura.Smi
 80  74 68 13 31 32 33 20 54 65 6C 65 76 69 73 69 6F  th.123 Televisio
 90  6E 20 4C 61 6E 65 0A 45 6E 74 65 72 70 72 69 73  n Lane.Enterpris
 A0  65 02 43 41 05 39 30 32 31 30                    e.CA.90210
• The above dump represents a file with a 16 byte (10 00) header, variable length
records with a 2 byte length prefix, and fields delimited by ASCII code 28 (1C hex).
The actual data begins at byte 16 (10 hex).
5.3.3 Metadata
metadata
Data which describes the data in a file or table.
5.3.4 Mixing Object Types in One File
5.3.5 Representation Independent File Access

5.3.6 Extensibility
extensibility
Having the ability to be extended (e.g., by adding new fields) without redesign.

5.4 Portability and Standardization


portability
The ability to be easily accessed by different systems and applications.
5.4.1 Factors Affecting Portability
5.4.2 Achieving Portability
File Access
• File organization is static.
• File access is dynamic.
Sequential Access

sequential access
Access of data in order.
Accessing data from a file whose records are organized on the basis of their
successive physical positions.

• Sequential access processes a file from its beginning.

• All operating systems support sequential access of files.


• Sequential access is the fastest way to read or write all of the records in a file.
• Sequential access is slow when reading a single random record, since all the
preceding records must be read.
Direct Access

direct access
Access of data in arbitrary order, with variable access time.
Accessing data from a file by record position within the file, without accessing
intervening records.

relative record number


An ordinal number indicating the position of a record within a file.

• Direct access processes single records at a time by position in the file.

• Mainframe and midrange operating systems support direct access of files.


• The Windows, DOS, UNIX, and Linux operating systems do not natively support
direct access of files.
• When using the Windows, DOS, UNIX, and Linux operating systems, applications must
be programmed to use direct access.
• Direct access is slower than sequential when reading or writing all of the records in
a file.
• Direct access is fast when reading a single random record, since the preceding
records are ignored.
• Direct access allows individual records to be read from different locations in the file
without reading intervening records.
• When files are organized with fixed length records, the location of a record in a file
can be calculated from its relative record number, and the file can be accessed using
the seek functions.

• ByteOffset = (RRN - 1) x RecLen


• When files have variable length records supported by an index, the records can be
accessed directly through the index, with the use of the seek function.

• For direct access to be useful, the relative record number of the record of interest
must be known.
• Direct access is often used to support keyed access.
Keyed Access

keyed access
Accessing data from a file by an alphanumeric key associated with each record.
key
A value which is contained within or associated with a record and which can be used
to identify the record.

• Keyed access processes single records at a time by record key.

• Mainframe and midrange operating systems support keyed access of files.


• The Windows, DOS, UNIX, and Linux operating systems do not natively support
keyed access of files.
• When using the Windows, DOS, UNIX, and Linux operating systems, applications must
be programmed to use keyed access.
• Keyed access will be covered in more detail in later chapters.
