18IS61 FSmodule1 Notes
TABLE OF CONTENTS
Module 1: Chapters 1, 2, 3, 4, 5
INTRODUCTION: FILE STRUCTURES, FUNDAMENTAL FILE OPERATIONS, SECONDARY
STORAGE AND SYSTEM SOFTWARE, FUNDAMENTAL FILE STRUCTURE CONCEPTS,
MANAGING FILES OF RECORDS
Chapter 1: INTRODUCTION
1.1 The Heart of File Structure Design
1.2 A Short History of File Structure Design
1.3 A Conceptual Toolkit
2.1 Physical Files and Logical Files
2.2 Opening Files
2.3 Closing Files
2.4 Reading and Writing
2.5 Seeking
2.6 Special Characters in Files
2.7 The Unix Directory Structure
2.8 Physical devices and Logical Files
2.9 File-related Header Files
2.10 UNIX file System Commands
3.1 Disks
3.2 Magnetic Tape
3.3 Disk versus Tape
3.4 Introduction to CD-ROM
3.5 Physical Organization of CD-ROM
3.6 CD-ROM Strengths and Weaknesses
3.7 Storage as Hierarchy
3.8 A journey of a Byte
3.9 Buffer Management
3.10 Input /Output in UNIX.
4.1 Field and Record Organization
4.2 Using Classes to Manipulate Buffers
4.3 Using Inheritance for Record Buffer Classes
4.4 Managing Fixed Length, Fixed Field Buffers
4.5 An Object-Oriented Class for Record Files
5.1 Record Access
5.2 More about Record Structures
5.3 Encapsulating Record Operations in a Single Class
5.4 File Access and File Organization.
DEPT OF ISE
FILE STRUCTURES 18IS61
• Even with a balanced binary tree, dozens of accesses were required to find a record
in moderate-sized files.
• A method was needed to keep a tree balanced when each node of the tree was not a
single record, as in a binary tree, but a file block containing hundreds of records.
Hence, B-Trees were introduced.
• AVL trees grow from top down as records are added, B-Trees grow from the bottom
up.
• B-Trees provided excellent access performance, but a file could not be accessed
sequentially with efficiency.
• The above problem was solved using B+ tree which is a combination of a B-Tree and
a sequential linked list added at the bottom level of the B-Tree.
• To further reduce the number of disk accesses, hashing was introduced for files that
do not change size greatly over time.
• Extendible, dynamic hashing was introduced for volatile, dynamic files whose size
changes greatly over time.
Example:
int Input;
Input = open ("Daily.txt", O_RDONLY);
The following flags can be bitwise ORed together for the access mode:
O_RDONLY: Read only
O_WRONLY: Write only
O_RDWR: Read and write
O_CREAT: Create the file if it does not exist
O_EXCL: Return an error if the file already exists (used only with O_CREAT)
O_APPEND: Append every write operation to the end of the file
O_TRUNC: If the file exists, truncate it to a length of zero, destroying its
contents
Pmode- protection mode
The security status of a file, defining who is allowed to access a file, and which access
modes are allowed.
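The flag combinations can be sketched as follows, assuming a POSIX system (`<fcntl.h>`, `<unistd.h>`). The helper names `open_for_append` and `create_new_only` and the protection mode 0644 are illustrative choices, not part of the notes.

```cpp
#include <fcntl.h>     // open and the O_* flags (POSIX)
#include <unistd.h>    // write, close, lseek (POSIX)

// Hypothetical helper: open a file for writing so that every write lands at
// the end of the file, creating the file first if it does not exist.
// 0644 is an assumed protection mode: owner may read/write, others may read.
int open_for_append(const char* filename) {
    return open(filename, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

// Hypothetical helper: create a brand-new file and fail (return -1) when a
// file of that name already exists, which is what O_CREAT | O_EXCL asks for.
int create_new_only(const char* filename) {
    return open(filename, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```

A negative return value from either helper means the open failed, so the handle should be checked before it is used.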
Example:
• For write, one or more values (as variables or constants) must be supplied to the
write function, to provide the data for the file.
• For unformatted transfers, the amount of data to be transferred must also be
supplied.
2.4.1 Read and Write Functions
Reading
• The C++ read function is used to read data from a file for handle level access.
• The read function must be supplied with (as arguments):
o The source file to read from
o The address of the memory block into which the data will be stored
o The number of bytes to be read (byte count)
• The value returned by the read function is the number of bytes read.
Read function:
Prototypes:
int read (int Handle, void* Buffer, unsigned Length);
Example:
read (Input, &C, 1);
Writing
• The C++ write function is used to write data to a file for handle level access.
• The handle write function must be supplied with (as arguments):
o The logical file name used for sending data
o The address of the memory block from which the data will be written
o The number of bytes to be written
• The value returned by the write function is the number of bytes written.
Write function:
Prototypes:
int write (int Handle, void* Buffer, unsigned Length);
Example:
write (Output, &C, 1);
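A minimal sketch of handle level reading and writing used together, assuming a POSIX system. The function name `copy_file` and the 4096-byte buffer size are hypothetical. The loop relies on the return values described above: read returns the number of bytes read (0 at end of file) and write returns the number of bytes written.

```cpp
#include <fcntl.h>     // open and the O_* flags (POSIX)
#include <unistd.h>    // read, write, close (POSIX)

// Copy the source file to the destination file one buffer at a time.
// Returns the number of bytes copied, or -1 if any call fails.
long copy_file(const char* source, const char* destination) {
    int input = open(source, O_RDONLY);
    if (input < 0) return -1;
    int output = open(destination, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (output < 0) { close(input); return -1; }

    char buffer[4096];
    long total = 0;
    ssize_t got;
    while ((got = read(input, buffer, sizeof buffer)) > 0) {
        ssize_t done = 0;
        while (done < got) {               // write may accept fewer bytes than asked
            ssize_t put = write(output, buffer + done, got - done);
            if (put < 0) { close(input); close(output); return -1; }
            done += put;
        }
        total += got;
    }
    close(input);
    close(output);
    return got < 0 ? -1 : total;           // got == 0 means end of file was reached
}
```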
2.4.2 Files with C Streams and C++ Stream Classes
• For FILE level access, the logical file is declared as a pointer to a FILE (FILE*)
• The FILE structure is defined in the stdio.h header file.
Opening
• The C++ fopen function is used to open a file for FILE level access.
• The FILE fopen function must be supplied with (as arguments):
o The name of the physical file
o The access mode
• The value returned by the fopen is a pointer to an open FILE, and is assigned to the
file variable.
fopen function:
Prototypes:
FILE* fopen (const char* Filename, const char* Access);
Example:
FILE * Input;
Input= fopen ("Daily.txt", "r");
The access mode should be one of the following strings:
r
Open for reading (existing file only) in text mode
r+
Open for update (existing file only)
w
Open (or create) for writing (and delete any previous data)
w+
Open (or create) for update (and delete any previous data)
a
Open (or create) for append with file pointer at current EOF (and keep any previous
data) in text mode
a+
Open (or create) for append update (and keep any previous data)
Closing
The C++ fclose function is used to close a file for FILE level access.
The FILE fclose function must be supplied with (as an argument):
o A pointer to the FILE structure of the logical file
The value returned by fclose is 0 if the close succeeds, and nonzero (EOF) if the close fails.
Prototypes:
int fclose (FILE * Stream);
Example:
fclose (Input);
Reading
The C++ fread function is used to read data from a file for FILE level access.
The FILE fread function must be supplied with (as arguments):
o A pointer to the FILE structure of the logical file
o The address of the buffer into which the data will be read
o The number of items to be read
o The size of each item to be read, in bytes
The value returned by the fread function is the number of items read.
Prototypes:
size_t fread (void* Buffer, size_t Size, size_t Count, FILE* Stream);
Example:
fread (&C, 1, 1, Input);
Writing
The C++ fwrite function is used to write data to a file for FILE level access.
The FILE fwrite function must be supplied with (as arguments):
o A pointer to the FILE structure of the logical file
o The address of the buffer from which the data will be written
o The number of items to be written
o The size of each item to be written, in bytes
The value returned by the fwrite function is the number of items written.
Prototypes:
size_t fwrite (const void* Buffer, size_t Size, size_t Count, FILE* Stream);
Example:
fwrite (&C, 1, 1, Output);
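The item count and item size arguments matter most when more than one item is transferred. The sketch below writes an array of integers with one fwrite call and reads it back with one fread call. The function names `save_values` and `load_values` are hypothetical, and the "wb"/"rb" modes (binary variants of "w" and "r") are an assumption added so that no text translation is applied to the bytes.

```cpp
#include <stdio.h>

// Write count integers to the named file with a single fwrite call.
// Returns the number of items written (0 if the file cannot be opened).
size_t save_values(const char* filename, const int* values, size_t count) {
    FILE* output = fopen(filename, "wb");
    if (output == NULL) return 0;
    size_t written = fwrite(values, sizeof(int), count, output);
    fclose(output);
    return written;
}

// Read up to count integers from the named file into values.
// Returns the number of items read, which may be smaller than count when
// the file holds fewer items.
size_t load_values(const char* filename, int* values, size_t count) {
    FILE* input = fopen(filename, "rb");
    if (input == NULL) return 0;
    size_t got = fread(values, sizeof(int), count, input);
    fclose(input);
    return got;
}
```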
2.4.3 Programs in C++ to Display the contents of a File
The first simple file processing program opens a file for input and reads it, character by
character, sending each character to the screen after it is read from the file. This program
includes the following steps:
1. Display a prompt for the name of the input file.
2. Read the user's response from the keyboard into a variable called filename.
3. Open the file for input.
4. While there are still characters to be read from the input file,
• Read a character from the file;
• Write the character to the terminal screen.
5. Close the input file.
Figures 2.2 and 2.3 are C++ implementations of this program using C streams and C++ stream
classes, respectively.
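Since the figures are not reproduced in these notes, steps 3 to 5 can be sketched with C streams as follows. This is not the textbook listing: the function name `display_file` is hypothetical, and the output stream is passed as a parameter so the same code can write to the screen (stdout) or to another file.

```cpp
#include <stdio.h>

// Copy every character of the named file to out.
// Returns the number of characters copied, or -1 if the file cannot be opened.
long display_file(const char* filename, FILE* out) {
    FILE* input = fopen(filename, "r");      // step 3: open the file for input
    if (input == NULL) return -1;
    long count = 0;
    int ch;                                  // int, so EOF can be told apart from data
    while ((ch = fgetc(input)) != EOF) {     // step 4: read until end of file
        fputc(ch, out);                      //         and write each character
        ++count;
    }
    fclose(input);                           // step 5: close the input file
    return count;
}
```

Steps 1 and 2 would precede the call: prompt for the name, read it into filename, then call display_file (filename, stdout).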
In the C++ version, the call file.unsetf(ios::skipws) causes operator>> to include white
space (blanks, end-of-line, tabs, and so on).
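The effect of the skipws flag can be seen in a small sketch using the C++ stream classes. The function name `read_all_chars` and its `keep_whitespace` parameter are hypothetical. The loop ends when the extraction fails, which is how end of file is detected here.

```cpp
#include <fstream>
#include <string>

// Read the named file character by character with operator>> and return
// everything that was read. With keep_whitespace set, the skipws flag is
// cleared so blanks, tabs, and line ends are returned as well.
std::string read_all_chars(const char* filename, bool keep_whitespace) {
    std::ifstream file(filename);
    std::string result;
    if (!file) return result;                 // could not open: nothing read
    if (keep_whitespace) file.unsetf(std::ios::skipws);
    char ch;
    while (file >> ch) {                      // fails at end of file, ending the loop
        result += ch;
    }
    return result;
}
```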
2.4.4 Detecting End of File
end-of-file
A physical location just beyond the last datum in a file.
2.5 Seeking
The action of moving directly to a certain position in a file is called seeking.
seek
To move to a specified location in a file.
byte offset
The distance, measured in bytes, from the beginning of the file.
• Seeking moves an attribute in the file called the file pointer.
• C++ library functions allow seeking.
• In DOS, Windows, and UNIX, files are organized as streams of bytes, and locations
are in terms of byte count.
• Seeking can be specified from one of three reference points:
o The beginning of the file.
o The current position of the file pointer.
o The end of the file.
Example
The C++ fseek function is used to move the file pointer of a file identified by its FILE
structure.
The FILE fseek function must be supplied with (as arguments):
o A pointer to the FILE structure of the file (file)
o The number of bytes to move from some origin in the file (byte_offset)
o The starting point from which the byte_offset is to be taken (origin)
The Origin argument should be one of the following, to designate the reference point:
SEEK_SET: Beginning of file
SEEK_CUR: Current file position
SEEK_END: End of file
The value returned by the fseek function is 0 if the seek succeeds and nonzero if it fails.
The position of the read/write pointer from the beginning of the file after it is moved
(pos) can be obtained with the ftell function.
Prototypes:
int fseek (FILE * file, long Offset, int Origin);
long ftell (FILE * file);
Example:
long pos;
fseek (file, Offset, SEEK_SET);
pos = ftell (file);
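Two common uses of seeking are sketched below with the standard C library, in which fseek reports success or failure and ftell reports the current byte offset. The helper names `file_size` and `byte_at` are hypothetical.

```cpp
#include <stdio.h>

// Return the size of an open file in bytes by seeking to its end, then put
// the file pointer back where it was. Returns -1 on failure.
long file_size(FILE* file) {
    long saved = ftell(file);                 // remember the current position
    if (saved < 0) return -1;
    if (fseek(file, 0L, SEEK_END) != 0) return -1;
    long size = ftell(file);                  // offset of the end = size in bytes
    fseek(file, saved, SEEK_SET);             // restore the original position
    return size;
}

// Return the byte stored at byte_offset from the beginning of the file,
// or EOF if the seek fails or the offset is at or past the end of the file.
int byte_at(FILE* file, long byte_offset) {
    if (fseek(file, byte_offset, SEEK_SET) != 0) return EOF;
    return fgetc(file);
}
```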
Example:
The output of the executable file is redirected to a file called "myfile".
pipe
Piping: using the output of one program as input to another program.
A connection between standard output of one process and standard input of a second
process.
• In both DOS and UNIX, the standard output of one program can be piped
(connected) to the standard input of another program with the | symbol.
• Example:
cluster
A group of sectors handled as a unit of file allocation. A cluster is a fixed number of
contiguous sectors
extent
A physical section of a file occupying adjacent clusters.
fragmentation
Unused space within a file.
• Clusters are also referred to as allocation units (ALUs).
• Space is allocated to files as integral numbers of clusters.
• A file can have a single extent, or be scattered in several extents.
• Access time for a file increases as the number of separate extents increases, because
of seeking.
• Defragmentation utilities physically move files on a disk so that each file has a
single extent.
• Allocation of space in clusters produces fragmentation.
• A file of one byte is allocated the space of one cluster.
• On average, fragmentation is one-half cluster per file.
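The arithmetic behind cluster allocation can be sketched as follows. The function names are hypothetical, and any cluster size passed in (for example 4096 bytes) is an assumed value, not a figure from these notes.

```cpp
// Number of whole clusters that must be allocated to hold file_bytes,
// since space is given out only in integral numbers of clusters.
long clusters_needed(long file_bytes, long cluster_bytes) {
    return (file_bytes + cluster_bytes - 1) / cluster_bytes;   // round up
}

// Bytes left unused in the last cluster of the file (its fragmentation).
long wasted_bytes(long file_bytes, long cluster_bytes) {
    return clusters_needed(file_bytes, cluster_bytes) * cluster_bytes - file_bytes;
}
```

With 4096-byte clusters, a one-byte file still occupies one cluster and leaves 4095 bytes unused, while a file of exactly 4096 bytes wastes nothing. Averaged over many file sizes, the waste tends toward half a cluster.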
3.1.5 Organizing Tracks by Block
• Mainframe computers typically use variable size physical blocks for disk drives.
• Track capacity is dependent on block size, due to fixed overhead (gap and address
block) per block.
3.1.6 The Cost of a Disk Access
direct access device
A data storage device which supports direct access.
direct access
Accessing data from a file by record position within the file, without accessing intervening
records.
access time
The total time required to store or retrieve data.
transfer time
The time required to transfer the data from a sector, once the transfer has begun.
seek time
The time required for the head of a disk drive to be positioned to a designated cylinder.
rotational delay
The time required for a designated sector to rotate to the head of a disk drive.
• Access time of a disk is related to physical movement of the disk parts.
• Disk access time has three components: seek time, rotational delay, and transfer
time.
• Seek time is affected by the size of the drive, the number of cylinders in the drive,
and the mechanical responsiveness of the access arm.
• Average seek time is approximately the time to move across 1/3 of the cylinders.
• Rotational delay is also referred to as latency.
• Rotational delay is inversely proportional to the rotational speed of the drive.
• Average rotational delay is the time for the disk to rotate 180°.
• Transfer time is inversely proportional to the rotational speed of the drive.
• Transfer time is directly proportional to the physical length of a sector.
• Transfer time is roughly inversely proportional to the number of sectors per track.
• CD-ROM is read only; i.e., it is a publishing medium rather than a data storage and
retrieval medium like magnetic disks.
reflected beam's intensity. This pattern of changing intensity of the reflected beam is converted
into binary data.
- Double Buffering
- Buffer Pooling
- Move mode and Locate mode
- Scatter/Gather I/O
CHAPTER 4: FUNDAMENTAL FILE STRUCTURE CONCEPTS
4.1 Field and Record Organization
4.1.1 A Stream File
• In the Windows, DOS, UNIX, and LINUX operating systems, files are not internally
structured; they are streams of individual bytes.
• The only file structure recognized by these operating systems is the separation of a
text file into lines.
o For Windows and DOS, two characters are used between lines, a carriage
return (ASCII 13) and a line feed (ASCII 10);
o For UNIX and LINUX, one character is used between lines, a line feed
(ASCII 10);
• The code in application programs can, however, impose internal organization on
stream files.
Record Structures
record
A subdivision of a file, containing data related to a single entity.
field
A subdivision of a record containing a single attribute of the entity which the record
describes.
stream of bytes
A file which is regarded as being without structure beyond separation into a sequential set
of bytes.
    Display (Clerk);
    Display (Customer);
}
void Display (Person Someone) {
    cout << Someone.FirstName << Someone.LastName
         << Someone.Address << Someone.City
         << Someone.State << Someone.ZIP;
}
• In memory, each Person will appear as an aggregate, with the individual values
being parts of the aggregate.
• The records within a file are followed by a delimiting byte or series of bytes.
• The delimiter cannot occur within the records.
• Records within a file can have different sizes.
• There may be no external fragmentation (unused space outside of records) after file
updating.
• Individual records cannot always be updated in place.
• Algorithms for Accessing Prefixed Variable Length Records
• Code for Accessing Prefixed Variable Length Records
• Example:
00  0A 00 46 69 72 73 74 20 4C 69 6E 65 0B 00 53 65   ..First Line..Se
10  63 6F 6E 64 20 4C 69 6E 65 1F 00 54 68 69 72 64   cond Line..Third
20  20 4C 69 6E 65 20 77 69 74 68 20 6D 6F 72 65 20    Line with more
30  63 68 61 72 61 63 74 65 72 73                     characters
• Disadvantage: the offset of each record cannot be calculated from its record number.
This makes direct access impossible.
• Disadvantage: there is space overhead for the delimiter suffix.
• Advantage: there will probably be no internal fragmentation (unusable space within
records.)
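The length-prefixed layout shown in the example dump (a 2-byte length, low byte first, followed by the record bytes) can be sketched in memory as follows. The function names are hypothetical, and records are assumed to be shorter than 65,536 bytes so that the length fits in the prefix.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Append one record to buffer: a 2-byte length prefix (low byte first),
// then the record bytes themselves. Assumes record.size() < 65536.
void append_record(std::string& buffer, const std::string& record) {
    unsigned length = static_cast<unsigned>(record.size());
    buffer += static_cast<char>(length & 0xFF);
    buffer += static_cast<char>((length >> 8) & 0xFF);
    buffer += record;
}

// Walk the buffer from the start, using each prefix to find where the
// record ends and the next prefix begins. Returns the records in order.
std::vector<std::string> read_records(const std::string& buffer) {
    std::vector<std::string> records;
    std::size_t pos = 0;
    while (pos + 2 <= buffer.size()) {
        unsigned length = static_cast<unsigned char>(buffer[pos]) |
                          (static_cast<unsigned char>(buffer[pos + 1]) << 8);
        pos += 2;
        if (pos + length > buffer.size()) break;   // truncated final record
        records.push_back(buffer.substr(pos, length));
        pos += length;
    }
    return records;
}
```

Finding record n requires reading the prefixes of all records before it, which is why the record offset cannot be computed from the record number alone.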
Indexed Variable Length Records
• Advantage: the offset of each record is contained in the index, and can be looked
up from its record number. This makes direct access possible.
• Disadvantage: there is space overhead for the index file.
• Disadvantage: there is time overhead for the index file.
• Advantage: there will probably be no internal fragmentation (unusable space within
records.)
• The time overhead for accessing the index file can be minimized by reading the
entire index file into memory when the files are opened.
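A minimal in-memory sketch of an index of record offsets follows. The struct name `IndexedFile` and its members are hypothetical; a std::string stands in for the data file and a std::vector for the index file once it has been read into memory.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct IndexedFile {
    std::string data;                   // record bytes stored back to back
    std::vector<std::size_t> offsets;   // offsets[n] = byte offset of record n

    // Add a record at the end of the data and note where it starts.
    void append(const std::string& record) {
        offsets.push_back(data.size());
        data += record;
    }

    // Direct access: look up the offset of record n in the index, then take
    // the bytes up to the start of the next record (or the end of the data).
    // Throws std::out_of_range if n is not a valid record number.
    std::string get(std::size_t n) const {
        std::size_t begin = offsets.at(n);
        std::size_t end = (n + 1 < offsets.size()) ? offsets[n + 1] : data.size();
        return data.substr(begin, end - begin);
    }
};
```

The space overhead is one offset per record. The time overhead of consulting the index is small here because the whole index is held in memory.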
Fixed Field Count Records
• Records can be recognized if they always contain the same (predetermined) number
of fields.
Delineation of Fields in a Record
Fixed Length Fields
• Each record is divided into fields of correspondingly equal size.
• Different fields within a record can have different sizes.
• Corresponding fields in different records have the same length.
• Programs which access the record must know the field lengths.
• There is no external overhead for field separation.
• There may be internal fragmentation (unused space within fields.)
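Fixed length fields can be sketched as character arrays of agreed sizes. The field lengths (10 and 2), the blank padding character, and all names below are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

const std::size_t NAME_LENGTH = 10;    // hypothetical field lengths that every
const std::size_t STATE_LENGTH = 2;    // program using the file must know

struct FixedRecord {
    char Name[NAME_LENGTH];
    char State[STATE_LENGTH];
};

// Copy value into a fixed-size field: pad with blanks when the value is
// shorter than the field, truncate when it is longer.
void set_field(char* field, std::size_t field_length, const std::string& value) {
    std::memset(field, ' ', field_length);
    std::size_t n = value.size() < field_length ? value.size() : field_length;
    std::memcpy(field, value.data(), n);
}

// Count the trailing blanks in a field: the unused space inside the field,
// that is, its internal fragmentation.
std::size_t padding_in_field(const char* field, std::size_t field_length) {
    std::size_t used = field_length;
    while (used > 0 && field[used - 1] == ' ') --used;
    return field_length - used;
}
```

Every FixedRecord occupies exactly 12 bytes with no separators between fields. Storing "Nancy" in the 10-byte Name field leaves 5 bytes of padding.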
Delimited Variable Length Fields
• The fields within a record are followed by a delimiting byte or series of bytes.
• Fields within a record can have different sizes.
• Different records can have different length fields.
• Programs which access the record must know the delimiter.
• The delimiter cannot occur within the data.
• If used with delimited records, the field delimiter must be different from the record
delimiter.
• There is external overhead for field separation equal to the size of the delimiter per
field.
• There should be no internal fragmentation (unused space within fields.)
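Delimited fields inside a delimited record can be sketched as follows. The delimiter characters '|' and '\n' and the function names are assumptions. The two delimiters differ, as required above, and neither may appear inside the field data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

const char FIELD_DELIMITER = '|';     // assumed field delimiter
const char RECORD_DELIMITER = '\n';   // assumed record delimiter

// Build one record: every field is followed by the field delimiter, and the
// record as a whole is followed by the record delimiter.
std::string pack_record(const std::vector<std::string>& fields) {
    std::string record;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        record += fields[i];
        record += FIELD_DELIMITER;
    }
    record += RECORD_DELIMITER;
    return record;
}

// Split one record back into its fields by scanning for the delimiters.
std::vector<std::string> unpack_record(const std::string& record) {
    std::vector<std::string> fields;
    std::string current;
    for (std::size_t i = 0; i < record.size(); ++i) {
        if (record[i] == RECORD_DELIMITER) break;     // end of this record
        if (record[i] == FIELD_DELIMITER) {
            fields.push_back(current);                // one field completed
            current.clear();
        } else {
            current += record[i];
        }
    }
    return fields;
}
```

The overhead is one delimiter byte per field plus one per record: three fields holding 12 data bytes pack into 16 bytes.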
Length Prefixed Variable Length Fields
A search which reads each record sequentially from the beginning until the record
or records being sought are found.
• A sequential search is O(n); that is, the search time is proportional to the number of
items being searched.
• For a file of 1000 records and unique random search keys, an average of 500 records
must be read to find the desired record.
• For an unsuccessful search, the entire file must be examined.
• Sequential search is unsatisfactory for most file searches.
• Sequential search is satisfactory for certain special cases:
o Sequential search is satisfactory for small files.
o Sequential search is satisfactory for files that are searched only infrequently.
o Sequential search is satisfactory when a high percentage of the records in a
file will match.
o Sequential search is required for unstructured text files.
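The cost of a sequential search can be made visible by counting the records read. In this sketch a std::vector stands in for the file, and the names `Record`, `sequential_search`, and `records_read` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Record {
    std::string Key;
    std::string Data;
};

// Read records from the beginning until one with the given key is found.
// Returns the position of the record, or -1 if no record has that key.
// records_read reports how many records had to be examined.
long sequential_search(const std::vector<Record>& file, const std::string& key,
                       long& records_read) {
    records_read = 0;
    for (std::size_t i = 0; i < file.size(); ++i) {
        ++records_read;
        if (file[i].Key == key) return static_cast<long>(i);
    }
    return -1;   // unsuccessful search: every record was examined
}
```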
5.1.3 Unix Tools for Sequential Processing
• Unix style tools for MS-DOS are available from the KU chat BBS website.
• Linux style tools for MS-DOS are available from the Cygwin website.
• The cat (concatenate) utility can be used to copy files to standard output.
• The cat (concatenate) utility can be used to combine (concatenate) two or more files
into one.
• The grep (general regular expression) utility prints lines matching a pattern.
• The grep manual (man page) is available online (Shadow Island).
• The wc (word count) utility counts characters, words, and lines in a file.
• wc and other utilities are also available from National Taiwan University.
5.1.4 Direct Access
Accessing data from a file by record position within the file, without accessing
intervening records.
relative record number
An ordinal number indicating the position of a record within a file.
• Direct access allows individual records to be read from different locations in the file
without reading intervening records.
self-describing file
A file which contains metadata describing the data within the file and its organization.
• Files often begin with headers, which describe the data in the file, and the
organization of the file.
00  10 00 56 02 00 44 1C 00 00 00 00 00 00 00 00 00   ..V..D..........
10  2C 00 05 4E 61 6E 63 79 05 4A 6F 6E 65 73 0D 31   ,..Nancy.Jones.1
20  32 33 20 45 6C 6D 20 50 6C 61 63 65 08 4C 61 6E   23 Elm Place.Lan
30  67 73 74 6F 6E 02 4F 4B 05 37 32 30 33 32 34 00   gston.OK.720324.
40  06 48 65 72 6D 61 6E 07 4D 75 6E 73 74 65 72 15   .Herman.Munster.
50  31 33 31 33 20 4D 6F 63 6B 69 6E 67 62 69 72 64   1313 Mockingbird
60  20 4C 61 6E 65 05 54 75 6C 73 61 02 4F 4B 05 37    Lane.Tulsa.OK.7
70  34 31 31 34 34 00 05 55 68 75 72 61 05 53 6D 69   41144..Uhura.Smi
80  74 68 13 31 32 33 20 54 65 6C 65 76 69 73 69 6F   th.123 Televisio
90  6E 20 4C 61 6E 65 0A 45 6E 74 65 72 70 72 69 73   n Lane.Enterpris
A0  65 02 43 41 05 39 30 32 31 30                     e.CA.90210
• The above dump represents a file with a 16 byte (10 00) header, variable length
records with a 2 byte length prefix, and fields delimited by ASCII code 28 (1C hex).
The actual data begins at byte 16 (10 hex).
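A program can use the header to find where the records start. The sketch below assumes, from the dump, that the first two bytes of the file hold the header size (low byte first) and that each record is preceded by a 2 byte length stored the same way. It reads only the header size and the record length prefixes and does not interpret the other header bytes or the fields. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Read a 2-byte unsigned value stored low byte first at position pos.
unsigned read_u16(const std::string& bytes, std::size_t pos) {
    return static_cast<unsigned char>(bytes[pos]) |
           (static_cast<unsigned char>(bytes[pos + 1]) << 8);
}

// Return the length declared by the prefix of every record that follows
// the header. file_bytes holds the whole file, header included.
std::vector<unsigned> record_lengths(const std::string& file_bytes) {
    std::vector<unsigned> lengths;
    if (file_bytes.size() < 2) return lengths;
    std::size_t pos = read_u16(file_bytes, 0);   // header size = offset of first record
    while (pos + 2 <= file_bytes.size()) {
        unsigned length = read_u16(file_bytes, pos);
        lengths.push_back(length);
        pos += 2 + length;                       // skip the prefix and the record body
    }
    return lengths;
}
```

Applied to the dump, the header size read from bytes 0 and 1 is 16, and the first record prefix at offset 10 hex is 2C 00, a length of 44 bytes.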
5.3.3 Metadata
metadata
Data which describes the data in a file or table.
5.3.4 Mixing Object Types in One File
5.3.5 Representation Independent File Access
5.3.6 Extensibility
extensibility
Having the ability to be extended (e.g., by adding new fields) without redesign.
sequential access
Access of data in order.
Accessing data from a file whose records are organized on the basis of their
successive physical positions.
direct access
Access of data in arbitrary order, with variable access time.
Accessing data from a file by record position within the file, without accessing
intervening records.
• For direct access to be useful, the relative record number of the record of interest
must be known.
• Direct access is often used to support keyed access.
Keyed Access
keyed access
Accessing data from a file by an alphanumeric key associated with each record.
key
A value which is contained within or associated with a record and which can be used
to identify the record.
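Keyed access built on direct access can be sketched with an in-memory table that maps each key to a relative record number. The struct name `KeyedFile` and its members are hypothetical; a std::vector stands in for the record file, with the position in the vector playing the role of the RRN.

```cpp
#include <map>
#include <string>
#include <vector>

struct KeyedFile {
    std::vector<std::string> records;          // position in the vector = RRN
    std::map<std::string, long> rrn_of_key;    // key -> relative record number

    // Store a record at the end of the file and remember its RRN under key.
    void add(const std::string& key, const std::string& record) {
        rrn_of_key[key] = static_cast<long>(records.size());
        records.push_back(record);
    }

    // Keyed access: translate the key to an RRN, then go directly to that
    // record. Returns a null pointer when no record has the key.
    const std::string* find(const std::string& key) const {
        std::map<std::string, long>::const_iterator it = rrn_of_key.find(key);
        if (it == rrn_of_key.end()) return nullptr;
        return &records[it->second];
    }
};
```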