FS M1 Part1
Design.
To Discuss a number of Advanced Data Structure Concepts
that are necessary for achieving high efficiency in File
Operations.
To Develop important programming skills in an Object-
Oriented Language such as C++ or Java.
2
File Structures, An Object-Oriented Approach
with C++
by
Michael J. Folk, Bill Zoellick
and Greg Riccardi
3
CO1 To understand the concepts of storage, manipulation, and processing of files using various
file operations
CO2 Apply various data structures to achieve improved file operations
CO3 Analyze various file indexing techniques to improve performance of file structures
CO4 Illustrate different file organization and storage management techniques
CO5 Design and develop solutions for real-time file management problems
COs PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
CO1 2
CO2 3 2
CO3 3
CO4 3 2
CO5 3
4
Module 1:
Introduction: File Structures: The Heart of File Structure Design, A Short History of File
Structure Design, A Conceptual Toolkit; Fundamental File Operations: Physical Files and
Logical Files, Opening Files, Closing Files, Reading and Writing, Seeking, Special Characters,
The Unix Directory Structure, Physical Devices and Logical Files, File-related Header Files,
UNIX File System Commands; Secondary Storage and System Software: Disks, Magnetic Tape,
Disk versus Tape; CD-ROM: Introduction, Physical Organization, Strengths and Weaknesses;
Storage as a Hierarchy, A Journey of a Byte, Buffer Management, Input/Output in UNIX.
Fundamental File Structure Concepts, Managing Files of Records: Field and Record
Organization, Using Classes to Manipulate Buffers, Using Inheritance for Record Buffer
Classes, Managing Fixed-Length, Fixed-Field Buffers, An Object-Oriented Class for Record
Files, Record Access, More about Record Structures, Encapsulating Record Operations in a
Single Class, File Access and File Organization.
5
Module 2:
Organization of Files for Performance, Indexing: Data Compression, Reclaiming Space in Files,
Internal Sorting and Binary Searching, Keysorting; What is an Index? A Simple Index for Entry-
Sequenced Files, Using Template Classes in C++ for Object I/O, Object-Oriented Support for
Indexed, Entry-Sequenced Files of Data Objects, Indexes that are too Large to Hold in Memory,
Indexing to Provide Access by Multiple Keys, Retrieval Using Combinations of Secondary Keys,
Improving the Secondary Index Structure: Inverted Lists, Selective Indexes, Binding.
Module 3:
Cosequential Processing and the Sorting of Large Files: A Model for Implementing
Cosequential Processes, Application of the Model to a General Ledger Program, Extension of
the Model to Include Multiway Merging, A Second Look at Sorting in Memory, Merging as a Way
of Sorting Large Files on Disk.
Multi-Level Indexing and B-Trees: The Invention of the B-Tree, Statement of the Problem, Indexing
with Binary Search Trees; Multi-Level Indexing, B-Trees, Example of Creating a B-Tree, An
Object-Oriented Representation of B-Trees, B-Tree Methods; Nomenclature, Formal Definition
of B-Tree Properties, Worst-case Search Depth, Deletion, Merging and Redistribution,
Redistribution during Insertion; B* Trees, Buffering of Pages; Virtual B-Trees; Variable-length
Records and Keys.
6
Module 4:
Indexed Sequential File Access and Prefix B+ Trees: Indexed Sequential Access, Maintaining a
Sequence Set, Adding a Simple Index to the Sequence Set, The Content of the Index: Separators
Instead of Keys, The Simple Prefix B+ Tree and its Maintenance, Index Set Block Size, Internal
Structure of Index Set Blocks: A Variable-order B-Tree, Loading a Simple Prefix B+ Tree, B-
Trees, B+ Trees and Simple Prefix B+ Trees in Perspective.
Module 5:
Hashing: Introduction, A Simple Hashing Algorithm, Hashing Functions and Record Distribution,
How Much Extra Memory Should be Used?, Collision Resolution by Progressive Overflow, Buckets,
Making Deletions, Other Collision Resolution Techniques, Patterns of Record Access.
Extendible Hashing: How Extendible Hashing Works, Implementation, Deletion, Extendible
Hashing Performance, Alternative Approaches.
7
What are File Structures?
Why Study File Structure Design
Overview of File Structure Design
8
A File Structure is a combination of representations for
data in files and of operations for accessing the data.
A File Structure allows applications to read, write and
modify data. It might also support finding the data that
matches some search criteria or reading through the
data in some particular order.
9
Computer Data can be stored in three kinds of locations:
◦ Primary Storage ==> Memory [Computer Memory]
◦ Secondary Storage [Online Disk/ Tape/ CDRom that can be
accessed by the computer] <== Our Focus
◦ Tertiary Storage ==> Archival Data [Offline Disk/
Tape/ CDRom not directly available to the computer.]
10
Secondary storage such as disks can pack thousands of megabytes in a
small physical location.
11
By improving the File Structure.
Since the details of the representation of the data and the
implementation of the operations determine the efficiency of the
file structure for particular applications, improving these details can
help improve secondary storage access time.
12
Get the information we need with one access to the disk.
If that’s not possible, then get the information with as few accesses
as possible.
Group information so that we are likely to get everything we need
with only one trip to the disk.
13
It is relatively easy to come up with file structure designs that meet
the general goals when the files never change.
When files grow or shrink as information is added and deleted, it
is much more difficult.
14
Early Work assumed that files were on tape.
Access was sequential and the cost of access grew in direct
proportion to the size of the file.
15
As files grew very large, unaided sequential access was not a good
solution.
Disks allowed for direct access.
Indexes made it possible to keep a list of keys and pointers in a
small file that could be searched very quickly.
With the key and pointer, the user had direct access to the large,
primary file.
16
As indexes also have a sequential flavour, when they grew too
much, they also became difficult to manage.
17
In 1963, researchers came up with the idea of AVL trees for data in memory.
AVL trees, however, did not apply to files because they work well when tree nodes
are composed of single records rather than dozens or hundreds of them.
In the 1970’s came the idea of B-Trees, which require an O(log_k N) access time,
where N is the number of entries in the file and k is the number of entries indexed in
a single block of the B-Tree structure --> B-Trees can guarantee that one can find
one file entry among millions of others with only 3 or 4 trips to the disk.
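As a rough worked illustration (the figures are illustrative, not from the text): if each B-Tree block indexes k = 100 entries, then 100^3 = 1,000,000, i.e. log_100(1,000,000) = 3, so one entry among a million can be located in about 3 block reads -- and often fewer actual disk trips, since the root (and sometimes the next level) tends to stay in memory.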
18
Retrieving entries in 3 or 4 accesses is good, but it does
not reach the goal of accessing data with a single request.
From early on, Hashing was a good way to reach this goal
with files that do not change size greatly over time.
Recently, Extendible Dynamic Hashing guarantees one or
at most two disk accesses no matter how big a file
becomes.
19
Fundamental File Processing Operations
20
Physical versus Logical Files
Opening and Closing Files
Reading, Writing and Seeking
Special Characters in Files
The Unix Directory Structure
Physical Devices and Logical Files
Unix File System Commands
21
Physical File: A collection of bits/bytes stored on the secondary
storage like disk or tape.
Logical File: A channel that connects the program to the physical
file(stream)
22
Logical File: A “Channel” (like a telephone line) that hides
the details of the file’s location and physical format from
the program.
When a program wants to use a particular file:
◦ It has to hook up the logical file to the physical file.
◦ E.g. (COBOL): select inp_file assign to “myfile.dat”
23
◦ This statement tells the operating system to make the hookup
between the logical file and the physical file.
24
Once we have a logical file identifier hooked up to a
physical file or device, we need to declare what we
intend to do with the file:
Open an existing file
Create a new file
That makes the file ready to use by the program
We are positioned at the beginning of the file and are
ready to read or write.
25
System Call: fd = open(filename, flags [, pmode]);
Argument  Type    Explanation
fd        int     Logical file descriptor.
                  (A negative value signals an error opening the file.)
filename  char *  Character string holding the physical file name.
flags     int     O_APPEND (append to the end of the file)
                  O_CREAT (create the file; no effect if the file exists)
                  O_EXCL (return an error if O_CREAT is specified and the
                  file exists)
                  O_RDONLY, O_RDWR, O_WRONLY
                  (open the file in the specified access mode)
                  O_TRUNC (if the file exists, truncate it to length zero,
                  destroying its contents)
27
Argument  Type    Explanation
flags     int     If O_CREAT is specified => pmode is required.
pmode     int     Protection mode:
                  pmode = rwe rwe rwe
                          111 101 001
                          owner group world
E.g.
1. fd = open(filename, O_RDWR | O_CREAT, 0751)
2. If the file already exists and we want to start at its initial position:
   fd = open(filename, O_RDWR)
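A minimal, self-contained sketch of the call described above (the file name myfile.dat and the error handling are assumed for illustration, not taken from the slide):

#include <fcntl.h>      /* open() and the O_... flags */
#include <unistd.h>     /* close() */
#include <stdio.h>      /* perror() */

int main(){
    int fd;
    /* create the file if it does not exist;
       pmode 0751 = rwe for owner, r-e for group, --e for world */
    fd = open("myfile.dat", O_RDWR | O_CREAT, 0751);
    if (fd < 0) {                 /* a negative value signals an error */
        perror("open");
        return 1;
    }
    /* ... read from / write to the file through fd ... */
    close(fd);                    /* give the descriptor back */
    return 0;
}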
28
Makes the logical file name/file descriptor available for
another physical file (it’s like hanging up the telephone
after a call).
Ensures that everything has been written to the file [since
data is written to a buffer prior to the file].
Files are usually closed automatically by the operating
system (unless the program is abnormally interrupted).
29
Read(Source_file, Destination_addr, Size)
30
Write(Destination_file, Source_addr, Size)
31
C Stream: stdio.h
◦ Input stream: stdin
◦ Output stream: stdout
Open function:
◦ file=fopen(filename, type);
file=> File * =>file descriptor
filename=> char *
type=> char * => read/input ("r"), write/output ("w"), append ("a")
32
type=> char * =>
r+ => Open an existing file for input/output
w+ => Create a new file / truncate an existing one for input/
output
a+ => Create a new file / append to an existing one for input/
output
fread, fgetc/fgets, fwrite, fputc/fputs
fscanf, fprintf (formatted I/O)
C++ Stream:
◦ We use cin and cout for input and output
◦ To use these, cin and cout overload the respective
operators (>> and <<).
◦ Other file operations are defined in fstream.h
◦ int open(char * filename, int mode)
◦ ios::in, ios::out, ios::nocreate (fail if the file does not
exist), ios::noreplace (fail if the file exists),
ios::binary (the file is in binary)
These are the 2 constructors in fstream.h which
attach a file to an fstream:
◦ fstream()
◦ fstream(char * filename, int mode)
35
Opening Files:
◦ links a logical file to a physical file.
In C:
FILE * outfile;
outfile = fopen(“myfile.txt”, “w”);
In C++:
fstream outfile;
outfile.open(“myfile.txt”, ios::out);
36
In C :
fclose(outfile);
In C++ :
outfile.close();
37
Read data from a file and place it in a variable inside the
program.
In C:
char c;
FILE * infile;
infile = fopen(“myfile.txt”,”r”);
fread(&c, 1, 1, infile);
In C++:
char c;
fstream infile;
infile.open(“myfile.txt”,ios::in);
infile >> c;
38
Write data from a variable inside the program
into the file.
In C:
char c;
FILE * outfile;
outfile = fopen(“mynew.txt”,”w”);
fwrite(&c, 1, 1, outfile);
In C++:
char c;
fstream outfile;
outfile.open(“mynew.txt”,ios::out);
outfile << c;
39
#include <stdio.h>
#include <string.h>

int main(){
    char ch;
    FILE * file;                  /* logical file (file pointer) */
    char filename[20];
    printf("enter the name of the file:");
    fgets(filename, sizeof(filename), stdin);   /* read the physical file name */
    filename[strcspn(filename, "\n")] = '\0';   /* strip the trailing newline  */
    file = fopen(filename, "r");
    if (file == NULL) return 1;                 /* could not open the file     */
    while (fread(&ch,1,1,file) != 0)            /* element size and no. of elements */
        fwrite(&ch,1,1,stdout);
    fclose(file);
    return 0;
}
40
#include <fstream.h>
#include <iostream.h>
main(){
char ch;
fstream file;                     // logical file
char filename[20];
cout<<"enter the name of the file:"<<flush;
cin>> filename;
file.open(filename, ios::in);
file.unsetf(ios::skipws);         // do not skip whitespace when reading
41
while(1){
file>>ch;               // overloaded operator >> reads a character from the file into ch
if(file.fail())break;   // end of file (or read error)
cout<<ch;
}
file.close();
}
42
A program does not necessarily have to read through a file
sequentially: It can jump to specific locations in the file or to the
end of file so as to append to it.
The action of moving directly to a certain position in a file is often
called seeking.
Seek(Source_file, Offset)
◦ Source_file = the logical file name in which the seek will occur
◦ Offset = the number of positions in the file the pointer is to be
moved from the start of the file.
43
pos=fseek(file, byte_offset, origin)
◦ pos => address to which read/write pointer has moved.
◦ file => file descriptor
◦ byte_offset => no.of bytes(location) from some origin.
◦ Origin => 0 - fseek from the beginning of the file
1 - fseek from the current position in the file
2 - fseek from the end of the file
pos=fseek(file,373L,0); // seek to byte 373
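A small illustrative sketch of seeking in C (RECORD_SIZE and the record-number scheme are assumptions, not from the slide); note that in ANSI C the origin values 0, 1, 2 are usually written with the constants SEEK_SET, SEEK_CUR and SEEK_END:

#include <stdio.h>

#define RECORD_SIZE 64            /* assumed fixed record length in bytes */

/* position the file at record number n (counting from 0), then read it */
int read_record(FILE *file, long n, char *buffer){
    if (fseek(file, n * RECORD_SIZE, SEEK_SET) != 0)   /* from the beginning */
        return -1;
    return fread(buffer, RECORD_SIZE, 1, file) == 1 ? 0 : -1;
}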
44
Seeking in C++ is similar to C.
Two differences:
◦ fstream has 2 pointers
Get pointer(seekg)
file.seekg(byte_offset,origin)
Put pointer(seekp)
file.seekp(byte_offset,origin)
◦ Origin is of class ios
ios::beg, ios::cur, ios::end
e.g.:file.seekg/seekp(373, ios::beg)
45
Sometimes, the operating system attempts to make
“regular” user’s life easier by automatically adding or
deleting characters for them.
These modifications, however, make the life of
programmers building sophisticated file structures (YOU)
more complicated!
46
Control-Z is added at the end of all files (MS-
DOS). This is to signal an end-of-file.
<Carriage-Return> + <Line-Feed> are added to
the end of each line (again, MS-DOS).
<Carriage-Return> is removed and replaced by a
character count on each line of text (VMS)
47
In any computer systems, there are many files (100’s or 1000’s). These
files need to be organized using some method. In Unix, this is called
the File System.
The Unix File System is a tree-structured organization of directories.
With the root of the tree represented by the character “/”.
Each directory can contain regular files or other directories.
The file name stored in a Unix directory corresponds to its physical
name.
48
Any file can be uniquely identified by giving it its
absolute pathname. E.g., /usr6/mydir/addr.
The directory you are in is called your current
directory.
You can refer to a file by the path relative to the
current directory.
“.” stands for the current directory and “..” stands for
the parent directory.
50
Unix has a very general view of what a file is:
◦ It corresponds to a sequence of bytes, without knowing
where the bytes are stored or
where they come from.
Magnetic disks or tapes can be thought of as files
and so can the keyboard and the console.
◦ /dev/kbd (keyboard sends sequence of bytes)
◦ /dev/console (accepts sequence of bytes and displays)
51
No matter what the physical form of a
Unix file is
◦ a real file or a device,
◦ both are represented in the same way in Unix:
by an integer,
i.e. a file descriptor,
which is a logical name.
52
Stdout --> Console
fwrite(&ch, 1, 1, stdout);
Stdin --> Keyboard
fread(&ch, 1, 1, stdin);
Stderr --> Standard Error (again, Console)
[When the compiler detects an error, the error
message is written in this file]
53
If we want to write to a file instead of stdout,
OR
if we want to use the output of one program as the input
to another program,
we use the concept of redirection (and pipes).
54
< filename [redirect stdin to “filename”]
> filename [redirect stdout to “filename”]
E.g., a.out > my-output
program1 | program2 [take any stdout output from
program1 and use it in place of any stdin input to
program2]
E.g., list | sort
55
cat filenames --> Print the content of the named textfiles.
tail filename --> Print the last 10 lines of the text file.
cp file1 file2 --> Copy file1 to file2.
mv file1 file2 --> Move (rename) file1 to file2.
rm filenames --> Remove (delete) the named files.
chmod mode filename --> Change the protection mode on the
named file.
ls --> List the contents of the directory.
mkdir name --> Create a directory with the given name.
rmdir name --> Remove the named directory.
56
Secondary Storage and
System Software: Magnetic
Disks &Tapes
57
The Organization of Disks
Estimating Capacities and Space Needs
Organizing Tracks by Sector
Organizing Tracks by Block
Non Data Overhead
The Cost of a Disk Access
Disk as Bottleneck
58
Having learned how to manipulate files, we now
learn about the nature and limitations of the
devices and systems used to store and retrieve
files, so that we can design good file structures
that arrange the data in ways that minimize
access costs given the device used by the system.
59
Disks belong to the category of Direct Access Storage Devices (DASDs)
because they make it possible to access the data directly.
This is in contrast to Serial Devices (e.g., Magnetic Tapes) which allow
only serial access [all the data before the data we are interested in has to
be read or written in order].
Different Types of Disks:
◦ Hard Disk: High Capacity + Low Cost per bit.
◦ Floppy Disk: Cheap, but slow and holds little data. (zip disks: removable disk
cartridges)
◦ Optical Disk (CD-ROM): Read Only, but holds a lot of data and can be reproduced
cheaply. However, slow.
60
The information to be stored on a disk is stored on
the surface of one or more platters.
The information is stored in successive tracks on
the surface of the disk.
Each track is often divided into a number of
sectors which is the smallest addressable portion
of a disk.
61
When a read statement calls for a particular byte
from a disk file, the computer’s operating system
finds the correct platter, track and sector, reads
the entire sector into a special area in memory
called a buffer, and then finds the requested byte
within that buffer.
66
Disk drives typically have a number of platters and the
tracks that are directly above and below one another
form a cylinder.
All the info on a single cylinder can be accessed
without moving the arm that holds the read/write
heads.
Moving this arm is called seeking. The arm movement
is usually the slowest part of reading information
from a disk.
67
Track Capacity = number of sectors per track *
bytes per sector
Cylinder Capacity = number of tracks per cylinder
* track capacity
Drive Capacity = number of cylinders * cylinder
capacity
68
Suppose we want to store a file of 50,000 fixed-length
data records with the following characteristics:
◦ No. of bytes per sector=512
◦ No. of sectors per track=63
◦ No. of tracks per cylinder=16
◦ No. of cylinders=4092
How many cylinders are required to store the above
data if data record is 256 bytes?
69
As 1 data record is 256 bytes,
each sector (512 bytes) can hold 2 records.
No. of sectors needed: 50000/2 = 25000
One cylinder can hold:
◦ sectors per track * tracks per cylinder = 63 * 16 = 1008 sectors
No. of cylinders needed:
◦ 25000/1008 = 24.8 cylinders
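The same arithmetic, written out as a small C sketch (figures taken from the exercise above):

#include <stdio.h>
#include <math.h>

int main(){
    double records           = 50000;   /* records to store       */
    double bytes_per_sector  = 512;
    double record_size       = 256;     /* bytes per record       */
    double sectors_per_track = 63;
    double tracks_per_cyl    = 16;

    double records_per_sector = bytes_per_sector / record_size;      /* 2     */
    double sectors_needed     = records / records_per_sector;        /* 25000 */
    double sectors_per_cyl    = sectors_per_track * tracks_per_cyl;  /* 1008  */
    double cylinders          = sectors_needed / sectors_per_cyl;    /* 24.8  */

    printf("cylinders needed: %.1f (i.e. %d whole cylinders)\n",
           cylinders, (int)ceil(cylinders));
    return 0;
}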
70
Data on disks can be organized in 2 ways:
1. By sector
2. By user defined block
71
The logical organization of sectors on a track is that sectors are:
◦ adjacent,
◦ fixed-sized segments of a track
◦ that hold part of a file.
Physically, however, this organization is not optimal:
◦ after reading the data, it takes the disk controller some time
to process the received information before it is ready to
accept more;
◦ if the sectors were physically adjacent,
we would miss the start of the next sector while processing the info
just read in.
73
Traditional Solution: Interleave the sectors.
Namely, leave an interval of several physical
sectors between logically adjacent sectors.
74
Nowadays, however, the controller’s speed has
improved so that no interleaving is necessary
anymore.
75
The file can also be viewed as a series of
clusters of sectors which represent a fixed
number of (logically) contiguous sectors.
Once a cluster has been found on a disk, all
sectors in that cluster can be accessed without
requiring an additional seek.
76
When a program wants to access a file, it is the task of the
file manager (which maps the logical file to the physical
one) to find it.
The file is viewed as a series of clusters of sectors.
A cluster's physical location is identified by the
File Allocation Table (FAT), which maps logical sectors
to the physical clusters they belong to.
77
We use all the above methods to reduce seeking.
If a file is stored in one contiguous run of clusters (one
extent), the file can be processed with a minimum of seeking time.
To do this, the file manager checks whether enough contiguous
space on the disk is free:
◦ If a free contiguous slot exists => place the file in it.
◦ If not, the file is split into 2 or more non-contiguous
pieces (2 or more extents).
◦ As the number of extents in a file increases, the file
becomes more spread out on the disk, and the
amount of seeking necessary increases.
79
If the size of a sector is 512 bytes
and all records in a file are 300 bytes,
how do we store this data?
There are 2 possible organizations for the records (if the
records are smaller than the sector size):
1. Store 1 record per sector
2. Store the records successively (i.e., one record may span two sectors)
81
Trade-Offs
Advantage of 1: Each record can be retrieved from 1 sector.
Disadvantage of 1: Loss of Space with each sector ==>
Internal Fragmentation
Advantage of 2: No internal fragmentation
Disadvantage of 2: 2 sectors may need to be accessed to
retrieve a single record.
The use of clusters also leads to internal fragmentation.
83
Rather than being divided into sectors,
◦ The disk tracks may be divided into user-defined blocks.
84
Blocks don’t have the sector-spanning and
fragmentation problem of sectors
since they vary in size to fit the logical organization of the data.
A block consists of an integral number of logical
records.
85
The blocking factor indicates the number of
records that are to be stored in each block in a file.
Each block is usually accompanied by sub-blocks:
◦ key-subblock (each data block can be given a key
depending on the last record)
◦ count-subblock (no. of bytes in a data block)
86
Key sub-block: The disk controller can search a
particular block on a track by the key defined.
Accessing is efficient.
87
• FAT stands for File Allocation Table, and FAT32 is an extension in which
the table entries are 32 bits wide. This is an older type
of file system that isn't commonly used these days.
• NTFS stands for New Technology File System; it took over
from FAT as the primary file system used in Windows.
88
Whether using a block or a sector organization, some
space on the disk is taken up by non-data overhead.
◦ i.e., information stored on the disk during pre-formatting.
90
The greater the blocking factor,
◦ Advantage: the more efficient the use of storage.
◦ Disadvantage: since tracks are fixed in size, some space may be left
unused at the end of each track, which leads to internal track
fragmentation.
The flexibility introduced by the use of blocks rather
than sectors
◦ can save time, since it lets the programmer determine, to a large
extent, how the data is to be organized physically on disk.
On the negative side: the overhead of determining the
data organization falls on the programmer and the
operating system.
91
Seek Time is the time required to move the access arm to the
correct cylinder.
Rotational Delay is the time it takes for the disk to rotate so that the
sector we want is under the read/write head.
92
Transfer Time =
(Number of Bytes Transferred/ Number of Bytes on a Track)
* Rotation Time
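As an illustration with assumed figures (not from the slide): a 7200-RPM drive rotates once in 60,000/7,200 ≈ 8.33 ms, so the average rotational delay is about 4.17 ms; transferring one sector from a 63-sector track takes roughly (1/63) * 8.33 ms ≈ 0.13 ms; adding an assumed average seek of 8 ms gives about 12.3 ms for a random single-sector read. Seek time and rotational delay, not the transfer itself, dominate the cost.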
93
Processes are often Disk-Bound, i.e., the network and
the CPU often have to wait inordinate lengths of time for
the disk to transmit data.
Solution 1: Multiprogramming (CPU works on other jobs
while waiting for the disk)
Solution 2: Striping: splitting the parts of a file on several
different drives, then letting the separate drives deliver
parts of the file to the network simultaneously ==>
Parallelism
94
Solution 3: RAID: Redundant Array of Independent
Disks(is a data storage virtualization technology that combines multiple
physical disk drive components into a single logical unit )
95
Secondary Storage and System
Software: Magnetic Disks &Tapes
96
Description of Tape Systems
Organization of Data on Nine-Track Tapes
Estimating Tape Length Requirements
Estimating Data Transmission Times
Disk versus Tape
97
No direct accessing facility, but very rapid
sequential access.
Easy to store and transport, cheaper than disk.
Used for application data
Currently, tapes are primarily used as archival
storage.
98
On a tape, the logical position of a byte within a file
◦ corresponds directly to its physical position relative to
the start of the file.
The surface of a typical (nine-track) tape can be seen
◦ as a set of parallel tracks,
◦ each of which is a sequence of bits.
◦ A one-bit-wide slice of tape across the tracks is called a
frame; one frame holds 1 byte of data + a parity bit.
99
In odd parity, the bit is set to make the number
of bits in the frame odd. This is done to check
the validity of the data.
Frames are organized into data blocks of
variable size separated by inter block gaps (long
enough to permit stopping and starting)
101
Calculate how much tape space is needed
◦ to store 1 million 100-byte records
◦ on a 6250 bpi (bytes per inch) tape
◦ with an interblock gap of 0.3 inches.
102
Let b= the physical length of a data block
Let g= the length of an interblock gap, and
Let n= the number of data blocks.
The space requirement, s, for storing the file is
s = n * (b+g)
103
b = block size (i.e., bytes per block) / tape density (i.e., bytes per inch)
104
b = 100/6250 = 0.016 inch
If blocking factor = 1 then
n = 1 000 000 / 1 = 1 000 000
s = 1 000 000 * (0.016 + 0.3)
  = 316 000 inches = 26,333 feet
105
If blocking factor = 50 then
n = 1 000 000 / 50 = 20 000
b = (50 * 100)/6250 = 0.8 inch
s = 20 000 * (0.8 + 0.3)
  = 22 000 inches (about 1,833 feet)
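The same tape-space arithmetic as a small C sketch (figures from the example above):

#include <stdio.h>

/* inches of tape needed for n_records records of record_size bytes, written
   blocking_factor records per block at the given density (bytes per inch),
   with gap inches of interblock gap after every block                       */
double tape_inches(double n_records, double record_size,
                   double blocking_factor, double density, double gap){
    double b = (blocking_factor * record_size) / density;   /* block length  */
    double n = n_records / blocking_factor;                 /* no. of blocks */
    return n * (b + gap);                                   /* s = n * (b+g) */
}

int main(){
    /* 1,000,000 100-byte records, 6250 bpi, 0.3-inch gaps */
    printf("blocking factor 1 : %.0f inches\n",
           tape_inches(1000000, 100, 1, 6250, 0.3));         /* 316000 inches */
    printf("blocking factor 50: %.0f inches\n",
           tape_inches(1000000, 100, 50, 6250, 0.3));        /* 22000 inches  */
    return 0;
}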
106
The number of records stored in a physical block
is called the blocking factor.
If the blocking factor is 1, each block holds exactly one record.
107
The greater the blocking factor, the less space is lost to interblock gaps and the better the space
utilization.
Hence the space needed depends on the blocking factor.
A generalized measure of the effect of choosing different block sizes is the
Effective Recording Density:
= (number of bytes per block) / (number of inches required to store a block)
= 100 / (0.016 + 0.3) ≈ 316 bpi   (for blocking factor 1 in the example above)
Space utilization is sensitive to the relative sizes of data blocks and
interblock gaps.
108
Nominal Data Transmission Rate =
(Tape Density (bpi)) * (Tape Speed (ips))
Interblock gaps, however, must be taken into
consideration ==>
Effective Transmission Rate =
(Effective Recording Density) * (Tape Speed)
where Effective Recording Density =
(No. of bytes per block) / (No. of inches required to store a block)
109
In the past: Both Disks and Tapes were used for
secondary storage. Disks were preferred for
random access and tape was better for sequential
access.
Now (1): Disks have taken over much of secondary
storage ==> Because of the decreased cost of disk
+ memory storage
Now (2): Tapes are used as Tertiary storage
(Cheap, fast & easy to stream large files or sets of
files between tape and disk)
110
Secondary Storage and System
Software:
CD-ROM & Issues in Data
Management
111
CD-ROM (Compact Disk, Read-Only
Memory)
A Journey of a Byte
Buffer Management
I/O in Unix
112
A single disc can hold more than 600 megabytes of data (~
400 books of the textbook’s size)
CD-ROM is read only. i.e., it is a publishing medium rather
than a data storage and retrieval medium like magnetic disks.
CD-ROM Strengths: High storage capacity, inexpensive and
durable.
CD-ROM Weaknesses: extremely slow seek performance
(between 1/2 a second to a second) ==> Intelligent File
Structures are critical.
113
CD-ROM is a descendant of the audio CD. i.e.,
listening to music is sequential and does not
require fast random access to data.
Reading Pits and Lands:
◦ CD-ROMs are stamped from a glass master disk which
has a coating
◦ that is changed where the laser beam strikes it.
114
Reading Pits and Lands:
◦ When the coating is developed, the areas hit by the laser
beam turn into pits along the track followed by the beam.
The smooth unchanged areas between the pits are called
lands.
115
◦ To read data from CD ROM
A beam of laser light is passed on the track as it
moves under the optical pickup.
The pits scatter the light, but the lands reflect
most of it back to the pickup.
This alternating pattern of high- and low-intensity
reflected light is the signal used to reconstruct
the original digital information.
116
1’s are represented by the transition from pit to
land and back again.
0’s are represented by the amount of time
between transitions. The longer between
transitions, the more 0s we have.
117
Given this scheme, it is not possible to have 2
adjacent 1s: 1s are always separated by 0s. As a
matter of fact, because of physical limitations,
there must be at least two 0s between any pair of
1s.
Raw patterns of 1s and 0s have to be translated to
get the 8-bit patterns of 1s and 0s that form the
bytes of the original data.
118
E.g.
◦ EFM encoding (Eight-to-Fourteen Modulation) turns the
original 8 bits of data into 14 expanded bits that can be
represented in the pits and lands on the disc.
119
CLV - Constant Linear Velocity
CAV - Constant Angular Velocity
Data on a CD-ROM is stored in a single, spiral track (CLV).
This allows the data to be packed as tightly as possible,
since all the sectors have the same size (whether in the
center or at the edge).
This CLV pattern is used in audio systems.
In the "regular" (CAV) arrangement, the data is packed more
densely in the center than at the edge ==> space is lost at
the edge.
120
Since reading the data requires that it passes under
the optical pick-up device at a constant rate,
The disc has to spin more slowly when reading the
outer edges than when reading towards the center.
121
Part of the problem is the need to change
rotational speed.
In CLV, the rotation speed has to change as the head moves.
In CAV, data is packed more densely at the center and less
densely at the edges.
◦ Some space is wasted at the edges in CAV.
122
To read the address info that is stored on the disc
◦ Along with the user’s data, we need to be moving the
data under the optical pick up at the correct speed.
◦ But to know how to adjust the speed, we need to be able
to read the address info so we know where we are.
◦ How do we break this loop? By guessing and through trial
and error ==> Slows down performance.
123
Different from the “regular” disk method.
Each second of playing time on a CD is divided into 75 sectors.
Each sector holds 2 Kilobytes of data. Each CD-ROM contains
at least one hour of playing time.
==> The disc is capable of holding at least
◦ 60 min * 60 sec/min * 75 sectors/sec * 2 Kilobytes/sector =
540,000 KBytes
Often, it is actually possible to store over 600,000 KBytes.
Sectors are addressed by min:sec:sector, e.g., 16:22:34
124
Seek Performance: very bad
Data Transfer Rate: Not Terrible/Not Great
Storage Capacity: Great
◦ Benefit: enables us to build indexes and other support structures that
can help overcome some of the limitations associated with CD-ROM’s
poor performance.
Read-Only Access: There can’t be any changes ==> File organization can
be optimized.
127
Part that takes place in memory:
The statement calls the Operating System (OS), which
oversees the operation.
128
File manager (Part of the OS that deals with I/O)
◦ Checks whether the operation is permitted
◦ Locates the physical location where the byte will be stored
(Drive, Cylinder, Track & Sector)
◦ Finds out whether the sector that will hold the ‘P’ is already in
memory (if not, reads it into an I/O buffer)
◦ Puts ‘P’ in the I/O buffer
◦ Keeps the sector in memory to see if more bytes will be going
to the same sector of the file
129
Part that takes place outside of memory:
I/O Processor: Wait for an external data path to become
available (CPU is faster than data-paths ==> Delays)
Disk Controller:
◦ I/O Processor asks the disk controller if the disk drive is
available for writing
◦ Disk Controller instructs the disk drive to move its read/
write head to the right track and sector.
◦ Disk spins to right location and byte is written
132
What happens to data travelling between a program’s
data area and secondary storage?
We use temporary storage i.e. BUFFER
The use of Buffers: Buffering involves
◦ working with a large chunk of data in memory so the
number of accesses to secondary storage can be
reduced.
133
Problems:
It depends on how many buffers are used. Suppose we
◦ have a single buffer,
◦ performing both input and output,
◦ one character at a time, alternately.
In this case,
◦ the input sector is read into the buffer to get the next character,
◦ then the buffer must be overwritten with the output sector so the
character can be written,
◦ and vice versa: one sector read and one sector write per character.
134
In such a case, the system needs more than 1
buffer: at least, one for input and the other one for
output.
Moving data to or from disk is very slow and
programs may become I/O Bound ==> Find better
strategies to avoid this problem.
135
Multiple Buffering
Move Mode and Locate Mode
Scatter/Gather I/O
136
Multiple Buffering: using 2 or more buffer to
perform Input/Output. There are two types :
◦ Double Buffering
◦ Buffer Pooling
137
Double Buffering:
◦ Suppose we have task with only write(to disk)
operations
◦ If we use 2 buffers where I/O and CPU operations are
overlapped.
If Buffer-1 =>used by the CPU to fill data
Then Buffer-2 =>used to transmit the data to disk.
If Buffer-2 =>used by the CPU to fill data
Then Buffer-1 =>used to transmit the data to disk.
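A minimal sketch of the idea in C (the buffer size, output file name and the stub data producer are assumptions; real overlap of the CPU and the disk would need asynchronous I/O or a second thread, which is omitted here):

#include <stdio.h>
#include <string.h>

#define BUFSIZE 4096                        /* assumed buffer size */

/* stub data producer: fills buf and reports how many bytes it produced,
   0 when there is nothing left (here: 10 blocks of 'x' characters)      */
static size_t produce(char *buf, size_t cap){
    static int blocks_left = 10;
    if (blocks_left-- <= 0) return 0;
    memset(buf, 'x', cap);
    return cap;
}

int main(){
    static char buf_a[BUFSIZE], buf_b[BUFSIZE];
    char *fill  = buf_a;                    /* buffer the CPU fills next        */
    char *flush = buf_b;                    /* buffer being transmitted to disk */
    FILE *out   = fopen("out.dat", "wb");   /* assumed output file name         */
    size_t n;
    if (out == NULL) return 1;
    while ((n = produce(fill, BUFSIZE)) > 0) {
        /* in a real double-buffering scheme this fwrite would overlap in time
           with the next produce() call (asynchronous I/O or a second thread)  */
        fwrite(fill, 1, n, out);
        /* swap roles: the buffer just written becomes the next one to fill */
        { char *tmp = fill; fill = flush; flush = tmp; }
    }
    fclose(out);
    return 0;
}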
138
Buffer pooling:
◦ Pool of buffers will be available
◦ On requirement a buffer will be chosen from the pool.
◦ Method to choose buffer from pool:
LRU (Least Recently Used) strategy
Maintained by a queue.
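A compact sketch of an LRU buffer pool in C (the pool size, sector size and the stub read_sector routine are assumptions for illustration):

#include <string.h>

#define POOL_SIZE   8          /* assumed number of buffers in the pool */
#define SECTOR_SIZE 512        /* assumed sector size in bytes          */

typedef struct {
    long sector;               /* which sector this buffer holds              */
    long last_used;            /* logical clock value of the last access      */
                               /* (0 = buffer never used yet)                 */
    char data[SECTOR_SIZE];
} Buffer;

static Buffer pool[POOL_SIZE];
static long   clock_tick = 0;

/* stub for the low-level routine that reads one sector from the disk */
static void read_sector(long sector, char *dest){
    (void)sector;
    memset(dest, 0, SECTOR_SIZE);
}

/* return a buffer holding the requested sector, reusing the
   least-recently-used buffer when the sector is not already pooled */
Buffer *get_buffer(long sector){
    Buffer *victim = &pool[0];
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool[i].last_used > 0 && pool[i].sector == sector) {
            pool[i].last_used = ++clock_tick;   /* found in the pool: refresh its age */
            return &pool[i];
        }
        if (pool[i].last_used < victim->last_used)
            victim = &pool[i];                  /* remember the LRU candidate */
    }
    read_sector(sector, victim->data);          /* replace the LRU buffer */
    victim->sector    = sector;
    victim->last_used = ++clock_tick;
    return victim;
}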
140
Move Mode and Locate Mode:
Buffers can work in two modes:
◦ Move mode:
deals with the movement of data from the system buffer
to the program's data area and vice versa.
Problem: to access a byte of data we need an extra
transfer of the data between the system buffer and the
program area.
141
• Locate mode:
To avoid the above problem,
we can avoid the unnecessary transfer of data.
This is done as follows:
the system buffer is used for all operations, and
the program is given a pointer to the data's location in the system buffer.
This mode is called locate mode.
142
Scatter Input and Gather Output:
Suppose we have to read a file with many blocks,
where each block consists of a header followed by
data.
Without scatter/gather this is done in 2 steps:
read the entire block into a single big buffer,
then copy the header into one buffer
and the data into another buffer.
143
To avoid the 2 steps we use scatter input:
◦ a single block of data is scattered into a collection of
buffers,
◦ with the data read in a single read call.
Gather output:
◦ several buffers can be gathered and
◦ written in a single write call,
◦ avoiding the need to copy them into a single big output buffer.
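On Unix this idea maps onto the readv()/writev() system calls; a minimal scatter-input sketch (the header size, data size and file name are assumptions):

#include <sys/uio.h>    /* readv, writev, struct iovec */
#include <fcntl.h>
#include <unistd.h>

#define HEADER_SIZE 16          /* assumed header length */
#define DATA_SIZE   496         /* assumed data length   */

int main(void){
    char header[HEADER_SIZE];
    char data[DATA_SIZE];
    struct iovec iov[2];
    int fd = open("block.dat", O_RDONLY);    /* assumed file name */
    if (fd < 0) return 1;

    /* scatter input: one read call fills both buffers, in order */
    iov[0].iov_base = header;  iov[0].iov_len = HEADER_SIZE;
    iov[1].iov_base = data;    iov[1].iov_len = DATA_SIZE;
    if (readv(fd, iov, 2) < 0) { close(fd); return 1; }

    close(fd);
    return 0;
}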
144
The Kernel:
145
When a program executes the following instruction (journey of
a byte):
◦ write(fd, &ch, 1)
1. A system call is invoked,
2. which invokes the kernel (the system call asks the kernel to write a
character).
3. The kernel I/O system connects the file descriptor to a file or
I/O device by using four tables.
146
Four tables are:
1.File descriptor table (Program)
2.Open file table (Kernel)
3.File allocation table (Kernel)
4.Table of Index nodes (Part of file system)
147
File descriptor table:
◦ Owned by the program.
◦ Contains the information needed to reach the open file table.
148
Open file table:
◦ Contains information about the opened files.
149
The file allocation table is part of the inode.
When a file is opened, a copy of its inode is added to the inode table.
An inode is a structure used to describe a file.
150
Once the file is identified, device drivers are called to
access the data.
Linking file names to files:
◦ Two links
Hard links
Pointer from the directory to the inode of a file
Soft links/ symbolic link
Specifies a path name
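Both kinds of link can also be created from a C program with the POSIX calls link() and symlink() (the file names below are only examples); from the shell the equivalents are ln and ln -s:

#include <unistd.h>

int main(void){
    /* hard link: a second directory entry pointing at the same inode */
    link("data.txt", "data_hard.txt");

    /* soft (symbolic) link: a new file that merely stores the path name */
    symlink("data.txt", "data_soft.txt");
    return 0;
}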
151