File Organization and Management Theory Pages 4
Related technologies
Network connectivity
Robotic storage
Robotic storage is used for backups and for high-capacity archives in the imaging,
medical, and video industries. Hierarchical storage management is a well-known
archiving strategy that automatically migrates long-unused files from fast hard disk
storage to libraries or jukeboxes. If the files are needed, they are retrieved back
to disk.
WEEK 8
To understand:
Storage media/devices
A device that can receive data and retain it for subsequent retrieval is called a
storage medium. File storage and retrieval covers the organization,
storage, location and retrieval of coded information in a computer system.
Important factors in storing and retrieving information are the type of media or
storage device used, the media’s storage capacity, the speed
of access and of information transfer to and from the storage media, the number of
times new information can be written to the media, and how the media interacts
with the computer.
a. Permanent storage
Information is stored permanently on storage that is written to only once, such
as ROM (read-only memory) chips and CD-ROM (compact disc read-only memory).
Permanent storage media are used for archiving files or, in the case of ROM chips, for
storing basic information that the computer needs in order to function and that cannot be
overwritten.
b. Temporary storage
Temporary information storage is used as intermediate storage between
permanent or semi-permanent storage and the computer’s central processing unit
(CPU). Temporary storage takes the form of memory chips called random access
memory (RAM). Information is stored in RAM while the CPU is using it. It is then
returned to a more permanent form of memory. RAM chips are known as volatile
memory because they must have power supplied to them continuously; otherwise
they lose the contents of their memory.
When the CPU needs to access some piece of stored information, the process is
reversed. The CPU determines where on the physical media the appropriate file
is stored, then directs the read/write head to position itself at that location on the
media, and then directs it to read the information stored there.
Storage media:
Information is stored on many different types of media, the most common being
floppy disks, hard drives, CD-ROMs, magnetic tapes and flash disks.
Floppy disks
Floppy disks are most often used to store information, such as application
programs that are frequently accessed by users. A floppy disk is a thin piece
of magnetic material inside a protective envelope. The size of the disk is usually
given as the diameter of the magnetic media. The two most common sizes are 3.5
inch and 5.25 inch. Both sizes of floppy are removable disks; that is, they must
be inserted into a compatible disk drive to be read from or written to. This drive is
usually internal to, or part of, a computer. Most floppy drives today are double
sided, with one head on each side of the disk. This doubles the storage capacity
of the disk, allowing it to be written to on both sides. Information is organized on
the disk by dividing the disk into tracks and sectors. Tracks are concentric
circular regions on the surface of the disk. Before a floppy can be used, the
computer has to format it by placing special information on the disk that enables
the computer to find each track and sector.
Hard drive
Hard drives have multiple platters stacked on top of one another, each with its
own read/write head. The media in a hard drive is generally not removable from
the drive assembly, although external hard drives with removable hard
disks do exist. The read/write heads in a hard drive are precisely aligned with the
surfaces of the hard disks, allowing thousands of tracks and dozens of sectors
per track. The combination of more heads and more tracks allows hard drives to
store more data and to transfer data at a higher rate than floppy disks.
Accessing information on a hard disk involves moving the heads to the right track
and then waiting for the correct sector to revolve underneath the head. Seek time
is the average time required to move the heads from one track to some
other desired track on the disk. The time needed to move from one track to a
neighbouring track is often in the 1 millisecond (i.e., one thousandth of a second)
range, and the average seek time to reach an arbitrary track anywhere on the disk is
in the 6 to 15 millisecond range.
Rotational latency is the average time required for the correct sector to come under
the heads once they are positioned on the correct track. This time depends on
how fast the disk is revolving. Today, many drives run at 120 to 180 (or more)
revolutions per second, yielding average rotational latencies of a few
milliseconds.
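The "few milliseconds" figure can be checked with a short calculation. The sketch below assumes that, on average, the disk must turn half a revolution before the wanted sector arrives under the head; the function name is illustrative and not part of these notes.

```python
def average_rotational_latency_ms(revolutions_per_second: float) -> float:
    """Average rotational latency in milliseconds.

    Assumption: on average the wanted sector is half a revolution away.
    """
    seconds_per_revolution = 1.0 / revolutions_per_second
    return 0.5 * seconds_per_revolution * 1000.0


# The two rotation speeds quoted in the text above.
for rps in (120, 180):
    latency = average_rotational_latency_ms(rps)
    print(f"{rps} rev/s: about {latency:.2f} ms average latency")
```

Under this assumption both speeds give a latency between 2 and 5 milliseconds, which agrees with the range stated above.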
If a file requires more than one sector for storage, the positions of the sectors on
the individual tracks can greatly affect the average access time. Typically, it takes
the disk controller a small amount of time to finish reading a sector. If the next
sector to be read is the neighbouring sector on the track, the electronics may not
have enough time to get ready to read it before it rotates under the read/write
head. If this is the case, the drive must wait until the sector comes all the way
round again. This access time can be reduced by interleaving, that is, alternately
placing the sectors on the tracks so that sequential sectors for the same file are
separated from each other by one or two sectors. When information is distributed
optimally, the device controller is ready to start reading just as the appropriate
sector comes under the read/write head.
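Interleaving can be sketched as a function that decides which logical sector goes into each physical slot of a track. This is a minimal illustration only: the function name and the `skip` parameter (the number of sectors separating consecutive logical sectors) are assumptions, and real controllers may lay out sectors differently.

```python
def interleaved_layout(sector_count: int, skip: int) -> list:
    """Return the logical sector number stored in each physical slot.

    Consecutive logical sectors are separated by `skip` physical slots.
    If the target slot is already taken, the next free slot is used.
    """
    layout = [None] * sector_count
    slot = 0
    for logical in range(sector_count):
        while layout[slot] is not None:      # slot already in use
            slot = (slot + 1) % sector_count
        layout[slot] = logical
        slot = (slot + skip + 1) % sector_count
    return layout


print(interleaved_layout(8, 0))  # no interleaving
print(interleaved_layout(8, 1))  # one sector between neighbours
print(interleaved_layout(8, 2))  # two sectors between neighbours
```

With `skip=1` on an eight-sector track, logical sectors 0 and 1 end up two physical slots apart, giving the controller one sector's worth of rotation time to get ready.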
After many files have been written to and erased from a disk, fragmentation can
occur. Fragmentation happens when the pieces of a single file are inefficiently
distributed in many locations on a disk. The result is an increase in average file
access time. This problem can be fixed by running a defragmentation program,
which goes through the drive track by track and rearranges the sectors for each
file so that they can be accessed more quickly.
Disk structure
Unlike floppy drives, in which the read/write heads actually touch the surface of
the material, the heads in most hard disks float slightly off the surface. When the
heads accidentally touch the media, either because the drive is dropped or
bumped hard or because of an electrical malfunction, the surface becomes
scratched. Any data stored where the head has touched the disk is lost. This is
called a head crash. To help reduce the possibility of a head crash, most disk
controllers park the heads over an unused track on the disk when the drive is not
being used by the CPU.
It is important to have a large hard disk drive for some tasks, such as video
and sound editing; however, the latest desktop computers come with a minimum
of 60 Gbytes of capacity, which is enough for most standard tasks. Currently, hard
disk drives have capacities of up to 120 Gbytes and data transfer rates of 160 Mbits
per second.
CD-ROMs
The computer uses a CD-ROM drive to access information on the CD-ROM. The
drive may be external to, or part of, the computer. A light-sensitive instrument in the
drive reads the disk by watching the amount of light reflected back from a small
laser positioned over the spinning disk. Such disks can hold large amounts of
information, but can only be written to once. The drives capable of writing to
CD-ROMs are called write once, read many (WORM) drives. Due to their
inexpensive production costs, CD-ROMs are widely used today for storing music,
video, and application programs.
Magnetic tape
Magnetic tape has served as an efficient and reliable information storage medium
since the 1950s. Most magnetic tapes are made of mylar, a type of strong plastic, in
which metallic particles have been embedded. A read/write head identical to
those used for audio tape reads and writes binary information to the tape.
Reel-to-reel magnetic tape is commonly used to store information for large mainframes
or supercomputers. High-density cassette tapes, resembling audio cassette
tapes, are used to store information for personal computers and mainframes.
Magnetic tape storage has the advantage of being able to hold enormous amounts of
data. For this reason, it is used to store information on the largest computer
systems. Magnetic tape has two major shortcomings: it has a very slow data
access time when compared to other forms of storage media, and access to
information on magnetic tape is sequential. In sequential data storage, data are
stored with the first bit at the beginning of the tape and the last bit at the end of
the tape, in a linear fashion. To access a random bit of information, the tape drive
has to wind forward or backward through the tape until it finds the location of the bit. The
bits closest to the location of the read/write head can be accessed relatively quickly,
but bits far away take a considerable time to access. RAM, on the other hand, is
random access, meaning that it can locate any one bit as easily as any other.
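The difference between sequential and random access can be expressed as a simple cost model. The sketch below is an assumption-laden illustration: positions are measured in metres of tape, the tape moves at a constant speed, and the figures used are made up for the example rather than taken from any real drive.

```python
def tape_access_seconds(head_position_m: float,
                        target_position_m: float,
                        tape_speed_m_per_s: float) -> float:
    """Time to wind the tape from the head position to the target.

    The cost grows with the distance between the two positions, which is
    what makes access sequential rather than random.
    """
    distance = abs(target_position_m - head_position_m)
    return distance / tape_speed_m_per_s


# Hypothetical tape moving at 2 metres per second, head at the start.
near = tape_access_seconds(0.0, 1.0, 2.0)     # data close to the head
far = tape_access_seconds(0.0, 300.0, 2.0)    # data far along the tape
print(f"near: {near} s, far: {far} s")
```

For RAM the corresponding function would return the same small value for every address, which is the sense in which it is random access.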
Flash memory
the electric field it creates is later used to read the stored value. To rewrite to
flash memory, the charges in the wells must first be drained. Such drives are
useful for storing information that changes infrequently. Most flash memories are
of large capacity and hence are used to store large volumes of data.
Future technology
• Reliability
– disk interleaving or striping
– RAIDs (Redundant Arrays of Inexpensive Disks): various levels, e.g., level 1 is
disk mirroring or shadowing (keeping a duplicate of each disk)
• Controller caches
– newer disks have on-disk caches (128KB—512KB)
WEEK 9
To understand:
Different file access types: random access and direct access storage methods.
Seek time and rotational delay
The concept of a buffer and its functions
The calculation of buffer requirement of a file.
Direct access: Direct access of an indexed sequential file makes full use of the
index to access the records using the record address associated with the index.
The records are accessed without any defined sequence of sorting. The method
is used when a transaction is to be processed immediately without waiting for
sorting, and it is widely used in commercial data processing because it
allows both sequential and random methods of access.
Seek Time
Seek time is one of the three delays associated with reading or writing data on a
computer’s disk drive; somewhat similar delays apply to CD and DVD drives. The others are
rotational delay and transfer time, and their sum is the access time. In order to
read or write data in a particular place on the disk, the read/write head of the disk
needs to be physically moved to the correct place. This process is known as
seeking, and the time it takes for the head to move to the right place is the seek
time. Seek time for a given disk varies depending on how far the head’s
destination is from its origin at the time of each read/write instruction.
Rotational Delay
The rotational delay is the time required for the address area of the disk to rotate
into a position where it is accessible by the read/write head.
Transfer Time
Transfer time is the time taken to transfer the data once the head is in position. It
depends on the transfer rate, which is the number of bits transferred per unit of time
and is measured in bits per second (bps).
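Since the access time is the sum of the three delays, it can be estimated with a few lines of arithmetic. The sketch below assumes that the rotational delay averages half a revolution; the function name and the sample figures are illustrative, apart from the 160 Mbits per second transfer rate quoted earlier in these notes.

```python
def access_time_ms(seek_ms: float,
                   revolutions_per_second: float,
                   bits_to_transfer: int,
                   transfer_rate_bps: float) -> float:
    """Access time = seek time + rotational delay + transfer time (in ms)."""
    # Assumption: on average the disk turns half a revolution.
    rotational_delay_ms = 0.5 * 1000.0 / revolutions_per_second
    transfer_ms = 1000.0 * bits_to_transfer / transfer_rate_bps
    return seek_ms + rotational_delay_ms + transfer_ms


# 10 ms seek, 100 rev/s, 160,000 bits read at 160 Mbits per second.
example_ms = access_time_ms(10.0, 100.0, 160_000, 160_000_000)
print(f"estimated access time: {example_ms:.2f} ms")
```

In this example the seek time and rotational delay dominate; the transfer itself accounts for only 1 ms of the total.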
Concept of buffer
The buffering technique is used to compensate for the slow and possibly erratic rate
at which a peripheral device produces or consumes data. If the device
communicates directly with the program, the program is constrained to run in
synchronism with the device; buffering allows the two to operate independently.
Consider a program sending output to a slow device. A memory area (the buffer)
is set aside for communication. The program places data in the buffer at its own
rate. Although the device may be slow, the program does not have to stop unless
the buffer fills up; at the same time, the device runs at full speed unless the buffer
is empty. This buffering technique is equally applicable to input devices.
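The behaviour described above can be sketched with a bounded queue shared between a program and a simulated slow device. This is only an illustration under stated assumptions: the device is modelled as a thread that sleeps briefly per item, and all names (`run_buffered_output`, `device_delay_s`) are invented for the example.

```python
import queue
import threading
import time


def run_buffered_output(items, buffer_size, device_delay_s=0.001):
    """Send items to a simulated slow device through a fixed-size buffer."""
    buffer = queue.Queue(maxsize=buffer_size)
    received = []
    done = object()  # marker telling the device there is no more data

    def device():
        while True:
            item = buffer.get()          # waits only while the buffer is empty
            if item is done:
                break
            time.sleep(device_delay_s)   # the device is slow
            received.append(item)

    worker = threading.Thread(target=device)
    worker.start()
    for item in items:
        buffer.put(item)                 # waits only while the buffer is full
    buffer.put(done)
    worker.join()
    return received


print(run_buffered_output(["line 1", "line 2", "line 3", "line 4"], 2))
```

The program stops only when `put` finds the buffer full, and the device stops only when `get` finds it empty, which are the two conditions named in the text.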
By providing a data buffer in the device or control unit, devices that would
ordinarily require the channel for a long period can be made to perform
independently of the channel. For instance, a card reader may physically require
about 60 ms to read a card and transfer the data to memory through the channel.
However, a buffered card reader always reads one card before it is needed and
saves the 80 bytes in a buffer in the card reader or control unit. When the channel
requests that a card be read, the contents of the 80-byte buffer are
transferred to memory at high speed (e.g., 100 µs) and the channel is released.
The device then proceeds to read the next card and buffer it independently. Card
readers, card punches and printers are frequently buffered.
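The saving in channel time follows directly from the two figures in the card reader example. The short calculation below uses only those figures (60 ms unbuffered, 100 µs buffered); the batch of 1,000 cards and the function name are assumptions made for illustration.

```python
UNBUFFERED_CHANNEL_US_PER_CARD = 60_000   # about 60 ms per card
BUFFERED_CHANNEL_US_PER_CARD = 100        # about 100 µs per card


def channel_busy_us(cards: int, us_per_card: int) -> int:
    """Total time the channel is tied up while reading `cards` cards."""
    return cards * us_per_card


cards = 1_000  # hypothetical batch size
unbuffered = channel_busy_us(cards, UNBUFFERED_CHANNEL_US_PER_CARD)
buffered = channel_busy_us(cards, BUFFERED_CHANNEL_US_PER_CARD)
reduction_factor = unbuffered // buffered
print(f"unbuffered: {unbuffered / 1_000_000} s of channel time")
print(f"buffered:   {buffered / 1_000_000} s of channel time")
print(f"the channel is busy {reduction_factor} times less")
```

The card reader still needs the same physical time per card; what changes is how long the channel is held.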
Functions of Buffer
A buffer synchronizes the speed of data transfer among the systems/devices that
are under the control of the CPU. It allows individual devices (that is, input and
output devices) to perform their functions independently. This is necessary because the
rate at which the CPU sends or receives data is very high compared to
the rate at which the I/O devices receive or send data. A buffer is equally used to
accommodate differences in the rate at which two devices can handle data
during data communication/transfer.
WEEK 10
To understand:
File Processing
A file is a body of stored data or information in an electronic format. Almost all information stored
on computers is in the form of files. Files reside on mass storage devices such as hard drives,
CD-ROMs, magnetic tape, and floppy disks. When the central processing unit (CPU) of a
computer needs data from a file, or needs to write data to a file, it temporarily stores the file in its
main memory, or Random Access Memory (RAM), while it works on the data.
Information in computers is usually classified into two different types of files: data files (those
containing data) and program files (those that manipulate data). Within each of these categories,
many different types of files exist that store various kinds of information.
Different computer operating systems have unique rules for the naming of files. Windows 95
(Win95) and disk operating systems (DOS), for instance, make use of an extension attached to
the end of each filename in order to indicate the type of file. Extensions begin with a period (.),
and then have one or more letters. An example of a file extension used in Win95 and DOS is
.bak, which indicates that the file is a backup file.
When saving a file, a user can give it any name within the rules of the operating system. In
addition, the name must be unique. Two files in the same directory may not have the same name,
but some operating systems allow one file to be referred to by additional names placed in more
than one location. These additional names are called aliases.
Directory files contain information used to organize other files into a hierarchical structure. In
the Macintosh operating system, directory files are called folders. The topmost directory in any
file system is the root directory. A directory contained within another directory is called a
subdirectory. Directories containing one or more subdirectories are called parent directories.
Program files contain programs or commands that are executable by the computer.
Executable files have a .exe suffix at the end of their names and are often called EXE
(pronounced EX-ee) files. Text files contain characters represented by their ASCII (American
Standard Code for Information Interchange) codes. These files are often called ASCII
(pronounced ASK-ee) files. Files that contain words, sentences, and bodies of paragraphs are
frequently referred to as text files. The diagram below shows the root directory, a subdirectory
and files.
[Diagram: a root directory containing a subdirectory (subfolder) and files]
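The relationship between a root directory, a subdirectory, a file and its extension can be explored with Python's standard pathlib module. In this sketch a temporary directory stands in for the root directory, and the names `reports` and `sales.bak` are made up for illustration.

```python
import tempfile
from pathlib import Path


def build_example_tree(root: Path) -> Path:
    """Create root/reports/sales.bak and return the path of the file."""
    subdirectory = root / "reports"
    subdirectory.mkdir()
    backup_file = subdirectory / "sales.bak"
    backup_file.write_text("backup copy of the sales data\n")
    return backup_file


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)                       # stands in for the root directory
    file_path = build_example_tree(root)
    extension = file_path.suffix           # the extension, including the period
    parent_name = file_path.parent.name    # directory that holds the file
    is_under_root = root in file_path.parents
    print(extension, parent_name, is_under_root)
```

Here `reports` is a subdirectory of the stand-in root, and the root is the parent directory of `reports`.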
File processing operations are the various activities that are performed on
a file. These operations are briefly described below:
File creation: The process of bringing a file into existence is called file creation.
Writing: Writing is the act of recording data onto some form of storage.
Deleting: This means removing a record or item of data from a storage medium
such as a disk or tape.
File updating: This is the act of changing values in one or more records of a file
without changing the organization of the file. That is, the file is brought up to date by
adding the most recent data to it.
File merging: Combining multiple sets of data files or records to produce only one
set, usually in an ordered sequence, is referred to as file merging.
Reporting: Reporting is a file processing operation that deals with the production
(printing) of a report from the file in a specified format.
File display: The contents of a data file can be displayed either on the computer
screen as soft copy or printed on paper as hard copy.
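Several of these operations can be tried out on small text files. The sketch below is a simplified illustration: each line is treated as one record, the file names are invented, deletion removes a whole file rather than a single record, and the merge assumes both input files are already sorted.

```python
import heapq
import tempfile
from pathlib import Path


def merge_sorted_files(first: Path, second: Path, merged: Path) -> None:
    """Merge two sorted files of records into one ordered file."""
    with first.open() as a, second.open() as b, merged.open("w") as out:
        out.writelines(heapq.merge(a, b))


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    east = folder / "east.txt"
    west = folder / "west.txt"
    east.write_text("adams\ncole\n")       # file creation and writing
    west.write_text("baker\ndavis\n")
    with east.open("a") as f:              # updating: add the newest record
        f.write("evans\n")
    merged = folder / "all.txt"
    merge_sorted_files(east, west, merged) # file merging
    merged_records = merged.read_text().splitlines()
    print(merged_records)                  # file display (soft copy)
    west.unlink()                          # deleting
    west_exists = west.exists()
```

Because both inputs are in order, the merged file is in order as well, which is the "ordered sequence" mentioned under file merging.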
WEEK 11
To understand:
What is a table
What is an array
What is a list
Table
A table is a collection of records. Each record stores information associated with a
key by which specific records are found, or the records may be arranged in an
array so that the index is the key. In commercial applications the word table is
often used as a synonym for matrix or array.
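Both forms of table mentioned above can be sketched briefly. The records, keys and names below are invented for illustration; a Python dictionary plays the part of a table searched by key, and a list plays the part of a table whose index is the key.

```python
# A table whose records are found through a key.
students = {
    "S001": {"name": "Amina", "score": 71},
    "S002": {"name": "Bola", "score": 64},
    "S003": {"name": "Chidi", "score": 88},
}


def find_record(table, key):
    """Return the record stored under `key`, or None if there is none."""
    return table.get(key)


# A table arranged as an array, so that the index itself is the key.
weekday_names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

print(find_record(students, "S002"))
print(weekday_names[2])
```

In the second form no separate key is stored with each record; the position in the array does that job.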
Array
U2 then L1 is the first lower bound of the array, U1 is the first upper bound, L2 is
the second lower bound and U2 is the second upper bound.
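Again it is common practice to take the indexes as integers and to set both L1
and L2 equal to one. An example of a two-dimensional array with U1 = m, U2 = n
is given in the diagram below;
[Diagram: a two-dimensional array whose first index runs from 1 to m and whose second index runs from 1 to n]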
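One use of the bounds L1, U1, L2 and U2 is to work out where an element sits when the array is stored as a single run of locations. The sketch below assumes row-major storage (all elements with the same first index are kept together), which these notes do not specify; the function name is illustrative.

```python
def row_major_offset(i: int, j: int, l1: int, u1: int, l2: int, u2: int) -> int:
    """Offset of element (i, j) from the start of the array.

    Assumption: row-major order, i.e. the second index varies fastest.
    """
    if not (l1 <= i <= u1 and l2 <= j <= u2):
        raise IndexError("index outside the declared bounds")
    elements_per_row = u2 - l2 + 1
    return (i - l1) * elements_per_row + (j - l2)


# A 3 by 4 array with L1 = L2 = 1, U1 = 3 and U2 = 4.
print(row_major_offset(1, 1, 1, 3, 1, 4))  # first element
print(row_major_offset(2, 3, 1, 3, 1, 4))
print(row_major_offset(3, 4, 1, 3, 1, 4))  # last element
```

With both lower bounds set to one, the first element has offset 0 and the last has offset m × n − 1.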
WEEK 12
To understand:
What is a stack
What is a queue
Compare stack and queue
Examples of file processing techniques
o Batch
o Real-time
o On-line
o Serial
o Sequential
o Indexed sequential
o random
Stack
A stack is a linear list that can be accessed for either input or output at just one of its
two ends. In stack operations, all accesses involving insertions and removals are made
at one end of the list, called the top. This implies access on a last-in first-out (LIFO) basis,
where the most recently inserted item on the list is the first to be removed.
The operations push and pop refer respectively to the insertion and removal of items at
the top of the stack. Stacks occur frequently in computing and in particular are closely
associated with recursion.
One example of this model of access can be found in a stack of plates on a kitchen
shelf or in a spring-loaded dispenser in a cafeteria. In both cases, the only two logical
possibilities are to add a plate to the top of the stack or to remove the plate at the top.
A stack is also exemplified by a
railroad spur. In this model, we can insert a boxcar from the input to the open end of the
spur, and we can remove a boxcar from the open end of the spur to the output, mixing
insertions and deletions as we wish. The essence of these examples is that the next object to
be removed will always be the last one that was added, hence the acronym LIFO or
last-in first-out. A stack illustrated by a railroad spur is shown in the diagrams below;
[Diagrams: a railroad spur used as a stack. Boxcars A, B, C and D move from the input onto the spur (the stack) and from the spur to the output.]
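The push and pop operations can be sketched with a small class built on a Python list. The class and method names are illustrative; the boxcars A, B, C and D are taken from the railroad spur example.

```python
class Stack:
    """A last-in first-out (LIFO) list accessed only at its top."""

    def __init__(self):
        self._items = []

    def push(self, item):
        """Insert an item at the top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the item at the top of the stack."""
        if not self._items:
            raise IndexError("pop from an empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items


spur = Stack()
for boxcar in ["A", "B", "C", "D"]:
    spur.push(boxcar)                 # boxcars enter the spur in this order

removed = []
while not spur.is_empty():
    removed.append(spur.pop())        # the last boxcar in is the first out
print(removed)
```

The boxcars leave in the reverse of the order in which they entered.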
QUEUE
[Diagram: Queuing System, with stages labelled Input source, Queue, Service mechanism and Served process]
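For comparison with a stack, a queue can be sketched with `collections.deque` from the Python standard library. The sketch assumes the usual first-in first-out (FIFO) discipline, in which items are served in the order they arrive; the function name and the item labels are illustrative.

```python
from collections import deque


def serve_all(arrivals):
    """Serve items in arrival order (first in, first out)."""
    waiting = deque()
    for item in arrivals:
        waiting.append(item)               # join at the rear of the queue
    served = []
    while waiting:
        served.append(waiting.popleft())   # serve from the front of the queue
    return served


order_served = serve_all(["A", "B", "C", "D"])
print(order_served)
```

A stack given the same four items would release them in the opposite order, since it removes the most recent arrival first.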
Online: In computer science, a file or device is online when it is activated and ready for operation;
that is, capable of communicating with or being controlled by a computer. For example,
a printer is online when it can be used for printing; a database is online when it
can be used by a person who connects with the computer on which it is stored.
Sequential access:
Pieces of information are accessed in serial order, one after the
other; therefore the time to access a particular piece of information depends
upon which piece of information was last accessed. This characteristic is typical
of off-line storage.
Indexed sequential:
Indexed sequential access: This is an access method for a sequentially
organized file whose records are indexed with their corresponding addresses. The
two types of access that can comfortably be performed with this method are sequential
access and random access. Note that this method is only possible with disk files.
i. Sequential access: Sequential access of an indexed sequential file
makes minimal use of the index. The records are accessed
serially in the order in which they are stored, from the first to
the last record in the file.
ii. Random access: Random access of an indexed sequential file
makes full use of the index to access the records using the record
address associated with the index. The records are accessed
without any defined sequence of sorting. The method is used
when a transaction is to be processed immediately without
waiting for sorting, and it is widely used in commercial data
processing because it allows both sequential and random
methods of access.
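The two kinds of access can be sketched with an in-memory stand-in for an indexed sequential file. This is a simplified model, not a real disk file: records are held in key order in a list, the "address" of a record is its position in that list, and the class and method names are invented for the example.

```python
class IndexedSequentialFile:
    """Records kept in key order, plus an index from key to address."""

    def __init__(self, records):
        # records is an iterable of (key, data) pairs with unique keys
        self._records = sorted(records, key=lambda record: record[0])
        self._index = {key: address
                       for address, (key, _) in enumerate(self._records)}

    def read_sequential(self):
        """Sequential access: every record, first to last, no index needed."""
        return [data for _, data in self._records]

    def read_random(self, key):
        """Random access: use the index to go straight to one record."""
        address = self._index[key]
        return self._records[address][1]


accounts = IndexedSequentialFile([(30, "Chidi"), (10, "Amina"), (20, "Bola")])
print(accounts.read_sequential())
print(accounts.read_random(20))
```

A batch job would use `read_sequential`, while a transaction that must be processed immediately would use `read_random` with the key of the record it needs.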
WEEK 13
To understand: