Internal File Structure: Methods and Design Paradigm
File organization is the methodology applied to structured computer files. Files
contain records, which can be documents or other information stored in a
defined way for later retrieval. File organization refers primarily to the logical
arrangement of data in a file system (the data can itself be organized as a system of
records, with correlation between the fields/columns). It should not be confused with the
physical storage of the file on a particular storage medium. There are certain basic types
of computer file, including files stored as blocks of data and files stored as streams of
data, where the information streams out of the file while it is being read until the end of
the file is encountered.
We will examine various ways that files can be stored and organized. Files are presented
to the application as a stream of bytes and then an EOF (end of file) condition.
A program that uses a file needs to know the structure of the file and needs to interpret its
contents.
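As a minimal sketch of the byte-stream view described above, the following Python reads a
file in small chunks until a read returns nothing, which is how Python reports the end of
file condition. The function name read_stream, the chunk size and the temporary demo file
are illustrative assumptions, not part of the original text.

```python
import os
import tempfile

def read_stream(path, chunk_size=4):
    """Yield successive chunks of bytes until end of file is reached."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object signals EOF
                break
            yield chunk

# Write a small demo file, then stream it back.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"record one\nrecord two\n")

chunks = list(read_stream(demo_path))
os.remove(demo_path)
```

The stream itself carries no structure: it is the reading program that must decide where
one record ends and the next begins.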
The most important considerations in choosing a file organization might be:
1. Rapid access to a record or a number of records which are related to each other.
2. The addition, modification, or deletion of records.
3. Efficiency of storage and retrieval of records.
4. Redundancy, as a method of ensuring data integrity.
A file should be organized in such a way that the records are available for
processing with as little delay as possible. This should be done in line with the activity
and volatility of the information.
How a file is organized depends on what kind of file it is. In its simplest form, a file
can be a text file, in other words a file composed of ASCII (American Standard
Code for Information Interchange) text. Files can also be created as binary or executable
types, containing elements other than plain text. Files also carry attributes
which help determine their use by the host operating system.
1. Heap (unordered)
2. Sorted
   1. Sequential (SAM)
   2. Line Sequential (LSAM)
   3. Indexed Sequential (ISAM)
3. Hashed or Direct
In addition to the three techniques above, there are five methods of organizing files:
sequential, line-sequential, indexed-sequential, inverted list, and direct or hashed
access organization.
Sequential Organization
A sequential file contains records organized in the order they were entered. The order of
the records is fixed. The records are stored and sorted in physical, contiguous blocks;
within each block the records are in sequence.
If the order of the records in a file is not important, sequential organization will
suffice, no matter how many records you may have. Sequential output is also useful for
report printing or sequential reads which some programs prefer to do.
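A minimal sketch of sequential organization, assuming fixed-length binary records: new
records are only ever appended, and reading always runs from the first record to the
last. The record layout (a 4-byte id and a 10-byte name) and the helper names
append_record and read_all are hypothetical choices made for illustration.

```python
import os
import struct
import tempfile

# Hypothetical record layout: 4-byte integer id followed by a 10-byte name.
RECORD = struct.Struct("<i10s")

def append_record(path, rec_id, name):
    """Add a record at the end of the file, preserving entry order."""
    with open(path, "ab") as f:
        f.write(RECORD.pack(rec_id, name.encode("ascii")))

def read_all(path):
    """Read every record from first to last, in the order it was written."""
    records = []
    with open(path, "rb") as f:
        while True:
            raw = f.read(RECORD.size)
            if len(raw) < RECORD.size:  # EOF (or a truncated trailing record)
                break
            rec_id, name = RECORD.unpack(raw)
            records.append((rec_id, name.rstrip(b"\x00").decode("ascii")))
    return records

fd, seq_path = tempfile.mkstemp()
os.close(fd)
for rec_id, name in [(3, "carol"), (1, "alice"), (2, "bob")]:
    append_record(seq_path, rec_id, name)
seq_records = read_all(seq_path)
os.remove(seq_path)
```

The records come back in entry order rather than key order, which is why finding one
particular record means reading through the file from the start.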
Line-Sequential Organization
Line-sequential files are like sequential files, except that the records can contain only
characters as data. Line-sequential files are maintained by the native byte stream files of
the operating system.
In the COBOL environment, line-sequential files that are created with WRITE statements
with the ADVANCING phrase can be directed to a printer as well as to a disk.
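The idea of character-only records held in an ordinary byte stream file can be sketched
in Python (not COBOL) as one record per line of text. The newline record terminator, the
comma-separated fields and the helper names are assumptions made for this example.

```python
import os
import tempfile

def write_lines(path, records):
    """Write each record as one line of text, ended by a newline."""
    with open(path, "w", encoding="ascii", newline="\n") as f:
        for record in records:
            f.write(record + "\n")

def read_lines(path):
    """Return the records, one per line, without the line terminators."""
    with open(path, "r", encoding="ascii") as f:
        return [line.rstrip("\n") for line in f]

fd, ls_path = tempfile.mkstemp()
os.close(fd)
write_lines(ls_path, ["1001,widget,4", "1002,gadget,12"])
ls_records = read_lines(ls_path)
os.remove(ls_path)
```

Because the terminator marks the end of each record, records may differ in length, but
they cannot contain arbitrary binary data.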
Indexed-Sequential Organization
Key searches are improved by this system too. The single-level indexing structure is the
simplest one where a file, whose records are pairs, contains a key pointer. This pointer is
the position in the data file of the record with the given key. A subset of the records,
which are evenly spaced along the data file, is indexed, in order to mark intervals of data
records.
A key search is performed as follows: the search key is compared with the index keys to
find the highest index key that precedes the search key, and then a linear search is
performed from the record that the index key points to, until the search key is matched or
until the record pointed to by the next index entry is reached. Despite the double file
access (index + data) required by this sort of search, the reduction in access time is
significant compared with sequential file searches.
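The search procedure can be sketched as follows, using an in-memory sorted list as a
stand-in for the data file and a list of (key, position) pairs as the index. The function
names and the spacing of one index entry per 10 records are assumptions for illustration,
and an index key equal to the search key is treated as a valid starting point.

```python
def build_sparse_index(keys, step):
    """Index every `step`-th record as a (key, position) pair."""
    return [(keys[pos], pos) for pos in range(0, len(keys), step)]

def indexed_search(keys, index, search_key):
    """Return the position of search_key in keys, or None if absent."""
    # Step 1: find the highest index key that does not exceed the search key.
    start = None
    end = len(keys)
    for index_key, pos in index:
        if index_key > search_key:
            end = pos  # the scan stops where the next index entry points
            break
        start = pos
    if start is None:  # search key is smaller than every indexed key
        return None
    # Step 2: linear search of the data records in that interval.
    for pos in range(start, end):
        if keys[pos] == search_key:
            return pos
    return None

data_keys = list(range(10, 1010, 10))  # 100 sorted keys: 10, 20, ..., 1000
sparse_index = build_sparse_index(data_keys, step=10)
```

For example, indexed_search(data_keys, sparse_index, 370) scans the index up to the entry
for key 310, then examines only the records in that one interval.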
Consider, for example, a simple linear search on a 1,000 record sequentially
organized file. An average of 500 key comparisons is needed (assuming the
search keys are uniformly distributed among the data keys). However, using an evenly
spaced index with 100 entries, each index interval covers 10 data records, so the total
number of comparisons is reduced to about 50 in the index file plus about 5 in the data
file: roughly a nine to one reduction in the operations count.
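The comparison counts can be checked empirically with a short sketch that searches for
every key once and averages the number of key comparisons. It assumes 1,000 sorted keys,
one index entry per 10 records (100 entries) and a linear scan of the index; the exact
totals depend on how comparisons are counted, so treat the figures as indicative only.

```python
def linear_comparisons(keys, search_key):
    """Count key comparisons made by a plain front-to-back search."""
    count = 0
    for key in keys:
        count += 1
        if key == search_key:
            break
    return count

def indexed_comparisons(keys, step, search_key):
    """Count comparisons for a linear index scan plus an interval scan."""
    count = 0
    start = 0
    # Scan the index (every `step`-th key) for the last entry <= search_key.
    for pos in range(0, len(keys), step):
        count += 1
        if keys[pos] > search_key:
            break
        start = pos
    # Scan the data records inside the selected interval.
    for pos in range(start, min(start + step, len(keys))):
        count += 1
        if keys[pos] == search_key:
            break
    return count

keys = list(range(1, 1001))  # 1,000 sorted keys, each searched once
linear_avg = sum(linear_comparisons(keys, k) for k in keys) / len(keys)
indexed_avg = sum(indexed_comparisons(keys, 10, k) for k in keys) / len(keys)
print(f"linear: {linear_avg:.1f}, indexed: {indexed_avg:.1f}")
```

Changing the step argument shows the trade-off: a denser index shortens the data scan but
lengthens the index scan, and a sparser index does the opposite.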
Like sequential organization, the data is stored in physically contiguous blocks. However,
the difference is in the use of indexes. There are three areas in the disk storage:
Inverted List
In file organization, this is a file that is indexed on many of the attributes of the data
itself. The inverted list method has a single index for each key type. The records are not
necessarily stored in a sequence. They are placed in the data storage area, but indexes
are updated for the record keys and location.
For example, in a company file, an index could be maintained for all products, and
another for product types. It is then faster to search the indexes
than every record. These types of file are also known as "inverted indexes."
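A minimal sketch of the company-file example, assuming records held in a Python list and
one dictionary per indexed attribute. The sample records and the helper names build_index
and lookup are hypothetical.

```python
from collections import defaultdict

# Hypothetical company file: records stored in no particular order.
records = [
    {"product": "hammer", "type": "tool"},
    {"product": "apple", "type": "food"},
    {"product": "wrench", "type": "tool"},
]

def build_index(records, attribute):
    """Map each value of `attribute` to the positions of matching records."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[attribute]].append(position)
    return index

# One index per key type, as in the inverted list method.
product_index = build_index(records, "product")
type_index = build_index(records, "type")

def lookup(index, value):
    """Fetch records through an index instead of scanning every record."""
    return [records[pos] for pos in index.get(value, [])]

tools = lookup(type_index, "tool")
```

Each index stores record positions rather than copies of the records, so the data itself
is held once however many attributes are indexed.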
Nevertheless, inverted list files use more media space, and the storage devices get full
quickly with this type of organization. The benefits are apparent immediately because
searching is fast. However, updating is much slower, because the indexes must be updated
along with the record.
Content-based queries in text retrieval systems use inverted indexes as their preferred
mechanism. Data items in these systems are usually stored compressed, which would
normally slow the retrieval process, but the compression algorithm is chosen to
support this technique.
When querying a file, there are circumstances in which the query is designed to be
modal, which means that rules are set which require different information to be held in
the index. For example, when phrase querying is undertaken, the
particular algorithm requires that offsets to word classifications are held in addition to
document numbers.
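One way to picture the phrase-query case is a positional index, which stores word offsets
alongside document numbers. The sketch below interprets the offsets as word positions
within each document, which is an assumption, and the sample documents and function names
are hypothetical.

```python
from collections import defaultdict

def build_positional_index(documents):
    """Map word -> {document number -> list of word offsets}."""
    index = defaultdict(dict)
    for doc_no, text in enumerate(documents):
        for offset, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_no, []).append(offset)
    return index

def phrase_query(index, phrase):
    """Return the numbers of documents containing the words consecutively."""
    words = phrase.lower().split()
    if not words or any(w not in index for w in words):
        return []
    matches = []
    for doc_no, offsets in index[words[0]].items():
        for start in offsets:
            # Every later word must appear at the next consecutive offset.
            if all(start + i in index[w].get(doc_no, [])
                   for i, w in enumerate(words[1:], start=1)):
                matches.append(doc_no)
                break
    return sorted(matches)

docs = [
    "indexed sequential file organization",
    "file organization and sequential access",
    "sequential file access",
]
positional_index = build_positional_index(docs)
```

An index holding document numbers alone could tell that both words of a phrase occur in a
document, but not whether they occur next to each other; the stored offsets supply that.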
With the advent of Microsoft Windows 7, file organization and
management were extended by a tool called Libraries. A
Library is a file organization system that brings together related files and folders stored
in different locations on the local computer and on the network, so that they can be
accessed centrally through a single access point. For instance, various images stored in
different folders on the local computer and/or across a computer network can be
gathered in an Image Library. A collection of similar files can then be manipulated,
sorted, or accessed conveniently as and when required through a single access point on the
computer desktop by use of a Library. This feature is particularly useful for
accessing similar or related content, and also for managing projects that use related
and common data.