Internal File Structure: Methods and Design Paradigm

The document discusses various methods of organizing computer files, including their internal and external structures. It describes sequential, line-sequential, indexed-sequential, inverted list, and direct/hashed access methods of file organization. Key considerations for file organization include rapid access to related records, efficiency of storage and retrieval, and ensuring data integrity. The document also discusses file extensions and how file naming conventions differ between FAT and NTFS file systems.

Uploaded by

dhiraj100
Copyright
© Attribution Non-Commercial (BY-NC)

Introduction

File organization is the methodology applied to structured computer files. Files
contain records, which can be documents or other information stored in a defined way for
later retrieval. File organization refers primarily to the logical arrangement of data in
a file system (the data can itself be organized as a system of records with correlated
fields/columns). It should not be confused with the physical storage of the file on a
particular storage medium. The basic types of computer file include files stored as
blocks of data and files stored as streams of data, where the information streams out of
the file as it is read until the end of the file is encountered.

We will look at two components of file organization here:

1. The way the internal file structure is arranged, and
2. The external file as it is presented to the O/S or program that calls it. Here we will
also examine the concept of file extensions.

We will examine various ways that files can be stored and organized. Files are presented
to the application as a stream of bytes and then an EOF (end of file) condition.

A program that uses a file needs to know the structure of the file and needs to interpret its
contents.

Internal File Structure

Methods and Design Paradigm

Specifying a system of file organization for a software program, or for a computer system
designed for a particular purpose, is a high-level design decision. Performance is
high on the list of priorities for this design process, depending on how the file is being
used. The design usually depends mainly on the system environment. Relevant factors
include whether the file will be used for transaction-oriented processing such as OLTP or
for data warehousing, and whether the file is shared among various processes, as in a
typical distributed system, or used standalone. It must also be asked whether the file is
on a network and used by a number of users, whether it may be accessed locally or
remotely, and how often it is accessed.

All things considered, the most important considerations are usually:

1. Rapid access to a record or to a number of related records.
2. Adding, modifying, or deleting records.
3. Efficiency of storage and retrieval of records.
4. Redundancy, as a method of ensuring data integrity.

A file should be organized in such a way that the records are always available for
processing with no delay. This should be done in line with the activity and volatility of
the information.

Types of File Organization

How a file is organized depends on what kind of file it is. In its simplest form a file
is a text file, that is, a file composed of ASCII (American Standard Code for Information
Interchange) text. Files can also be binary or executable, containing elements other than
plain text. Files also carry attributes that help determine how the host operating
system uses them.

Techniques of File Organization

The three techniques of file organization are:

1. Heap (unordered)
2. Sorted
   1. Sequential (SAM)
   2. Line Sequential (LSAM)
   3. Indexed Sequential (ISAM)
3. Hashed or Direct

In addition to the three techniques, there are five methods of organizing files:
sequential, line-sequential, indexed-sequential, inverted list, and direct or hashed
access organization.

Sequential Organization

A sequential file contains records organized in the order they were entered. The order of
the records is fixed. The records are stored in physically contiguous blocks, and within
each block the records are in sequence.

Records in these files can only be read or written sequentially.


Once stored in the file, the record cannot be made shorter, or longer, or deleted. However,
the record can be updated if the length does not change. (This is done by replacing the
records by creating a new file.) New records will always appear at the end of the file.

If the order of the records in a file is not important, sequential organization will
suffice, no matter how many records you may have. Sequential output is also useful for
report printing or sequential reads which some programs prefer to do.
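
The behaviour described above can be sketched in Python. This is a minimal illustration
under stated assumptions, not a definitive implementation: the fixed-length record layout
(a 4-byte id plus a 16-byte name), the function names, and the use of in-memory byte
streams in place of files on disk are all hypothetical choices made for the example. New
records go at the end, reads proceed from the first record to end of file, and an update
is done by copying every record into a new file.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte integer id plus a 16-byte name field.
RECORD = struct.Struct("<i16s")

def append_record(f, rec_id, name):
    """Write a new record at the end of the file."""
    f.seek(0, io.SEEK_END)
    f.write(RECORD.pack(rec_id, name.encode("ascii")))

def read_records(f):
    """Yield records in stored order until end of file is reached."""
    f.seek(0)
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:  # end-of-file condition
            return
        rec_id, raw = RECORD.unpack(chunk)
        yield rec_id, raw.rstrip(b"\x00").decode("ascii")

def copy_with_update(src, dst, rec_id, new_name):
    """Update one record by copying every record into a new file."""
    for current_id, name in read_records(src):
        if current_id == rec_id:
            name = new_name  # same record length, so the layout is unchanged
        append_record(dst, current_id, name)

old = io.BytesIO()  # stands in for a file on disk
append_record(old, 1, "Ada")
append_record(old, 2, "Grace")

new = io.BytesIO()
copy_with_update(old, new, 1, "Alan")
print(list(read_records(new)))  # [(1, 'Alan'), (2, 'Grace')]
```

Because every record has the same length, replacing a name never shifts the records that
follow it, which is why an update of unchanged length is the only modification allowed.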

Line-Sequential Organization

Line-sequential files are like sequential files, except that the records can contain only
characters as data. Line-sequential files are maintained by the native byte stream files of
the operating system.

In the COBOL environment, line-sequential files that are created with WRITE statements
with the ADVANCING phrase can be directed to a printer as well as to a disk.
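
As a rough illustration outside COBOL, a line-sequential file can be treated in Python as
a text stream in which each record is one line of characters. The sketch below assumes a
newline as the record delimiter and uses hypothetical function names; an in-memory text
stream stands in for a file on disk.

```python
import io

def write_line_records(f, records):
    """Write each record as one line of text."""
    for rec in records:
        if "\n" in rec:
            raise ValueError("a record may not contain the line delimiter")
        f.write(rec + "\n")

def read_line_records(f):
    """Read the records back in stored order, without the delimiter."""
    f.seek(0)
    return [line.rstrip("\n") for line in f]

f = io.StringIO()  # stands in for a file opened in text mode
write_line_records(f, ["1001 Ada", "1002 Grace Hopper"])
print(read_line_records(f))  # ['1001 Ada', '1002 Grace Hopper']
```

Unlike fixed-length sequential records, the records here can differ in length, because
the delimiter marks where each one ends.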

Indexed-Sequential Organization

Indexed-sequential organization also speeds up key searches. The simplest form is a
single-level index: a file whose records are (key, pointer) pairs, where the pointer is
the position in the data file of the record with the given key. Only a subset of the
records, evenly spaced along the data file, is indexed, so that each index entry marks an
interval of data records.

A key search is performed as follows: the search key is compared with the index keys to
find the highest index key that does not exceed the search key. A linear search is then
performed from the record that this index key points to, until the search key is matched
or until the record pointed to by the next index entry is reached. Despite the double file
access (index + data) required by this sort of search, the reduction in access time is
significant compared with sequential file searches.

Consider, for example, a simple linear search on a sequentially organized file of 1,000
records. An average of 500 key comparisons is needed (assuming the search keys are
uniformly distributed among the data keys). With an evenly spaced index of 100 entries,
each index entry covers an interval of 10 data records, so a search needs on average about
50 comparisons in the index file plus about 5 in the data file: roughly a nine to one
reduction in the operations count.
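
The key search can be sketched in Python as follows. This is an illustrative sketch only:
the function names are hypothetical, and a sorted in-memory list of keys stands in for the
data file. The search counts comparisons in the index and in the data file separately, so
the saving can be measured for any chosen index spacing.

```python
def build_sparse_index(keys, spacing):
    """Index every `spacing`-th key as a (key, position in data file) pair."""
    return [(keys[pos], pos) for pos in range(0, len(keys), spacing)]

def indexed_search(keys, index, target):
    """Return (position or None, index comparisons, data comparisons)."""
    index_cmp = 0
    start, end = 0, len(keys)
    # Scan the index for the highest index key that does not exceed the target.
    for key, pos in index:
        index_cmp += 1
        if key > target:
            end = pos  # the next index entry bounds the interval
            break
        start = pos
    # Linear search inside the interval marked by the index.
    data_cmp = 0
    for pos in range(start, end):
        data_cmp += 1
        if keys[pos] == target:
            return pos, index_cmp, data_cmp
    return None, index_cmp, data_cmp

keys = list(range(0, 2000, 2))         # 1,000 sorted data keys
index = build_sparse_index(keys, 10)   # 100 evenly spaced index entries
position, index_cmp, data_cmp = indexed_search(keys, index, 46)
print(position, index_cmp, data_cmp)
```

Averaging the two counters over many search keys, for different values of `spacing`, is a
simple way to see how index size trades off against the length of the final linear scan.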

A hierarchical extension of this scheme is possible, since an index is itself a
sequential file and can in turn be indexed by a second-level index, and so on. Exploiting
this hierarchical decomposition of the search further reduces access time. There is,
however, a point at which the advantage is offset by the increased cost of storage and by
the extra time needed to access the additional index levels.

Hardware for indexed-sequential organization is usually disk-based, rather than tape.

Records are physically ordered by primary key, and the index gives the physical location
of each record. Records can be accessed sequentially or directly, via the index. The index
is stored in a file and read into memory when the file is opened. Indexes must also be
maintained.

As in sequential organization, the data is stored in physically contiguous blocks. The
difference is in the use of indexes. There are three areas in the disk storage:

• Primary Area: contains file records stored by key or ID number.
• Overflow Area: contains records that cannot be placed in the primary area.
• Index Area: contains the keys of records and their locations on the disk.
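
The three areas can be sketched in Python as follows. This is a simplified illustration
under stated assumptions rather than a description of any particular ISAM product: the
class and method names are hypothetical, the areas are in-memory lists instead of disk
regions, and every record added after the initial load is sent to the overflow area.

```python
import bisect

class IsamFile:
    def __init__(self, sorted_records, block_size):
        # Primary area: (key, value) records in key order, grouped into blocks.
        self.primary = [sorted_records[i:i + block_size]
                        for i in range(0, len(sorted_records), block_size)]
        # Index area: the lowest key of each block, in block order.
        self.index = [block[0][0] for block in self.primary]
        # Overflow area: records that did not fit in the primary area.
        self.overflow = []

    def insert(self, key, value):
        """Assumption: the primary area is full, so new records overflow."""
        self.overflow.append((key, value))

    def lookup(self, key):
        # Use the index to pick the one block that could hold the key.
        b = bisect.bisect_right(self.index, key) - 1
        if b >= 0:
            for k, v in self.primary[b]:
                if k == key:
                    return v
        # Fall back to a scan of the overflow area.
        for k, v in self.overflow:
            if k == key:
                return v
        return None

f = IsamFile([(1, "a"), (3, "b"), (5, "c"), (7, "d")], block_size=2)
f.insert(4, "x")
print(f.lookup(5), f.lookup(4), f.lookup(6))  # c x None
```

A growing overflow area makes lookups slower, which is one reason such files are
periodically reorganized so that overflow records are merged back into the primary area.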

Inverted List

In file organization, an inverted list is a file that is indexed on many of the attributes
of the data itself. The inverted list method has a single index for each key type. The
records are not necessarily stored in sequence. They are placed in the data storage area,
and the indexes are updated with the record keys and locations.

For example, in a company file one index could be maintained for all products and
another for product types. It is then faster to search the indexes than to read every
record. Files of this type are also known as "inverted indexes." The benefit is fast
searching. The costs are that inverted list files use more storage space, so storage
devices fill up more quickly, and that updating is much slower because every index must be
kept current.
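
A minimal Python sketch of this organization follows. The class name, the field names,
and the sample records are hypothetical; records are kept in arrival order and one index
per chosen attribute maps each value to the positions of the matching records.

```python
from collections import defaultdict

class InvertedFile:
    def __init__(self, indexed_fields):
        self.records = []  # data storage area, in no particular key order
        # One index per attribute: value -> list of record positions.
        self.indexes = {field: defaultdict(list) for field in indexed_fields}

    def add(self, record):
        """Store the record, then update every index (the cost of an update)."""
        pos = len(self.records)
        self.records.append(record)
        for field, index in self.indexes.items():
            index[record[field]].append(pos)
        return pos

    def find(self, field, value):
        """Answer a query from the index alone, without scanning all records."""
        return [self.records[p] for p in self.indexes[field].get(value, [])]

company = InvertedFile(["product", "product_type"])
company.add({"product": "hammer", "product_type": "tool"})
company.add({"product": "saw", "product_type": "tool"})
company.add({"product": "apple", "product_type": "food"})
print([r["product"] for r in company.find("product_type", "tool")])  # ['hammer', 'saw']
```

Each `add` touches every index, which illustrates why updates cost more than lookups in
this organization.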

Content-based queries in text retrieval systems use inverted indexes as their preferred
mechanism. Data items in these systems are usually stored compressed, which would
normally slow the retrieval process, so the compression algorithm is chosen to support
this technique.

In certain circumstances a query is designed to be modal, which means that rules are set
requiring different information to be held in the index. For example, when phrase
querying is undertaken, the particular algorithm requires that offsets to word
classifications are held in the index in addition to document numbers.
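
To illustrate why phrase queries need more than document numbers, the Python sketch below
stores, for each word, the offsets at which it occurs in each document. It assumes the
offsets are simple word positions and that words are separated by whitespace; the
function names and sample documents are hypothetical.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each word to {document number: [word offsets]}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for offset, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(offset)
    return index

def phrase_query(index, phrase):
    """Return the documents in which the words occur consecutively."""
    words = phrase.lower().split()
    if not words:
        return []
    hits = []
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Word i of the phrase must sit exactly i positions after the first.
            if all(start + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)

docs = ["the quick brown fox", "brown quick the fox", "a quick brown dog"]
index = build_positional_index(docs)
print(phrase_query(index, "quick brown"))  # [0, 2]
```

Document 1 contains both words but not in adjacent order, so it is excluded. An index
holding only document numbers could not make that distinction.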

Direct or Hashed Access


With direct or hashed access, a portion of disk space is reserved and a “hashing”
algorithm computes the address of each record. Additional space is therefore required in
the store for this kind of file. Records are placed randomly throughout the file and are
accessed by addresses that specify their disk location. This type of file organization
therefore requires disk storage rather than tape. It has excellent search and retrieval
performance, but care must be taken to maintain the indexes. If the indexes become
corrupt, the remaining data is effectively lost, so regular backups of this kind of file
are essential, as they are for all valuable stored data.
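
A small Python sketch of hashed access follows. It is an illustration under stated
assumptions: the class name is hypothetical, a list of slots stands in for the reserved
disk space, CRC-32 is used as the hashing algorithm only because it gives the same result
on every run, and collisions are resolved by linear probing, a technique the text above
does not specify.

```python
import zlib

class HashedFile:
    def __init__(self, n_slots):
        self.slots = [None] * n_slots  # space reserved in advance

    def _address(self, key):
        """The hashing algorithm: turn a key into a slot address."""
        return zlib.crc32(key.encode("utf-8")) % len(self.slots)

    def put(self, key, value):
        start = self._address(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)  # linear probing (assumed)
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return i
        raise RuntimeError("no free slot left in the reserved space")

    def get(self, key):
        start = self._address(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None:
                return None  # an empty slot ends the probe sequence
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None

h = HashedFile(8)
h.put("cust-17", "Ada")
h.put("cust-42", "Grace")
print(h.get("cust-42"), h.get("cust-99"))  # Grace None
```

Reserving more slots than there are records keeps probe sequences short, which is the
additional space the organization trades for fast retrieval.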

External File Structure and File Extensions

Microsoft Windows and MS-DOS File Systems


The external structure of a file depends on whether it is created on a FAT or an NTFS
partition. The maximum filename length is 255 characters on an NTFS partition and 11
characters on FAT (an 8-character name plus a 3-character extension, separated by a "."
that is not counted). NTFS filenames keep their case, whereas FAT filenames have no
concept of case (although case is ignored when performing a search under NTFS). VFAT, an
extension of FAT, also permits filenames of up to 255 characters.

UNIX and Apple Macintosh File Systems


The concept of directories and files is fundamental to the UNIX operating system. On
Microsoft Windows-based operating systems, directories are depicted as folders and
moving about is accomplished by clicking on the different icons. In UNIX, the directories
are arranged as a hierarchy with the root directory being at the top of the tree. The root
directory is always depicted as /. Within the / directory, there are subdirectories (e.g.,
etc and sys). Files can be written to any directory, depending on the permissions. Files
can be readable, writable, and/or executable.
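
The hierarchy and the permission flags can be illustrated with Python's standard library.
The path below is only an example and nothing is read from disk; the permission values are
the conventional octal mode bits.

```python
import stat
from pathlib import PurePosixPath

path = PurePosixPath("/etc/hosts")  # example path under the root directory
print(path.parts)   # ('/', 'etc', 'hosts')
print(path.parent)  # /etc

# 0o644: the owner may read and write; group and others may only read.
print(stat.filemode(stat.S_IFREG | 0o644))  # -rw-r--r--
# 0o755: the owner may read, write, and execute; everyone else may read and execute.
print(stat.filemode(stat.S_IFREG | 0o755))  # -rwxr-xr-x
```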

Organizing files using Libraries

With the advent of Microsoft Windows 7, file organization and management gained a
powerful tool called Libraries. A Library is a file organization system that brings
together related files and folders stored in different locations, on the local computer
or across a network, so that they can be reached through a single access point. For
instance, images stored in different folders on the local computer and/or across a
computer network can be gathered in an Image Library. The aggregated files can then be
manipulated, sorted, or accessed as required through that single access point on the
computer desktop. This feature is particularly useful for working with similar or related
content, and for managing projects that use related and common data.
