TOPIC THREE-File System
DEFINITIONS
File = An organised collection of RELATED records. e.g. ALL the records of students in a college.
Record = ALL the details about a particular person or thing. e.g. details of one student.
Field = One indivisible item within a record. e.g. Date of Birth
Organisation = The way in which records are held on the file. This will dictate HOW the data can
be accessed.
Access = The method of retrieving a record from file. This is limited by the method of organisation.
In trying to locate a record on file, the record can either be found and read into memory or a
positive statement that the record is not present is output – because the user keys in the wrong
access details or perhaps another clerk has not yet entered the relevant data.
Key field = The field on a file normally used to order that file. Student number might be used to
ensure the records on the student file are in student number order.
Block = Records on all media are grouped in blocks (sometimes called buckets). A single record
is NOT read from the file into memory but the whole block is read. This speeds up file access
because the slowest part of disc reading is locating where a record is. By reading a block, the next
record is also in memory and this is often the next needed.
Hit-rate = This is the proportion of records needing changing compared with the total number of
records on the file. If this is high, serial or sequential organisation and access could be used. If
low, then perhaps indexed-sequential or random organisation is needed.
Master file = This is a main/semi-permanent file holding all the details about a particular business
area. e.g. All employee records would be held on an employee file and separate from the customer
records on the customer file. Much of the data stays the same for a long time. Changes might be
added to the master file from the transaction file.
Transaction file = This is a temporary file used to collect changes until they are updated onto the
master file. This type of file is only used in batch processing. The file would first have to be sorted
so that a new file is produced which is in the same key field order as the master file.
Batch Processing = Transaction data is collected as above over a period of time and then, perhaps
once per week or month, the master file is updated. Apart from keeping the transaction file for a
short while for security purposes, the data is no longer needed. (See below)
File Update = By Copying – All changes to the master file are achieved by creating a new file,
changing those records that need changing but copying the others across unchanged
(Serial/Sequential organisation).
By Overlaying – Changes are implemented immediately by overwriting existing records
(Indexed-Sequential/Random organisation).
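The hit-rate definition above can be made concrete with a small calculation. This is only a sketch: the function name and the example figures are hypothetical and are not taken from any particular system.

```python
def hit_rate(records_changed, total_records):
    """Proportion of records on the file that need changing in one run."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return records_changed / total_records


# Hypothetical payroll run: all 500 employee records are processed.
payroll = hit_rate(500, 500)

# Hypothetical enquiry system: 20 of 10,000 customer records are touched.
enquiries = hit_rate(20, 10_000)

print(payroll, enquiries)
```

A value close to 1 points towards serial or sequential organisation, while a value close to 0 points towards indexed-sequential or random organisation.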
WHY FILES?
Data cannot be retained permanently in the main memory of a computer because RAM is volatile
(its contents are lost when the power is removed) and too small. Using ROM would prevent the data
being changed. Hence, secondary storage systems were developed to store data not currently needed
in the computer. If a computer is currently running a retail system, there would be no need for
employee details to be available until a payroll run was required. Data on file has to be organized.
FILE ORGANISATION
SERIAL (SER)
Records stored one after another in the order that they are entered into the computer. There are
no gaps between the records. The only order is therefore chronological - Customers do not
conveniently place orders in customer number order nor do they order in product number order.
Reading the file – Records can only be read in the order they are on file starting with the first.
To find a particular record, it could be necessary to read the whole file if the required record is
the last one.
Adding a new record to/Amending a record/Deleting a record from an existing file –
Generally these processes require a new file to be created which is an exact copy of the
original but with the one change. Records can be added to the end of a disc file but with
tape, a new file is created.
Uses – Transaction data as it is received (processed LATER) – it may have to be sorted before
processing. SMALL reference files which can be read quickly from the beginning every time they are
accessed. e.g. VAT rates, payment levels for staff.
Media – ALL types of storage media permit Serial file processing.
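A minimal sketch of reading a serial file, assuming the file is modelled as a Python list held in arrival order. The record layout and the name serial_search are hypothetical.

```python
def serial_search(records, field, wanted):
    """Read records in stored order; return (record or None, records read)."""
    reads = 0
    for record in records:
        reads += 1
        if record[field] == wanted:
            return record, reads
    # The whole file was read without finding the record.
    return None, reads


# Orders in the order they arrived (chronological, not key order).
orders = [
    {"customer": 417, "product": "P20"},
    {"customer": 102, "product": "P07"},
    {"customer": 933, "product": "P11"},
]

found, reads = serial_search(orders, "customer", 933)
missing, reads_for_missing = serial_search(orders, "customer", 5)
```

Finding the last record, or establishing that a record is absent, means reading every record on the file.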
SEQUENTIAL (SEQ)
Records are stored one after another with no gaps and in KEY FIELD order.
Reading the file – The key field value is entered into the computer. The file is read from the
beginning until the required record is located or until either a key-field value on file is higher than
the required one OR end of file is reached. In either case, the record is not present and should be
reported.
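The reading process just described can be sketched as follows, assuming the file is a list of records already held in ascending key order. The field name "key" and the sample data are hypothetical.

```python
def sequential_search(records, wanted_key):
    """records must be in ascending key order.

    Returns (record or None, number of records read).
    """
    reads = 0
    for record in records:
        reads += 1
        if record["key"] == wanted_key:
            return record, reads
        if record["key"] > wanted_key:
            # We have passed the place where the record would be,
            # so it cannot be on the file: stop early.
            break
    return None, reads


students = [
    {"key": 101, "name": "Student A"},
    {"key": 105, "name": "Student B"},
    {"key": 110, "name": "Student C"},
    {"key": 120, "name": "Student D"},
]

print(sequential_search(students, 105))
print(sequential_search(students, 107))
```

Unlike the serial case, a missing record can usually be reported without reading to the end of the file.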
Adding a new record to/Amending a record/Deleting a record from an existing file –
Generally these processes require a new file to be created which is an exact copy of the
original but with the one change and the record placed in its correct key field order.
Uses – Transaction data once it has been sorted into key field order, ready for later processing.
SMALL master files which can be read quickly from the beginning every time they are accessed.
Reference/Master files where instant access is not needed and batch processing is suitable.
Media – ALL types of storage media permit Sequential file processing.
INDEXED SEQUENTIAL
Records are stored in key field order, as in a sequential file, together with a small index file
that identifies the block in which each record is stored.
Reading the whole file – The file is treated as sequential and follows the same process as above.
Reading the one record - The key field value is entered into the computer. The small INDEX file
is read sequentially until a key value is found which is equal to or greater than the required record.
This then identifies the block where the record is stored. The whole block is read into memory and
searched to find the required record. With very large files, there could be several levels of index
so that the first indicates the cylinder in which the record should appear and the second the block
within that cylinder. See hard disk (below).
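A minimal sketch of the single-record lookup described above, assuming a one-level index in which each entry holds the highest key stored in a block. The index contents, block contents and function name are hypothetical, and the overflow area is ignored for brevity.

```python
# Hypothetical index: (highest key in block, block number), in key order.
index = [(120, 0), (250, 1), (399, 2)]

# Hypothetical blocks on disc, each holding records in key order.
blocks = {
    0: [{"key": 101}, {"key": 115}, {"key": 120}],
    1: [{"key": 130}, {"key": 200}, {"key": 250}],
    2: [{"key": 300}, {"key": 399}],
}


def indexed_lookup(wanted_key):
    """Return (block number, record); record is None if not present."""
    # Read the index sequentially until an entry >= the wanted key is found.
    for highest_key, block_number in index:
        if highest_key >= wanted_key:
            block = blocks[block_number]  # the whole block is read into memory
            for record in block:
                if record["key"] == wanted_key:
                    return block_number, record
            return block_number, None  # right block, but record is absent
    return None, None  # key is higher than every index entry


print(indexed_lookup(200))
print(indexed_lookup(210))
```

Only the small index and one block are read, rather than the whole file.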
Adding a new record to/Amending a record/Deleting a record from an existing file – The
single record is located as for reading above, the change is made in memory and then written back
to the file overwriting the original. Deleted records can be removed from the block to free up space
for other records that should be there, including those moved into the overflow area.
Uses – Reference/Master files where instant access is needed but also where the whole file might
be needed for certain processes. e.g. find a customer record but also report on the whole file for
customers who have not paid their bills or are to be targeted for a sales promotion.
This is ideal for online access, perhaps in a telephone ordering system.
Media – Only disc-based media because of the need to overwrite existing records.
RANDOM (R)
Random organization does not mean records are stored randomly - implying they could be
anywhere on the file. The file “appears” to store records randomly. The key field is used to identify
where the record is stored. The process is:
Key field -> algorithm -> block number
So, a mathematical process is applied to the key and the resulting number that comes out gives the
block number. The size of the available storage space usually dictates the formula. If 1000 blocks
of disc space are available, a simple algorithm could be to DIVIDE the key value by 1000 and use the
remainder as the block address. This remainder would therefore be in the range 0 to 999.
It follows that consecutive key values would be located in different blocks, which helps to spread
the records over the file. An overflow area is again needed where some bunching occurs, and this
would be totally unpredictable. Tags are used as with Indexed Sequential files.
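The divide-by-1000 algorithm above can be sketched as follows. The block capacity of two records, the in-memory dictionaries standing in for disc blocks, and the function names are all assumptions made for illustration. Tags are not modelled.

```python
NUM_BLOCKS = 1000     # blocks of disc space available (from the text)
BLOCK_CAPACITY = 2    # hypothetical: records that fit in one block

blocks = {}           # block address -> list of records in that block
overflow = []         # records that did not fit in their home block


def block_address(key):
    """Remainder after dividing the key by 1000, in the range 0 to 999."""
    return key % NUM_BLOCKS


def store(record):
    address = block_address(record["key"])
    block = blocks.setdefault(address, [])
    if len(block) < BLOCK_CAPACITY:
        block.append(record)
    else:
        overflow.append(record)   # bunching: home block is full
    return address


def fetch(key):
    for record in blocks.get(block_address(key), []):
        if record["key"] == key:
            return record
    for record in overflow:       # fall back to the overflow area
        if record["key"] == key:
            return record
    return None


# Three keys with the same remainder (456) bunch in one block.
for key in (1456, 2456, 3456):
    store({"key": key})
```

Consecutive keys such as 123456 and 123457 give addresses 456 and 457, so they land in different blocks, while keys that share a remainder bunch together and spill into the overflow area.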
Reading the whole file – This organization is not suited to applications where the whole file
would need to be read, because the records are spread across the file and would not be accessed
in any useful order.
Reading one record - The key field value is entered into the algorithm, the block address
calculated and the record then read as for an indexed sequential block read.
Adding a new record to/Amending a record/Deleting a record from an existing file – The
single record is located using the algorithm as above, the change is made in memory and then
written back to the file overwriting the original.
Deleted records can be removed from the block to free up space for other records that should be
there but are located in the overflow area.
Uses – Reference/Master files where instant access is needed but where reading the whole file is
NOT appropriate. This is also ideal for online access, perhaps in a telephone ordering system.
Media – Only disc-based media because of the need to overwrite existing records.
Parity Check
A parity bit, or check bit, is a bit added to a string of binary code. Parity bits are used as the
simplest form of error detecting code. Parity bits are generally applied to the smallest units of a
communication protocol, typically 8-bit octets (bytes), although they can also be applied
separately to an entire message string of bits.
The parity bit ensures that the total number of 1-bits in the string is even or odd.[1] Accordingly,
there are two variants of parity bits: even parity bit and odd parity bit. In the case of even parity,
for a given set of bits, the occurrences of bits whose value is 1 are counted. If that count is odd,
the parity bit value is set to 1, making the total count of occurrences of 1s in the whole set
(including the parity bit) an even number. If the count of 1s in a given set of bits is already even,
the parity bit's value is 0. In the case of odd parity, the coding is reversed. For a given set of bits,
if the count of bits with a value of 1 is even, the parity bit value is set to 1 making the total count
of 1s in the whole set (including the parity bit) an odd number. If the count of bits with a value of
1 is odd, the count is already odd so the parity bit's value is 0. Even parity is a special case of
a cyclic redundancy check (CRC), where the 1-bit CRC is generated by the polynomial x+1.
If a bit is present at a point otherwise dedicated to a parity bit but is not used for parity, it may be
referred to as a mark parity bit if the parity bit is always 1, or a space parity bit if the bit is always
0. In such cases where the value of the bit is constant, it may be called a stick parity bit even though
its function has nothing to do with parity. The function of such bits varies with the system design,
but examples of functions for such bits include timing management or identification of a packet as
being of data or address significance. If its actual bit value is irrelevant to its function, the bit
amounts to a don't-care term.
Error detection
If an odd number of bits (including the parity bit) are transmitted incorrectly, the parity bit will be
incorrect, thus indicating that a parity error occurred in the transmission. The parity bit is only
suitable for detecting errors; it cannot correct any errors, as there is no way to determine which
particular bit is corrupted. The data must be discarded entirely, and re-transmitted from scratch.
On a noisy transmission medium, successful transmission can therefore take a long time, or even
never occur. However, parity has the advantage that it uses only a single bit and requires only a
number of XOR gates to generate. See Hamming code for an example of an error-correcting code.
Parity bit checking is used occasionally for transmitting ASCII characters, which have 7 bits,
leaving the 8th bit as a parity bit.
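For example, the parity bit can be computed as follows. Assume Alice and Bob are communicating
and Alice wants to send Bob the simple 4-bit message 1001. With even parity, the message contains
two 1s, so the parity bit is 0 and Alice sends 10010. With odd parity, the parity bit is 1 and Alice
sends 10011. In each case Bob counts the 1s in the five bits he receives and confirms that the total
is even (or odd) as agreed.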
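A short sketch of how the parity bit for the message 1001 could be computed and checked. It assumes the parity bit is appended as the last bit; the function names are hypothetical.

```python
def parity_bit(bits, even=True):
    """Return the parity bit ('0' or '1') for a string of '0'/'1' characters."""
    ones_are_odd = bits.count("1") % 2 == 1
    if even:
        return "1" if ones_are_odd else "0"
    return "0" if ones_are_odd else "1"


def add_parity(bits, even=True):
    """Append the parity bit to the message (assumed to go last)."""
    return bits + parity_bit(bits, even)


def check_parity(received, even=True):
    """True if the received bits (message plus parity bit) look valid."""
    ones_are_even = received.count("1") % 2 == 0
    return ones_are_even if even else not ones_are_even


print(add_parity("1001"))              # even parity
print(add_parity("1001", even=False))  # odd parity
```

With even parity Alice sends 10010 and with odd parity she sends 10011. Bob applies check_parity with the same setting to whatever he receives.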
This mechanism enables the detection of single bit errors, because if one bit gets flipped due to
line noise, there will be an incorrect number of ones in the received data. In the two examples
above, Bob's calculated parity value matches the parity bit in its received value, indicating there
are no single bit errors. Consider the following example with a transmission error in the second bit
using XOR:
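Alice sends: 10010 (the message 1001 plus an even parity bit of 0)
...TRANSMISSION ERROR...
Error in the second bit
Bob receives: 11010
Bob computes parity: 1 XOR 1 XOR 0 XOR 1 XOR 0 = 1, so he reports a transmission error.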
There is a limitation to parity schemes. A parity bit is only guaranteed to detect an odd number of
bit errors. If an even number of bits have errors, the parity bit records the correct number of ones,
even though the data is corrupt. (See also error detection and correction.) Consider the same
example as before with an even number of corrupted bits: if Alice sends 10010 and both the second
and fifth bits are flipped, Bob receives 11011.
Bob observes even parity, as expected, thereby failing to catch the two bit errors.
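The limitation can be demonstrated with a small sketch. The helper names are hypothetical, and the choice of which bits to flip is only an illustration.

```python
def has_even_parity(bits):
    """True if the string of '0'/'1' characters contains an even number of 1s."""
    return bits.count("1") % 2 == 0


def flip(bits, positions):
    """Flip the bits at the given 1-based positions."""
    out = list(bits)
    for position in positions:
        out[position - 1] = "1" if out[position - 1] == "0" else "0"
    return "".join(out)


sent = "10010"                   # 1001 plus even parity bit 0
one_error = flip(sent, [2])      # second bit corrupted
two_errors = flip(sent, [2, 5])  # second and fifth bits corrupted

print(one_error, has_even_parity(one_error))    # error is detected
print(two_errors, has_even_parity(two_errors))  # error goes unnoticed
```

A single flipped bit leaves an odd number of 1s and is caught. Two flipped bits restore even parity, so the corruption passes the check.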
Concept of master and transaction file
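Master File: A main, semi-permanent file holding all the details about a particular business
area. Much of the data stays the same for a long time.
Transaction File: A temporary file used to collect changes until they are updated onto the
master file.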
Integration: Transaction files are used to update master files. Each record in the
transaction file corresponds to a change in the master file.
Processing: An update program reads the transaction file and applies its changes to the
master file, ensuring data accuracy and integrity.
Purpose: Master files maintain stable data, while transaction files track ongoing activities
and changes.
Example:
How it works
1. A transaction file is created that records all transactions, such as sales, over a period of
time.
4. The transaction file is cleared once it has been used to update the master file.
Types of transactions
2. If the transaction file key is less than the master file key, the transaction is added to the
new master file.
3. If the transaction file key is equal to the master file key, the master file record is changed
or deleted.
4. If the transaction file key is greater than the master file key, the old master file record is
written to the new master file.
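The key comparisons above can be sketched as a single pass over two sorted files. This is an illustration under stated assumptions, not a definitive update program: both files are modelled as lists sorted by key, there is at most one transaction per key, each transaction carries a hypothetical "action" field ("add", "change" or "delete"), and all names are invented for the example.

```python
def update_master(old_master, transactions):
    """Merge sorted transactions into a sorted master file; return the new master."""
    new_master = []
    m = t = 0
    while m < len(old_master) and t < len(transactions):
        master, trans = old_master[m], transactions[t]
        if trans["key"] < master["key"]:
            # No master record has this key: the transaction is an addition.
            new_master.append({"key": trans["key"], "data": trans["data"]})
            t += 1
        elif trans["key"] == master["key"]:
            # Matching keys: change the record, or drop it if deleting.
            if trans["action"] != "delete":
                new_master.append({"key": master["key"], "data": trans["data"]})
            m += 1
            t += 1
        else:
            # Transaction key is greater: copy the old record across unchanged.
            new_master.append(master)
            m += 1
    # Copy any remaining old records, then any remaining additions.
    new_master.extend(old_master[m:])
    new_master.extend({"key": tr["key"], "data": tr["data"]} for tr in transactions[t:])
    return new_master


old = [{"key": 1, "data": "A"}, {"key": 3, "data": "C"}, {"key": 5, "data": "E"}]
changes = [
    {"key": 2, "action": "add", "data": "B"},
    {"key": 3, "action": "change", "data": "C2"},
    {"key": 5, "action": "delete", "data": None},
    {"key": 7, "action": "add", "data": "G"},
]
new = update_master(old, changes)
```

The old master file is never altered, so it can be kept alongside the transaction file as a backup from which the new master could be recreated.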
Diagram of a master file update
1. Fields
Fields are the individual pieces of data within a file. These could be:
o Text (e.g., names, addresses)
o Numbers (e.g., account balances, ages)
o Dates (e.g., birthdates, timestamps)
o Other data types specific to the file’s purpose
2. Data Types
Data Types define the nature of the data. Common data types include:
o String/Text
o Integer
o Floating-point (decimal numbers)
o Date/Time
o Boolean (true/false)
3. Sizes
Sizes refer to the amount of space each field occupies. This can vary based on:
o The type of data (e.g., text fields can vary in length)
o File format and system specifications
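Fields, data types and sizes come together in a record layout. The sketch below uses Python's standard struct module to define a fixed-length record; the student fields and their sizes are hypothetical choices made for illustration.

```python
import struct

# Hypothetical student record layout:
#   student_number  integer          4 bytes
#   name            text            20 bytes (padded with NUL bytes)
#   balance         floating-point   8 bytes
#   enrolled        Boolean          1 byte
# "<" selects standard sizes with no padding between fields.
RECORD_FORMAT = "<i20sd?"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def pack_record(number, name, balance, enrolled):
    """Convert the field values into one fixed-length record."""
    return struct.pack(RECORD_FORMAT, number, name.encode("ascii"), balance, enrolled)


def unpack_record(raw):
    """Convert a fixed-length record back into its field values."""
    number, name, balance, enrolled = struct.unpack(RECORD_FORMAT, raw)
    return number, name.rstrip(b"\x00").decode("ascii"), balance, enrolled


raw = pack_record(1001, "Student A", 250.5, True)
print(RECORD_SIZE, len(raw))
print(unpack_record(raw))
```

Because every record occupies the same number of bytes, the position of record n in the file can be calculated as n multiplied by the record size.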
4. Purpose of the File
File Security
Ensuring the security of your data files is essential in our digital age. Here are some key practices
to help keep your files secure:
Ensure your passwords are complex and unique. Use a combination of letters, numbers,
and special characters.
Avoid using easily guessable passwords like "123456" or "password."
3. Regular Backups
Regularly back up your data to an external hard drive or a secure cloud service.
This ensures you have a copy of your data in case of loss or theft.
Install reputable antivirus software to protect your system from malware and viruses.
Keep your antivirus software up to date.
Avoid accessing sensitive data over public Wi-Fi networks, as they are often insecure.
Use a Virtual Private Network (VPN) to encrypt your internet connection if you must use
public Wi-Fi.
Regularly check for any unusual activity or unauthorized access to your data.
Set up alerts for suspicious activities if your system supports it.
Different types of files serve various purposes and have distinct characteristics. Here’s
a breakdown:
1. Program Files
2. Data Files
4. Parameter Files