CSC 216 - File Organization and Data Processing
FILE ORGANIZATION
AND
DATA PROCESSING
1 | LECTURE NOTES ON CSC 216 – FILE ORGANIZATION & DATA PROCESSING – Fola Aranuwa (Ph.D)
CSC 216
File Organization and Data Processing
1.1.1 The term "file organization" refers to the way in which data is stored in a file and,
consequently, the method(s) by which it can be accessed.
File organization refers to the way data is stored in a file. It can also be referred to as the
logical arrangement of data organized in a system of records.
File organization is very important because it determines the methods of access, efficiency,
flexibility and storage devices to use.
File organization addresses four (4) major issues in computing: space management, high
performance, security and memory management.
1.1.2 A file is a collection of data, usually stored on disk. As a logical entity, a file enables
you to divide your data into meaningful groups.
A File, by definition can also be referred to as a sequence of records. Files contain computer
records which can be documents or information which is stored in a certain way for later
retrieval.
Files stored on magnetic media can be organised in a number of ways, just as in a manual
system. There are advantages and disadvantages to each type of file organisation, and the
method chosen will depend on several factors such as:
• how the file is to be used;
• how many records are processed each time the file is updated;
• whether individual records need to be quickly accessible.
Files have three important characteristics, which relate to the following:
1. Whether the file is permanent or temporary
2. The way the records are organised – sequential or serial
3. Method of access or location – sequential or direct access
1.1.4 A field, by definition, is a single data item, and many fields make up a record. Each
field has a name, and one key field, called the primary key, is used to identify the
record.
1.1.5 A data file, by definition, is a collection of records holding the same type of
information but about different objects or individuals.
A master file contains data of a permanent nature. Individual values can change during
transactions, but the file stores the main information. The master file contains two types of data:
• Permanent data such as personal files, payroll data, employee status (contract, permanent or
temporary) and job title;
• Less permanent data such as taxes deducted, hours worked and bonuses received.
A transaction file is a temporary working file which is used to update the master file after a
certain time, usually at the end of the day or at the end of the week.
A primary key is normally used to identify the record you want to update or delete. The
primary key is usually a field in the record whose value is unique to that record. Examples
of primary keys include StudentID, passport number and account number. This means that,
without the primary key, you cannot update or delete a record.
1.2.3 Back-up File:
Back-up files are copies of the master and transaction files held for security purposes.
They are usually held on tape and kept in secure locations.
1.3 Types of File Organization/ Memory Access Methods:
There are many ways in which records can be organized on disk or tape. The
main methods of file organization used for files are: Serial, Sequential, Indexed Sequential
and Random (or Direct) methods.
In serial organization, records are stored in the next available storage position as they are
received. It is usually the method used for creating transaction files (unsorted), work files and
dump files. In general, it is only used on a serial medium such as magnetic tape.
Examples of serial files
1. Unsorted invoices for customers
2. Collection of student marks
3. Shopping list
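A minimal Python sketch of serial organization follows. It assumes an in-memory list stands in for the file, and the record layout (customer number and amount) is invented for illustration. Each record goes into the next free position, and finding a record means scanning from the start.

```python
# Serial organization: records are kept in arrival order, unsorted.
# The record layout below is a hypothetical example, not taken from the notes.
serial_file = []

def write_record(record):
    """Store the record in the next available position and return that position."""
    serial_file.append(record)
    return len(serial_file) - 1

def find_record(customer_no):
    """Linear scan from the start: the only way to locate a record in a serial file."""
    for position, record in enumerate(serial_file):
        if record["customer_no"] == customer_no:
            return position, record
    return None

write_record({"customer_no": 307, "amount": 1500})
write_record({"customer_no": 112, "amount": 820})
write_record({"customer_no": 254, "amount": 96})
```

Calling `find_record(112)` has to pass over the first record before reaching the match, which is why serial files suit bulk processing rather than individual enquiries.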
Examples of sequential files
1. Invoices for customers sorted on customer number
2. Class registers sorted on last name
Advantages:
1. Simple to understand
2. Easier to organize & maintain
3. Economical
4. Errors in files remain localized
Disadvantages:
1. Transactions must be sorted in a particular sequence before processing
2. Time consuming when searching
3. High data redundancy
4. Random enquiries are not possible to handle
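The first disadvantage above (transactions must be sorted before processing) can be illustrated with a sketch of a sequential master-file update. This is only one possible approach, assuming both files are Python lists of (key, value) pairs in ascending key order with at most one transaction per key; the names and sample records are hypothetical.

```python
def sequential_update(master, transactions):
    """Merge sorted transactions into a sorted master file, producing a new master.

    A transaction whose key matches a master record replaces that record;
    an unmatched transaction is inserted as a new record.
    """
    new_master = []
    m, t = 0, 0
    while m < len(master) and t < len(transactions):
        m_key, t_key = master[m][0], transactions[t][0]
        if m_key < t_key:
            # No transaction for this master record: copy it unchanged.
            new_master.append(master[m])
            m += 1
        elif m_key > t_key:
            # Transaction for a key not yet on file: insert it.
            new_master.append(transactions[t])
            t += 1
        else:
            # Matching keys: the transaction replaces the old record.
            new_master.append(transactions[t])
            m += 1
            t += 1
    new_master.extend(master[m:])
    new_master.extend(transactions[t:])
    return new_master

master = [(101, "Ada"), (104, "Bola"), (109, "Chidi")]
# The transactions arrive unsorted and must be sorted on the key first.
transactions = sorted([(109, "Chidi O."), (102, "Dayo")])
updated = sequential_update(master, transactions)
```

Because both files are read once from start to end, the whole master file is rewritten even when only a few records change.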
(iii). Limit Index: This index groups the records (keys) and only provides the location of
the highest key in the group. Generally, they form a hierarchical index. Data records
are blocked before being written to disk. An index may consist of the highest key in
each block, (or on each track). See the example in the block in table 3.
Block 2: A0037, A0038, A0053
Block 3: A0064, A0073, A0075
In the above example, data records are shown as being three (3) to a block. The index, then,
holds the key of the highest record in each block. (An essential element of the index, which
has been omitted from the diagram for simplicity, is the physical address of the block of
data records). If we wish to access record 5, whose key is A0038, we can quickly determine
from the index that the record is held in block 2, since this key is greater than the highest
key in block 1. By way of the index, we can go directly to the record we wish to retrieve.
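The lookup just described can be sketched in Python as follows. Blocks 2 and 3 use the keys from the example above; the Block 1 keys are hypothetical placeholders, since they are not shown, and the function names are illustrative only. Physical block addresses are left out, as in the example.

```python
from bisect import bisect_left

# Blocks 2 and 3 use the keys from the example; the Block 1 keys are
# assumed values, since the notes do not show them.
blocks = [
    ["A0011", "A0019", "A0025"],  # Block 1 (assumed keys)
    ["A0037", "A0038", "A0053"],  # Block 2
    ["A0064", "A0073", "A0075"],  # Block 3
]

# Limit index: the highest key in each block.
limit_index = [block[-1] for block in blocks]

def find_block(key):
    """Return the 1-based number of the only block that could hold key, or None."""
    i = bisect_left(limit_index, key)
    return i + 1 if i < len(limit_index) else None

def lookup(key):
    """Use the index to pick a block, then search only that block."""
    block_no = find_block(key)
    if block_no is None or key not in blocks[block_no - 1]:
        return None
    return block_no
```

Here `find_block("A0038")` selects block 2 because A0038 is greater than the highest key of block 1 and not greater than the highest key of block 2, so only one block has to be read.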
INDEX
Emp #    Record position
1001     3
1002     5
1003     4
1004     1
1005     2
1006     6

DATA AREA
Position    Emp #    Name
1           1004     Adam Brown
2           1005     Alao Damola
3           1001     Alan Dickens
4           1003     Olotu Bimbo
5           1002     Jane Hengis
7           1011     Ray Bross
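The index and data area shown above can be modelled with two Python dictionaries. This is a sketch only: the dictionaries stand in for disk structures, and `fetch` is a hypothetical helper name. The entries are copied from the example as given, so position 6 has no record.

```python
# Index: employee number -> position of the record in the data area.
index = {1001: 3, 1002: 5, 1003: 4, 1004: 1, 1005: 2, 1006: 6}

# Data area: position -> (employee number, name), in physical order.
data_area = {
    1: (1004, "Adam Brown"),
    2: (1005, "Alao Damola"),
    3: (1001, "Alan Dickens"),
    4: (1003, "Olotu Bimbo"),
    5: (1002, "Jane Hengis"),
    7: (1011, "Ray Bross"),
}

def fetch(emp_no):
    """Look the key up in the index, then go straight to that position."""
    position = index.get(emp_no)
    if position is None:
        return None  # key is not indexed
    return data_area.get(position)  # None if no record is stored there
```

The index is kept in key order while the data area is not, which is what allows direct retrieval of one record without sorting the data itself.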
Advantages:
Disadvantages:
(d) Direct Access File Organization (Random Access or relative file organization):
A randomly organized file contains records arranged physically without regard to the
sequence of the primary key. Records are loaded to disk by establishing a direct relationship
between the Key of the record and its address on the file, normally by use of a formula (or
algorithm) that converts the primary Key to a physical disk address. This relationship is
also used for retrieval. The use of a formula (or algorithm) is known as 'Key
Transformation' and there are several techniques that can be used: such as Hashing and
Radix Conversion. These methods are often mixed to produce a unique address (or location)
for each record (key). The production of the same address for two different records is
known as a synonym.
- Files are viewed as a numbered sequence of blocks or records for direct access.
- The block or record number serves as the key for accessing the desired information randomly.
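Key transformation and synonyms can be illustrated with a small Python sketch. The formula used here (key modulo the number of slots) and the way synonyms are resolved (trying the next slot) are assumptions chosen for simplicity; the notes name hashing and radix conversion but do not prescribe a particular formula. The table size and records are hypothetical.

```python
TABLE_SIZE = 7  # hypothetical number of record slots in the file

def key_to_address(key):
    """Key transformation by division-remainder hashing (an assumed formula)."""
    return key % TABLE_SIZE

def store(table, key, record):
    """Place a record at its home address; on a synonym, probe the next slot."""
    home = key_to_address(key)
    for step in range(TABLE_SIZE):
        slot = (home + step) % TABLE_SIZE
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, record)
            return slot
    raise RuntimeError("file is full")

def retrieve(table, key):
    """Apply the same formula, then follow the same probe sequence."""
    home = key_to_address(key)
    for step in range(TABLE_SIZE):
        slot = (home + step) % TABLE_SIZE
        if table[slot] is None:
            return None
        if table[slot][0] == key:
            return table[slot][1]
    return None

table = [None] * TABLE_SIZE
store(table, 1001, "record A")  # 1001 % 7 == 0
store(table, 1008, "record B")  # 1008 % 7 == 0 as well: a synonym of 1001
```

Both keys transform to address 0, so the second record is placed in the next slot and is still found by `retrieve`, at the cost of one extra probe.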
Advantages:
1. Immediate Access of the desired records.
2. No sorting of the records is required.
3. Faster updating of several files.
4. Supports online transaction processing systems such as online reservation systems.
Disadvantages:
1. Data may be accidentally erased or over-written unless special precautions are taken
2. Backup facility is needed
3. Expensive: hard disks are needed to store the records
4. Less efficient as compared to sequential file organization in the use of storage space
5. Only one key is used.
6. Cannot be accessed sequentially.
1.4 FILE ORGANIZATION AND ACCESS METHODS
Each file organisation can be accessed or processed in different ways, often combining the
advantages of one organisation with the advantages of another.
1.5 Assignment
You can use Access database, Excel Worksheet or Microsoft Word to do this
1.6 Basic Physical Characteristics of I/O and Auxiliary Storage Devices
I/O devices are the pieces of hardware used by a human (or other system) to communicate
with a computer. For instance, a keyboard or computer mouse is an input device for a
computer, while monitors and printers are output devices. Devices for communication
between computers, such as modems and network cards, typically perform both input and
output operations.
In addition to your hard drive(s), you almost certainly will want to install other drives on
your homebuilt computer. I'm calling these "auxiliary drives" to distinguish them from the
hard drive.
The definition of auxiliary drives includes any drives or drive-like devices other than the system
drive, additional hard drives or SSDs. Basically, any device that is not strictly
needed to build a computer, and that can read, write, and/or store data, would be included
in that definition. A few of the more popular examples are:
In theory, optical drives include CD-ROM drives, CD-RW drives ("burners"), DVD-ROM
drives, DVD+/-RW drives, and Blu-Ray (BD-ROM) drives. They use lasers to read and/or
write data.
In practice, most optical drives available today combine many features and the ability to
read and write to different types of media, including writeable CDs and DVDs. Top-of-the-
line optical drives can even play and write to Blu-ray discs. I doubt any manufacturers even
make plain CD drives anymore.
For quite some time, optical drives were considered a necessity because they were the only
practical way to install an operating system on a homebuilt computer. Operating systems
came on CDs or DVDs, so even if you never used the optical drive again, you needed it that
one time to install the OS.
Nowadays, that's not so true anymore. Most operating systems can now be obtained as
digital downloads and "burned" to a USB flash drive for installation. Most software is also
available by digital download. Consequently, many computers all over the world have
optical drives that literally have never been used.
As optical drives decline in popularity, card readers are taking their place -- quite literally.
Most internal card readers are sized to fit in a standard optical drive bay.
Although commonly called "readers," all card readers can both read and write. Most are
capable of reading from and writing to all popular flash media formats. Most also have USB
ports. Some have FireWire ports or eSATA ports, and a few can even read and write to SIM
cards used in mobile phones.
One important decision to make when buying a card reader is the type of internal interface
it uses. Most use USB 3.0, many use SATA, and a few use PCI-e. All three interfaces have
good speed, so which one you choose mainly depends on how many headers or slots you
have available on your motherboard. If all of your USB 3.0 headers will be used by the
computer's front-panel USB ports, then you may want to choose SATA. If all of your SATA
ports will be used by other drives, you can use PCI-e.
The venerable floppy drive has seen better days. Most new computers don't even come with
floppy drives any more, and many new motherboards don't even have headers for them
anymore. They are officially and unquestionably obsolete.
ZIP drives are magnetic drives whose disks can hold 100 MB, 250 MB, or 750 MB of data,
depending on the drive model.
When CD-writers first came out, many predicted the rapid demise of ZIP drives. But the
ZIP format defied the odds for years and remained a popular removable storage medium.
They were much more convenient to use than early CD burners and had a much lower
failure rate than optical drives.
Even after writeable CDs and then writeable DVDs appeared, the ZIP format still
had the upper hand over optical drives in terms of convenience and device reliability. They
remained a popular format with photographers, in particular, until rather recently.
As with floppy drives, I can think of no good reason to install a ZIP drive in your computer
unless you have a need to access data stored on ZIP disks.
Tape drives use the same interfaces as any other drives. Most internal tape drives use SATA
nowadays. External ones usually use USB 3.x, FireWire, or eSATA.
A hierarchical file system is how drives, folders, files, and other storage devices are
organized and displayed on an operating system. In a hierarchical file system, the drives,
folders, and files are displayed in groups, which allows the user to see only the files they're
interested in seeing.
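As a small illustration of this grouping, the Python sketch below arranges file paths under the folders that contain them. The paths are made up, and `PurePosixPath` from the standard `pathlib` module is used because it only manipulates path names and never touches a real disk.

```python
from pathlib import PurePosixPath

# Hypothetical paths used purely for illustration.
paths = [
    PurePosixPath("/home/student/csc216/notes.txt"),
    PurePosixPath("/home/student/csc216/assignment.docx"),
    PurePosixPath("/home/student/photos/trip.jpg"),
]

def group_by_folder(paths):
    """Group file names under the folder that contains them."""
    tree = {}
    for path in paths:
        tree.setdefault(str(path.parent), []).append(path.name)
    return tree

tree = group_by_folder(paths)
levels = paths[0].parts  # each level of the hierarchy, from the root down
```

A user who opens only the csc216 folder sees its two files and nothing from photos, which is the "see only the files they're interested in" behaviour described above.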
MODULE TWO
DATA PROCESSING
2.1 Definition of Data Processing
Data processing is defined as the collection, manipulation, and processing of collected data for
the required use. It is a technique normally performed by a computer; the process includes
retrieving, transforming, or classifying information.
Data processing can also be simply defined as the conversion of raw data to meaningful
information through a process or the conversion of data into usable and desired form. This
conversion or “processing” is carried out using a predefined sequence of operations either
manually or automatically. Most data processing is done by computer and is therefore
automatic.
From this, data is defined as raw facts, while information is processed data.
The output or “processed” data can be obtained in different forms like image, graph, table,
vector file, audio, charts or any other desired format depending on the software or method
of data processing used.
Technical skills
Time constraints
2.2 Stages of Data Processing
Data processing is undertaken by any activity which requires a collection of data. This data
collected needs to be stored, sorted, processed, analyzed and presented. This complete
process can be divided into 6 simple primary stages which are:
1. Data collection
2. Data Preparation or transformation
3. Input
4. Processing of data
5. Output and Interpretation
6. Storage
1) Data Collection is the first stage of the cycle, and is very crucial, since the quality of data
collected will impact heavily on the output. The collection process needs to ensure that the
data gathered are both defined and accurate, so that subsequent decisions based on the
findings are valid. This stage provides both the baseline from which to measure, and a target
on what to improve.
2) Preparation is the manipulation of data into a form suitable for further analysis and
processing. Raw data cannot be processed and must be checked for accuracy. Preparation is
about constructing a data set from one or more data sources to be used for further
exploration and processing. Analyzing data that has not been carefully screened for
problems can produce highly misleading results that are heavily dependent on the quality of
data prepared.
3) Input is the task where verified data is coded or converted into machine readable form so
that it can be processed through an application. Data entry is done through the use of a
keyboard, scanner, or data entry from an existing source. This time-consuming process
requires speed and accuracy. Most data need to follow a formal and strict syntax since a
great deal of processing power is required to break down the complex data at this stage. Due
to the costs, many businesses are resorting to outsourcing this stage.
4) Processing is when the data is subjected to various means and methods of powerful
technical manipulations using Machine Learning and Artificial Intelligence algorithms to
generate an output or interpretation about the data. The process may be made up of multiple
threads of execution that simultaneously execute instructions, depending on the type of data.
There are applications like Anvesh available for processing large volumes of heterogeneous
data within very short periods.
5) Output and interpretation is the stage where processed information is now transmitted
and displayed to the user. Output is presented to users in various report formats like
graphical reports, audio, video, or document viewers. Output needs to be interpreted so that
it can provide meaningful information that will guide future decisions of the company.
6) Storage is the last stage in the data processing cycle, where data, and metadata
(information about data) are held for future use. The importance of this cycle is that it allows
quick access and retrieval of the processed information, allowing it to be passed on to the
next stage directly, when needed. Anvesh uses special security and safety standards to store
data for future use.
The Data Processing Cycle is a series of steps carried out to extract useful information from
raw data. Although each step must be taken in order, the order is cyclic. The output and
storage stage can lead to the repeat of the data collection stage, resulting in another cycle of
data processing.
The cycle provides a view of how the data travels and transforms from collection to
interpretation, and is ultimately used in effective business decisions.
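The six stages can be sketched as a chain of small Python functions, one per stage. This is an illustration under simple assumptions: the raw data is a made-up list of student marks typed as text, and the function names are hypothetical.

```python
def collect():
    # Stage 1: raw data as captured; these marks are made-up sample values.
    return ["72", " 65", "abc", "88 ", ""]

def prepare(raw):
    # Stage 2: clean the data and discard entries that fail the accuracy check.
    cleaned = [item.strip() for item in raw]
    return [item for item in cleaned if item.isdigit()]

def to_input(prepared):
    # Stage 3: convert the verified data into a machine-usable form.
    return [int(item) for item in prepared]

def process(values):
    # Stage 4: derive information from the data.
    return {"count": len(values), "average": sum(values) / len(values)}

def output(result):
    # Stage 5: present the information to the user.
    return f"{result['count']} valid marks, average {result['average']:.1f}"

storage = []  # Stage 6: results kept for future use

def run_cycle():
    result = process(to_input(prepare(collect())))
    storage.append(result)
    return output(result)
```

Of the five raw entries, two fail the check in the preparation stage, so the report produced by `run_cycle()` is based on the three valid marks only.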
2.3 Methods of Data Processing
1. Manual Data Processing: In this method data is processed without the use of a machine,
tool or electronic device. All the calculations and logical operations are performed
manually on the data.
2. Mechanical Data Processing – Data processing is done by use of a mechanical device
or very simple electronic devices like calculators and typewriters. When the need for
processing is simple, this method can be adopted.
3. Electronic Data Processing – This is the modern technique for processing data. It is the
fastest and best available method, with the highest reliability and accuracy. It relies on
computers and software and is employed in most agencies. Data and a set of
instructions are given to the computer as input, and the computer automatically
processes the data according to the given set of instructions. The computer is also
known as an electronic data processing machine.
1. Batch Processing
2. Real-time processing
3. Online Processing
4. Multiprocessing
5. Time-sharing
Real-time processing systems are designed to deal with dynamic situations in order to control a
critical operation, such as an airline reservation system, which must be continually updated as
events occur. Seat reservation in flight operations requires communication-oriented server
computers supported by a network of terminals or PCs serving as clients. These facilitate
response to enquiries on seat reservations and ensure that the master file is updated as soon
as transactions are completed. The systems ensure that enquiries on available seats are
answered instantaneously and prevent double booking or overbooking of seats in the aircraft.
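The double-booking safeguard can be sketched in Python as below. This is a single-user illustration with hypothetical class, seat and passenger names; a real multi-terminal system would also need locking or transactions so that two clients cannot reserve the same seat at the same moment.

```python
class Flight:
    """Holds the seat records for one flight (a stand-in for the master file)."""

    def __init__(self, seats):
        self.available = set(seats)  # seats still free
        self.booked = {}             # seat -> passenger

    def is_available(self, seat):
        """Answer an enquiry from the current state of the records."""
        return seat in self.available

    def reserve(self, seat, passenger):
        """Update the records at once; refuse a seat that is already taken."""
        if seat not in self.available:
            return False
        self.available.remove(seat)
        self.booked[seat] = passenger
        return True

flight = Flight(["1A", "1B", "1C"])
first = flight.reserve("1A", "Passenger X")   # succeeds
second = flight.reserve("1A", "Passenger Y")  # refused: seat already booked
```

Because the record is updated as soon as the first reservation completes, the second request for seat 1A is answered from current data and is rejected.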
Online processing is a method of data processing whereby data about a single transaction is
processed immediately it is captured. This method of processing allows transactions to be
entered directly to the system via terminals, PCs or workstations as they take place, thereby
updating the master file immediately as the transactions occur. The point of entry may be
remote from the location at which the update is made. For example, when you withdraw cash
from an automated teller machine, your withdrawal is instantly processed and your account
balance updated. Other online processing systems include:
• Visa processing
• Result checking
• Banking (account enquiry)
• Air seat reservation
• Application processing
• Examination
Because of the prevalence of PCs in data processing, we rarely hear the term online
processing. The term client/server computing is more popular, where the PC is the client.
This technique provides the facility to store and execute more than one program in the Central
Processing Unit (CPU) simultaneously. Further, the multiple programming technique increases
the overall working efficiency of the computer.
Time-sharing is another form of online data processing that allows several users to share the
resources of an online computer system. This technique is adopted when results are needed swiftly.
Moreover, as the name suggests, this system is time based.
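One common way to share a computer by time is to give each user a fixed time slice in turn (round-robin). The notes do not specify a scheduling policy, so the Python sketch below is an assumption for illustration only; the user names and work units are hypothetical.

```python
from collections import deque

def time_share(jobs, time_slice):
    """Serve each user for at most time_slice units in turn until all work is done.

    jobs maps a user name to the units of work that user needs.
    Returns the order of service as (user, units_used) pairs.
    """
    queue = deque(jobs.items())
    schedule = []
    while queue:
        user, remaining = queue.popleft()
        used = min(time_slice, remaining)
        schedule.append((user, used))
        if remaining > used:
            # Unfinished work goes to the back of the queue.
            queue.append((user, remaining - used))
    return schedule

schedule = time_share({"user_a": 3, "user_b": 5, "user_c": 2}, time_slice=2)
```

No user waits for another user's whole job to finish; each gets a share of the processor within a short interval, which is what makes the system feel responsive to everyone.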
Distributed processing is a specialized data processing technique in which various computers
(which are located remotely) remain interconnected with a single host computer, forming a
network of computers.
Many organizations that were used to centralized systems for data processing are now able
to adopt Distributed data processing because of advances in computing technologies. A
centralized system consists of a central multi-user computer (usually a mainframe) which
hosts all components of a data processing system. The users interact with this host via
terminals or PCs serving as clients, but virtually all of the actual processing and work is done
on the host computer. All the devices in the centralized system, such as PCs, terminals,
network devices and printers, converge on one central computer, even though the users
may work from remote locations via terminals.
All processing and storage take place at the central location. On the other hand, a
distributed system allows the components of a data processing system to be made available at
multiple locations in a computer network. This means that the processing workload
required for supporting the components is also distributed across multiple computers on
the network. In addition, the computers, storage devices, and even some computer
personnel may need to be distributed to separate locations throughout the organization for
the efficiency of the system. Distributed data processing allows data processing and storage
to occur at several locations in the computer system. There are advantages and
disadvantages associated with adopting distributed data processing in an organization.
These are as follows:
All these computer systems remain interconnected by a high-speed communication network. This
facilitates communication between computers. However, the central computer system
maintains the master database and monitors accordingly.