Data Processing
Introduction
Data Processing Cycle
o Data Collection
o Data Input
o Processing
o Output
Description of Errors in Data Processing
o Transcription Errors
o Computational Errors
o Algorithm or Logical Errors
Data Integrity
o Accuracy
o Timeliness
o Relevance
o Threat to Data Integrity
Data Processing Methods
o Manual Data Processing
o Mechanical Data Processing
o Electronic Data Processing
Computer Files
o Elements of a Computer File
o Logical and Physical Files
Types of Computer Processing Files
o Master File
o Transaction (Movement) File
o Reference File
o Backup File
o Report File
o Sort File
File Organization Methods
o Sequential File Organisation
o Random or Direct File Organisation
o Serial File Organisation
o Indexed-sequential File Organisation Method
Electronic Data Processing Modes
o On-line Processing
o Real-time Processing
o Distributed Data Processing
o Time-sharing
o Batch Processing
o Multiprocessing
o Multiprogramming
o Interactive Processing
Introduction
Data refers to the raw facts that do not have much meaning to the user and may include
numbers, letters, symbols, sound or images.
Information, on the other hand, refers to the meaningful output obtained after processing the data.
Therefore, data processing refers to the process of transforming raw data into meaningful
output, i.e. information.
Data processing can be done manually using pen and paper, mechanically using simple devices
like typewriters, or electronically using modern data processing tools such as computers.
Electronic data processing has become so popular that manual and mechanical methods are
being pushed to obsolescence.
Data Collection
Data collection is also referred to as data gathering or fact-finding.
It involves looking for crucial facts needed for processing.
Processing
This is the transformation of input data by the central processing unit (CPU) to a more meaningful
output (information).
Some of the operations performed on data include calculations, comparing values and sorting.
Output
The final activity in the data processing cycle is producing the desired output, also referred to as
information.
The information can then be distributed to the target group or stored for future use.
Distribution is making the information available to those who need it and is sometimes called
information dissemination.
This process of dissemination may involve electronic presentation over radio or television,
distribution of hard copies, broadcasting messages over the Internet or mobile phones etc.
Transcription Errors
Transcription errors occur during data entry. Such errors include misreading and transposition
errors.
Misreading errors
Misreading errors arise when the user reads the source document incorrectly and therefore
enters wrong values. For example, a user may misread a handwritten figure such as 589 and
type S89 instead, i.e. confusing 5 for S.
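Transposition errors
Transposition errors result from entering characters in the wrong order. For example, a user
may type 369 instead of 396.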
Computational Errors
Computational errors occur when an arithmetic operation does not produce the expected results.
The most common computational errors include overflow, truncation and rounding errors.
Overflow errors
An overflow occurs if the result from a calculation is too large to be stored in the allocated
memory space. For example if a byte is represented using 8 bits, an overflow will occur if the
result of a calculation gives a 9-bit number.
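The 8-bit case above can be sketched in Python. Python integers do not overflow on their own, so this sketch simulates an 8-bit storage location with a bit mask; the function name `add_8bit` is an assumption made for illustration.

```python
def add_8bit(a, b):
    """Add two unsigned values as if the result had to fit in one 8-bit byte."""
    full = a + b              # the true arithmetic result
    stored = full & 0xFF      # only the lowest 8 bits can be stored
    overflowed = full > 0xFF  # True when a 9th bit would be needed
    return stored, overflowed

print(add_8bit(100, 27))   # (127, False): the result fits in 8 bits
print(add_8bit(200, 100))  # (44, True): 300 needs 9 bits, so only 44 is stored
```

The second call shows why overflow is dangerous: the stored value looks valid but is wrong.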
Truncation errors
Truncation errors result from real numbers whose fractional part is too long to fit in
the allocated memory space. The computer truncates, or cuts off, the extra digits from the
fractional part. For example, 0.784969 truncated to three decimal places becomes
0.784. The resulting number is not rounded off.
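A minimal sketch of truncation, using Python's `decimal` module so that the digits are handled exactly as written. The helper name `truncate` is an assumption made for illustration.

```python
from decimal import Decimal, ROUND_DOWN

def truncate(value, places):
    """Cut off all digits beyond `places` decimal places, without rounding."""
    step = Decimal(10) ** -places  # e.g. places=3 gives 0.001
    return Decimal(value).quantize(step, rounding=ROUND_DOWN)

print(truncate("0.784969", 3))  # 0.784
```

The value is passed as a string so that no binary floating-point error enters before the truncation itself.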
Rounding errors
Rounding errors result from raising or lowering a digit in a real number to reach the required
number of digits. For example, to round off 30.666 to one decimal place, we raise the first digit
after the decimal point if its successor is 5 or more. In this case, the successor is 6, therefore
30.666 rounded to one decimal place is 30.7. If the successor is below 5, e.g. 30.635, we round
the number down to 30.6.
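The two rounding examples above can be checked with a short sketch. It assumes the "round half up" rule described in the text; the helper name `round_half_up` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places):
    """Round to `places` decimal places, raising the digit when the next one is 5 or more."""
    step = Decimal(10) ** -places
    return Decimal(value).quantize(step, rounding=ROUND_HALF_UP)

rounded = round_half_up("30.666", 1)
print(rounded)                        # 30.7
print(rounded - Decimal("30.666"))    # 0.034, the rounding error introduced
print(round_half_up("30.635", 1))     # 30.6
```

Python's built-in `round()` is not used here because it works on binary floating-point values and rounds ties to the nearest even digit, which is a different rule from the one described above.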
Data Integrity
Data integrity refers to the accuracy and completeness of data entered in a computer or received
from the information system. Integrity is measured in terms of accuracy, timeliness and relevance
of data.
Accuracy
Accuracy refers to how close an approximation is to an actual value. As long as the correct
instructions and data are entered, computers produce accurate results efficiently. In numbers, the
accuracy of a real number depends on the number of decimal places retained. For example,
72.1264 is more accurate than 72.13.
Timeliness
Timeliness of data and information is important because data and information have a time value
attached to them. If received late, information may have become meaningless to the user. For
example, a newspaper notice inviting people to a meeting or occasion must be printed before
the event, not after it.
Relevance
Data entered into the computer must be relevant in order to get the expected output. In this case,
relevance means that the data entered must be pertinent to the processing needs at hand and
must meet the requirements of the processing cycle. The user also needs relevant information for
daily operations or decision making.
Computer Files
A file can be defined as a collection of related records that give a complete set of information
about a certain item or entity. A file can be stored manually in a file cabinet or electronically in
computer storage devices. Computerized storage offers a much better way of holding information
than the manual filing systems, which heavily rely on the concept of the file cabinet.
Some of the advantages of computerized filing system include:
1. Information takes up much less space than the manual filing.
2. It is much easier to update or modify information.
3. It offers faster access and retrieval of data.
4. It enhances data integrity and reduces duplication.
Logical files
A logical file is a type of file viewed in terms of what data items it contains and details of what
processing operations may be performed on the data items. It does not have implementation
specific information such as field data types, field sizes and file type. Logical files are
discussed in system design later in the book.
Physical files
As opposed to a logical file, a physical file is one that is viewed in terms of how data is stored on
a storage media and how the processing operations are made possible. Physical files have
implementation specific details such as characters per field and data type for each field. Physical
files are discussed later in system implementation and operation in this book.
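One way to picture the difference is sketched below. The `Customer` class stands for the logical view (which data items exist), while the record layout stands for the physical view (how many bytes each field takes on storage). The field names and the 4-byte plus 20-byte layout are assumptions chosen for illustration.

```python
import struct
from dataclasses import dataclass

@dataclass
class Customer:
    """Logical view: which data items a customer record contains."""
    customer_id: int
    name: str

# Physical view (assumed layout): a 4-byte unsigned integer, then a 20-byte name field.
RECORD_FORMAT = "<I20s"

def to_bytes(customer):
    """Pack a record into its fixed-size on-disk form; short names are padded with zero bytes."""
    return struct.pack(RECORD_FORMAT, customer.customer_id, customer.name.encode("ascii"))

def from_bytes(raw):
    """Rebuild the logical record from its stored bytes."""
    customer_id, name = struct.unpack(RECORD_FORMAT, raw)
    return Customer(customer_id, name.rstrip(b"\x00").decode("ascii"))

raw = to_bytes(Customer(7, "Amina"))
print(len(raw))         # 24 bytes per record
print(from_bytes(raw))  # Customer(customer_id=7, name='Amina')
```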
Master File
A master file is the main file that contains relatively permanent records about particular items or
entries. For example a customer file will contain details of a customer such as customer ID, name
and contact address.
Transaction (Movement) File
A transaction file is used to hold input data during transaction processing. The file is later used to
update the master file and audit daily, weekly or monthly transactions. For example in a
busy supermarket, daily sales are recorded on a transaction file and later used to update the
stock file. The file is also used by the management to check on the daily or periodic transactions.
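The supermarket example can be sketched as follows. Dictionaries and lists stand in for the files, and the item codes and quantities are hypothetical.

```python
# Master file: relatively permanent stock records, keyed by item code (hypothetical data).
stock_master = {"A100": 50, "B200": 20}

# Transaction file: the day's sales as (item code, quantity sold).
daily_sales = [("A100", 3), ("B200", 5), ("A100", 2)]

def update_master(master, transactions):
    """Apply every transaction to a copy of the master file and return the updated copy."""
    updated = dict(master)
    for item_code, quantity in transactions:
        updated[item_code] -= quantity
    return updated

print(update_master(stock_master, daily_sales))  # {'A100': 45, 'B200': 15}
```

The transaction list is left untouched by the update, so it remains available for management to review the day's sales.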
Reference File
A reference file is mainly used for reference or look-up purposes. Look-up information is
information that is stored in a separate file but is required during processing. For example, at a
point of sale terminal, the item code entered either manually or using a bar code reader is used to
look up the item description and price from a reference file stored on a storage device.
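A minimal sketch of the point of sale look-up, with a dictionary standing in for the reference file. The item codes, descriptions and prices are hypothetical.

```python
# Reference file: item code -> (description, price). Hypothetical data.
price_list = {
    "5012345": ("Milk 500ml", 60.0),
    "5098765": ("Bread 400g", 55.0),
}

def look_up(item_code):
    """Return (description, price) for a scanned code, or None if the code is unknown."""
    return price_list.get(item_code)

print(look_up("5012345"))  # ('Milk 500ml', 60.0)
print(look_up("0000000"))  # None
```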
Backup File
A backup file is used to hold copies (backups) of data or information from the computer's fixed
storage (hard disk). Since a file held on the hard disk may be corrupted, lost or changed
accidentally, it is necessary to keep copies of the recently updated files. In case of hard disk
failure, a backup file can be used to reconstruct the original file.
Report File
A report file is used to store relatively permanent records extracted from the master file or
generated after processing. For example you may obtain a stock levels report generated from an
inventory system while a copy of the report will be stored in the report file.
Sort File
A sort file is mainly used where data is to be processed sequentially. In sequential processing,
data or records are first sorted and held on a magnetic tape before updating the master file.
On-line Processing
In online data processing, data is processed immediately it is received. The computer is connected
directly to the data input unit via a communication link. The data input unit may be a network
terminal or an online input device attached to the computer.
Real-time Processing
In real-time data processing, the computer processes the incoming data as soon as it occurs,
updates the transaction file and gives an immediate response that would affect the events as they
happen. This is different from online processing in that, for the latter, an immediate response
may not be required. The main purpose of real-time processing is to provide accurate, up-to-date
information, hence better services based on the true (real) situation. An example of real-time
processing is making a reservation for airline seats. A customer may request airline booking
information through a remote terminal, and the reservation system returns the requested
information almost immediately. If a booking is made, the system immediately updates the
reservations file to avoid double booking and sends the response back to the customer.
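The reservation example can be sketched as below. A set stands in for the reservations file, and the function name and seat labels are assumptions made for illustration; a real reservation system would also need to handle many terminals at once.

```python
reserved_seats = set()  # stands in for the reservations file

def book_seat(seat):
    """Update the reservations record at once and return an immediate response."""
    if seat in reserved_seats:
        return f"Seat {seat} is already booked"
    reserved_seats.add(seat)
    return f"Seat {seat} confirmed"

print(book_seat("12A"))  # Seat 12A confirmed
print(book_seat("12A"))  # Seat 12A is already booked
```

Because the record is updated before the response is returned, the second request for the same seat is refused, which is how double booking is avoided.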
Distributed Data Processing
Distributed data processing refers to dividing (distributing) processing tasks to two or more
computers that are located on physically separate sites but connected by data transmission
media. For example, a distributed database will have different tables of the same database
residing on separate computers and processed there as need arises. The users of the distributed
database will be completely unaware of the distribution and will interact with the database as if all
of it were on their computer. This distribution of processing power increases efficiency and speed
of processing. An example is in the banking industry where customers' accounts are operated on
servers in the branches but all the branch accounts can be administered centrally from the main
server as if they resided on it. In this case, we say that the distributed database is transparent to
the user because the distribution is hidden from the user's point of view.
Time-sharing
In time-sharing processing, many terminals connected to a central computer are given access
to the central processing unit apparently at the same time. In reality, however, each user is
allocated a time slice of the CPU in sequence. The amount of time allocated to each user is
controlled by a multi-user operating system. If a user's task is not completed during the allocated
time slice, the user is allocated another time slice later in a round robin manner.
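A simplified simulation of round robin time slicing is sketched below. The user names, the time units and the function name `round_robin` are assumptions; a real operating system scheduler is far more involved.

```python
from collections import deque

def round_robin(tasks, time_slice):
    """tasks maps each user to the CPU time units still needed.
    Returns the sequence of (user, units used) slices in the order they were granted."""
    queue = deque(tasks.items())
    schedule = []
    while queue:
        user, remaining = queue.popleft()
        used = min(time_slice, remaining)
        schedule.append((user, used))
        if remaining > used:
            # Task not finished: rejoin the back of the queue for a later slice.
            queue.append((user, remaining - used))
    return schedule

print(round_robin({"user1": 5, "user2": 2, "user3": 4}, time_slice=2))
# [('user1', 2), ('user2', 2), ('user3', 2), ('user1', 2), ('user3', 2), ('user1', 1)]
```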
Batch Processing
In batch processing, data is accumulated as a group (batch) over a specified period of time e.g.
daily, weekly or monthly. The batch is then processed at once. For example in a payroll
processing system, employees' details concerning number of hours worked, rate of pay, and
other details are collected for a period of time, say one month. These details are then used to
process the payment for the duration worked. Most printing systems also use batch processing
to print documents.
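The payroll example can be sketched as follows. The records are accumulated first and then processed in one run; the employee names, hours and rates are hypothetical.

```python
# One month's accumulated records: (employee, hours worked, hourly rate). Hypothetical data.
batch = [("Achieng", 160, 250.0), ("Barasa", 120, 300.0)]

def process_payroll(records):
    """Process the whole batch in a single run and return each employee's pay."""
    return {name: hours * rate for name, hours, rate in records}

print(process_payroll(batch))  # {'Achieng': 40000.0, 'Barasa': 36000.0}
```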
Multiprocessing
Multiprocessing refers to the processing of more than one task at the same time on different
processors of the same computer. This is possible in computers such as mainframes and network
servers. In such systems, a computer may contain more than one independent central processing
unit, and the units work together in a coordinated way. At a given time, the processors may execute
instructions from two or more different programs or from different parts of one program
simultaneously. This coordination is made possible by a multiprocessing operating system that
enables different processors to operate together and share the same memory.
Multiprogramming
Multiprogramming, also referred to as multitasking, refers to a type of processing where more
than one program is executed apparently at the same time by a single central processing unit.
It is important to note that, as opposed to multiprocessing, a computer in multiprogramming has
only one central processing unit. The operating system allocates each program a time slice
and decides the order in which they will be executed. This scheduling is done so quickly that the
user gets the impression that all programs are being executed at the same time.
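The interleaving described above can be imitated with Python generators. This is only an analogy under simplifying assumptions: each `yield` marks the end of one time slice, and the names `program` and `run_on_single_cpu` are illustrative.

```python
def program(name, steps):
    """A stand-in for a program; each yield marks the end of one time slice."""
    for step in range(1, steps + 1):
        yield f"{name} step {step}"

def run_on_single_cpu(programs):
    """Give each ready program one slice in turn until all have finished."""
    log = []
    ready = list(programs)
    while ready:
        current = ready.pop(0)
        try:
            log.append(next(current))  # run one slice of this program
            ready.append(current)      # then send it to the back of the queue
        except StopIteration:
            pass                       # the program has finished
    return log

print(run_on_single_cpu([program("A", 2), program("B", 3)]))
# ['A step 1', 'B step 1', 'A step 2', 'B step 2', 'B step 3']
```

Only one program advances at any moment, yet the log shows both making progress side by side.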
Interactive Processing
In interactive data processing, there is continuous dialogue between the user and the computer.
As the program executes, it keeps on prompting the user to provide input or respond to prompts
displayed on the screen.
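A minimal sketch of such a dialogue is given below. The prompt text and the function name `interactive_sum` are assumptions; the `read` and `write` parameters default to the keyboard and the screen.

```python
def interactive_sum(read=input, write=print):
    """Keep prompting the user for numbers and report a running total until 'q' is entered."""
    total = 0.0
    while True:
        reply = read("Enter a number (or 'q' to quit): ")
        if reply.strip().lower() == "q":
            break
        total += float(reply)
        write(f"Running total: {total}")
    return total
```

Calling `interactive_sum()` in a terminal starts the dialogue: the program waits at each prompt and responds as soon as the user replies.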