Data Processing
Introduction
Data refers to the raw facts that do not have much meaning to the user and may
include numbers, letters, symbols, sound or images.
Information refers to the meaningful output obtained after processing the data.
Data processing therefore refers to the process of transforming raw data into
meaningful output i.e., information.
Data processing can be done manually using pen and paper, mechanically using
simple devices like typewriters, or electronically using modern data processing
tools such as computers.
Data processing cycle
It refers to the sequence of activities involved in transforming data from its
raw form to information. It is often referred to as a cycle because the output
obtained can be stored after processing and may be used in future as input.
The four main stages of the data processing cycle are:
Data collection
Data input
Data processing
Data output
1. Data collection
Also referred to as data gathering or fact finding, it involves looking for crucial
facts needed for processing.
Methods of data collection
These include interviews, questionnaires, observation, etc. In most cases, the
data is collected after sampling.
Sampling is the process of selecting representative elements (e.g., people,
organizations) from an entire group (population) of interest. Some of the tools
that help in data collection include source documents such as forms and data
capture devices such as digital cameras.
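As a minimal sketch of sampling, the Python snippet below draws a simple random sample from a population. The population of respondent IDs, the sample size and the fixed seed are all assumptions made for illustration, and simple random sampling is only one of several ways of picking representative elements.

```python
import random

def draw_sample(population, sample_size, seed=None):
    """Return a simple random sample (no repeats) from the population."""
    rng = random.Random(seed)  # a fixed seed makes the sketch repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: respondent IDs 1 to 100
population = list(range(1, 101))
sample = draw_sample(population, 10, seed=1)
print(sample)
```

Only the ten selected respondents would then be interviewed or given a questionnaire, instead of all one hundred.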
Stages of data collection
The process of data collection may involve a number of stages depending on
the method used. These include:
o Data creation: this is the process of identifying facts and putting them
together in an organized format. This may be in the form of a manually
prepared document, or the data may be captured from the source using a
data capture device such as a barcode reader and input easily into a computer.
o Data preparation: this is the transcription (conversion) of data from
source document to machine readable form. This may not be the case
for all input devices. Data collected using devices that directly capture
data in digital form do not require transcription.
o Data transmission: this stage is needed only when data has to be
transmitted via communication media to the central office.
2. Data input
o Refers to the process where the collected data is converted from
human readable form to machine readable form (binary form). The
conversion takes place in the input device.
o Media conversion: data may need to be transferred from one medium
to another, e.g., from a floppy disk to a computer’s hard disk for faster
input.
o Input validation: data entered into the computer is subjected to validity
checks by a computer program before being processed, to reduce errors
at the input stage.
o Sorting: in case the data needs to be arranged in a predefined order, it
is first sorted before processing.
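The validation and sorting steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the data (exam marks typed in as text), the 0 to 100 range and the function names are assumptions, not part of any particular system.

```python
def validate_mark(raw):
    """Return the mark as an int, or raise ValueError if it fails a check."""
    text = raw.strip()
    if not text.isdigit():            # type check: digits only
        raise ValueError(f"not a whole number: {raw!r}")
    mark = int(text)
    if not 0 <= mark <= 100:          # range check (assumed limits)
        raise ValueError(f"out of range 0-100: {mark}")
    return mark

def validate_and_sort(raw_marks):
    """Split entries into sorted valid marks and a list of rejected entries."""
    valid, rejected = [], []
    for raw in raw_marks:
        try:
            valid.append(validate_mark(raw))
        except ValueError:
            rejected.append(raw)
    return sorted(valid), rejected

marks, rejected = validate_and_sort(["78", "S89", "45", "120", " 63 "])
print(marks)     # [45, 63, 78]
print(rejected)  # ['S89', '120']
```

Rejected entries would normally be sent back for correction rather than silently dropped.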
3. Processing
o This is the transformation of the input data by the CPU to a more
meaningful output (information). Some of the operations performed on
the data include calculations, comparing values and sorting.
4. Output
o The final activity in the data processing cycle is producing the desired
output, also referred to as information. This information can be
distributed to the target group or stored for future use. Distribution is
making information available to those who need it and is sometimes
called information dissemination. This process of dissemination may
involve electronic presentation over the radio or television, distribution
of hard copies, broadcasting messages over the internet or mobile
phones etc.
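The four stages of the cycle can be sketched as four small Python functions chained together. The sales figures and function names are hypothetical, chosen only to show how the output of one stage becomes the input of the next.

```python
def collect():
    """Data collection: hypothetical daily sales figures copied from forms."""
    return ["1200", "950", "1430"]

def data_input(raw_items):
    """Data input: convert human-readable text into numbers."""
    return [int(item) for item in raw_items]

def process(values):
    """Processing: calculations that turn the data into information."""
    return {"total": sum(values), "average": sum(values) / len(values)}

def output(info):
    """Output: present the information to the user."""
    return f"Total sales: {info['total']}, average: {info['average']:.2f}"

info = process(data_input(collect()))
report = output(info)
print(report)  # Total sales: 3580, average: 1193.33

# Stored output can feed a later run, which is why the process is a cycle.
next_week_input = [info["total"]]
```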
Description of errors in data processing
1. Computational errors
o Occur when an arithmetic operation does not produce the expected
results. The most common computational errors include overflow,
truncation and rounding.
o Overflow errors
Occur if the result of a calculation is too large to be stored in
the allocated memory space. For example, if a byte is
represented using 8 bits, an overflow will occur if the result of a
calculation gives a 9-bit number.
o Truncation errors
Result from having real numbers with a long fractional part
which cannot fit in the allocated memory space. The computer
would truncate or cut off the extra digits from the fractional
part. For example, a number like 0.784969 can be truncated to
three decimal places to become 0.784.
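Overflow and truncation can be imitated in Python as a rough sketch. Python integers do not overflow on their own, so the 8-bit limit is simulated here with a mask; the function names are assumptions made for illustration.

```python
import math

def add_8bit(a, b):
    """Add two values as an unsigned 8-bit register would, flagging overflow."""
    result = a + b
    overflow = result > 0xFF          # the result needs a 9th bit
    return result & 0xFF, overflow    # only the low 8 bits are kept

def truncate(value, places):
    """Cut off digits beyond the given number of decimal places (no rounding)."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor

print(add_8bit(200, 100))     # (44, True)
print(truncate(0.784969, 3))  # 0.784
```

In the first call the true sum is 300, which does not fit in 8 bits, so the stored value wraps around to 44.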
The accuracy of the computer output is critical. As the saying goes, “garbage in,
garbage out” (GIGO): the accuracy of the data entered in the computer directly
determines the accuracy of the information given out.
Some of the errors that influence the accuracy of data input and information
output include:
Transcription errors,
Computational errors and
Algorithm or logical errors.
2. Transcription errors
Occur during data entry. Such errors include misreading and transposition
errors.
Misreading errors
Are brought about by the incorrect reading of the source by the user and
hence entering wrong values. For example, a user may misread a handwritten
figure such as 589 and type S89 instead i.e., confusing 5 for S.
Transposition errors
Result from incorrect arrangement of characters, i.e., putting characters in
the wrong order. For example, the user might enter 396 instead of 369.
These errors may be avoided by using modern capture devices such as bar
code readers, digital cameras etc. which enter data with the minimum user
intervention.
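Where the correct value is available for comparison (for instance when data is keyed in twice, which is an assumption here), a transposition can be told apart from a misreading with a small check. The function name below is hypothetical and the sketch only covers swaps of two adjacent characters.

```python
def is_transposition(expected, entered):
    """True if 'entered' equals 'expected' with two adjacent characters swapped."""
    if len(expected) != len(entered) or expected == entered:
        return False
    # Positions where the two strings disagree
    diffs = [i for i, (a, b) in enumerate(zip(expected, entered)) if a != b]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and expected[diffs[0]] == entered[diffs[1]]
        and expected[diffs[1]] == entered[diffs[0]]
    )

print(is_transposition("369", "396"))  # True
print(is_transposition("589", "S89"))  # False (a misreading, not a swap)
```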
Rounding errors
A type of computational error that results from raising or lowering a digit in
a real number to get the required rounded number. For example, to round off
30.666 to one decimal place, we raise the first digit after the decimal point if
its successor is more than or equal to five. In this case the successor is 6,
therefore 30.666 rounded to one decimal place is 30.7. If the successor is
below five, e.g., 30.635, we round down the number to 30.6.
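The rounding rule described above (raise the digit when its successor is five or more) can be sketched with Python's decimal module. The built-in round() is deliberately not used, since it works on binary floating-point values and rounds exact ties to the nearest even digit; the helper name is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(text, places=1):
    """Round a decimal string half-up to the given number of decimal places."""
    quantum = Decimal(10) ** -places   # places=1 gives 0.1
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_half_up("30.666"))  # 30.7
print(round_half_up("30.635"))  # 30.6
```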
3. Algorithm or logical errors
o An algorithm is a set of procedural steps followed to solve a given
problem. Algorithms are used as design tools when writing programs.
A wrongly designed algorithm results in a program that runs but
gives erroneous output. Such errors that result from wrong algorithm
design are referred to as algorithm or logical errors.
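A made-up example of a logical error is shown below: both functions run without any error message, but the first one follows a wrongly designed algorithm for the average and so gives erroneous output.

```python
def average_wrong(marks):
    """Runs without complaint, but the algorithm is flawed."""
    return sum(marks) / (len(marks) - 1)   # logical error: wrong divisor

def average_right(marks):
    """Correct algorithm: divide the total by the number of marks."""
    return sum(marks) / len(marks)

marks = [60, 70, 80]
print(average_wrong(marks))  # 105.0, runs fine but is wrong
print(average_right(marks))  # 70.0
```

The computer cannot detect this kind of error by itself; it is usually found by testing the program against results worked out by hand.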
Data integrity
Data integrity refers to the accuracy and completeness of data entered in a
computer or received from the information system. Integrity is measured in
terms of accuracy, timeliness and relevance of data.
Accuracy
It refers to how close an approximation is to an actual value. As long as the
correct instructions and data are entered, computers produce accurate results
efficiently. In numbers, the accuracy of a real number depends on the number
of digits it retains. For example, 72.1264 is more accurate than 72.13.
Timeliness
This is the relative accuracy of data with respect to the current state of
affairs for which it is needed.
This is important because data and information have a time value attached to
them. If received late, the information may have become useless to the user.
For example, information in the newspaper that is meant to invite people for a
meeting or occasion must be printed prior to the event and not later.
Relevance
Data entered into the computer must be relevant so as to get the expected
output. In this case, relevance means that the data entered must be pertinent
to the processing needs at hand and must meet the requirements of the
processing cycle. The user also needs relevant information for daily operations
or decision making.
Threats to data integrity
Threats to data integrity can be minimized in the following ways:
Back up data, preferably on external storage media.
Control access to data by enforcing security measures.
Design user interfaces that minimize chances of invalid data entry.
Use error detection and correction software when transmitting data.
Use devices that directly capture data from the source such as bar code
readers, digital cameras, and optical scanners.
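As one illustration of error detection during transmission, the sketch below attaches a CRC-32 checksum to a message using Python's standard zlib module. The message content and function names are assumptions. A checksum of this kind only detects corruption; it does not correct it.

```python
import zlib

def send(payload: bytes):
    """Attach a CRC-32 checksum to the payload before transmission."""
    return payload, zlib.crc32(payload)

def is_intact(payload: bytes, checksum: int) -> bool:
    """Recompute the checksum on arrival and compare it with the one sent."""
    return zlib.crc32(payload) == checksum

data, crc = send(b"stock level: 369")
print(is_intact(data, crc))                 # True
print(is_intact(b"stock level: 396", crc))  # False, corruption detected
```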
Data processing methods
As mentioned earlier, data can be processed manually, mechanically and
electronically.
1. Manual data processing
In manual data processing, most tasks are done manually with pen and
paper. For example, in a busy office, incoming tasks (input) are stacked in the
“in tray”, and completed tasks are placed in the “out tray” (output). The
processing of each task involves a person using the brain in order to respond
to queries.
The processed information from the out tray is then distributed to the people
who need it or stored in a file cabinet.
2. Mechanical data processing
Manual processing is cumbersome and boring, especially for repetitive tasks.
Mechanical devices were developed to help in automation of manual tasks.
Examples of mechanical devices include the typewriter, printing press, and
weaving looms. Initially, these devices did not have electronic intelligence.
3. Electronic data processing
For a long time, scientists have researched how to develop machines or
devices that would simulate some form of human intelligence during data and
information processing. This was made possible to some extent with the
development of electronic programmable devices such as computers.
The advent of microprocessor technology has greatly enhanced data
processing efficiency and capability. Some of the microprocessor-controlled
devices include computers, cellular (mobile) phones, calculators, fuel pumps,
modern television sets, washing machines, etc.
Computer files
A file can be defined as a collection of related records that give a complete set
of information about a certain item or entity. A file can be stored manually in
a file cabinet or electronically in computer storage devices.
Computerized storage offers a much better way of holding information than
the manual filing system which heavily relies on the concept of the file
cabinet.
Some of the advantages of computerized filing system include:
1. Information takes up much less space than in a manual filing system.
2. It is much easier to update or modify information.
3. It offers faster access and retrieval of data.
4. It enhances data integrity and reduces duplication.
5. It enhances security of data if proper care is taken to secure it.
Elements of computer file
A computer file is made up of three elements: characters, fields and records.
Characters
o A character is the smallest element in a computer file and refers to a
letter, number or symbol that can be entered, stored and output by a
computer. A character is made up of seven or eight bits depending on
the character coding scheme used.
Field
o A field is a single character or collection of characters that represents a
single piece of data. For example, a student’s admission number is a
field.
Records
o A record is a collection of related fields that represents a single entity,
e.g., in a class score sheet, the details of each student in a row such as
admission number, name, total marks and position make up a record.
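The three elements can be pictured with a short Python sketch based on the class score sheet example. The class name, field names and student details are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StudentRecord:
    """One record: related fields describing a single student."""
    admission_number: str   # field
    name: str               # field
    total_marks: int        # field
    position: int           # field

# A file is a collection of related records
score_sheet = [
    StudentRecord("ADM001", "Amina", 412, 1),
    StudentRecord("ADM002", "Brian", 398, 2),
]

first = score_sheet[0]
print(first.name)        # Amina (one field)
print(list(first.name))  # ['A', 'm', 'i', 'n', 'a'] (its characters)
```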
Logical and physical files
Computer files are classified as either physical or logical.
Logical files
o A computer file is referred to as a logical file if it is viewed in terms of
what data items it contains and what processing operations may be
performed on those data items. It does not carry implementation details
such as field data types, field sizes and file type.
Physical files
o As opposed to a logical file, a physical file is viewed in terms of how
data is stored on a storage medium and how the processing operations
are made possible. Physical files have implementation specific details
such as characters per field and data type for each field.
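To make the distinction concrete, the sketch below uses Python's struct module to turn a logical student record into a fixed physical layout and back. The layout itself (6 characters for the admission number, 20 for the name, a 2-byte number for total marks) is an assumption made for illustration.

```python
import struct

# Assumed physical layout: 6-byte admission number, 20-byte name,
# 2-byte unsigned total marks, little-endian, no padding between fields.
RECORD_FORMAT = "<6s20sH"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)   # 28 bytes per record

def pack_record(adm, name, total):
    """Logical view in, physical bytes out."""
    return struct.pack(RECORD_FORMAT, adm.encode("ascii"),
                       name.encode("ascii"), total)

def unpack_record(raw):
    """Physical bytes in, logical view out."""
    adm, name, total = struct.unpack(RECORD_FORMAT, raw)
    # Short text fields are padded with zero bytes, so strip them off
    return (adm.rstrip(b"\x00").decode("ascii"),
            name.rstrip(b"\x00").decode("ascii"), total)

raw = pack_record("ADM001", "Amina", 412)
print(len(raw))            # 28
print(unpack_record(raw))  # ('ADM001', 'Amina', 412)
```

A user of the logical file thinks only in terms of admission number, name and total marks; the byte counts and data types belong to the physical file.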
Types of Computer Processing Files
There are numerous types of files used for storing data needed for processing,
reference or backup. The most common types of processing files are:
Master file,
Transaction file,
Reference file,
Backup file,
Report file and
Sort file.
1. Master file
A master file is the main file that contains relatively permanent records about
particular items or entities. For example, a customer file will contain details of a
customer such as customer ID, name and contact address.
2. Transaction (movement) file
A transaction file is used to hold data during transaction processing. The file is later
used to update the master file and audit daily, weekly or monthly transactions. For
example, in a busy supermarket, daily sales are recorded on a transaction file and
later used to update the stock file. The file is also used by the management to check
on the daily or periodic transactions.
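The supermarket example can be sketched as follows. The item codes and quantities are invented, and in-memory Python structures stand in for the stock (master) file and the daily sales (transaction) file.

```python
def update_master(master, transactions):
    """Apply each sale in the transaction file to the stock master file."""
    updated = dict(master)                 # leave the original untouched
    for item_code, quantity_sold in transactions:
        updated[item_code] -= quantity_sold
    return updated

# Hypothetical stock master file: item code -> quantity in stock
stock_master = {"A100": 50, "B200": 30}
# Hypothetical transaction file: one (item code, quantity sold) per sale
daily_sales = [("A100", 3), ("B200", 1), ("A100", 2)]

print(update_master(stock_master, daily_sales))  # {'A100': 45, 'B200': 29}
```

The transaction list is kept after the update, so management can still review the day's individual sales.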
3. Reference file
A reference file is mainly used for reference or look-up purposes. Look-up
information is information that is stored in a separate file but is required during
processing. For example, in a point-of-sale terminal, the item code entered either
manually or using a barcode reader looks up the item description and price from a
reference file stored on a storage device.
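The point-of-sale look-up can be sketched with a Python dictionary standing in for the reference file. The item codes, descriptions and prices are made up for illustration.

```python
# Hypothetical reference file: item code -> (description, unit price)
PRICE_REFERENCE = {
    "A100": ("Milk 500ml", 60),
    "B200": ("Bread 400g", 55),
}

def look_up(item_code):
    """Return the description and price for a scanned item code, or None."""
    return PRICE_REFERENCE.get(item_code)

print(look_up("A100"))  # ('Milk 500ml', 60)
print(look_up("Z999"))  # None
```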
4. Backup file
A backup file is used to hold copies (backups) of data or information from the
computer’s fixed storage (hard disk). Since a file held on the hard disk may be
corrupted, lost or changed accidentally, it is necessary to keep copies of the recently
updated files. In case of the hard disk failure, a backup file can be used to
reconstruct the original file.
5. Report file
A report file is used to store relatively permanent records extracted from the
master file or
generated after processing. For example, you may obtain a stock levels report
generated from an inventory system while a copy of the report will be stored in the
report file.
6. Sort file
A sort file stores data arranged in a particular order. It is used mainly where
data is to be processed sequentially. In sequential processing, data or records
are first sorted and held on a magnetic tape before updating the
master file.
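A rough sketch of sequential processing with a sort file is given below. Python lists stand in for the tape files, the item codes are hypothetical, and the sketch assumes that every transaction refers to an item that exists in the master file.

```python
def sort_transactions(transactions):
    """Produce the sort file: transactions ordered by the master file key."""
    return sorted(transactions, key=lambda t: t[0])

def sequential_update(master, sorted_transactions):
    """Walk both sorted files once, front to back, applying each sale."""
    result = []
    i = 0
    for code, qty in master:               # master is already sorted by code
        while (i < len(sorted_transactions)
               and sorted_transactions[i][0] == code):
            qty -= sorted_transactions[i][1]
            i += 1
        result.append((code, qty))
    return result

master = [("A100", 50), ("B200", 30), ("C300", 12)]
sales = [("C300", 2), ("A100", 3), ("A100", 1)]
print(sequential_update(master, sort_transactions(sales)))
# [('A100', 46), ('B200', 30), ('C300', 10)]
```

Because both files are in the same order, each is read only once from start to end, which suits a medium such as magnetic tape.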
File organization methods
File organization refers to the way data is stored in a file. File organization is
very important because it determines the methods of access, efficiency,
flexibility and storage devices to use. There are four methods of organizing
files on a storage medium. These are:
sequential,
random,
serial and
indexed-sequential