INFORMATION PROCESSING FUNDAMENTALS Handout by Paul Gordon
Data is a set of raw facts and figures that a computer processes by following a set of instructions
called a program, while information is the processed data which is meaningful and useful.
For example, a data input sheet might contain dates as raw data in many forms: "31st January
1999", "31/01/1999", "31/1/99", "31 Jan", or "today". Once captured, this raw data may be
processed and stored in a single format.
Data held in this single format is referred to as information: data that has been organized,
stored, and presented so that it can support decision making, progress reporting, and the
planning and evaluation of programs.
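The conversion of many date forms into one format can be sketched in Python. This is a minimal illustration, not a production parser: the list of accepted formats, the ISO output format (YYYY-MM-DD), and the hypothetical `default_year` used when an entry has no year (as in "31 Jan") are all assumptions made for the example.

```python
import re
from datetime import date, datetime

# Formats this sketch accepts; an assumption, not a complete list.
# %B and %b match English month names in the default C locale.
FORMATS = ["%d %B %Y", "%d %b %Y", "%d/%m/%Y", "%d/%m/%y"]


def normalise_date(raw: str, default_year: int = 1999) -> str:
    """Convert one raw date entry into a single ISO format (YYYY-MM-DD)."""
    text = raw.strip()
    if text.lower() == "today":
        return date.today().isoformat()
    # Remove ordinal suffixes, e.g. "31st" -> "31".
    text = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text)
    # Try the text as typed, then with the assumed default year appended
    # (for entries such as "31 Jan" that have no year).
    for candidate in (text, f"{text} {default_year}"):
        for fmt in FORMATS:
            try:
                return datetime.strptime(candidate, fmt).date().isoformat()
            except ValueError:
                continue
    raise ValueError(f"unrecognised date: {raw!r}")


for entry in ["31st January 1999", "31/01/1999", "31/1/99", "31 Jan"]:
    print(entry, "->", normalise_date(entry))
```

Each of the four raw entries is stored as the same piece of information, 1999-01-31.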
The term information processing refers to the collection, storage, interpretation, and
retrieval of data. Depending on the data that is input, a particular output is produced. Many
of the devices we use today involve the processing and interpretation of a particular
input (data). In an electric kettle, for example, once the water has boiled at the desired
temperature, a sensor activates a switch that turns the kettle off. Similarly, at an
ATM/ABM, depending on your input you can make a deposit, make a withdrawal,
top up your phone with credit, and so on.
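The kettle example follows an input, process, output pattern. A minimal sketch, assuming the sensor supplies a list of temperature readings in degrees Celsius and that the desired temperature is a hypothetical `target_c` parameter defaulting to 100:

```python
from typing import Iterable, Optional


def kettle_switch_off_step(readings: Iterable[float], target_c: float = 100.0) -> Optional[int]:
    """Return the position of the first sensor reading at or above target_c.

    Input: temperature readings from the sensor (degrees Celsius).
    Process: compare each reading with the desired temperature.
    Output: the step at which the switch turns the kettle off, or None.
    """
    for step, temp in enumerate(readings):
        if temp >= target_c:
            return step  # desired temperature reached: switch off here
    return None  # desired temperature never reached: kettle stays on


print(kettle_switch_off_step([21.0, 55.5, 88.0, 100.0, 100.0]))
```

A real kettle uses a physical sensor and switch; the function only models the decision that links the input to the output.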
Sources of Data
Source documents
Data that is stored in a particular database or information system needs to be accurate and up to
date, structured in a way that makes it possible to search for specific data, and stored on a
suitable storage medium.
A source document is any document containing captured data that is keyed into a computer
system by an employee, such as a data entry clerk. Data can be captured from two kinds of
documents: machine-readable documents and human-readable documents. When the necessary
data has been entered on a form, for example a questionnaire, it is normally keyed into a
computer system for future use and updating.
1. Print Materials
1.1. Books
1.1.1. Reference Books
(a) Encyclopedias
(b) Dictionaries
(c) Directories
1.1.2. Textbooks
1.1.3. General Fiction and Non-Fiction Books
1.2. Periodicals
1.2.1. Journals
1.2.2. Magazines
1.2.3. Newspapers
1.3. Pamphlets
2. Electronic Sources
2.1. Local Sources
2.1.1. CD-ROMs
2.1.2. Electronic Databases
2.2. Remote Sources
2.2.1. Online Databases
2.2.2. World Wide Web
2.2.3. Digital Libraries
3. Personal Contacts
3.1. Word of Mouth
3.2. Contact by mail (incl. e-mail)
3.3. Interviews
3.4. Observations
Document Types
Human-Readable Documents
Human-readable documents are normally filled out by humans and can be read by humans. They
are usually designed and structured to make it easy to fill in data by hand. Acquiring data by
this means can be challenging for several reasons: the person filling out the document may
misunderstand the questions asked, their handwriting may be difficult to read, and they may
leave out some sections of the document.
Some of these problems can be reduced by instructing the individual to write in capital letters
and by printing a series of boxes on the document to separate individual letters or numbers.
Such documents can be seen at the bank when making cash deposits, where you write the account
number in a row of boxes, or when you fill out the form to collect money from a Western Union
agency.
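Turnaround Documents
A turnaround document is produced by a computer, has data added to it by a person, and is then
read back into the computer for processing. A common example is a utility bill whose payment
slip is returned with the payment and read by machine.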
Forms of document:
1. Hard Copy
A hard copy is a printed copy of information from a computer. Sometimes referred to as
a printout, a hard copy is so-called because it exists as a physical object.
2. Soft Copy
A soft copy is a document saved on a computer. It is the electronic version of a document, which
can be opened and edited using a software program. The term soft copy is most often used in
contrast to hard copy, which is the printed version of a document.
Information received from either electronic or manual sources must be evaluated and verified for
its authenticity, currency, relevance, and bias, which together determine its quality. How much
confidence you place in an information source, based on its credibility, will ultimately
determine whether you use information from that source in decision making.
When determining which information sources should be used and whether the information found
is appropriate to use, it is important to consider:
task requirements
quality of the content
Various criteria can be used to evaluate the quality of what has been found. The following are
some of the characteristics of information sources:
Reliability – the consistency with which the information is accurate. Information that is
consistently correct is reliable.
Comprehensibility – the extent to which the information can be understood and made useful.
Comprehensible information is information that the user can make sense of.
Timeliness – the availability of information to users in time to make relevant decisions. Data and
information have a lifespan during which they are useful. At the end of the lifespan, the
information is no longer useful in decision making. The lifespan covers the generation (or
collection) of data, the transformation of data into information, and the reporting of information
to the user in time for appropriate decision making.
Security – the accessibility of data by authorized users and the prevention of any unauthorized
users from accessing the data. The security of information determines its availability to users for
problem solving and decision making.
Confidentiality – the availability of data only to a very restricted set of users. It must not be
viewed by anyone for whom it is not intended. The data must be secure to ensure its
confidentiality.
Value – the usefulness of information to facilitate problem solving and decision making and to
enable the organization to gain an advantage over its competitors. Because information has value,
it can be sold and shared. It may be presented on different media and in a variety of formats.
Distortion – the presentation of data in a way that induces a particular interpretation. Data may
also be disguised in order to discourage certain interpretations.
Before we examine the different methods of verification and validation of data, we need to
examine some errors that may occur during the entry of data into a computer system or the
sending of data.
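Transmission Errors
These are errors that occur when data is corrupted while it is being sent from one device to
another, so that the data received is not the same as the data sent.
Example: yyyyyoooo is transmitted as x&*`^$yyoo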
Typographical Errors
These are errors typically made by humans when typing data. They are accidental errors (errors
that are not made on purpose), for example typing in the wrong date of birth.
Example 2 (typographical error):
I forgot my password to myf acebook account.
Transposition Errors
These are errors made when numbers or characters are placed in the wrong order. For example,
the date of birth of someone born on the 12th of September 1998 might be typed as 09/12/98
instead of 12/09/98.
Example 2 (transposition error):
I swa my favourite movie today.
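To make the idea of a transposition concrete, the sketch below checks whether a typed word differs from the intended word only by two neighbouring characters being swapped, as in "swa" for "saw". It is an illustration only, with a hypothetical function name, and it does not recognise swapped fields such as the day and month in the date example.

```python
def is_adjacent_transposition(intended: str, typed: str) -> bool:
    """True if `typed` equals `intended` with exactly one pair of neighbouring characters swapped."""
    if len(intended) != len(typed) or intended == typed:
        return False
    # Positions where the two strings disagree.
    diffs = [i for i, (a, b) in enumerate(zip(intended, typed)) if a != b]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1  # the two differences are next to each other
        and intended[diffs[0]] == typed[diffs[1]]  # and the characters have swapped places
        and intended[diffs[1]] == typed[diffs[0]]
    )


print(is_adjacent_transposition("saw", "swa"))
print(is_adjacent_transposition("saw", "sew"))
```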
Some errors are deliberate: they are made by humans intentionally, for personal gain or simply to
cause disruption. For example, someone may falsify a document to gain acceptance into an
institution or to obtain a scholarship.
There are two ways of preventing errors made by humans and they are data verification and
data validation. Data verification is a process carried out by humans, whereas data validation is
an automatic process carried out by software.
Data Verification
The errors examined above show the need for data verification.
Data verification is the process of checking for errors that may have been introduced when data
was entered into the computer from a source document or copied from one medium or device to
another. Two methods of data verification are double entry and proofreading/visual checks.
The double entry method is the process of entering the same data twice, using a program that
checks the second entry against the first. If the two entries do not match, the data is not
processed and the system allows the data to be re-entered, to ensure that the data entered is
accurate. An example of this process is when you are required to enter your password twice to
confirm it when setting up an email account.
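A minimal sketch of double entry, assuming the user's keystrokes arrive as a list of strings read in pairs (first entry, confirming entry). The function name and the sample entries are hypothetical.

```python
from typing import Iterable, Optional


def double_entry(entries: Iterable[str]) -> Optional[str]:
    """Accept a value only when a second entry matches the first.

    `entries` stands in for what the user types, read in pairs
    (first entry, confirming entry).
    """
    it = iter(entries)
    for first in it:
        second = next(it, None)
        if second is None:
            return None  # no confirming entry was supplied
        if first == second:
            return first  # both entries agree: accept for processing
        print("Entries do not match; please re-enter.")
    return None  # ran out of entries without a match


typed = ["pass123", "pas123", "pass123", "pass123"]
print(double_entry(typed))
```

The first pair is rejected because of the mistyped confirmation; the second pair matches, so that value is accepted.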
Proofreading, on the other hand, checks the data entered against the data on the original source
document. This method can be time consuming, as the user must read the information on the
source document and check it against what was entered in the system.
Visual checks use on-screen prompts. When a set of data is entered, it is redisplayed on the
screen and the user is prompted to read it and confirm that it is correct. If the data is
incorrect, it is re-entered.
Data Validation
Data validation employs several ways of checking the accuracy and completeness of data.
Let us examine the different methods you can use to validate data.
Range checks
A range check ensures that the data entered is within a particular range. For example, a number
representing a month of the year must be between 1 and 12, and a number of hours in a day must
not exceed 24.
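A range check reduces to comparing a value against a lower and an upper limit. A minimal sketch; the 1 to 12 limits come from the month example, while the lower limit of 0 for hours is an assumption.

```python
def range_check(value: int, low: int, high: int) -> bool:
    """Return True if value lies within the inclusive range low..high."""
    return low <= value <= high


print(range_check(7, 1, 12))   # month number within range
print(range_check(13, 1, 12))  # month number out of range
print(range_check(25, 0, 24))  # hours in a day out of range
```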
Reasonableness checks
Reasonableness checks ensure that data is reasonable, that is, that the data entered is realistic.
For example, when a student is enrolled in first form, the age that the computer calculates from
his or her date of birth should be realistic for that form (about 11). A date of birth that makes
a first-form student seventeen years old would therefore be rejected as unreasonable.
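A minimal sketch of this reasonableness check: the age is calculated from the date of birth and compared with a realistic band for first form. The 10 to 13 band and the enrolment date are assumptions for illustration.

```python
from datetime import date


def age_on(dob: date, on: date) -> int:
    """Age in whole years on a given date."""
    had_birthday = (on.month, on.day) >= (dob.month, dob.day)
    return on.year - dob.year - (0 if had_birthday else 1)


def is_reasonable_first_form_age(dob: date, enrolled: date, low: int = 10, high: int = 13) -> bool:
    """Reasonableness check: is the calculated age realistic for first form?

    The 10 to 13 limits are assumptions for illustration.
    """
    return low <= age_on(dob, enrolled) <= high


print(is_reasonable_first_form_age(date(2012, 1, 15), date(2023, 9, 1)))  # age 11
print(is_reasonable_first_form_age(date(2006, 1, 15), date(2023, 9, 1)))  # age 17
```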
Inconsistency checks
Consistency/inconsistency checks compare data you have entered against other data you have
entered. If you enter a person’s year of birth and their age in separate fields, a consistency check
will ensure that the two fields correspond with each other. If the age of an individual does not
agree with his/her date of birth, the data is considered inconsistent.
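A consistency check compares two fields that should agree. A minimal sketch, assuming a record with hypothetical `date_of_birth` and `age` fields and a fixed "today" date so the result is repeatable:

```python
from datetime import date


def age_on(dob: date, on: date) -> int:
    """Age in whole years on a given date."""
    had_birthday = (on.month, on.day) >= (dob.month, dob.day)
    return on.year - dob.year - (0 if had_birthday else 1)


def is_consistent(dob: date, stated_age: int, today: date) -> bool:
    """Consistency check: does the age field agree with the date-of-birth field?"""
    return age_on(dob, today) == stated_age


record = {"date_of_birth": date(2008, 5, 20), "age": 14}  # hypothetical record
print(is_consistent(record["date_of_birth"], record["age"], date(2024, 1, 10)))
```

On 10 January 2024 someone born on 20 May 2008 is 15, so a stated age of 14 is flagged as inconsistent.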
Presence checks
This check ensures that required data is always present. For example, suppose a database stores
information about a set of employees and each employee must have an ID number. A presence
check will ensure that the ID field is not left blank. On the other hand, some fields in a
database may be optional. For example, not everybody has a house number, even if they have a
cell phone, so the field which stores a customer's house number may be left blank.
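A minimal sketch of a presence check, assuming records are dictionaries with hypothetical field names such as `id` and `house_number`. Only the fields listed as required are checked, so optional fields may be blank.

```python
from typing import Dict, List, Optional


def presence_check(record: Dict[str, Optional[str]], required: List[str]) -> List[str]:
    """Return the required fields that are missing or blank."""
    return [
        field for field in required
        if record.get(field) is None or str(record[field]).strip() == ""
    ]


# Hypothetical employee record: the ID is blank, the house number is optional.
employee = {"id": "", "name": "A. Brown", "house_number": ""}
print(presence_check(employee, ["id"]))
```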
Format checks
A format check is a validation check which ensures that entered data is in a particular format.
The format that the data must follow is specified using an input mask. The input mask is made up
of special characters which indicate what kind of character may be typed in each position.
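A minimal sketch of a format check against an input mask. The mask language here is a simplified, hypothetical one: "0" requires a digit, "L" requires a letter, and any other character must appear exactly as written. The sample mask "LL-0000" is also hypothetical.

```python
def matches_mask(value: str, mask: str) -> bool:
    """Format check against a simplified, hypothetical input mask.

    Mask characters: '0' = a digit is required, 'L' = a letter is required,
    any other character must appear exactly as written.
    """
    if len(value) != len(mask):
        return False
    for ch, m in zip(value, mask):
        if m == "0":
            if ch not in "0123456789":
                return False
        elif m == "L":
            if not (ch.isascii() and ch.isalpha()):
                return False
        elif ch != m:
            return False
    return True


print(matches_mask("AB-1234", "LL-0000"))
print(matches_mask("A1-1234", "LL-0000"))
```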
Length checks
Most databases automatically perform length checks on entered data. A length check is a
validation check which ensures that the data entered is no longer than a specified maximum
number of characters.
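A minimal sketch of a length check over several fields. The field names and their maximum lengths are hypothetical; a missing field is treated as an empty string.

```python
from typing import Dict, List

# Hypothetical field limits for illustration.
MAX_LENGTHS = {"first_name": 15, "phone": 10}


def length_check(record: Dict[str, str], max_lengths: Dict[str, int]) -> List[str]:
    """Return the fields whose values are longer than their maximum length."""
    return [
        field for field, limit in max_lengths.items()
        if len(record.get(field, "")) > limit
    ]


print(length_check({"first_name": "Alexandrianna-Marie", "phone": "8765550101"}, MAX_LENGTHS))
```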
File Organization and Access
File organization and access relate to the use of records, fields, and files stored in a database.
You would have been exposed to all three terms when you studied the database productivity tool in
class. File organization and access cover two things:
The manner in which the records of a file are organized on a secondary storage device
(file organization)
The manner in which records are accessed (file access)
Sequential file organization
This is where records are stored in a logical or sorted order. Records can be arranged in
ascending order according to name, date, size, or any other field.
Example of record keys stored sequentially: 10 20 30 40 50
Serial file organization
This is similar to sequential file organization, except that the records are not stored in any
particular order (unordered). They are simply stored one after the other as they are added,
similar to items on a shopping list, where you add what you need as you go along. This type of
organization is often used to capture transactions as they occur during the day.
Example of record keys stored serially: 50 20 44 60 15
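The difference between the two organizations can be sketched by adding the same records both ways. This assumes, for illustration, that a Python list stands in for the file on secondary storage and that each record is represented by its key; `bisect.insort` keeps the sequential list sorted.

```python
import bisect
from typing import List


def add_serial(file_keys: List[int], key: int) -> None:
    """Serial organization: append each record in the order it arrives."""
    file_keys.append(key)


def add_sequential(file_keys: List[int], key: int) -> None:
    """Sequential organization: insert each record so keys stay in ascending order."""
    bisect.insort(file_keys, key)


serial_file: List[int] = []
sequential_file: List[int] = []
for key in [50, 20, 44, 60, 15]:  # record keys arriving during the day
    add_serial(serial_file, key)
    add_sequential(sequential_file, key)

print(serial_file)
print(sequential_file)
```

The serial file keeps the arrival order 50 20 44 60 15, while the sequential file holds the same keys as 15 20 44 50 60.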
Sequential and serial access
Sequential access means that records are accessed one by one, in the order in which they are
stored, until the right one is located. Serial access is similar, as the records are accessed in
the same order in which they were stored.
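A minimal sketch of sequential access: the search starts at the first record and reads each one in turn, counting how many records had to be read. The list of keys stands in for the stored file (an assumption for illustration).

```python
from typing import List, Optional, Tuple


def sequential_search(keys: List[int], target: int) -> Tuple[Optional[int], int]:
    """Read records one by one from the start of the file.

    Returns (position of the record or None, number of records read).
    """
    reads = 0
    for position, key in enumerate(keys):
        reads += 1
        if key == target:
            return position, reads
    return None, reads


print(sequential_search([50, 20, 44, 60, 15], 60))
```

Finding key 60 takes four reads because the three records stored before it must be read first.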
Random (direct) access
Records are stored in any order and located using a key. The file is organized like a
one-dimensional array, where each array element has an index (subscript) that marks its location.
Random access, or direct access, lets you go straight to the record you want without having to
go through any others, unlike sequential access. The computer locates the data item using its index.
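The one-dimensional array analogy can be sketched directly. This is a hypothetical class that assumes keys are whole numbers from 0 to size - 1, so the key itself serves as the index; real systems usually need a way to turn other keys into an index.

```python
from typing import List, Optional


class DirectAccessFile:
    """Hypothetical direct-access file: the record with key k lives in slot k.

    Keys are assumed to be whole numbers from 0 to size - 1.
    """

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[str]] = [None] * size

    def write(self, key: int, record: str) -> None:
        self.slots[key] = record  # store the record at its index

    def read(self, key: int) -> Optional[str]:
        # Jump straight to the slot; no other records are examined.
        return self.slots[key]


staff = DirectAccessFile(100)
staff.write(44, "Jones")
staff.write(15, "Brown")
print(staff.read(44))
```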
By using both methods (sequential and direct), you can go through each record sequentially (one
after another), and you can also access a specific record directly, without going through any
previous records (random access).
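A minimal sketch of a file that supports both methods. It assumes records are (key, data) pairs kept sorted by key for sequential reading, with a dictionary mapping each key to its position for direct reading; the class name and sample records are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class IndexedFile:
    """Sketch of a file supporting both sequential and direct access.

    Records are kept sorted by key for sequential reading, and a dictionary
    maps each key to its position for direct reading (an assumption made
    for illustration).
    """

    def __init__(self, records: List[Tuple[int, str]]) -> None:
        self.records = sorted(records, key=lambda r: r[0])
        self.index: Dict[int, int] = {key: pos for pos, (key, _) in enumerate(self.records)}

    def read_all(self) -> List[str]:
        """Sequential access: every record, one after another, in key order."""
        return [data for _, data in self.records]

    def read(self, key: int) -> Optional[str]:
        """Direct access: look up the position, then go straight to it."""
        pos = self.index.get(key)
        return None if pos is None else self.records[pos][1]


f = IndexedFile([(50, "E"), (20, "B"), (44, "C"), (60, "F"), (15, "A")])
print(f.read_all())
print(f.read(44))
```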