Data Processing

Uploaded by

Enia 14
Copyright
© © All Rights Reserved

DATA PROCESSING

 Data refers to the raw facts that do not have much meaning to the user and may
include numbers, letters, symbols, sound or images.
 Information refers to the meaningful output obtained after processing the data.
 Data processing therefore refers to the process of transforming raw data into
meaningful output i.e. information.
 Data processing can be done manually using pen and paper, mechanically using
simple devices like typewriters, or electronically using modern data processing tools
such as computers.
Data processing cycle
o It refers to the sequence of activities involved in transforming data from its
raw form into information. It is often referred to as a cycle because the output
obtained can be stored after processing and may be used in future as input.
o The four main stages of the data processing cycle are:
 Data collection
 Data input
 Data processing
 Data output
Data collection
 Also referred to as data gathering or fact finding, it involves looking for crucial facts
needed for processing.
Methods of data collection
 Methods include interviews, questionnaires and observation. In most cases the data is
collected after sampling.
 Sampling is the process of selecting representative elements (e.g. people,
organizations) from an entire group (population) of interest. Some of the tools that
help in data collection include source documents such as forms and data capture
devices such as digital cameras.
Stages of data collection
 The process of data collection may involve a number of stages depending on the
method used. These include:
o Data creation: this is the process of identifying facts and putting them together
in an organized format. The data may be in the form of a manually prepared
document, or it may be captured from the source using a data capture device such as a
barcode reader so that it can be input easily into a computer.
o Data preparation: this is the transcription (conversion) of data from a source
document to machine-readable form. This may not be the case for all input
devices. Data collected using devices that directly capture data in digital form
does not require transcription.
o Data transmission: this stage applies only where data needs to be transmitted
via communication media to the central office.
Data input
o Refers to the process where the collected data is converted from human-readable
form to machine-readable form (binary form). The conversion takes
place in the input device.
o Media conversion: data may need to be transferred from one medium to
another, e.g. from a floppy disk to a computer’s hard disk, for faster input.
o Input validation: data entered into the computer is subjected to validity
checks by a computer program before being processed, to reduce errors at
the input stage.
o Sorting: in case the data needs to be arranged in a predefined order, it is first
sorted before processing.
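The input validation and sorting steps above can be sketched in Python. This is only a minimal illustration under stated assumptions: the function names `is_valid_mark` and `prepare_input`, and the rule that a valid entry is a whole number from 0 to 100, are invented for the example and do not come from these notes.

```python
def is_valid_mark(text):
    """Return True if text is a whole number in the assumed range 0..100."""
    text = text.strip()
    return text.isdecimal() and 0 <= int(text) <= 100


def prepare_input(raw_values):
    """Validate raw entries, then sort the accepted ones before processing."""
    accepted = [int(v) for v in raw_values if is_valid_mark(v)]
    rejected = [v for v in raw_values if not is_valid_mark(v)]
    return sorted(accepted), rejected


accepted, rejected = prepare_input(["72", "S89", "150", " 45 ", "9"])
print(accepted)  # [9, 45, 72]
print(rejected)  # ['S89', '150']
```

Rejected entries are returned rather than silently dropped, so that they can be corrected and re-entered.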
Processing
o This is the transformation of the input data by the CPU into a more meaningful
output (information). Some of the operations performed on the data include
calculations, comparing values and sorting.
Output
o The final activity in the data processing cycle is producing the desired output,
also referred to as information. This information can be distributed to the
target group or stored for future use. Distribution is making information
available to those who need it and is sometimes called information
dissemination. Dissemination may involve electronic
presentation over the radio or television, distribution of hard copies,
broadcasting messages over the internet or mobile phones, etc.
Description of errors in data processing

Computational errors
Occur when an arithmetic operation does not produce the expected result. The most
common computational errors include overflow, truncation and rounding errors.

Overflow errors
Occur if the result of a calculation is too large to be stored in the allocated memory
space. For example, if a value is stored in one byte (8 bits), an overflow will occur if the result of
a calculation gives a 9-bit number.
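A short Python sketch may make the 8-bit case concrete. Python integers do not overflow on their own, so the 8-bit limit is simulated here with a bit mask; the function name `add_8bit` is an assumption made for this example.

```python
BITS = 8
MAX_VALUE = 2**BITS - 1  # 255, the largest unsigned 8-bit value


def add_8bit(a, b):
    """Add two values the way an unsigned 8-bit store would.

    Returns (stored_result, overflowed).
    """
    true_sum = a + b
    overflowed = true_sum > MAX_VALUE
    stored = true_sum & MAX_VALUE  # keep only the lowest 8 bits
    return stored, overflowed


print(add_8bit(100, 100))  # (200, False)
print(add_8bit(200, 100))  # (44, True): 300 needs 9 bits
```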

Truncation errors
Result from having real numbers with a long fractional part which cannot fit in the
allocated memory space. The computer would truncate, or cut off, the extra digits from
the fractional part. For example, a number like 0.784969 can be truncated to three decimal
places to become 0.784.
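As a rough illustration, truncation can be reproduced with Python's standard `decimal` module. The helper name `truncate` is an assumption for this sketch; `ROUND_DOWN` simply discards the extra digits without rounding.

```python
from decimal import Decimal, ROUND_DOWN


def truncate(value, places):
    """Cut off digits beyond `places` decimal places without rounding."""
    step = Decimal(10) ** -places  # e.g. places=3 gives Decimal('0.001')
    return Decimal(value).quantize(step, rounding=ROUND_DOWN)


print(truncate("0.784969", 3))  # 0.784
```

The value is passed as a string so that the decimal digits are taken exactly as written.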

The accuracy of the computer output is critical. As the saying goes, garbage in, garbage out
(GIGO): the accuracy of the data entered in the computer directly determines the accuracy
of the information given out.
Some of the errors that influence the accuracy of data input and information output
include:
 Transcription errors,
 Computational errors and
 Algorithm or logical errors.
Transcription errors
Occur during data entry. Such errors include misreading and transposition errors.
Misreading errors
Are brought about by the user reading the source incorrectly and hence entering
wrong values. For example, a user may misread a handwritten figure such as 589 and type
S89 instead, i.e. confusing 5 with S.

Transposition errors
Result from incorrect arrangement of characters, i.e. putting characters in the wrong order.
For example, the user might enter 396 instead of 369.
Transcription errors may be avoided by using modern capture devices such as bar code readers
and digital cameras, which enter data with minimum user intervention.
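Where data is keyed in twice for verification, a transposition can be recognized by comparing the two entries. The sketch below is an illustration only; the function name `is_adjacent_transposition` is an assumption, and the check covers just the simple case of two neighbouring characters being swapped.

```python
def is_adjacent_transposition(expected, entered):
    """Return True if `entered` is `expected` with two neighbouring characters swapped."""
    if len(expected) != len(entered) or expected == entered:
        return False
    # Positions where the two entries disagree.
    diffs = [i for i in range(len(expected)) if expected[i] != entered[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and expected[diffs[0]] == entered[diffs[1]]
        and expected[diffs[1]] == entered[diffs[0]]
    )


print(is_adjacent_transposition("369", "396"))  # True: 6 and 9 swapped
print(is_adjacent_transposition("589", "S89"))  # False: a misreading, not a swap
```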

Rounding errors
Result from raising or lowering a digit in a real number to the required rounded number.
For example, to round off 30.666 to one decimal place, we raise the first digit after the
decimal point if its successor is more than or equal to five. In this case the successor is 6,
therefore 30.666 rounded to one decimal place is 30.7. If the successor is below
five, e.g. 30.635, we round the number down to 30.6.
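The rounding rule described above (raise the digit when its successor is five or more) corresponds to the `ROUND_HALF_UP` mode of Python's `decimal` module. The helper name `round_half_up` is an assumption for this sketch. Note that Python's built-in `round()` follows a different rule for exact halves (it rounds to the nearest even digit), so it is not used here.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places):
    """Round to `places` decimal places, raising the digit when the next one is >= 5."""
    step = Decimal(10) ** -places
    return Decimal(value).quantize(step, rounding=ROUND_HALF_UP)


print(round_half_up("30.666", 1))  # 30.7
print(round_half_up("30.635", 1))  # 30.6
```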

Algorithm or logical errors


An algorithm is a set of procedural steps followed to solve a given problem. Algorithms are
used as design tools when writing programs. A wrongly designed algorithm would result in a
program that runs but gives erroneous output. Such errors that result from wrong algorithm
design are referred to as algorithm or logical errors.
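A small, made-up Python example of a logical error: both functions below run without any error message, but only the second follows a correct algorithm for an average. The function names and the marks are hypothetical.

```python
def average_buggy(marks):
    # Logical error: divides by a fixed 2 instead of the number of marks.
    return sum(marks) / 2


def average_fixed(marks):
    # Correct algorithm: divide the total by how many marks there are.
    return sum(marks) / len(marks)


marks = [60, 70, 80]
print(average_buggy(marks))  # 105.0 (runs, but the answer is wrong)
print(average_fixed(marks))  # 70.0
```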

Data integrity
Data integrity refers to the accuracy and completeness of data entered in a computer or
received from the information system. Integrity is measured in terms of accuracy,
timeliness and relevance of data.

Accuracy
It refers to how close an approximation is to an actual value. As long as the correct
instructions and data are entered, computers produce accurate results efficiently. In
numbers, the accuracy of a real number depends on the number of digits it is expressed
to. For example, 72.1264 is more accurate than 72.13, which has been rounded to two
decimal places.
Timeliness
This is the relative accuracy of data in respect to the current state of affairs for which it is
needed.
This is important because data and information have a time value attached to them. If
received late, the information may have become useless to the user. For example,
information in the newspaper that is meant to invite people for a meeting or occasion must
be printed prior to the event and not later.
Relevance
Data entered into the computer must be relevant so as to get the expected output. In this
case, relevance means that the data entered must be pertinent to the processing needs at
hand and must meet the requirements of the processing cycle. The user also needs relevant
information for daily operations or decision making.

Threats to data integrity

Threats to data integrity can be minimized in the following ways:
 Backing up data, preferably on external storage media.
 Controlling access to data by enforcing security measures.
 Designing user interfaces that minimize chances of invalid data entry.
 Using error detection and correction software when transmitting data.
 Using devices that directly capture data from the source such as bar code readers,
digital cameras, and optical scanners.

Data processing methods


As mentioned earlier, data can be processed manually, mechanically and electronically.
Manual data processing
 In manual data processing, most tasks are done manually with a pen and paper.
For example, in a busy office, incoming tasks (input) are stacked in the “in tray” and
completed tasks are placed in the “out tray” (output). The processing of each task
involves a person using the brain in order to respond to queries.
 The processed information from the out tray is then distributed to the people who
need it or stored in a file cabinet.
Mechanical data processing
 Manual processing is cumbersome and boring, especially for repetitive tasks. Mechanical
devices were developed to help in the automation of manual tasks. Examples of mechanical
devices include the typewriter, printing press, and weaving loom. Initially, these
devices did not have electronic intelligence.
Electronic data processing
 For a long time, scientists have researched how to develop machines or devices
that would simulate some form of human intelligence during data and information
processing. This was made possible to some extent with the development of
electronic programmable devices such as computers.
 The advent of microprocessor technology has greatly enhanced data processing
efficiency and capability. Some of the microprocessor-controlled devices include
computers, cellular (mobile) phones, calculators, fuel pumps, modern television sets,
washing machines, etc.

Computer files
A file can be defined as a collection of related records that give a complete set of
information about a certain item or entity. A file can be stored manually in a file cabinet or
electronically in computer storage devices.
Computerized storage offers a much better way of holding information than the manual
filing system which heavily relies on the concept of the file cabinet.
Some of the advantages of computerized filing system include:
1. Information takes up much less space than in a manual filing system.
2. It is much easier to update or modify information.
3. It offers faster access and retrieval of data.
4. It enhances data integrity and reduces duplication.
5. It enhances security of data if proper care is taken to secure it.

Elements of computer file


A computer file is made up of three elements: characters, fields and records.
Characters
 A character is the smallest element in a computer file and refers to a letter, number or
symbol that can be entered, stored and output by a computer. A character is made
up of seven or eight bits depending on the character coding scheme used.
Field
 A field is a single character or collection of characters that represents a single piece
of data. For example, the student’s admission number is an example of a field.

Records
 A record is a collection of related fields that represents a single entity, e.g. in a
class score sheet, the details of each student in a row, such as admission number, name,
total marks and position, make up a record.
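The relationship between characters, fields, records and a file can be sketched with a Python dataclass. The class name `StudentRecord` and the sample values are hypothetical and used only for illustration; the field names follow the score sheet example above.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    """One record: a collection of related fields about a single student."""
    admission_number: str  # a field, itself made up of characters
    name: str
    total_marks: int
    position: int


# A file is a collection of related records (hypothetical sample data).
score_sheet = [
    StudentRecord("A001", "Student One", 412, 2),
    StudentRecord("A002", "Student Two", 430, 1),
]

first = score_sheet[0]
print(first.admission_number)        # A001 (one field)
print(list(first.admission_number))  # ['A', '0', '0', '1'] (its characters)
```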
Logical and physical files
Computer files are classified as either physical or logical.
Logical files
 A computer file is referred to as a logical file if it is viewed in terms of what data items it
contains and details of what processing operations may be performed on the data
items. It does not have implementation-specific information like field data types,
size and file type.
Physical files
 As opposed to a logical file, a physical file is viewed in terms of how data is stored on
a storage medium and how the processing operations are made possible. Physical files
have implementation-specific details such as characters per field and data type for
each field.
Types of Computer Processing Files
There are numerous types of files used for storing data needed for processing, reference or
backup. The most common types of processing files include:
 Master file,
 Transaction file,
 Reference file,
 Backup file,
 Report file and
 Sort file.
Master file
A master file is the main file that contains relatively permanent records about particular items
or entities. For example, a customer file will contain details of a customer such as customer
ID, name and contact address.
Transaction (movement) file
A transaction file is used to hold data during transaction processing. The file is later used to
update the master file and audit daily, weekly or monthly transactions. For example in a
busy supermarket, daily sales are recorded on a transaction file and later used to update the
stock file. The file is also used by the management to check on the daily or periodic
transactions.
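The supermarket example can be sketched in Python as a transaction file being applied to a stock master file. The function name `update_master`, the item codes and the quantities are all hypothetical; a real system would read both files from storage rather than from in-memory values.

```python
def update_master(master, transactions):
    """Return a new stock master with the day's sales applied."""
    updated = dict(master)  # leave the original master untouched
    for item_code, quantity_sold in transactions:
        if item_code not in updated:
            raise KeyError(f"unknown item code: {item_code}")
        updated[item_code] -= quantity_sold
    return updated


stock_master = {"P001": 50, "P002": 20}               # item code -> units in stock
daily_sales = [("P001", 3), ("P002", 5), ("P001", 2)]  # the transaction file

print(update_master(stock_master, daily_sales))  # {'P001': 45, 'P002': 15}
```

Because the transactions are kept as a separate list, the same data can also be reviewed afterwards for auditing.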
Reference file
A reference file is mainly used for reference or look-up purposes. Look-up information is
that information that is stored in a separate file but is required during processing. For
example, in a point of sale terminal, the item code entered either manually or using a
barcode reader looks up the item description and price from a reference file stored on a
storage device.
Backup file
A backup file is used to hold copies (backups) of data or information from the computer’s
fixed storage (hard disk). Since a file held on the hard disk may be corrupted, lost or
changed accidentally, it is necessary to keep copies of the recently updated files. In case of
hard disk failure, a backup file can be used to reconstruct the original file.
Report file
A report file is used to store relatively permanent records extracted from the master file or
generated after processing. For example, you may obtain a stock levels report generated from
an inventory system while a copy of the report is stored in the report file.
Sort file
A sort file stores data which is arranged in a particular order. It is
used mainly where data is to be processed sequentially. In sequential processing, data or
records are first sorted and held on a magnetic tape before updating the master file.
File organization methods
File organization refers to the way data is stored in a file. File organization is very important
because it determines the methods of access, efficiency, flexibility and storage devices to
use. There are four methods of organizing files on a storage medium. These include:
 sequential,
 random,
 serial and
 indexed-sequential
Sequential file organization
 Records are stored and accessed in a particular order sorted using a key field.
 Retrieval requires searching sequentially through the entire file record by record to
the end.
 Because the records in a file are sorted in a particular order, better file searching
methods like the binary search technique can be used to reduce the time used for
searching a file.
 Since the records are sorted, it is possible to know in which half of the file a
particular record being searched for is located. Hence this method repeatedly divides
the set of records in the file into two halves and searches only the half in which the
record is found.
 For example, if the file has records with key fields 20, 30, 40, 50, 60 and the
computer is searching for a record with key field 50, it starts at 40 and searches upwards,
ignoring the first half of the set.
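The binary search described above can be sketched in Python using the same key fields. The function name `binary_search` and the step counter are assumptions added for illustration; the keys are held in a list here rather than in a file.

```python
def binary_search(keys, target):
    """Search sorted `keys` for `target`.

    Returns (position, comparisons); position is -1 if the key is absent.
    """
    low, high = 0, len(keys) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        if keys[mid] == target:
            return mid, comparisons
        if keys[mid] < target:
            low = mid + 1   # ignore the lower half
        else:
            high = mid - 1  # ignore the upper half
    return -1, comparisons


keys = [20, 30, 40, 50, 60]
print(binary_search(keys, 50))  # (3, 2): checks 40 first, then 50
print(binary_search(keys, 35))  # (-1, 3): key not present
```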
Advantages of sequential file organization
 The sorting makes it easy to access records.
 The binary chop (binary search) technique can be used to reduce record search time,
because each comparison rules out half of the remaining records.
Disadvantages of sequential file organization
 The sorting does not remove the need to access other records as the search looks for
particular records.
 Sequential records cannot support modern technologies that require fast access to
stored records.
 The requirement that all records be of the same size is sometimes difficult to
enforce.
Random or direct file organization
 Records are stored randomly but accessed directly.
 To access a record in a randomly organized file, a record key is used to determine where
the record is stored on the storage media.
 Magnetic and optical disks allow data to be stored and accessed randomly.
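One common way of turning a record key into a storage location is hashing. The notes do not say which method is used, so the sketch below is an assumption: it uses the remainder of the key divided by the number of slots, and the names `slot_for`, `store` and `fetch` are invented for the example. Collisions (two keys landing in the same slot) are only detected, not resolved.

```python
NUM_SLOTS = 7  # assumed size of the storage area


def slot_for(key):
    """Work out the storage position directly from the record key."""
    return key % NUM_SLOTS


def store(slots, key, data):
    index = slot_for(key)
    if slots[index] is not None and slots[index][0] != key:
        raise ValueError(f"slot {index} is already used by another key")
    slots[index] = (key, data)


def fetch(slots, key):
    """Go straight to the computed slot; no other record is read."""
    entry = slots[slot_for(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


slots = [None] * NUM_SLOTS
for key, name in [(20, "Amina"), (30, "Brian"), (40, "Chen")]:
    store(slots, key, name)

print(slot_for(30))      # 2
print(fetch(slots, 30))  # Brian
print(fetch(slots, 50))  # None
```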
Advantages of random file access
 Quick retrieval of records.
 The records can be of different sizes.

Serial file organization


 Records in a file are stored and accessed one after another.
 The records are not stored in any particular order on the storage medium. This type of
organization is mainly used on magnetic tapes.
Advantages of serial file organization
 It is simple
 It is cheap
Disadvantages of serial file organization
 It is cumbersome to access because you have to access all preceding records before
retrieving the one being searched for.
 Wastage of space on medium in form of inter-record gap.
 It cannot support modern high speed requirements for quick record access.

Indexed-sequential file organization method


 Similar to the sequential method, except that an index is used to enable the
computer to locate individual records on the storage media. For example, on
a magnetic drum, records are stored sequentially on the tracks. However, each record
is assigned an index that can be used to access it directly.
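A minimal Python sketch of the idea: records are written in key order, and a separate index maps each key to the position of its record so that it can be read directly. The fixed record layout (4 characters for the key, 12 for a name), the function names and the sample data are assumptions for illustration, and an in-memory `io.BytesIO` object stands in for a file on disk.

```python
import io

KEY_WIDTH = 4
NAME_WIDTH = 12
RECORD_SIZE = KEY_WIDTH + NAME_WIDTH  # every record occupies 16 bytes


def build_indexed_file(records):
    """Write records in key order and build an index of key -> byte offset."""
    data = io.BytesIO()  # stands in for a file on disk
    index = {}
    for key, name in sorted(records):
        index[key] = data.tell()
        data.write(f"{key:0{KEY_WIDTH}d}{name:<{NAME_WIDTH}}".encode("ascii"))
    return data, index


def read_direct(data, index, key):
    """Use the index to jump straight to one record."""
    data.seek(index[key])
    raw = data.read(RECORD_SIZE).decode("ascii")
    return int(raw[:KEY_WIDTH]), raw[KEY_WIDTH:].rstrip()


def read_sequential(data):
    """Read every record in stored (key) order."""
    data.seek(0)
    records = []
    while True:
        raw = data.read(RECORD_SIZE)
        if not raw:
            break
        text = raw.decode("ascii")
        records.append((int(text[:KEY_WIDTH]), text[KEY_WIDTH:].rstrip()))
    return records


data, index = build_indexed_file([(40, "Chen"), (20, "Amina"), (30, "Brian")])
print(read_direct(data, index, 30))  # (30, 'Brian')
print(read_sequential(data))         # [(20, 'Amina'), (30, 'Brian'), (40, 'Chen')]
```

The same file therefore supports both kinds of access: in order from the start, or directly through the index.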
Electronic data processing methods
There are several ways in which a computer, under the influence of an operating system, is
designed to process data.
Examples of processing modes are:
Online processing
 In online data processing, data is processed as soon as it is received. The computer
is connected directly to the data input unit via a communication link. The data input
unit may be a network terminal or an online input device attached to the computer.
Real-time processing
 The computer processes the incoming data as soon as it occurs, updates the transaction
file and gives an immediate response that would affect the events as they happen.
 This is different from online in that for the latter an immediate response may not be
required.
 The main purpose of a real-time processing is to provide accurate, up-to-date
information hence better services based on a true (real) situation.
 An example of real-time processing is making a reservation for airline seats. A
customer may request airline booking information through a remote terminal
and the requested information will be given out almost immediately by the reservation
system. If a booking is made, the system immediately updates the reservation file to
avoid double booking and sends the response back to the customer immediately.

Distributed data processing

 Distributed data processing refers to dividing (distributing)
processing tasks among two or more computers that are located on physically separate
sites but connected by data transmission media.
 For example, a distributed database will have different tables of the
same database residing on separate computers and processed there as the need arises.
 This distribution of processing power increases efficiency and speed of processing. An
example is in the banking industry, where customers’ accounts are operated on servers in
the branches but all the branch accounts can be administered centrally from the main
server as if they resided on it. In this case, we say that the distributed database is
transparent to the user because the distribution is hidden from the user’s point of view.
Users of the distributed database are unaware of the distribution and interact with
the database as if all of it were on their own computer.

Multiprogramming
 Multiprogramming, also referred to as multitasking, refers to a type of processing
where more than one program is executed apparently at the same time by a single
central processing unit.

Time sharing
In time sharing processing, many terminals connected to a central computer are given
access to the central processing unit apparently at the same time. In reality, however,
each user is allocated a time slice of the CPU in sequence. The amount of time allocated to
each user is controlled by a multi-user operating system. If a user’s task is not completed
during the allocated time slice, he/she is allocated another time slice later in a round-robin
fashion.
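The round-robin allocation of time slices can be simulated with a short Python sketch. The function name `round_robin`, the user names and the amounts of work are hypothetical; a real multi-user operating system does this scheduling internally.

```python
from collections import deque


def round_robin(tasks, time_slice):
    """Simulate time sharing.

    `tasks` maps each user to the CPU time their task still needs.
    Returns the order in which (user, time_used) slices were granted.
    """
    queue = deque(tasks.items())
    schedule = []
    while queue:
        user, remaining = queue.popleft()
        used = min(time_slice, remaining)
        schedule.append((user, used))
        if remaining > used:
            # Task not finished: rejoin the back of the queue for another slice.
            queue.append((user, remaining - used))
    return schedule


print(round_robin({"A": 5, "B": 2, "C": 4}, time_slice=2))
# [('A', 2), ('B', 2), ('C', 2), ('A', 2), ('C', 2), ('A', 1)]
```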

Batch processing
Data is accumulated as a group (batch) over a specified period of time e.g. daily, weekly or
monthly. The batch is then processed at once.
For example, in a payroll processing system, employees’ details concerning the number of
hours worked, rate of pay, and other details are collected for a period of time, say one
month. These details are then used to process the payment for the duration worked. Most
printing systems use batch processing to print documents.
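The payroll example can be sketched in Python: details are first accumulated into a batch and nothing is calculated until the whole batch is processed in one run. The function name `process_payroll_batch`, the employee names and the figures are hypothetical.

```python
def process_payroll_batch(batch):
    """Process all accumulated timesheets in a single run.

    Each entry is (employee, hours_worked, rate_of_pay).
    """
    return {employee: hours * rate for employee, hours, rate in batch}


# Details collected over the month; no pay is calculated while collecting.
month_batch = []
month_batch.append(("Employee A", 160, 12))
month_batch.append(("Employee B", 120, 15))

# At the end of the period the whole batch is processed at once.
print(process_payroll_batch(month_batch))
# {'Employee A': 1920, 'Employee B': 1800}
```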

Multiprocessing
Refers to the processing of more than one task at the same time on different processors of
the same computer. This is possible in computers such as mainframes and network servers.
In such systems a computer may contain more than one independent central processing
unit, and these work together in a coordinated way.
At a given time, the processors may execute instructions from two or more different
programs or from different parts of one program simultaneously.
This coordination is made possible by a multiprocessing operating system that enables
different processors to operate together and share the same memory.
