
DATA PROCESSING

Compiled by
Professor Olukunmi 'Lanre OLAITAN

D.PHE. (Lag); B.Sc.(Ed.); M.Ed.; Ph.D. (Ilorin); RN (NMCN.); FICC; FNRCS; MARHP (Minn, USA); MUNSCN (Geneva); MTRCN (Nig.); ECD; EONC; YSRH; FGM/C; GSRH (USAID/JHU, MD, USA); CIH (UK); CE&IE (Finland)

(Professor, Human Sexuality, Reproductive & Family Health Specialist, including Human Fertility, HIV/AIDS, Public Health Education & Promotion, Nutrition Education, Nursing, Women and Adolescent Health, Wellness & Fitness Health Consultant)
University of Ilorin, Ilorin, Nigeria
Email: [email protected] OR: [email protected] Tel: +234(0)8034228042

Introduction

What is Data?
Data is raw, unorganized fact that must be processed to become meaningful. Data can be simple and remains unorganized until it is organized. Generally, data comprises facts, observations, perceptions, numbers, characters, symbols, images, etc.
Data is an individual unit of raw material that does not carry any specific meaning. Information is a group of data that collectively carries a logical meaning.
Data is the name given to basic facts and entities like names and numbers. Five examples of data include:

 weights
 prices and costs
 numbers of items sold
 employee names
 product names.

Differences between data and information


Data must be interpreted, by a human or a machine, before it carries any meaning; on its own, data is meaningless. Data consists of numbers, statements, and characters in raw form.

Information, by contrast, is a set of data processed in a meaningful way according to a given requirement. Information is processed, structured, or presented in a given context to make it meaningful and useful.

It is processed data that possesses context, relevance, and purpose; producing it involves the manipulation of raw data.

Information assigns meaning and improves the reliability of the data. It reduces uncertainty, so when data is transformed into information, useless details are stripped away.

Examples of systems that turn data into information include:

 transaction processing systems


 decision support systems
 knowledge management systems
 learning management systems
 database management systems

KEY DIFFERENCE

1. Data is a raw and unorganized fact that is required to be processed to make it meaningful
whereas Information is a set of data that is processed in a meaningful way according to
the given requirement.
2. Data does not have any specific purpose whereas Information carries a meaning that has
been assigned by interpreting data.
3. Data alone has no significance while Information is significant by itself.
4. Data never depends on Information while Information is dependent on Data.

5. Data is measured in bits and bytes; Information, on the other hand, is measured in
meaningful units like time, quantity, etc.
6. Data can be structured, tabular data, graph, data tree whereas Information is language,
ideas, and thoughts based on the given data.

Data Vs. Information


Description
Data: Qualitative or quantitative variables which help to develop ideas or conclusions.
Information: A group of data which carries news and meaning.

Etymology
Data: "Data" comes from the Latin word datum, which means "to give something"; over time, "data" has become the plural of datum.
Information: The word "information" has Old French and Middle English origins, where it referred to the "act of informing". It is mostly used for education or other known communication.

Format
Data: Data is in the form of numbers, letters, or a set of characters.
Information: Ideas and inferences.

Represented in
Data: It can be structured, tabular data, graphs, data trees, etc.
Information: Language, ideas, and thoughts based on the given data.

Meaning
Data: Data does not have any specific purpose.
Information: It carries meaning that has been assigned by interpreting data.

Interrelation
Data: Information that is collected.
Information: Information that is processed.

Feature
Data: Data is a single unit and is raw; it alone does not have any meaning.
Information: Information is the product, a group of data which jointly carries a logical meaning.

Dependence
Data: It never depends on information.
Information: It depends on data.

Measuring unit
Data: Measured in bits and bytes.
Information: Measured in meaningful units like time, quantity, etc.

Support for decision making
Data: It cannot be used for decision making.
Information: It is widely used for decision making.

Contains
Data: Unprocessed raw factors.
Information: Processed in a meaningful way.

Knowledge level
Data: It is low-level knowledge.
Information: It is the second level of knowledge.

Characteristic
Data: Data is the property of an organization and is not available for sale to the public.
Information: Information is available for sale to the public.

Dependency
Data: Data depends upon the sources used to collect it.
Information: Information depends upon data.

Example
Data: Ticket sales of a band on tour.
Information: A sales report by region and venue; it tells which venue is profitable for that business.

Significance
Data: Data alone has no significance.
Information: Information is significant by itself.

Reliability
Data: Data is based on records and observations, which are stored in computers or remembered by a person.
Information: Information is considered more reliable than data; it helps the researcher to conduct a proper analysis.

Usefulness
Data: The data collected by the researcher may or may not be useful.
Information: Information is useful and valuable, as it is readily available to the researcher for use.

Specificity
Data: Data is never designed to the specific need of the user.
Information: Information is always specific to the requirements and expectations, because all the irrelevant facts and figures are removed during the transformation process.

DIKW (Data Information Knowledge Wisdom)


DIKW is the model used for discussion of data, information, knowledge, wisdom and their
interrelationships. It represents structural or functional relationships between data, information,
knowledge, and wisdom.

Example: a reading of 40°C is data; "the temperature in Ilorin today is 40°C" is information; "40°C is unusually hot for this season" is knowledge; and deciding to issue a heat advisory is wisdom.

Types of Data in Statistics


There are different types of data in Statistics that are collected, analysed, interpreted and
presented. The data are the individual pieces of factual information recorded, and it is used for
the purpose of the analysis process. The two processes of data analysis are interpretation and
presentation. Statistics are the result of data analysis. Data classification and data handling are
important processes, as they involve a multitude of tags and labels that define the data, its
integrity and confidentiality. Here, we discuss the different types of data in statistics in detail.

Data is classified into four major categories:

 Nominal data
 Ordinal data
 Discrete data
 Continuous data
Further, these four types fall under two broad classes: nominal and ordinal data are qualitative (categorical), while discrete and continuous data are quantitative (numerical).

Let us discuss each of these types of data, with examples.

Qualitative or Categorical Data


Qualitative data, also known as categorical data, describes data that fits into categories. Qualitative data are not numerical. Categorical information involves categorical variables that describe features such as a person's gender, home town, etc. Categorical measures are defined in terms of natural-language specifications, not in terms of numbers.

Sometimes categorical data can hold numerical values (a quantitative value), but those values do not have mathematical meaning. Examples of categorical data are birthdate, favourite sport, and school postcode. Here, the birthdate and school postcode hold quantitative values, but those values carry no numerical meaning.

Nominal Data
Nominal data is one of the types of qualitative information; it labels variables without providing any numerical value. Nominal data is also called the nominal scale. It cannot be ordered or measured. Sometimes, though, the data can be both qualitative and quantitative. Examples of nominal data are letters, symbols, words, gender, etc.

Nominal data are examined using the grouping method: the data are grouped into categories, and then the frequency or the percentage of each category is calculated. These data are visually represented using pie charts, as in the sketch below.
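
As a minimal illustration of the grouping method, here is a sketch in Python, assuming the pandas library is available and using a small made-up sample:

import pandas as pd

# A small, made-up nominal variable: labels only, with no numeric meaning.
gender = pd.Series(["female", "male", "female", "female", "male"])

counts = gender.value_counts()                           # frequency per category
percentages = gender.value_counts(normalize=True) * 100  # percentage share

print(counts)
print(percentages.round(1))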

Ordinal Data
Ordinal data is a type of data that follows a natural order. Its significant feature is that, although the values are ordered, the differences between them cannot be determined. This type of variable is mostly found in surveys, finance, economics, questionnaires, and so on.

Ordinal data is commonly represented using a bar chart. These data are investigated and interpreted through many visualisation tools. The information may also be expressed using tables, in which each row shows a distinct category.

Quantitative or Numerical Data


Quantitative data, also known as numerical data, represents numerical values (i.e., how much, how often, how many). Numerical data gives information about the quantities of a specific thing. Some examples of numerical data are height, length, size, weight, and so on. Quantitative data can be classified into two types based on the data sets: discrete data and continuous data.

Discrete Data
Discrete data can take only discrete values. Discrete information contains only a finite number of
possible values. Those values cannot be subdivided meaningfully. Here, things can be counted in
whole numbers.

Example: Number of students in the class

Continuous Data
Continuous data is data that can be measured. It has an infinite number of possible values that
can be selected within a given range.

Example: Temperature range

Data Processing
Data processing occurs when data is collected and translated into usable information. Usually performed by a data scientist or a team of data scientists, data processing must be done correctly so as not to negatively affect the end product, or data output.

Data in its raw form is not useful to any organization. Data processing is the method of collecting
raw data and translating it into usable information. It is usually performed in a step-by-step
process by a team of data scientists and data engineers in an organization. The raw data is
collected, filtered, sorted, processed, analyzed, stored, and then presented in a readable format.

Data processing is essential for organizations to create better business strategies and increase
their competitive edge. By converting the data into readable formats like graphs, charts, and
documents, employees throughout the organization can understand and use the data.

Now that we’ve established what we mean by data processing, let’s examine the data processing
cycle.

The Data Processing Cycle

The data processing cycle consists of a series of steps where raw data (input) is fed into a system
to produce actionable insights (output). Each step is taken in a specific order, but the entire
process is repeated in a cyclic manner. The first data processing cycle's output can be stored and
fed as the input for the next cycle, as the illustration below shows.

Fig: Data processing cycle

Generally, there are six main steps in the data processing cycle:

Step 1: Collection

The collection of raw data is the first step of the data processing cycle. The type of raw data
collected has a huge impact on the output produced. Hence, raw data should be gathered from
defined and accurate sources so that the subsequent findings are valid and usable. Raw data can
include monetary figures, website cookies, profit/loss statements of a company, user behavior,
etc.

Step 2: Preparation

Data preparation or data cleaning is the process of sorting and filtering the raw data to remove
unnecessary and inaccurate data. Raw data is checked for errors, duplication, miscalculations or
missing data, and transformed into a suitable form for further analysis and processing. This is
done to ensure that only the highest quality data is fed into the processing unit.

The purpose of this step is to remove bad data (redundant, incomplete, or incorrect data) so as to begin assembling high-quality information that can be used in the best possible way for business intelligence.
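
As an illustration, a preparation step might look like the following minimal sketch in Python with the pandas library; the file name and the "amount" column are hypothetical:

import pandas as pd

# Hypothetical raw input file with a hypothetical "amount" column.
raw = pd.read_csv("sales_raw.csv")

clean = (
    raw.drop_duplicates()              # remove duplicated records
       .dropna(subset=["amount"])      # drop rows missing a key value
)
clean = clean[clean["amount"] >= 0]    # filter out obvious miscalculations

# Hand the prepared data off to the next step of the cycle.
clean.to_csv("sales_clean.csv", index=False)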

Step 3: Input

In this step, the raw data is converted into machine-readable form and fed into the processing unit. This can take the form of data entry through a keyboard, a scanner, or any other input source.

Step 4: Data Processing

In this step, the raw data is subjected to various data processing methods using machine learning
and artificial intelligence algorithms to generate a desirable output. This step may vary slightly
from process to process depending on the source of data being processed (data lakes, online
databases, connected devices, etc.) and the intended use of the output.
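
The methods used here range from simple aggregation to machine-learning models. As a minimal, hypothetical sketch (continuing the sales example above in Python with pandas; the "region" column is assumed), processing might aggregate the cleaned records into a per-region summary:

import pandas as pd

# Continue the hypothetical sales example: summarize cleaned records.
clean = pd.read_csv("sales_clean.csv")

summary = (
    clean.groupby("region")["amount"]
         .agg(total="sum", average="mean", orders="count")
         .reset_index()
)
print(summary)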

Step 5: Output

The data is finally transmitted and displayed to the user in a readable form like graphs, tables,
vector files, audio, video, documents, etc. This output can be stored and further processed in the
next data processing cycle.

Step 6: Storage

The last step of the data processing cycle is storage, where data and metadata are stored for
further use. This allows for quick access and retrieval of information whenever needed, and also
allows it to be used as input in the next data processing cycle directly.

Types of Data Processing

There are different types of data processing based on the source of data and the steps taken by
the processing unit to generate an output. There is no one-size-fits-all method that can be used
for processing raw data.

Batch Processing: Data is collected and processed in batches. Used for large amounts of data. E.g., a payroll system.

Real-time Processing: Data is processed within seconds of the input being given. Used for small amounts of data. E.g., withdrawing money from an ATM.

Online Processing: Data is automatically fed into the CPU as soon as it becomes available. Used for continuous processing of data. E.g., barcode scanning.

Multiprocessing: Data is broken down into frames and processed using two or more CPUs within a single computer system. Also known as parallel processing (see the sketch after this table). E.g., weather forecasting.

Time-sharing: Allocates computer resources and data in time slots to several users simultaneously.
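
As a minimal illustration of multiprocessing, the sketch below uses Python's standard multiprocessing module; the workload is a made-up stand-in for a real computation:

from multiprocessing import Pool

def process(record):
    # Stand-in for a CPU-heavy computation on one record.
    return record ** 2

if __name__ == "__main__":
    records = list(range(10))
    with Pool(processes=4) as pool:           # four worker processes
        results = pool.map(process, records)  # records processed in parallel
    print(results)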

Data Processing Methods

There are three main data processing methods - manual, mechanical and electronic.

Manual Data Processing

This data processing method is handled manually. The entire process of data collection, filtering, sorting, calculation, and other logical operations is done with human intervention, without the use of any electronic device or automation software. It is a low-cost method that requires little to no tooling, but it produces high error rates and high labor costs, and it consumes a great deal of time and tedium.

Mechanical Data Processing

Data is processed mechanically through the use of devices and machines. These can include simple devices such as calculators, typewriters, printing presses, etc. Simple data processing operations can be achieved with this method. It produces far fewer errors than manual data processing, but the growth of data volumes has made this method more complex and difficult.

Electronic Data Processing

Data is processed with modern technologies using data processing software and programs. A set
of instructions is given to the software to process the data and yield output. This method is the
most expensive but provides the fastest processing speeds with the highest reliability and
accuracy of output.

Examples of Data Processing

Data processing occurs in our daily lives, whether we are aware of it or not. Here are some
real-life examples of data processing:

 A stock trading software that converts millions of stock data into a simple graph
 An e-commerce company uses the search history of customers to recommend similar products
 A digital marketing company uses demographic data of people to strategize location-specific
campaigns
 A self-driving car uses real-time data from sensors to detect if there are pedestrians and other
cars on the road

Moving From Data Processing to Analytics


If we had to pick one thing that stands out as the most significant game-changer in today's business world, it would be big data. Although it involves handling a staggering amount of information, the rewards are undeniable. That's why companies that want to stay competitive in the 21st-century marketplace need an effective data processing strategy.

Analytics, the process of finding, interpreting, and communicating meaningful patterns in data, is
the next logical step after data processing. Whereas data processing changes data from one form
to another, analytics takes those newly processed forms and makes sense of them.

But no matter which of these processes data scientists are using, the sheer volume of data and the
analysis of its processed forms require greater storage and access capabilities, which leads us to
the next section!

The Future of Data Processing

The future of data processing can best be summed up in one short phrase: cloud computing.

While the six steps of data processing remain immutable, cloud technology has provided spectacular advances in data processing technology, giving data analysts and scientists the fastest, most advanced, most cost-effective, and most efficient data processing methods available today.

The cloud lets companies blend their platforms into one centralized system that’s easy to work
with and adapt. Cloud technology allows seamless integration of new upgrades and updates to
legacy systems while offering organizations immense scalability.

Cloud platforms are also affordable and serve as a great equalizer between large organizations
and smaller companies.

So, the same IT innovations that created big data and its associated challenges have also
provided the solution. The cloud can handle the huge workloads that are characteristic of big data
operations.

In essence, the collection, manipulation, and processing of collected data for a required use is known as data processing. It is a technique normally performed by a computer; the process includes retrieving, transforming, or classifying information. However, the processing of data largely depends on the following:
 The volume of data that needs to be processed
 The complexity of data processing operations
 The capacity and in-built technology of the respective computer system
 Technical skills
 Time constraints

Methods of Data Processing

Let us now discuss the different methods of data processing.


 Single user programming
 Multiple programming
 Real-time processing
 On-line processing
 Time sharing processing
 Distributed processing

Single User Programming


It is usually done by a single person for personal use. This technique is suitable even for small offices.
Multiple Programming
This technique makes it possible to store and execute more than one program in the Central Processing Unit (CPU) simultaneously. Multiple programming increases the overall working efficiency of the computer.
Real-time Processing
This technique allows the user to be in direct contact with the computer system, which eases data processing. It is also known as the direct mode or interactive mode technique, and it is developed exclusively to perform one task. It is a sort of online processing that always remains under execution.
On-line Processing
This technique facilitates the entry and execution of data directly, so the data is not stored or accumulated first and then processed. The technique is designed to reduce data entry errors: it validates data at various points and ensures that only corrected data is entered. It is widely used for online applications, as sketched below.
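
As a minimal sketch of validation at the point of entry (plain Python; the field name and messages are hypothetical):

def read_amount(prompt="Amount: "):
    # Keep asking until the entry passes both validation points.
    while True:
        text = input(prompt)
        try:
            value = float(text)
        except ValueError:
            print("Not a number; please re-enter.")
            continue
        if value < 0:
            print("Amount cannot be negative; please re-enter.")
            continue
        return value

amount = read_amount()  # only corrected data gets past this call
print("Accepted:", amount)
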
Time-sharing Processing
This is another form of online data processing that facilitates several users to share the resources
of an online computer system. This technique is adopted when results are needed swiftly.
Moreover, as the name suggests, this system is time based.
Following are some of the major advantages of time-sharing processing:
 Several users can be served simultaneously
 All users get an almost equal amount of processing time
 Users can interact with the running programs

Distributed Processing
This is a specialized data processing technique in which various remotely located computers remain interconnected with a single host computer, forming a computer network.

All these computer systems remain interconnected via a high-speed communication network, which facilitates communication between the computers. The central computer system maintains the master database and monitors it accordingly.

Useful tools and technologies for Data Processing


Considering the amounts of data generated these days, processing them manually is practically impossible. This is why different tools are used to automate, accelerate, and simplify the entire process. The most popular include various programming languages, the SQL language, Business Intelligence (BI) or ETL tools, and integration platforms. Each of these methods has its strengths and weaknesses. Let's take a quick look at them.

Business Intelligence tools for business analysis


A friendly interface, predefined analytical models, and a wide range of options for presenting
results are the factors that make BI tools massively popular. Users value the clear and
uncomplicated visualizations of complicated analyses and large datasets. These visualizations
can be presented to management boards and customers. They, in turn, can use the insights to
inform their decisions.

However, out of the box, BI tools offer a relatively small number of data source connections, which are required for conducting analyses, and they have limited options for preparing data for further analyses. Therefore, it's common to use BI tools together with ETL/ESB solutions.

Tools and software for statistical analyses


These tools allow you to create very precise analyses, e.g., correspondence, reliability, or
cluster analyses. Often, these are the only tools that can provide analyses with the accuracy and
complexity required by companies or institutions in medical or lab research sectors. There is no
alternative to them. The target user group includes specialists in specific fields; by contrast, BI systems are dedicated to business representatives and management boards.

The downside of statistical analysis solutions is their high purchase and maintenance costs, related to the fact that this kind of tool is often divided into separate modules that each generate additional expenses.

Different programming languages

Using different programming languages is still a common approach. One perk is the ability to create advanced machine-learning models. But programming techniques aren't very flexible compared to other methods, especially when there's a need to introduce changes related to, e.g., dynamically transforming business conditions.

This method also has downsides unrelated to the data analysis itself. It requires qualified data
processing specialists to be skilled in programming languages and possess a vast knowledge of
business processes. This is needed to correctly interpret analysis results and create new
scenarios. Maintaining such a skilled team might be a big challenge.

SQL
SQL consoles that handle queries in the SQL programming language are useful for many
analytical scenarios and for achieving precise feedback.

However, queries will only bring satisfying results if the data are structured in the right way, with the relations between them maintained.

Growing databases, and the need to manage access to data sources, can also pose a challenge for administrators. A minimal query sketch follows.
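
Here is a minimal sketch using Python's built-in sqlite3 module with a throwaway in-memory table; the names and figures are made up. A precise analytical question is expressed as a SQL query:

import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 45.5)],
)

# Total sales per region, grouped and summed by the database engine.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
):
    print(region, total)
conn.close()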

ETL tools and data integration platforms


Integration tools weren’t created to present results or perform very complicated calculations and
analyses. However, an increasing number of companies choose to include them in their data
processing.

The main task of these solutions is creating connections between systems or databases, sending
notifications, verifying data accuracy and completeness, and transforming them while
maintaining crucial attributes and schemes. This maximizes the usefulness of data in future
analyses.
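
As a minimal extract-transform-load sketch (Python with pandas and the built-in sqlite3 module; the file, column, and table names are hypothetical):

import sqlite3
import pandas as pd

# Extract: read from a hypothetical source file.
raw = pd.read_csv("orders_raw.csv")

# Transform: verify completeness and keep the crucial attributes.
valid = raw.dropna(subset=["order_id", "amount"]).copy()
valid["amount"] = valid["amount"].astype(float)

# Load: write the result to a target database for future analyses.
with sqlite3.connect("warehouse.db") as conn:
    valid.to_sql("orders", conn, if_exists="replace", index=False)
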
A significant advantage of data integration platforms is their no-code/low-code model: they can be used by business owners who aren't qualified data processing specialists. The features of these tools can be extended with additional scripts in the Python or R programming languages. After acquiring the necessary competencies, users can successfully expand their solution environment, limiting so-called vendor lock-in, that is, dependency on the software provider.

With integration platforms, you can process tabular, vector, and raster data, as well as databases and data warehouses. Moreover, you can process data from network services such as WMS or WFS, different APIs, and information from IoT sensors.
With integration platforms, you can also automate your designed processes. This saves you
time and money. Moreover, the skills of employees who work with data can be used in other
areas.

When deciding on ETL tools or an integration platform, you should analyze your data processing goals to avoid unnecessary costs. These are complex solutions that offer nearly infinite possibilities, which might go to waste if it turns out your organization only needs much simpler tools.
Benefits of Data Processing
As mentioned before, collecting data without processing and analyzing it makes it useless. Prepared in the right way, data can give you measurable business benefits.

Processing data brings:


Increased productivity and profits. Some data can be processed once and then shared across your organization for different tasks and projects. Correctly profiling and categorizing data, as well as determining its importance and validity, can help you avoid serious problems. For example, you may have lots of data, but only some of it is truly valuable; an excess of worthless data may in fact reduce the effectiveness of your processes.

Better business decisions. Cleaned data are easier to analyze and make it more straightforward to notice patterns that could be overlooked in the original, unprocessed dataset. You can be sure you're making the right decisions if you're making them based on verified, organized data.

Limited operational costs. Correct data processing guarantees that your data are high-quality and can be successfully used in business processes. After data processing, it may turn out that some data need corrections; you can use this knowledge and avoid including them in your analyses, where they would only produce incorrect results. This saves the time and effort you'd otherwise spend searching for errors and repeating analyses. Moreover, it helps you eliminate the risk of making wrong decisions based on invalid analyses.

Improved data storage, distribution, and reporting. Data are more accessible when saved in a format preferred by their users. Data saved in a unified format can still be used in many systems and for different purposes; they don't need to be transformed over and over again.
