Data Handling
Data
Data is a collection of facts, such as numerical figures, that represents a particular kind of
information. The observations as initially gathered are called raw data. Data can take any
form: words, numbers, measurements, descriptions or observations.
Types of data
The data handling methods that can be applied depend on the type of data. Data is classified into
two types:
✓ Qualitative Data
✓ Quantitative Data
Quantitative Data
Quantitative data is numerical data that can be counted or measured and expressed in numbers,
percentages, or other quantifiable units.
Examples: Age, speed, income, temperature, number of website visitors, survey responses with
numerical scales (e.g., rating from 1 to 5).
• Quantitative data is further divided into two kinds: discrete data and continuous data.
• Discrete data can take only certain values, such as whole numbers.
• Continuous data can take any value within a given range.
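The distinction between discrete and continuous data can be illustrated with a minimal Python sketch. The helper name `is_discrete` and the sample values are hypothetical, and using "every value is a whole number" as the test for discreteness is a simplifying assumption made for illustration.

```python
def is_discrete(values):
    """Return True if every value is a whole number (a simplifying assumption)."""
    return all(float(v).is_integer() for v in values)

# Hypothetical samples.
website_visitors = [120, 98, 143]      # counts: only whole numbers are possible
temperatures_c = [21.4, 19.8, 23.05]   # measurements: any value within a range

print(is_discrete(website_visitors))  # True
print(is_discrete(temperatures_c))    # False
```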
Qualitative Data
Qualitative data is non-numerical and focuses on qualities, experiences, opinions, and feelings.
Examples: Open-ended survey responses, interview transcripts, observations, user reviews, social
media posts.
Small Data
Small data is data that is 'small' enough for human comprehension. It is data in a volume and format
that makes it accessible, informative and actionable.
• Easily understood by the human brain
• Accumulates slowly
• Stored in megabytes (MB) or gigabytes (GB)
• Controlled and structured
Big Data
Big Data, a popular term in recent years, has come to be defined as a volume of data too large to be
stored or processed by conventional data storage or processing equipment.
• Needs a machine to be understood
• Too complex, exhaustive and unstructured
• Stored in petabytes (PB), exabytes (EB), etc.
• Accumulates quickly
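To give a feel for the gap between the storage units mentioned for small and big data, the sketch below converts between them. It assumes decimal (SI) units, where each step is a factor of 1000; the `UNITS` table and the `convert` helper are hypothetical names, and TB is included only to complete the sequence.

```python
# Decimal (SI) storage units expressed in bytes (an assumption; binary units differ).
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}

def convert(amount, from_unit, to_unit):
    """Convert an amount of storage between two units listed in UNITS."""
    return amount * UNITS[from_unit] / UNITS[to_unit]

print(convert(1, "PB", "GB"))  # 1000000.0
print(convert(1, "EB", "PB"))  # 1000.0
```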
Data handling
Data handling is the process of ensuring that research data is gathered, archived or disposed of in a
safe and protected way during and after the analysis process. It also means collecting a set of data
and presenting it in a different form. Data handling involves the following steps:
✓ Problem Identification
In the data handling process, the purpose or problem statement has to be identified and well
defined.
✓ Data Collection and Presentation
Data relevant to the problem statement has to be collected. The collected data should then be
presented in a meaningful, easily understood manner, for example by arranging it in tally marks
or tables.
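Arranging collected data as a tally table can be sketched in a few lines of Python. The survey data and the `tally` helper are hypothetical; each group of five is written as `||||/` to stand in for four strokes crossed by a fifth.

```python
from collections import Counter

# Hypothetical raw data: favourite fruit reported by ten students.
raw_data = ["apple", "banana", "apple", "mango", "banana",
            "apple", "mango", "apple", "banana", "apple"]

counts = Counter(raw_data)  # frequency of each distinct observation

def tally(n):
    """Render n as tally marks, grouped in fives."""
    groups = ["||||/"] * (n // 5)
    if n % 5:
        groups.append("|" * (n % 5))
    return " ".join(groups)

# Print one table row per category: label, tally marks, frequency.
for item, n in counts.most_common():
    print(f"{item:<8}{tally(n):<10}{n}")
```

`Counter` does the counting; the tally column is only a presentation layer on top of the same frequencies.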
✓ Graphical Representation
Since a visual or graphical representation makes data easier to analyse and understand, the
presented data can be plotted in graphs and charts such as bar graphs and pie charts.
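As a rough illustration of a bar graph, the sketch below draws one in plain text so that it needs no plotting library; in practice a charting tool would normally be used. The function name `text_bar_chart` and the frequencies are hypothetical.

```python
def text_bar_chart(frequencies, symbol="#"):
    """Return one line per category, with a bar whose length equals the count."""
    width = max(len(label) for label in frequencies)
    return [f"{label:<{width}} | {symbol * count} ({count})"
            for label, count in frequencies.items()]

# Hypothetical frequencies taken from a tally table.
frequencies = {"apple": 5, "banana": 3, "mango": 2}

for line in text_bar_chart(frequencies):
    print(line)
```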
✓ Data Analysis
The data should be analysed so that the necessary information can be drawn from it, which helps in
deciding on further actions.
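A minimal sketch of such an analysis, using Python's standard `statistics` module on a hypothetical list of test scores, might compute a few summary figures:

```python
import statistics

# Hypothetical test scores for seven students.
scores = [72, 85, 85, 90, 64, 78, 85]

summary = {
    "mean": statistics.mean(scores),      # arithmetic average
    "median": statistics.median(scores),  # middle value when sorted
    "mode": statistics.mode(scores),      # most frequent value
    "range": max(scores) - min(scores),   # spread between extremes
}

for name, value in summary.items():
    print(name, value)
```

Which summaries are appropriate depends on the kind of data being analysed; these four are only a common starting point.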
✓ Conclusion
From the analysis of the data, we can derive the solution to our problem statement.
Data is usually represented in one of the following ways:
✓ Bar Graphs
✓ Line Graphs
✓ Histograms
✓ Dot Plots
✓ Frequency Distribution
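A frequency distribution, which is also the basis of a histogram, groups values into class intervals and counts how many fall into each. The sketch below assumes equal-width intervals that include their lower bound; the function name, the heights and the interval choices are all hypothetical.

```python
def frequency_distribution(values, start, width, bins):
    """Count the values falling in [start + i*width, start + (i+1)*width) for each bin i."""
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < bins:  # values outside the covered range are ignored
            counts[index] += 1
    return counts

# Hypothetical heights in centimetres.
heights_cm = [142, 150, 155, 147, 161, 158, 149, 153, 165, 151]
counts = frequency_distribution(heights_cm, start=140, width=10, bins=3)

# Print each class interval with a row of stars, as a crude histogram.
for i, c in enumerate(counts):
    low = 140 + 10 * i
    print(f"{low}-{low + 9}: {'*' * c}")
```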
Scales of Measurement
In statistics, a few scales of measurement are used to measure statistical variables. Each scale
measures a certain type of variable, and each has fundamental properties by which it can be
classified.
Properties
Every scale of measurement has some or all of the following properties:
• Identity: each value on the scale has a unique identity or meaning.
• Magnitude: the values have an ordered relationship with one another, so some values are
smaller and some are larger.
• Equal intervals: the intervals between the values (data points) on the scale are equal to one another.
• A minimum value of zero: the scale has a true zero point.
In statistics, there are four data measurement scales: nominal, ordinal, interval and ratio. These are
simply ways to sub-categorize different types of data.
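One way to summarise how the four scales relate to the properties of identity, magnitude, equal intervals and a true zero is as a small lookup table. The names `SCALE_PROPERTIES` and `has_property` are hypothetical; the mapping follows the usual textbook convention that each scale adds one property to the one before it.

```python
# Properties held by each scale; each scale adds one property to the previous one.
SCALE_PROPERTIES = {
    "nominal":  {"identity"},
    "ordinal":  {"identity", "magnitude"},
    "interval": {"identity", "magnitude", "equal intervals"},
    "ratio":    {"identity", "magnitude", "equal intervals", "true zero"},
}

def has_property(scale, prop):
    """Return True if the given scale has the given property."""
    return prop in SCALE_PROPERTIES[scale]

print(has_property("interval", "true zero"))  # False
print(has_property("ratio", "true zero"))     # True
```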
1) Nominal
Nominal scales are used for labeling variables, without any quantitative value. The categories
on a nominal scale are mutually exclusive (no overlap) and none of them has any numerical
significance. A good way to remember this is that “nominal” sounds a lot like “name”,
and nominal scales are like “names” or labels.
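Because nominal categories are only labels, counting them is the main operation that makes sense; the most frequent label (the mode) can be reported, but an average cannot. A minimal sketch with hypothetical eye-colour data:

```python
from collections import Counter

# Hypothetical nominal data: the labels have no order and no numeric meaning.
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]

# The mode is the label that occurs most often.
mode, frequency = Counter(eye_colours).most_common(1)[0]
print(mode, frequency)  # brown 3
```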
Examples of Nominal Scales
2) Ordinal
An ordinal scale is used in statistics and research to categorize or rank data based on their
relative magnitude or order. In this scale, data are organized into categories that have a specific
order or ranking, but the intervals between the categories may not be uniform or measurable.
Examples of Ordinal Scales
✓ Grades (A, B, C, D)
✓ Ranking of players
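Since ordinal categories can be ranked but not meaningfully subtracted or averaged, a typical operation is sorting by rank and reading off the middle value. The sketch below uses the grade example with a hypothetical class of seven students; `GRADE_ORDER` and `RANK` are illustrative names.

```python
# Grades listed from lowest to highest; only the order is meaningful.
GRADE_ORDER = ["D", "C", "B", "A"]
RANK = {grade: position for position, grade in enumerate(GRADE_ORDER)}

# Hypothetical grades for seven students.
grades = ["B", "A", "C", "B", "D", "A", "B"]

ranked = sorted(grades, key=RANK.__getitem__)  # order by rank, not alphabetically
median_grade = ranked[len(ranked) // 2]        # middle element of an odd-length list

print(ranked)        # ['D', 'C', 'B', 'B', 'B', 'A', 'A']
print(median_grade)  # B
```

The numeric ranks exist only to drive the sort; the gap between a B and an A is not assumed to equal the gap between a D and a C.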
3) Interval
Interval scales are numeric scales in which we know both the order and the exact differences
between the values. The classic example of an interval scale is Celsius temperature, because the
difference between each value is the same. For example, the difference between 60 and 50 degrees
is a measurable 10 degrees, as is the difference between 80 and 70 degrees. The problem
with interval scales is that they don’t have a “true zero”: zero degrees Celsius does not mean
the absence of temperature.
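The practical consequence of a missing true zero is that differences are meaningful but ratios are not. The sketch below illustrates this with two hypothetical Celsius readings, converting to Kelvin (which does start at absolute zero) to show the physically meaningful ratio.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin, a scale whose zero is absolute zero."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0  # hypothetical readings in degrees Celsius

difference = warm_c - cool_c    # meaningful: 10 degrees warmer
naive_ratio = warm_c / cool_c   # 2.0, but 20 °C is not "twice as hot" as 10 °C
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(difference)              # 10.0
print(naive_ratio)             # 2.0
print(round(kelvin_ratio, 3))  # 1.035
```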
Examples of Interval Scales
✓ Kids' clothing sizes, where a zero size does not imply that a size does not exist.
✓ Pass and fail, where failing does not imply that the student received no credit.
✓ SAT score. The SAT (Scholastic Assessment Test) is a standardized test widely used for college
admissions in the United States. It assesses a student's readiness for college by measuring their
skills in reading, writing, and mathematics. The SAT consists of multiple-choice questions and an
optional essay section.
4) Ratio
In statistics, a ratio scale is a type of measurement scale that not only has the properties of
an interval scale but also has a true zero point. In other words, the values on a ratio scale
not only have a specific order and equal intervals between them but also possess a
meaningful zero point that represents the absence of the measured attribute. Ratio scales
provide a wealth of possibilities when it comes to statistical analysis. These variables can
be meaningfully added, subtracted, multiplied and divided (ratios). Central tendency can be
measured by mode, median, or mean; measures of dispersion, such as standard deviation
and coefficient of variation, can also be calculated from ratio scales.
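A short sketch of these calculations on ratio data, using hypothetical weights and Python's standard `statistics` module. The choice of the population standard deviation (`pstdev`) rather than the sample version is an assumption made for the example.

```python
import statistics

# Hypothetical weights in kilograms; zero kilograms means no weight at all.
weights_kg = [50.0, 60.0, 70.0]

mean = statistics.mean(weights_kg)
std_dev = statistics.pstdev(weights_kg)    # population standard deviation
coefficient_of_variation = std_dev / mean  # dispersion relative to the mean

# Ratios are meaningful on a ratio scale: 70 kg is 1.4 times 50 kg.
ratio = weights_kg[2] / weights_kg[0]

print(mean)                                # 60.0
print(round(std_dev, 3))                   # 8.165
print(round(coefficient_of_variation, 3))  # 0.136
print(ratio)                               # 1.4
```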
Examples
✓ Age
✓ Height
✓ Salary
✓ Weight
✓ Time
✓ Distance