Unit 3-Statistics
Data
What are data?
Data are plain facts, usually raw numbers. Think of a spreadsheet full of numbers
with no meaningful description. In order for these numbers to become information,
they must be interpreted to have meaning.
Definition of Qualitative Data
Qualitative Data refers to the data that provides insights and understanding about a
particular problem. It can be approximated but cannot be computed. Hence, the
researcher should possess complete knowledge about the type of characteristic, prior
to the collection of data.
Qualitative data is descriptive in nature, which makes it harder to analyze. This type
of data can be classified into categories on the basis of the physical attributes and
properties of the object. The data is interpreted as spoken or written narratives
rather than numbers. It is concerned with characteristics that are observable in terms
of smell, appearance, taste, feel, texture, gender, nationality and so on. The methods
of collecting qualitative data are: focus groups, observation, interviews and archival
materials like newspapers.
Definition of Quantitative Data
Quantitative Data, as the name suggests, deals with quantity or numbers. It refers
to data that consists of values and counts and can be expressed in numerical terms.
In statistics, most analyses are conducted using this type of data.
Quantitative data may be used in computations and statistical tests. It is concerned
with measurements like height, weight, volume, length, size, humidity, speed, age, etc.
It can also be presented in tabular and diagrammatic form, such as charts, graphs
and tables. Further, quantitative data can be classified as discrete or continuous.
The methods used to collect it are: surveys, experiments, observations and interviews.
Definition of Discrete/Categorical Data
The term discrete implies distinct or separate. Discrete data is the type of
quantitative data that relies on counts. It takes only separate values that can be
counted in whole numbers (integers) and cannot be subdivided, which means the
data cannot be broken down into fractions or decimals.
Examples include the number of cars in a parking lot, the number of computers in a
computer lab, and the number of animals in a zoo.
Range: Recall that the range is the difference between the greatest and least
values in a given data set.
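The definition of the range can be sketched in a few lines of Python. The function name data_range and the car counts are made up for illustration only.

```python
def data_range(values):
    """Return the difference between the greatest and least values."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# Hypothetical data: cars counted in a parking lot on five days.
cars = [12, 7, 15, 9, 20]
print(data_range(cars))  # 20 - 7 = 13
```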
How would you describe the following histograms?
1A.
1B.
These two graphs represent the same data – what accounts for their drastic change in
shape?
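The two histograms are not reproduced here, so the following is only one possible explanation, not necessarily the one behind graphs 1A and 1B. A common reason the same data can look very different is the choice of class (bin) width. The sketch below groups one made-up set of scores with two different widths; the helper bin_counts and the data are hypothetical.

```python
def bin_counts(values, start, width, n_bins):
    """Count the values in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


# Hypothetical test scores (illustration only).
scores = [52, 55, 58, 61, 63, 64, 67, 70, 71, 72, 75, 78, 81, 85, 94]

wide = bin_counts(scores, start=50, width=25, n_bins=2)     # two wide classes
narrow = bin_counts(scores, start=50, width=5, n_bins=10)   # ten narrow classes

# Print a rough text histogram for each grouping.
for label, counts in (("wide classes", wide), ("narrow classes", narrow)):
    print(label)
    for i, count in enumerate(counts):
        print(f"  class {i}: " + "#" * count)
```

Both groupings contain all 15 scores, but the wide classes show only two bars while the narrow classes reveal several peaks and gaps.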
Bar Chart
Pie Chart
Line Graph
A line graph is mostly used to show change over time as a series of data points
connected by line segments on the coordinate plane. The line graph therefore helps
to find the relationship between two data sets, with one data set always being
dependent on the other set.
Scales of Measurement
Measurement scales are used to categorize and/or quantify variables.
Properties of Measurement Scales
Each scale of measurement satisfies one or more of the following properties of
measurement.
Identity. Each value on the measurement scale has a unique meaning.
Magnitude. Values on the measurement scale have an ordered relationship to one
another. That is, some values are larger and some are smaller.
Equal intervals. Scale units along the scale are equal to one another. This means, for
example, that the difference between 1 and 2 would be equal to the difference
between 19 and 20.
A minimum value of zero. The scale has a true zero point, below which no values
exist.
Ordinal Scale of Measurement
The ordinal scale has the property of both identity and magnitude. Each value on
the ordinal scale has a unique meaning, and it has an ordered relationship to every
other value on the scale.
An example of an ordinal scale in action would be the results of a horse race,
reported as "win", "place", and "show". We know the rank order in which horses finished the
race. The horse that won finished ahead of the horse that placed, and the horse that
placed finished ahead of the horse that showed. However, we cannot tell from this
ordinal scale whether it was a close race or whether the winning horse won by a mile.
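The horse race example can be illustrated with a short Python sketch. The finishing times and horse names are invented; the point is that two very different races produce the same ordinal result.

```python
def finishing_order(times):
    """Return horse names ordered from fastest to slowest finishing time."""
    return [horse for horse, _ in sorted(times.items(), key=lambda item: item[1])]


# Hypothetical finishing times in seconds (illustration only).
close_race = {"Comet": 120.1, "Daisy": 120.2, "Echo": 120.3}
runaway_race = {"Comet": 110.0, "Daisy": 120.2, "Echo": 125.0}

print(finishing_order(close_race))    # win, place, show
print(finishing_order(runaway_race))  # same order, very different margins
```

Once only the order is recorded, the size of the gaps between horses is lost.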
Interval Scale of Measurement
The interval scale of measurement has the properties of identity, magnitude, and
equal intervals.
A perfect example of an interval scale is the Fahrenheit scale to measure
temperature. The scale is made up of equal temperature units, so that the difference
between 40 and 50 degrees Fahrenheit is equal to the difference between 50 and 60
degrees Fahrenheit.
With an interval scale, you know not only whether different values are bigger or
smaller, you also know how much bigger or smaller they are. For example, suppose
it is 60 degrees Fahrenheit on Monday and 70 degrees on Tuesday. You know not
only that it was hotter on Tuesday, you also know that it was 10 degrees hotter.
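As a rough check of what an interval scale does and does not tell you, the sketch below uses the Monday and Tuesday temperatures from the example. It relies on the standard Fahrenheit to Celsius conversion; the function name is chosen for illustration. Differences stay equal after a change of units, but ratios do not, because the Fahrenheit scale has no true zero.

```python
import math


def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (deg_f - 32) * 5 / 9


monday_f, tuesday_f = 60, 70
print(tuesday_f - monday_f)  # Tuesday was 10 degrees Fahrenheit hotter

# Equal intervals: 40 to 50 and 50 to 60 are the same step in either unit.
step_low = fahrenheit_to_celsius(50) - fahrenheit_to_celsius(40)
step_high = fahrenheit_to_celsius(60) - fahrenheit_to_celsius(50)
print(math.isclose(step_low, step_high))

# Ratios depend on the unit, so "70 is 1.17 times as hot as 60" has no meaning.
ratio_f = tuesday_f / monday_f
ratio_c = fahrenheit_to_celsius(tuesday_f) / fahrenheit_to_celsius(monday_f)
print(round(ratio_f, 2), round(ratio_c, 2))
```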
Ratio Scale of Measurement
The ratio scale of measurement satisfies all four of the properties of measurement:
identity, magnitude, equal intervals, and a minimum value of zero.
The weight of an object would be an example of a ratio scale. Each value on the
weight scale has a unique meaning, weights can be rank ordered, units along the
weight scale are equal to one another, and the scale has a minimum value of zero.
Weight scales have a minimum value of zero because objects at rest can be
weightless, but they cannot have negative weight.
Mean, median, and mode are different measures of center in a numerical data set.
They each try to summarize a dataset with a single number to represent a "typical"
data point from the dataset.
Mean: The "average" number; found by adding all data points and dividing by the
number of data points.
Median: The middle number; found by ordering all data points and picking out the
one in the middle (or if there are two middle numbers, taking the mean of those two
numbers).
Mode: The most frequent number, that is, the number that occurs the highest
number of times. A data set is classified as unimodal, bimodal, or
trimodal/multimodal according to how many modes it has.
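A minimal sketch of the three measures, using Python's standard statistics module (multimode requires Python 3.8 or later). The data set and the helper describe_modes are made up for illustration.

```python
from statistics import mean, median, multimode


def describe_modes(values):
    """Return the list of modes and a label based on how many modes there are."""
    modes = multimode(values)
    if not modes:
        raise ValueError("an empty data set has no mode")
    # Note: if every value occurs exactly once, multimode returns all of them;
    # many textbooks would say such a data set has no mode.
    labels = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return modes, labels.get(len(modes), "multimodal")


# Hypothetical data set (illustration only).
data = [2, 3, 3, 5, 7, 7, 8]

print(mean(data))            # (2+3+3+5+7+7+8) / 7 = 5
print(median(data))          # middle of the 7 ordered values = 5
print(describe_modes(data))  # 3 and 7 each occur twice
```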
Calculating measures of central tendency.
Mean
For grouped data: $\bar{x} = \frac{\sum f x_m}{n}$, where $x_m$ is the midpoint of
each class, $f$ is the frequency of each class and $n$ is the total number of
observations.
If there are no intervals, use: $\bar{x} = \frac{\sum f x}{n}$
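A short sketch of the grouped mean, assuming each class is given as a (lower limit, upper limit, frequency) triple and that the midpoint is the average of the two limits. The function name and the frequency table are hypothetical.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum of f * midpoint divided by n."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no observations")
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


# Hypothetical frequency table: (lower limit, upper limit, frequency).
classes = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]

# Midpoints are 5, 15, 25, so sum(f * midpoint) = 15 + 75 + 50 = 140 and n = 10.
print(grouped_mean(classes))  # 140 / 10 = 14.0
```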
Median
If n is even, use the average of the values of the $\left(\frac{n}{2}\right)$th and
$\left(\frac{n}{2}+1\right)$th items, where n is the total number of observations.
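The position rule for the median can be sketched as follows. The even case follows the rule above; the odd case uses the standard rule of taking the ((n+1)/2)th ordered item, which is stated here as an assumption. The function name is hypothetical.

```python
def median_by_position(values):
    """Median found by position in the ordered data (positions are 1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no observations")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th item; subtract 1 for 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    lower = ordered[n // 2 - 1]  # the (n/2)th item
    upper = ordered[n // 2]      # the (n/2 + 1)th item
    return (lower + upper) / 2


print(median_by_position([4, 1, 9, 7]))  # ordered: 1, 4, 7, 9 -> (4 + 7) / 2 = 5.5
print(median_by_position([3, 1, 2]))     # ordered: 1, 2, 3 -> 2
```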
When data is grouped
Remember the median class is the interval corresponding to the
$\left(\frac{n+1}{2}\right)$th value.
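One way to locate the median class is to accumulate the class frequencies until the running total reaches the ((n+1)/2)th position. This is a sketch under the assumption that classes are listed in increasing order as (lower limit, upper limit, frequency) triples; the function name and table are hypothetical.

```python
def median_class(classes):
    """Return the (lower, upper) limits of the class holding the ((n+1)/2)th value."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no observations")
    position = (n + 1) / 2
    cumulative = 0
    for lower, upper, f in classes:
        cumulative += f
        if cumulative >= position:
            return (lower, upper)


# Hypothetical frequency table: (lower limit, upper limit, frequency).
classes = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]

# n = 10, so the position is 5.5; cumulative frequencies are 3, 8, 10.
print(median_class(classes))  # (10, 20)
```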
Mode
When data is ungrouped…
The most frequent number, that is, the number that occurs the highest number of
times.