STATISTICS REVIEWER (INTRODUCTORY CONCEPTS)

I. MEANING OF STATISTICS
STATISTICS – the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
STATISTICS are like bikinis. What they reveal is suggestive, but what they conceal is vital.

II. BRANCHES OF STATISTICS
DESCRIPTIVE STATISTICS – used to describe, organize, and summarize information about an entire population (e.g., 90% satisfaction of all customers).
INFERENTIAL STATISTICS – used to generalize about a population based on a sample of data (e.g., 90% satisfaction of all customers, estimated from a sample).
REMEMBER:
Descriptive statistics summarize your current dataset, while inferential statistics aim to draw conclusions about a larger population beyond your dataset.

III. POPULATION and SAMPLE
POPULATION
The measurable quality is called a parameter.
The population is a complete set.
Reports are a true representation of opinion.
It contains all members of a specified group.
SAMPLE
The measurable quality is called a statistic.
The sample is a subset of the population.
Reports have a margin of error and a confidence interval.
It is a subset that represents the entire population.

IV. CONSTANTS and VARIABLES
CONSTANT is a characteristic or property of a population or sample which makes the members similar to each other.
VARIABLE is a characteristic of interest, measurable on each and every individual in the universe, which assumes different values or labels. It is denoted by a capital letter of the English alphabet.

V. QUALITATIVE VARIABLES and QUANTITATIVE VARIABLES
QUALITATIVE VARIABLES – are variables that have distinct categories according to some characteristic or attribute.
QUANTITATIVE VARIABLES – are variables that can be counted or measured.

VI. TYPES OF QUANTITATIVE VARIABLES
DISCRETE VARIABLES – assume values that can be counted.
Examples:
The number of children in a family
The number of students in a classroom
CONTINUOUS VARIABLES – can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals.
Examples:
Temperature
Height
Weight

VII. LEVELS or SCALES OF MEASUREMENT
The NOMINAL LEVEL OF MEASUREMENT classifies data into mutually exclusive (non-overlapping) categories in which no order or ranking can be imposed on the data.
Examples:
Gender
Eye color
The ORDINAL LEVEL OF MEASUREMENT classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
Examples:
Student letter grades
Ranking of players
The INTERVAL LEVEL OF MEASUREMENT ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
Examples:
Temperature
Standardized exam score
The RATIO LEVEL OF MEASUREMENT possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population.
Examples:
Height
Age

THE FREQUENCY DISTRIBUTION
A frequency distribution is the organization of raw data in table form, using classes and frequencies.
Types of frequency distributions:
Categorical frequency distribution
Ungrouped frequency distribution
Grouped frequency distribution
The frequency or the frequency count for a data value is the number of times the value occurs in the data set.

Categorical frequency distribution represents data that can be placed in specific categories.
Example:
Twenty-five incoming freshmen were given a blood test to determine their blood type. The data set is
A   B   B   AB  O
O   O   B   AB  B
B   B   O   A   O
A   O   O   O   AB
AB  A   O   B   A
Construct a frequency distribution for the data.

Ungrouped frequency distribution lists the data values with the corresponding number of times or frequency count with which each value occurs.
Example:
The following data represent the number of defective bulbs observed each day over a 25-day period for a manufacturing process. Summarize the information with a frequency distribution.

Grouped frequency distribution is obtained by constructing classes (or intervals) for the data, and then listing the corresponding number of values (frequency count) in each interval.
To construct a frequency distribution, follow these rules:
1. There should be between 5 and 20 classes.
2. It is preferable but not absolutely necessary that the class width be an odd number.
3. The classes must be mutually exclusive.
4. The classes must be continuous.
5. The classes must be exhaustive.
6. The classes must be equal in width.

Constructing a Grouped Frequency Distribution
To construct a frequency distribution, follow these steps:
1. Determine the classes.
o Find the highest and lowest values.
o Find the range.
o Select the number of classes desired.
o Find the width by dividing the range by the number of classes and rounding up.
o Select a starting point (usually the lowest value or any convenient number less than the lowest value); add the width to get the lower limits.
o Find the upper class limits.
o Find the boundaries.
2. Tally the data.
3. Find the numerical frequencies from the tallies, and find the cumulative frequencies.
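The categorical tally and the class-construction steps above can be sketched in Python. This is a minimal illustration rather than part of the reviewer: the function names `build_classes` and `grouped_frequencies` and the ten-value `sample` list are hypothetical, the class settings (lowest value 100, highest value 134, 7 classes) are borrowed from this reviewer's temperature example, and the sketch assumes whole-number data so that each boundary sits 0.5 beyond its class limit.

```python
import math
from collections import Counter
from itertools import accumulate

# Categorical frequency distribution: the 25 blood types from the example.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()
blood_counts = Counter(blood_types)  # tally of each category


def build_classes(low, high, num_classes):
    """Step 1: return (lower limit, upper limit, lower boundary, upper boundary) per class."""
    data_range = high - low                       # range = highest - lowest
    width = math.ceil(data_range / num_classes)   # divide, then round up
    # Assumption (not stated in the reviewer): widen by one if the last
    # class would otherwise stop short of the highest value.
    if low + num_classes * width - 1 < high:
        width += 1
    classes = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1
        classes.append((lower, upper, lower - 0.5, upper + 0.5))
    return classes


def grouped_frequencies(data, classes):
    """Steps 2 and 3: tally each class, then accumulate the counts."""
    freqs = [sum(lower <= x <= upper for x in data) for lower, upper, _, _ in classes]
    return freqs, list(accumulate(freqs))


classes = build_classes(low=100, high=134, num_classes=7)
sample = [100, 103, 107, 112, 112, 118, 121, 125, 129, 134]  # hypothetical values
freqs, cumulative = grouped_frequencies(sample, classes)

print(dict(blood_counts))
for (lower, upper, lb, ub), f, cf in zip(classes, freqs, cumulative):
    print(f"{lower}-{upper}  [{lb}, {ub}]  f={f}  cf={cf}")
```

With these settings the first class is 100-104 with boundaries 99.5 and 104.5, and the blood-type tally comes to A = 5, B = 7, O = 9, AB = 4.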
Number of classes
Some statisticians use the "2^k" rule:
2^k ≥ n
where k is the number of classes and n is the number of data values.
The 2^k rule is just a guide.
If the 2^k rule suggests you need 6 classes, also consider using 5 or 7 classes … but certainly not 3 or 9.
NOTE
The class limits should have the same decimal place value as the data, but the class boundaries should have one additional place value and end in a 5.

Example:
The data below represent the record high temperatures in degrees Fahrenheit (F) for each of the 50 cities in the Philippines this April. Construct a grouped frequency distribution for the data, using 7 classes.
Find the highest value and lowest value: H = 134 and L = 100.
Find the range: R = highest value – lowest value = H – L, so R = 134 – 100 = 34.
In this case, we will use 7 classes to construct the frequency distribution.
Find the class width by dividing the range by the number of classes: 34 / 7 ≈ 4.9, which rounds up to a width of 5.

Sometimes it is necessary to use a cumulative frequency distribution. A cumulative frequency distribution is a distribution that shows the number of data values less than or equal to a specific value (usually an upper boundary).

THE FREQUENCY DISTRIBUTION USING MS EXCEL
The FREQUENCY function can be found in the Formulas menu under the Statistical category by following these steps:
✔ Go to the Formulas menu.
✔ Click on More Functions.
✔ Under the Statistical category, choose the FREQUENCY function.
✔ The FREQUENCY function dialog box appears.
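For readers working outside Excel, the 2^k guide and the counting that the FREQUENCY function performs can be approximated in Python. This is a sketch under stated assumptions: `classes_by_2k_rule` and `frequency` are hypothetical helper names, the `temps` list is made-up data, and the binning follows Excel's documented behaviour for FREQUENCY (each bin counts the values greater than the previous bin value and up to and including its own, with one extra count for values above the last bin).

```python
from bisect import bisect_left


def classes_by_2k_rule(n):
    """Smallest k such that 2**k >= n (a guide only, as the reviewer notes)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def frequency(data, bins):
    """Count values per bin in the style of Excel's FREQUENCY; `bins` must be ascending."""
    counts = [0] * (len(bins) + 1)         # last slot: values above the highest bin
    for x in data:
        counts[bisect_left(bins, x)] += 1  # index of the first bin value >= x
    return counts


upper_limits = [104, 109, 114, 119, 124, 129, 134]  # upper class limits, width 5
temps = [100, 104, 105, 111, 118, 118, 127, 134]    # hypothetical readings

suggested_k = classes_by_2k_rule(50)
counts = frequency(temps, upper_limits)
print(suggested_k)
print(counts)
```

For n = 50 the rule suggests 6 classes, so the 7 classes used in the worked example fall within the "5 or 7" leeway described above.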
THE MEASURES OF CENTRAL TENDENCY
Measures of Central Tendency
Mean (Arithmetic Mean) of Data Values
Sample mean
Population mean
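In standard textbook notation (the symbols below are conventional and are not defined elsewhere in this reviewer), the two means can be written as follows, where n is the sample size and N is the population size:

```latex
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}
\quad\text{(sample mean)}
\qquad\qquad
\mu = \frac{\sum_{i=1}^{N} X_i}{N}
\quad\text{(population mean)}
```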
Mean (Arithmetic Mean)
The Most Common Measure of Central
Tendency
Affected by Extreme Values (Outliers)
Median
Robust Measure of Central Tendency
Not Affected by Extreme Values
In an Ordered Array, the Median is the ‘Middle’
Number
If n or N is odd, the median is the middle
number.
If n or N is even, the median is the
average of the 2 middle numbers.
Mode
A Measure of Central Tendency
Value that Occurs Most Often
Not Affected by Extreme Values
There May Not Be a Mode
There May Be Several Modes
Used for Either Numerical or Categorical
Data
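The contrast between the three measures can be checked with Python's standard `statistics` module. The data below are hypothetical and chosen only to show how a single extreme value pulls the mean while leaving the median and mode largely alone.

```python
import statistics

scores = [3, 5, 5, 7, 9, 11]     # hypothetical data
with_outlier = scores + [100]    # same data plus one extreme value

mean_plain = statistics.mean(scores)
mean_outlier = statistics.mean(with_outlier)
median_plain = statistics.median(scores)          # even n: average of the 2 middle numbers
median_outlier = statistics.median(with_outlier)  # odd n: the middle number
mode_outlier = statistics.mode(with_outlier)      # value that occurs most often

# Categorical data can have several modes; multimode returns all of them.
modes = statistics.multimode(["A", "B", "B", "A", "O"])

print(mean_plain, mean_outlier)
print(median_plain, median_outlier)
print(mode_outlier, modes)
```

Here the mean jumps from about 6.67 to 20 when the value 100 is added, while the median only moves from 6 to 7 and the mode stays at 5. When every value occurs exactly once, `multimode` returns every value, which corresponds to the "no mode" case.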
THE MEASURES OF CENTRAL TENDENCY
USING EXCEL
THE MEASURES OF LOCATION
Location or Position
Used to describe the position of a data
value in relation to the rest of the data.
Types:
1. Quartiles
Q1 – Lower Quartile
At most, 25% of data is smaller than Q1.
It divides the lower half of a data set in
half.
Q2 – Median
The median divides the data set in half.
50% of the data values fall below the
median and 50% fall above.
Q3 – Upper Quartile
At most, 25% of data is larger than Q3.
It divides the upper half of the data set in
half.
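The "divide each half in half" description of Q1 and Q3 can be sketched as follows. The helper name `quartiles` and the data are hypothetical, and this median-of-halves convention is only one of several in use; spreadsheet and library quartile functions often interpolate and may return slightly different values.

```python
import statistics


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[half + n % 2:]    # skips the middle value when n is odd
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )


values = [5, 6, 12, 13, 15, 18, 22, 50]  # hypothetical data
q1, q2, q3 = quartiles(values)
print(q1, q2, q3, q3 - q1)
```

For this data set the sketch gives Q1 = 9, Q2 = 14, and Q3 = 20; the final printed value, Q3 minus Q1, is 11.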
Interquartile Range
The interquartile range is Q3 – Q1.
About 50% of the observations in the distribution fall within the interquartile range.
[Figure: the relationship between the quartiles, the median, and the interquartile range]
2. Deciles
3. Percentiles
THE MEASURES OF LOCATION USING EXCEL
MEASURES OF VARIATION
Measuring Variability
Variability can be measured with
o the range
o the interquartile range
o the standard deviation/variance
o the coefficient of variation
In each case, variability is determined by
measuring distance.
The Range
The range is the total distance covered by the
distribution, from the highest score to the
lowest score.
Limitations of the Range
It is based only on two values and does
not cover all the data values in a data set.
It is subject to wide fluctuations from
sample to sample based on the same
population.
It fails to give any idea about the pattern
of distribution.
It is not possible to compute the range when the distribution is open-ended.
The Standard Deviation
Standard deviation measures the standard distance between a score and the mean.
The Variance
The population variance is the average of the squares of the distance each value is from the mean.
Properties of the Standard Deviation
If a constant is added to every score in a distribution, the standard deviation will not be changed.
If you visualize the scores in a frequency distribution histogram, then adding a constant will move each score so that the entire distribution is shifted to a new location.
The center of the distribution (the mean) changes, but the standard deviation remains the same.
Properties of the Standard Deviation (cont.)
If each score is multiplied by a constant,
the standard deviation will be multiplied
by the same constant.
Multiplying by a constant will multiply
the distance between scores, and because
the standard deviation is a measure of
distance, it will also be multiplied.
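The range, the population variance, and both properties of the standard deviation can be verified numerically. The sketch below uses Python's standard `statistics` module with a hypothetical data set; `pvariance` and `pstdev` are the population versions, which divide by the number of values.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5

data_range = max(scores) - min(scores)     # highest score minus lowest score
variance = statistics.pvariance(scores)    # average squared distance from the mean
std_dev = statistics.pstdev(scores)        # square root of the variance

shifted = [x + 10 for x in scores]   # add a constant to every score
scaled = [x * 3 for x in scores]     # multiply every score by a constant

shifted_mean = statistics.mean(shifted)
shifted_sd = statistics.pstdev(shifted)
scaled_sd = statistics.pstdev(scaled)

print(data_range, variance, std_dev)
print(shifted_mean, shifted_sd)
print(scaled_sd)
```

Adding 10 moves the mean from 5 to 15 but leaves the standard deviation at 2, while multiplying by 3 raises the standard deviation to 6.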
The Coefficient of Variation
The coefficient of variation, denoted by
CVar, is the standard deviation divided
by the mean. The result is expressed as a
percentage.
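In symbols, using conventional population notation (μ for the mean, σ for the standard deviation, N for the population size; these symbols are standard but are not defined elsewhere in this reviewer), the variance, standard deviation, and coefficient of variation described above are:

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N},
\qquad
\sigma = \sqrt{\sigma^2},
\qquad
\mathrm{CVar} = \frac{\sigma}{\mu} \cdot 100\%
```

For a sample, the same ratio is formed from the sample standard deviation and the sample mean.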
Chebyshev’s Theorem
This theorem states that:
✔ At least three-fourths or 75% of all data values
will fall within 2 standard deviations of the
mean.
✔ At least eight-ninths (about 88.9%) of all data values
will fall within 3 standard deviations of the
mean.
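Both figures are instances of the general bound, given here in its usual textbook form: for any k greater than 1, the proportion of values within k standard deviations of the mean is at least 1 − 1/k².

```latex
1 - \frac{1}{k^2}:
\qquad
k = 2 \;\Rightarrow\; 1 - \frac{1}{4} = \frac{3}{4} = 75\%,
\qquad
k = 3 \;\Rightarrow\; 1 - \frac{1}{9} = \frac{8}{9} \approx 88.9\%
```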
The Empirical Rule
When a distribution is bell-shaped (approximately normal):
✔ Approximately 68% of the data values will
fall within 1 standard deviation of the mean.
✔ Approximately 95% of the data values will
fall within 2 standard deviations of the mean.
✔ Approximately 99.7% of the data values will
fall within 3 standard deviations of the mean.
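The three percentages can be reproduced from the normal distribution itself. The sketch below is an illustration only, using `NormalDist` from Python's standard `statistics` module and assuming an exactly normal distribution, which real data only approximate.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within k standard deviations of the mean.
shares = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The computed shares are roughly 68.3%, 95.4%, and 99.7%, which the rule rounds to 68%, 95%, and 99.7%.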
MEASURES OF VARIATION using EXCEL