Frequency Distribution Table: Measure of Dispersion: Range, Variance, Standard Deviation
A frequency distribution is often used to organize raw data. It is a table that lists the observed events and
the frequency of occurrence of each event.
Illustration: Consider the following data, which list the number of laptop computers owned by the families in each of
40 homes in a subdivision.
2 0 3 1 2 1 0 4 2 1 1 7 2 0 1 1 0 2 2 1
3 2 2 1 1 4 2 5 2 3 1 2 2 1 2 1 5 0 2 5
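The raw counts above can be organized into a frequency distribution by tallying how often each value occurs. The following is a minimal Python sketch of that tally; the names `laptops` and `frequency_table` are chosen here for illustration and do not come from the original notes.

```python
from collections import Counter

# Raw data from the illustration: laptops owned in each of 40 homes.
laptops = [
    2, 0, 3, 1, 2, 1, 0, 4, 2, 1, 1, 7, 2, 0, 1, 1, 0, 2, 2, 1,
    3, 2, 2, 1, 1, 4, 2, 5, 2, 3, 1, 2, 2, 1, 2, 1, 5, 0, 2, 5,
]

def frequency_table(data):
    """Return (value, frequency) rows for every value from min to max."""
    counts = Counter(data)
    # Counter returns 0 for values that never occur (here, 6 laptops).
    return [(value, counts[value]) for value in range(min(data), max(data) + 1)]

table = frequency_table(laptops)
print("Laptops  Frequency")
for value, freq in table:
    print(f"{value:>7}  {freq:>9}")
print(f"{'Total':>7}  {sum(freq for _, freq in table):>9}")
```

The frequencies sum to 40, the number of homes, which is a quick check that no observation was dropped.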
Example: Find the range of the numbers of ounces dispensed by Machine 1 and Machine 2.
Another way to measure the dispersion of a data set is called the variance.
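Population Variance:
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample Variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
N = number of data values in the population
n = number of data values in the sample
$\mu$ = population mean
$\bar{x}$ = sample mean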
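The two variance formulas can be sketched in a few lines of Python. As an assumption for illustration, the sketch reuses the laptop counts from the frequency illustration, since the machine data for the range example is not reproduced in these notes. The function names are hypothetical, and the range is computed in the usual way as the largest value minus the smallest.

```python
import statistics

# Assumption: the laptop counts stand in for a data set here, because the
# machine data from the example is not available in these notes.
data = [
    2, 0, 3, 1, 2, 1, 0, 4, 2, 1, 1, 7, 2, 0, 1, 1, 0, 2, 2, 1,
    3, 2, 2, 1, 1, 4, 2, 5, 2, 3, 1, 2, 2, 1, 2, 1, 5, 0, 2, 5,
]

def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N, treating the values as a population."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """s^2 = sum((x - x_bar)^2) / (n - 1), treating the values as a sample."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

data_range = max(data) - min(data)  # range = largest value - smallest value
print("range:", data_range)
print("population variance:", population_variance(data))
print("sample variance:", sample_variance(data))

# Cross-check against the standard library implementations.
print(statistics.pvariance(data), statistics.variance(data))
```

Dividing by n - 1 instead of N makes the sample variance slightly larger than the population variance computed from the same values.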
Another way to measure the spread of a data set is the standard deviation.
Standard deviation is a measure of the dispersion of a data set from its mean. A large standard
deviation means that the data values are far from the mean, and a small standard deviation means that
they are close to the mean.
Population Standard Deviation:
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample Standard Deviation:
$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Example: Find the standard deviation of the two examples from the variance discussion.
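To see how the standard deviation reflects spread, the following sketch compares two small data sets with the same mean. Both data sets are hypothetical and are not the examples referred to above. The calls `statistics.pstdev` and `statistics.stdev` are the Python standard library functions for the population and sample standard deviation.

```python
import math
import statistics

# Hypothetical data sets (not from the notes): both have mean 10,
# but the second is spread much farther from that mean.
tight = [8, 9, 10, 11, 12]
wide = [0, 5, 10, 15, 20]

for name, values in (("tight", tight), ("wide", wide)):
    sigma = statistics.pstdev(values)  # population standard deviation (divide by N)
    s = statistics.stdev(values)       # sample standard deviation (divide by n - 1)
    print(f"{name}: mean={statistics.mean(values)}, sigma={sigma:.3f}, s={s:.3f}")

# The standard deviation is the square root of the corresponding variance.
assert math.isclose(statistics.pstdev(wide), math.sqrt(statistics.pvariance(wide)))
assert math.isclose(statistics.stdev(wide), math.sqrt(statistics.variance(wide)))
```

The second data set has the larger standard deviation, which matches the idea that its values lie farther from the mean.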
Measures of Relative Position:
z-scores:
Standard score: a derived value that expresses how far a given raw data value is from some reference
point, usually the mean, in units of standard deviation.
Examples: z-score, t-score, stanines
The z-score for a given data value, x, is the number of standard deviations that x is above or below the
mean of the data set.
Illustration: Consider the data values 6 and 20 in a data set whose mean is 12 and whose standard
deviation is 4. Find the z-scores of 6 and 20.
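Since a z-score counts standard deviations above or below the mean, it can be computed as (x - mean) / standard deviation. A minimal sketch for the illustration follows; the function name `z_score` is chosen here and is not from the notes.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

print(z_score(6, 12, 4))   # -1.5, so 6 is 1.5 standard deviations below the mean
print(z_score(20, 12, 4))  # 2.0, so 20 is 2 standard deviations above the mean
```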
kth Percentile:
A value x is called the kth percentile of a data set if k% of the data values are less than x
and (100 - k)% of the data values are greater than x.
e.g., Jay's score of 65 in a Math exam is at the 80th percentile, so 80% of the scores are below 65.
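One common convention, assumed here, computes the percentile of a value as the number of data values less than it, divided by the total number of values, times 100. The sketch below applies that convention to a hypothetical list of ten exam scores made up for illustration. The scores and the function name `percentile_of` are not from the notes.

```python
def percentile_of(x, data):
    """Percent of data values strictly less than x (one common convention)."""
    below = sum(1 for value in data if value < x)
    return 100 * below / len(data)

# Hypothetical exam scores, invented for this sketch.
scores = [40, 45, 50, 52, 55, 58, 60, 62, 65, 70]

print(percentile_of(65, scores))  # 80.0, so a score of 65 is at the 80th percentile
```

Other textbooks and software use slightly different percentile conventions, so results can differ for small data sets.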
NORMAL DISTRIBUTION
- a bell-shaped curve, called the normal curve, that is symmetric about a vertical line through the
mean of the data set. Nearly all of the area under the normal curve lies within 3 standard deviations
to the right and 3 standard deviations to the left of the mean.
The Normal curve has the following characteristics:
1. The curve is symmetric.
2. The value of the mean, median, and mode are the same.
3. The tails are asymptotic to the horizontal axis and extend indefinitely in both directions.
4. The area under the curve is 1 or the probability under the curve is 100%.
Empirical rule:
1. About 99.7% of the data will fall within 3 standard deviations of the mean.
2. About 95% of the data will fall within 2 standard deviations of the mean.
3. About 68% of the data will fall within 1 standard deviation of the mean.
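The percentages in the empirical rule can be checked numerically from the normal curve itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `area_within` is chosen here for illustration.

```python
from statistics import NormalDist

# Normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The areas come out to roughly 0.6827, 0.9545, and 0.9973, which the empirical rule rounds to 68%, 95%, and 99.7%.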
The Standard Normal Distribution
- a normal distribution that has a mean of 0 and a standard deviation of 1.
Regression Equation: 𝒚 = 𝒂 + 𝒃𝒙
- Used to predict the behavior of a variable
- Explains how much of the variation observed in the dependent variable y is accounted for by the independent variable x.
Given n data pairs (x, y), the values of a and b are found from
$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$
$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$
Example: Find the regression equation of the following ordered pairs: (1,25), (2,30), (3,32), (4,45),
(5,50).
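The formulas for a and b can be applied to these ordered pairs directly. The following is a minimal sketch; the function name `regression_coefficients` is chosen here for illustration. For this data the sums are $\sum x = 15$, $\sum y = 182$, $\sum x^2 = 55$, and $\sum xy = 611$.

```python
def regression_coefficients(pairs):
    """Return (a, b) for the line y = a + b*x from the summation formulas."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    denominator = n * sum_x2 - sum_x ** 2  # shared by both formulas
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

pairs = [(1, 25), (2, 30), (3, 32), (4, 45), (5, 50)]
a, b = regression_coefficients(pairs)
print(f"y = {a:.1f} + {b:.1f}x")  # y = 16.9 + 6.5x
```

By hand, b = (5(611) - (15)(182)) / (5(55) - 15^2) = 325 / 50 = 6.5 and a = ((182)(55) - (15)(611)) / 50 = 845 / 50 = 16.9.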