MMW 6 Data Management Part 3 Central Location Variability PDF
Measures of Central Tendency or Central Location are numerical values that tend to locate in some sense the
middle of a set of data when arranged in increasing or decreasing order. The term average is often associated
with these measures. The most important measures of central tendency are (1) the mean, (2) the median, and
(3) the mode.
MEAN, $\mu$ or $\bar{x}$
1. Arithmetic Mean – it is obtained by adding all the observations and dividing the sum by the number of
observations, thus it is called a computational average.
Population mean: If a set of data $x_1, x_2, \ldots, x_N$ represents a finite population of size $N$, then the population
mean $\mu$ is
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Sample Mean: If a set of data $x_1, x_2, \ldots, x_n$ represents a finite sample of size $n$, then the sample mean $\bar{x}$ is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example 1
Suppose you are to choose ten people who enter the campus and whose ages are as follows:
15 25 18 20 25 18 18 20 25 15
What is the mean age of this sample?
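The computation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch of the sample mean formula; the function name `arithmetic_mean` is an illustrative choice, not something from the handout.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Ages of the ten people in Example 1
ages = [15, 25, 18, 20, 25, 18, 18, 20, 25, 15]

mean_age = arithmetic_mean(ages)
print(mean_age)  # 199 / 10 = 19.9
```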
2. Weighted Mean – if the values $x_1, x_2, \ldots, x_k$ have assigned weights $w_1, w_2, \ldots, w_k$, respectively, then the
weighted mean is computed as follows:
$$\bar{x} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$
Example 2
The table provides the grades obtained by a student in the different criteria for grading and the
corresponding weight for each criterion. Find his weighted average.
Component                             Grade   Weight
Seatworks and Problem Sets            87      15%
Quizzes                               82      30%
Departmental Exam                     76      40%
Class Participation and Attendance    97      5%
Homework and Projects                 88      10%
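As a rough check of Example 2, the weighted mean formula can be sketched in Python as follows. The function name `weighted_mean` is hypothetical, and the weights are entered as whole percentages (they sum to 100), which is a convenience assumed here rather than a requirement of the formula.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Grades and weights (in percent) from the table in Example 2
grades = [87, 82, 76, 97, 88]
weights = [15, 30, 40, 5, 10]

weighted_average = weighted_mean(grades, weights)
print(weighted_average)  # 8170 / 100 = 81.7
```

Because the formula divides by the sum of the weights, the same result is obtained whether the weights are written as 15, 30, 40, 5, 10 or as 0.15, 0.30, 0.40, 0.05, 0.10 (up to floating-point rounding).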
MEDIAN, $\tilde{\mu}$ or $\tilde{x}$
- a value that divides the distribution into two equal parts (after arranging the values/scores in ascending or
descending order). As such, it is a positional average. The median is defined by
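$$\tilde{\mu} \text{ or } \tilde{x} =
\begin{cases}
x_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\[6pt]
\dfrac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} & \text{if } n \text{ is even}
\end{cases}$$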
Example 3
Find the median: (a) 12, 15, 18, 8, 9,10, 6; (b) 23, 18, 15, 12, 10, 9, 8, 6
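The positional definition of the median can be sketched in Python as shown here. The function name `median` is an illustrative choice; note that Python lists are indexed from 0, so position $(n+1)/2$ in the formula corresponds to index `n // 2` when $n$ is odd.

```python
def median(values):
    """Return the middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)  # arrange in ascending order first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


data_a = [12, 15, 18, 8, 9, 10, 6]           # Example 3(a), n = 7
data_b = [23, 18, 15, 12, 10, 9, 8, 6]       # Example 3(b), n = 8

print(median(data_a))  # 10
print(median(data_b))  # 11.0
```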
Mathematics in the Modern World
MODE, $\hat{\mu}$ or $\hat{x}$
- the value in the distribution with the highest frequency. It locates the point where the observation values
occur with the greatest density. It can be used for quantitative as well as qualitative data.
Example 4
Find the mode of the following data: 15 12 4 9 6 10 5 15
12 4 12 6 12 5 15 12 4 15 4 6 5
Evidently, a distribution can have no mode, one mode, or more than one mode. Thus, the mode is not a very
reliable measure of central tendency. However, there are instances when no other measure can be used
except the mode. In determining the prevalent gender, civil status, or highest educational attainment, only the
mode can be used because no numerical values can be assigned to these variables.
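A frequency count makes the "no mode, one mode, or more than one mode" cases concrete. The sketch below assumes that the two rows of numbers in Example 4 form a single data set, and it adopts the convention that a data set in which every distinct value occurs equally often has no mode; both are assumptions made for illustration. The function name `modes` and the civil-status list are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent value(s); empty if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: if several distinct values all tie, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


# Example 4 data, read as one data set (assumption)
scores = [15, 12, 4, 9, 6, 10, 5, 15,
          12, 4, 12, 6, 12, 5, 15, 12, 4, 15, 4, 6, 5]
print(modes(scores))  # [12]

# The mode also works for qualitative data (hypothetical responses)
civil_status = ["single", "married", "single", "widowed"]
print(modes(civil_status))  # ['single']
```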
Remarks
Mean:
1. All the scores or measurements are considered in the computation of the mean.
2. Very high or very low scores or measurements affect the mean.
Median:
1. Only the middle scores or measurements are considered in the computation of the median.
2. Very high or very low scores do not affect the median.
Mode:
1. It is very easy to compute but is seldom used because it is very unstable.
2. It is most appropriate for nominal-scale data, as a measure of popularity.
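The remarks on extreme scores can be illustrated with a small experiment. The sketch below uses the data of Example 3(a) and a modified copy in which the value 18 is replaced by 180; that replacement is a hypothetical outlier introduced only for this illustration. It relies on `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

scores = [12, 15, 18, 8, 9, 10, 6]           # data from Example 3(a)
with_outlier = [12, 15, 180, 8, 9, 10, 6]    # hypothetical: 18 replaced by 180

# The mean uses every value, so the outlier pulls it upward.
print(mean(scores), mean(with_outlier))

# The median depends only on the middle position, so it stays at 10.
print(median(scores), median(with_outlier))
```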
Exercises
1. Find the mean, median and mode of the following examination scores given in a stem-and-leaf plot.
Exam Scores
4 | 568
5 | 34569
6 | 2356699
7 | 01133455578
8 | 122369
2. The numbers of incorrect answers on a true or false competency test for a random sample of 15
students were recorded as follows: 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, and 2. Find a. mean, b. median, c.
mode.
3. A student had accumulated 20 credits with the grade of A, 25 credits with B’s, 10 credits with C’s, and 2
credits with D’s. The school uses the grading scale in which A = 4 grade points, B = 3, C = 2 and D = 1.
Determine the grade point average of the student.
4. A student was taking six subjects in college during the first semester. Find his average grade if his final
grades were as follows:
Subject   Math   Physics   English   Business   Statistics
Grade     1.75   2.50      2.25      1.50       3.0
Units     3      5         3         2          4
The measures of central tendency do not by themselves give an adequate description of the data. It is also
very important for us to know how the observations spread out from the average. The measures of variation
indicate the extent to which individual items in a series are scattered about the average. They are used to determine
the extent of the scatter so that steps may be taken to control the existing variation.
Both samples have the same mean, but the measurements for sample A are clearly more uniform; that is, its
values lie closer to one another than those of sample B.
RANGE
The range measures the distance between the largest and the smallest values and, as such, gives an idea of
the spread of the data set. However, the range does not use the concept of deviation. It is affected by outliers
but does not consider all values in the data set. Thus, it is not a very useful measure of variability.
𝑅𝑎𝑛𝑔𝑒 (𝑅) = 𝑚𝑎𝑥𝑖𝑚𝑢𝑚 𝑣𝑎𝑙𝑢𝑒 – 𝑚𝑖𝑛𝑖𝑚𝑢𝑚 𝑣𝑎𝑙𝑢𝑒
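Population Variance, $\sigma^2$: Given the finite population $x_1, x_2, \ldots, x_N$, the population variance, which is exact, is
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \quad \text{or} \quad \sigma^2 = \frac{N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}{N^2}$$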
Population Standard Deviation, $\sigma$: The square root of the population variance
$$\sigma = \sqrt{\frac{N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}{N^2}}$$
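Sample Variance, $s^2$: Given a random sample $x_1, x_2, \ldots, x_n$, the sample variance is
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$$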
Sample Standard Deviation, $s$: The square root of the sample variance
$$s = \sqrt{\frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}}$$
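The definitional form and the computational (shortcut) form of the variance are algebraically the same. A short derivation for the sample case is sketched here, using the fact that $\sum x_i = n\bar{x}$; the steps are standard algebra and are added only as a worked check of the two formulas.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
   = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n} \\[6pt]
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
  &= \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}
\end{align*}
```

The same steps with $N$ and $\mu$ in place of $n$ and $\bar{x}$, followed by division by $N$ instead of $n - 1$, give the population form with $N^2$ in the denominator.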
If the data are clustered around the mean, then the variance and the standard deviation will be relatively
small. If, however, the data are widely scattered about the mean, the variance and the standard deviation will
be relatively large.
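The range, variance, and standard deviation formulas can be sketched in Python as follows. To avoid working out any of the exercises, the sketch reuses the ten ages from Example 1; the function names are illustrative choices. The population variance uses the definitional form and the sample variance uses the computational form, so both versions of the formula appear once.

```python
import math


def data_range(values):
    """Range R = maximum value - minimum value."""
    return max(values) - min(values)


def population_variance(values):
    """Definitional form: sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Computational form: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


ages = [15, 25, 18, 20, 25, 18, 18, 20, 25, 15]  # data from Example 1

print(data_range(ages))                        # 25 - 15 = 10
print(population_variance(ages))               # about 13.69
print(math.sqrt(population_variance(ages)))    # about 3.70
print(sample_variance(ages))                   # 1369 / 90, about 15.21
print(math.sqrt(sample_variance(ages)))        # about 3.90
```

With floating-point data whose values are large relative to their spread, the computational form can lose precision through cancellation, so the definitional form is often the safer choice in code even though the shortcut is convenient by hand.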
Notes:
1. We divide by the quantity 𝑛 − 1 in order to make the sample variance an unbiased estimator of the
population variance. (An estimator is unbiased if its average value is equal to the parameter it is
estimating.)
2. The unit of the standard deviation is the same as that of the raw data, so it is preferable to use the
standard deviation as a measure of variability instead of the variance.
3. The range is a quick but rough measure of variation, since it considers only the highest value and the
lowest value of the observations.
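Several of the exercises ask for the coefficient of variation, which is not defined in this section. The sketch below assumes the usual definition, the sample standard deviation expressed as a percentage of the sample mean, $CV = \frac{s}{\bar{x}} \times 100\%$; this definition is supplied here as an assumption, and the function name is hypothetical. It uses `mean` and `stdev` (the sample standard deviation, with divisor $n - 1$) from Python's standard `statistics` module, applied to the ages from Example 1.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the sample mean."""
    x_bar = mean(values)
    if x_bar == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is zero")
    return stdev(values) / x_bar * 100


ages = [15, 25, 18, 20, 25, 18, 18, 20, 25, 15]  # data from Example 1

cv = coefficient_of_variation(ages)
print(round(cv, 1))  # about 19.6 (percent)
```

Because the coefficient of variation has no units, it can be used to compare the variability of data sets measured on different scales or with different means, which is what the exercises comparing two groups call for.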
Exercises
1. A sample of seven taxicabs from a large fleet of taxicabs used the following amounts of gasoline in one
day: 10.9, 19.3, 14.7, 13.8, 15.3, 11.4, and 12.6 gallons. Compute the range, variance, and standard
deviation of the sample data.
2. The manager of a small dry cleaner employs six people. As part of their personnel file, she asked each
one to record to the nearest one-tenth of a mile the distance they travel one way from home to work.
The six distances are listed below: 17.6, 22.9, 29.8, 29.7, 12.2, and 15.8. Determine the range and the
standard deviation.
3. The numbers of minutes spent in the computer lab by a sample of 20 students working on a project are
given below. Find the mean, range, variance, standard deviation, and coefficient of variation for this
sample.
Numbers of Minutes
30 | 0 2 5 5 6 6 6 8
40 | 0 2 2 5 7 9
50 | 0 1 3 5
60 | 1 3
4. Find the mean, range, variance, standard deviation, and coefficient of variation for the following data
set given in a stem-and-leaf plot.
4 | 568
5 | 34569
6 | 2356699
7 | 01133455578
8 | 12369
9 | 3578
5. The following scores represent the final examination scores for a business statistics course:
23 60 79 32 57 74 52 70 82 36
80 77 81 95 41 65 92 85 55 76
52 10 64 75 78 25 80 98 81 67
41 71 83 54 64 72 88 62 74 43
60 78 89 76 84 48 84 90 15 79
34 67 17 82 69 74 63 80 85 61
Compute the mean, variance, standard deviation and coefficient of variation of the data.
6. The following are the gains and losses (in thousands of pesos) of two commodities for 10 business days.
Commodity 1: 6 4 2 -3 4 0 -2 5 4 5
Commodity 2: 3 2 0 -1 -4 3 5 6 5 5
a.) Calculate the mean and standard deviation of each of the samples.
b.) Which commodity shows the more consistent performance?
7. A written test administered to 2 sections of Statistics classes gave the following mean and standard
deviation.
                     Section A   Section B
Mean                 60          76
Standard Deviation   10          12
Determine which of the 2 sections has greater variability of scores.
8. The mean stature of college women is 5’2” with standard deviation of 2.5” while their mean weight is
105 lbs. with a standard deviation of 8 lbs. Which is more variable, height or weight of college women?
9. A study of the effects of smoking on sleep patterns is conducted. The measure observed is the time, in
minutes, that it takes to fall asleep. These data are obtained:
Smokers: 69.3, 56.0, 22.1, 47.6, 53.2, 48.1, 52.7, 34.4, 60.2, 43.8, 23.2, 13.8
Nonsmokers: 28.6, 25.1, 26.4, 34.9, 29.8, 28.4, 38.5, 30.2, 30.6, 31.8, 41.6, 21.1, 36.0, 37.9, 13.9
a. Find the sample mean for each group.
b. Find the sample standard deviation for each group.
c. Find the coefficient of variation for each group.
d. Comment on what kind of impact smoking appears to have on the time required to fall asleep.
10. The weights of 10 boxes of a certain brand of cereal have a mean content of 278 grams with a
standard deviation of 9.64 grams. If these boxes were purchased at 10 different stores and the average
price per box is $1.29 with a standard deviation of $0.09, can you conclude that the weights are
relatively more homogeneous than the prices?