GE 4 – MATHEMATICS IN THE MODERN WORLD
TOPIC 2: STATISTICS
STATISTICS - a branch of science that deals with the development of methods for collecting, presenting,
analyzing, and interpreting data more effectively.
BRANCHES OF STATISTICS
o Descriptive statistics is concerned with summarizing and describing numerical data through the use of
tables, graphs, and charts.
o Inferential statistics is concerned with generalizing information or making inferences about a population
based on an observed sample.
MEASURES OF CENTRAL TENDENCY
Central Tendency
- It is a single value that is used to identify the center of the data and is considered as the typical value in a set of
scores.
- It may also be called a center or location in a distribution.
- It tends to lie near the center when the observations (data) are arranged from lowest to highest or vice versa.
- The three measures of central tendency are: Mean, Mode and Median
1. MEAN ( 𝒙̅)
- It is also called the arithmetic mean or average.
- It is found by taking the sum of all the data values and then dividing the result by the number of values. It can be
affected by extreme scores.
- It is stable, and varies less from sample to sample.
- It is used if the most reliable measure is desired and when the data contain only a few very high or very
low values.
- The mean is the balance point of a score distribution.
Mean of Ungrouped Data
$\text{Mean} = \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n}$
EXAMPLE:
1) Suppose we are given the ages (in months) of ten rabbits owned by Rita as follows: 3, 8, 5, 11, 13, 9, 5, 3, 11,
5. Determine the mean age of the ten rabbits.
Solution: Use the formula:
$\text{Mean} = \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n}$
$\bar{x} = \frac{3 + 8 + 5 + 11 + 13 + 9 + 5 + 3 + 11 + 5}{10} = \frac{73}{10} = 7.3$
2) Seven friends in a chemistry class of 40 students received test grades of 92, 84, 65, 76, 88, 89 and 90. Find the
mean of these test scores.
Solution: Use the formula:
$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n}$
$\bar{x} = \frac{92 + 84 + 65 + 76 + 88 + 89 + 90}{7} = \frac{584}{7} \approx 83.43$
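The two mean examples above can be checked with a minimal Python sketch. The function name `mean` and the variable names are illustrative choices, not part of the handout.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

rabbit_ages = [3, 8, 5, 11, 13, 9, 5, 3, 11, 5]   # ages in months (Example 1)
test_grades = [92, 84, 65, 76, 88, 89, 90]        # test grades (Example 2)

print(mean(rabbit_ages))             # 7.3
print(round(mean(test_grades), 2))   # 83.43
```

Rounding is applied only when printing, so the full-precision mean remains available for later calculations.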
2. MEDIAN
- It is the middle number or the mean of the two middle numbers in a list of numbers that have been arranged in
ascending order (from smallest to largest) or descending order (from largest to smallest).
Any list of numbers that is arranged in numerical order from smallest to largest or from largest to smallest
is called a ranked list.
- When the data set has an odd number of values, the median is the middle value.
- When the number of values is even, the median is the average of the two middle values.
EXAMPLES: Find the median in the following lists.
1) 4, 8, 1, 14, 9, 21, 12
Solution: Arrange the data values in ascending order
1, 4, 8, 9, 12, 14, 21
Since there are seven data values, the middlemost value is the 4th data value which is 9.
𝑀𝑒𝑑𝑖𝑎𝑛 = 9
2) 46, 23, 92, 89, 77, 108
Solution: Arrange the data values in ascending order
23, 46, 77, 89, 92, 108
Since the data set has six (even) data values, we get the median by getting the average of the two
middle values. The middle values are 77 and 89
$\text{Median} = \frac{77 + 89}{2} = \frac{166}{2} = 83$
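The odd and even cases of the median can be sketched in Python as follows. This is one possible implementation under the definition given in this section; the function name `median` is an illustrative choice.

```python
def median(values):
    """Middle value of the ranked list, or the mean of the two middle values."""
    ranked = sorted(values)          # arrange in ascending order
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                   # odd count: one middle value
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2   # even count: average the two

print(median([4, 8, 1, 14, 9, 21, 12]))    # 9
print(median([46, 23, 92, 89, 77, 108]))   # 83.0
```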
3. MODE
- The mode of a list of numbers is the number that occurs most frequently (largest frequency).
A data set with one mode is described as unimodal.
It is also possible for a set of values to have more than one mode. A data set with two modes is described
as bimodal, while one with many modes is called multimodal.
- It is used when a quick, rough estimate of the central value is needed.
EXAMPLES: Find the mode in the following lists.
1) 18, 15, 21, 16, 15, 14, 15, 21
Solution: Look for the data that occurs most frequently. The mode is 15 because it appeared 3 times in the
list.
2) 2, 5, 8, 9, 11, 4, 7, 23
Solution: There is no mode since all data values appeared once in the list.
3) 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 8
Solution: Look for the data that occurs most frequently. The mode is 3 because it appeared 5 times in the list.
4) 12, 34, 12, 71, 48, 93, 71
Solution: Look for the data that occurs most frequently. The modes are 12 and 71 because they both appeared
twice in the list. Hence, the list is bimodal (two modes).
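The four mode examples above can be reproduced with a short Python sketch. It assumes the convention used in Example 2, where a list in which every value appears exactly once has no mode; the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value that has the largest frequency.

    Assumption (from Example 2): if each value appears only once,
    the list is treated as having no mode and [] is returned.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([18, 15, 21, 16, 15, 14, 15, 21]))        # [15]
print(modes([2, 5, 8, 9, 11, 4, 7, 23]))              # []
print(modes([3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 8]))       # [3]
print(modes([12, 34, 12, 71, 48, 93, 71]))            # [12, 71]
```

Returning a list rather than a single number lets the same function describe unimodal, bimodal, and multimodal data.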
MEASURES OF VARIABILITY/DISPERSION
- It represents the degree of scatter shown by observations or the inherent variability in a phenomenon under
observation.
- This is also used to determine how varied, dispersed, scattered, or distant or how close, clustered, or near the
performances of the members of the group are.
- It also describes the heterogeneity and homogeneity of the group.
- They are accurate indicators of consistency and quality of the data/scores
Table 1. Soda dispensed in ounces (oz)
              Machine 1    Machine 2
              9.52         8.01
              6.41         7.99
              10.07        7.95
              5.85         8.03
              8.15         8.02
$\bar{x}$     8.0          8.0
Table 1 above shows the amount of soda dispensed by Machines 1 and 2. The means for both machines are the
same (8.0 ounces), but the amounts dispensed by Machine 1 are inconsistent: sometimes the soda overflows the
cup, and sometimes too little soda is dispensed. The machine needs adjusting. Machine 2, on the other hand,
shows little variation in the amount of soda dispensed, so it works just fine. This shows that measures of
variability or dispersion are useful in comparing data sets that have the same mean.
1) RANGE
- It is the simplest measure of variability.
- The range is the difference between the highest and the lowest values in a data set.
- Consider the range a rough estimate of variability, because it depends on only two values: the highest and
the lowest.
𝑅𝑎𝑛𝑔𝑒 = 𝐻𝑖𝑔ℎ𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒 − 𝐿𝑜𝑤𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒
EXAMPLES:
1) Find the range of the numbers of ounces dispensed by Machine 1 in Table 1.
Solution: Use the formula:
𝑅𝑎𝑛𝑔𝑒 = 𝐻𝑖𝑔ℎ𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒 − 𝐿𝑜𝑤𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒 = 10.07 − 5.85 = 𝟒. 𝟐𝟐 𝒐𝒛.
2) Find the range of the given data set: -8, -5, -12, -1, 4, 7, 11
Solution: Use the formula:
𝑅𝑎𝑛𝑔𝑒 = 𝐻𝑖𝑔ℎ𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒 − 𝐿𝑜𝑤𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒 = 11 − (−12) = 11 + 12 = 𝟐𝟑
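A minimal Python sketch of the range formula, applied to the Table 1 data and to Example 2. The name `value_range` is an illustrative choice (it avoids shadowing Python's built-in `range`).

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

machine_1 = [9.52, 6.41, 10.07, 5.85, 8.15]   # ounces, from Table 1
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]    # ounces, from Table 1

print(round(value_range(machine_1), 2))            # 4.22
print(round(value_range(machine_2), 2))            # 0.08
print(value_range([-8, -5, -12, -1, 4, 7, 11]))    # 23
```

The much smaller range for Machine 2 matches the earlier observation that it dispenses soda more consistently, even though both machines have the same mean.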
2) THE STANDARD DEVIATION (𝝈)
- It is the most practical and most commonly used measure of variation.
- The standard deviation is a measure of how widely values are dispersed from the mean.
- It describes the homogeneity and the heterogeneity of the variables in the distribution.
- It is a more stable measure of variation because it involves all of the scores in a distribution.
- Formula: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
- A low standard deviation means that most of the numbers are very close to the mean/average.
- A high standard deviation means that the numbers are spread out, that is, the distribution is more dispersed.
EXAMPLES:
1) Find the standard deviation of the following numbers: 2, 4, 7, 12, 15.
Solution: Solve first for the mean
$\bar{x} = \frac{2 + 4 + 7 + 12 + 15}{5} = \frac{40}{5} = 8$
Next, calculate the deviation between the number and the mean for each number.
$x$     $x - \bar{x}$
2       2 − 8 = −6
4       4 − 8 = −4
7       7 − 8 = −1
12      12 − 8 = 4
15      15 − 8 = 7
Then, calculate the square of each of the deviations in the previous step, and find the sum of these
squared deviations.
$x$     $x - \bar{x}$     $(x - \bar{x})^2$
2       2 − 8 = −6        (−6)² = 36
4       4 − 8 = −4        (−4)² = 16
7       7 − 8 = −1        (−1)² = 1
12      12 − 8 = 4        (4)² = 16
15      15 − 8 = 7        (7)² = 49
Sum of the squared deviations = 118
Finally, use the formula:
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{118}{5}} \approx 4.86$
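The steps of this example (mean, deviations, squared deviations, square root of their average) can be sketched in Python as below. The function name `population_std` is an illustrative choice; it divides by n, as the formula in this section does.

```python
import math

def population_std(values):
    """Standard deviation using the divide-by-n formula from this section."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)

data = [2, 4, 7, 12, 15]
print(round(population_std(data), 2))   # 4.86
```

Python's standard library offers `statistics.pstdev`, which uses the same divide-by-n definition and can serve as a cross-check.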
2) A consumer group has tested 8 size-D batteries from each of 3 companies. The results of the tests are shown
in the following table. According to these tests, which company produces batteries for which the values
representing hours of constant use have the smallest standard deviation?
Company         Hours of constant use per battery                Mean ($\bar{x}$)
EverSoBright    6.2   6.4   7.1   5.9   8.3   5.3   7.5   9.3    7 h
Dependable      6.8   6.2   7.2   5.9   7.0   7.4   7.3   8.2    7 h
Beacon          6.1   6.6   7.3   5.7   7.1   7.6   7.1   8.5    7 h
Solution: Compute the standard deviation for each company.
For EverSoBright
$x$     $x - \bar{x}$       $(x - \bar{x})^2$
6.2     6.2 − 7 = −0.8      (−0.8)² = 0.64
6.4     6.4 − 7 = −0.6      (−0.6)² = 0.36
7.1     7.1 − 7 = 0.1       (0.1)² = 0.01
5.9     5.9 − 7 = −1.1      (−1.1)² = 1.21
8.3     8.3 − 7 = 1.3       (1.3)² = 1.69
5.3     5.3 − 7 = −1.7      (−1.7)² = 2.89
7.5     7.5 − 7 = 0.5       (0.5)² = 0.25
9.3     9.3 − 7 = 2.3       (2.3)² = 5.29
Sum of the squared deviations = 12.34
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{12.34}{8}} \approx 1.24$
For Dependable
$x$     $x - \bar{x}$       $(x - \bar{x})^2$
6.8     6.8 − 7 = −0.2      (−0.2)² = 0.04
6.2     6.2 − 7 = −0.8      (−0.8)² = 0.64
7.2     7.2 − 7 = 0.2       (0.2)² = 0.04
5.9     5.9 − 7 = −1.1      (−1.1)² = 1.21
7.0     7.0 − 7 = 0         (0)² = 0
7.4     7.4 − 7 = 0.4       (0.4)² = 0.16
7.3     7.3 − 7 = 0.3       (0.3)² = 0.09
8.2     8.2 − 7 = 1.2       (1.2)² = 1.44
Sum of the squared deviations = 3.62
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{3.62}{8}} \approx 0.67$
For Beacon
$x$     $x - \bar{x}$       $(x - \bar{x})^2$
6.1     6.1 − 7 = −0.9      (−0.9)² = 0.81
6.6     6.6 − 7 = −0.4      (−0.4)² = 0.16
7.3     7.3 − 7 = 0.3       (0.3)² = 0.09
5.7     5.7 − 7 = −1.3      (−1.3)² = 1.69
7.1     7.1 − 7 = 0.1       (0.1)² = 0.01
7.6     7.6 − 7 = 0.6       (0.6)² = 0.36
7.1     7.1 − 7 = 0.1       (0.1)² = 0.01
8.5     8.5 − 7 = 1.5       (1.5)² = 2.25
Sum of the squared deviations = 5.38
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{5.38}{8}} \approx 0.82$
The batteries from Dependable have the lowest standard deviation. This means that the Dependable
company produces the most consistent batteries with regard to life expectancy under constant use.
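The comparison of the three companies can be sketched in Python using the standard library's `statistics.pstdev`, which applies the same divide-by-n formula. The dictionary layout and variable names are illustrative assumptions.

```python
import statistics

# Hours of constant use per battery, taken from the table in this example.
hours = {
    "EverSoBright": [6.2, 6.4, 7.1, 5.9, 8.3, 5.3, 7.5, 9.3],
    "Dependable":   [6.8, 6.2, 7.2, 5.9, 7.0, 7.4, 7.3, 8.2],
    "Beacon":       [6.1, 6.6, 7.3, 5.7, 7.1, 7.6, 7.1, 8.5],
}

# Standard deviation (divide by n) for each company.
std_by_company = {name: statistics.pstdev(vals) for name, vals in hours.items()}

for name, sd in std_by_company.items():
    print(f"{name}: {sd:.2f}")

# The company with the smallest standard deviation is the most consistent.
most_consistent = min(std_by_company, key=std_by_company.get)
print(most_consistent)   # Dependable
```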
3) VARIANCE ($\sigma^2$)
- The variance for a given set of data is the square of the standard deviation of the data.
$\text{Variance} = (\text{standard deviation})^2$
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$
EXAMPLES:
1) Find the variance of the data: 3, 6, 2, 9, 5
Solution: First, solve for the mean
$\bar{x} = \frac{3 + 6 + 2 + 9 + 5}{5} = \frac{25}{5} = 5$
Next, calculate the deviation between the number and the mean for each number, $x - \bar{x}$, the square
of each of the deviations, $(x - \bar{x})^2$, and the sum of these squared deviations, $\sum (x - \bar{x})^2$.
$x$     $x - \bar{x}$     $(x - \bar{x})^2$
3       3 − 5 = −2        (−2)² = 4
6       6 − 5 = 1         (1)² = 1
2       2 − 5 = −3        (−3)² = 9
9       9 − 5 = 4         (4)² = 16
5       5 − 5 = 0         (0)² = 0
Sum of the squared deviations = 30
Then, use the formula:
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{30}{5} = 6$
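A minimal Python sketch of the variance calculation, which also checks the stated relationship that the variance is the square of the standard deviation. The function name `population_variance` is an illustrative choice.

```python
import math

def population_variance(values):
    """Variance using the divide-by-n formula from this section."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

data = [3, 6, 2, 9, 5]
variance = population_variance(data)
std = math.sqrt(variance)            # standard deviation is the square root

print(variance)                      # 6.0
print(round(std, 2))                 # 2.45
```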
MEASURES OF RELATIVE POSITION
1. z-Scores
- The z-Score for a given data value x is the number of standard deviations that x is above (if the z-Score is positive)
or below (if the z-Score is negative) the mean of the data.
$z_x = \frac{x - \bar{x}}{\sigma}$
EXAMPLES:
1) Raul has taken two tests in his chemistry class. He scored 72 on the first test, for which the mean of all scores
was 65 and the standard deviation was 8. He received a 60 on a second test, for which the mean of all scores
was 45 and the standard deviation was 12. In comparison to the other students, did Raul do better on the first
test or the second test?
Solution: Find the z-Score for each test
For the first test:
$z_{72} = \frac{72 - 65}{8} = \frac{7}{8} = 0.875$
For the second test:
$z_{60} = \frac{60 - 45}{12} = \frac{15}{12} = 1.25$
Raul scored 0.875 standard deviations above the mean on the first test and 1.25 standard deviations above
the mean on the second test. These z-scores indicate that, in comparison to his classmates, Raul scored
better on the second test than he did on the first test.
2) A consumer group tested a sample of 100 light bulbs. It found that the mean life expectancy of the bulbs was
842 h, with a standard deviation of 90. One particular light bulb from the DuraBright Company had a z-score
of 1.2. What was the life span of this light bulb?
Solution: Use the z-Score formula to find x (life span of the light bulb)
$z_x = \frac{x - \bar{x}}{\sigma}$
$1.2 = \frac{x - 842}{90}$
$x = 1.2(90) + 842 = 950$
The light bulb has a life span of 950 hours.
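Both z-score examples can be sketched in Python: one function computes z from a data value, and the other rearranges the same formula to recover the data value from z. The function names `z_score` and `value_from_z` are hypothetical.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std

def value_from_z(z, mean, std):
    """Solve z = (x - mean) / std for x."""
    return z * std + mean

# Example 1: Raul's two chemistry tests.
print(z_score(72, 65, 8))     # 0.875
print(z_score(60, 45, 12))    # 1.25

# Example 2: light bulb with z = 1.2, mean 842 h, standard deviation 90 h.
print(round(value_from_z(1.2, 842, 90), 1))   # 950.0
```

The result for the light bulb is rounded because 1.2 is not stored exactly as a binary floating-point number.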
2. Percentiles
- Percentiles are measures that indicate the value below which a given percentage of the observations in a
group fall.
For example, the 20th percentile is the value (or score) below which 20% of the observations may be found.
Also, 80% of the observations are found above the 20th percentile.
$\text{Percentile of score } x = \frac{\text{number of data values less than } x}{\text{total number of data values}} \cdot 100$
EXAMPLES:
1) According to the U.S. Department of Labor, as of May 2009 the median annual salary for a physical therapist
was $74,480. If the 90th percentile for the annual salary of a physical therapist was $105,900, find the percent
of physical therapists whose annual salary was
a. more than $74,480.
b. less than $105,900.
c. between $74,480 and $105,900.
Solution:
a. By definition, the median is the 50th percentile. Therefore, 50% of the physical therapists earned
more than $74,480 per year.
b. Because $105,900 is the 90th percentile, 90% of all physical therapists made less than $105,900.
c. From parts a and b, 90% - 50% = 40% of the physical therapists earned between $74,480 and
$105,900.
2) On a reading examination given to 900 students, Elaine’s score of 602 was higher than the scores of 576 of
the students who took the examination. What is the percentile for Elaine’s score?
Solution: Use the formula for finding percentiles
$\text{Percentile of score } x = \frac{\text{number of data values less than } x}{\text{total number of data values}} \cdot 100$
$\text{Percentile of Elaine's score} = \frac{576}{900} \cdot 100 = 64$
Elaine’s score of 602 places her at the 64th percentile.
NOTE: This does not mean that Elaine answered 64% of the examination questions correctly. A 64th
percentile score means that Elaine scored higher than 64% of the students who took the test.
Consequently, Elaine scored lower than 36% (100% - 64%) of the students who took the test. The same
concept applies to your USMCEE percentile rank results.
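The percentile formula can be sketched in Python as follows. The function takes the two counts directly, since the example gives counts rather than the full list of scores; the name `percentile_of_score` is an illustrative choice.

```python
def percentile_of_score(count_below, total):
    """Percent of data values that are less than the given score."""
    return 100 * count_below / total

# Elaine scored higher than 576 of the 900 students.
elaine = percentile_of_score(576, 900)
print(elaine)          # 64.0
print(100 - elaine)    # 36.0, the percent of students she scored lower than
```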
3. Quartiles
- These are three numbers Q1, Q2, and Q3 that partition a ranked data set into four (approximately) equal groups.
- The quartile Q1 is called the first quartile. The quartile Q2 is called the second quartile. The second quartile is the
median of the data. The quartile Q3 is called the third quartile.
The Median Procedure for Finding Quartiles
1. Rank the data in ascending order (lowest to highest).
2. Find the median of the data. This is the second quartile, Q2.
3. The first quartile, Q1, is the median of the data values less than Q2. The third quartile, Q3, is the median of
the data values greater than Q2.
EXAMPLE:
1) Find the quartiles of the ranked data: 2, 5, 5, 8, 11, 12, 19, 22, 23, 29, 31, 45, 83, 91, 104, 159, 181, 312, 354
Solution: Find the second quartile, Q2. This is the median of the data. There are 19 values, so the median
is the 10th value.
2, 5, 5, 8, 11, 12, 19, 22, 23, [29], 31, 45, 83, 91, 104, 159, 181, 312, 354
Q2 = 29
Next, find the first quartile Q1. It is the median of the 9 data values lower than Q2, which is the 5th of
those values.
2, 5, 5, 8, [11], 12, 19, 22, 23
Q1 = 11
Lastly, find the third quartile Q3. It is the median of the 9 data values higher than Q2, which is the 5th of
those values.
31, 45, 83, 91, [104], 159, 181, 312, 354
Q3 = 104
The quartiles of the data are: Q1 = 11, Q2 = 29, and Q3 = 104.
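The median procedure for finding quartiles can be sketched in Python as below. It follows the wording of step 3 literally, taking Q1 and Q3 from the values strictly less than and strictly greater than Q2; the function names are hypothetical, and other textbooks and software use slightly different quartile conventions.

```python
def median(ranked):
    """Median of a list that is already in ascending order."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median procedure described above."""
    ranked = sorted(values)                      # step 1: rank the data
    q2 = median(ranked)                          # step 2: median is Q2
    lower = [x for x in ranked if x < q2]        # values less than Q2
    upper = [x for x in ranked if x > q2]        # values greater than Q2
    return median(lower), q2, median(upper)      # step 3: Q1 and Q3

data = [2, 5, 5, 8, 11, 12, 19, 22, 23, 29, 31, 45,
        83, 91, 104, 159, 181, 312, 354]
print(quartiles(data))   # (11, 29, 104)
```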