STATISTICS (Averages and Variation)
PART 1
MODE - for discrete data, the mode is the value that occurs most often; there may be
one, two, or even three modes. For continuous data, it is (they are) the peak(s) of the
distribution.
Example:
✔ 1,1,2,2,2,3,4,5,6,6 mode = 2
✔ -1,-1,0,0,0,1,2,3,3,4,4,4 mode = 0, 4
✔ 5,6,8,10,12,15,20 no mode
✔ 8,8,9,9,10,10,11,11,12,12 no mode
Advantages:
⮚ Easy to find
⮚ Not sensitive to extreme values
⮚ Only measure of central tendency for categorical data
Disadvantages:
⮚ Only uses some of the data
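The mode rule above, including the "no mode" case, can be sketched in Python. This is only a minimal sketch: the function name `modes` is hypothetical, and treating a data set in which every distinct value occurs equally often as having no mode is a convention taken from the examples in these notes.

```python
from collections import Counter


def modes(data):
    """Return the sorted list of modes; an empty list means "no mode"."""
    counts = Counter(data)
    top = max(counts.values())
    # Convention assumed from the notes: if every distinct value occurs
    # equally often (and there is more than one distinct value), no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == top)


print(modes([1, 1, 2, 2, 2, 3, 4, 5, 6, 6]))          # [2]
print(modes([-1, -1, 0, 0, 0, 1, 2, 3, 3, 4, 4, 4]))  # [0, 4]
print(modes([5, 6, 8, 10, 12, 15, 20]))               # []
print(modes(["red", "blue", "red", "green"]))         # ['red']
```

The last call uses made-up categorical data to show that the mode still works when the values are not numbers.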
MEDIAN - the central value of an ordered distribution: half of the data is below the
median and half of the data is above the median
1. Order the data from smallest to largest
2. For an odd number of values, the median is the middle value
3. For an even number of values, the median is the average of the two middle values
Example:
5,6,6,8,10 median = 6
5,6,8,10,12,14 median = 9 ((8 + 10)/2)
For large data sets, it is handy to know that the position of the median is (n + 1)/2
Advantage:
⮚ Not sensitive to extreme values
Disadvantage:
⮚ Only includes one or two data values
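The three steps for the median, and the (n + 1)/2 position rule, can be sketched as follows. The function names are hypothetical; Python's standard `statistics.median` does the same job and is the better choice in practice.

```python
def median(data):
    """Median by the steps above: sort, then take the middle value(s)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two


def median_position(n):
    """1-based position of the median in the ordered data: (n + 1) / 2."""
    return (n + 1) / 2


print(median([5, 6, 6, 8, 10]))       # 6
print(median([5, 6, 8, 10, 12, 14]))  # 9.0
print(median_position(6))             # 3.5, halfway between the 3rd and 4th values
print(median([5, 6, 6, 8, 1000]))     # 6, the extreme value has no effect
```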
MEAN - the average value. For discrete data:
mean = (sum of all values) / (number of values)
Sample mean: x̄ = Σx / n
Population mean: μ = Σx / N
Where:
● n is the sample size
● N is the population size
Example:
1,1,1,2,2,3,3,4,5,5 mean = 2.7 (27/10)
1,1,1,2,2,3,3,4,5,100 mean = 12.2 (122/10)
Advantages:
⮚ Every data value is used
⮚ Reliable: means of samples from the same population do not vary much (relatively
speaking)
Disadvantage:
⮚ Sensitive to extreme values
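A short sketch of the mean, using the two data sets from the example, shows how a single extreme value moves the mean while the median stays put. The helper name `mean` is hypothetical; `statistics.mean` is the standard-library equivalent.

```python
from statistics import median


def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


data = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]
with_outlier = [1, 1, 1, 2, 2, 3, 3, 4, 5, 100]

print(mean(data))          # 2.7
print(mean(with_outlier))  # 12.2
# The median is the same for both data sets.
print(median(data), median(with_outlier))  # 2.5 2.5
```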
Example:
Calculate a 5% trimmed mean
1,1,2,3,4,4,5,5,5,6,6,6,6,7,7,8,9,10,18
n = 19
5% of 19 = 0.95
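The example above can be carried through under an assumed convention: 5% of 19 is 0.95, which is rounded to the nearest whole number (1), so one value is dropped from each end of the ordered data and the remaining 17 values are averaged. The rounding rule and the function name `trimmed_mean` are assumptions, not something these notes state.

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent` % of the values from each end."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumed convention: round percent-of-n to the nearest whole number.
    k = int(n * percent / 100 + 0.5)
    trimmed = ordered[k:n - k]
    return sum(trimmed) / len(trimmed)


data = [1, 1, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 9, 10, 18]

print(round(sum(data) / len(data), 2))    # 5.95, the ordinary mean
print(round(trimmed_mean(data, 5), 2))    # 5.53, with 1 and 18 removed
```

Dropping the two end values pulls the average down because the largest value, 18, sits far from the rest of the data.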
WEIGHTED MEAN - gives more weight or importance to some values, as with grades
Example:
You want to know your grade in statistics before the final exam. You currently have a
homework (20%) grade of 92, three test grades (12% each) of 100, 85, and 96, and a
participation grade (20%) of 98.
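One way to work the example is sketched below. The weights given so far add up to 76% (presumably the final exam carries the rest, which is an assumption), so the sketch divides by the sum of the weights rather than by 100. The function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of (value * weight) divided by the sum of the weights."""
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)


# Homework, three tests, participation (weights in percent).
grades = [92, 100, 85, 96, 98]
weights = [20, 12, 12, 12, 20]

print(sum(weights))                              # 76
print(round(weighted_mean(grades, weights), 2))  # 94.37
```

So the current grade, based on the work completed before the final exam, is about 94.4.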
PART 2
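RANGE - the difference between the largest and smallest values
Example:
-1,-1,0,0,0,1,2,3,3,4,4,4 range = 4 - (-1) = 5
5,6,8,10,12,15,100 range = 100 - 5 = 95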
Advantage:
⮚ Easy to find
Disadvantage:
⮚ Very sensitive to extreme values
⮚ Does not provide information about the shape
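A minimal sketch, assuming the measure illustrated by the two data sets above is the range (largest value minus smallest value). The function name `value_range` is hypothetical.

```python
def value_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)


print(value_range([-1, -1, 0, 0, 0, 1, 2, 3, 3, 4, 4, 4]))  # 5
print(value_range([5, 6, 8, 10, 12, 15, 100]))              # 95
# Replacing the extreme value 100 with 20 shrinks the range sharply.
print(value_range([5, 6, 8, 10, 12, 15, 20]))               # 15
```

Only the two end values enter the calculation, which is why the result says nothing about the shape of the distribution in between.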
STANDARD DEVIATION - a measure of how far the data values spread around the mean
Advantages:
⮚ Uses all values
⮚ Same units as the data
Disadvantages:
⮚ Difficult to calculate
⮚ Sensitive to extreme values
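A minimal sketch, assuming the measure described by the list above is the standard deviation. It uses the sample form, which divides by n - 1; the population form divides by N instead. The function name `sample_std_dev` is hypothetical, and `statistics.stdev` from the standard library computes the same quantity.

```python
from math import sqrt
from statistics import stdev


def sample_std_dev(data):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return sqrt(squared_deviations / (n - 1))


data = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]
with_outlier = [1, 1, 1, 2, 2, 3, 3, 4, 5, 100]

print(round(sample_std_dev(data), 2))          # about 1.57
print(round(sample_std_dev(with_outlier), 2))  # about 30.88
print(round(stdev(data), 2))                   # same as the first line
```

Every value enters the sum, and one extreme value (100) is enough to inflate the result roughly twentyfold.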
PART 3
CHEBYSHEV'S THEOREM
● Used to determine the minimum proportion of the data (or the population) that must lie
within a given number (greater than 1) of standard deviations on either side of the mean
● For any set of data (either population or sample) and for any constant k greater than 1,
the proportion of the data that must lie within k standard deviations on either side of
the mean is at least 1 - 1/k^2
● It applies to any distribution as long as the mean and standard deviation are defined
(finite)
● Tells us the minimum proportion (percentage) of the data (or the population) that falls
within k standard deviations of the mean (either side of the mean)
● For k = 3, a minimum of 88.9% of the data falls between the values 3 standard deviations below
the mean and 3 standard deviations above the mean.
⮚ This implies that a maximum of 11.1% of the data fall beyond 3 standard deviations of the mean
⮚ Such values might be suspected outliers, particularly for a mound-shaped symmetric
distribution
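The 88.9% figure above is consistent with the bound 1 - 1/k^2 at k = 3. A short sketch of that bound follows; the function name `min_proportion_within` is hypothetical.

```python
def min_proportion_within(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


for k in (2, 3):
    inside = 100 * min_proportion_within(k)
    print(k, round(inside, 1), round(100 - inside, 1))
# 2 75.0 25.0
# 3 88.9 11.1
```

Each printed row gives k, the minimum percentage inside k standard deviations, and the maximum percentage outside.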
PERCENTILES, QUARTILES & 5 NUMBER SUMMARY
PERCENTILE - the Pth percentile (1 ≤ P ≤ 99) of a distribution is a value such that P%
of the data fall at or below it and (100-P)% of the data fall at or above it.
Example:
If you are in the 89th percentile of math scores, what % of students
have scores:
a. Below yours? 89%
b. Above yours? 11% (100% - 89%)
QUARTILES
Q1 = 25th percentile
Q2 = 50th percentile (median)
Q3 = 75th percentile
Procedure:
1. Put the data in order from smallest to largest
2. Find the median (Q2)
3. Find the median of the values below (not equal to) the median - Q1
4. Find the median of the values above (not equal to) the median - Q3
5 NUMBER SUMMARY - lowest value, Q1, median (Q2), Q3, highest value
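The quartile procedure and the resulting five-number summary (lowest value, Q1, median, Q3, highest value) can be sketched as below, reusing the two data sets from the median example. The function names are hypothetical. Other quartile conventions exist (for instance `statistics.quantiles` in the standard library) and can give slightly different values.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (lowest, Q1, median, Q3, highest); needs at least 2 values."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[:n // 2]        # values below the median position
    upper = ordered[(n + 1) // 2:]  # values above the median position
    return (
        ordered[0],
        median_of_sorted(lower),
        median_of_sorted(ordered),
        median_of_sorted(upper),
        ordered[-1],
    )


print(five_number_summary([5, 6, 8, 10, 12, 14]))  # (5, 6, 9.0, 12, 14)
print(five_number_summary([5, 6, 6, 8, 10]))       # (5, 5.5, 6, 9.0, 10)
```

For an odd number of values the middle value itself is left out of both halves, matching the "not equal to" wording in steps 3 and 4.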
Procedure (box-and-whisker plot):
1. Draw a horizontal scale
2. Above the scale, draw a box from Q1 to Q3 (the height of the box can vary)
3. Draw a solid vertical line from the top to the bottom of the box at Q2
4. Draw horizontal lines (whiskers) from the left end of the box (Q1) to the
minimum (lowest) value (located vertically near the center of the box) and from
the right end of the box (Q3) to the maximum (highest) value
Skewed to the left - the median line is closer to Q3; the left side (horizontal plot) or lower
side (vertical plot) of the box is bigger
Skewed to the right - the median line is closer to Q1; the right side (horizontal plot) or upper
side (vertical plot) of the box is bigger
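The two rules above compare the parts of the box on either side of the median line. A rough sketch of that comparison follows; the function name `box_skew` and its labels are hypothetical, and looking only at the box (ignoring the whiskers) is a simplification.

```python
def box_skew(q1, q2, q3):
    """Classify skew from the position of the median line inside the box."""
    lower_part = q2 - q1  # left (or lower) part of the box
    upper_part = q3 - q2  # right (or upper) part of the box
    if lower_part > upper_part:
        return "skewed left"    # median line sits closer to Q3
    if upper_part > lower_part:
        return "skewed right"   # median line sits closer to Q1
    return "symmetric box"


print(box_skew(5.5, 6, 9))   # skewed right
print(box_skew(2, 8, 9))     # skewed left
print(box_skew(6, 9, 12))    # symmetric box
```

The first and third calls use the quartiles of the data sets 5,6,6,8,10 and 5,6,8,10,12,14; the second uses made-up quartiles.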