
STATISTICS (Averages and Variation)

Copyright
© All Rights Reserved

CHAPTER 3

Averages and Variation

PART 1

MODE - for discrete data, the mode is the value that occurs most often; a data set may
have one, two, or even three modes. For continuous data, it is (are) the peak(s) of the
distribution

Example:
✔ 1,1,2,2,2,3,4,5,6,6 mode = 2
✔ -1,-1,0,0,0,1,2,3,3,4,4,4 mode = 0, 4
✔ 5,6,8,10,12,15,20 no mode
✔ 8,8,9,9,10,10,11,11,12,12 no mode

Advantages:
⮚ Easy to find
⮚ Not sensitive to extreme values
⮚ Only measure of central tendency for categorical data

Disadvantages:
⮚ Only uses some of the data
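A minimal Python sketch of finding the mode(s) of discrete data. The function name `modes` is hypothetical, and returning an empty list when every value occurs equally often is an assumption chosen to match the "no mode" convention in the examples above.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Assumption: if several distinct values all share the same count,
    # the data set is treated as having no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 1, 2, 2, 2, 3, 4, 5, 6, 6]))  # [2]
print(modes([5, 6, 8, 10, 12, 15, 20]))       # []
```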

MEDIAN - the central value of an ordered distribution: half of the data is below the
median and half of the data is above the median
1. Order the data from the smallest to largest
2. For an odd number of values, the median is the middle value
3. For an even number of values, the median is the average of the two middle values

Example:
5,6,6,8,10 median = 6
5,6,8,10,12,14 median = 9 ((8 + 10)/2)

For large data sets, it is handy to know that the position of the median in the ordered
data is (n + 1)/2
Advantage:
⮚ Not sensitive to extreme values
Disadvantage:
⮚ Only includes one or two data values
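The median steps can be sketched in Python as follows. The helper names `median` and `median_position` are hypothetical; the position helper uses the usual 1-based rule that the median sits at position (n + 1)/2 in the ordered data, so a fractional result such as 3.5 means "average the 3rd and 4th values".

```python
def median(values):
    """Median of a list of numbers, following the three steps above."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)              # step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # step 2: odd count, middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # step 3: average the two middle values

def median_position(n):
    """1-based position of the median in ordered data of size n."""
    return (n + 1) / 2

print(median([5, 6, 6, 8, 10]))       # 6
print(median([5, 6, 8, 10, 12, 14]))  # 9.0
print(median_position(6))             # 3.5
```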
MEAN - the average value. For discrete data:
mean = (sum of all values) / (number of values)
Sample mean: $\bar{x} = \frac{\sum x}{n}$
Population mean: $\mu = \frac{\sum x}{N}$
Where:
● n is the sample size
● N is the population size

Example:
1,1,1,2,2,3,3,4,5,5 mean = 2.7 (27/10)
1,1,1,2,2,3,3,4,5,100 mean = 12.2 (122/10)

Advantages:
⮚ Every data value is used
⮚ Reliable: means of samples from the same population do not vary much (relatively
speaking)
Disadvantage:
⮚ Sensitive to extreme values
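A short Python sketch, using the two example data sets above, of how a single extreme value moves the mean while leaving the median unchanged. The helper name `mean` is hypothetical; `statistics.median` comes from the Python standard library.

```python
from statistics import median

def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    return sum(values) / len(values)

typical = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5]
with_extreme = [1, 1, 1, 2, 2, 3, 3, 4, 5, 100]

print(mean(typical), median(typical))            # 2.7 2.5
print(mean(with_extreme), median(with_extreme))  # 12.2 2.5
```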

TRIMMED MEAN - we trim k% from both “ends” of the data: remove extreme values.

Procedure:
1. Put the data in order from the smallest to largest
2. Calculate how many values make up k%: $n \cdot \frac{k}{100}$
3. Discard the number of values from (2) from the top and the bottom of the data
4. Calculate the mean of the remaining values

Example:
Calculate a 5% trimmed mean

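1,1,2,3,4,4,5,5,5,6,6,6,6,7,7,8,9,10,18 (n = 19)
5% of 19 = 0.95, which rounds to 1, so discard one value from each end (1 and 18)
5% trimmed mean = 94/17 ≈ 5.53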
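The trimmed-mean procedure can be sketched in Python as below. The function name `trimmed_mean` is hypothetical, and rounding the k% count to the nearest whole number is an assumption (the notes do not state a rounding rule).

```python
def trimmed_mean(values, k_percent):
    """Mean after discarding k% of the values from each end of the ordered data."""
    ordered = sorted(values)                  # step 1: order the data
    n = len(ordered)
    # Step 2: how many values make up k%.
    # Assumption: round to the nearest whole number (0.95 -> 1).
    to_drop = int(n * k_percent / 100 + 0.5)
    if 2 * to_drop >= n:
        raise ValueError("trimming would remove every value")
    kept = ordered[to_drop:n - to_drop]       # step 3: discard from both ends
    return sum(kept) / len(kept)              # step 4: mean of what remains

data = [1, 1, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 9, 10, 18]
print(round(trimmed_mean(data, 5), 2))  # 5.53
```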
WEIGHTED MEAN - gives more weight (importance) to some values than to others, as with
grades

Example:
You want to know your grade in statistics before the final exam. You currently have a
homework (20%) grade of 92, three test grades (12% each) of 100, 85, and 96, and a
participation grade (20%) of 98.
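A hedged Python sketch of the grade example, assuming the usual weighted-mean formula (sum of value × weight, divided by the sum of the weights). The weights here total 76% because the final exam is not yet included, so dividing by the sum of the weights gives the current standing. The function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Homework, three tests, participation (weights in percent)
grades = [92, 100, 85, 96, 98]
weights = [20, 12, 12, 12, 20]

print(round(weighted_mean(grades, weights), 2))  # 94.37
```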
PART 2

RANGE - the overall spread of the data between the minimum and maximum values
R = max - min

Example:
-1,-1,0,0,0,1,2,3,3,4,4,4 R = 4 - (-1) = 5
5,6,8,10,12,15,100 R = 100 - 5 = 95

Advantage:
⮚ Easy to find
Disadvantage:
⮚ Very sensitive to extreme values
⮚ Does not provide information about the shape
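A brief Python sketch of R = max - min applied to the two example data sets; the function name `value_range` is hypothetical. The second result shows how one extreme value (100) dominates the range.

```python
def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

print(value_range([-1, -1, 0, 0, 0, 1, 2, 3, 3, 4, 4, 4]))  # 5
print(value_range([5, 6, 8, 10, 12, 15, 100]))               # 95
```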

STANDARD DEVIATION - it measures the variation of all values from the mean.

Advantages:
⮚ Uses all values
⮚ Same units as the data
Disadvantages:
⮚ Difficult to calculate
⮚ Sensitive to extreme values

Note: the variance is the square of the standard deviation
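The notes do not show the formulas, so the Python sketch below assumes the usual textbook convention: divide the sum of squared deviations by n - 1 for a sample and by N for a population. The function names and the data set are hypothetical and for illustration only.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    if n < 2 and sample:
        raise ValueError("a sample variance needs at least two values")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1) if sample else squared_deviations / n

def standard_deviation(values, sample=True):
    """Square root of the variance, so it has the same units as the data."""
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(standard_deviation(data, sample=False))  # 2.0
print(variance(data, sample=False))            # 4.0
```

The last two lines illustrate the note above: the variance (4.0) is the square of the standard deviation (2.0).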


* The round-off rule for science states that you include one more decimal place than
you have in your data. But you do not round until the final answer

PART 3

COEFFICIENT OF VARIATION (CV) - it is a measure of relative variation. We use it
to compare the variation in two or more samples or populations

Note: It is always better to have less variation
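A minimal Python sketch, assuming the usual definition of the coefficient of variation as the standard deviation divided by the mean, expressed as a percentage. The function name and both samples are hypothetical; `statistics.mean` and `statistics.stdev` (sample standard deviation) are from the standard library.

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the sample mean."""
    return 100 * stdev(sample) / mean(sample)

# Two hypothetical samples with the same spread (s = 2) but different means
small_values = [10, 12, 14]
large_values = [100, 102, 104]

print(round(coefficient_of_variation(small_values), 1))  # 16.7
print(round(coefficient_of_variation(large_values), 1))  # 2.0
```

Both samples have the same standard deviation, but relative to its mean the first varies far more, which is the kind of comparison the CV is used for.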


PART 4
CHEBYSHEV’S THEOREM

● Used to determine the minimum proportion of the data (or the population) that must lie
within k standard deviations on either side of the mean, for k greater than 1
● For any set of data (either population or sample) and for any constant k greater than 1,
the proportion of the data that must lie within k standard deviations on either side of
the mean is at least $1 - \frac{1}{k^2}$
● It applies to any distribution as long as the mean and standard deviation are defined
(finite)
● Tells us the minimum proportion (percentage) of the data (or the population) that falls
within k standard deviations of the mean (either side of the mean)
● For k = 3, a minimum of 88.9% of the data falls between the values 3 standard deviations
below the mean and 3 standard deviations above the mean.
⮚ This implies that a maximum of 11.1% of the data falls beyond 3 standard deviations of
the mean
⮚ Such values might be suspected outliers, particularly for a mound-shaped symmetric
distribution
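Chebyshev's bound of at least 1 - 1/k² can be sketched in Python as follows; the function name `chebyshev_min_proportion` is hypothetical. For k = 3 it reproduces the 88.9% figure quoted above.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k greater than 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, f"{100 * chebyshev_min_proportion(k):.1f}%")
# 2 75.0%
# 3 88.9%
```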
PERCENTILES, QUARTILES & 5 NUMBER SUMMARY

PERCENTILE - the Pth percentile (1 ≤ P ≤ 99) of a distribution is a value such that P%
of the data fall below it and (100 - P)% of the data fall at or above it.

Example:
If you are in the 89th percentile of math scores, what % of students
have scores:
a. Below yours? 89%
b. Above yours? 11% (100% - 89%)
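A small Python sketch of the idea behind the example: counting what percentage of scores fall strictly below a given score. The function name `percent_below` and the list of 100 scores are hypothetical and for illustration only.

```python
def percent_below(scores, score):
    """Percentage of the scores that are strictly below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

scores = list(range(1, 101))       # hypothetical scores 1, 2, ..., 100
print(percent_below(scores, 90))   # 89.0
print(percent_below(scores, 100))  # 99.0
```

Even the top score has only 99% of the scores below it, because that score is itself part of the data.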

Note: There is no 100th percentile: every person is part of the 100%, so 100% of the
scores can't be below that person's own score.

QUARTILES
Q1 = 25th percentile
Q2 = 50th percentile (median)
Q3 = 75th percentile

Procedure:
1. Put the data in order from the smallest to largest
2. Find the median (Q2)
3. Find the median of the values below (not equal to) the median - Q1
4. Find the median of the values above (not equal to) the median - Q3

5 NUMBER SUMMARY

1. Minimum value = 111
2. Q1 = 182
3. Q2 = 221.5
4. Q3 = 319
5. Maximum value = 439

The 5 number summary for example 2 is:
111, 182, 221.5, 319, 439
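The quartile procedure and 5 number summary can be sketched in Python as below. The function names and both data sets are hypothetical (the data behind the summary above is not shown in these notes). Following the procedure, the middle value is left out of both halves when the number of values is odd.

```python
def _median(ordered):
    """Median of an already ordered, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum) using the median-of-halves procedure."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)           # step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # values below the median position
    upper = ordered[half + n % 2:]     # values above it (middle value skipped if n is odd)
    return (ordered[0], _median(lower), _median(ordered), _median(upper), ordered[-1])

print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))    # (1, 3, 7, 11, 13)
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))   # (1, 2.5, 4.5, 6.5, 8)
```

Other quartile conventions exist (statistical software often interpolates), so results from a library may differ slightly from this procedure.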

BOX AND WHISKER PLOTS (BOX PLOTS) - a useful technique from exploratory data analysis
for describing data

Procedure:
1. Draw a horizontal scale
2. Above the scale, draw a box from Q1 to Q3 (the height of the box can vary)
3. Draw a solid vertical line from the top to the bottom of the box at Q2
4. Draw horizontal lines (whiskers) from the left end of the box (Q1) to the
minimum (lowest) value (located vertically near the center of the box) and from
the right end of the box (Q3) to the maximum (highest) value

Symmetric Distribution - if the line for Q2 is approximately at the center of the
box, the distribution is symmetric

Skewed to the left - the line is closer to Q3; the left side (horizontal plot) or lower
side (vertical plot) of the box is bigger
Skewed to the right - the line is closer to Q1; the right side (horizontal plot) or upper
side (vertical plot) of the box is bigger
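The shape rules above can be sketched in Python by comparing the two sides of the box. The function name `box_shape` and the `tolerance` parameter (how close to the center counts as "approximately") are assumptions for illustration; the notes give no numeric cutoff.

```python
def box_shape(q1, q2, q3, tolerance=0.1):
    """Classify a box plot's shape from where the Q2 line sits inside the box."""
    left_side = q2 - q1     # width of the box below (left of) the median line
    right_side = q3 - q2    # width of the box above (right of) the median line
    box_width = q3 - q1
    # Assumption: sides differing by at most tolerance * box width count as symmetric.
    if box_width == 0 or abs(left_side - right_side) <= tolerance * box_width:
        return "approximately symmetric"
    # Line closer to Q1 means the right side is bigger: skewed to the right.
    return "skewed to the right" if left_side < right_side else "skewed to the left"

# Quartiles from the 5 number summary in these notes
print(box_shape(182, 221.5, 319))  # skewed to the right
```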
