Lecture 3
DESCRIPTIVE MEASURES OF
CENTRAL TENDENCY
Measures that summarise the data into a
single number are called descriptive
measures.
• a histogram
MODE
The mode (or modal value) of a variable in a
data set is the value of the variable that is
observed most frequently in that data set
• or, given a continuous frequency curve, the value at the point
of greatest density.
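The definition can be checked numerically. A minimal sketch, assuming the same thirteen values that are used for the mean calculation later in this lecture; `multimode` is from Python's standard `statistics` module:

```python
from collections import Counter
from statistics import multimode

data = [1, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5]

# Count how often each value occurs.
counts = Counter(data)

# multimode returns every value tied for the highest frequency,
# so it also copes with data sets that have more than one mode.
modes = multimode(data)

print(counts.most_common())
print(modes)
```

Here the value 5 occurs six times, more often than any other value, so it is the mode.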
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i \]
CALCULATION OF THE
MEAN
Given the data:
1 2 3 3 3 4 4 5 5 5 5 5 5
Mean = (1+2+3+3+3+4+4+5+5+5+5+5+5)/13 = 50/13
Mean ≈ 3.85
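The same calculation as a short sketch (the variable names are illustrative):

```python
data = [1, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5]

total = sum(data)   # sum of all observations
n = len(data)       # number of observations
mean = total / n

print(f"{total}/{n} = {mean:.2f}")
```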
FREQUENCY DISTRIBUTION
TABLE
MEAN FROM THE
FREQUENCY TABLE
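A frequency table lists each distinct value x with its frequency f, and the mean is then Σ(f·x) / Σf. A minimal sketch, assuming a table built from the thirteen values 1 2 3 3 3 4 4 5 5 5 5 5 5 used in the mean calculation above (the dictionary layout is an illustrative choice):

```python
# Frequency table: value -> number of times it occurs.
freq = {1: 1, 2: 1, 3: 3, 4: 2, 5: 6}

n = sum(freq.values())                               # total number of observations
weighted_sum = sum(x * f for x, f in freq.items())   # sum of f * x
mean = weighted_sum / n

print(f"{weighted_sum}/{n} = {mean:.2f}")
```

This gives the same result as adding up the raw observations one by one.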
MEAN ON A FREQUENCY CURVE
The mean is the “center of gravity” of the
distribution.
• Determine (by “eyeball” approximation) the
value of the variable such that the density
“balances” at that point; this value is the mean.
STRENGTHS OF THE
MEAN
The mean is unique for a given set of data
• There is only one mean.
It is easily understood and easy to
calculate.
It takes into consideration all the values in
the set of data.
WEAKNESS OF THE MEAN
The mean is affected by extreme values
• Because each value in the set of data is included in the
computation.
e.g. family income.
20, 30, 40, and 990
Mean = (20+30+40+990)/4 = 270.
Median = (30+40)/2 = 35.
Here 3 of the 4 observations lie between 20 and 40.
So the mean of 270 fails to give a realistic
picture of the major part of the data.
It is pulled upward by the extreme value 990.
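The effect of the extreme value can be seen by computing both measures with and without it. A small sketch using the four incomes from the example:

```python
from statistics import mean, median

incomes = [20, 30, 40, 990]

mean_all = mean(incomes)
median_all = median(incomes)

# Drop the extreme value to see how strongly it affects each measure.
typical = [x for x in incomes if x != 990]
mean_typical = mean(typical)
median_typical = median(typical)

print(mean_all, median_all)
print(mean_typical, median_typical)
```

Removing 990 moves the mean from 270 to 30, while the median only moves from 35 to 30.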
CHOOSING A MEASURE OF
CENTRAL TENDENCY
Disadvantage of the range
The range communicates very little information about
the data set
• It only takes into account the largest and
smallest values
• This makes it a poor measure of dispersion
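To illustrate, two data sets can share the same largest and smallest values and still be spread very differently. A sketch with two hypothetical data sets (invented for illustration, not taken from the lecture), compared using the standard deviation covered later in this lecture:

```python
from statistics import pstdev

# Hypothetical data sets chosen to share the same minimum and maximum.
a = [1, 5, 5, 5, 9]
b = [1, 1, 5, 9, 9]

range_a = max(a) - min(a)
range_b = max(b) - min(b)

# The population standard deviation uses every value, not just the extremes.
sd_a = pstdev(a)
sd_b = pstdev(b)

print(range_a, range_b)
print(round(sd_a, 2), round(sd_b, 2))
```

Both ranges are equal, yet the values in `b` sit further from the centre than those in `a`.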
VARIANCE
The variance is a measure of variability which
takes into account the differences between
each observation and the sample mean
Population variance:
Mathematical notation: \[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
POPULATION
VARIANCE
Average of squared deviations of values from the mean
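The "average of squared deviations" can be written out step by step. A minimal sketch, assuming the thirteen values from the mean calculation are treated as a complete population; the result is cross-checked against `statistics.pvariance` from the standard library:

```python
import statistics

data = [1, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5]

N = len(data)
mu = sum(data) / N                             # population mean
squared_devs = [(x - mu) ** 2 for x in data]   # (x - mu)^2 for each value
sigma2 = sum(squared_devs) / N                 # average squared deviation

# Cross-check against the standard library.
print(sigma2, statistics.pvariance(data))
```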
Calculating the variance (population)
Population Variance:
Variance: \[ S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
Mean=3.0025
Variance = 0.097
Standard dev. = 0.31
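The sample variance divides by n - 1 rather than n, and the standard deviation is its square root. A sketch of both, assuming the thirteen values from the mean calculation are treated as a sample (not the data behind the figures above):

```python
import math
import statistics

data = [1, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5]

n = len(data)
x_bar = sum(data) / n                                # sample mean
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)   # sample variance
s = math.sqrt(s2)                                    # sample standard deviation

# Cross-check against the standard library's sample formulas.
print(s2, statistics.variance(data))
print(s, statistics.stdev(data))
```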
VARIANCE AND S.D.
FROM THE FREQUENCY
TABLE
VARIANCE AND S.D.
FOR GROUPED DATA
RECAP OF FORMULAS
Variance: \[ S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
FEATURES OF THE
STANDARD DEVIATION
• It is usually positive and NEVER negative
• It is 0 only when all data values are the same
number
• The larger the SD, the more the
data vary
• It can increase dramatically with the inclusion
of outliers
• The units (minutes, feet, etc...) are the same as
the units of original values
COEFFICIENT OF VARIATION (CV)
Sometimes we may wish to compare standard
deviations in two groups.
• i.e. we may want to compare the variability in two
groups.
• The two groups may be from two different data
sets
                     Sample 1      Sample 2
Age                  25 years      11 years
Mean weight          145 pounds    80 pounds
Standard deviation   10 pounds     10 pounds
We wish to know which of the weights is more variable.
EXAMPLE 2
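If we calculate the CV (the standard deviation as a percentage of the mean) for the 25 year olds:
CV = (10/145) × 100 ≈ 6.9%
For the 11 year olds, CV = (10/80) × 100 = 12.5%, so the weights of the
11 year olds are relatively more variable.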
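The comparison of the two samples can be checked numerically. A sketch, assuming the usual definition of the CV as the standard deviation expressed as a percentage of the mean; the function name is illustrative:

```python
def coefficient_of_variation(sd, mean):
    """Return the standard deviation as a percentage of the mean."""
    return sd / mean * 100

# Values from the table: both samples have a standard deviation of 10 pounds.
cv_25 = coefficient_of_variation(10, 145)   # 25 year olds
cv_11 = coefficient_of_variation(10, 80)    # 11 year olds

print(f"25 year olds: {cv_25:.1f}%")
print(f"11 year olds: {cv_11:.1f}%")
```

Although the standard deviations are identical, the CV is larger for the 11 year olds because their mean weight is smaller.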
[Figure: the pth percentile marked on the ranked data 1 1 3 4 5 5 7 8 9; p% of the values lie below it and (100-p)% are greater]
PERCENTILES
• If approximately n percent of the items in a
distribution are less than the number x,
then x is the nth percentile of the
distribution, denoted Pn.
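This definition can be applied directly by counting the items below a given value. A minimal sketch using the nine values 1 1 3 4 5 5 7 8 9; `percent_below` is a hypothetical helper written for this illustration, not a library function:

```python
def percent_below(data, x):
    """Return the percentage of items in data that are strictly less than x."""
    below = sum(1 for value in data if value < x)
    return below / len(data) * 100

data = [1, 1, 3, 4, 5, 5, 7, 8, 9]

# Six of the nine values are less than 7.
p = percent_below(data, 7)
print(f"{p:.1f}% of the items are less than 7")
```

With so few observations the result is only approximate, which is why the definition says "approximately n percent".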
[Figure: the deciles D1 to D9 divide the ranked data into ten parts, each containing 10% of the observations]
DECILES AND
QUARTILES
Deciles and quartiles are determined in the
same manner as percentiles, since they
may be expressed as percentiles.
EXAMPLE: DECILES
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger.
Q2 is the same as the median (50% are smaller, 50% are larger).
Lastly, only 25% of the observations are greater than the third
quartile, Q3.
QUARTILES
For any set of data (ranked in order from
least to greatest):
• The second quartile, Q2, is the median.
• The first quartile, Q1, is the median of all
items below Q2.
• The third quartile, Q3, is the median of
all items above Q2.
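The three rules above can be sketched as follows. This assumes that, when the number of items is odd, the middle item itself is left out of both halves (one common convention; other conventions give slightly different quartiles). The function name is illustrative:

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]   # items below the median position
    # For odd n, skip the middle item; for even n, take the top half.
    upper = ranked[half + 1:] if n % 2 else ranked[half:]
    return median(lower), median(ranked), median(upper)

q1, q2, q3 = quartiles([1, 1, 3, 4, 5, 5, 7, 8, 9])
print(q1, q2, q3)
```

For these nine values the lower half is 1 1 3 4 and the upper half is 5 7 8 9, so Q1 and Q3 are the medians of those halves.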
QUARTILES
• Quartiles are the three values (Q1, Q2, Q3) that
divide the data set into four (approximately) equal
parts.
[Figure: Q1, Q2 (the median) and Q3 divide the ranked scores, from minimum to maximum, into four equal parts of 25% each]
INTERQUARTILE
RANGE
The interquartile range shows the spread of
the middle 50% of the data.
Interquartile Range (or IQR): Q3 - Q1
INTERQUARTILE
RANGE
The interquartile range (IQR) is a measure of
variability, based on dividing a data set into
quartiles.
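A short sketch of the IQR calculation, assuming the nine values 1 1 3 4 5 5 7 8 9 and the `quantiles` function from Python's standard `statistics` module. Its default "exclusive" method is only one of several quartile conventions, so other tools may report slightly different values:

```python
from statistics import quantiles

data = [1, 1, 3, 4, 5, 5, 7, 8, 9]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1   # spread of the middle 50% of the data
print(q1, q3, iqr)
```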