CH 3
Departments of Laboratory
Descriptive Statistics:
Numerical Summary Measures
By:Hana.S. (MPH/Epi)
Numerical summary measures
A single number that quantifies a characteristic of a distribution of values.
Measures of Central Tendency (MCT)
19 21 20 34 22 24 27
27 20
27
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
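A minimal sketch of the arithmetic mean of ungrouped data: sum the observations and divide by n. The list `ages` reuses the eight patient ages from the IQR example later in this chapter; the variable names are illustrative only.

```python
from statistics import mean

# Ages of eight patients (borrowed from the IQR example in this chapter).
ages = [18, 21, 23, 24, 24, 32, 42, 59]

# Direct translation of the formula: sum of the observations divided by n.
n = len(ages)
x_bar = sum(ages) / n

# The standard library computes the same quantity.
library_mean = mean(ages)

print(x_bar)  # 30.375
```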
The Summation Notation
b) Grouped data
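• We assume that all values falling into a particular class interval are located at
the mid-point of the interval. The mean is calculated as follows:
$\bar{x} = \dfrac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$
• where $k$ is the number of class intervals, $m_i$ is the mid-point of the $i$-th class interval, and $f_i$ is its frequency.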
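The grouped-data mean can be sketched in a few lines. The mid-points and frequencies below are the ones tabulated for the 169 subjects in the variance example later in this chapter; the variable names are assumptions made for illustration.

```python
# Class mid-points (m_i) and frequencies (f_i) for the 169 subjects.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]

# Numerator: sum of m_i * f_i. Denominator: sum of f_i (the sample size).
total_mf = sum(m * f for m, f in zip(midpoints, frequencies))
n = sum(frequencies)

grouped_mean = total_mf / n
print(total_mf, n, round(grouped_mean, 2))  # 5810.5 169 34.38
```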
Properties of the arithmetic mean
• For a given set of data there is one and only one arithmetic
mean (uniqueness).
• It is easy to calculate and understand (simplicity).
a) ungrouped data
observation.
The median is a better measure of central tendency (than the mean)
when the distribution is skewed
b) Grouped data
Median for grouped data (continued)
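To find a unique median value, use the following formula:
$\tilde{x} = L_m + \left(\dfrac{\frac{n}{2} - F_c}{f_m}\right) W$
• where,
• $L_m$ = lower true class boundary of the interval containing the median
• $F_c$ = cumulative frequency of the classes before the median class
• $f_m$ = frequency of the median class
• $W$ = width of the median class interval
• $n$ = total number of observations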
Example. Compute the median age of 169 subjects from the grouped
data.
• n/2 = 84.5, which falls in the 3rd class interval
• Lower class boundary = 29.5, upper class boundary = 39.5
• Frequency of the median class = 47
• Cumulative frequency before the median class (Fc) = 70
• (n/2 – Fc) = 84.5 – 70 = 14.5
• Median = 29.5 + (14.5/47) × 10 = 32.59 ≈ 33
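The steps of this example can be sketched as a small function. It assumes equal-width classes with true lower boundaries 9.5, 19.5, ..., 59.5 (inferred from the class intervals 10-19 to 60-69 in this chapter's age table); the function name `grouped_median` is hypothetical.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data by interpolation within the median class."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    cumulative = 0  # cumulative frequency before the current class (Fc)
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= half:
            # Median class found: interpolate inside it.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
frequencies = [4, 66, 47, 36, 12, 4]

median_age = grouped_median(lower_boundaries, frequencies, width=10)
print(round(median_age, 2))
```

The loop stops at the first class whose cumulative frequency reaches n/2, which is the 3rd class here, so the interpolation uses the same numbers as the worked example.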
Properties of median
• There is only one median for a given set of data
(uniqueness)
• The median is easy to calculate
• The mode of grouped data usually refers to the modal class with
the highest frequency.
• If a single value for the mode of grouped data must be specified, it is
taken as the mid-point of the modal class interval.
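As an illustration, the modal class and its mid-point can be picked out of a frequency table as follows. The class limits and frequencies are those of the 169-subject age table used elsewhere in this chapter; the variable names are assumptions.

```python
# Class limits and frequencies from the age table (169 subjects).
class_intervals = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
frequencies = [4, 66, 47, 36, 12, 4]

# The modal class is the class with the highest frequency.
modal_index = max(range(len(frequencies)), key=lambda i: frequencies[i])
modal_class = class_intervals[modal_index]

# If a single value is required, take the mid-point of the modal class.
mode_estimate = (modal_class[0] + modal_class[1]) / 2
print(modal_class, mode_estimate)  # (20, 29) 24.5
```

If two classes tie for the highest frequency, `max` returns only the first one, so a tie should be checked separately before reporting a single mode.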
Properties of mode
• It is not affected by extreme values.
• Often its value is not unique (more than one mode is possible).
• The main drawback of the mode is that it often does not exist, so it is not a good summary of the majority of the data.
Quartiles
• If the data are divided into four equal parts, we speak
of quartiles.
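• The median divides the data into two equal parts.
a) The first quartile (Q1): 25% of all the ranked observations are less than Q1. [25th percentile]
b) The second quartile (Q2): 50% of all the ranked observations are less than Q2. [50th percentile] The second quartile is the median.
c) The third quartile (Q3): 75% of all the ranked observations are less than Q3. [75th percentile]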
Percentiles
– P0: The minimum.
– P25: 25% of the sample values are less than or equal to this value. It is the 1st quartile (25th percentile), given by the 0.25(n+1)th observation.
– P50: 50% of the sample values are less than or equal to this value. It is the 2nd quartile (50th percentile), given by the 0.5(n+1)th observation.
– P75: 75% of the sample values are less than or equal to this value. It is the 3rd quartile (75th percentile), given by the 0.75(n+1)th observation.
– P100: The maximum.
Example: Birth weight in grams
2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245, 3248,
3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146
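Find the 10th and 90th percentiles of the data set.
10th percentile = 0.1(20+1) = 2.1th value, i.e., between the 2nd (2581) and 3rd (2759) ranked observations
90th percentile = 0.9(20+1) = 18.9th value, i.e., between the 18th (3609) and 19th (3649) ranked observations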
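A sketch of how a fractional position such as the 2.1th value might be turned into a number. It uses the p(n+1) position rule from the slides; the linear interpolation between the two neighbouring ranked observations is an assumption, since the slides do not state how a fractional rank is resolved. The function name is hypothetical.

```python
def percentile_n_plus_1(sorted_values, p):
    """Percentile at rank p*(n+1), with linear interpolation (assumed)."""
    n = len(sorted_values)
    position = p * (n + 1)               # 1-based rank, possibly fractional
    position = min(max(position, 1), n)  # P0 -> minimum, P100 -> maximum
    lower_rank = int(position)           # rank of the observation just below
    fraction = position - lower_rank
    lower_value = sorted_values[lower_rank - 1]
    if fraction == 0:
        return lower_value
    upper_value = sorted_values[lower_rank]
    return lower_value + fraction * (upper_value - lower_value)


birth_weights = [2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245,
                 3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146]

p10 = percentile_n_plus_1(birth_weights, 0.10)
p90 = percentile_n_plus_1(birth_weights, 0.90)
print(round(p10, 1), round(p90, 1))  # 2598.8 3645.0
```

Other conventions (for example, averaging the two neighbouring observations) give slightly different values, so the rule in use should be stated alongside the result.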
Descriptive statistics
Measures of
dispersion
Measures of Dispersion
• The amount may be small when the values are close together.
• Example –
– Range = 42-5 = 37
Properties of range
2. Inter-quartile range (IQR)
IQR = Q3 - Q1
i.e., 50% of the infant girls weigh between 8.8 and 10.2 kg.
Example 2
• Given the following data set (age of patients):-
• Solution: the ranked observations are 18 21 23 24 24 32 42 59 (n = 8)
• 1st quartile = {(n+1)/4}th = (2.25)th observation = (21 + 23)/2 = 22
• 3rd quartile = {3(n+1)/4}th = (6.75)th observation = (32 + 42)/2 = 37
• Hence, IQR = 37 - 22 = 15
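The calculation above can be sketched as follows. The function follows the convention used in this example, where a fractional position is resolved by averaging the two neighbouring ranked observations; that convention, and the function name, are assumptions and other textbooks interpolate instead.

```python
import math


def quartile_midpoint_rule(sorted_values, q):
    """Quartile at rank q*(n+1); a fractional rank averages its neighbours."""
    n = len(sorted_values)
    position = q * (n + 1)
    lower_rank = max(1, math.floor(position))  # 1-based ranks, kept in range
    upper_rank = min(n, math.ceil(position))
    return (sorted_values[lower_rank - 1] + sorted_values[upper_rank - 1]) / 2


ages = sorted([18, 21, 23, 24, 24, 32, 42, 59])

q1 = quartile_midpoint_rule(ages, 0.25)  # rank 2.25 -> (21 + 23) / 2
q3 = quartile_midpoint_rule(ages, 0.75)  # rank 6.75 -> (32 + 42) / 2
iqr = q3 - q1
print(q1, q3, iqr)  # 22.0 37.0 15.0
```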
Properties of IQR:
• It encloses the central 50% of the observations
$S^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
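A minimal sketch of the sample variance and standard deviation for ungrouped data, written term by term from the formula. The list `ages` reuses the eight patient ages from the IQR example in this chapter; the names are illustrative.

```python
from math import sqrt
from statistics import variance

ages = [18, 21, 23, 24, 24, 32, 42, 59]

n = len(ages)
x_bar = sum(ages) / n

# Sum of squared deviations from the mean, divided by n - 1.
sum_sq_dev = sum((x - x_bar) ** 2 for x in ages)
s2 = sum_sq_dev / (n - 1)
s = sqrt(s2)

# statistics.variance also uses the n - 1 divisor.
library_s2 = variance(ages)

print(round(s2, 2), round(s, 2))  # 190.55 13.8
```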
Example. Compute the variance and SD of the age of 169 subjects from
the grouped data.
Mean = 5810.5/169 = 34.38 years
S² = 20197.63/(169 - 1) = 120.22
SD = √S² = √120.22 = 10.96 years

Class interval   (mi)   (fi)   (mi-Mean)   (mi-Mean)²   (mi-Mean)² fi
10-19            14.5      4     -19.88       395.21         1580.86
20-29            24.5     66      -9.88        97.61         6442.55
30-39            34.5     47       0.12       0.0144            0.68
40-49            44.5     36      10.12       102.41         3686.92
50-59            54.5     12      20.12       404.81         4857.77
60-69            64.5      4      30.12       907.21         3628.86
Total                    169                 1907.29        20197.63
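The grouped variance and SD can be recomputed directly from the mid-points and frequencies of the table as a check. This is a sketch with illustrative variable names; it keeps the mean unrounded, so the last digit of an intermediate value may differ from a hand calculation that rounds at each step.

```python
from math import sqrt

midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]

n = sum(frequencies)
grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Sum over classes of f_i * (m_i - mean)^2, divided by n - 1.
sum_sq_dev = sum(f * (m - grouped_mean) ** 2
                 for m, f in zip(midpoints, frequencies))
s2 = sum_sq_dev / (n - 1)
sd = sqrt(s2)

print(round(grouped_mean, 2), round(s2, 2), round(sd, 2))  # 34.38 120.22 10.96
```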
Properties of SD
• Has the advantage of being expressed in the same units
of measurement as the mean
CV is the ratio of the SD to the mean multiplied by 100.
$CV = \dfrac{S}{\bar{x}} \times 100$
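As a short illustration, the CV can be computed for the grouped age example of this chapter, taking the mean as 5810.5/169 ≈ 34.38 years and the SD as ≈ 10.96 years. The function name is hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """CV in percent: the SD expressed relative to the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


# Grouped age example: mean about 34.38 years, SD about 10.96 years.
cv_age = coefficient_of_variation(10.96, 34.38)
print(round(cv_age, 1))  # 31.9
```

Because the CV is unit-free, it can be used to compare the variability of measurements recorded in different units or with very different means.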
Skewed distributions
Skewness: If extremely low or extremely high observations are
present in a distribution, then the mean tends to shift towards
those scores.
Based on the type of skewness, distributions can be:
B. Negatively skewed distribution: occurs when the majority of
scores are at the right end of the curve and a few small scores
are scattered at the left end.
Mean, Median & Mode
Which measures to use?
• When the distribution is symmetric, summarize the data using means and
standard deviations.
• When the data are skewed, it is preferable to use the median and IQR as
summary statistics.
• Median and IQR are not easily influenced by extreme values in a skewed
distribution, unlike means and standard deviations.
• Remark:
• The mean and median of a symmetric distribution coincide.
• When a distribution is skewed to the right, its mean is larger than its median.
• When skewed to the left, its mean is smaller than its median.(see fig.58a-
Fig. 2(a). Symmetric distribution (median, mode and mean marked at the same point)
Fig. 2(b). Distribution skewed to the right (marked from left to right: mode, median, mean)
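The remark on the mean and the median can be turned into a quick check. This is only a rough heuristic sketched under the assumption that comparing the two is enough to suggest a direction of skew; it is not a formal skewness measure. The function name and the second and third data sets are made up for illustration.

```python
from statistics import mean, median


def skew_direction(values):
    """Suggest the direction of skew by comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "symmetric"


ages = [18, 21, 23, 24, 24, 32, 42, 59]    # patient ages from the IQR example
print(mean(ages), median(ages))             # 30.375 24.0
print(skew_direction(ages))                 # right
print(skew_direction([1, 8, 9, 10]))        # left
print(skew_direction([1, 2, 3, 4, 5]))      # symmetric
```

For the ages, the mean (30.375) exceeds the median (24), so the median and IQR would be the preferred summary measures.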