CH 3

1) Measures of central tendency and dispersion are numerical summary measures used to describe the characteristics of a distribution. Common measures of central tendency include the mean, median, and mode. (2) The mean is the average and is calculated by summing all values and dividing by the total number. The median is the middle value when values are arranged in order. The mode is the most frequent value. (3) Other measures described include quartiles which divide the data into four equal parts using the first, second (median), and third quartiles.

Uploaded by

temesgengetaye09

Hossana Health Science College,

Departments of Laboratory

Descriptive Statistics:
Numerical Summary Measures
By:Hana.S. (MPH/Epi)
Numerical summary measures
• A single number that quantifies a characteristic of a distribution of values.
• Measures of central tendency (location)
• Measures of dispersion (variability)
Measures of Central Tendency (MCT)

• On the scale of values of a variable there is a certain point around which the largest number of items tend to cluster.
• Since this point is usually in the centre of the distribution, the tendency of statistical data to concentrate around a certain value is called "central tendency".
• The various methods of determining the point about which the observations tend to concentrate are called MCT.
• The objective of calculating an MCT is to determine a single figure that can be used to represent the whole data set.

• In that sense it is an even more compact description of the statistical data than the frequency distribution.
• Since an MCT represents the entire data set, it facilitates comparison within one group or between groups of data.
• The most common measures of central
tendency include:
– Arithmetic Mean
– Median
– Mode
– Others
1. Arithmetic Mean
A. Ungrouped Data
• The arithmetic mean is the "average" of the data set and is by far the most widely used measure of central location.
• It is the sum of all the observations divided by the total number of observations.
Example

19 21 20 20 34 22 24 27 27 27

• Then, Mean = (19 + 21 + … + 27)/10 = 241/10 = 24.1
• General formula
a) Ungrouped data
If $x_1, x_2, \ldots, x_n$ are n observed values, then

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
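As a minimal sketch, the mean of the ten example values can be checked in Python. The function and variable names are illustrative and not part of the original slides.

```python
# Ten example observations from the arithmetic mean slide.
values = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]

def arithmetic_mean(xs):
    """Sum of all observations divided by the number of observations."""
    return sum(xs) / len(xs)

mean_value = arithmetic_mean(values)
print(round(mean_value, 1))  # 24.1
```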
The Summation Notation
b) Grouped data
• We assume that all values falling into a particular class interval are located at the mid-point of the interval. The mean is calculated as follows:

$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$

• where,
k = the number of class intervals
$m_i$ = the mid-point of the ith class interval
$f_i$ = the frequency of the ith class interval
Example. Compute the mean age of 169 subjects from the grouped data.

Class interval   Mid-point (mi)   Frequency (fi)   mi × fi
[10-19]          14.5             4                58.0
[20-29]          24.5             66               1617.0
[30-39]          34.5             47               1621.5
[40-49]          44.5             36               1602.0
[50-59]          54.5             12               654.0
[60-69]          64.5             4                258.0
Total                             169              5810.5

Mean = 5810.5/169 = 34.38 years
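The grouped-data calculation can be sketched in a few lines of Python, assuming each class is represented by its mid-point as described above. The list `classes` and the function name are illustrative choices, not from the slides.

```python
# (mid-point, frequency) pairs for the six age classes in the example table.
classes = [(14.5, 4), (24.5, 66), (34.5, 47), (44.5, 36), (54.5, 12), (64.5, 4)]

def grouped_mean(pairs):
    """Weighted mean: sum of m_i * f_i divided by the sum of f_i."""
    total_mf = sum(m * f for m, f in pairs)
    total_f = sum(f for _, f in pairs)
    return total_mf / total_f

print(round(grouped_mean(classes), 2))  # 34.38
```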
Properties of the arithmetic mean
• For a given set of data there is one and only one arithmetic mean (uniqueness).
• It is easy to calculate and understand (simplicity).
• It is a poor measure of central location if the underlying distribution is not normal (not Gaussian).
• It is influenced by each and every value in the data set and is therefore affected by extreme values.
• In grouped data, if any class interval is open-ended, the arithmetic mean cannot be calculated.
Median
• With the observations arranged in increasing or decreasing order, the median is defined as the middle observation.
a) Ungrouped data
• If the number of observations is odd, the median is the [(n+1)/2]th observation.
• If the number of observations is even, the median is the average of the two middle values, i.e. the (n/2)th and [(n/2)+1]th observations.
Example: 19 20 20 21 22 24 27 27 27 34
• Here n = 10, so the median is the average of the 5th and 6th values: (22 + 24)/2 = 23
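The odd/even rule for ungrouped data can be sketched as follows. This is an illustrative helper, not code from the slides; it sorts the data first and then applies the positional rule.

```python
def median(xs):
    """Middle observation of the sorted data (average of the two middle ones if n is even)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the [(n+1)/2]th observation (1-based)
    # average of the (n/2)th and [(n/2)+1]th observations (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2

values = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]
print(median(values))  # 23.0
```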
The median is a better measure of central tendency (than the mean) when the distribution is skewed.
b) Grouped data
• We assume that the values within a class interval are evenly distributed through the interval.
– The first step is to locate the class interval that contains the median.
– Find n/2 and pick the first class interval whose cumulative frequency is at least n/2.
Median for Grouped data…..
To find a unique median value, use the following formula:

$$\tilde{x} = L_m + \left(\frac{\frac{n}{2} - F_c}{f_m}\right) W$$

• where,
• $L_m$ = lower true class boundary of the interval containing the median
• $F_c$ = cumulative frequency of the interval immediately preceding the median class interval
• $f_m$ = frequency of the interval containing the median
• W = class interval width
• n = total number of observations
Example. Compute the median age of 169 subjects from the grouped data.

n/2 = 169/2 = 84.5

Class interval   Mid-point (mi)   Frequency (fi)   Cum. freq
[10-19]          14.5             4                4
[20-29]          24.5             66               70
[30-39]          34.5             47               117
[40-49]          44.5             36               153
[50-59]          54.5             12               165
[60-69]          64.5             4                169
Total                             169
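• n/2 = 84.5 falls in the 3rd class interval (its cumulative frequency, 117, is the first to reach 84.5)
• Lower true class boundary = 29.5, upper true class boundary = 39.5, so W = 10
• Frequency of the median class, fm = 47
• Fc = 70
• (n/2 – Fc) = 84.5 – 70 = 14.5
• Median = 29.5 + (14.5/47) × 10 = 32.59 ≈ 33 years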
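The steps above can be sketched in Python, assuming equal-width classes described by their lower true class boundaries. The function name and argument layout are illustrative assumptions, not part of the slides.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median for grouped data: L_m + ((n/2 - F_c) / f_m) * W."""
    n = sum(frequencies)
    half = n / 2
    cumulative = 0  # F_c: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= half:  # first class whose cumulative frequency reaches n/2
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("no observations")

lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
frequencies = [4, 66, 47, 36, 12, 4]
print(round(grouped_median(lower_boundaries, frequencies, 10), 1))  # 32.6
```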
Properties of median
• There is only one median for a given set of data (uniqueness).
• The median is easy to calculate.
• The median is a positional average and hence is not sensitive to very large or very small values.
• The median is a better measure of central tendency (than the mean) when the distribution is skewed (not normal).
• It can be calculated even in the case of open-ended intervals.
Mode
• It is the value that occurs most often.
• Most distributions have one peak and are described as unimodal.
• E.g. 19 21 20 20 34 22 24 27 27 27
• The mode is 27, because the value 27 occurs three times (the most frequent).
• Some distributions have more than one mode:
• Unimodal: a distribution with one mode.
• Bimodal: a distribution with two modes.
• Trimodal: a distribution with three modes.
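A small sketch of finding the mode in Python follows. Because a data set can have more than one mode, this illustrative helper returns a list of all values that share the highest frequency. It uses `collections.Counter` from the standard library.

```python
from collections import Counter

def modes(xs):
    """Return every value that attains the highest frequency, in ascending order."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

values = [19, 21, 20, 20, 34, 22, 24, 27, 27, 27]
print(modes(values))  # [27]
```

A bimodal data set would yield a list with two entries.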
Mode….

• The mode of grouped data usually refers to the modal class, i.e. the class with the highest frequency.
• If a single value for the mode of grouped data must be specified, it is taken as the mid-point of the modal class interval.
Properties of mode
• It is not affected by extreme values.
• Often its value is not unique (more than one mode is possible).
• The main drawback of the mode is that it often does not exist (for example, when no value repeats), so it is not a good summary of the majority of the data.
Quartiles
• If the data are divided into four equal parts, we speak of quartiles.
• The median divides the data into two equal parts.
a) The first quartile (Q1): 25% of all the ranked observations are less than Q1. [25th percentile]
b) The second quartile (Q2): 50% of all the ranked observations are less than Q2. [50th percentile] The second quartile is the median.
c) The third quartile (Q3): 75% of all the ranked observations are less than Q3. [75th percentile]
Percentiles
• Percentiles simply divide the data into 100 equal parts.
• Commonly used percentiles:
→ 10, 20, …, 90% (deciles)
→ 20, 40, …, 80% (quintiles)
→ 25, 50, 75% (quartiles)
→ 33.3, 66.7% (tertiles)
– P0: The minimum.
– P25: 25% of the sample values are less than or equal to this value. P25 is the 1st quartile or 25th percentile and is given by the [0.25(n+1)]th observation.
– P50: 50% of the sample values are less than or equal to this value. P50 is the 2nd quartile or 50th percentile and is given by the [0.5(n+1)]th observation.
– P75: 75% of the sample values are less than or equal to this value. P75 is the 3rd quartile or 75th percentile and is given by the [0.75(n+1)]th observation.
– P100: The maximum.
Example: Birth weight in grams

2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245, 3248,
3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146
• Find the 10th and 90th percentiles of the data set.
• 10th percentile = 0.1(20+1) = 2.1th value
→ taken as the average of the 2nd and 3rd values = (2581+2759)/2 = 2670 g
• 90th percentile = 0.9(20+1) = 18.9th value
→ taken as the average of the 18th and 19th values = (3609+3649)/2 = 3629 g
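The percentile rule used in this example can be sketched as below. The sketch follows the slide's convention: when the position p(n+1) is not a whole number, the two neighbouring ranked values are averaged. This is a simplification and not the linear interpolation that many software packages use, so results may differ from library functions. The function name is hypothetical.

```python
import math

def percentile_slide_rule(xs, p):
    """Percentile at fraction p (0 to 1) using the p*(n+1) position rule from the slides."""
    ordered = sorted(xs)
    n = len(ordered)
    position = round(p * (n + 1), 9)     # rounding guards against floating-point noise
    position = min(max(position, 1), n)  # clamp to valid ranks 1..n (P0 = min, P100 = max)
    lower = math.floor(position)
    upper = math.ceil(position)
    # Ranks are 1-based, list indices are 0-based.
    return (ordered[lower - 1] + ordered[upper - 1]) / 2

weights = [2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245, 3248,
           3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146]
print(percentile_slide_rule(weights, 0.10))  # 2670.0
print(percentile_slide_rule(weights, 0.90))  # 3629.0
```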
Descriptive statistics
Measures of dispersion
Measures of Dispersion……

Consider the following two sets of data:

A: 177, 193, 195, 209, 226 Mean = 200
B: 192, 197, 200, 202, 209 Mean = 200

• Two or more data sets may have the same mean and/or median and still be quite different.
• MCT do not describe the variability or spread of the values.
Measures of Dispersion

• Measures that quantify the variation or dispersion of a set of data from its central location.
• Dispersion refers to the variety exhibited by the values of the data.
• The amount of dispersion is small when the values are close together.
• If all the values are the same, there is no dispersion.
1. Range (R)
• The difference between the largest and smallest observations in a data set.
• Range = Maximum value – Minimum value
• Example:
– Data values: 5, 9, 12, 16, 23, 34, 37, 42
– Range = 42 – 5 = 37
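A minimal sketch of the range calculation, applied to the example values and to the two data sets A and B introduced earlier (which share a mean of 200). The function name `value_range` is illustrative.

```python
def value_range(xs):
    """Range = maximum value minus minimum value."""
    return max(xs) - min(xs)

print(value_range([5, 9, 12, 16, 23, 34, 37, 42]))  # 37
print(value_range([177, 193, 195, 209, 226]))       # 49 (set A)
print(value_range([192, 197, 200, 202, 209]))       # 17 (set B)
```

Although A and B have the same mean, their ranges differ, which is the kind of difference a measure of central tendency alone cannot show.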
Properties of range
• It is the simplest, crude measure of dispersion and can be easily understood.
• It takes into account only two values, which makes it a poor measure of dispersion.
• It is very sensitive to extreme observations.
2. Inter-quartile range (IQR)

• Indicates the spread of the middle 50% of the observations, and is used with the median.

IQR = Q3 - Q1

Example: Suppose the first and third quartiles for the weights of girls 12 months of age are 8.8 kg and 10.2 kg, respectively.

IQR = 10.2 kg – 8.8 kg = 1.4 kg

i.e., the middle 50% of the infant girls weigh between 8.8 and 10.2 kg.
Example 2
• Given the following data set (age of patients):

18, 59, 24, 42, 21, 23, 24, 32

• Find the inter-quartile range.
• Solution: sort the data: 18 21 23 24 24 32 42 59
• 1st quartile = {(n+1)/4}th = (2.25)th value, taken as (21 + 23)/2 = 22
• 3rd quartile = {3/4 (n+1)}th = (6.75)th value, taken as (32 + 42)/2 = 37
• Hence, IQR = 37 - 22 = 15
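The quartile positions and the IQR in this example can be reproduced with a short Python sketch. It assumes the same convention as the worked solution: a fractional position is resolved by averaging the two neighbouring ranked values. The helper name is hypothetical, and library routines that interpolate may give slightly different quartiles.

```python
import math

def quantile_slide_rule(ordered, p):
    """Value at position p*(n+1) in sorted data, averaging neighbours for fractional positions."""
    n = len(ordered)
    position = round(p * (n + 1), 9)     # rounding guards against floating-point noise
    position = min(max(position, 1), n)  # clamp to valid ranks 1..n
    return (ordered[math.floor(position) - 1] + ordered[math.ceil(position) - 1]) / 2

ages = sorted([18, 59, 24, 42, 21, 23, 24, 32])
q1 = quantile_slide_rule(ages, 0.25)
q3 = quantile_slide_rule(ages, 0.75)
print(q1, q3, q3 - q1)  # 22.0 37.0 15.0
```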
Properties of IQR:
• It encloses the central 50% of the observations.
• It is not based on all observations but only on two specific values.
• It is important in selecting cut-off points in the formulation of clinical standards.
• Since it excludes the lowest and highest 25% of values, it is not affected by extreme values.
• It is less sensitive to the size of the sample.
Sample variance ($S^2$) and standard deviation ($S = \sqrt{S^2}$):

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

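As a hedged illustration of the sample variance formula, with the sum of squared deviations from the mean divided by n - 1, the sketch below applies it to the two data sets A and B introduced earlier, which have the same mean of 200. The function name is illustrative.

```python
import math

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

set_a = [177, 193, 195, 209, 226]
set_b = [192, 197, 200, 202, 209]
for name, data in (("A", set_a), ("B", set_b)):
    var = sample_variance(data)
    print(name, var, round(math.sqrt(var), 2))
# A 340.0 18.44
# B 39.5 6.28
```

The same mean hides very different spreads: set A has roughly three times the standard deviation of set B.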
Example. Compute the variance and SD of the age of 169 subjects from the grouped data.

Mean = 5810.5/169 = 34.38 years

Class interval   mi     fi    (mi-Mean)   (mi-Mean)²   (mi-Mean)² fi
10-19            14.5   4     -19.88      395.2144     1580.8576
20-29            24.5   66    -9.88       97.6144      6442.5504
30-39            34.5   47    0.12        0.0144       0.6768
40-49            44.5   36    10.12       102.4144     3686.9184
50-59            54.5   12    20.12       404.8144     4857.7728
60-69            64.5   4     30.12       907.2144     3628.8576
Total                   169               1907.2864    20197.6336

S² = 20197.6336/(169-1) = 120.22
SD = √S² = √120.22 = 10.96 years
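The grouped-data variance can be sketched in Python as follows, assuming each class is represented by its mid-point and the divisor is n - 1 as in the example. The code uses the unrounded mean, so the last digit can differ slightly from a hand calculation that uses a rounded mean. Names are illustrative.

```python
import math

# (mid-point, frequency) pairs for the six age classes.
classes = [(14.5, 4), (24.5, 66), (34.5, 47), (44.5, 36), (54.5, 12), (64.5, 4)]

def grouped_variance(pairs):
    """Sum of f_i * (m_i - mean)^2 divided by (n - 1), where n is the total frequency."""
    n = sum(f for _, f in pairs)
    mean = sum(m * f for m, f in pairs) / n
    squared_dev = sum(f * (m - mean) ** 2 for m, f in pairs)
    return squared_dev / (n - 1)

variance = grouped_variance(classes)
print(round(variance, 1), round(math.sqrt(variance), 1))  # 120.2 11.0
```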
Properties of SD
• It has the advantage of being expressed in the same units of measurement as the mean.
• It is the best measure of dispersion and is widely used because of the properties of the theoretical normal curve.
• However, if the units of measurement of the variables in two data sets are not the same, their variability cannot be compared by comparing the values of the SD.
Coefficient of variation (CV)
• When two data sets have different units of measurement, the CV should be used as a measure of dispersion.
• It is the best measure for comparing the variability of two series or sets of observations.
• Data with a smaller coefficient of variation are considered more consistent.
CV is the ratio of the SD to the mean, multiplied by 100.

$$CV = \frac{S}{\bar{x}} \times 100$$

              SD          Mean         CV (%)
SBP           15 mmHg     130 mmHg     11.5
Cholesterol   40 mg/dl    200 mg/dl    20.0

"Cholesterol is more variable than systolic blood pressure"
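A minimal sketch of the CV calculation for the two variables in the table. Because the units cancel in the ratio, the two percentages can be compared directly. The function name is an illustrative choice.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD / mean * 100."""
    return sd / mean * 100

print(round(coefficient_of_variation(15, 130), 1))  # 11.5 (SBP)
print(round(coefficient_of_variation(40, 200), 1))  # 20.0 (cholesterol)
```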
Skewed distributions
• Skewness: If extremely low or extremely high observations are present in a distribution, then the mean tends to shift towards those scores.
• Based on the type of skewness, distributions can be:
A. Positively skewed distribution: occurs when the majority of scores are at the left end of the curve and a few extremely large scores are scattered at the right end.
B. Negatively skewed distribution: occurs when the majority of scores are at the right end of the curve and a few small scores are scattered at the left end.
C. Symmetrical distribution: it is neither positively nor negatively skewed. A curve is symmetrical if one half of the curve is the mirror image of the other half.
Mean, Median & Mode

Which measures to use?
• When the distribution is symmetric, summarize the data using the mean and standard deviation.
• When the data are skewed, it is preferable to use the median and IQR as summary statistics.
• Unlike the mean and standard deviation, the median and IQR are not easily influenced by extreme values in a skewed distribution.
• Remark:
• The mean and median of a symmetric distribution coincide.
• When a distribution is skewed to the right, its mean is larger than its median.
• When skewed to the left, its mean is smaller than its median.(see fig.58a-
Fig. 2(a). Symmetric distribution: Mean = Median = Mode
Fig. 2(b). Distribution skewed to the right: Mean > Median > Mode
Fig. 2(c). Distribution skewed to the left: Mean < Median < Mode
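As a small illustration of the relationship between mean and median in a right-skewed sample, the sketch below reuses the eight patient ages from the IQR example, where the value 59 pulls the mean upward. It relies only on Python's standard `statistics` module.

```python
import statistics

# Patient ages from the inter-quartile range example; 59 is an extreme high value.
ages = [18, 59, 24, 42, 21, 23, 24, 32]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
print(mean_age, median_age)  # 30.375 24.0
```

Here the mean exceeds the median, the pattern described for distributions skewed to the right, so the median and IQR would be the preferred summary.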
