
Statistical Measures

This document discusses various statistical measures used to summarize data including measures of central tendency like the mean, median, and mode. It provides examples and formulas for calculating these measures and discusses their advantages and disadvantages. Measures of dispersion are also introduced.

Uploaded by Francis Magoba
Copyright © All Rights Reserved

LESSON TWO: CONTINUATION OF DESCRIPTIVE STATISTICS

STATISTICAL MEASURES

In the preceding part of the course, we studied tables and graphs as methods of organizing, visually summarizing and displaying data. Among the features we sought to depict were central tendency, dispersion, skewness (lack of symmetry) and kurtosis (degree of flatness or peakedness at the top of a distribution). Although these techniques are extremely useful, they do not allow us to make concise, quantitative statements that characterize a distribution as a whole. To do this we rely on numerical summary measures.

Measures of Central Tendency

Three measures of central tendency are commonly investigated: the mean, the median and the mode.

Figure: Measures of central tendency (mean: arithmetic, geometric and harmonic; median; mode)

Mean
The most frequently investigated characteristic of a set of data is its center. The most commonly used measure of central tendency is the arithmetic mean, or average. It is calculated by summing all the observations in a set of data and dividing by the total number of measurements. In the table below, we have n = 13 observations. If x is used to represent FEV1, then $x_1 = 2.30$ denotes the first in the series of observations, $x_2 = 2.15$ the second, and so on up through $x_{13} = 3.38$. In general, $x_i$ refers to a single measurement, where i can take on any value from 1 to n. The mean of the observations in the sample, represented by $\bar{x}$, is

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Table: Forced expiratory volume in 1 second (FEV1) for 13 adolescents suffering from asthma

Subject FEV1 (liters)
1 2.30
2 2.15
3 3.50
4 2.60
5 2.75
6 2.82
7 4.05
8 2.25
9 2.68
10 3.00
11 4.02
12 2.85
13 3.38

For the FEV1 data, therefore, $\bar{x}$ = 38.35/13 = 2.95 liters.
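The calculation can be checked with a few lines of Python. This is only a minimal sketch: the list name `fev1` and the helper `arithmetic_mean` are illustrative names, not part of the notes.

```python
# FEV1 (liters) for the 13 subjects in the table above
fev1 = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
        2.25, 2.68, 3.00, 4.02, 2.85, 3.38]

def arithmetic_mean(values):
    """Sum the observations and divide by their number n."""
    return sum(values) / len(values)

mean_fev1 = arithmetic_mean(fev1)
print(f"mean FEV1 = {mean_fev1:.2f} liters")  # mean FEV1 = 2.95 liters
```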

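For discrete frequency distributions, the mean is given by

$$\bar{x} = \frac{\sum_{i} f_i x_i}{\sum_{i} f_i}$$

where $f_i$ is the frequency of the value $x_i$.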

Example 2.

The following is a frequency distribution of fasting serum insulin (μU/ml, expressed as whole numbers) of males in a rural area.
μU/ml 7 9 11 13 15 17 19 21 23 25 27
Freq 1 9 20 32 22 23 19 20 13 10 8

The mean = Σfixi/Σfi = 2991/177 = 16.90 μU/ml
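As a sketch, the frequency-weighted mean for the insulin table can be computed directly from the two rows. `frequency_mean` is a hypothetical helper name, and it returns the intermediate totals so that Σfixi and Σfi can be inspected as well.

```python
# Fasting serum insulin values (μU/ml) and their frequencies (Example 2)
values = [7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27]
freqs = [1, 9, 20, 32, 22, 23, 19, 20, 13, 10, 8]

def frequency_mean(xs, fs):
    """Mean of a discrete frequency distribution: sum(f*x) / sum(f)."""
    total_fx = sum(f * x for x, f in zip(xs, fs))
    n = sum(fs)
    return total_fx, n, total_fx / n

total_fx, n, mean_insulin = frequency_mean(values, freqs)
print(total_fx, n, round(mean_insulin, 2))  # 2991 177 16.9
```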

For grouped frequency distributions, the mean is

$$\bar{x} = \frac{\sum_{i} f_i x_i}{\sum_{i} f_i}$$

where the $x_i$ are the midpoints of the classes.
Example 3.

Cholesterol level (mg/100 ml) Number of men (fi) Midpoint (xi) fixi
80-119 13 99.5 1293.50
120-159 150 139.5 20925.00
160-199 442 179.5 79339.00
200-239 299 219.5 65630.50
240-279 115 259.5 29842.50
280-319 34 299.5 10183.00
320-359 9 339.5 3055.50
360-399 5 379.5 1897.50

Total 1067 212166.50

Now the mean = 212166.50/1067 = 198.8 mg/100 ml
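The sketch below computes the class midpoints from the class limits and then forms the grouped mean. The variable names are illustrative. Because each midpoint is computed as (lower limit + upper limit)/2, typing errors in a hand-written midpoint column are avoided.

```python
# Example 3: cholesterol classes (mg/100 ml) and number of men in each
classes = [(80, 119), (120, 159), (160, 199), (200, 239),
           (240, 279), (280, 319), (320, 359), (360, 399)]
freqs = [13, 150, 442, 299, 115, 34, 9, 5]

# Midpoint of each class, e.g. (80 + 119) / 2 = 99.5
midpoints = [(lo + hi) / 2 for lo, hi in classes]

n = sum(freqs)
total_fx = sum(f * x for f, x in zip(freqs, midpoints))
grouped_mean = total_fx / n
print(n, total_fx, round(grouped_mean, 1))  # 1067 212166.5 198.8
```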

Advantages
(i) It is readily understood.
(ii) It can be treated algebraically and is easy to compute.
(iii) It is stable with regard to sampling fluctuations.

Disadvantage
(i) It is affected by extreme values.

Median
The median is defined as the 50th percentile of a set of measurements. It can be used as a summary measure for ordinal data as well as for discrete and continuous data. If a set of data contains an odd number n of observations, the median is the middle value, that is, the [(n+1)/2]th measurement when the data are ranked in order. If n is even, the median is usually taken to be the average of the two middlemost values, the (n/2)th and the (n/2 + 1)th observations. In the example considered above, the 13 ranked FEV1 measurements would be
2.15, 2.25, 2.30, 2.60, 2.68, 2.75, 2.82, 2.85, 3.00, 3.38, 3.50, 4.02, 4.05
Since n = 13 is odd, the median is the [(13+1)/2] = 7th observation, or 2.82 liters. If the FEV1 of subject 11 had been recorded as 40.2 rather than 4.02, the ranking of the measurements would change only slightly and the median would still be 2.82 liters, whereas the mean would rise from 2.95 to 74.53/13 = 5.73 liters. The median is said to be robust; that is, it is much less sensitive to unusual data points than the mean is.
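The robustness argument can be reproduced with the sketch below. It ranks the data, applies the odd/even rule for the median, and then repeats the calculation after mis-recording subject 11 as 40.2. The `median` helper is illustrative. The standard-library `statistics.mean` is used for the means.

```python
import statistics

fev1 = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
        2.25, 2.68, 3.00, 4.02, 2.85, 3.38]

def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                      # the [(n+1)/2]th value (1-based)
    return (ranked[mid - 1] + ranked[mid]) / 2  # (n/2)th and (n/2 + 1)th values

# Subject 11 mis-recorded as 40.2 instead of 4.02
fev1_error = fev1.copy()
fev1_error[10] = 40.2

print(median(fev1), median(fev1_error))  # 2.82 2.82
print(round(statistics.mean(fev1), 2), round(statistics.mean(fev1_error), 2))  # 2.95 5.73
```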

For a frequency distribution, we compute the cumulative frequencies. The median is the value at which the cumulative frequency first reaches N/2.
In the example of fasting serum insulin we proceed as follows:
Example 4.
The following is a frequency distribution of fasting serum insulin (μU/ml, expressed as whole numbers) of males in a rural area.
μU/ml 7 9 11 13 15 17 19 21 23 25 27
Freq 1 9 20 32 22 23 19 20 13 10 8
Cf 1 10 30 62 84 107 126 146 159 169 177

Here N = 177, so N/2 = 88.5. The cumulative frequency first reaches 88.5 at 107, so the median is 17 μU/ml.
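The cumulative-frequency rule can be sketched as a single pass over the table. `frequency_median` is a hypothetical name. The function follows the rule in the text and returns the first value whose cumulative frequency reaches N/2. It does not average two values when N/2 falls exactly on a class boundary.

```python
# Fasting serum insulin (Example 4)
values = [7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27]
freqs = [1, 9, 20, 32, 22, 23, 19, 20, 13, 10, 8]

def frequency_median(xs, fs):
    """Return the first value whose cumulative frequency reaches N/2."""
    half = sum(fs) / 2
    cumulative = 0
    for x, f in zip(xs, fs):
        cumulative += f
        if cumulative >= half:
            return x
    raise ValueError("empty frequency table")

print(frequency_median(values, freqs))  # 17
```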

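Given a continuous (or grouped discrete) frequency distribution, the median is obtained as follows:
(i) Prepare a cumulative frequency table.
(ii) Determine the median class, that is, the class in which the cumulative frequency first reaches N/2.
(iii) Interpolate within the median class:

Estimated median $= l + \dfrac{N/2 - f_l}{f} \times c$

where l is the lower class boundary of the median class, $f_l$ the cumulative frequency preceding the median class, f the frequency of the median class and c the width of the median class.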

Example 5.

Cholesterol level (mg/100 ml) Number of men (fi) Cumulative frequency
80-119 13 13
120-159 150 163
160-199 442 605
200-239 299 904
240-279 115 1019
280-319 34 1053
320-359 9 1062
360-399 5 1067

In this case N = 1067, so N/2 = 533.5. Thus the median class is 160-199.

l = lower class boundary of the median class = 159.5
fl = cumulative frequency preceding the median class = 163
f = frequency of the median class = 442
c = width of the median class = 40

Therefore the median, m = 159.5 + (533.5 - 163)/442 × 40 = 193.0 mg/100 ml
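The interpolation steps can be sketched as follows. The table is described by the lower class boundaries (79.5, 119.5, ...) and a common class width of 40, which is the layout of Example 5. `grouped_median` is an illustrative helper and is not taken from any library.

```python
# Example 5: lower class boundaries, class width and frequencies
lower_bounds = [79.5, 119.5, 159.5, 199.5, 239.5, 279.5, 319.5, 359.5]
width = 40
freqs = [13, 150, 442, 299, 115, 34, 9, 5]

def grouped_median(lower_bounds, width, freqs):
    """Interpolate within the median class: l + (N/2 - fl) / f * c."""
    half = sum(freqs) / 2
    cum_before = 0                      # fl: cumulative frequency before the class
    for l, f in zip(lower_bounds, freqs):
        if cum_before + f >= half:      # first class whose c.f. reaches N/2
            return l + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("empty frequency table")

print(round(grouped_median(lower_bounds, width, freqs), 1))  # 193.0
```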

Mode
The mode of a set of data is the observation that occurs with the highest frequency. It is not necessarily unique, because two or more values may share the highest frequency. It can be used as a summary measure for all types of data.
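As a sketch, the mode of the insulin frequency table (Example 2) is the value paired with the largest frequency. The standard-library `statistics.multimode` (Python 3.8+) returns every most-frequent value of raw data, which makes the point about non-uniqueness visible. The small list `[2, 2, 3, 5, 5, 7]` is made-up illustrative data.

```python
import statistics

# Fasting serum insulin values and frequencies (Example 2)
values = [7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27]
freqs = [1, 9, 20, 32, 22, 23, 19, 20, 13, 10, 8]

# Mode of a frequency table: the value with the largest frequency
table_mode = max(zip(values, freqs), key=lambda pair: pair[1])[0]
print(table_mode)  # 13

# With raw data, multimode lists every value that ties for the highest frequency
print(statistics.multimode([2, 2, 3, 5, 5, 7]))  # [2, 5]
```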
The best measure of central tendency for a given set of data often depends on the way in which the values are distributed. If they are symmetric and unimodal, then the mean, median and mode will coincide. If the distribution of the values is symmetric but bimodal, then the mean and median should be approximately the same. A bimodal distribution often indicates that the population from which the data are taken consists of two distinct subgroups that differ in the characteristic being measured. In this situation it might be better to report two modes rather than the mean or the median.
When the data are not symmetric, the median is often the best measure of central tendency. Because the mean is sensitive to extreme observations, it is pulled in the direction of the outlying data values and may end up either excessively inflated or deflated. Note that when the data are skewed to the right the mean lies to the right of the median; when they are skewed to the left the mean lies to the left of the median.
Regardless of the measure of central tendency used in a particular situation, it can be misleading to assume that this value is representative of all observations.

Example 6.

Cholesterol level (mg/100 ml) Number of men (fi) Cumulative frequency
80-119 13 13
120-159 150 163
160-199 442 605
200-239 299 904
240-279 115 1019
280-319 34 1053
320-359 9 1062
360-399 5 1067

The modal class is 160-199 mg/100 ml since it has the highest frequency, and hence the modal value should lie in this class. An estimate of the mode is obtained as follows:

Estimated mode $= l + \dfrac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times c$

In this example l = lower class boundary of the modal class = 159.5
f1 = frequency of the class preceding the modal class = 150
fm = frequency of the modal class = 442
f2 = frequency of the class immediately after the modal class = 299
c = width of the modal class = 40

Therefore the mode = 159.5 + (442 - 150)/[(442 - 150) + (442 - 299)] × 40 = 159.5 + 292/435 × 40 ≈ 186.4 mg/100 ml
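A minimal sketch of the grouped-mode interpolation follows. `grouped_mode` is a hypothetical helper. It treats a missing neighbouring class (when the modal class is the first or last class) as having frequency 0. That edge-case convention is an assumption, because the notes do not cover it.

```python
# Example 6: lower class boundaries, class width and frequencies
lower_bounds = [79.5, 119.5, 159.5, 199.5, 239.5, 279.5, 319.5, 359.5]
width = 40
freqs = [13, 150, 442, 299, 115, 34, 9, 5]

def grouped_mode(lower_bounds, width, freqs):
    """l + (fm - f1) / ((fm - f1) + (fm - f2)) * c for the modal class."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of modal class
    fm = freqs[k]
    f1 = freqs[k - 1] if k > 0 else 0                   # class before (assumed 0 if none)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0      # class after (assumed 0 if none)
    return lower_bounds[k] + (fm - f1) / ((fm - f1) + (fm - f2)) * width

print(round(grouped_mode(lower_bounds, width, freqs), 1))  # 186.4
```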


Measures of Spread/dispersion/variation

Measures of central tendency give us some idea of the size of central values, while measures of spread give us some idea of how closely the values of a distribution cluster around the average. If the dispersion is small, many values cluster around the mean, whereas if the dispersion is large, a considerable proportion of values differ markedly from the average. The importance of measures of central tendency such as the mean, and of measures of spread such as the standard deviation, can be appreciated when it is realized that for normally distributed variables approximately 68 per cent of the values lie within one standard deviation (S.D.) of the mean, approximately 95 per cent within two S.D.s and about 99.7 per cent (practically all) within three S.D.s.
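These percentages follow from the normal distribution. For a normal variable, the proportion of values within k standard deviations of the mean equals erf(k/√2). The sketch below evaluates this with Python's `math.erf`. The helper name `normal_within` is illustrative.

```python
import math

def normal_within(k):
    """P(|X - mean| < k*SD) for a normal variable, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{normal_within(k):.1%}")
# 1 68.3%
# 2 95.4%
# 3 99.7%
```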

Figure: Measures of dispersion. Absolute measures: range, quartile deviation, mean deviation, standard deviation. Relative measures: coefficient of variation, coefficient of quartile deviation, coefficient of mean deviation.

Absolute Measures: These are measures of spread that carry the unit of measurement.

Range
One number that can be used to describe the variability in a set of data values is known as the
range. The range of a group of measurements is defined as the difference between the largest and
the smallest observation. Its usefulness is limited since it considers only the extreme values of a
data set rather than the majority of the observations. Therefore it is highly sensitive to
exceptionally large or exceptionally small values.
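The sensitivity of the range to one extreme value can be seen with the FEV1 data. The sketch below reuses the mis-recorded value of 40.2 for subject 11 from the median discussion, purely as an illustration.

```python
fev1 = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
        2.25, 2.68, 3.00, 4.02, 2.85, 3.38]

# Range: largest observation minus smallest observation
data_range = max(fev1) - min(fev1)
print(f"{data_range:.2f}")  # 1.90

# One mis-recorded value (40.2 instead of 4.02) changes the range drastically
fev1_error = fev1.copy()
fev1_error[10] = 40.2
error_range = max(fev1_error) - min(fev1_error)
print(f"{error_range:.2f}")  # 38.05
```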

Interquartile Range
This is a measure of variability that is not easily influenced by extreme values. It is calculated by subtracting the 25th percentile of the data (the first quartile, Q1) from the 75th percentile (the third quartile, Q3). It encompasses the middle 50 per cent of the observations.
IQR = Q3 - Q1
Note: when the data form a grouped frequency distribution, we follow the same procedure as for computing the median. The difference is that we identify the classes containing Q1 and Q3 (the classes in which the cumulative frequency first reaches N/4 and 3N/4 respectively) and then estimate Q1 and Q3 by interpolation, in the same manner as we estimated the median.
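As a sketch, the median interpolation can be generalized to any proportion p: p = 0.25 gives Q1 and p = 0.75 gives Q3. This follows the procedure described in the note. The cholesterol table of Example 5 is used as illustrative input, and `grouped_quantile` is a hypothetical helper name.

```python
# Cholesterol table (Example 5): lower class boundaries, width, frequencies
lower_bounds = [79.5, 119.5, 159.5, 199.5, 239.5, 279.5, 319.5, 359.5]
width = 40
freqs = [13, 150, 442, 299, 115, 34, 9, 5]

def grouped_quantile(p, lower_bounds, width, freqs):
    """Interpolated quantile: l + (p*N - fl) / f * c in the class reaching p*N."""
    target = p * sum(freqs)
    cum_before = 0
    for l, f in zip(lower_bounds, freqs):
        if cum_before + f >= target:
            return l + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("empty frequency table")

q1 = grouped_quantile(0.25, lower_bounds, width, freqs)
q3 = grouped_quantile(0.75, lower_bounds, width, freqs)
print(round(q1, 1), round(q3, 1), round(q3 - q1, 1))  # 168.9 225.6 56.7
```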

Mean deviation
It is the mean of the absolute deviations of the observations from the mean (or from the median). Let $d_i = x_i - \bar{x}$ denote the deviation of the ith observation from the mean. The mean deviation, M.D, is given by

$$\text{M.D} = \frac{|d_1| + |d_2| + \cdots + |d_n|}{n} = \frac{\sum_{i=1}^{n} |d_i|}{n}$$

where n is the number of observations recorded. A small value of M.D means a low dispersion, while a large M.D means a high dispersion. Note: when the data form a grouped frequency distribution, the $x_i$ are the midpoints of the respective classes and each absolute deviation is weighted by its class frequency:

$$\text{M.D} = \frac{\sum_{i} f_i |x_i - \bar{x}|}{\sum_{i} f_i}$$
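For the FEV1 data, the mean deviation about the mean can be sketched as follows. The variable names are illustrative.

```python
fev1 = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
        2.25, 2.68, 3.00, 4.02, 2.85, 3.38]

mean = sum(fev1) / len(fev1)
# Average of the absolute deviations |x_i - mean|
mean_deviation = sum(abs(x - mean) for x in fev1) / len(fev1)
print(round(mean_deviation, 3))  # 0.492
```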

Variance and Standard Deviation

Other commonly used measures of spread are the variance and the standard deviation. The variance quantifies the amount of variability, or spread, about the mean of a sample. Assume the observations are given by $x_i$, i = 1, 2, …, n, with mean $\bar{x} = \sum x_i / n$. The variance is then given by

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

The variance is calculated by subtracting the mean of a set of data values from each of the observations, squaring these deviations, adding them up and dividing by one less than the number of observations in the data set. We denote the variance by $s^2$. The standard deviation, s, is the positive square root of the variance. It is expressed in the same units as the original observations.

Since the standard deviation carries units of measurement, it is meaningless to compare standard deviations for two unrelated quantities.
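The sketch below computes the sample variance and standard deviation of the FEV1 data with the n - 1 divisor described above. Python's `statistics.variance` and `statistics.stdev` use the same divisor, so they are a convenient cross-check.

```python
import math

fev1 = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
        2.25, 2.68, 3.00, 4.02, 2.85, 3.38]

n = len(fev1)
mean = sum(fev1) / n
# Sum of squared deviations divided by n - 1
variance = sum((x - mean) ** 2 for x in fev1) / (n - 1)
sd = math.sqrt(variance)  # same units as the data (liters)
print(round(variance, 4), round(sd, 4))  # 0.3883 0.6231
```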

Note: for grouped frequencies

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2, \qquad N = \sum_{i=1}^{n} f_i$$

where the $x_i$ are the midpoints of the respective classes and the $f_i$ are the corresponding frequencies.
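As a sketch, the grouped formula can be applied to the cholesterol table of Example 3 using the class midpoints. The cross-check expands the table so that each midpoint appears f times. It then uses `statistics.stdev`, which should agree because it applies the same N - 1 divisor to the same values.

```python
import math
import statistics

midpoints = [99.5, 139.5, 179.5, 219.5, 259.5, 299.5, 339.5, 379.5]
freqs = [13, 150, 442, 299, 115, 34, 9, 5]

N = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / N
# Frequency-weighted squared deviations, divided by N - 1
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (N - 1)
sd = math.sqrt(variance)
print(round(sd, 1))  # 43.9

# Cross-check: repeat each midpoint f times and use the raw-data formula
expanded = [x for f, x in zip(freqs, midpoints) for _ in range(f)]
print(math.isclose(sd, statistics.stdev(expanded)))  # True
```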
Relative Measures: These are measures of spread that are unit free. They are useful for purposes
of comparisons where the variables of interest have different units of measurements.

Coefficient of Variation

It is possible to make comparisons among data sets representing different quantities using a numerical summary measure known as the coefficient of variation. It relates the standard deviation of a sample to its mean: it is the ratio of s to $\bar{x}$ multiplied by 100, and is therefore a measure of relative variability. Because the standard deviation and the mean share the same units of measurement, the units cancel out and leave the coefficient of variation as a dimensionless number. Since it is independent of the unit of measurement, it can be used to compare the relative variation between any two sets of values.
C.V = S.D/Mean × 100%
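The sketch below compares the relative variation of two quantities measured in different units: FEV1 in liters and cholesterol in mg/100 ml. For the cholesterol data, each class is represented by its midpoint, repeated f times, which is an approximation inherent in grouped data. `coefficient_of_variation` is an illustrative helper built on `statistics.stdev` (n - 1 divisor) and `statistics.mean`.

```python
import statistics

fev1 = [2.30, 2.15, 3.50, 2.60, 2.75, 2.82, 4.05,
        2.25, 2.68, 3.00, 4.02, 2.85, 3.38]

# Cholesterol (Example 3): each midpoint repeated by its class frequency
midpoints = [99.5, 139.5, 179.5, 219.5, 259.5, 299.5, 339.5, 379.5]
freqs = [13, 150, 442, 299, 115, 34, 9, 5]
cholesterol = [x for f, x in zip(freqs, midpoints) for _ in range(f)]

def coefficient_of_variation(values):
    """C.V = S.D / mean x 100 (per cent), using the n - 1 standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv_fev1 = coefficient_of_variation(fev1)
cv_chol = coefficient_of_variation(cholesterol)
print(f"FEV1: {cv_fev1:.1f}%")         # FEV1: 21.1%
print(f"Cholesterol: {cv_chol:.1f}%")  # Cholesterol: 22.1%
```

The two standard deviations (about 0.62 liters and 43.9 mg/100 ml) cannot be compared directly, but the dimensionless C.V values can.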

Advantages and disadvantages of various measures of dispersion

The range
It is a reasonably good indication of dispersion, but it can be badly affected by just one extreme value. Care is therefore necessary when it is used.

The mean deviation (from the mean or median)
Generally this is a good measure of dispersion since all the values are used in its computation. However, it has the disadvantage of not being soundly based mathematically, because the absolute values in its definition are awkward to manipulate algebraically.

The standard deviation
The standard deviation has the same advantages as the mean deviation, and in addition it is mathematically sound.

Skewness

Skewness refers to lack of symmetry. A distribution whose two halves are mirror images of each other about its centre, such as the normal distribution, is said to be symmetrical; otherwise it is asymmetric (skewed). For a symmetric unimodal distribution, mean = median = mode.
If mean > median > mode, the distribution is skewed to the right (positively skewed).
If mode > median > mean, the distribution is skewed to the left (negatively skewed).
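As an illustration of this rule, the sketch below computes the grouped mean, median and mode for the cholesterol table (Examples 3, 5 and 6) and then orders them. The helper `skew_direction` and its fallback label are hypothetical. The rule only names a direction when the three measures are strictly ordered.

```python
lower_bounds = [79.5, 119.5, 159.5, 199.5, 239.5, 279.5, 319.5, 359.5]
width = 40
freqs = [13, 150, 442, 299, 115, 34, 9, 5]
N = sum(freqs)
midpoints = [l + width / 2 for l in lower_bounds]

mean = sum(f * x for f, x in zip(freqs, midpoints)) / N

# Median: interpolate in the class whose cumulative frequency reaches N/2
cum = 0
for l, f in zip(lower_bounds, freqs):
    if cum + f >= N / 2:
        median = l + (N / 2 - cum) / f * width
        break
    cum += f
else:
    raise ValueError("empty frequency table")

# Mode: interpolate in the class with the highest frequency
k = freqs.index(max(freqs))
d1 = freqs[k] - (freqs[k - 1] if k > 0 else 0)
d2 = freqs[k] - (freqs[k + 1] if k + 1 < len(freqs) else 0)
mode = lower_bounds[k] + d1 / (d1 + d2) * width

def skew_direction(mean, median, mode):
    """Apply the ordering rule from the text."""
    if mean > median > mode:
        return "positively skewed"
    if mode > median > mean:
        return "negatively skewed"
    return "no clear direction from this rule"

print(round(mean, 1), round(median, 1), round(mode, 1), skew_direction(mean, median, mode))
# 198.8 193.0 186.4 positively skewed
```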

Kurtosis

Kurtosis is a measure of the degree to which the curve of a frequency distribution is peaked or flat-topped. If the distribution is more peaked than the normal distribution, it is called “leptokurtic”; if it is flatter than the normal, it is called “platykurtic”. The normal distribution is “mesokurtic”.
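The notes describe kurtosis only qualitatively. One common way to quantify it, used here as an assumption rather than something the notes specify, is the moment coefficient b2 = m4/m2², where m2 and m4 are the second and fourth moments about the mean. This coefficient equals 3 for a normal distribution. The data sets below are made up purely for illustration. In practice a sample value will rarely equal exactly 3, so the "mesokurtic" label is only a rough guide.

```python
def moment_kurtosis(values):
    """Moment coefficient of kurtosis, b2 = m4 / m2**2 (moments about the mean)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2

def kurtosis_type(b2):
    """Compare b2 with 3, the value for a normal distribution."""
    if b2 > 3:
        return "leptokurtic"
    if b2 < 3:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]        # evenly spread values, no peak
peaked = [0] * 8 + [-10, 10]  # tall centre with long tails
for data in (flat, peaked):
    b2 = moment_kurtosis(data)
    print(round(b2, 2), kurtosis_type(b2))
# 1.7 platykurtic
# 5.0 leptokurtic
```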
