0% found this document useful (0 votes)
41 views24 pages

2) SummarizationOfData Mean Median Mod SD CV

Uploaded by

uwtfme
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Summarization and Presentation of Data
* Measures of Central Tendency
* Measures of variation
* Coefficient of Variation
Especially with quantitative data, it is often
important to give, in a single value, an indication
of the general level of a set of measurements.

Such a single value is a "measure of central
tendency" (also called a measure of location, a mean, or an average).

On the other hand, it is also necessary to express
the degree of variation of the raw data values around
this central figure.

Are the data values all rather close to the mean, or do some of
them scatter widely around it?

So a "measure of variation" (or dispersion) is also required.


1 ) Measures of Location or Central tendency.
The three most common measures of central
tendency are the mean, median and mode.
Sometimes these three measures are
very close, or even identical.

a – MEAN
The most commonly used measure of central tendency is
the mean, more precisely called the "ARITHMETIC MEAN".

For a series of quantitatively measured observations, the
mean is defined as the total of the observed values divided
by the number of observations.

Let n denote the number of observations and $X_i$ each of
the observed values. The mean, denoted by $\bar{X}$, is

$$\bar{X} = \frac{\sum X_i}{n}$$
Example: The ages at death of 7 surgeons are:
70, 68, 68, 40, 71, 73, 65.
The mean is 455 / 7 = 65 years.

• As this example shows, the mean is seriously affected by
extreme values: the single age of 40 pulls the mean down to 65,
although five of the seven surgeons lived longer than that.
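The effect of the extreme value can be checked with a short Python sketch. The function name `arithmetic_mean` and the comparison without the value 40 are illustrative additions, not part of the original slides.

```python
def arithmetic_mean(values):
    """Return the total of the observed values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty series is undefined")
    return sum(values) / len(values)


ages_at_death = [70, 68, 68, 40, 71, 73, 65]
mean_all = arithmetic_mean(ages_at_death)

# Illustrative comparison (an assumption, not in the slides):
# drop the single extreme value (40) and recompute the mean.
ages_without_extreme = [age for age in ages_at_death if age != 40]
mean_without_extreme = arithmetic_mean(ages_without_extreme)

print(mean_all)                        # 65.0
print(round(mean_without_extreme, 2))  # 69.17
```

Removing one observation out of seven shifts the mean by more than four years, which is the sensitivity the example points to.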
b- MEDIAN
The median is the value in a set of observations
such that half the observations exceed it
and half are below it.
As a descriptive measure, the median is not affected by
extreme observations, but it is less amenable to
mathematical treatment than the mean.
_________________________________________________
For an even number of ranked observations
(denoted as 2n observations)
the median is the average of the n-th and (n+1)-th values.
Example: 3, 5, 8, 9, 12, 24 (n = 3); median = (8+9)/2 = 8.5
_________________________________________________
For an odd number of ranked observations (2n+1 observations)
the median is the (n+1)-th value.
Example: 3, 5, 8, 9, 10, 12, 15 (n = 3); median is 9.
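The two cases (even and odd numbers of observations) can be sketched in Python as follows. The function name `median_of` is a hypothetical helper chosen for this illustration.

```python
def median_of(values):
    """Median of a series: rank the values, then pick or average the middle."""
    if not values:
        raise ValueError("the median of an empty series is undefined")
    ranked = sorted(values)
    count = len(ranked)
    middle = count // 2
    if count % 2 == 1:
        # Odd count (2n+1 values): the middle, i.e. (n+1)-th, ranked value.
        return ranked[middle]
    # Even count (2n values): average of the n-th and (n+1)-th ranked values.
    return (ranked[middle - 1] + ranked[middle]) / 2


print(median_of([3, 5, 8, 9, 12, 24]))      # 8.5
print(median_of([3, 5, 8, 9, 10, 12, 15]))  # 9
```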
c – MODE
The mode is the most frequently occurring observation value.
Example: 1, 2, 3, 3.5, 4, 4, 5, 5, 6, 6, 6, 9, 10
“6”, whose frequency is 3 (there are three "6" values), is the mode.
Note that the mode is sometimes not a good
"central" value.

Sometimes there is no obvious mode
(if all the frequencies are the same),
or there is more than one mode
(bimodal or multimodal series).

* Mode is generally used for descriptive purposes only.


Example (Mean)
The ages of seven children attending a
children's clinic are given below:

{1,3,6,7,2,3,5}

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{7} x_i}{7} = \frac{1+3+6+7+2+3+5}{7} = \frac{27}{7}$$

$\bar{X} \approx 3.9$ years
Example (Mode)
The mode is the value of x that occurs most frequently.

Data {1,3,7,3,2,3,6,7}
Mode : 3

Data {1,3,7,3,2,3,6,7,1,1}
Mode : 1 and 3

Data {1,3,7,0,2,-3, 6,5,-1}


Mode : No mode
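The three mode examples can be reproduced with a small Python sketch. The helper name `modes` is hypothetical, and returning an empty list when all distinct values share the same frequency is an assumption that mirrors the "No mode" convention used in these examples (Python's own `statistics.multimode` would instead return every value in that case).

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when no value stands out."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    most_frequent = sorted(v for v, c in counts.items() if c == highest)
    # Convention assumed from the examples: if several distinct values all
    # share the same frequency, the series is reported as having no mode.
    if len(counts) > 1 and len(most_frequent) == len(counts):
        return []
    return most_frequent


print(modes([1, 3, 7, 3, 2, 3, 6, 7]))         # [3]
print(modes([1, 3, 7, 3, 2, 3, 6, 7, 1, 1]))   # [1, 3]
print(modes([1, 3, 7, 0, 2, -3, 6, 5, -1]))    # []
```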
Example: Suppose the ages in years of the first 10 subjects
enrolled in your study are:
34, 24, 56, 52, 21, 44, 64, 34, 42, 46

Then the mean age of this group is 41.7 years

To find the median, first order the data:


21, 24, 34, 34, 42, 44, 46, 52, 56, 64

The median is (42+44)/2 = 43 years

The mode is 34 years.
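As a cross-check, the same three measures can be obtained from Python's standard `statistics` module. This is only one possible way to verify the figures; the variable names are illustrative.

```python
import statistics

ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]

mean_age = statistics.mean(ages)      # total divided by the number of subjects
median_age = statistics.median(ages)  # average of the two middle ranked values
mode_age = statistics.mode(ages)      # most frequently occurring value

print(mean_age)    # 41.7
print(median_age)  # 43.0
print(mode_age)    # 34
```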


Comparison of Mean and Median
•Mean is sensitive to a few very large (or
small) values, called “outliers”, so sometimes the mean
does not reflect the quantity desired.

• Median is “resistant” to outliers

•Mean is attractive mathematically.

•50% of sample is above the median, 50% of


sample is below the median.
Geometric mean is a summary statistic useful
when the measurement scale is not linear.
It is computed as

$$G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} \quad \text{or} \quad \log(G) = \frac{\sum \log(x_i)}{n}$$
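The two forms of the formula give the same result, which a minimal Python sketch can illustrate. The function names and the sample values `[1, 10, 100]` are assumptions made for this illustration; the log form additionally assumes every value is strictly positive.

```python
import math


def geometric_mean_root(values):
    """n-th root of the product of the values."""
    return math.prod(values) ** (1 / len(values))


def geometric_mean_log(values):
    """Antilog of the mean of the logarithms; requires every value > 0."""
    if any(v <= 0 for v in values):
        raise ValueError("the log form needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical values spanning several orders of magnitude.
intensities = [1, 10, 100]

print(geometric_mean_root(intensities))     # approximately 10
print(geometric_mean_log(intensities))      # approximately 10
print(sum(intensities) / len(intensities))  # arithmetic mean, for contrast: 37.0
```

On data like these, the arithmetic mean is dominated by the largest value, while the geometric mean sits at the middle of the multiplicative scale.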
For example,
in the area of psychometrics it is well known that
the rated intensity of a stimulus (e.g., brightness of a light)
is often a logarithmic function of the actual
intensity of the stimulus (brightness measured in units of Lux).
In this instance, the geometric mean is a better
"summary" of ratings than the simple mean.
Harmonic Mean is a "summary" statistic used in
analyses of frequency data.
The harmonic mean is sometimes used to average
values that change in time.

$$HM = \frac{n}{\sum \frac{1}{x_i}}$$

If a variable contains a zero (0) as a valid score, then the


harmonic mean cannot be calculated (since it implies division by zero).
For example:
The annual percentage increases in tumor size are as follows:
“10, 8, 4, 1”

$$HM = \frac{4}{\frac{1}{10} + \frac{1}{8} + \frac{1}{4} + \frac{1}{1}} \approx 2.71$$

The average annual increase in tumor size is “2.71%”.
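The calculation, including the restriction on zero values, can be sketched in Python. The function name `harmonic_mean` and the choice to raise `ValueError` for a zero are illustrative assumptions.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    if not values:
        raise ValueError("the harmonic mean of an empty series is undefined")
    if any(v == 0 for v in values):
        # A zero would require dividing by zero, so the mean is undefined.
        raise ValueError("the harmonic mean is undefined when a value is zero")
    return len(values) / sum(1 / v for v in values)


annual_increase_percent = [10, 8, 4, 1]
print(round(harmonic_mean(annual_increase_percent), 2))  # 2.71
```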


2 ) Measures of variation
i - RANGE
• The simplest measure of variation is the RANGE,
which is defined as the difference between the
maximum value and the minimum value.
• Note that the range is a definite quantity, measured
in the same units as the original observations.
• If the highest and lowest of a series of diastolic
blood pressures are 96 and 62 mmHg, we may say
not only (as in conversation) that the readings range from
62 to 96 mmHg, but also that the range is 34 mmHg.
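A minimal Python sketch of the range follows. Only the extremes 62 and 96 mmHg come from the text; the other readings and the function name `value_range` are hypothetical.

```python
def value_range(values):
    """Difference between the maximum and the minimum value."""
    if not values:
        raise ValueError("the range of an empty series is undefined")
    return max(values) - min(values)


# Hypothetical diastolic readings in mmHg; only 62 and 96 are from the text.
readings_mmhg = [78, 62, 84, 96, 70]
print(value_range(readings_mmhg))  # 34
```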
ii - STANDARD DEVIATION AND VARIANCE

• The second characteristic of interest in regard to the
frequency distribution is the spread of the magnitudes
around the mean magnitude.

• Of the many representative values that have been proposed as a
measure of the random error, the most
fundamentally important is the one known as the
STANDARD DEVIATION.

• The standard deviation of a population is calculated by
first determining the amount by which each member $x_i$
deviates from the population mean $\bar{x}$.
The population standard deviation sigma is

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$
• The units and dimensions of the standard deviation will be
those of the quantity expressed by $x_i$.

• The estimated standard deviation SD is calculated from the
n members of a sample as

$$SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$

(when n < 31, use n-1 in the denominator)
Example: Calculation of estimated variance and
standard deviation; direct formula.

Xi        Xi - mean    (Xi - mean)^2
8          0             0
5         -3             9
4         -4            16
12         4            16
15         7            49
5         -3             9
7         -1             1
n = 7     Sum = 56     0     100

$\bar{x} = 56/7 = 8$
$SD^2 = 100/(7-1) = 16.67$
$SD = \sqrt{16.67} = 4.08$
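The direct formula can be followed step by step in Python. This is a sketch under the assumption that the divisor n-1 is always used (the sample here has n = 7); the function name `sample_variance` is illustrative.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


data = [8, 5, 4, 12, 15, 5, 7]
mean = sum(data) / len(data)
deviations = [x - mean for x in data]
variance = sample_variance(data)
sd = math.sqrt(variance)

print(mean)                # 8.0
print(sum(deviations))     # 0.0
print(round(variance, 2))  # 16.67
print(round(sd, 2))        # 4.08
```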
It is inherent in the definition of a mean
that the algebraic sum of all the deviations
(both positive and negative ) is equal to 0,
so that the average deviation is 0.

But in the process of squaring


both positive and negative deviations become positive,
and the average of the squared deviations,
the variance, is not 0.

When the $x_i$ are all very nearly equal to the mean $\bar{x}$,
the sum of the squared deviations and
the standard deviation are small.
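First sample

Step 1     Step 3               Step 4
x          (x - \bar{x})        (x - \bar{x})^2
6           1                    1
3          -2                    4
8           3                    9
5           0                    0
3          -2                    4
25          0                   18

Step 2: $\bar{x} = \frac{\sum x}{n} = \frac{25}{5} = 5$

Step 5: $s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{18}{4} = 4.5$

$s = \sqrt{s^2} = \sqrt{4.5} = 2.12$

Second sample

x          (x - \bar{x})        (x - \bar{x})^2
1          -4                   16
3          -2                    4
5           0                    0
6           1                    1
10          5                   25
25          0                   46

$\bar{x} = \frac{\sum x}{n} = \frac{25}{5} = 5$

$s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{46}{4} = 11.5$

$s = \sqrt{s^2} = \sqrt{11.5} = 3.39$

NOTE: The sum of the deviations, $\sum_{i=1}^{n} (x_i - \bar{x})$, is always zero.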
For example: Comparison of variances
The second set of data is more dispersed than
the first set, and therefore its variance is larger.

First sample:    3  3  5  6  8     s2 = 4.5
Second sample:   1  3  5  6  10    s2 = 11.5
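The comparison can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n-1 divisor. The variable names are illustrative.

```python
import statistics

first_sample = [3, 3, 5, 6, 8]
second_sample = [1, 3, 5, 6, 10]

# Both samples have the same mean (5) but different spread.
first_variance = statistics.variance(first_sample)
second_variance = statistics.variance(second_sample)

print(first_variance, round(statistics.stdev(first_sample), 2))    # 4.5 2.12
print(second_variance, round(statistics.stdev(second_sample), 2))  # 11.5 3.39
print(second_variance > first_variance)                            # True
```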
iii - COEFFICIENT OF VARIATION (CV)
It is occasionally useful to describe the
variability by expressing the standard deviation SD
as a proportion, or a percentage, of the mean.
The resulting measure, called the COEFFICIENT
OF VARIATION, is thus a dimensionless
quantity (a pure number). In symbols,

$$CV = \frac{SD}{\bar{X}} \times 100$$
The CV is most useful as a descriptive tool to detect
normality of data series.

The coefficient of variation then remains unchanged and


is a useful single measure of variability.
Example: Calculation of CV for the previous data.

$$CV = \frac{4.08}{8} \times 100 \approx 51\%$$
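A short Python sketch of the CV for the same data follows. The function name `coefficient_of_variation` is hypothetical, and the rescaling by a factor of 10 is an added illustration of what "dimensionless" means in practice: multiplying every value by a constant changes the mean and the SD in the same proportion.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


data = [8, 5, 4, 12, 15, 5, 7]
cv = coefficient_of_variation(data)
print(round(cv))  # 51

# Illustration (an assumption, not in the slides): express the same
# measurements in a unit ten times smaller; the CV does not change.
rescaled = [x * 10 for x in data]
cv_rescaled = coefficient_of_variation(rescaled)
print(abs(cv - cv_rescaled) < 1e-9)  # True
```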
