
Class 2 SP

The document discusses measures of central tendency, which describe the central or typical value in a data set. The three main measures are the mean, median, and mode. The mean is the average value found by dividing the sum of all values by the total number of data points. The median is the middle value of the data set when sorted. The mode is the value that occurs most frequently. The document provides formulas and examples for calculating each of these measures from raw data or grouped frequency distributions.

Uploaded by

Abel Gulilat

Summary Statistics

Measure of Central Tendency


Summary statistics
Summary measures fall into three groups:
• Central tendency: mean (arithmetic, geometric, harmonic), median, mode
• Quantiles
• Variation: range, variance, standard deviation, coefficient of variation
2 Introduction to Biostatistics 25-Sep-23
Introduction
• Measures of central location are statistical measures that describe the position of a distribution.
Measures of location
• Measures of central tendency
• Measures of non-central location (quartiles, percentiles)

• Measure of central tendency (MCT): a single value that best represents the entire data set

Why measure central tendency:
• To describe (locate) the center of the distribution
• To facilitate comparison
• To support further statistical analysis


Measures of central tendency
The tendency of statistical data to concentrate around certain values is called "central tendency."

Types:
There are three kinds of averages, each measuring the location of the distribution in a different way:
• The mean (arithmetic, geometric, and harmonic)
• The median
• The mode


Arithmetic Mean ($\bar{X}$)
It is defined as the value each item in the distribution would have if the total of all the values were shared out equally among all the items.

It is the measure we usually refer to in everyday life when we use the word "average."
Average: a figure that best represents the location of the distribution.
The arithmetic mean (A.M.) of $X_1, X_2, X_3, \ldots, X_n$ is given by:
1. General formula for raw data

Sample mean:
$$\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Population mean:
$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$
2. For an ungrouped frequency distribution
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
where
• $x_i$ is the ith class observation
• $f_i$ is the ith class frequency
• $k$ is the number of classes

3. For a grouped frequency distribution
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
where
• $x_i$ is the ith class mark (class midpoint)
• $f_i$ is the ith class frequency
• $k$ is the number of classes
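The frequency-distribution formula above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `frequency_mean` and the sample values are hypothetical. For a grouped distribution, pass the class marks as `values`.

```python
def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total = sum(freqs)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total


# Hypothetical ungrouped distribution (not from the slides):
# the value 2 occurs 3 times, 3 occurs 5 times, 5 occurs 2 times.
mean = frequency_mean([2, 3, 5], [3, 5, 2])
print(mean)  # (6 + 15 + 10) / 10 = 3.1
```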


Example
Mean for Aggregate (Grouped) Data
To calculate the mean for grouped data, you need a frequency table that includes a column for the class midpoints ($x_i$) and a column for the product of each frequency and its midpoint ($f_i x_i$). Find the midpoints first.

Formula:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$

Score      f      x_i     f_i x_i
41-50      1      45.5     45.5
51-60      3      55.5    166.5
61-70      8      65.5    524
71-80      3      75.5    226.5
81-90      2      85.5    171
91-100     3      95.5    286.5
Total    N = 20           Σ f_i x_i = 1420

$$\bar{X} = \frac{\sum f_i x_i}{N} = \frac{1420}{20} = 71$$
The mean for the grouped data is 71.
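The grouped-mean calculation in this example can be reproduced with a short Python sketch. The function name `grouped_mean` and the tuple layout `(lower limit, upper limit, frequency)` are assumptions made for illustration; the class limits and frequencies are the ones in the table.

```python
# Each class is (lower limit, upper limit, frequency), taken from the score table.
classes = [
    (41, 50, 1),
    (51, 60, 3),
    (61, 70, 8),
    (71, 80, 3),
    (81, 90, 2),
    (91, 100, 3),
]


def grouped_mean(classes):
    """Mean of grouped data using class midpoints as class marks."""
    total_f = sum(f for _, _, f in classes)
    # Midpoint of a class is (lower + upper) / 2; weight it by the frequency.
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / total_f


print(grouped_mean(classes))  # 1420 / 20 = 71.0
```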


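Special Properties of A.M

1) The sum of the deviations of the observations from their mean is zero:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

2) The sum of squared deviations is smallest when taken about the mean. For any constant $A \neq \bar{x}$:
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 < \sum_{i=1}^{n} (x_i - A)^2$$

3) The effect of transforming the original series on the mean:
• If a constant k is added to (or subtracted from) every observation, the new mean is the old mean plus (or minus) k.
• If every observation is multiplied by a constant k, the new mean is k times the old mean.

4) The combined mean of k groups with means $\bar{X}_1, \ldots, \bar{X}_k$ and sizes $n_1, \ldots, n_k$ is:
$$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2 + \cdots + \bar{X}_k n_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} \bar{X}_i n_i}{\sum_{i=1}^{k} n_i}$$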
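Properties 1 to 3 can be checked numerically. The sketch below uses a small hypothetical sample (not from the slides) and a hypothetical helper `mean`; it only illustrates the properties and does not prove them.

```python
def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


data = [2, 4, 6, 8]          # hypothetical sample
m = mean(data)               # 5.0

# Property 1: deviations from the mean sum to zero.
sum_dev = sum(x - m for x in data)

# Property 2: squared deviations about the mean are smaller than about any other A.
ss_mean = sum((x - m) ** 2 for x in data)     # 20.0
ss_other = sum((x - 4) ** 2 for x in data)    # A = 4 gives 24

# Property 3: adding k shifts the mean by k; multiplying by k scales it by k.
shifted = mean([x + 3 for x in data])         # 8.0
scaled = mean([x * 2 for x in data])          # 10.0

print(sum_dev, ss_mean, ss_other, shifted, scaled)
```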


Example
Suppose the mean monthly wages per worker of two firms employing 40 and 60 workers are Birr 350 and Birr 380, respectively.
What is the mean wage per worker for the two firms taken together?
Solution: let
$n_1$ = the number of workers in firm one = 40
$n_2$ = the number of workers in firm two = 60
$\bar{X}_1$ = the mean wage of workers in firm one = 350 Birr
$\bar{X}_2$ = the mean wage of workers in firm two = 380 Birr

$$\bar{X}_c = \frac{\sum_{i=1}^{2} \bar{X}_i n_i}{\sum_{i=1}^{2} n_i} = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2}{n_1 + n_2} = \frac{40(350) + 60(380)}{40 + 60} = 368 \text{ Birr}$$
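The combined-mean calculation for the two firms can be sketched as follows. The function name `combined_mean` is hypothetical; the means and group sizes are the ones given in the example.

```python
def combined_mean(means, sizes):
    """Combined mean of several groups: sum(mean_i * n_i) / sum(n_i)."""
    if len(means) != len(sizes):
        raise ValueError("means and sizes must have the same length")
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


# Firm one: 40 workers, mean wage 350 Birr; firm two: 60 workers, mean wage 380 Birr.
print(combined_mean([350, 380], [40, 60]))  # 36800 / 100 = 368.0
```

Note that the combined mean is closer to 380 than to 350 because the second firm has more workers and therefore more weight.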


Properties of the Mean
For a given set of data there is one and only one
arithmetic mean (Uniqueness).
It is easily understood and easy to compute
(Simplicity).
It makes use of every value in the distribution. It can,
therefore, be distorted by extreme values.



The Median
• When the data are ordered, the median is the observation that divides the set of observations into two equal parts, so that half of the data lie before it and the other half lie after it.
If n is odd:
$$\tilde{X} = X_{\left(\frac{n+1}{2}\right)}$$
If n is even:
$$\tilde{X} = \frac{1}{2}\left( X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)} \right)$$
That is:
• When n = 11, the median is the 6th ordered observation.
• When n = 12, the median is the "6.5th" observation, which is the value halfway between the 6th and 7th ordered observations.
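The odd and even cases above translate directly into code. This is a minimal sketch with a hypothetical function name and hypothetical sample values; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(data):
    """Median of raw data: middle value (odd n) or mean of the two middle values (even n)."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th ordered observation (index mid, zero-based).
        return xs[mid]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th ordered observations.
    return (xs[mid - 1] + xs[mid]) / 2


print(median([7, 3, 9, 1, 5]))  # ordered: 1 3 5 7 9, so the median is 5
print(median([7, 3, 9, 1]))     # ordered: 1 3 7 9, so the median is (3 + 7) / 2 = 5.0
```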


Example: Suppose the ordered observations of a random sample are: 23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, the median is the "5.5th" observation, i.e. (32 + 34)/2 = 33.
Median of Grouped Data
$$\tilde{X} = L_{me} + \frac{w}{f_m}\left(\frac{n}{2} - F_{pm}\right)$$
where
• $L_{me}$ = lower class boundary of the median class
• $w$ = width of the median class
• $f_m$ = frequency of the median class
• $n$ = total number of observations
• $F_{pm}$ = less-than cumulative frequency (lcf) of the class just before the median class
• To determine the median class, take the class that contains the $\left(\frac{n}{2}\right)^{th}$ item, that is, the first class whose less-than cumulative frequency satisfies $\frac{n}{2} \le \text{lcf}$.
Example 3: Find the median
Age in years    Number of people    Cumulative number of people
14.5-19.5        677                  677
19.5-24.5       1908                 2585
24.5-29.5       1737                 4322
29.5-34.5       1040                 5362
34.5-39.5        294                 5656
39.5-44.5         91                 5747
44.5-49.5         16                 5763
Total           5763                    -


Solution
To determine the median class, we take the class that contains the
$$\left(\frac{n}{2}\right)^{th} = \left(\frac{5763}{2}\right)^{th} = 2881.5^{th} \text{ item}$$
The first less-than cumulative frequency that is greater than or equal to 2881.5 is 4322. Hence, the median class is 24.5-29.5. Then:
$L_{me}$ = 24.5
$w$ = 5
$n$ = 5763
$f_m$ = 1737
$F_{pm}$ = 2585
$$\tilde{X} = L_{me} + \frac{w}{f_m}\left(\frac{n}{2} - F_{pm}\right) = 24.5 + \frac{5}{1737}\left(\frac{5763}{2} - 2585\right) = 24.5 + 0.85 = 25.35$$
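The grouped-median formula can be checked with a short Python sketch. The function name `grouped_median` and its argument layout are assumptions for illustration; the class boundaries and frequencies are those of the age table. Evaluating the formula with these inputs gives roughly 25.35 years.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data: L_me + (w / f_m) * (n / 2 - F_pm).

    boundaries: list of (lower, upper) class boundaries, in increasing order.
    freqs: class frequencies, aligned with boundaries.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cum_before = 0  # less-than cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if cum_before + f >= half:
            # First class whose cumulative frequency reaches n / 2: the median class.
            return lower + (upper - lower) / f * (half - cum_before)
        cum_before += f
    raise ValueError("median class not found")


boundaries = [(14.5, 19.5), (19.5, 24.5), (24.5, 29.5), (29.5, 34.5),
              (34.5, 39.5), (39.5, 44.5), (44.5, 49.5)]
freqs = [677, 1908, 1737, 1040, 294, 91, 16]

print(round(grouped_median(boundaries, freqs), 2))  # 25.35
```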


Properties of the Median:
Uniqueness.
Simplicity. It is easy to calculate.
Unlike the mean, it is not affected by extreme values.


THE MODE ($\hat{X}$)
The mode is the most frequently occurring value in a series.
In a histogram, the modal value lies in the highest bar.
A set of data or a distribution can have:
• No mode: all values appear the same number of times
• Unimodal: the distribution has only one mode
• Bimodal: the distribution has two modes
• Multi-modal: the distribution has more than two modes
Example:
The modal age of the age distribution 23, 28, 28, 31, 32, 34, 37, 42, 50, and 61 is 28, since it occurred twice while the other values occurred only once.
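The mode of raw data and the four cases listed above can be sketched with `statistics.multimode` from the Python standard library (Python 3.8 or later). The helper `classify_modes` is hypothetical; it follows the slide's convention that data in which all values appear equally often have no mode.

```python
from statistics import multimode


def classify_modes(data):
    """Return (label, modes) using the no mode / unimodal / bimodal / multimodal cases."""
    if not data:
        raise ValueError("data must not be empty")
    modes = multimode(data)   # all values tied for the highest count
    distinct = set(data)
    if len(distinct) > 1 and len(modes) == len(distinct):
        # Every distinct value appears the same number of times.
        return "no mode", []
    labels = {1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes), "multimodal"), modes


ages = [23, 28, 28, 31, 32, 34, 37, 42, 50, 61]
print(classify_modes(ages))             # ('unimodal', [28])
print(classify_modes([1, 2, 3]))        # ('no mode', [])
print(classify_modes([1, 1, 2, 2, 3]))  # ('bimodal', [1, 2])
```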


Mode of Grouped Data
$$\hat{X} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$$
where
• $L_{mo}$ = lower boundary of the modal class
• $\Delta_1 = f_{mo} - f_1$ = difference in frequency between the modal class and the class before it
• $\Delta_2 = f_{mo} - f_2$ = difference in frequency between the modal class and the class after it
• $f_{mo}$ = frequency of the modal class; $f_1$, $f_2$ = frequencies of the classes before and after it
• $w$ = class width
Modal class: the class with the highest frequency


Example
The following are the sizes (in millimeters) of incidental intracranial aneurysms (IIAs) in 30 patients.

IIAs      Frequency (f)
0-4        6
5-9       12
10-14      7
15-19      5
20-24      0
Total     n = 30

Solution
Since the maximum frequency is 12, the modal class is 5-9. Then:
$L_{mo}$ = 4.5, $w$ = 5, $f_{mo}$ = 12, $f_1$ = 6, $f_2$ = 7
$$\Delta_1 = f_{mo} - f_1 = 12 - 6 = 6$$
$$\Delta_2 = f_{mo} - f_2 = 12 - 7 = 5$$
$$\hat{X} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 4.5 + 5\left(\frac{6}{6 + 5}\right) = 4.5 + 2.73 = 7.23$$
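The grouped-mode formula can be evaluated with a small Python sketch. The function name `grouped_mode` and its parameters are hypothetical; the inputs are the values read off the IIA frequency table. Note that the fraction Δ1/(Δ1 + Δ2) must be multiplied by the class width before it is added to the lower boundary, which gives about 7.23 mm here.

```python
def grouped_mode(lower_boundary, width, f_mo, f_before, f_after):
    """Mode of grouped data: L_mo + w * d1 / (d1 + d2)."""
    d1 = f_mo - f_before   # modal class frequency minus the class before it
    d2 = f_mo - f_after    # modal class frequency minus the class after it
    if d1 + d2 == 0:
        raise ValueError("modal class is not higher than its neighbours")
    return lower_boundary + width * d1 / (d1 + d2)


# Modal class 5-9: lower boundary 4.5, width 5, f_mo = 12, f_1 = 6, f_2 = 7.
print(round(grouped_mode(4.5, 5, 12, 6, 7), 2))  # 4.5 + 5 * 6 / 11 = 7.23
```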
Best measure of central tendency by measurement scale:

Measurement scale    Permissible mathematical operations                       Best measure of central tendency
Nominal              Counting                                                  Mode
Ordinal              Greater-than or less-than operations                      Median
Interval             Addition and subtraction                                  Symmetrical: Mean; Skewed: Median
Ratio                Addition, subtraction, multiplication and division        Symmetrical: Mean; Skewed: Median


Summary Statistics

Measure of Dispersion
Introduction
The degree to which numerical data tend to spread about an average is called the dispersion or variation of the data.
• Measures of central tendency (MCT) and measures of variation (MV) together help us to sum up a distribution of scores without looking at each and every score.
• MCT tell you about the typical (or central) scores.
• MV reveal how far from the typical or central score the distribution tends to vary.
Note:
1. If all the values are the same, there is no dispersion.
2. If the values differ, there is dispersion:
a) If the values are close to each other, the amount of dispersion is small.
b) If the values are widely scattered, the dispersion is greater.
• Hence a measure of central tendency alone is not sufficient to evaluate a data set.
We may use measures of variation to:
• Compare two or more groups in terms of variability
• Judge the reliability of a measure of central tendency
• Support further statistical analysis
Common Measures of Dispersion are:
1. Range (R) and Relative Range (RR)
2. Quartile Deviation (QD) and Coefficient of Quartile Deviation (CQD)
3. Mean Deviation (M.D) and Coefficient of Mean Deviation (C.M.D)
4. Variance ($s^2$), Standard Deviation ($s$) and Coefficient of Variation (CV)
Range
One way to measure the variation in a set of values is to compute the range.
It is the difference between the highest (maximum) and the lowest (minimum) scores in the distribution.
Range = highest score - lowest score
The range is a measure of variation for interval-ratio variables.
E.g. the range of the set 2, 3, 3, 5, 5, 8, 8, 12 is 12 - 2 = 10.
The range is a crude measure of variability because it depends only on the two extreme values and ignores how the remaining observations are spread.
Variance ($S^2$) and Standard Deviation (SD)
• Variance ($S^2$) is another measure of spread.
• Standard deviation is the square root of the variance.
• The variance is the average squared deviation from the mean.

Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Standard deviation
Sample:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Population:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$


Example
Find the standard deviation of the numbers 12, 6, 7, 3, 15, 10, 18, 5.
Solution: $\bar{X}$ = (12+6+7+3+15+10+18+5)/8 = 76/8 = 9.5
Sample variance: $s^2 = \frac{(12-9.5)^2 + \cdots + (5-9.5)^2}{8-1} = \frac{190}{7} \approx 27.14$
Sample standard deviation: $s = \sqrt{27.14} \approx 5.21$
The variance is a measure of the average spread of values (observations/records) around the mean.
However, for interpretation purposes the standard deviation is more useful because it is expressed in the same units as the observations (whereas the variance is in squared units).
A small standard deviation implies that the values (observations) lie close to the mean, and a large one implies that the values lie further from the mean overall.
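The sample variance and standard deviation for the eight numbers in this example can be computed step by step in Python. This is a minimal sketch of the sample formulas (divisor n - 1); the variable names are chosen for illustration. The sum of squared deviations is 190, so the sample variance is 190/7 (about 27.14) and the sample standard deviation is about 5.21.

```python
import math

data = [12, 6, 7, 3, 15, 10, 18, 5]

n = len(data)
mean = sum(data) / n                                  # 76 / 8 = 9.5
sum_sq_dev = sum((x - mean) ** 2 for x in data)       # 190.0
sample_variance = sum_sq_dev / (n - 1)                # 190 / 7
sample_sd = math.sqrt(sample_variance)

print(mean, round(sample_variance, 2), round(sample_sd, 2))  # 9.5 27.14 5.21
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and should agree with these values.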
The variance for grouped data can be computed as:
$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{k} f_i (X_i - \bar{X})^2$$
where $X_i$ is the ith class mark, $f_i$ is the ith class frequency, $k$ is the number of classes, and $n = \sum f_i$.

Activity
Find the sample variance for the leisure time data.

Special properties of the standard deviation

For a normal (symmetric) distribution the following holds:
• Approximately 68.27% of the data values fall within one standard deviation of the mean, i.e. within $\bar{X} \pm S$.
• Approximately 95.45% of the data values fall within two standard deviations of the mean, i.e. within $\bar{X} \pm 2S$.
• Approximately 99.73% of the data values fall within three standard deviations of the mean, i.e. within $\bar{X} \pm 3S$.
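The three percentages quoted for the normal distribution can be verified with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper `coverage` is a hypothetical name; it returns the probability that a normal variable falls within k standard deviations of its mean.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def coverage(k):
    """Probability that a normal variable lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * coverage(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```

Because any normal distribution can be standardized, the same proportions hold for every mean and standard deviation.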


The Coefficient of Variation
When comparing the variability of two or more sets of data, for the same or different variables, standard deviations alone may lead to fallacious results.
The variables involved might be measured in different units, or might describe different characteristics.
The coefficient of variation (CV) is the standard deviation expressed as a percentage of the mean.
It is a measure used to compare the dispersion in two sets of data, and it is independent of the unit of measurement.
$$C.V = \frac{S}{\bar{X}} \times 100\%$$
where
$S$ = sample standard deviation, $\bar{X}$ = sample mean.
CV example
Suppose two samples of human males yield the following data:

                      25-year-olds    11-year-olds
Mean weight           145 pounds      80 pounds
Standard deviation    10 pounds       10 pounds

We wish to know which group's weights are more variable.

Solution:
C.V (Sample 1) = (10/145) × 100% = 6.9%
C.V (Sample 2) = (10/80) × 100% = 12.5%
Although both samples have the same standard deviation, the weights of the 11-year-olds (Sample 2) are relatively more variable (less consistent).
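The comparison of the two samples can be reproduced with a few lines of Python. The function name `coefficient_of_variation` is hypothetical; the means and standard deviations are the ones given in the example.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


cv_25_year_olds = coefficient_of_variation(10, 145)
cv_11_year_olds = coefficient_of_variation(10, 80)

print(round(cv_25_year_olds, 1), round(cv_11_year_olds, 1))  # 6.9 12.5
```

The larger CV for the 11-year-olds shows that the same 10-pound standard deviation is a bigger share of their smaller mean weight.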
End

Probability!!
