Biostatistics: Khadeeja PK
KHADEEJA PK
PART 1
• MEASURES OF CENTRAL TENDENCY
• Central tendency refers to the middle of the distribution: a value
or parameter that serves as a single estimate of a series of data.
• Gives a mental picture of the central value
• Enables comparison
• One central value around which all other observations are
dispersed
• Objectives:
• to condense the entire mass of data
• to facilitate comparison
• There are 3 measures used to calculate the central value:
1) Mean
2) Median
3) Mode
MEAN
• The mean is the arithmetic average. It is calculated by adding all the
values and then dividing by the total number of observations.
• The calculation of the mean incorporates all values in the data, so if
you change any value, the mean changes. However, the mean does not
always locate the centre of the data accurately. This happens when
there are large differences between the values.
• For example, take the values 4, 3, 5, 6, 14, 18, 40, 10, 2, 6, 7. Their
sum is 115 and there are 11 observations, so the mean is 115 / 11 ≈ 10.45.
This does not look like a central value, because most of the
observations are below 10. The mean is being pulled upwards by the
extreme value 40. This is the main problem with the mean, so it is best
used when the values are not widely scattered.
• Advantages of mean :
• Easy to calculate and understand.
• Takes all values into consideration
• Allows further statistical analysis
• Usually the least affected by sampling fluctuations among the measures
of central tendency, provided there are no extreme values.
• Limitation:
• Gets influenced by extreme values
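As a quick check of the arithmetic in the example above, the following Python sketch recomputes the mean of the eleven values and, purely for comparison, the mean with the extreme value 40 left out. The variable names are illustrative and are not part of the original notes.

```python
from statistics import mean

# The eleven example values from the MEAN section.
values = [4, 3, 5, 6, 14, 18, 40, 10, 2, 6, 7]

# Arithmetic mean: sum of the values divided by the number of observations.
mean_all = sum(values) / len(values)

# Same data with the extreme value 40 left out, for comparison only.
without_extreme = [v for v in values if v != 40]
mean_without_extreme = mean(without_extreme)

print(round(mean_all, 2))    # 10.45
print(mean_without_extreme)  # 7.5
```

Dropping a single observation moves the mean from about 10.45 to 7.5, which illustrates how strongly one extreme value can influence it.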
MEDIAN
• The median is the middle value. It is the value that splits the data
set in half.
• To calculate the median, first arrange the values in either ascending
or descending order and then pick the middle value.
• For example, arranging the values used for the mean in ascending order
gives 2, 3, 4, 5, 6, 6, 7, 10, 14, 18, 40. The central value (the 6th of
11) is 6, so the median is 6. This applies when the number of
observations is odd.
• When the number of observations is even, arrange the values in
ascending or descending order, select the two central values and
calculate their average (add the two numbers and divide by 2).
• E.g. 2, 3, 4, 6, 8, 10 is already in ascending order. The central
values are 4 and 6, so the median is (4 + 6) / 2 = 5.
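The two cases above (odd and even number of observations) can be sketched in Python as follows. The function name `median_of` is a hypothetical helper chosen for this illustration; Python's standard library also provides `statistics.median`.

```python
def median_of(values):
    """Return the median following the sort-then-pick-the-middle procedure."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single central value.
        return ordered[mid]
    # Even count: average of the two central values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_example = [4, 3, 5, 6, 14, 18, 40, 10, 2, 6, 7]
even_example = [2, 3, 4, 6, 8, 10]

print(median_of(odd_example))   # 6
print(median_of(even_example))  # 5.0
```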
MODE
• The mode is the value that occurs most frequently in the data set.
It is not affected by extreme values.
• In 2, 3, 4, 5, 6, 6, 7, 10, 14, 18, 40 the mode is 6 (it appears
twice).
• In 2, 5, 4, 7, 8, 9, 10 no value repeats, so there is no mode.
• A data set can therefore have a single mode, multiple modes or no
mode at all. When the mode is ill-defined, it can be estimated for a
moderately skewed distribution from the empirical relation:
• Mode ≈ 3 × median − 2 × mean
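The three situations described above (one mode, several modes, no mode) can be illustrated with a small Python sketch. The helper name `modes_of` and the third data set are assumptions made for this example; an empty list is used here to signal "no mode".

```python
from collections import Counter

def modes_of(values):
    """Return the list of most frequent values, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return [v for v, c in counts.items() if c == highest]

print(modes_of([2, 3, 4, 5, 6, 6, 7, 10, 14, 18, 40]))  # [6]
print(modes_of([2, 5, 4, 7, 8, 9, 10]))                 # []
print(modes_of([1, 1, 2, 2, 3]))                        # [1, 2]
```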
MEASURES OF DISPERSION
• A measure of dispersion describes how the data are spread. It shows
how much the values vary from their average value, which helps in
understanding the distribution of the data.
RANGE
• Range is the difference between the largest and the smallest observations.
Range = X max – X min
• Advantages of Range:
• It is the simplest measure of dispersion
• Easy to calculate
• Easy to understand
• Independent of change of origin
• Limitations of Range:
• It is based only on the two extreme observations, so it is strongly affected by fluctuations in them
• The range is therefore not a reliable measure of dispersion
• Dependent on change of scale
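A short Python sketch can make the "change of origin" and "change of scale" properties concrete. It reuses the eleven values from the MEAN example; the function name `value_range` and the shift and scale constants are illustrative assumptions.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

values = [4, 3, 5, 6, 14, 18, 40, 10, 2, 6, 7]

# Change of origin: add the same constant to every value.
shifted = [v + 100 for v in values]

# Change of scale: multiply every value by the same constant.
scaled = [v * 2 for v in values]

print(value_range(values))   # 38
print(value_range(shifted))  # 38 (unchanged by a change of origin)
print(value_range(scaled))   # 76 (doubles with a change of scale)
```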
MEAN DEVIATION
• Mean deviation is the arithmetic mean of the absolute deviations of
the observations from a measure of central tendency (usually the mean
or the median). It is also called the average deviation.
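The definition above can be sketched in Python as follows. The function name `mean_deviation` and its `centre` parameter are hypothetical; the data are the six values from the even-numbered median example.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Average of the absolute deviations of the values from `centre`."""
    return sum(abs(v - centre) for v in values) / len(values)

values = [2, 3, 4, 6, 8, 10]

# Deviations taken from the mean (5.5) and from the median (5.0).
about_mean = mean_deviation(values, mean(values))
about_median = mean_deviation(values, median(values))

print(about_mean)    # 2.5
print(about_median)  # 2.5
```

The two results happen to coincide for this data set; in general they differ.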
STANDARD DEVIATION
• Standard deviation is the most important and widely used measure of
dispersion. First used by Karl Pearson in 1893. It is the square root of
the mean of the squared deviations from the arithmetic mean. It is
denoted by the Greek letter sigma, σ.
• The greater the deviation, the greater the dispersion from the central value
• The smaller the deviation, the higher the degree of uniformity
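The definition above (square root of the mean of the squared deviations from the arithmetic mean) can be followed step by step in Python. The data set is an illustrative assumption chosen so that the arithmetic comes out evenly, and `standard_deviation` is a hypothetical helper name.

```python
from math import sqrt
from statistics import pstdev

def standard_deviation(values):
    """Square root of the mean of the squared deviations from the mean."""
    m = sum(values) / len(values)                    # arithmetic mean
    squared_deviations = [(v - m) ** 2 for v in values]
    return sqrt(sum(squared_deviations) / len(values))

# Illustrative data (not from the notes): mean is 5, squared deviations sum to 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]

print(standard_deviation(values))  # 2.0
print(pstdev(values))              # same quantity from the standard library
```

This version divides by the number of observations n, as in the definition given in these notes. When the data are a sample from a larger population, the sum of squares is commonly divided by n − 1 instead (`statistics.stdev` in Python).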
Standard deviation and the normal curve:
• When data are collected from a very large number of people and a frequency
distribution is made with narrow class intervals, the resulting curve is often
smooth and symmetrical. Such a curve is called a normal curve.
• In a normal curve:
• The area within one standard deviation on either side of the mean includes
approximately 68% of the values
• The area within two standard deviations on either side of the mean includes
approximately 95% of the values
• The area within three standard deviations on either side of the mean includes
approximately 99.7% of the values
• The limits on either side of the mean are called confidence limits.
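The three percentages above can be checked numerically with Python's standard library. This is a minimal sketch: the helper name `area_within` is hypothetical, and the mean and standard deviation used (100 and 15) are arbitrary illustrative values, since the proportions are the same for any normal curve.

```python
from statistics import NormalDist

def area_within(k, mu, sigma):
    """Proportion of a normal curve lying within k SDs either side of the mean."""
    curve = NormalDist(mu=mu, sigma=sigma)
    return curve.cdf(mu + k * sigma) - curve.cdf(mu - k * sigma)

for k in (1, 2, 3):
    percent = round(area_within(k, mu=100, sigma=15) * 100, 1)
    print(k, percent)
# 1 68.3
# 2 95.4
# 3 99.7
```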
• STANDARD NORMAL CURVE
• There might be many normal curves but there is only one standard normal
curve
• The standard normal curve is bell shaped
• The curve is perfectly symmetrical and is based on an infinitely large number of observations.
The maximum number of observations is at the mean, and the number of observations
gradually decreases on either side, with few observations at the extremes
• The total area under the curve is one, its mean is zero and its standard deviation is one
• All three measures of central tendency (the mean, median and mode) coincide
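• As a rough check for data that cannot take negative values, a mean greater than
2 standard deviations is compatible with a normal distribution, whereas a mean smaller
than 2 standard deviations points to a skewed distribution. This check alone does not prove normality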
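The listed properties of the standard normal curve can be inspected with `statistics.NormalDist`, whose default parameters are a mean of 0 and a standard deviation of 1. This is only an illustrative sketch; the variable name `z` is an assumption.

```python
from statistics import NormalDist

z = NormalDist()  # defaults: mu=0.0, sigma=1.0 (the standard normal curve)

# Mean, median and mode coincide at zero; the standard deviation is one.
print(z.mean, z.median, z.mode, z.stdev)  # 0.0 0.0 0.0 1.0

# Symmetry: half of the area lies below the mean.
print(z.cdf(0))  # 0.5

# Symmetry: the curve has the same height at equal distances either side.
print(z.pdf(-1.5) == z.pdf(1.5))  # True

# The curve is highest at the mean and falls away on either side.
print(z.pdf(0) > z.pdf(1) > z.pdf(2))  # True
```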
SKEWNESS
• It is a statistic that measures the asymmetry of a distribution on
either side of the mean
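The notes do not give a formula for skewness. One common choice, assumed here for illustration, is the moment coefficient: the third central moment divided by the 1.5th power of the second central moment. It is zero for a symmetric data set, positive when the longer tail is to the right and negative when it is to the left. The function name `skewness` is hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (an assumed definition)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tailed = [4, 3, 5, 6, 14, 18, 40, 10, 2, 6, 7]  # values from the MEAN example

print(skewness(symmetric))         # 0.0
print(skewness(right_tailed) > 0)  # True: the extreme value 40 stretches the right tail
```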
KURTOSIS
• It is a measure of the peakedness (height) of the distribution curve
• Types of kurtosis:
• Tall curve: Leptokurtic
• Flat curve: Platykurtic
• Normal: Mesokurtic
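The three labels above can be tied to a number. A common convention, assumed here because the notes give no formula, is the moment coefficient of kurtosis (fourth central moment divided by the square of the second), which equals 3 for a normal curve. Both data sets and the function names are illustrative.

```python
def kurtosis(values):
    """Moment coefficient of kurtosis: m4 / m2 ** 2 (an assumed definition)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - m) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2

def classify(k):
    """Label a kurtosis value relative to the normal-curve value of 3."""
    if k > 3:
        return "leptokurtic"
    if k < 3:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]                   # evenly spread values
peaked = [-3, 0, 0, 0, 0, 0, 0, 0, 3]    # most values piled at the centre

print(classify(kurtosis(flat)))    # platykurtic
print(classify(kurtosis(peaked)))  # leptokurtic
```

In practice a sample value is rarely exactly 3, so "mesokurtic" is usually read as "close to 3" rather than as an exact equality.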