Measures of Central Tendency Lecture 3

Uploaded by

enoch taclan

QUIZ 1. Fill in the blanks. Give your complete answers.

1. A representative sample is a subset that provides an accurate picture of the whole population. True or False
2. One in which the respondents themselves decide whether to be included is called ________
3. It involves using respondents who are “convenient” to the researcher
4. Each subject in the population has the same chance of being selected
5. The sampling frame is ordered, and a number s is selected
6. A common way that statistics can be misused to distort the truth or create a false impression
7. Type of epidemiological study where you just interview people in the community
8. Testing the side effects along with other behavioral change interventions is a type of what epidemiological study?
9. A type of sampling that subdivides the population into at least two different subgroups that share the same characteristics, then draws a sample from each subgroup
10. Graphic presentations used to illustrate and clarify information.
11. Subdivision of a single bar to indicate the composition of the total, divided into sections according to their relative proportion.
12. It is a diagram showing the relationship between two numeric variables (as the scatter) but the points are joined together
13. Consists of a circle whose area represents the total frequency (100%), which is divided into segments.
14. Also called measurement data
15. Also called enumeration data
Measures of Central Tendency, Dispersion, Skewness and Kurtosis
COURSE/SUBJECT: BIOSTATISTICS AND EPIDEMIOLOGY
LECTURE/09.16.24 /BSMLS 3-A&B
PRESENTED BY: ENOCH TACLAN BS BIOL, MSC BIOL
Basic Characteristics of a Distribution
Central Tendency
Variability
Skewness
Kurtosis
Measures of Central Tendency

• Numerical descriptive measures that indicate or locate the center of a distribution or data set.
• It represents the central point of the given data set.
• The 3 M's
– Mean
– Median
– Mode
Comparison of Central Tendency Measures

In a perfect world, the mean, median & mode would be the same.
However, the world is not perfect & very often, the mean, median and mode are not the same.
Measures of Central Tendency
When assessing the central tendency of your measurements, you are
attempting to identify the “average” measurement

◦ Mean: best known & most widely used average, describing the center of a frequency
distribution
◦ Median: the middle value/point of a set of ordered numbers below which 50% of the
distribution falls
◦ Mode: the most frequent value or category in a distribution
Measure of central tendency
Arithmetic Mean (Mean)
Definition:
Sum of all the observations divided by the number of the observations

The arithmetic mean is the most common measure of the central location of a sample.
Population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$

Mean
Population: “mu”, $\mu = \frac{\sum X}{N}$
– $\sum X$: “sigma”, the sum of X, add up all scores
– N: the total number of scores in a population
Sample: “X bar”, $\bar{X} = \frac{\sum X}{n}$
– $\sum X$: “sigma”, the sum of X, add up all scores
– n: the total number of scores in a sample
Measures of Central Tendency
Sample Mean
The sample mean, x̄, is the sum of all values in the sample divided by the total number of observations, n, in the sample.

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Examples: Sample Mean
Dorine noticed that her son, Mark, leaves at exactly 6:30 am every day to go to school. She asked him to document his travel time last week, and the results are shown below. Compute for the mean.

DAY          TIME SPENT IN TRAVELLING
Monday       53 min.
Tuesday      47 min.
Wednesday    60 min.
Thursday     45 min.
Friday       50 min.

Computation:
x̄ = (53 + 47 + 60 + 45 + 50) / 5
x̄ = 51 min

Interpretation: On average, Mark spent 51 minutes going to school.
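The computation above can be checked with a short script. This is a minimal sketch in Python; the function name `sample_mean` is only illustrative and is not part of the lecture.

```python
# Travel times in minutes, Monday to Friday, taken from the example.
travel_times = [53, 47, 60, 45, 50]


def sample_mean(values):
    """Sum of all values divided by the number of observations."""
    return sum(values) / len(values)


mean_travel = sample_mean(travel_times)
print(f"Mean travel time: {mean_travel} min")
```

The same function works for any list of measurements, such as the blood pressure readings in the next example.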
Example: Sample Mean
Mean systolic blood pressure

Scenario 2:

Subjects   BP
1          120 (x1)
2          135 (x2)
3          115 (x3)
4          110 (x4)
5          105 (x5)
6          140 (x6)

Mean = (120 + 135 + 115 + 110 + 105 + 140)/6
≈ 121
WEIGHTED MEAN
Or the Weighted Arithmetic Mean
Each value or measurement is associated with a certain weight or degree of importance

$\bar{x}_w = \frac{\sum w x}{\sum w}$

Where:
x = item value
w = weight associated with x
Example
Nora, an MLS student, is enrolled in 6 subjects this semester. The subjects, equivalent number of units and grades are shown below. Compute for her general weighted average.

Subject   No. of Units   Grade   (Units)(Grade)
BACTE     5              87      435
CC        4              88      352
HEMA      4              87      348
HISTO     2              88      176
H+ EDUC   3              90      270
CM        3              81      243
Total     21                     1824

x̄ = Σ(Units × Grade) / Σ(No. of units)
x̄ = 1824/21
x̄ = 86.86

Interpretation: Nora has a GWA of 86.86 for the semester.
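The weighted mean in this example can be sketched in Python as follows. The units act as the weights and the grades as the item values; the helper name `weighted_mean` is an illustrative choice.

```python
# (subject, units, grade) rows copied from the example table.
subjects = [
    ("BACTE", 5, 87),
    ("CC", 4, 88),
    ("HEMA", 4, 87),
    ("HISTO", 2, 88),
    ("H+ EDUC", 3, 90),
    ("CM", 3, 81),
]


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


units = [row[1] for row in subjects]
grades = [row[2] for row in subjects]
gwa = weighted_mean(grades, units)
print(f"GWA: {gwa:.2f}")
```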
Properties of the Mean
❖The mean of a random sample is an unbiased estimator of the mean of the population from
which it came.
❖The mean is the mathematical expectation. As such, it is different from the mode, which is the
value observed most often.
❖For a set of data, the sum of the squared deviations of the observations from the mean is
smaller than the sum of the squared deviations from any other number.
❖For a set of data, the sum of the squared deviations from the mean is fixed for a given set of
observations.
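The third property can be checked numerically. The sketch below uses a small illustrative data set (2, 4, 4, 6, 8) and a handful of arbitrary comparison values; it shows the property for these numbers only and is not a proof.

```python
data = [2, 4, 4, 6, 8]


def sum_sq_dev(values, center):
    """Sum of squared deviations of the values from a chosen center."""
    return sum((x - center) ** 2 for x in values)


mean = sum(data) / len(data)
at_mean = sum_sq_dev(data, mean)

# Arbitrary alternative centers, none of them equal to the mean.
candidates = [3.0, 4.0, 4.5, 5.0, 6.0]
for c in candidates:
    print(c, sum_sq_dev(data, c), ">", at_mean)
```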
Median
• Middle value of a given set of measurements, provided that the values or measurements are arranged in an array
• Array: arrangement of values in either increasing or decreasing order
• The sample median, M, is the number such that “half" the values in the sample are smaller and the other “half" are larger.
• Use the following steps to find M.
– Sort the data (arrange in increasing order).
– Is the size of the data set n even or odd?
– If odd: M = value in the exact middle.
– If even: M = the average of the two middle numbers.
MEDIAN FOR UNGROUPED DATA
To compute for the median, first you have to arrange them
in an array.
And then, simply get the middle value.
Example 1
The following are the midterm scores of 9 students in
the midterm exam in BioStatistics. Compute for the
median.
Given: 70, 49, 62, 83, 20, 38, 46, 25, 58
Example 2
Given: 70, 49, 62, 83, 20, 38, 46, 25
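The steps for the median of ungrouped data (sort, then take the middle value or the average of the two middle values) can be sketched in Python. The function name `sample_median` is illustrative.

```python
def sample_median(values):
    """Median of ungrouped data: sort first, then pick the middle."""
    ordered = sorted(values)          # arrange the data in an array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the exact middle value
        return ordered[mid]
    # even n: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


scores_example_1 = [70, 49, 62, 83, 20, 38, 46, 25, 58]   # n = 9
scores_example_2 = [70, 49, 62, 83, 20, 38, 46, 25]       # n = 8

print("Example 1 median:", sample_median(scores_example_1))
print("Example 2 median:", sample_median(scores_example_2))
```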
Example: Sample Median
• Median systolic BP:
Scenario 1:
120 : 135 : 115 : 110 : 105 : 140
Arranged in an array: 105 : 110 : 115 : 120 : 135 : 140
Median = (115 + 120) /2 = 117.5

Scenario 2:
120 : 135 : 115 : 110 : 105 : 140 : 280
Arranged in an array: 105 : 110 : 115 : 120 : 135 : 140 : 280
Median = 120

• The median is hardly affected by extreme observations and is a resistant measure.
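To see what "resistant" means in practice, the sketch below compares how far the mean and the median move when the extreme reading of 280 is added. It relies on Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

bp = [120, 135, 115, 110, 105, 140]      # Scenario 1
bp_with_extreme = bp + [280]             # Scenario 2 adds one extreme value

mean_shift = mean(bp_with_extreme) - mean(bp)
median_shift = median(bp_with_extreme) - median(bp)

print("Median before and after:", median(bp), median(bp_with_extreme))
print("Shift in mean:", round(mean_shift, 2))
print("Shift in median:", median_shift)
```

The mean moves by more than 20 units while the median moves by only 2.5, which is why the median is preferred when extreme observations are present.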
Mode
• The sample mode is the value that occurs most frequently in the sample (a data set can have more than one mode).
• This is the only measure of center which can also be used for categorical data.
• The population mode is the highest point on the population distribution.
Example
The number of visits per week to a medical clinic by 7 patients:

Diabetes Patient   No. of times the patient visits the clinic
1                  5
2                  8
3                  7
4                  5
5                  6
6                  5
7                  7

• Since this distribution has only one mode (5, which occurs three times), this is said to be UNIMODAL.

GIVEN: 40, 35, 19, 22, 27, 40, 19, 32, 51
GIVEN: 61, 26, 30, 33, 45, 17, 29, 39, 20
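The mode of each data set above can be found by counting occurrences. This is a minimal Python sketch; the function name `modes` is illustrative, and returning an empty list when every value occurs only once is an assumed convention for "no mode".

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest frequency, sorted."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                      # assumed convention: no mode
    return sorted(v for v, c in counts.items() if c == top)


clinic_visits = [5, 8, 7, 5, 6, 5, 7]
first_given = [40, 35, 19, 22, 27, 40, 19, 32, 51]
second_given = [61, 26, 30, 33, 45, 17, 29, 39, 20]

for name, data in [("visits", clinic_visits),
                   ("first", first_given),
                   ("second", second_given)]:
    print(name, modes(data))
```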
Measures of Dispersion

• Range
• Sample Variance
• Standard Deviation
• Inter Quartile Range (IQR)
Sets of Grades in Biostatistics of 2 groups
with 5 members each
MALE Group       FEMALE Group
Kent - 100       Jaynne - 80
Lloyd - 70       Venice - 81
Lordwin - 80     Frankie - 82
Adrian - 60      Julie - 79
Justein - 95     Cielo - 83
Mean: 81         Mean: 81
•Let us picture the position of each grade in a number line
• The measure of the center of the distribution (mean) is of little help in describing and comparing these two sets of data.
• But, by getting the average distance of each item from the center of the distribution, the group can be described more completely, and likewise, the similarities and differences can be easily identified.
Measures of Dispersion: Range

Range: The range of the data set is the difference between the highest value and the lowest value, that is, the difference between the largest and the smallest observations.
Range = X_largest − X_smallest
Range = highest value - lowest value
Set of grades for two groups in Bio-Statistics

Interpretation:
The range of grades of the female group is 4 (83 − 79) while that of the male group is 40 (100 − 60). This shows that the grades of the females are close to each other while the grades of the male group are scattered.
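The comparison of the two groups can be reproduced with a few lines of Python. The dictionaries copy the grades from the slide; the function name `data_range` is illustrative.

```python
male = {"Kent": 100, "Lloyd": 70, "Lordwin": 80, "Adrian": 60, "Justein": 95}
female = {"Jaynne": 80, "Venice": 81, "Frankie": 82, "Julie": 79, "Cielo": 83}


def data_range(values):
    """Highest value minus lowest value."""
    values = list(values)
    return max(values) - min(values)


for label, group in [("Male", male), ("Female", female)]:
    grades = group.values()
    mean = sum(grades) / len(grades)
    print(label, "mean:", mean, "range:", data_range(grades))
```

Both groups share the same mean, so only the range reveals how differently the grades are spread.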
Measure of Dispersion: Interquartile Range
• Percentile: The p-th percentile of a distribution is the value at or below which p% of the observations fall.
Measure of Dispersion

Inter Quartile Range (IQR)

A simple measure of spread, giving the range covered by the middle half of the data, is the IQR defined below.

IQR = Q3 - Q1

The IQR is a resistant measure of spread.


Measure of Dispersion: Interquartile Range
IQR = Q3 - Q1
Based on dividing the distribution into four equal parts or quartiles
Measure of Dispersion

• The most commonly used percentiles are the quartiles.

1st quartile Q1 = 25th percentile. ➢ “Middle” value in the first half of the rank-ordered data set
2nd quartile Q2 = 50th percentile. ➢ Median value
3rd quartile Q3 = 75th percentile. ➢ “Middle” value in the second half of the rank-ordered data set
Inter Quartile Range
Consider the data: {4,8,15,16,23,42}

• Sort the data set in ascending order
• Find the first quartile (Q1): This is the 25th percentile of the data set
• Find the third quartile (Q3): This is the 75th percentile of the data set
• Calculate the IQR by subtracting Q1 from Q3
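One way to carry out these steps is the "middle value of each half" approach, sketched below in Python. Other quartile conventions exist and can give slightly different answers, so treat the half-splitting rule here (the middle value is left out of both halves when n is odd) as an assumption. The function name `quartiles` is illustrative.

```python
from statistics import median


def quartiles(values):
    """Q1, Q2, Q3 using the median of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]            # first half of the ordered data
    upper = ordered[-half:]           # second half of the ordered data
    return median(lower), median(ordered), median(upper)


data = [4, 8, 15, 16, 23, 42]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3, "IQR =", iqr)
```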
Percentile
Consider this data set : 15, 20, 35, 40, 50

1. Sort the data set in ascending order.
2. Determine the rank (index) for the percentile using the formula:

Where P is the desired percentile, and n is the number of observations in the data set

3. If the rank is an integer, the value at that position is the percentile.
4. If the rank is not an integer, interpolate between the closest ranks.
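The rank formula itself is not legible in this copy of the slides, so the sketch below ASSUMES one common convention, rank = (P / 100) × (n + 1), counted from 1. Other textbooks use different rank formulas and will give different percentiles. The function name `percentile` and the clamping of the rank to the range 1 to n are also illustrative choices.

```python
def percentile(values, p):
    """Percentile by rank with linear interpolation (assumed convention)."""
    ordered = sorted(values)                  # step 1: sort
    n = len(ordered)
    rank = p / 100 * (n + 1)                  # step 2: ASSUMED rank formula
    rank = min(max(rank, 1), n)               # keep the rank inside 1..n
    lower = int(rank)                         # whole part of the rank
    fraction = rank - lower
    if fraction == 0:                         # step 3: integer rank
        return ordered[lower - 1]
    # step 4: interpolate between the two closest ranks
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


data = [15, 20, 35, 40, 50]
for p in (25, 50, 75):
    print(f"P{p} =", percentile(data, p))
```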
Numerical Measures of Spread
Measures of Dispersion: Variance
• Variance: equal to the sum of squared deviations from the sample mean divided by n - 1, where n is the number of observations in the sample.

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Where:
x = individual values
x̄ = mean
n = sample size
Variance
Consider this data set : 2, 4, 4, 6, 8

▪ Calculate the sample mean (x̄): Sum all the data points and
divide by the number of observations (n).
▪ Subtract the mean from each data point to get the
deviation of each data point from the mean.
▪ Square each deviation to eliminate negative values.
▪ Sum all the squared deviations.
▪ Divide the sum of squared deviations by 𝑛−1, where 𝑛 is
the number of observations.
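The five steps can be followed one at a time in Python. This is a minimal sketch for the data set 2, 4, 4, 6, 8; the variable names are illustrative.

```python
data = [2, 4, 4, 6, 8]
n = len(data)

mean = sum(data) / n                          # step 1: sample mean
deviations = [x - mean for x in data]         # step 2: deviation of each point
squared = [d ** 2 for d in deviations]        # step 3: square each deviation
sum_squared = sum(squared)                    # step 4: sum of squared deviations
variance = sum_squared / (n - 1)              # step 5: divide by n - 1

print("Mean:", mean)
print("Sum of squared deviations:", round(sum_squared, 2))
print("Sample variance:", round(variance, 2))
```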
Variance
Variance is the average of the squared deviations taken from the mean value.

X        (Xi-Mean)   (Xi-Mean)^2
2        -2.8        7.84
4        -0.8        0.64
4        -0.8        0.64
6        1.2         1.44
8        3.2         10.24

Mean = 4.8
Sum of (Xi-Mean)^2 = 20.8

Variance = 20.8/(5-1)
= 20.8/4
= 5.2
Standard Deviation
Standard deviation is the positive square root of the mean-square
deviations of the observations from their arithmetic mean.

Population: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

SD = √variance
Where:
x = individual values
x̄ = mean
n = sample size
X        (Xi-Mean)   (Xi-Mean)^2
2        -2.8        7.84
4        -0.8        0.64
4        -0.8        0.64
6        1.2         1.44
8        3.2         10.24

Mean = 4.8
Sum of (Xi-Mean)^2 = 20.8

Variance = 20.8/(5-1)
= 20.8/4
= 5.2

SD = √[20.8/(5-1)]
= √(20.8/4)
= 2.28
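The difference between the sample and population formulas can be seen with Python's standard `statistics` module, where `stdev` divides by n − 1 and `pstdev` divides by N. Treating the five values as a whole population in the second call is only for illustration.

```python
from statistics import pstdev, stdev

data = [2, 4, 4, 6, 8]

sample_sd = stdev(data)          # divides the sum of squares by n - 1
population_sd = pstdev(data)     # divides the sum of squares by N

print("Sample SD:", round(sample_sd, 2))
print("Population SD:", round(population_sd, 2))
```

The sample SD is always a little larger than the population SD computed from the same numbers, because its denominator is smaller.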
Coefficient of Variation (CV)
$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$
Can be used to compare two or more
sets of data measured in different units
or same units but different average size.

Coefficient of Variation (CV)
Where:
CV = coefficient of variation
s = standard deviation
x̄ = mean
The following are the mean and SD values of the height and weight of 3rd year BSMLS students of MAC.

         HEIGHT (in)   WEIGHT (lbs)
Mean     85            120.3
SD       6.4           19.2

• Compute for the CV of the height
• Compute for the CV of the weight
• Which has more variation?

HEIGHT
CV = 6.4/85 x 100
= 7.53 %

WEIGHT
CV = 19.2/120.3 x 100
= 15.96 %

Since CV for weight > CV for height, the body weight of the 3rd year BSMLS students has more variation than their height.
The total marks of Benson and Kent in 5 subjects are 575 and 530 with a variance of 3.4 and 8.9, respectively. Who has more consistent performance?

BENSON                    KENT
Σx = 575                  Σx = 530
Variance = 3.4            Variance = 8.9
x̄ = 575/5 = 115          x̄ = 530/5 = 106
SD = √3.4 = 1.84          SD = √8.9 = 2.98
CV = 1.84/115 x 100       CV = 2.98/106 x 100
CV = 1.6 %                CV = 2.81 %

Since CV for Benson < CV for Kent, Benson has a more consistent performance in the 5 subjects.
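Both CV comparisons (height versus weight, and Benson versus Kent) follow the same formula, so a single helper covers them. This is a minimal Python sketch; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (SD / mean) x 100."""
    return sd / mean * 100


# Height and weight of the students (means and SDs from the table).
cv_height = coefficient_of_variation(6.4, 85)
cv_weight = coefficient_of_variation(19.2, 120.3)

# Benson and Kent: mean = total / 5 subjects, SD = square root of variance.
cv_benson = coefficient_of_variation(3.4 ** 0.5, 575 / 5)
cv_kent = coefficient_of_variation(8.9 ** 0.5, 530 / 5)

print(f"Height {cv_height:.2f} %, Weight {cv_weight:.2f} %")
print(f"Benson {cv_benson:.2f} %, Kent {cv_kent:.2f} %")
```

The smaller CV marks the more consistent set of values in each comparison.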
Measure of Dispersion
Outliers: extreme observations that fall well outside the overall
pattern of the distribution.

• An outlier may be the result of a
– Recording error,
– An observation from a different population,
– An unusual extreme observation (biological diversity)
Outliers
Consider the data set: {9,9,10,10,10,11,12,36}
Q1 – 1.5 x IQR ≤ Acceptable range ≤ Q3 + 1.5 x IQR
Median = 10
Q1 = 9.5
Q3 = 11.5
IQR = Q3 − Q1
IQR = 11.5 − 9.5 = 2

9.5 – 1.5 (2) ≤ Acceptable range ≤ 11.5 + 1.5 (2)
6.5 ≤ Acceptable range ≤ 14.5
The value 36 lies above 14.5, so it is an outlier.
To Determine Our Outliers:
◦ We use the 1.5 IQR Rule
Large Outliers
◦ Are higher than Q3 + 1.5 x IQR
Small Outliers
◦ Are less than Q1 – 1.5 x IQR
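The 1.5 IQR rule can be sketched in Python as below. Quartiles are taken as the median of the lower and upper halves of the sorted data, which reproduces Q1 = 9.5 and Q3 = 11.5 for this data set; other quartile conventions may give different fences. The function name `iqr_fences` is illustrative.

```python
from statistics import median


def iqr_fences(values):
    """Lower and upper limits of the acceptable range (1.5 IQR rule)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])       # middle of the first half
    q3 = median(ordered[-half:])      # middle of the second half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [9, 9, 10, 10, 10, 11, 12, 36]
low, high = iqr_fences(data)
outliers = [x for x in data if x < low or x > high]

print("Acceptable range:", low, "to", high)
print("Outliers:", outliers)
```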
The Empirical Rule - Gives us how much of the data lies within one, two or three standard deviations of the mean

X̄ ± 1S contains about 68% of values
X̄ ± 2S contains about 95% of values
X̄ ± 3S contains about 99.7% of values
The Empirical Rule
Ex. Grades on a Bacteriology exam follow a normal distribution with a mean of 78 and a standard deviation of 6.

Find the range around the mean that includes 95% of the grades.

x̄ = 78
s = 6

95% ----- 2 standard deviations
x̄ ± 2 SD
78 - 2⋅6 = 78 - 12 = 66
78 + 2⋅6 = 78 + 12 = 90

About 95% of the grades fall between 66 and 90.
The Empirical Rule
Ex. The heights of women follow a bell-shaped distribution with a mean of 160 cm and a standard deviation of 7.5 cm.

What is the approximate percentage of women between 137.5 cm and 182.5 cm?

x̄ = 160
s = 7.5

x̄ ± (K) SD
160 + K (7.5) = 182.5 cm
K (7.5) = 22.5
K = 3

3 SD -> 99.7%, so about 99.7% of women are between 137.5 cm and 182.5 cm tall.
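Both empirical rule examples can be checked with a short Python sketch. The helper names `interval` and `share_within` are illustrative; `share_within` uses the standard normal curve from Python's `statistics.NormalDist`, so the percentages hold for bell-shaped (normal) data only.

```python
from statistics import NormalDist


def interval(mean, sd, k):
    """Limits of mean ± k standard deviations."""
    return mean - k * sd, mean + k * sd


def share_within(k):
    """Share of a normal distribution within k SDs of the mean."""
    z = NormalDist()                  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


grades_95 = interval(78, 6, 2)        # Bacteriology grades, 2 SD
heights_997 = interval(160, 7.5, 3)   # women's heights, 3 SD

print("95% of grades:", grades_95)
print("99.7% of heights:", heights_997)
for k in (1, 2, 3):
    print(k, "SD:", round(share_within(k) * 100, 1), "%")
```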
The Shape of Distributions
➢Distributions can be either symmetrical or skewed, depending on
whether there are more frequencies at one end of the distribution
than the other.

Symmetrical Distributions
❑A distribution is symmetrical if the frequencies at the right and left
tails of the distribution are identical, so that if it is divided into two
halves, each will be the mirror image of the other.

❑ In a symmetrical distribution the mean, median, and mode are identical.
Skewed Distribution

❑ Few extreme values on one side of the distribution or on the other.
❑ Positively skewed distributions: distributions which have few extremely high values (Mean > Median)
❑ Negatively skewed distributions: distributions which have few extremely low values (Mean < Median)
Measures of Skewness
A distribution in which the values
equidistant from the center have equal
frequencies is defined to be symmetrical
and any departure from symmetry is
called skewness.

1. Length of Right Tail = Length of Left Tail
2. Mean = Median = Mode
3. Skewness = 0
a) Sk = (Mean-Mode)/SD
b) Sk = (Q3-2Q2+Q1)/(Q3-Q1)
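The two skewness formulas a) and b) can be sketched in Python. The data set {9, 9, 10, 10, 10, 11, 12, 36} is borrowed from the outlier slide purely for illustration, with quartiles Q1 = 9.5, Q2 = 10 and Q3 = 11.5 taken as the medians of the halves; the function names are illustrative.

```python
from statistics import mean, mode, stdev


def mode_skewness(values):
    """Formula a): (Mean - Mode) / SD, using the sample SD."""
    return (mean(values) - mode(values)) / stdev(values)


def quartile_skewness(q1, q2, q3):
    """Formula b): (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


data = [9, 9, 10, 10, 10, 11, 12, 36]
sk_a = mode_skewness(data)
sk_b = quartile_skewness(9.5, 10, 11.5)

print("Sk (mean, mode, SD):", round(sk_a, 3))
print("Sk (quartiles):", sk_b)
```

Both values are positive here, which matches the long right tail created by the value 36. When the quartiles are evenly spaced, formula b) gives 0.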

[Figure: Symmetric Data Distribution, a plot of Frequency against Value]
Measures of Skewness

A distribution is positively skewed if the observations tend to concentrate more at the lower end of the possible values of the variable than the upper end. A positively skewed frequency curve has a longer tail on the right-hand side.

1. Length of Right Tail > Length of Left Tail
2. Mean > Median > Mode
3. SK > 0
[Figure: Positively Skewed / Rightward Skewness of Data, a plot of Frequency against Value with Mode, Median and Mean marked from left to right]
Measures of Skewness

A distribution is negatively skewed if the observations tend to concentrate more at the upper end of the possible values of the variable than the lower end. A negatively skewed frequency curve has a longer tail on the left side.

1. Length of Right Tail < Length of Left Tail
2. Mean < Median < Mode
3. SK < 0
[Figure: Negatively Skewed / Leftward Skewness of Data, a plot of Frequency against Value with Mean, Median and Mode marked from left to right]
Measures of Kurtosis
• Kurtosis is the degree of peakedness or flatness of a unimodal (single-humped) distribution.
• When the values of a variable are highly concentrated around the mode, the peak of the curve becomes relatively high; the curve is Leptokurtic.
• When the values of a variable have low concentration around the mode, the peak of the curve becomes relatively flat; the curve is Platykurtic.
• A curve that is neither very peaked nor very flat-topped, and is taken as a basis for comparison, is called Mesokurtic/Normal.
Measures of Kurtosis

SKEWNESS AND KURTOSIS
EXAMPLE 1: SAMPLE DATA
Calculate Sample Skewness, Sample Kurtosis from the following grouped data

Class   Frequency (f)   x    (x-x̄)   (x-x̄)^2   (x-x̄)^3   (x-x̄)^4   f(x-x̄)^2   f(x-x̄)^3   f(x-x̄)^4
12-14   3               13   -2.2     4.84       -10.648    23.4256    14.52       -31.944     70.2768
14-16   4               15   -0.2     0.04       -0.008     0.0016     0.16        -0.032      0.0064
16-18   2               17   1.8      3.24       5.832      10.4976    6.48        11.664      20.9952
18-20   1               19   3.8      14.44      54.872     208.5136   14.44       54.872      208.5136
Sum     10                                                             35.6        34.56       299.792

Find x = midpoint (average of the limits) of each class 12-14, 14-16, 16-18, 18-20: 13, 15, 17, 19
x̄ = [13(3) + 15(4) + 17(2) + 19(1)] / total frequency, which is 10
x̄ = 15.2

The deviations are taken from the class midpoint x, not from the frequency:
x-x̄ = 13-15.2, 15-15.2, 17-15.2, 19-15.2 = -2.2, -0.2, 1.8, 3.8
(x-x̄)^2 = 4.84, 0.04, 3.24, 14.44
(x-x̄)^3 = -10.648, -0.008, 5.832, 54.872
(x-x̄)^4 = 23.4256, 0.0016, 10.4976, 208.5136

f(x-x̄)^2 = 3x4.84, 4x0.04, 2x3.24, 1x14.44
= 14.52, 0.16, 6.48, 14.44
sum = 35.6
f(x-x̄)^3 = 3x(-10.648), 4x(-0.008), 2x5.832, 1x54.872
= -31.944, -0.032, 11.664, 54.872
sum = 34.56
f(x-x̄)^4 = 3x23.4256, 4x0.0016, 2x10.4976, 1x208.5136
= 70.2768, 0.0064, 20.9952, 208.5136
sum = 299.792

s = √[Σf(x-x̄)^2 / (Σf - 1)]
= √[35.6/(10-1)]
= 1.989

skew = Σf(x-x̄)^3 / [(Σf - 1) x s^3]
= 34.56 / (9 x 1.989^3)
= 0.49

Kurt = Σf(x-x̄)^4 / [(Σf - 1) x s^4]
= 299.792 / (9 x 1.989^4)
= 2.13
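The grouped-data calculation can be sketched in Python as a check on the arithmetic. The sketch takes deviations from the class midpoints (13, 15, 17, 19) and uses the slide's formulas with Σf − 1 in the denominator; other textbooks define sample skewness and kurtosis with different denominators, so the values are specific to this convention. The helper name `weighted_power_sum` is illustrative.

```python
from math import sqrt

midpoints = [13, 15, 17, 19]     # class marks of 12-14, 14-16, 16-18, 18-20
freqs = [3, 4, 2, 1]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n


def weighted_power_sum(power):
    """Sum of f * (x - mean)**power over all classes."""
    return sum(f * (x - mean) ** power for f, x in zip(freqs, midpoints))


s = sqrt(weighted_power_sum(2) / (n - 1))              # sample SD
skew = weighted_power_sum(3) / ((n - 1) * s ** 3)      # sample skewness
kurt = weighted_power_sum(4) / ((n - 1) * s ** 4)      # sample kurtosis

print("mean =", mean)
print("s =", round(s, 3))
print("skew =", round(skew, 2))
print("kurt =", round(kurt, 2))
```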
