Lecture 3 - Stat HO

Uploaded by

gamingskenzo
Copyright
© © All Rights Reserved

Descriptive Statistics 02

Numerical Descriptive Measures
Dr. Damitha Asanga Gunawardane
MBBS, MSc, MD Community Medicine, FRSPH (UK)
Consultant in Community Medicine
Senior Lecturer
Department of Community Medicine
Faculty of Medicine
Peradeniya.

Objectives

At the end of this lecture, the student should be able to:

Part I
• Define, describe and discuss:
• Measures of central tendency – the mean, the median and the mode
• Other measures of location – the percentile, the decile and the quartile
• Apply this knowledge in calculations
Part II
• Define, describe and discuss:
• Measures of variability (dispersion) – the range, the interquartile range, the variance, the standard deviation and the coefficient of variation
• Apply this knowledge in calculations
Summarizing and describing quantitative data using numerical measures
Numerical Descriptive Measures
• Measures of location
  • Measures of the center: Mean, Median, Mode
  • Other measures: Percentiles
• Measures of variability: Range, Inter Quartile Range, Variance, Standard Deviation, Coefficient of Variation

Lecture 01 – Measures of location
Measures of central tendency

Mean

Mean = Sum of all data points / Total number of data points
Example 01

To the nearest tenth, what is the mean of the following data set?
• 13, 14, 15, 16, 28, 28, 32, 35, 37, 39
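As a quick check of Example 01, the mean formula can be sketched in a few lines of Python. The variable names are illustrative only and are not part of the lecture.

```python
# Data set from Example 01
data = [13, 14, 15, 16, 28, 28, 32, 35, 37, 39]

# Mean = sum of all data points / total number of data points
mean = sum(data) / len(data)

print(round(mean, 1))  # 25.7
```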

Median

• The median is the central value when all observations are sorted in order.
Steps to follow:
1. Put the data in order from lowest to highest.
2. If n is odd, the median is the middle of the ordered values. Find the median by counting (n + 1)/2 values up from the beginning.
3. If n is even, the median is the average of the middle two of the ordered values. Find the median by averaging the values at positions (n/2) and (n/2) + 1 from the beginning.
Example 2

To the nearest tenth, what is the median of the following data set?
13, 14, 15, 35, 28, 39, 32, 16, 37, 28
Put them in order
13, 14, 15, 16, 28, 28, 32, 35, 37, 39
Simply find the middle value. Here n = 10 is even, so the median is the average of the 5th and 6th values: (28 + 28)/2 = 28.0

Example 2

To the nearest tenth, what is the median of the following data set?
14, 15, 35, 28, 39, 32, 16, 37, 28
Put them in order
14, 15, 16, 28, 28, 32, 35, 37, 39
Simply find the middle value. Here n = 9 is odd, so the median is the 5th value: 28
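The three median steps can be sketched in Python as follows, applied to the two data sets used in Example 2. The function name `median_by_steps` is a hypothetical helper chosen for illustration.

```python
def median_by_steps(values):
    # Step 1: put the data in order from lowest to highest
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Step 2: n is odd, take the value at position (n + 1)/2
        # (subtract 1 because Python lists count from 0)
        return ordered[(n + 1) // 2 - 1]
    # Step 3: n is even, average the values at positions n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_by_steps([13, 14, 15, 35, 28, 39, 32, 16, 37, 28]))  # 28.0
print(median_by_steps([14, 15, 35, 28, 39, 32, 16, 37, 28]))      # 28
```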

Mode

• The mode is the value that occurs with the highest frequency in a data set.
What is the mode of the following data sets?
28, 28, 32, 35, 14, 15, 16, 13, 37, 39
15, 16, 28, 13, 14, 35, 37, 28, 37, 32
27, 13, 14, 28, 15, 16, 32, 39, 35, 37
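One way to check the three mode questions is the short Python sketch below. It makes two assumptions that are not stated in the lecture: the second data set is read as 15, 16, 28, 13, 14, 35, 37, 28, 37, 32, and a data set in which no value repeats is treated as having no mode. The function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    counts = Counter(values)              # frequency of each value
    highest = max(counts.values())
    if highest == 1:
        return []                         # assumption: no repeats means no mode
    # return every value that reaches the highest frequency
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([28, 28, 32, 35, 14, 15, 16, 13, 37, 39]))  # [28]
print(modes([15, 16, 28, 13, 14, 35, 37, 28, 37, 32]))  # [28, 37]
print(modes([27, 13, 14, 28, 15, 16, 32, 39, 35, 37]))  # []
```

Under these assumptions the first set has one mode, the second has two (it is bimodal), and the third has none.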

The shapes of a frequency distribution (Symmetry and skewness)

Normal distribution
• Bell shape
• One peak
• Symmetrical
• Eg – Heights of grade 10 male students
Right skewed distribution
• The tail is in the positive direction.
• There are fewer larger scores than we would expect with a normal distribution.
• Eg – Household income
Left skewed distribution
• The tail is in the negative direction.
• There are fewer smaller scores than we would expect with a normal distribution.
• Eg – Gestational age at birth
Comparison between the Mean, Median, and Mode

[Figure: positions of the mean, median and mode marked on skewed distribution curves]

Mean and Median

1. Calculate the mean and median of the following data set.
100, 110, 120, 130, 140
Mean – 120, Median – 120
2. Calculate the mean and median of the following data set.
100, 110, 120, 130, 140, 200
Mean – 133.33, Median – 125
3. Calculate the mean, median and mode of the following data set.
10, 100, 110, 120, 130, 140
Mean – 101.67, Median – 115, Mode – none (no value repeats)
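The three data sets above can be recomputed with the short Python sketch below, which shows how one extreme value moves the mean more than the median. It uses the standard library `statistics.median`. The variable names are illustrative.

```python
from statistics import median

datasets = [
    [100, 110, 120, 130, 140],
    [100, 110, 120, 130, 140, 200],   # one unusually high value added
    [10, 100, 110, 120, 130, 140],    # one unusually low value added
]

results = []
for data in datasets:
    mean = sum(data) / len(data)
    results.append((round(mean, 2), median(data)))

for mean, med in results:
    print(mean, med)
# 120.0 120
# 133.33 125.0
# 101.67 115.0
```

In the second and third data sets the mean shifts by more than 13 units from 120, while the median shifts by only 5.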
Selecting an appropriate measure of centre

Summary measure of location
Type of variable   Mode   Median                                         Mean
Nominal            Yes    No                                             No
Ordinal            Yes    Yes                                            No
Discrete           Yes    Yes, if the distribution is markedly skewed    Yes
Continuous         No     Yes, if the distribution is markedly skewed    Yes

Other measures of location

Percentiles
• Percentile is defined as the point on a distribution below which a given percentage of scores fall.
• Example: You are the 2nd tallest person in a group of 10. Since 80% of the people are shorter than you, you are at the 80th percentile. If your height is 175 cm, that means 175 cm is the 80th percentile.
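The height example can be sketched in Python as below. The ten heights are hypothetical values invented for illustration. Only the 175 cm figure and the "2nd tallest of 10" setup come from the slide. The function name `percentile_rank` is also an assumption.

```python
def percentile_rank(values, x):
    """Percentage of values that fall strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical heights (cm) for a group of 10; 175 cm is the 2nd tallest
heights = [150, 155, 158, 160, 163, 165, 168, 170, 175, 180]

print(percentile_rank(heights, 175))  # 80.0
```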

Certain percentiles have particular importance because of their position.

• Deciles – Divide the data set into tenths (10 equal parts)
• Eg – 1st decile is the 10th percentile
• Quartiles – Divide the data set into quarters (4 equal parts)
• Eg – 1st quartile is the 25th percentile
• Quintiles – Divide a data set into fifths (5 equal parts)
• Eg – 1st quintile is the 20th percentile
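Quartiles, quintiles and deciles are the same idea with a different number of parts. A minimal sketch using Python's standard library `statistics.quantiles` (available from Python 3.8) is shown below, reusing the data set from Example 01. Note that this function interpolates between data points with its default "exclusive" method, and other software may use slightly different rules, so the exact cut points can differ between tools.

```python
from statistics import median, quantiles

data = [13, 14, 15, 16, 28, 28, 32, 35, 37, 39]  # data set from Example 01

quartile_cuts = quantiles(data, n=4)    # 3 cut points -> 4 equal parts
quintile_cuts = quantiles(data, n=5)    # 4 cut points -> 5 equal parts
decile_cuts = quantiles(data, n=10)     # 9 cut points -> 10 equal parts

print(len(quartile_cuts), len(quintile_cuts), len(decile_cuts))  # 3 4 9

# The 2nd quartile and the 5th decile are both the 50th percentile (the median)
print(quartile_cuts[1], decile_cuts[4], median(data))  # 28.0 28.0 28.0
```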
Lecture 02
Measures of variability (Dispersion)

Range

• It is the difference between the lowest and highest values, and it gives an idea of how widely your data are spread.
Calculate the range of the following data set
• 5, 7, 8, 3, 9
• 3, 5, 7, 8, 1009
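The range calculation for the data set 5, 7, 8, 3, 9 can be sketched in Python as follows. The variable names are illustrative.

```python
data = [5, 7, 8, 3, 9]

# Range = highest value - lowest value
data_range = max(data) - min(data)

print(data_range)  # 6
```

Because the range depends only on the two extreme values, a single unusually large or small observation changes it directly.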

Quartiles and the Inter Quartile Range

Quartiles
Quartiles are the values that divide a list of numbers into quarters or
four equal parts.
Example: 09
Find the quartiles in the following data set;
5, 7, 4, 4, 6, 2, 9
Put them in order
2, 4, 4, 5, 6, 7, 9
The quartiles Q1, Q2 and Q3 split the ordered data into four parts of 25% each.
Q1 = 4
Q2 = 5
Q3 = 7
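The quartile example can be reproduced with the Python sketch below. It assumes one common hand-calculation rule: Q2 is the median, and Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when n is odd. Other rules exist and can give slightly different answers. The interquartile range is taken as Q3 minus Q1. The function name `quartiles_by_halves` is hypothetical.

```python
from statistics import median

def quartiles_by_halves(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[half + n % 2:]    # values above the median position
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles_by_halves([5, 7, 4, 4, 6, 2, 9])
iqr = q3 - q1                         # interquartile range

print(q1, q2, q3, iqr)  # 4 5 7 3
```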

Box and Whisker Plot

A box and whisker plot contains five pieces of summary information about the data:
• Median = Middle vertical line in the box
• Upper quartile = Right edge of the box
• Lower quartile = Left edge of the box
• Maximum = Right end of the ‘whisker’
• Minimum = Left end of the ‘whisker’
Outliers are not considered when deciding on the Maximum and Minimum.

Box and Whisker Plot

To construct a box plot

1. Determine the quartiles
2. Calculate the interquartile range
3. Calculate lower and upper limits for outliers
4. Decide the minimum and maximum values after excluding the outliers
5. Draw a horizontal axis on which the five-number summary can be located
6. Draw the box-and-whiskers
7. Plot each outlier with an asterisk
Box and Whisker Plot

Construct a box and whisker plot for the following data set
8, 4, 2, 4, 5, 5, 6, 7, 7
Put them in order
2, 4, 4, 5, 5, 6, 7, 7, 8
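Steps 1 to 4 of the construction can be sketched in Python for this data set. Two assumptions are made that the slides do not state: the quartiles are taken as medians of the lower and upper halves, and the outlier limits use the widely used 1.5 × IQR rule (Q1 − 1.5 × IQR and Q3 + 1.5 × IQR). The function name `box_plot_summary` and the parameter `k` are hypothetical.

```python
from statistics import median

def box_plot_summary(values, k=1.5):
    # k = 1.5 is an assumed multiplier (the common 1.5 x IQR rule)
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])             # step 1: quartiles
    q2 = median(ordered)
    q3 = median(ordered[half + n % 2:])
    iqr = q3 - q1                           # step 2: interquartile range
    low_limit = q1 - k * iqr                # step 3: limits for outliers
    high_limit = q3 + k * iqr
    inside = [v for v in ordered if low_limit <= v <= high_limit]
    outliers = [v for v in ordered if v < low_limit or v > high_limit]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "whisker_min": min(inside),         # step 4: min and max without outliers
        "whisker_max": max(inside),
        "outliers": outliers,
    }

s = box_plot_summary([8, 4, 2, 4, 5, 5, 6, 7, 7])
print(s["q1"], s["median"], s["q3"], s["iqr"])             # 4.0 5 7.0 3.0
print(s["whisker_min"], s["whisker_max"], s["outliers"])   # 2 8 []
```

Under these assumptions the limits are −0.5 and 11.5, so no value is an outlier and the whiskers run from 2 to 8.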
Measures of variability (Dispersion) cont.

• The range and IQR do not use all the data points in a data set in their calculations.
• So it is important to have a measure of variability which involves all the data points.

Standard Deviation

• The standard deviation measures the degree to which individual observations in a dataset deviate from the mean.
To calculate the standard deviation, follow these steps:
1. Calculate the mean (the simple average of the numbers).
2. Then for each number, subtract the mean (deviation score).
3. Square each result (the squared deviation scores).
4. Then work out the average of those squared differences, dividing by n − 1 for a sample (variance).
5. Then take the square root of the result (standard deviation).

Standard Deviation

Formula for calculating Standard deviation

• S – Standard Deviation
• Squared deviation scores – (x − x̄)²
• Degrees of freedom – n − 1 (since this is a sample, we take the sample size (n) minus one)

$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Variance

• Another measure of variability that may be encountered is the variance. This is simply the square of the standard deviation.
• Variance is expressed in squared units of measurement, limiting its usefulness as a descriptive term.
• Due to this reason, the variance is not generally used in data description.
• Still, it is central to the analysis of variance, which will not be possible to calculate with standard deviations.

$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Standard Deviation
• Calculate the standard deviation of the following data set
Raw score (X)   Deviation score (X − X̄)   Squared deviation score
2
4
6
8
10
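The table can be filled in by following the five steps for the standard deviation. The Python sketch below works through them for the data set 2, 4, 6, 8, 10, using n − 1 as the divisor because the data are treated as a sample. The variable names are illustrative. The standard library `statistics.stdev` is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev

data = [2, 4, 6, 8, 10]
n = len(data)

mean = sum(data) / n                       # step 1: mean = 6.0
deviations = [x - mean for x in data]      # step 2: -4.0, -2.0, 0.0, 2.0, 4.0
squared = [d ** 2 for d in deviations]     # step 3: 16.0, 4.0, 0.0, 4.0, 16.0
variance = sum(squared) / (n - 1)          # step 4: 40 / 4 = 10.0
sd = sqrt(variance)                        # step 5: square root of the variance

print(variance, round(sd, 2))              # 10.0 3.16
print(abs(sd - stdev(data)) < 1e-9)        # True (matches the library result)
```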
Coefficient of variation (CV)

• The CV measures the percentage of variation in the data relative to the mean of the data. The following formula is used to calculate the CV:
CV = (Standard deviation / Mean) × 100%

• This is a useful measure to compare variability between two different data sets when the means are different or the units of measurement are different.

Coefficient of variation (CV)

Example
A researcher is comparing the results of two tests performed using different techniques.
The results from the two tests are given below.
          Subject A   Subject B
Mean      59.9        44.8
SD        10.2        12.7
CV (%)    17.03       28.35
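The CV values in the table can be checked with the short Python sketch below, using the means and standard deviations given in the example. The function name `coefficient_of_variation` is a hypothetical helper.

```python
def coefficient_of_variation(sd, mean):
    # CV = (standard deviation / mean) x 100%
    return sd / mean * 100

cv_a = coefficient_of_variation(10.2, 59.9)
cv_b = coefficient_of_variation(12.7, 44.8)

print(round(cv_a, 2))  # 17.03
print(round(cv_b, 2))  # 28.35
```

The second column has the larger CV, so its results vary more relative to their mean.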
Selecting an appropriate measure of variability (Dispersion / Spread)

Summary measure of variability (Spread)
Type of variable        Range   Interquartile range                            Standard deviation
Nominal                 No      No                                             No
Ordinal                 Yes     Yes                                            No
Discrete / Continuous   Yes     Yes, if the distribution is markedly skewed    Yes

Thank You
[email protected]
0772004317
