01 - Scales of Measurement - Summarising Numeric Data
• Introduction to Biostatistics
• Scales of measurement
• Summarizing numerical data with numbers
Joaniter Nankabirwa_Wandera
Introduction to statistics
• Statistics:
– Recorded data
– Interpretation of data
• Biostatistics:
Sources of data
• Surveys
– Examples of national surveys (MIS, DHS etc)
• Surveillance
– DHIS
– UMSP
• Records
– Hospitals
– Government institutions
• Planned studies
– NGOs
– Universities
– Researchers
VARIABLE TYPES IN STATISTICS
• Categorical variables
– Take on values that are names or labels
– E.g. sex, education status, disease severity
• Binary/dichotomous variables
– Categorical variables that take on only 2 values
– E.g. sex (male or female), Yes/No responses
• Continuous variables
– Can take any value within a range, including decimals
– E.g. age, weight, height
• Discrete variables
– Take only specific, countable values (usually whole numbers)
– E.g. number of children, number of pregnancies
Scales of measurement
Ways in which variables/numbers are defined and categorized
Nominal scale:
• Used when data fits into categories
• The data do not represent a quantity or an amount
• We count the number of observations in each category (e.g. sex)
• They are often summarized as percentages or proportions
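As a rough illustration of summarising a nominal variable, the sketch below counts each category and turns the counts into percentages. It uses the sex column of the ten-person example data set that appears later in these notes; the variable names are only illustrative.

```python
from collections import Counter

# Sex column of the ten-person example data set (F = female, M = male).
sex = ["F", "M", "F", "M", "M", "F", "M", "M", "M", "F"]

counts = Counter(sex)   # number of observations in each category
n = len(sex)            # total number of observations

# Percentage of observations falling in each category.
percentages = {category: 100 * count / n for category, count in counts.items()}

for category in sorted(counts):
    print(f"{category}: n = {counts[category]}, {percentages[category]:.0f}%")
```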
Ordinal Scales:
• A special kind of nominal scale
• Used when there is an inherent order among the categories
• E.g. stages of tumours, socioeconomic status (SES), level of education
Scales of measurement (continued)
Interval scale:
• Data are quantitative and measured in equal intervals
• There is order, and the difference between two values is meaningful
• Distances between adjacent points are equal along the whole scale
• E.g. temperature, pH
Ratio scale:
• Data is also quantitative
• Has all properties of an interval scale
• In addition, zero has a clear meaning: having none of that variable
• E.g. weight
Numeric data
• In numeric data, the values are expressed as numbers
Name      No.   Age   Sex
Dorothy   1     25    F     65
Anthony   2     27    M     65
Janet     3     27    F     58
Jovan     4     24    M     55
Rogers    5     26    M     72
Joselyn   6     24    F     48
Bernard   7     30    M     76
Daniel    8     34    M     78
Seti      9     29    M     62
Aziida    10    31    F     75
Mean
• Mean = sum of values / total number of values
Mean age
=(25+27+27+24+26+24+30+34+29+31)/10
=277/10
=27.7
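The same arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation above; the variable names are illustrative.

```python
# Ages of the ten people in the example data set.
ages = [25, 27, 27, 24, 26, 24, 30, 34, 29, 31]

total = sum(ages)       # sum of the values
n = len(ages)           # total number of values
mean_age = total / n    # mean = sum of values / number of values

print(f"sum = {total}, n = {n}, mean = {mean_age:.1f}")
```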
The mean is sensitive to extreme values (outliers), especially when the sample size is small. Check it out with the first age changed from 25 to 100:
No.   Age   Sex
1     100   F     65
2     27    M     65
3     27    F     58
4     24    M     55
5     26    M     72
6     24    F     48
7     30    M     76
8     34    M     78
9     29    M     62
10    31    F     75
Mean (with the outlier)
• Mean = sum of values / total number of values
Mean age
=(100+27+27+24+26+24+30+34+29+31)/10
=352/10
=35.2
Without the outlier, the mean was 27.7.
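To see how far a single extreme value pulls the mean, the sketch below compares the two data sets side by side. It uses `statistics.mean` from the Python standard library; the variable names are illustrative.

```python
from statistics import mean

# Original ages, and the same ages with the first value replaced by 100.
ages = [25, 27, 27, 24, 26, 24, 30, 34, 29, 31]
ages_with_outlier = [100] + ages[1:]

mean_original = mean(ages)
mean_with_outlier = mean(ages_with_outlier)

# How far one extreme value moves the mean when n is only 10.
shift = mean_with_outlier - mean_original

print(f"without outlier: {mean_original:.1f}")
print(f"with outlier:    {mean_with_outlier:.1f}")
print(f"shift:           {shift:.1f}")
```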
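If the original observations are not available, use a weighted average computed from a frequency table.
Weighted mean = Σ(fx)/n, where f is the class frequency, x is the class midpoint and n is the total frequency (work it out)
Age class   Tally   Frequency (f)   Midpoint (x)   fx
20 - 24     II      2               22             44
25 - 29     IIIII   5               27             135
30 - 34     III     3               32             96
Total               10                             275
Mean = Σ(fx)/n
Mean = 275/10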
=27.5
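The frequency-table calculation can be sketched as follows. The class limits and frequencies come from the table above; the tuple layout and variable names are just one convenient way to hold them.

```python
# Each class: (lower limit, upper limit, frequency f), taken from the frequency table.
classes = [(20, 24, 2), (25, 29, 5), (30, 34, 3)]

n = 0          # total frequency
sum_fx = 0.0   # running total of f * x
for lower, upper, f in classes:
    x = (lower + upper) / 2   # class midpoint
    n += f
    sum_fx += f * x

weighted_mean = sum_fx / n    # Σ(fx) / n

print(f"Σ(fx) = {sum_fx:.0f}, n = {n}, weighted mean = {weighted_mean:.1f}")
```

The result (27.5) differs slightly from the mean of the raw ages (27.7) because every observation in a class is treated as if it sat at the class midpoint.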
Measures of the middle continued (numeric data)
• Median
o Is the middle observation
o Point at which half the observations are smaller and half are
larger
• Steps
o Arrange observations from smallest to largest
o Count to find the middle value
o Median is the middle value for odd number of observations and
the mean of the two middle values for an even number of
observations
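• Activity using our data
Ordered data: 24, 24, 25, 26, 27, 27, 29, 30, 31, 34
n = 10 (even), so the median is the mean of the values in positions n/2 and n/2 + 1 (the 5th and 6th)
Median = (27 + 27)/2 = 27
With the outlier: 24, 24, 26, 27, 27, 29, 30, 31, 34, 100; median = (27 + 29)/2 = 28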
Deviations from the mean (27.7):
Lower half (24, 24, 25, 26, 27)     Upper half (27, 29, 30, 31, 34)
• 27.7 - 24 = 3.7                   • 27.7 - 27 = 0.7
• 27.7 - 24 = 3.7                   • 27.7 - 29 = -1.3
                                    • 27.7 - 30 = -2.3
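The steps for finding the median (sort, then take the middle value or the mean of the two middle values) can be sketched as below. The function name `median_of` is hypothetical; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median_of(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)    # arrange from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:              # odd n: a single middle value
        return ordered[middle]
    # even n: mean of the values in positions n/2 and n/2 + 1 (1-based)
    return (ordered[middle - 1] + ordered[middle]) / 2


ages = [25, 27, 27, 24, 26, 24, 30, 34, 29, 31]
ages_with_outlier = [100] + ages[1:]

median_original = median_of(ages)
median_with_outlier = median_of(ages_with_outlier)

print(median_original, median_with_outlier)
```

Unlike the mean, the median moves only slightly when the outlier is introduced.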
Measures of the middle continued (numeric data)
• Mode
• Is the value that occurs most frequently
• When a data set has two modes, it is called bimodal
• In frequency tables, the mode is also called the modal class
24, 24, 25, 26, 27, 27, 29, 30, 31,34
• Work it out using our data
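One way to check the answer is with `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest frequency. This is a sketch; the variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

ages = [24, 24, 25, 26, 27, 27, 29, 30, 31, 34]

frequencies = Counter(ages)      # how often each age occurs
modes = sorted(multimode(ages))  # all values with the highest frequency

print("frequencies:", dict(frequencies))
print("mode(s):", modes)
print("bimodal:", len(modes) == 2)
```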
Measures of the middle continued (numeric data)
• Median
– Used when data is skewed
• Mode:
– Used for bimodal data
• Geometric mean
– Used for observations measured on a logarithmic scale
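A small sketch of the geometric mean follows. The values are made up purely for illustration (they are not data from these notes) and are chosen to span several orders of magnitude, as measurements on a logarithmic scale often do. `statistics.geometric_mean` is available from Python 3.8.

```python
from statistics import geometric_mean, mean

# Hypothetical values spanning several orders of magnitude (illustration only).
values = [10, 100, 1000]

arithmetic = mean(values)           # pulled towards the largest value
geometric = geometric_mean(values)  # n-th root of the product of the values

print(f"arithmetic mean = {arithmetic}")
print(f"geometric mean  = {geometric:.1f}")
```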
Measures of spread (Numeric data)
• Range
o Difference between the largest and smallest observation
(24, 24, 25, 26, 27, 27, 29, 30, 31, 34) Range = 34 - 24 = 10
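The range calculation, sketched in Python with illustrative variable names:

```python
ages = [24, 24, 25, 26, 27, 27, 29, 30, 31, 34]

smallest = min(ages)
largest = max(ages)
data_range = largest - smallest   # range = largest - smallest observation

print(f"range = {largest} - {smallest} = {data_range}")
```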
Measures of spread (Numeric data)
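• Variance
• Variance = SD², the square of the standard deviation (SD)
• The SD is often preferred because it is in the same units as the data
• Work it out using our data
• Coefficient of variation (CV)
• Often used in biological sciences
• CV = (SD / mean) × 100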
• Work it out
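A sketch of these calculations for the ten ages is shown below. It assumes the sample versions of the variance and SD (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute; these notes do not state which divisor is intended, so treat that choice as an assumption.

```python
from statistics import mean, stdev, variance

ages = [25, 27, 27, 24, 26, 24, 30, 34, 29, 31]

mean_age = mean(ages)
var_age = variance(ages)       # sample variance: sum of squared deviations / (n - 1)
sd_age = stdev(ages)           # sample SD: square root of the sample variance
cv_age = sd_age / mean_age * 100   # coefficient of variation, in percent

print(f"variance = {var_age:.2f}")
print(f"SD       = {sd_age:.2f}")
print(f"CV       = {cv_age:.1f}%")
```

Using the population versions instead (`statistics.pvariance`, `statistics.pstdev`) divides by n and gives slightly smaller values.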
Measures of spread (Numeric data)
• Percentiles
o Percentiles tell you where a score stands relative to other scores
o E.g. in the growth chart for girls at 3 years, the 90th percentile is 15 kg
o This means that 90% of girls aged 3 years weigh 15 kg or less
How to get percentiles & The Interquartile range
• Arrange the data in increasing order
• Say we want the 70th percentile for our 10 observations
• Position = 70/100 × 10 = 7, so find the value in the 7th position
• Work it out
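The position rule above can be sketched as a small function. The name `percentile_by_position` is hypothetical, and rounding a fractional position up to the next whole position is an assumption: the steps above only cover the case where the position is a whole number. Statistical software uses several different percentile conventions, so results from other tools may differ.

```python
import math


def percentile_by_position(values, p):
    """Value at position p/100 * n in the ordered data (positions start at 1)."""
    ordered = sorted(values)              # arrange the data in increasing order
    n = len(ordered)
    position = math.ceil(p * n / 100)     # assumption: round fractional positions up
    position = max(1, position)           # guard against position 0 when p is 0
    return ordered[position - 1]          # convert 1-based position to 0-based index


ages = [25, 27, 27, 24, 26, 24, 30, 34, 29, 31]
p70 = percentile_by_position(ages, 70)

print(f"70th percentile = {p70}")
```

Calling the same function with p = 25 and p = 75 gives the quartiles under this rounding assumption, and their difference is the interquartile range.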