
Biostatistics 3

chapter 3

Chapter 3:

Descriptive Statistics

1
Descriptive Statistics
• Techniques used to organize and summarize a set
of data.
• The best way to work with data is to summarize
and organize them.
• Numbers that have not been summarized and
organized are called raw data.

2
Cont.

• Descriptive statistics include:
– Tables
– Graphs
– Numerical summary measures
  - Measures of central tendency
  - Measures of variability

3
Methods of data organization and
presentation

4
Methods of data organization and
presentation
• The data collected in a survey are called raw data.
• Collected data need to be organized so as to
condense the information they contain and to
show patterns of variation clearly.
• Precise methods of analysis can be decided upon
only when the characteristics of the data are
understood.

5
Cont.

• To make data easier to appreciate and to
allow quick comparisons, it is often useful to
arrange the data in the form of a table, or in one
of a number of different graphical forms.
• When analysing voluminous data collected from
health centre records, it is quite useful to put
them into compact tables.

6
Cont.
• Quite often, data are presented in a
meaningful way by preparing a
frequency distribution.
• If this is not done, the raw data convey little
meaning, and any pattern in them
may go undetected.

7
Frequency Distributions
• Ordered array: A simple arrangement of
individual observations in order of magnitude.
• The actual summarization and organization of
data starts with the frequency distribution.

8
Cont.

• Frequency distribution: A table which
lists all values of the studied
variable and how many times each value is
observed.
• Tables make it easier to see how the data are
distributed.

9
a) Qualitative variable: Count the number of cases
in each category.
Example 1:
• Type of ICU for 25 patients entering the ICU at a given hospital:
1. Medical
2. Surgical
3. Cardiac
4. Other

10
ICU Type    Frequency      Relative Frequency
            (How often)    (Proportionately often)

Medical     12             0.48
Surgical     6             0.24
Cardiac      5             0.20
Other        2             0.08

Total       25             1.00
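
The table above can be reproduced with a short Python sketch. The raw list of 25 ICU types is not given on the slide, so the list below is a hypothetical reconstruction from the table's counts; the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical raw data rebuilt from the table's counts:
# 12 medical, 6 surgical, 5 cardiac and 2 other admissions.
icu_types = ["Medical"] * 12 + ["Surgical"] * 6 + ["Cardiac"] * 5 + ["Other"] * 2

counts = Counter(icu_types)      # frequency of each category
n = sum(counts.values())         # total number of patients

# Relative frequency = category frequency / total
relative = {category: freq / n for category, freq in counts.items()}

for category, freq in counts.items():
    print(f"{category:<10}{freq:>4}{relative[category]:>8.2f}")
print(f"{'Total':<10}{n:>4}{sum(relative.values()):>8.2f}")
```

The same pattern works for any qualitative variable: count the cases per category, then divide each count by the total.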

11
Example 2:

A study was conducted to assess the
characteristics of a group of 234 smokers by
collecting data on gender and other variables.
Gender: 1 = male, 2 = female

Gender    Code    Frequency (n)    Relative Frequency

Male      1       124              53.0%
Female    2       110              47.0%
Total             234              100%
12
b) Quantitative variable:
- Select a set of continuous, non-overlapping
intervals such that each value can be placed in
one, and only one of the intervals.

13
• To determine the number of class intervals and
the corresponding width, we may use
Sturges' rule:
$K = 1 + 3.322 \log_{10} n$
$W = \dfrac{L - S}{K}$
where
• K = number of class intervals
• n = number of observations
• W = width of the class interval
• L = the largest value
• S = the smallest value
14
Example:
– Leisure time (hours) per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20 22 14 13
10 19 27 29 22 38 28 34 32 23 19 21 31 16 28 19 18
12 27 15 21 25 16
K = 1 + 3.322 (log n)
K = 1 + 3.322 (log 40) = 6.32 ≈ 6
Largest value = 38, Smallest value = 10

Width = (38 - 10)/6 = 4.67 ≈ 5
15
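
As a rough check of the arithmetic, Sturges' rule can be sketched in Python. The function name `sturges` is hypothetical, and rounding the width up (so that the classes cover the whole range) is an assumption; the slide only shows 4.67 being rounded to 5.

```python
import math

def sturges(n, largest, smallest):
    """Return (K, W): number of classes and class width by Sturges' rule."""
    k_exact = 1 + 3.322 * math.log10(n)    # K = 1 + 3.322 log10(n)
    k = round(k_exact)                     # whole number of classes
    w_exact = (largest - smallest) / k     # W = (L - S) / K
    w = math.ceil(w_exact)                 # assumption: round the width up
    return k, w

k, w = sturges(n=40, largest=38, smallest=10)
print(k, w)   # 6 5
```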


Time       Frequency    Relative     Cumulative
(Hours)                 Frequency    Relative
                                     Frequency
10-14       5           0.125        0.125
15-19      11           0.275        0.400
20-24      12           0.300        0.700
25-29       7           0.175        0.875
30-34       3           0.075        0.950
35-39       2           0.050        1.000

Total      40           1.000
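
The frequency table can be rebuilt from the 40 raw values with the sketch below. It assumes the classes start at 10 with width 5, as in the table, and that the data are whole numbers, so each class runs from its lower limit to lower limit + 4.

```python
hours = [23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20, 22, 14, 13,
         10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19, 21, 31, 16, 28, 19, 18,
         12, 27, 15, 21, 25, 16]

start, width, k = 10, 5, 6     # first lower limit, class width, number of classes
n = len(hours)

rows = []                      # (label, frequency, rel. freq, cum. rel. freq)
cumulative = 0
for i in range(k):
    lower = start + i * width
    upper = lower + width - 1  # whole-number data: classes 10-14, 15-19, ...
    freq = sum(lower <= h <= upper for h in hours)
    cumulative += freq
    rows.append((f"{lower}-{upper}", freq, freq / n, cumulative / n))

for label, freq, rel, cum_rel in rows:
    print(f"{label:<8}{freq:>4}{rel:>8.3f}{cum_rel:>8.3f}")
```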
16
• Cumulative frequency: The running total of the
frequencies of a class and of all the classes before it.
• Cumulative relative frequency: The sum of the relative
frequencies for all values that are less than or
equal to the given value.
• Mid-point: The value of the interval which lies
midway between the lower and the upper limits of
a class.

17
• True limits: Those limits that make the
intervals of a continuous variable continuous in
both directions, leaving no gaps between adjacent classes.
• Used for smoothing the class intervals.
• For data recorded in whole units, subtract 0.5 from
the lower limit and add 0.5 to the upper limit.

18
Time       True limit      Mid-point    Frequency
(Hours)
10-14       9.5 – 14.5     12            5
15-19      14.5 – 19.5     17           11
20-24      19.5 – 24.5     22           12
25-29      24.5 – 29.5     27            7
30-34      29.5 – 34.5     32            3
35-39      34.5 – 39.5     37            2

Total                                   40
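
A small sketch of how the true limits and mid-points in this table are obtained. The helper names `true_limits` and `midpoint` are hypothetical, and the `unit` parameter (1 for whole-number data) is an assumption added for illustration.

```python
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]

def true_limits(lower, upper, unit=1):
    """Extend each limit by half the measurement unit (0.5 for whole numbers)."""
    half = unit / 2
    return lower - half, upper + half

def midpoint(lower, upper):
    """Value midway between the lower and upper class limits."""
    return (lower + upper) / 2

for lower, upper in classes:
    print(f"{lower}-{upper}: true limits {true_limits(lower, upper)}, "
          f"mid-point {midpoint(lower, upper)}")
```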
19
Exercise

Ages of 30 pregnant mothers who visited the Dr. Alag
MCH center in Burao city in March 2021.
20 24 30 15 16 19 26 17 32
19 22 21 26 23 18 19 27 20
18 28 17 30 31 29 18 26 27
25 19 23

20
Calculate
• Number of class intervals
• Width then
• Frequency, relative frequency and cumulative
relative frequency

21
Exercise
Ages of 35 mothers who delivered at Burao General
Hospital in June 2019.
24 30 40 18 19 26 20 32 19 22
21 26 23 18 19 27 20 18 28 30
32 38 29 41 26 27 39 23 31
25 33 34
35 18 37

22
Calculate
• Number of class intervals
• Width then
• Frequency, relative frequency and cumulative
relative frequency

23
Exercise
The following table shows the number of hours
45 hospital patients slept following the
administration of a certain anesthetic.
7 10 12 4 8 7 3 8 5 12 11 3 8
1 1
13 10 4 4 5 5 8 7 7 3 2 3 8
13 1 7 17 3 4 5 5 3 1 17 10 4
7 7 11 8

24
Calculate
• Number of class intervals
• Width then
• Frequency, relative frequency and cumulative
relative frequency

25
Exercise
Wages (in dollars) of 25 workers of company X in 2018.
24 33 18 19 26 19
21 23 19 27 20 25
28 30 29 32 29 31
35 27 22 28 26 34
24

26
Calculate
• Number of class intervals
• Width then
• Frequency, relative frequency and cumulative
relative frequency

27
Exercise
This data set shows the distribution of the
ages of 28 men at the time of marriage in district Y
in 2017.
19 33 18 19 26
24 21 23 19 27
20 25 28 30 29
32 29 31 35 27
22 28 26 34 24
30 22 18
28
Calculate
• Number of class intervals
• Width then
• Frequency, relative frequency and cumulative
relative frequency

29
Tables can also be used to present more than
one variable

Variable        Frequency (n)    Percent

Sex
Male
Female
Age (yrs)
15-19
20-24
25-29
Religion
Christian
Muslim
Occupation
Student
Farmer
Merchant

30
Guidelines for Constructing Tables
• Keep them simple,
• Limit the number of variables to three or less,
• All tables should be self-explanatory,
• Include clear title telling what, when and where,
• Clearly label the rows and columns,
• Explain codes and abbreviations in the foot-note,
• Show totals,
• If data is not original, indicate the source in foot-note
31
32
Graphs
• Well designed graphs can be powerful means
of communicating a great deal of information.
• When graphs are poorly designed, they not
only fail to convey the message but are
often misleading.

33
Specific types of graphs include:
• Nominal, ordinal and discrete data:
– Bar graph
– Pie chart
• Continuous data:
– Histogram
– Scatter plot
– Line graph
• Others

34
Bar charts (or graphs)
• Categories are listed on the horizontal axis (X-axis).
• Frequencies or relative frequencies are
represented on the Y-axis (ordinate).
• The height of each bar is proportional to the
frequency or relative frequency of observations
in that category.

35
[Figure: bar chart of ICU type for 25 patients]

36
Method of constructing bar chart
• All the bars must have equal width
• The bars are not joined together
• The different bars should be separated by equal
distances
• All the bars should rest on the same line called
the base.

37
Example: Construct a bar chart for the
following data
Distribution of patients in hospital by source of referral
Source of referral         No. of patients    Relative freq. (%)
Other hospital                  97                 5.1
General practitioner           769                40.3
Out-patient department         623                32.7
Casualty                       256                13.4
Other                          161                 8.5
Total                        1 906               100.0

38
[Figure: bar chart. Distribution of patients in hospital X by source of referral, 1999. X-axis: source of referral (Other hospital, GP, OPD, Casualty, Other). Y-axis: No. of patients (0 to 800). Bar heights: 97, 769, 623, 256, 161.]

39
Sub-divided Bar chart
• If there are different quantities forming the sub-
divisions of the totals, simple bars may be sub-
divided in the ratio of the various sub-divisions
to exhibit the relationship of the parts to the
whole.
• The order in which the components are shown in
a “bar” is followed in all bars used in the
diagram.

40
• Example: Plasmodium species distribution for
confirmed malaria cases, Zeway, 2003.

[Figure: sub-divided bar chart. X-axis: month of 2003 (August, October, December). Y-axis: Percent (0 to 100). Segments: P. falciparum, P. vivax, Mixed.]

41
Pie chart
• Shows the relative frequency for each
category by dividing a circle into sectors, the
angles of which are proportional to the
relative frequency.
• Use percentage distributions
• Used for a single categorical variable

42
Steps to construct a pie-chart
• Construct a frequency table
• Change the frequency into percentage (P)
• Change the percentages into degrees, where:
degree = (P / 100) × 360°
• Draw a circle and divide it accordingly

43
Example: Distribution of deaths for females in
England and Wales, 1989.
Cause of death               Number of deaths
Circulatory system (C)       100 000
Neoplasms (Cancer) (N)        70 000
Respiratory system (R)        30 000
Injury and poisoning (I)       6 000
Digestive system (D)          10 000
Others                        20 000
Total                        236 000
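
The pie-chart steps can be sketched for this table as follows: each frequency is turned into a percentage of the total, and each percentage into a sector angle. The dictionary keys are shortened labels chosen for illustration.

```python
deaths = {
    "Circulatory system": 100_000,
    "Neoplasms": 70_000,
    "Respiratory system": 30_000,
    "Injury and poisoning": 6_000,
    "Digestive system": 10_000,
    "Others": 20_000,
}

total = sum(deaths.values())

# Percentage of the total, then angle = percentage / 100 * 360 degrees
percent = {cause: 100 * count / total for cause, count in deaths.items()}
angle = {cause: p / 100 * 360 for cause, p in percent.items()}

for cause in deaths:
    print(f"{cause:<22}{percent[cause]:>6.1f}%{angle[cause]:>8.1f} deg")
```

The angles always add up to 360 degrees, which is a quick way to check the conversion.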
44
[Figure: pie chart. Distribution of cause of death for females in England and Wales, 1989. Circulatory system 42%, Neoplasms 30%, Respiratory system 13%, Others 8%, Digestive system 4%, Injury and poisoning 3%.]

45
Histogram
• Histograms are frequency distributions with
continuous class intervals that have been turned
into graphs.
• To construct a histogram, we draw the interval
boundaries on a horizontal line and the
frequencies on a vertical line.
• Non-overlapping intervals that cover all of the
data values must be used.

46
• Example: Distribution of the age of women at
the time of marriage

Age group       15-19  20-24  25-29  30-34  35-39  40-44  45-49
No. of women    11     36     28     13     7      3      3

47
[Figure: histogram. Age of women at the time of marriage. X-axis: age group, true class limits 14.5-19.5, 19.5-24.5, 24.5-29.5, 29.5-34.5, 34.5-39.5, 39.5-44.5, 44.5-49.5. Y-axis: No. of women (0 to 40).]

48
• Histogram for the ages of 2087 mothers with
<5 children, Adami Tulu, 2003.
[Figure: histogram. X-axis: N1AGEMOTH, 15.0 to 55.0. Y-axis: frequency, 0 to 700. Std. Dev = 6.13, Mean = 27.6, N = 2087.]
49
Frequency Polygon
• A frequency distribution can be portrayed
graphically in yet another way by means of a
frequency polygon.
• To draw a frequency polygon we connect the
mid-points of the tops of the bars of the
histogram with straight lines.

50
• Frequency polygon for the ages of 2087
mothers with <5 children, Adami Tulu, 2003.
[Figure: frequency polygon. X-axis: N1AGEMOTH, 15.0 to 55.0. Y-axis: frequency, 0 to 700. Std. Dev = 6.13, Mean = 27.6, N = 2087.]

51
It can also be drawn without erecting rectangles, by
plotting the frequency of each class above the
mid-point of its interval and joining the points, as
follows:
[Figure: frequency polygon. Age of women at the time of marriage. X-axis: age, class mid-points 12, 17, 22, 27, 32, 37, 42, 47. Y-axis: No. of women (0 to 40).]
52
Percentiles (Quartiles)
• Suppose that 50% of a cohort survived at least 4
years.
• This also means that 50% survived at most 4
years.
• We say 4 years is the median.
• The median is also called the 50th percentile
• We write: P50 = 4 years

53
Cont.

Similarly, we could speak of other percentiles:

– P25 (25th percentile, 1st quartile): 25% of the
sample values are less than or equal to this value.
– P50 (50th percentile, 2nd quartile): 50% of the
sample values are less than or equal to this value.
– P75 (75th percentile, 3rd quartile): 75% of the
sample values are less than or equal to this value.

54
• It is possible to estimate the values of percentiles
from a cumulative frequency polygon.
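
When the raw values are available, quartiles can also be computed directly rather than read off a graph. The sketch below uses the leisure-time data from the earlier example and Python's `statistics.quantiles` with its default "exclusive" method; this is a different technique from the graphical estimate, and other interpolation methods can give slightly different P25 and P75 values.

```python
from statistics import median, quantiles

hours = [23, 24, 18, 14, 20, 36, 24, 26, 23, 21, 16, 15, 19, 20, 22, 14, 13,
         10, 19, 27, 29, 22, 38, 28, 34, 32, 23, 19, 21, 31, 16, 28, 19, 18,
         12, 27, 15, 21, 25, 16]

# n=4 splits the data into four equal parts and returns the three cut points
q1, q2, q3 = quantiles(hours, n=4)   # P25, P50, P75

print("P25 =", q1)
print("P50 =", q2, "(same as the median:", median(hours), ")")
print("P75 =", q3)
```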

55
Numerical Summary Measures
– Single numbers that quantify the
characteristics of a distribution of values
• Measures of central tendency or location
• Measures of dispersion

56
Measures of Central Tendency (MCT)

57
Measures of Central Tendency
• On the scale of values of a variable there is a
certain point around which the largest number of items
tend to cluster.
• Since this point is usually in the centre of the
distribution, the tendency of statistical data
to concentrate around a certain value is called
“central tendency”.

58
• The various methods of determining the point
about which the observations tend to concentrate
are called Measures of Central Tendency
(MCT).
• The objective of calculating MCT is to determine
a single figure which may be used to represent
the whole data set.
• Since an MCT represents the entire data set, it
facilitates comparison within one group or
between groups of data.

59
Characteristics of a good MCT
An MCT is good or satisfactory if it possesses
the following characteristics.
1. MCT should be based on all the
observations
2. It should not be affected by the extreme
values
3. It should have a definite value
4. It should not be subjected to complicated
and tedious calculations
5. It should be stable with regard to sampling
60
• The most common measures of central
tendency include:
– Mean
– Median
– Mode
– Others

61
The Arithmetic Mean or simple Mean
a) Ungrouped Data
• The arithmetic Mean is the "average" which is
obtained by adding all the values in a sample or
population and dividing them by the number of
values.

62
The heart rates for 10 patients were as follows
(beats per minute):
167, 120, 150, 125, 150, 140, 40, 136, 120, 150
What is the average heart rate for these patients?
The sample mean:
$\bar{X} = \dfrac{\sum X_i}{n}$
$\bar{X}$ = (167 + 120 + 150 + 125 + 150 + 140 + 40 + 136 + 120 + 150)/10
= 1298/10 = 129.8 beats per minute
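
The same calculation as a minimal Python sketch, using the standard-library `statistics.mean`:

```python
from statistics import mean

heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

total = sum(heart_rates)        # sum of all values
n = len(heart_rates)            # number of values
average = mean(heart_rates)     # total / n

print(total, n, average)        # 1298 10 129.8
```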

63
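b) Grouped data:
In calculating the mean from grouped data, we
assume that all values falling into a particular class
are located at the mid-point of the interval. It is
calculated as follows:
$\bar{X} = \dfrac{\sum m_i f_i}{\sum f_i}$
where m_i is the mid-point and f_i the frequency of the i-th class.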

64
• Example: Compute the mean age of 169 subjects from
the grouped data.
Class interval    Mid-point (mi)    Frequency (fi)    mifi
10-19             14.5                4                 58.0
20-29             24.5               66               1617.0
30-39             34.5               47               1621.5
40-49             44.5               36               1602.0
50-59             54.5               12                654.0
60-69             64.5                4                258.0

Total                               169               5810.5

Mean = 5810.5/169 = 34.38 years
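
A short sketch of the grouped-mean calculation, taking the mid-points and frequencies from the table. Dividing 5810.5 by 169 gives about 34.38 years.

```python
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]

# Sum of mid-point * frequency over all classes
total_mf = sum(m * f for m, f in zip(midpoints, frequencies))
n = sum(frequencies)

grouped_mean = total_mf / n
print(total_mf, n, round(grouped_mean, 2))   # 5810.5 169 34.38
```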


65
The mean can be thought of as a “balancing point”,
“center of gravity”

66
When the data are skewed, the mean is “dragged”
in the direction of the skewness.

67
Properties of the Arithmetic Mean

• For any given set of data there is one arithmetic


mean.
• Easy to calculate and understand.
• Influenced by each value
• Greatly affected by the extreme values.

68
Median
a) Ungrouped data
• The median is the value which divides the data
set into two equal parts.
• If the number of values is odd, the median will
be the middle value when all values are arranged
in order of magnitude.

69
• When the number of observations is even, there
is no single middle value but two middle
observations.
• In this case the median is the mean of these two
middle observations, when all observations have
been arranged in the order of their magnitude.

70
71
Example 1
• Calculate the median of these biostatistics exam
results.
65 50 85 46 70 75 60
80 90
Solution:
• First arrange the sample in ascending order:
46 50 60 65 70 75 80 85 90
• Since n = 9, it is an odd number.
• Median position = (n + 1)/2 = (9 + 1)/2 = 5
• So the median is the 5th value, which is 70.
72
Example 2
• Calculate the median of these Pharmacology exam
results.
65 50 85 46 70 75 60 80 90 40
Solution:
• First arrange the sample in ascending order:
40 46 50 60 65 70 75 80 85 90
• Since n = 10, it is an even number.
• Median = the average of the values in positions n/2 and (n + 2)/2,
that is, positions 10/2 = 5 and (10 + 2)/2 = 6
73
Cont…

• So the median is the average of the 5th and 6th values,
which are 65 and 70: (65 + 70)/2 = 67.5
• The median is 67.5
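
Both examples can be checked with the standard-library `statistics.median`, which sorts the values and applies the odd/even rule described above. The variable names are illustrative.

```python
from statistics import median

biostatistics = [65, 50, 85, 46, 70, 75, 60, 80, 90]        # n = 9 (odd)
pharmacology = [65, 50, 85, 46, 70, 75, 60, 80, 90, 40]     # n = 10 (even)

print(sorted(biostatistics))
print(median(biostatistics))    # middle (5th) value: 70

print(sorted(pharmacology))
print(median(pharmacology))     # mean of the 5th and 6th values: 67.5
```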

74
b) Grouped data

• In calculating the median from grouped data, we


assume that the values within a class-interval are
evenly distributed through the interval.
• The first step is to locate the class interval in
which the median is located, using the following
procedure.

75
• Find n/2 and locate the first class interval whose
cumulative frequency is at least n/2.
• Then, use the following formula.

76
$\tilde{x} = L_m + \left( \dfrac{\frac{n}{2} - F_c}{f_m} \right) W$
where,
L_m = lower true class boundary of the interval containing the median
F_c = cumulative frequency of the interval just before the median class
interval (the row just above it in the table)
f_m = frequency of the interval containing the median
W = class interval width
n = total number of observations
77
Example: Compute the median age of 169 subjects from the grouped data.
n/2 = 169/2 = 84.5

Class interval    Mid-point (mi)    Frequency (fi)    Cum. freq
10-19             14.5                4                  4
20-29             24.5               66                 70
30-39             34.5               47                117
40-49             44.5               36                153
50-59             54.5               12                165
60-69             64.5                4                169
Total                               169
78
• n/2 = 84.5 falls in the 3rd class interval (30-39)
• Lower true limit = 29.5, upper true limit = 39.5
• Frequency of the class = 47
• (n/2 – Fc) = 84.5 - 70 = 14.5

• Median = 29.5 + (14.5/47) × 10 = 32.59 ≈ 33
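
The grouped-median procedure can be sketched as a small function. The name `grouped_median` and its argument layout are hypothetical; the function assumes the classes are listed in ascending order with their true lower boundaries and a common width.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median by linear interpolation inside the median class.

    lower_limits: true lower boundary of each class, in ascending order.
    """
    n = sum(frequencies)
    half = n / 2
    cumulative = 0                       # cumulative frequency before the class (Fc)
    for lower, freq in zip(lower_limits, frequencies):
        if cumulative + freq >= half:    # first class whose cum. freq reaches n/2
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("frequencies must not be empty")

lower_limits = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
frequencies = [4, 66, 47, 36, 12, 4]

result = grouped_median(lower_limits, frequencies, width=10)
print(round(result, 2))   # 32.59
```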

79
Properties of the Median
• There is only one median for a given set of
data
• The median is easy to calculate
• Median is a positional average and hence it is
not drastically affected by extreme values
• It is not a good representative of data if the
number of items is small

80
The median is a better description (than the
mean) of the majority when the distribution
is skewed
Example:
• Data are: 14, 89, 93, 95, 96
• Skewness is reflected in the outlying
low value of 14
• The sample mean is 77.4
• The median is 93

81
Mode
• The mode is the most frequently occurring
value in a set of data.
• It is not influenced by extreme values.
• It is possible to have more than one mode or
no mode.
• It is not a good summary of the majority of the
data.

82
Example:
a) Ungrouped data
• Find the modal values for the following data
a) 22, 66, 69, 70, 73 (no mode)
b) 1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5 (mode = 3.0 kg)
c) 9, 2, 10, 9, 5, 10, 8, 4, 12 (mode = 9 and 10)
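
A minimal sketch for finding the mode or modes of ungrouped data. The helper `modes` is hypothetical; it follows the convention used in this example, where a data set in which every value occurs only once is said to have no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                 # every value occurs once: no mode
        return []
    return [value for value, count in counts.items() if count == highest]

print(modes([22, 66, 69, 70, 73]))                                 # []
print(modes([1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5]))   # [3.0]
print(modes([9, 2, 10, 9, 5, 10, 8, 4, 12]))                       # [9, 10]
```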

83
b) Grouped data

• In finding the mode of grouped data, we usually
refer to the modal class, where the modal
class is the class interval with the highest
frequency.

84
85
Properties of mode
• It is not affected by extreme values
• It can be calculated for distributions with open
end classes
• Often its value is not unique
• The main drawback of mode is that often it
does not exist

86
Exercise
• Calculate the mean, median and mode from the
following ungrouped data:
22, 10, 12, 23,25,20, 24,26, 25, 12

87
Exercise
• Calculate the mean, median and mode from the
following ungrouped data:
18 17 17 18 18 19 9 34 18 10 22
19 10 15 14 10 10 16 21 11 1

88
Exercise
• Find the mean, median and mode of the following
grouped data.

Class interval    Mid-point (mi)    Frequency (fi)    Cum. freq

1-5                3                 6                  6
6-10               8                 3                  9
11-15             13                 5                 14
16-20             18                10                 24
21-25             23                11                 35
26-30             28                10                 45
Total                               45
89
Measures of Dispersion

90
Measures of Dispersion
• Dispersion refers to the variability exhibited by the
values of the data.
• The amount of dispersion is small when the values are
close together.
• MCT are not enough to give a clear
understanding about the distribution of the data.
• Moreover, two or more sets may have the same
mean and/or median but they may be quite
different.

91
Consider the following data sets:

Set 1: 60 40 30 50 60 40 70
Set 2: 50 49 49 51 48 50 53
• The two data sets given above have a mean of
50, but obviously set 1 is more “spread out”
than set 2.
• How do we express this numerically?

92
• We need to know something about the
variability or spread of the values — whether
they tend to be clustered close together, or
spread out over a broad range.

93
• Measures of dispersion include:
– Range
– Variance
– Standard deviation
– Others

94
1. Range (R)
• The range is the difference between the largest
and smallest values in the set of observations.
• These values are often called the maximum
and the minimum.

95
Example
Set 1: 60 40 30 50 60 40 70
Set 2: 50 49 49 51 48 50 53

* The range of data in set 1 is 70-30 = 40


* The range of data in set 2 is 53-48 = 5
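
The two ranges can be checked with a one-line helper (the name `value_range` is illustrative):

```python
set_1 = [60, 40, 30, 50, 60, 40, 70]
set_2 = [50, 49, 49, 51, 48, 50, 53]

def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

print(value_range(set_1), value_range(set_2))   # 40 5
```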

96
Properties of range
• It is the simplest crude measure and can be
easily understood
• It takes into account only two values which
causes it to be a poor measure of dispersion
• Very sensitive to extreme values

97
2. Variance (σ², S²)

• Variance is used to measure the dispersion of
values relative to the mean.
• When values are close to their mean (narrow
range) the dispersion is less than when there is
scattering over a wide range.
• Population variance = σ²
• Sample variance = S²

98
A sample variance is calculated for a sample of
individual values (X1, X2, … Xn) and uses the sample
mean ($\bar{X}$) rather than the population mean µ:
$S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

99
3. Standard deviation (σ, S)
• It is the positive square root of the variance.

$\sigma = \sqrt{\sigma^2}$ and $S = \sqrt{S^2}$

100
Properties of Variance
• The main demerit of variance is that its unit is
the square of the unit of the original
measurement values
• The variance gives more weight to the extreme
values as compared to those which are near to
mean value, because the difference is squared in
variance.
• The drawbacks of variance are overcome by the
standard deviation.
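
As an illustration, the sample variance and standard deviation of the two data sets introduced under Measures of Dispersion can be computed with Python's `statistics` module. `statistics.variance` and `statistics.stdev` use the sample formulas with divisor n - 1; the population versions are `pvariance` and `pstdev`.

```python
from statistics import mean, stdev, variance

set_1 = [60, 40, 30, 50, 60, 40, 70]
set_2 = [50, 49, 49, 51, 48, 50, 53]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    s2 = variance(data)     # sample variance, divisor n - 1
    s = stdev(data)         # sample standard deviation = square root of s2
    print(f"{name}: mean = {mean(data)}, S^2 = {s2:.2f}, S = {s:.2f}")
```

Both sets have a mean of 50, but the variance of set 1 is far larger, which expresses numerically that set 1 is more spread out.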

101
Example
• Following are the survival times of n=11
patients after heart transplant surgery.
• Patients are identified numerically, from 1 to
11.
• The survival time for the “ith” patient is
represented as Xi for i= 1, …, 11.
• Calculate the sample variance and SD.

102
103
Properties of SD
• The SD has the advantage of being expressed in
the same units of measurement as the mean.
• SD is considered to be the best measure of
dispersion and is used widely because of the
properties of the theoretical normal curve.
• However, if the units of measurement of the
variables in two data sets are not the same, then
their variability cannot be compared by comparing
the values of SD.

104
Exercise
• Find the mean, median, mode, range, variance
and standard deviation of the following exam
result data and round the results to 2 decimal places:
74, 72, 83, 96, 64, 79, 88, 69

105
Exercise
Find the mean, median, mode, range, variance
and standard deviation of the following exam
result data and round the results to 2 decimal places.
81, 81, 88, 72, 79, 81, 85, 72, 89, 72, 80, 72, 90,
71, 90, 88, 81, 90

106
Exercise
Find the mean, range, variance and standard
deviation of the following data set and round the
results to 2 decimal places:
16 24 18 14 20 36 26 23 16 15 19 20 22 14
19 10 19 27 29 22 38 34 32 23 19 21 31 16
28 19 12 27 15 21 25

107
108
