Summary Biostatistics
Statistics
Techniques or procedures for interpreting, displaying, analyzing, and making decisions based on data.
Statistics deals with data.
Biostatistics: A branch of statistics directed toward applications in health science and biology.
Data Classification
1. Qualitative data (non-numerical data) where mathematical operations are meaningless like colors,
names, nationalities, ranks, gender, …
2. Quantitative data (numerical) where mathematical operations are meaningful like: body mass,
heights, temperatures, lengths, duration of seizure, …
Remark
Classification of Quantitative Data
Quantitative data are either grouped (within intervals) or ungrouped (discrete).
Organizing Data
Frequency table: a method for organizing both qualitative and quantitative data.
A frequency table consists of:
1. Classes
2. Tally marks
3. Frequency, denoted by f.
Example: Put the following data about blood types in a frequency table.
A, B, AB, A, O, A, B, AB, O, A, AB, A, O, B, B, A, A, A, O, A, A.
Solution:
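The frequency table for this example can be rebuilt with a short Python sketch. It only tallies the 21 observations listed above; the printed layout (class, tally marks, f) is an assumption about how the table is meant to look.

```python
from collections import Counter

# Blood types exactly as listed in the example above.
blood_types = ("A, B, AB, A, O, A, B, AB, O, A, AB, A, O, B, B, "
               "A, A, A, O, A, A").split(", ")

# Counter tallies how many times each class (blood type) occurs.
frequency = Counter(blood_types)

# Print one row per class: class name, tally marks, frequency f.
print(f"{'Class':<6}{'Tally':<12}{'f':>3}")
for blood_type in ("A", "B", "AB", "O"):
    f = frequency[blood_type]
    print(f"{blood_type:<6}{'|' * f:<12}{f:>3}")
print(f"{'Total':<18}{sum(frequency.values()):>3}")
```

The frequencies come out as A = 10, B = 4, AB = 3, O = 4, for a total of 21 observations.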
Example: The following table contains the age at death, in days, for 78 cases of sudden infant death
syndrome during 1976 − 1977.
Remark:
4. This length should be rounded up to the accuracy unit of the observations.
5. Cumulative frequency is obtained by adding sequential frequencies together.
6. Midpoint $= \frac{L + U}{2}$
7. Relative frequency $= \frac{\text{frequency of the class interval}}{\text{total frequency}}$
9. Actual class $= \left[L - \frac{1}{2},\ U + \frac{1}{2}\right]$
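The formulas in this remark can be tried out with a minimal Python sketch. The three class intervals below are hypothetical values chosen only for illustration (they are not taken from any table in these notes), and `summarize` is a made-up helper name.

```python
from itertools import accumulate

def summarize(classes):
    """For each class (L, U, f) compute midpoint, relative frequency,
    cumulative frequency, and actual class limits."""
    total = sum(f for _, _, f in classes)
    # Cumulative frequency: add sequential frequencies together.
    cumulative = accumulate(f for _, _, f in classes)
    rows = []
    for (lower, upper, f), cum_f in zip(classes, cumulative):
        rows.append({
            "class": (lower, upper),
            "midpoint": (lower + upper) / 2,        # (L + U) / 2
            "relative": f / total,                  # f / total frequency
            "cumulative": cum_f,
            "actual": (lower - 0.5, upper + 0.5),   # [L - 1/2, U + 1/2]
        })
    return rows

# Hypothetical grouped data (NOT from the source): (L, U, f).
example_classes = [(10, 15, 4), (16, 21, 7), (22, 27, 9)]
rows = summarize(example_classes)
for row in rows:
    print(row)
```

The half-unit shift in the actual class assumes observations recorded to the nearest whole unit, as in the remark.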
Number of classes in a frequency table
The number of class intervals is the closest integer $k$ to $1 + 3.322\log_{10} n$,
where $n$ is the total number of observations.
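As a quick check of this rule, the following Python sketch evaluates it for a given sample size. The function name `number_of_classes` is an illustrative choice, not something defined in these notes.

```python
import math

def number_of_classes(n: int) -> int:
    """Closest integer to 1 + 3.322 * log10(n)."""
    return round(1 + 3.322 * math.log10(n))

# For n = 30 the expression is about 5.91, so k = 6.
print(number_of_classes(30))
```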
Example: The following data are the ages of 30 people, rounded to the nearest year, who were
discharged from a general hospital last Friday:
51 70 79 75 55 25 38 74 54 72 37 15 56 17 77 43 16 15 72 92
30 24 46 47 46 38 81 49 45 25
Solution: n= 30
To put the above data in an ordered array, list the measurements from the smallest to the largest:
15 15 16 17 24 25 25 30 37 38 38 43 45 46 46 47 49 51 54 55 56 70 72 72 74 75 77 79 81 92.
Number of classes: $1 + 3.322\log_{10} 30 \approx 5.91$, so $k = 6$.
Class length $= \frac{92 - 15}{6} \approx 12.83$, which is rounded up to $13$.
Accuracy = 1 unit
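The steps of this solution can be sketched in Python. The original table is not reproduced in these notes, so the sketch makes one labeled assumption: the first class starts at the smallest observation (15). With a different starting point the class limits and frequencies would change.

```python
import math

# Ages of the 30 discharged patients, as given in the example.
ages = [51, 70, 79, 75, 55, 25, 38, 74, 54, 72, 37, 15, 56, 17, 77,
        43, 16, 15, 72, 92, 30, 24, 46, 47, 46, 38, 81, 49, 45, 25]

k = 6          # number of classes, closest integer to 1 + 3.322*log10(30)
accuracy = 1   # observations are recorded to the nearest year

# Class length: (max - min) / k, rounded up to the accuracy unit.
class_length = math.ceil((max(ages) - min(ages)) / k / accuracy) * accuracy

# Assumption: the first class starts at the smallest observation.
start = min(ages)
table = []
for i in range(k):
    lower = start + i * class_length
    upper = lower + class_length - accuracy
    f = sum(lower <= age <= upper for age in ages)
    table.append((lower, upper, f))

for lower, upper, f in table:
    print(f"{lower}-{upper}: {f}")
```

Under that assumption the classes are 15-27, 28-40, 41-53, 54-66, 67-79, and 80-92, and the frequencies add up to 30.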
Example: Consider the following cumulative frequency distribution.
Solution:
a) Class length $= 15 - 10 + 1 = 6$
b) Relative frequency of the second class $= \frac{7}{50} = 0.14$
c) Proportion of observations $= \frac{7 + 25 + 4}{50} = 0.72$
Graphical Representation of Data
A visual display of data and statistical results is often more effective than presenting the data in tabular
form. We will focus on these five types: Bar Chart, Histogram, Pie Chart, Polygon, and Ogive (CFC).
Bar Chart
Rectangles of equal width, separated by equal spaces. The height of each bar
corresponds to the frequency of a particular observation.
Example: In a biostatistics test, 28 students got these grades:
Histogram
A series of rectangular bars with no spaces between them. It divides the entire
range of values into a series of intervals.
Example: The histogram below shows the distribution of heights (in cm) of 30 people. What is the
corresponding frequency table?
Solution:
Polygon
Straight lines connecting a set of points that represent the midpoints of
intervals. It is used with sets of continuous data like heights, weights, …
Descriptive Statistical Measures
1. Measures of central tendency
2. Measures of variation (dispersion)
Measures of central tendency describe where the data are located. The most common measures of
central tendency are: Arithmetic mean, median, mode.
Arithmetic Mean
Arithmetic mean:
Let $x_1, x_2, \ldots, x_n$ be a sample of size $n$. The sample mean (average) is:
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Examples:
Answer: $\bar{X} = \frac{9 + 3 + 7 + 3 + 8 + 10 + 2}{7} = 6$
2. If the marks of five students (out of 100) are: 96, 94, 72, 52, 56, find the average of their marks.
Answer: $\bar{X} = \frac{96 + 94 + 72 + 52 + 56}{5} = 74$
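Both answers can be verified with a few lines of Python. This is a minimal sketch of the sample mean formula; `sample_mean` is an illustrative name.

```python
def sample_mean(values):
    """Sum of the observations divided by the sample size n."""
    return sum(values) / len(values)

print(sample_mean([9, 3, 7, 3, 8, 10, 2]))   # 6.0
print(sample_mean([96, 94, 72, 52, 56]))     # 74.0
```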
Example: Use the frequency table of Apgar scores of 100 low birthweight infants to evaluate the average
Apgar score.
Solution:
$\bar{X} = \frac{(0)(6) + (1)(1) + (2)(3) + (3)(4) + (4)(5) + (5)(10) + (6)(11) + (7)(23) + (8)(24) + (9)(13)}{100} = 6.25$
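The same computation can be written as a weighted sum in Python. The score and frequency pairs below are read off the terms of the solution above; the dictionary layout is just one convenient way to hold a frequency table.

```python
# Apgar score -> frequency, taken from the terms of the solution above.
apgar_frequency = {0: 6, 1: 1, 2: 3, 3: 4, 4: 5,
                   5: 10, 6: 11, 7: 23, 8: 24, 9: 13}

# Total number of infants is the sum of the frequencies.
n = sum(apgar_frequency.values())

# Mean from a frequency table: sum of (value * frequency) divided by n.
mean_score = sum(score * f for score, f in apgar_frequency.items()) / n

print(n, mean_score)   # 100 6.25
```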
Properties of the Mean
1. Uniqueness: for a given set of data there is one and only one mean.
2. Simplicity: the mean of any sample is easy to compute.
3. The value of each data item influences the mean, so the mean is affected by extreme
values. In some cases this makes the mean a poor representative of where the majority of
the data lie.
Median
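Let $x_1, x_2, \ldots, x_n$ be a sample of size $n$, and $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the sorted sample. The sample median is:
$\text{Median} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases}$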
Solution:
a. 7, 16, 17, 20, 29, 38, 56 ⟹ median = 20
b. 17, 20, 32, 56 ⟹ median $= \frac{20 + 32}{2} = 26$
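The two cases of the median definition translate directly into code. The sketch below is one possible Python rendering; `sample_median` is an illustrative name, and the index arithmetic is shifted by one because Python lists are 0-based.

```python
def sample_median(values):
    """Median of a sample, following the odd/even definition."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th sorted value, 0-based index n // 2.
        return ordered[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th sorted values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(sample_median([7, 16, 17, 20, 29, 38, 56]))   # 20
print(sample_median([17, 20, 32, 56]))              # 26.0
```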
Example: Find the median for data in the following frequency table:
Solution:
The total frequency is 13 ⟹ $\frac{13}{2} = 6.5 \approx 7$.
So, the median is $x_{(7)} = 15$
Properties of Median
1. Uniqueness.
2. Simplicity.
3. Unlike the mean, it is not drastically affected by extreme values.
Mode
Remark
Solution:
a. 8
b. Mode does not exist.
Measures of Variation (dispersion)
Dispersion statistics summarize the scatter or spread of
data. Most of these functions describe deviation from a
particular location.
Range
Example: Find the range for the following list of values: 13, 18, 13, 14, 16, 14, 21, 13
Solution:
Range = maximum observation − minimum observation = 21 − 13 = 8
Example: If Jordan's hottest temperature was 39.2° in 2018, and the range in temperature is 40.7°,
what was the coldest temperature?
Solution:
Range = maximum observation − minimum observation
40.7° = 39.2° − min
min = 39.2° − 40.7° = −1.5°
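Both range examples can be checked with a short Python sketch. The function name `data_range` and the variable names are illustrative; the result of the subtraction is rounded to one decimal place only to hide floating-point noise.

```python
def data_range(values):
    """Range = maximum observation - minimum observation."""
    return max(values) - min(values)

print(data_range([13, 18, 13, 14, 16, 14, 21, 13]))   # 8

# Second example: solve Range = max - min for the minimum.
hottest = 39.2
temperature_range = 40.7
coldest = round(hottest - temperature_range, 1)
print(coldest)   # -1.5
```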