Biostatistics 1
Use of Biostatistics:
To design clinical trials to test new drugs or treatments.
To analyze data from surveys to track the spread of diseases.
To develop statistical models to predict the risk of developing certain
diseases.
To work with public health officials to develop programs to improve the
health of populations.
Data
Data are measurements or observations that are collected as a source of
information.
There are two main types of Data:
Qualitative Data and Quantitative Data
Qualitative or Categorical:
Qualitative data are generally described by words or letters. They are not as
widely used as quantitative data because many numerical techniques do not
apply to the qualitative data. For example, it does not make sense to find an
average hair color or blood type.
There are two subgroups of qualitative data:
Nominal :-
A nominal variable is a categorical variable whose categories are only names.
Nominal variables have two or more categories without any kind of natural
order. They are variables with no numeric value, such as occupation or
political party affiliation. Another way of thinking about nominal variables is
that they are named (nominal is from the Latin nominalis, meaning pertaining
to names).
Ordinal :-
The ordinal scale classifies according to rank, that is, the categories have a
natural order. Examples include “first,” “second” and “ninety-ninth.”
Quantitative or Numerical
Quantitative data are always numbers and are the result of counting or
measuring attributes of a population.
Quantitative data can be separated into two subgroups:
discrete, if it is the result of counting (the number of students of a given
ethnic group in a class, the number of books on a shelf, ...)
continuous, if it is the result of measuring (distance traveled, weight of
luggage, ...)
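The four data types can be illustrated with a minimal Python sketch. All sample values and variable names below are hypothetical and chosen only for illustration. It shows which summaries are meaningful for each type: a most frequent value for nominal data, an ordering for ordinal data, and arithmetic for quantitative data.

```python
from statistics import mean, mode

# Hypothetical sample values, one list per data type.
blood_types = ["A", "O", "B", "O", "AB", "O"]          # nominal: names only
pain_levels = ["mild", "severe", "moderate", "mild"]   # ordinal: ranked names
books_on_shelf = [12, 7, 9]                            # discrete: counted
luggage_kg = [18.4, 22.1, 9.75]                        # continuous: measured

# Nominal data: an average is meaningless, but the most frequent value is not.
most_common_blood_type = mode(blood_types)

# Ordinal data: values can be sorted once the rank order is stated explicitly.
PAIN_ORDER = ["mild", "moderate", "severe"]
sorted_pain = sorted(pain_levels, key=PAIN_ORDER.index)

# Quantitative data: arithmetic such as sums and means is meaningful.
total_books = sum(books_on_shelf)
mean_luggage = mean(luggage_kg)

print(most_common_blood_type, sorted_pain, total_books, mean_luggage)
```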
Data type
Data
  Qualitative or Categorical: Nominal, Ordinal
  Quantitative or Numerical: Discrete, Continuous
Example:
Suppose you want to buy a shirt from Amazon. The listing shows:
Color: Red, blue, white
Pattern: plain, lines, checks
Size: M, L, XL
Rating: 5 star, 4 star
Price: 499, 999
Discount: 25%, 50%
Data
  Qualitative or Categorical
    Nominal: Color, Pattern
    Ordinal: Rating, Size
  Quantitative or Numerical
    Discrete: Quantity
    Continuous: Price, Discount
Example:
Qualitative                                Quantitative
Profession (Doctor, Chemist)               Number of Patients
Hypertension Stage (I, II, III)            SpO2
Pain Intensity (Mild, moderate, severe)
Mathematical Statistics:
Measure of central tendency /Measures of Center:
Measures of central tendency are statistical measures used to describe the center
or average value of a dataset.
A measure of center is a value at the center or middle of a data set.
There are several different ways to determine the center, so we have different
definitions of measures of center, including the mean, median, and mode. We
begin with the mean.
Median:
The median of a data set is the measure of center that is the middle value
when the original data values are arranged in order of increasing (or
decreasing) magnitude.
To find the median, first sort the values (arrange them in order), then
follow one of these two procedures:
1. If the number of data values is odd, the median is the number located in
the exact middle of the list.
2. If the number of data values is even, the median is found by computing
the mean of the two middle numbers
Example 1:
Find the median for this sample of data values,
25, 74, 36, 46, 17, 57, 62.
First sort the data values, as shown below:
17, 25, 36, 46, 57, 62, 74
Because the number of data values is an odd number (7), the median is the
number located in the exact middle of the sorted list, that is 46
Example 2:
Find the median for this sample of data values,
25, 74, 36, 46, 17, 57, 62, 32
First sort the data values, as shown below:
17, 25, 32, 36, 46, 57, 62, 74
Because the number of data values is an even number (8),
the median is found by computing the mean of the two middle numbers,
which are 36 and 46,
So the median is:
Median = (36 + 46) / 2 = 41
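The two-case procedure above can be sketched in Python. The function name median_by_sorting is a hypothetical helper introduced here for illustration; the standard library's statistics.median gives the same results.

```python
from statistics import median

def median_by_sorting(values):
    """Return the median using the sort-then-pick-the-middle procedure."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

example_1 = [25, 74, 36, 46, 17, 57, 62]
example_2 = [25, 74, 36, 46, 17, 57, 62, 32]

median_1 = median_by_sorting(example_1)   # 46
median_2 = median_by_sorting(example_2)   # 41.0

# Cross-check against the standard library.
assert median_1 == median(example_1)
assert median_2 == median(example_2)
print(median_1, median_2)
```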
Mode:
The mode of a data set is the value that occurs with the greatest
frequency.
A data set can have one mode, more than one mode, or no mode.
When two data values occur with the same greatest frequency, each one is
a mode and the data set is bimodal.
When more than two data values occur with the same greatest frequency,
each is a mode and the data set is said to be multimodal.
When no data value is repeated, we say that there is no mode.
Example:
Find the mode of the following data:
15, 24, 36, 41, 25, 24, 37, 36, 24, 68, 72
The mode is 24, because it is the data value with the greatest frequency.
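The definitions above (one mode, bimodal, multimodal, no mode) can be checked with a short Python sketch. The helper name modes and the two extra test lists are assumptions made for illustration; only the first data set comes from the example.

```python
from collections import Counter

def modes(values):
    """Return all modes in ascending order, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # No data value is repeated, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)

data = [15, 24, 36, 41, 25, 24, 37, 36, 24, 68, 72]
data_modes = modes(data)              # [24]: 24 occurs three times

# Hypothetical data sets illustrating the other cases.
bimodal_modes = modes([1, 2, 2, 3, 3])   # [2, 3]
no_modes = modes([1, 2, 3])              # []
print(data_modes, bimodal_modes, no_modes)
```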
Marks Frequency
1-10 5
11-20 8
21-30 13
31-40 23
41-50 11
Example: The following table shows the number of students and the time
they utilized daily for their studies. Find the mean time spent by students
for their studies.
Time (in Hrs.) 0-2 2-4 4-6 6-8 8-10
Students 7 20 12 8 3
Answer:
Time (in Hrs.)   Students (f)   Midpoint (x)   f * x
0-2              7              1              7
2-4              20             3              60
4-6              12             5              60
6-8              8              7              56
8-10             3              9              27
Total            50                            210
Mean = ∑(f * x) / ∑f = 210 / 50 = 4.2
Thus mean time spent by students is 4.2 hours
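The same calculation can be sketched in Python, taking the midpoint of each class as the representative value x, as the answer table does. The variable names are illustrative.

```python
# Class intervals (hours) and the number of students in each class.
classes = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
frequencies = [7, 20, 12, 8, 3]

# Each class is represented by its midpoint x.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

total_f = sum(frequencies)                                        # 50
total_fx = sum(f * x for f, x in zip(frequencies, midpoints))     # 210.0

grouped_mean = total_fx / total_f
print(grouped_mean)
```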
Median for Grouped data:
In previous example:
N = 50 and N/2 = 25, so the 25th observation is approximately the median.
First we write the cumulative frequency (cf):
Time (in Hrs.)   Frequency (f)   Cumulative frequency (cf)
0-2              7               7
2-4              20              27
4-6              12              39
6-8              8               47
8-10             3               50
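Here, the median class is 2-4, because its cumulative frequency (27) is the
first to reach N/2 = 25.
L = 2 (lower limit of the median class), N = 50, h = 2 (class width),
f = 20 (frequency of the median class), cf = 7 (cumulative frequency of the
class before the median class).
Median = L + [(N/2 - cf) / f] * h = 2 + [(25 - 7) / 20] * 2
Median = 2 + (18 / 20) * 2 = 2 + 1.8 = 3.8
Thus the median is 3.8 hours.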
Now, to find the modal class:
here, the maximum frequency is 20.
The modal class is 2-4.
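As a check on the grouped-data steps, here is a minimal Python sketch that evaluates Median = L + [(N/2 - cf) / f] * h and picks the class with the highest frequency for the study-time table. The function names grouped_median and modal_class are hypothetical helpers, not part of any library. With L = 2, N = 50, h = 2, f = 20 and cf = 7, the formula evaluates to about 3.8, and the class with the highest frequency (20) is 2-4.

```python
def grouped_median(classes, frequencies):
    """Median for grouped data: L + ((N/2 - cf) / f) * h."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, frequencies):
        if cf + f >= half:
            # First class whose cumulative frequency reaches N/2.
            h = upper - lower
            return lower + (half - cf) / f * h
        cf += f

def modal_class(classes, frequencies):
    """Class interval with the highest frequency (first one on ties)."""
    best = max(range(len(frequencies)), key=lambda i: frequencies[i])
    return classes[best]

classes = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
frequencies = [7, 20, 12, 8, 3]

median_hours = grouped_median(classes, frequencies)
mode_interval = modal_class(classes, frequencies)
print(round(median_hours, 2), mode_interval)
```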
Range:
Range is the simplest measure of dispersion. It is defined as the difference
between the largest value and the smallest value in the data.
Thus,
Range = Largest Value – Smallest Value
Range = L – S
Where, L = Largest Value and S = Smallest Value.
Example: Following data gives weights of 10 students (in kgs) in a certain
school. Find the range of the data.
70, 62, 38, 55, 43, 73, 36, 58, 65, 47
Solution:
Smallest Value = S = 36
Largest Value = L = 73
Range = L – S = 73 – 36 = 37
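The range calculation is a one-liner in Python; the sketch below uses the weights from the example, with illustrative variable names.

```python
# Weights of 10 students, in kg.
weights_kg = [70, 62, 38, 55, 43, 73, 36, 58, 65, 47]

largest = max(weights_kg)      # L = 73
smallest = min(weights_kg)     # S = 36
data_range = largest - smallest

print(data_range)  # 37
```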
Variance:
The variance of a variable X is defined as the arithmetic mean of the
squares of all deviations of X taken from its arithmetic mean.
In other words, the variance measures how far each number in a data set is
from the mean, and thus how spread out the numbers are from one another.
It is denoted by Var(X) or σ².
Var(X) = σ² = (1/n) ∑_{i=1}^{n} (X_i - X̄)²
Standard deviation:
Standard Deviation is defined as the positive square root of the variance.
It is denoted by σ (sigma).
Example: Compute variance and standard deviation of the following data.
9, 12, 15, 18, 21, 24, 27.
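One way to work this example is sketched below in Python, using the definition given above, which divides by n (the population form). Python's statistics.pvariance uses the same divisor, whereas statistics.variance divides by n - 1 and would give a different answer.

```python
from math import sqrt
from statistics import pvariance

data = [9, 12, 15, 18, 21, 24, 27]
n = len(data)

mean = sum(data) / n                                   # 18.0
squared_deviations = [(x - mean) ** 2 for x in data]   # 81, 36, 9, 0, 9, 36, 81
variance = sum(squared_deviations) / n                 # 252 / 7 = 36.0
std_dev = sqrt(variance)                               # 6.0

# Cross-check against the standard library's population variance.
assert variance == pvariance(data)
print(mean, variance, std_dev)
```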
Example: A die is rolled 30 times and the following distribution is obtained.
Find the variance and standard deviation (S.D.).
Score 1 2 3 4 5 6
Frequency 2 6 2 5 10 5
Solution:
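A sketch of one way to compute this in Python follows. It assumes the frequency-weighted form of the variance definition, Var(X) = (1/N) ∑ f * (x - mean)² with N = ∑ f, which is not written out in these notes but follows from counting each score f times.

```python
from math import sqrt

scores = [1, 2, 3, 4, 5, 6]
frequencies = [2, 6, 2, 5, 10, 5]

n = sum(frequencies)                                          # 30 rolls
mean = sum(f * x for f, x in zip(frequencies, scores)) / n    # 120 / 30 = 4.0

# Assumed frequency-weighted variance: each squared deviation counted f times.
weighted_sq_dev = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, scores))
variance = weighted_sq_dev / n                                # 74 / 30
std_dev = sqrt(variance)

print(round(variance, 4), round(std_dev, 4))
```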