
Biostatistics 1

PSM Biostatistics

Uploaded by Ayaan Baig
Copyright © All Rights Reserved

Statistics

Statistics is the science of collecting, analyzing, interpreting, and presenting data. It is a vast and complex field, with applications in many different areas, including business, government, healthcare, and education.
Biostatistics
➢ Biostatistics is the use of statistics to collect, analyze, and interpret data about living things.
➢ Biostatisticians use statistical methods to help scientists and doctors understand how diseases work, develop new treatments, and improve public health.

Uses of Biostatistics:
➢ To design clinical trials to test new drugs or treatments.
➢ To analyze data from surveys to track the spread of diseases.
➢ To develop statistical models to predict the risk of developing certain diseases.
➢ To work with public health officials to develop programs to improve the health of populations.

Data
➢ Data are measurements or observations that are collected as a source of information.
There are two main types of data:
➢ Qualitative data and quantitative data
Qualitative or Categorical:
Qualitative data are generally described by words or letters. They are less amenable to numerical analysis than quantitative data, because many numerical techniques do not apply to qualitative data. For example, it does not make sense to find an average hair color or blood type.
There are two subgroups of qualitative data: nominal and ordinal.

Nominal:
➢ A nominal variable is a categorical variable whose two or more categories have no natural order. Nominal variables have no numeric value; examples are occupation and political party affiliation. Another way of thinking about nominal variables is that they are named (nominal is from Latin nominalis, meaning pertaining to names).
Ordinal:
➢ The ordinal scale classifies observations according to rank, meaning the categories have a natural order. Examples include “first,” “second,” and “ninety-ninth.”
Quantitative or Numerical:
Quantitative data are always numbers and are the result of counting or measuring attributes of a population.
Quantitative data can be separated into two subgroups:
➢ Discrete: the result of counting (the number of students of a given ethnic group in a class, the number of books on a shelf, ...)
➢ Continuous: the result of measuring (distance traveled, weight of luggage, ...)

Data types:
• Qualitative or Categorical data: Nominal, Ordinal
• Quantitative or Numerical data: Discrete, Continuous

Example:
Suppose you want to buy a shirt from Amazon:
Color: Red, blue, white
Pattern: plain, line, checks
Size: M, L, XL
Rating: 5 star, 4 star
Price: 499, 999
Discount: 25%, 50%
Classifying these variables:
• Qualitative or Categorical, Nominal: Color, Pattern
• Qualitative or Categorical, Ordinal: Rating, Size
• Quantitative or Numerical, Discrete: Quantity
• Quantitative or Numerical, Continuous: Price, Discount

Example:
• Qualitative: Profession (Doctor, Chemist); Hypertension Stage (I, II, III); Pain Intensity (Mild, moderate, severe)
• Quantitative: Number of Patients; SpO2
Mathematical Statistics:
➢ Measures of central tendency / Measures of center:
Measures of central tendency are statistical measures used to describe the center or average value of a dataset. A measure of center is a value at the center or middle of a data set.
There are several different ways to determine the center, so we have different definitions of measures of center, including the mean, median, and mode. We begin with the mean.

The arithmetic mean:
The arithmetic mean, or the mean, of a set of data is the measure of center found by adding the data values and dividing the total by the number of data values:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
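The definition above translates directly into code. This is a minimal Python sketch; the function name `arithmetic_mean` is hypothetical, and the sample data are the values used in the standard deviation example later in these notes. Python's standard `statistics.mean` is shown alongside for comparison.

```python
from statistics import mean


def arithmetic_mean(values):
    """Add the data values and divide the total by the number of data values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [9, 12, 15, 18, 21, 24, 27]
print(arithmetic_mean(data))  # 18.0
# statistics.mean applies the same definition
print(mean(data))
```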

Median:
The median of a data set is the measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.
➢ To find the median, first sort the values (arrange them in order), then follow one of these two procedures:
1. If the number of data values is odd, the median is the number located in the exact middle of the list.
2. If the number of data values is even, the median is found by computing the mean of the two middle numbers.
Example 1:
Find the median for this sample of data values: 25, 74, 36, 46, 17, 57, 62.
➢ First sort the data values: 17, 25, 36, 46, 57, 62, 74
➢ Because the number of data values is odd (7), the median is the number located in the exact middle of the sorted list, that is, 46.
Example 2:
Find the median for this sample of data values: 25, 74, 36, 46, 17, 57, 62, 32.
➢ First sort the data values: 17, 25, 32, 36, 46, 57, 62, 74
➢ Because the number of data values is even (8), the median is found by computing the mean of the two middle numbers, 36 and 46:
$\text{Median} = \frac{36 + 46}{2} = 41$
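The two-case procedure for the median can be sketched in Python as follows. The function name `median` is our own illustrative choice; the two samples are Examples 1 and 2 above, and `statistics.median` from the standard library is printed for comparison since it uses the same odd/even rule.

```python
import statistics


def median(values):
    """Median of ungrouped data: sort, then take the middle value(s)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)        # arrange the values in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # odd count: the exact middle value
        return ordered[mid]
    # even count: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


sample1 = [25, 74, 36, 46, 17, 57, 62]
sample2 = [25, 74, 36, 46, 17, 57, 62, 32]
print(median(sample1))  # 46
print(median(sample2))  # 41.0
print(statistics.median(sample1), statistics.median(sample2))
```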
Mode:
The mode of a data set is the value that occurs with the greatest frequency.
➢ A data set can have one mode, more than one mode, or no mode.
➢ When two data values occur with the same greatest frequency, each one is a mode and the data set is bimodal.
➢ When more than two data values occur with the same greatest frequency, each is a mode and the data set is said to be multimodal.
➢ When no data value is repeated, we say that there is no mode.
Example:
Find the mode of the following data: 15, 24, 36, 41, 25, 24, 37, 36, 24, 68, 72
➢ The mode is 24, because it is the data value with the greatest frequency (it occurs three times).
➢ Two modes: The values 0, 0, 0, 1, 1, 2, 3, 5, 5, 5 have two modes: 0 and 5.
➢ No mode: The values 0, 1, 2, 3, 5 have no mode because no value occurs more than once.
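The three cases above (one mode, several modes, no mode) can be handled by counting frequencies. This is a minimal sketch; the function name `modes` is hypothetical, and it returns an empty list to mean "no mode". Note that the standard library's `statistics.multimode` differs on the last case: when no value repeats it returns every value rather than an empty list.

```python
from collections import Counter


def modes(values):
    """Return all modes in sorted order; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                    # no value is repeated, so there is no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([15, 24, 36, 41, 25, 24, 37, 36, 24, 68, 72]))  # [24]
print(modes([0, 0, 0, 1, 1, 2, 3, 5, 5, 5]))                # [0, 5]
print(modes([0, 1, 2, 3, 5]))                               # []
```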
Frequency Distribution:
➢ A frequency distribution lists data values (either individually or by groups of intervals), along with their corresponding frequencies (or counts).
➢ Example: Frequency distribution of marks. Here the frequency for a particular class is the number of original values that fall into that class.

Marks Frequency
1-10 5
11-20 8
21-30 13
31-40 23
41-50 11

For example, 8 students scored marks between 11 and 20.


➢ Lower class limits are the smallest numbers that can belong to the different classes (the example has lower class limits of 1, 11, 21, 31, 41).
➢ Upper class limits are the largest numbers that can belong to the different classes (the example has upper class limits of 10, 20, 30, 40, 50).
➢ Class boundaries are the numbers used to separate classes, but without the gaps created by class limits. They are obtained as follows: find the size of the gap between the upper class limit of one class and the lower class limit of the next class. Add half of that amount to each upper class limit to find the upper class boundaries; subtract half of that amount from each lower class limit to find the lower class boundaries. In the example the gap is 11 - 10 = 1, so the class boundaries are 0.5, 10.5, 20.5, 30.5, 40.5, 50.5.
➢ Class midpoints are the midpoints of the classes, found by adding the lower and upper class limits and dividing by 2 (in the example: 5.5, 15.5, 25.5, 35.5, 45.5).
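The boundary and midpoint rules can be checked on the marks table with a short Python sketch. The function names `class_boundaries` and `class_midpoints` are hypothetical, and the sketch assumes, as in the marks example, that the gap between consecutive classes is the same throughout the table.

```python
def class_boundaries(limits):
    """limits: (lower_limit, upper_limit) pairs in increasing order.
    Assumes a constant gap between the upper limit of one class and
    the lower limit of the next."""
    if len(limits) < 2:
        raise ValueError("need at least two classes to measure the gap")
    gap = limits[1][0] - limits[0][1]   # e.g. 11 - 10 = 1
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in limits]


def class_midpoints(limits):
    """Midpoint = (lower class limit + upper class limit) / 2."""
    return [(lo + hi) / 2 for lo, hi in limits]


marks = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
print(class_boundaries(marks))
# [(0.5, 10.5), (10.5, 20.5), (20.5, 30.5), (30.5, 40.5), (40.5, 50.5)]
print(class_midpoints(marks))
# [5.5, 15.5, 25.5, 35.5, 45.5]
```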
Mean from Grouped data/Frequency Distribution:
➢ Example: The following table shows the number of students and the time they spent daily on their studies. Find the mean time spent by students on their studies.

Time (in hrs)  0-2  2-4  4-6  6-8  8-10
Students        7   20   12    8     3

Answer:
Each class is represented by its midpoint x.

Time   Frequency f   Midpoint x   f*x
0-2         7             1          7
2-4        20             3         60
4-6        12             5         60
6-8         8             7         56
8-10        3             9         27
Total      50                       210

$\text{Mean} = \frac{\sum f x}{\sum f} = \frac{210}{50} = 4.2$
Thus the mean time spent by students is 4.2 hours.
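The table-based calculation of the grouped mean can be reproduced in a few lines of Python. This is a sketch under the same assumption as the worked answer, namely that each class is represented by its midpoint; the function name `grouped_mean` and the (lower, upper, frequency) tuple layout are our own choices.

```python
def grouped_mean(classes):
    """classes: (lower, upper, frequency) tuples.
    Mean = sum(f * x) / sum(f), with x the class midpoint."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / total_f


study_time = [(0, 2, 7), (2, 4, 20), (4, 6, 12), (6, 8, 8), (8, 10, 3)]
print(grouped_mean(study_time))  # 4.2
```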
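Median for Grouped data:
➢ Using the previous example: N = 50 and N/2 = 25, so the 25th observation gives the approximate median.
First we write the cumulative frequency (cf):

Time   Frequency f   cf
0-2         7          7
2-4        20         27
4-6        12         39
6-8         8         47
8-10        3         50

➢ Here the median class is 2-4, because the 25th observation falls in that class (cf rises from 7 to 27).
L = 2 (lower boundary of the median class), N = 50, h = 2 (class width), f = 20 (frequency of the median class), cf = 7 (cumulative frequency of the class preceding the median class).
$\text{Median} = L + \left[\frac{N/2 - cf}{f}\right] \times h = 2 + \left[\frac{25 - 7}{20}\right] \times 2$
$\text{Median} = 2 + 1.8 = 3.8$ hours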
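The grouped-median steps (build cumulative frequencies, locate the class containing the N/2-th observation, then interpolate) can be sketched in Python. The function name `grouped_median` is hypothetical, and the sketch assumes classes are given in increasing order as (lower boundary, upper boundary, frequency). It is applied to the study-time table and to the grouped data in the next example.

```python
def grouped_median(classes):
    """classes: (lower_boundary, upper_boundary, frequency) in increasing order.
    Median = L + ((N/2 - cf) / f) * h for the median class."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("empty frequency table")
    half = n / 2
    cf = 0                              # cumulative frequency before this class
    for lower, upper, f in classes:
        if cf + f >= half:              # first class whose cf reaches N/2
            h = upper - lower
            return lower + (half - cf) / f * h
        cf += f


study_time = [(0, 2, 7), (2, 4, 20), (4, 6, 12), (6, 8, 8), (8, 10, 3)]
table = [(2, 4, 3), (4, 6, 4), (6, 8, 2), (8, 10, 1)]
print(round(grouped_median(study_time), 4))  # 3.8
print(round(grouped_median(table), 4))       # 5.0
```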


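Mode:
➢ We know that the score repeated the maximum number of times in a data set is called the mode of the data. For grouped data, the class with the highest frequency is the modal class, and the mode is estimated within that class.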
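Example: Calculate the mean, median, and mode from the following grouped data.
Class   Frequency
2-4        3
4-6        4
6-8        2
8-10       1
Solution:
Mean: the class midpoints are 3, 5, 7, 9, so
$\text{Mean} = \frac{\sum f x}{\sum f} = \frac{3 \times 3 + 4 \times 5 + 2 \times 7 + 1 \times 9}{10} = \frac{52}{10} = 5.2$
Median: N = 10, so N/2 = 5. The cumulative frequencies are 3, 7, 9, 10, so the 5th observation falls in the class 4-6, which is the median class.
L = lower boundary of the median class = 4
N = total frequency = 10
cf = cumulative frequency of the class preceding the median class = 3
f = frequency of the median class = 4
h = class width of the median class = 2
$\text{Median} = L + \left[\frac{N/2 - cf}{f}\right] \times h = 4 + \left[\frac{5 - 3}{4}\right] \times 2$
$\text{Median} = 4 + \frac{2}{4} \times 2 = 4 + 1 = 5$
Thus the median is 5.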
Now, to find the modal class: here the maximum frequency is 4, so the modal class is 4-6.
L = lower boundary of the modal class = 4
f1 = frequency of the modal class = 4
f0 = frequency of the preceding class = 3
f2 = frequency of the succeeding class = 2
h = class width of the modal class = 2
$\text{Mode} = L + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h = 4 + \left(\frac{4 - 3}{2 \times 4 - 3 - 2}\right) \times 2$
$\text{Mode} = 4 + \left(\frac{1}{3}\right) \times 2 \approx 4.6667$
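The modal-class formula can be sketched in Python as below. The function name `grouped_mode` is hypothetical. Two assumptions are ours rather than the notes': if several classes tie for the highest frequency the first one is taken, and a missing neighbouring class (when the modal class is the first or last) is treated as having frequency 0.

```python
def grouped_mode(classes):
    """classes: (lower_boundary, upper_boundary, frequency) in increasing order.
    Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h for the modal class."""
    if not classes:
        raise ValueError("empty frequency table")
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                    # modal class (first if tied)
    lower, upper, f1 = classes[k]
    f0 = freqs[k - 1] if k > 0 else 0              # assumption: 0 if no preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0 # assumption: 0 if no succeeding class
    denom = 2 * f1 - f0 - f2
    if denom == 0:
        raise ValueError("formula undefined: 2*f1 - f0 - f2 is zero")
    return lower + (f1 - f0) / denom * (upper - lower)


table = [(2, 4, 3), (4, 6, 4), (6, 8, 2), (8, 10, 1)]
print(round(grouped_mode(table), 4))  # 4.6667
```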
Measure of Dispersion:
In the earlier classes we learned about the measures of central tendency, namely mean, median, and mode. Such an average tells us only about the central part of the data; it gives no information about the spread of the data.
➢ Dispersion:
The degree to which numerical data tend to spread about an average value is called the variation or dispersion of the data.
➢ Measures of Dispersion:
The following measures of dispersion are commonly used:
• Range
• Variance
• Standard deviation

Range:
➢ Range is the simplest measure of dispersion. It is defined as the difference between the largest value and the smallest value in the data. Thus,
Range = Largest Value – Smallest Value
Range = L – S
where L = Largest Value and S = Smallest Value.
➢ Example: The following data give the weights (in kg) of 10 students in a certain school. Find the range of the data.
70, 62, 38, 55, 43, 73, 36, 58, 65, 47
Solution: Smallest Value = S = 36, Largest Value = L = 73
Range = L – S = 73 – 36 = 37 kg
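The range is a one-line computation; this minimal Python sketch (the name `data_range` is hypothetical, chosen to avoid Python's built-in `range`) reproduces the weights example.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


weights = [70, 62, 38, 55, 43, 73, 36, 58, 65, 47]
print(data_range(weights))  # 37
```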
Variance:
➢ The variance of a variable X is defined as the arithmetic mean of the squares of all deviations of X from its arithmetic mean.
In other words, it measures how far the numbers in a data set are spread out from their mean.
It is denoted by $Var(X)$ or $\sigma^2$.
$Var(X) = \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$

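And for grouped data:
$Var(X) = \sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (X_i - \bar{X})^2 = \frac{1}{N}\sum_{i=1}^{n} f_i X_i^2 - \bar{X}^2$
where $X_i$ is the value (or class midpoint) of the i-th class, $f_i$ is its frequency, and $N = \sum_{i=1}^{n} f_i$ is the total frequency.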

Standard deviation:
➢ Standard deviation is defined as the positive square root of the variance: $\sigma = \sqrt{Var(X)}$.
It is denoted by σ (sigma).
Example: Compute the variance and standard deviation of the following data:
9, 12, 15, 18, 21, 24, 27.
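One way to work this example is sketched in Python below, following the variance formula above (dividing by n) and taking the positive square root. The function names `variance` and `std_dev` are our own. For comparison, the standard library's `statistics.pvariance` and `statistics.pstdev` also divide by n, whereas `statistics.variance` and `statistics.stdev` divide by n - 1 (the sample versions), so they would give different numbers here.

```python
import math
import statistics


def variance(values):
    """Var(X) = (1/n) * sum((x - mean)^2), dividing by n as in the formula above."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of an empty data set is undefined")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n


def std_dev(values):
    """Standard deviation = positive square root of the variance."""
    return math.sqrt(variance(values))


data = [9, 12, 15, 18, 21, 24, 27]
print(variance(data))  # 36.0
print(std_dev(data))   # 6.0
print(statistics.pvariance(data), statistics.pstdev(data))
```

Here the mean is 18, the squared deviations sum to 252, and 252 / 7 = 36, so σ = 6.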
➢ Example: A die is rolled 30 times and the following distribution is obtained. Find the variance and S.D.

Score      1  2  3  4  5  6
Frequency  2  6  2  5 10  5

Solution:
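A minimal Python sketch of one way to compute this, using both forms of the grouped-data variance formula given above so that they can be checked against each other. The scores are taken as the $X_i$ values and the frequencies as $f_i$; the function name `grouped_variance` is hypothetical.

```python
import math


def grouped_variance(scores, freqs):
    """Return (mean, variance by deviations, variance by shortcut), where
    variance = (1/N) * sum(f * (x - mean)^2) = (1/N) * sum(f * x^2) - mean^2."""
    n = sum(freqs)                                           # N = total frequency
    xbar = sum(f * x for x, f in zip(scores, freqs)) / n
    by_deviation = sum(f * (x - xbar) ** 2 for x, f in zip(scores, freqs)) / n
    by_shortcut = sum(f * x * x for x, f in zip(scores, freqs)) / n - xbar ** 2
    return xbar, by_deviation, by_shortcut


scores = [1, 2, 3, 4, 5, 6]
freqs = [2, 6, 2, 5, 10, 5]
xbar, var_dev, var_short = grouped_variance(scores, freqs)
sd = math.sqrt(var_dev)
print(xbar)                                    # 4.0
print(round(var_dev, 4), round(var_short, 4))  # 2.4667 2.4667
print(round(sd, 4))                            # 1.5706
```

With N = 30, ∑fx = 120 gives a mean of 4, and ∑f(x - 4)² = 74, so σ² = 74/30 ≈ 2.4667 and σ ≈ 1.5706 by either form.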
