Biostate Simple

about biostatistics
Copyright
© All Rights Reserved

BIOSTATISTICS

Dr.@mit
STATISTICS: the science of compiling, classifying and tabulating numerical data and expressing the results in mathematical and graphical form.

BIOSTATISTICS: the branch of statistics concerned with the mathematical facts and data related to biological events.
• Constant
– A quantity that does not vary, e.g. in biostatistics the mean and standard deviation are considered constants for a given population.

• Variable
– A characteristic that takes different values for different persons, places or things, such as height, weight or blood pressure.
• Parameter
– A constant that describes a population, e.g. in a college 40% of the students are girls. This describes the population, hence it is a parameter.

• Statistic
– A constant that describes a sample, e.g. in a sample of 200 students from the same college, 45% are girls. This 45% is a statistic, as it describes the sample.

• Attribute
– A characteristic on the basis of which the population can be divided into categories or classes, e.g. gender, caste, religion.
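To make the parameter/statistic distinction concrete, here is a minimal Python sketch. It assumes a hypothetical college of 1,000 students in which exactly 40% are girls. The 40% is the parameter, and the proportion in a random sample of 200 is the statistic, which will usually differ somewhat from 40% and will change from sample to sample.

```python
import random

# Hypothetical population: 1,000 students, exactly 40% girls ("G") and 60% boys ("B").
population = ["G"] * 400 + ["B"] * 600

def proportion_girls(group):
    """Proportion of 'G' entries in a list of students."""
    return group.count("G") / len(group)

# Parameter: describes the whole population, so it is fixed.
parameter = proportion_girls(population)

# Statistic: describes one random sample of 200, so it varies from sample to sample.
rng = random.Random(1)  # fixed seed so the example is reproducible
sample = rng.sample(population, 200)
statistic = proportion_girls(sample)

print(f"parameter = {parameter:.2f}, statistic = {statistic:.3f}")
```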
WHAT IS STATISTICS?
• The following essential features of statistics are evident from its various definitions:

a) Principles and methods for the collection, presentation, analysis and interpretation of numerical data of different kinds:
1. Observational data, qualitative data.
2. Data obtained by a repetitive operation.
3. Data affected to a marked degree by a multiplicity of causes.

b) The science and art of dealing with variation in such a way as to obtain reliable results.
c) Controlled objective methods whereby group trends are abstracted from observations on many separate individuals.

d) The science of experimentation, which may be regarded as mathematics applied to observational data.
WHY STATISTICS?
• Variability in measurement can be handled using statistics. E.g. an investigator makes observations according to his judgement of the situation (depending upon his skills, knowledge and experience), so observations vary from one investigator to another.

• Epidemiology and biostatistics are sister sciences or disciplines.

• Epidemiology collects facts relating to groups of population in places, times and situations.

• Biostatistics converts all these facts into figures and, at the end, translates them back into facts, interpreting the significance of the results.
• Epidemiology and biostatistics both deal with the sequence facts-figures-facts.

QUANTITATIVE METHODOLOGY
USES OF BIOSTATISTICS
1. To test whether the difference between two populations is real or a chance occurrence.
2. To study the correlation between attributes in the same population.
3. To evaluate the efficacy of vaccines.
4. To measure mortality and morbidity.
5. To evaluate the achievements of public health programs.
6. To fix priorities in public health programs.
7. To help promote health legislation and create administrative

COLLECTION OF DATA
• The collective recording of observations, either numerical or otherwise, is called data.

• Demographic data comprises details of population size, distribution, geographic distribution, ethnic group, socio-economic factors and their trends over time.
• It is obtained from the census and other public service reports.
• Depending upon the nature of the variable, data is classified into:

1. Qualitative data: attributes or qualities.
2. Quantitative data: obtained through measurements, e.g. using calipers.
a) discrete
b) continuous
Sources of statistical data
• Experiments: performed to collect data for investigations and research by one or more workers.
• Surveys: carried out for epidemiological studies in the field by trained teams to find the incidence or prevalence of health or disease in a community.
• Records: maintained as a routine in registers and books over a long period of time; they provide readymade data.

Data can be collected as:
• Primary data: obtained by the investigator himself.
• Secondary data: data that has already been recorded, e.g. hospital records.
Primary data can be obtained using any one of the following methods:

Direct personal interviews
• Face-to-face contact with the person.
• Suitable for subjective phenomena.
• Accurate, and any ambiguity can be clarified.
• Cannot be used in extensive studies.

Oral health examination
• Used when information is needed on health status.
• Cannot be used in extensive studies.
• Includes the treatment.

Questionnaire method
• A list of questions pertaining to the survey (the "questionnaire") is prepared.
• Various informants are requested to supply the information.
Sampling and sample design
• Population: the group of all individuals who are the focus of the investigation.

• Census enumeration: the information is obtained from each and every individual in the population.

• Sample: the group of individuals who are actually available for investigation.

• Sampling units: the individual entities that form the focus of the study.

• Sampling frame/list: a list of the sampling units.

Sample selection

Purposive selection
• Intended to represent the population as a whole.
• There is a great temptation to deliberately or purposively select the individuals who seem to represent the population under study.
• Easy to carry out.
• Does not need the preparation of a sampling frame.

Random selection
• The sample of units is selected in such a way that all the characteristics of the population are reflected in the sample.
• "Random" means that whether a population unit is selected into the sample is decided by chance.
Sampling Design
BASED UPON THE TYPE AND NATURE OF THE POPULATION AND THE OBJECTIVES OF THE INVESTIGATION.

1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
5. Multiphase sampling, pathfinder survey
Simple random sampling
• Each and every unit in the population has an equal chance of being included in the sample.
• Selection of units is by chance only.

Two methods:

Lottery method
• Population units are numbered on separate slips.
• The slips are shuffled and selection is made blindfold.

Table of random numbers
• A random arrangement of the digits 0-9 in rows and columns.
• Selection is done in either a horizontal or a vertical direction.
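The lottery method can be sketched in Python as below. It assumes a hypothetical sampling frame of 50 numbered patients. `lottery_sample` is an illustrative name; the standard library's `random.sample(frame, n)` does the same job in a single call.

```python
import random

def lottery_sample(frame, n, seed=None):
    """Simple random sampling without replacement (lottery method):
    every unit in the sampling frame has an equal chance of selection."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)
    slips = list(frame)   # one numbered "slip" per population unit
    rng.shuffle(slips)    # shuffle the slips
    return slips[:n]      # draw the first n slips "blindfold"

# Hypothetical sampling frame: 50 patients numbered 1..50.
frame = list(range(1, 51))
print(lottery_sample(frame, 10, seed=42))
```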
Systematic random sampling
• One unit is selected at random, and additional units are then selected at evenly spaced intervals until a sample of the required size has been drawn.
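A minimal sketch of systematic random sampling follows. It assumes the sampling interval is k = N // n (population size divided by sample size, rounded down) and that the random start lies within the first interval. The frame of 100 units is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic random sampling: pick a random start within the first
    sampling interval k = N // n, then take every k-th unit."""
    N = len(frame)
    k = N // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the size of the frame")
    start = random.Random(seed).randrange(k)  # random start in 0..k-1
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))  # hypothetical frame of 100 units
print(systematic_sample(frame, 10, seed=7))  # 10 units, evenly spaced 10 apart
```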

Stratified random sampling

• The population to be sampled is subdivided into groups (e.g. by age, sex or genetic characteristics) known as strata, so that each group is homogeneous in characteristics.

• A simple random sample is then drawn from each stratum.

• More representative, provides greater accuracy and can cover a wider geographical area.
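The sketch below assumes proportional allocation, where each stratum contributes the same sampling fraction (here 10%); other allocation rules exist. The population of 60 males and 40 females is hypothetical, and sex is used as the stratifying characteristic.

```python
import random

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Stratified random sampling: split the population into homogeneous
    strata, then draw a simple random sample from each stratum.
    Each stratum contributes the same sampling fraction (proportional
    allocation), rounded to the nearest whole unit."""
    rng = random.Random(seed)
    strata = {}
    for u in units:
        strata.setdefault(stratum_of(u), []).append(u)
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: (id, sex) pairs, 60 males and 40 females.
people = [(i, "M") for i in range(60)] + [(i, "F") for i in range(60, 100)]
result = stratified_sample(people, lambda p: p[1], fraction=0.1, seed=3)
print({sex: len(chosen) for sex, chosen in result.items()})  # {'M': 6, 'F': 4}
```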
Cluster sampling

• The population forms natural groups or clusters, such as villages, wards, blocks or the children of a school.

• A sample of clusters is selected, and then all the units in each selected cluster are surveyed.

• Simpler, and requires less time and cost.

• Higher standard error.
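A minimal sketch of cluster sampling, assuming hypothetical schools as the natural clusters: whole clusters are chosen at random and every child in a chosen school is surveyed. Because entire clusters enter or leave the sample together, results vary more from sample to sample, which is why the standard error tends to be higher.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Cluster sampling: randomly select whole clusters, then survey
    every unit inside each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sorted for reproducibility
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: schools, each listing its children.
schools = {
    "School A": ["a1", "a2", "a3"],
    "School B": ["b1", "b2"],
    "School C": ["c1", "c2", "c3", "c4"],
    "School D": ["d1", "d2", "d3"],
}
selected = cluster_sample(schools, 2, seed=5)
surveyed = [child for children in selected.values() for child in children]
print(sorted(selected), len(surveyed))
```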


Multiphase sampling
• Part of the information is collected from the whole sample and part from a sub-sample.
• First phase: all the children in a school are surveyed.

• Second phase: only the children with oral health problems.

• Third phase: the section that needs treatment is selected.

• The sub-samples become smaller and smaller at each phase.

• Adopted when the interest is in a specific disease.

Multistage sampling
• The first stage is to select the groups or clusters.

• Subsamples are then taken in as many subsequent stages as necessary to obtain the desired sample.
Errors in sampling

Sampling errors
• Faulty sample design.
• Small sample size.

Non-sampling errors
• Coverage errors: due to non-response or non-cooperation of the informant.
• Observational errors: interview bias, imperfect experimental technique.
• Processing errors: errors in statistical analysis.
Data presentation
Two main types of data presentation are:
• Tabulation
• Graphic representation - charts and diagrams
Tabulation
– Tables are a simple device used for the presentation of statistical data.
PRINCIPLES:
– Tables should be as simple as possible (two or three small tables are preferable to one large table).
– Data should be presented according to size or importance, chronologically or alphabetically.
– Tables should be self-explanatory.
– Each row and column should be labelled concisely and clearly.
– The specific unit of measure for the data should be given.
– The title should be clear, concise and to the point.
– Totals should be shown.
– Every table should contain a title stating what is depicted in the table.
– In small tables, vertical lines separating the columns may not be necessary.
– If the data are not original, their source should be given in a footnote.
TYPES OF TABLES
• Master table: contains all the data obtained from a survey.
• Simple table: a one-way table that supplies the answer to questions about one characteristic of the data only.
• Frequency distribution table: a two-column frequency table. The first column lists the classes into which the data are grouped; the second column lists the frequency for each class.
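A frequency distribution table can be built by grouping observations into class intervals and counting each class. This Python sketch assumes integer-valued data, equal class widths and hypothetical patient ages. `frequency_table` is an illustrative helper, not a library function.

```python
from collections import Counter

def frequency_table(values, width, start=0):
    """Group integer observations into class intervals of equal width and
    count them. Returns (class label, frequency) rows in ascending order."""
    counts = Counter((v - start) // width for v in values)
    rows = []
    for k in range(min(counts), max(counts) + 1):
        lower = start + k * width
        rows.append((f"{lower}-{lower + width - 1}", counts.get(k, 0)))
    return rows

# Hypothetical ages (years) of 12 patients, grouped into 10-year classes.
ages = [23, 35, 41, 29, 52, 38, 47, 33, 26, 44, 58, 31]
for label, freq in frequency_table(ages, width=10, start=20):
    print(f"{label:>7} | {freq}")
```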
Charts and diagrams
• Most convincing and appealing ways of depicting statistical
results.
Principles
1. Every diagram must be given a title that is self
explanatory.
2. Simple and consistent with the data.
3. The values of the variable are presented on the horizontal or X-axis and the frequency on the vertical or Y-axis.
4. The number of lines drawn in any graph should be kept small.
5. The scale of presentation for the X-axis and Y-axis should be mentioned.
6. The scale divisions of both axes should be proportional, and the divisions should be marked with the details of the variable and frequencies presented on the axes.
Bar chart
• Represents qualitative data.
• Bars can be either vertical or horizontal.
• Suitable scale is chosen
• Bars are usually equally spaced
• They are of three types:
• Simple bar chart: represents only one variable.
• Multiple bar chart: for each category of a variable there is a set of bars.
• Component/proportional bar chart: each individual bar is divided into two or more parts.
Pie chart
• Entire graph looks like a pie.
• It is divided into sectors whose angles (and areas) are proportional to the frequencies.
Line diagram
Useful for studying changes in the value of a variable over time; it is the simplest type of diagram.

Time (hours, days, weeks, months or years) is shown on the horizontal axis.


Histogram
• Pictorial presentation of a frequency distribution.
• There is no space between the cells (bars) of a histogram.
• Class intervals are given on the horizontal axis and frequencies on the vertical axis.
• The area of each rectangle is proportional to the frequency.
Frequency polygon
• Obtained by joining midpoints of histogram
blocks at the height of frequency by straight lines
usually forming a polygon.
Frequency curve
• When the number of observations is very large and the class interval is reduced, the frequency polygon loses its angulations and becomes a smooth curve known as the frequency curve.
Pictogram
• A popular method of presenting data to the common man through small pictures or symbols.

Spot map/shaded map/Cartogram
• These maps are prepared to show the geographic distribution of frequencies of characteristics.
Measures of statistical averages or central tendency
• The central value around which all the other observations are distributed.
• The main objective is to condense the entire mass of data and to facilitate comparison.
• The most common measures of central tendency used in dental sciences are:
– mean
– median
– mode
Mean
• Refers to the arithmetic mean.
• It is obtained by adding the individual observations and dividing the sum by the total number of observations.
• Advantages: easy to calculate; the most useful of all the averages.
• Disadvantages: influenced by abnormal (extreme) values.
Median

• When all the observations are arranged in either ascending or descending order, the middle observation is known as the median.

• When the number of observations is even, the average of the two middle values is taken.

• The median is a better indicator of the central value when extreme values are present, as it is not affected by them.
Mode
• The most frequently occurring observation in the data is called the mode.
• Not often used in medical statistics.

• EXAMPLE
• Number of decayed teeth in 10 children:
• 2, 2, 4, 1, 3, 0, 10, 2, 3, 8

• Mean = 35 / 10 = 3.5

• Median: arranged in order (0, 1, 2, 2, 2, 3, 3, 4, 8, 10), the two middle values are 2 and 3, so median = (2 + 3) / 2
• = 2.5

• Mode = 2 (occurs 3 times)
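The decayed-teeth example can be checked with Python's standard `statistics` module. The ten values sum to 35, so the mean is 3.5. The extreme values 8 and 10 pull the mean above the median, which illustrates why the median is preferred when extreme values are present.

```python
from statistics import mean, median, mode

# Number of decayed teeth in 10 children (data from the example above).
decayed = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

print(sum(decayed), len(decayed))  # 35 10
print(mean(decayed))    # 3.5
print(median(decayed))  # 2.5  (average of the 5th and 6th sorted values, 2 and 3)
print(mode(decayed))    # 2    (occurs 3 times)
```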
Types of variability

• There are three types of variability:
– Biological variability
– Real variability
– Experimental variability
Biological variability

• The natural difference that occurs between individuals due to age, gender and other inherent attributes.
• This difference is small, occurs by chance and lies within certain accepted biological limits.
• E.g. the vertical dimension may vary from patient to patient.
Real Variability

• Such variability exceeds the normal biological limits.
• The cause of the difference is not inherent or natural; it is due to some external factor.
• E.g. the difference in the incidence of cancer between smokers and non-smokers may be due to excessive smoking and not to chance alone.
Experimental Variability
• It arises during the conduct of an experimental study.
• It is of three types:
– Observer error
• The investigator may alter some information or may not record the measurement correctly.
– Instrumental error
• This is due to defects in the measuring instrument.
• Both observer and instrumental errors are called non-sampling errors.
– Sampling error, or error of bias
• This error occurs when the samples are not chosen at random from the population.
• Thus the sample does not truly represent the population.
MEASURES OF DISPERSION
• Dispersion is the degree of spread or variation of the variable about a central value.
• It helps to show how widely the observations are spread on either side of the average.

• The most common measures of dispersion are:
1. RANGE
2. MEAN DEVIATION
3. STANDARD DEVIATION

RANGE
• Defined as the difference between the value of the largest item and the value of the smallest item.
• Gives no information about the values that lie between the extreme values.

MEAN DEVIATION
• The average of the absolute deviations from the arithmetic mean:
$\mathrm{M.D.} = \dfrac{\sum |\bar{X} - X_i|}{n}$
where $\sum$ = sum of, $\bar{X}$ = arithmetic mean, $X_i$ = value of each observation in the data, and $n$ = number of observations in the data.

STANDARD DEVIATION
• The most important and widely used measure of dispersion.
• The greater the S.D., the greater the magnitude of dispersion from the mean.
• A smaller S.D. means a higher degree of uniformity of the observations.
$\mathrm{S.D.} = \sqrt{\dfrac{\sum (\bar{X} - X_i)^2}{n}}$
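The three measures can be computed as below. This sketch follows the formulas above and divides by n. When estimating from a sample, many texts divide the sum of squares by n - 1 instead: Python's `statistics.stdev` does this, while `statistics.pstdev` divides by n. The data are hypothetical.

```python
import math

def dispersion(values):
    """Range, mean deviation and standard deviation, using the
    formulas above (dividing by n)."""
    n = len(values)
    xbar = sum(values) / n
    value_range = max(values) - min(values)
    mean_dev = sum(abs(xbar - x) for x in values) / n
    sd = math.sqrt(sum((xbar - x) ** 2 for x in values) / n)
    return value_range, mean_dev, sd

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5
print(dispersion(data))  # (7, 1.5, 2.0)
```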
Coefficient of variation

• It is used to compare the variability of attributes measured in different units, e.g. height and weight.
• Denoted by CV.
• $\mathrm{CV} = \dfrac{\mathrm{SD}}{\text{Mean}} \times 100$, expressed as a percentage.
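A small sketch of the CV comparison between height and weight. The means and SDs are hypothetical figures, chosen only to show that CV removes the units so the two characteristics can be compared directly.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD / mean * 100; unit-free, so it compares the variability
    of characteristics measured in different units."""
    return sd / mean * 100

# Hypothetical summary figures for one group of adults.
cv_height = coefficient_of_variation(sd=6.0, mean=160.0)  # cm
cv_weight = coefficient_of_variation(sd=9.0, mean=60.0)   # kg
print(f"height CV = {cv_height:.2f}%, weight CV = {cv_weight:.2f}%")
# height CV = 3.75%, weight CV = 15.00%  -> weight is relatively more variable
```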
Normal distribution/normal curve/Gaussian distribution
• When data are collected from a very large number of people and a frequency distribution is made with narrow class intervals, the resulting curve is smooth and symmetrical: the NORMAL CURVE.

• Limits set on either side of the mean (e.g. mean ± 1.96 SD, which encloses 95% of the observations) are called confidence limits.
STANDARD NORMAL CURVE
• There may be many normal curves, but there is only one standard normal curve.
Characteristics
• Bell shaped.
• Perfectly symmetrical.
• Frequency increases from one side, reaches its highest point and then decreases exactly the way it had increased.
• The total area under the curve is one, its mean is zero and its standard deviation is one.
• The highest point denotes the mean, median and mode, which coincide.
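Areas under the standard normal curve can be read from Python's `statistics.NormalDist` (Python 3.8+), which by default models a normal curve with mean 0 and SD 1. The sketch prints the proportion of observations lying within mean ± k SD for a few values of k.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, SD 1, total area 1

# Area (proportion of observations) within mean ± k SD.
for k in (1, 1.96, 2, 3):
    area = z.cdf(k) - z.cdf(-k)
    print(f"mean ± {k} SD: {area:.4f}")
# mean ± 1 SD: 0.6827
# mean ± 1.96 SD: 0.9500
# mean ± 2 SD: 0.9545
# mean ± 3 SD: 0.9973
```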
Z-TEST
• Used to test the significance of difference in means
for large samples.
Criteria:
1. Sample must be randomly selected.
2. Data must be quantitative.
3. The variable is assumed to follow a normal
distribution in the population.
4. Samples should be larger than 30.
Tests of significance
• When different samples are drawn from the same population, the estimates might differ: this is sampling variability.

• Tests of significance are techniques for finding out how far the difference between the estimates of different samples is due to sampling variation.
a) Standard error of mean
b) Standard error of proportion
c) Standard error of difference between two means
d) Standard error of difference between two proportions
1. Standard error of mean: gives the standard deviation of the means of several samples from the same population.
Example: suppose we obtained a random sample of 25 males, aged 20-24 years, whose mean temperature was 98.14 deg. F with a standard deviation of 0.6. What can we say of the true mean of the universe from which the sample was drawn?
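One way to answer the question in the example, assuming the large-sample approach used in this section: the standard error of the mean is SD/√n = 0.6/5 = 0.12, so the 95% confidence limits are 98.14 ± 1.96 × 0.12, roughly 97.90 to 98.38 deg F. With n = 25 (below the usual 30), a t-based interval would be slightly wider.

```python
import math

def standard_error_of_mean(sd, n):
    """SE of the mean = SD / sqrt(n)."""
    return sd / math.sqrt(n)

mean_temp, sd, n = 98.14, 0.6, 25   # figures from the example above
se = standard_error_of_mean(sd, n)  # 0.6 / 5 = 0.12
low, high = mean_temp - 1.96 * se, mean_temp + 1.96 * se
print(f"SE = {se:.2f}; 95% limits: {low:.2f} to {high:.2f} deg F")
# SE = 0.12; 95% limits: 97.90 to 98.38 deg F
```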
Standard Error of Proportion
• The standard error of proportion may be defined as a unit that measures the variation which occurs by chance in the proportion of a character from sample to sample, or from sample to population and vice versa, in qualitative data.
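The usual formula is SE = √(pq/n), with p the percentage having the character, q = 100 - p and n the sample size. A minimal sketch with hypothetical figures:

```python
import math

def standard_error_of_proportion(p, n):
    """SE of a proportion p (expressed as a percentage) in a sample of size n:
    sqrt(p * q / n), where q = 100 - p."""
    q = 100 - p
    return math.sqrt(p * q / n)

# Hypothetical: 40% of 100 sampled children have caries.
se_p = standard_error_of_proportion(40, 100)
print(f"SE = {se_p:.2f} percentage points")  # SE = 4.90 percentage points
```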
Standard Error of Difference Between two Means

• In the worked example, the standard error of the difference between the two means is 7.5.

• The actual difference between the two means is 370 - 318 = 52, which is more than twice the standard error of the difference between the two means, and is therefore "significant".
Standard Error of Difference Between Proportions

• The standard error of the difference is 6, whereas the observed difference (24.4 - 16.2) was 8.2.
• In other words, the observed difference between the two groups is less than twice the S.E. of the difference, i.e. less than 2 × 6 = 12.
• There was no strong evidence of any difference between the efficacy of the two vaccines; the observed difference might easily be due to chance.
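The rule used in these two examples can be sketched as below. The raw data behind the quoted standard errors (7.5 and 6) are not given here, so the sketch plugs in those SEs directly. The two helper functions show the standard large-sample formulas that would produce such SEs from group SDs or percentages. The 1.96 cut-off (about twice the SE) corresponds to P = 0.05.

```python
import math

def se_diff_means(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two proportions (in %)."""
    return math.sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)

def significant(observed_diff, se, critical=1.96):
    """Large-sample rule: significant at P < 0.05 when the difference
    exceeds about twice (1.96 x) its standard error."""
    return abs(observed_diff) / se > critical

# Standard errors as quoted in the two examples above.
print(significant(370 - 318, 7.5))  # True:  52 / 7.5 is about 6.9
print(significant(24.4 - 16.2, 6))  # False: 8.2 / 6 is about 1.4
```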
• A null hypothesis, or hypothesis of no difference (H0), asserts that there is no real difference between the sample and the population in the particular matter under consideration, and that the difference found is accidental and arises out of sampling variation.

• The alternative hypothesis of significant difference (H1) states that there is a difference between the two groups compared.
• A test of significance such as the Z-test is performed to accept the null hypothesis H0, or to reject it and accept the alternative hypothesis H1.
• To minimise error in rejecting or accepting H0, the sampling distribution, i.e. the area under the normal curve, is divided into two regions or zones:
i. A zone of acceptance
ii. A zone of rejection.
• The level of significance sets the distance from the mean beyond which H0 is rejected.

• This distance marks off the zone of rejection for H0 (the shaded areas under the curve), and it is denoted by the letter P, which indicates the probability or relative frequency of the difference occurring by chance.
• The greater the Z value, the smaller the P.
i. Zone of acceptance: if the result of a sample falls in the plain area, i.e. within the mean ± 1.96 SE, the null hypothesis is accepted; hence this area is called the zone of acceptance for the null hypothesis.

ii. Zone of rejection: if the result of a sample falls in the shaded area, i.e. beyond the mean ± 1.96 SE, it is significantly different from the universe value (P < 0.05). Hence the H0 of no difference is rejected and the alternative H1 is accepted. This shaded area is therefore called the zone of rejection for the null hypothesis.
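The link between the Z value, P and the two zones can be sketched with `statistics.NormalDist`. This assumes a two-tailed test at the 5% level, so |Z| > 1.96 falls in the zone of rejection; the Z values are arbitrary illustrations.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Probability of a result at least this far from the mean (in either
    direction) arising by chance under H0, from the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.0, 1.5, 2.5, 3.3):
    p = two_tailed_p(z)
    zone = "rejection" if abs(z) > 1.96 else "acceptance"
    print(f"z = {z}: P = {p:.3f} -> zone of {zone} at the 5% level")
```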
• Degrees of freedom:
Defined as the number of independent members in the sample.

EXAMPLE:
$(X + Y + Z)/3 = 5$
Out of the 3 values, we can choose only 2 freely; the choice of the third depends upon the fact that the total of the three values must be 15. Hence there are 3 - 1 = 2 degrees of freedom.
SIGNIFICANCE OF DIFFERENCE BETWEEN MEANS OF SMALL SAMPLES BY STUDENT’S t-TEST
• Small samples, or their Z values, do not follow the normal distribution as large ones do.

• So the Z value based on the normal distribution will not give the correct level of significance, or probability of a small-sample value occurring by chance.

• In the case of small samples, the t-test is applied instead of the Z-test.

• It was designed by W. S. Gosset, whose pen name was "Student".

• There are two types of Student's t-test:
Unpaired t test
Paired t test

Criteria for applying the t-test
• 1. Random samples
• 2. Quantitative data
• 3. Variable normally distributed
Unpaired t test

• This test is applied to unpaired data of independent observations made on individuals of two different or separate groups, or samples drawn from two populations, to test whether the difference between the two means is real or can be attributed to sampling variability.
• EXAMPLE: comparing the means of the control and experimental groups.
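A minimal sketch of the unpaired t-test, assuming equal variances in the two groups (the classic pooled-variance form) and hypothetical scores. The resulting t is compared with the table value for n1 + n2 - 2 degrees of freedom (about 2.31 for 8 df at P = 0.05, two-tailed).

```python
import math
from statistics import mean, variance

def unpaired_t(group1, group2):
    """Student's unpaired t-test (equal variances assumed).
    Returns t and the degrees of freedom n1 + n2 - 2."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance combines the two sample variances (each uses n - 1).
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / se
    return t, n1 + n2 - 2

control = [12, 14, 11, 13, 15]       # hypothetical scores
experimental = [16, 18, 17, 15, 19]  # hypothetical scores
t, df = unpaired_t(experimental, control)
print(f"t = {t:.2f} with {df} degrees of freedom")  # t = 4.00 with 8 degrees of freedom
```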
Paired t test

• It is applied to paired data of dependent observations from one sample only, where each individual gives a pair of observations.

• E.g. observations on the same individual before and after taking a drug.
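For the paired test, the analysis is done on the before-after differences within each individual. The readings below are hypothetical. The t value is compared with the table value for n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t-test on within-individual differences:
    t = mean difference / (SD of differences / sqrt(n)), df = n - 1."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical readings in the same 5 patients before and after a drug.
before = [140, 150, 145, 160, 155]
after = [132, 144, 140, 150, 149]
t, df = paired_t(before, after)
print(f"t = {t:.2f} with {df} degrees of freedom")  # t = -7.83 with 4 degrees of freedom
```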
The CHI SQUARE TEST FOR QUALITATIVE DATA (χ² TEST)

• Developed by Karl Pearson.

• The chi-square (χ²) test offers an alternative method of testing the significance of the difference between two proportions. It has the advantage that it can also be used when more than two groups are to be compared.

• It is most commonly used when data are in frequencies, such as the number of responses in two or more categories.
• Important applications in medical statistics as test
of:
• 1. Proportion
• 2. Association
• 3. Goodness of fit.

• Test of Proportions
• As an alternative test to find the significance of the difference in two or more than two proportions.
• Test of Association
• The test of association between two events in binomial or multinomial samples is the most important application of the test in statistical methods. It measures the probability of association between two discrete attributes.
• Two events can often be studied for their association, such as smoking and cancer, treatment and outcome of a disease, vaccination and immunity, nutrition and intelligence, etc.
• Test of Goodness of Fit
• The chi-square (χ²) test is also applied as a test of "goodness of fit", to determine whether actual numbers are similar to the expected or theoretical numbers, i.e. goodness of fit to a theory.
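The χ² statistic for a contingency table is the sum of (O - E)²/E over all cells, with each expected count E = row total × column total / grand total. This sketch uses a hypothetical 2 × 2 vaccination table and applies no Yates continuity correction. For 1 degree of freedom, χ² above 3.84 is significant at P = 0.05.

```python
def chi_square(observed):
    """Pearson's chi-square for a contingency table (list of rows):
    sum of (O - E)^2 / E, with E = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical 2x2 table: rows = vaccinated / not vaccinated,
# columns = attacked / not attacked.
table = [[10, 90],
         [30, 70]]
chi2, df = chi_square(table)
print(f"chi-square = {chi2:.2f}, df = {df}")  # chi-square = 12.50, df = 1
```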
Analysis of Variance (ANOVA) Test

• Not confined to comparing two sample means; it can compare more than two samples drawn from corresponding normal populations.

• E.g. in experimental situations where several different treatments (various therapeutic approaches to a specific problem, or various dose levels of a particular drug) are under comparison.

• It is the standard method for testing the equality of three or more group means.
• Requirements
– Data for each group are assumed to be independent and normally distributed.
– Sampling should be at random.

• One way ANOVA
– Only one factor affects the result, compared across two or more groups.

• Two way ANOVA
– Two factors affect the result or outcome.

• Multi way ANOVA
– Three or more factors affect the result or outcome.
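The core of one-way ANOVA is the F ratio of between-group to within-group mean squares. Below is a minimal sketch with hypothetical responses at three dose levels. F is then compared with the F table for (k - 1, n - k) degrees of freedom.

```python
from statistics import mean

def one_way_anova(*groups):
    """One-way ANOVA: F = between-group mean square / within-group mean square.
    Returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical responses to three dose levels of one drug.
low = [4, 5, 6]
medium = [6, 7, 8]
high = [8, 9, 10]
print(one_way_anova(low, medium, high))  # (12.0, 2, 6)
```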
CORRELATION AND REGRESSION
• Correlation: when dealing with measurements on two sets of variables in the same person, one variable may be related to the other in some way (i.e. a change in one variable may result in a change in the value of the other).
• Correlation is the relationship between two sets of variables.
• The correlation coefficient (r) gives the magnitude or degree of the relationship between two variables; it varies from -1 to +1.
• The relationship can be visualised by plotting a scatter diagram (one variable on the x-axis and the other on the y-axis).

• Perfect Positive Correlation
• The two variables, denoted by X and Y, are perfectly linearly related and fully correlated with each other.
• The correlation coefficient (r) = +1, i.e. both variables rise or fall together in exact step.

• Perfect Negative Correlation
• The values move in exactly opposite directions, i.e. when one rises the other falls in exact step; the correlation coefficient (r) = -1.
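Pearson's correlation coefficient can be computed directly from deviations about the means. Below is a minimal sketch with hypothetical data that reproduces the two perfect cases above. Python 3.10+ also offers `statistics.correlation`.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r (between -1 and +1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
```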
TYPES OF CORRELATION
Regression
• To estimate, in an individual case, the value of one variable from the known value of the other, we calculate what is known as the regression coefficient of one measurement on the other.
• It is customary to denote the independent variate by x and the dependent variate by y.

• In the regression equation $y = a + bx$, where a is the intercept, the value of b is called the regression coefficient of y upon x. Similarly, we can obtain the regression of x upon y.
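A least-squares sketch of the regression of y upon x, using b = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)² and a = ȳ - b·x̄. The ages and measurements are hypothetical, and `regression_y_on_x` is an illustrative name.

```python
def regression_y_on_x(x, y):
    """Least-squares regression of y upon x: returns (a, b) for y = a + b*x,
    where b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2) and a = ybar - b*xbar."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Hypothetical: age in years (x) and a measurement (y).
ages = [6, 8, 10, 12, 14]
values = [20, 24, 27, 32, 37]
a, b = regression_y_on_x(ages, values)
print(f"y = {a:.2f} + {b:.2f}x; predicted y at x = 11: {a + b * 11:.2f}")
# y = 7.00 + 2.10x; predicted y at x = 11: 30.10
```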
THANK YOU
