K.Santoshi 1 Year PG: Biostatistics
SANTOSHI BIOSTATISTICS
1st Year PG
CONTENTS:-
•Introduction
•Common statistical terms
•Sources and collection of data
•Presentation of data
•Sampling & Sampling methods
•Sampling errors
•Analysis and interpretation
•Statistical averages
•Measures of dispersion
•Test of significance
•Correlation & regression
•Conclusion
• What is Research?
“A fundamental state of mind involving continual
examination of doctrines and axioms upon which current thought and action
are based.”
- Theobald Smith
2. Determining objectives
3. Formulating a hypothesis
4. Designing a study
5. Collecting data
• Variable:- A characteristic that takes on different values in different persons, places/ things.
• Constant:- A quantity that does not vary, such as π ≈ 3.1416. Constants do not require statistical study.
• In Biostatistics, mean, standard deviation, standard error, correlation coefficient and proportion of a
particular population are considered constant.
• Observation:- An event and its measurement, e.g. blood pressure (BP) and its measurement.
• Observational unit:- The “source” that gives the observation, e.g. an object or a person. In medical statistics,
terms like individuals, subjects etc. are used more often.
• Population:- The entire group of people or study elements (persons, things or measurements) in which
we have an interest at a particular time.
• Sample:- A part of the population.
• Parameter:- A summary value or constant of a variable that describes the population, such as its mean,
standard deviation, standard error, correlation coefficient etc. The corresponding value calculated from a sample is called a statistic.
• Parametric tests:- Tests in which population constants such as those described above (mean,
variance etc.) are used. The data are assumed to follow an assumed or established distribution such as the normal or binomial.
• Non-parametric tests:- Tests such as the chi-square test, in which no constant of a population is used.
The data need not follow any specific distribution, and far fewer assumptions are made than in parametric tests.
They suit ranked (ordinal) data, e.g. good, better and best.
SOURCES AND COLLECTION OF DATA
• Data:- A measured or counted fact or piece of information,
such as the height of a person.
Types of Data
Interval Ratio
• Qualitative Data:-
• Also called as Enumeration data.
• Represents a particular quality or attribute.
• There is no notion of magnitude or size of the characteristic, as it cannot be measured.
• Expressed as numbers without units of measurement, e.g. religion, sex, blood group etc.
• Quantitative Data:-
• Also called as Measurement data.
• These data have a magnitude.
• Can be expressed as a number with or without a unit of measurement.
• E.g. height in cm, Hb in gm%, weight in kg etc.
Quantitative data        Qualitative data
Hb level in gm%          Anemic or non-anemic
Ht in cm                 Tall or short
Fluoride conc.           Hypo, normo or hypertensive
Weight                   Idiot, genius or normal
• Continuous data:-
• It can take any value that is possible to measure, including fractions.
• E.g. Hb level, height, weight etc.
• Discrete data:-
• Here we always get a whole number.
• Eg : Number of beds in hospital, number of students in a school etc.
Common data collection methods
• Survey
• Case study
• Interview
• Observation
• Group assessment
• Test
• Photographs, video tapes, slides
• Diaries, journals, logs
• Document review and analysis
• The main sources of data are:-
1) Surveys
2) Experiments
3) Records in OPD
• Ex. Tuberculosis
• 1st: Mantoux test for all cases
• 2nd: X-ray chest in Mantoux positive group
• 3rd: sputum examination in X-ray positive group
• The sample is selected by the researcher, who decides the quotas for
selecting the sample from specified subgroups of the population.
• For example, an interviewer might need data from 40 adults and 20 adolescents
in order to study students' television viewing habits.
• The selection will be:
• 20 adult men and 20 adult women
• 10 adolescent girls and 10 adolescent boys
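The quota-based selection described above can be sketched in a few lines of Python. This is only an illustration: the population of 800 people, the function name quota_sample and the random shuffle (a stand-in for the order in which an interviewer happens to meet people) are assumptions, not part of the original example.

```python
import random

def quota_sample(population, quotas, key, seed=None):
    """Select people until the quota of every subgroup is filled."""
    rng = random.Random(seed)
    candidates = list(population)
    rng.shuffle(candidates)  # assumed stand-in for the order of encounter
    remaining = dict(quotas)
    chosen = []
    for person in candidates:
        group = key(person)
        if remaining.get(group, 0) > 0:
            chosen.append(person)
            remaining[group] -= 1
    return chosen

# Hypothetical population: 200 people in each of the four subgroups
groups = ["adult man", "adult woman", "adolescent girl", "adolescent boy"]
population = [{"id": i, "group": g} for i, g in enumerate(groups * 200)]

# Quotas taken from the example: 40 adults and 20 adolescents
quotas = {"adult man": 20, "adult woman": 20,
          "adolescent girl": 10, "adolescent boy": 10}

sample = quota_sample(population, quotas, key=lambda p: p["group"], seed=1)
print(len(sample))
```

In practice the interviewer, not a random shuffle, decides who fills each quota, which is why quota sampling is treated as a non-probability method.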
• Purposive sampling: In this sampling method, the researcher selects a typical group
of individuals who might represent the larger population and then collects data from
this group. Also known as Judgmental sampling.
• Snowball sampling:-
• In snowball sampling, the researcher identifies and selects available
respondents who meet the criteria for inclusion.
• After the data have been collected from a subject, the researcher asks for a referral
to other individuals who would also meet the criteria and represent the population
of concern.
BLINDING
• Also called as Masking or concealment of treatment.
• It is intended to avoid bias caused by subjective judgement in reporting, evaluation, data
processing and analysis due to knowledge of the treatment.
Blinding techniques:
Single-blind: Subject
Double-blind: Subject & investigator
Triple-blind: Subject, investigator & statistician
• Sampling error:-
• Sampling error refers to differences between the sample and the population that exist only
because of the observations that happened to be selected for the sample.
• Repeated samples from the same population give different results.
• This type of variation from one sample to another is called sampling error.
• Statistical errors are sampling errors.
• Factors influencing the sampling error are:
• Size of the sample
• Natural variability of individual readings
• As the sample size increases, the sampling error decreases.
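The effect of sample size on sampling error can be checked with a small simulation. The sketch below assumes a hypothetical population of 10,000 haemoglobin readings (mean 13 gm%, SD 1.5 gm%); the numbers and the function name are illustrative only.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 Hb readings in gm% (assumed mean 13, SD 1.5)
population = [rng.gauss(13.0, 1.5) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=500):
    """Standard deviation of the sample mean over repeated samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)    # small samples
spread_large = spread_of_sample_means(100)   # larger samples
print(spread_small, spread_large)
```

The spread of the sample means is noticeably smaller for n = 100 than for n = 10, which is the decrease in sampling error described above.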
Non sampling error:-
• Non-sampling error refers to biases and mistakes in selecting the sample and in collecting the data, e.g.:
• Respondent error
• Interviewer bias
• Measurement error
Setting up a hypothesis:
The first step in hypothesis testing is to set up a hypothesis about a population parameter
and then use sample information to decide how likely it is that our hypothesized population
parameter is correct.
Null hypothesis:
This hypothesis asserts that there is no real difference between the sample and the population in
the particular matter under consideration, and that the difference found is accidental and
unimportant, arising out of fluctuations of sampling. The notation used for this is H0.
Alternative hypothesis:
If the null hypothesis is found to be false, what alternative would be true? The alternative
hypothesis, denoted by H1, is the opposite of H0 and must be true when H0 is false.
Types of errors
• In testing a hypothesis we are likely to commit two types of errors:
• Type I error:
Type I error is the mistake of rejecting the null hypothesis when it is true. The
symbol α (alpha) is used to represent the probability of a type I error.
• Type II error:
Type II error is the mistake of failing to reject the null hypothesis when it is false.
The symbol β (beta) is used to represent the probability of a type II error.
Level of significance
The level of significance is the maximum probability of making a type I error and is
denoted by α (i.e., the probability of rejecting H0 when it is true).
• The level of significance is usually specified before a test procedure so that the
results may not influence the decision.
• In practice we take 5%, 1% or 10% as the level of significance.
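The meaning of α and β can be illustrated by simulation. The sketch below is an assumption-laden example, not a prescribed procedure: it uses a two-sided Z test with known SD = 1, samples of n = 25, α = 5%, and a hypothetical true mean of 0.5 for the case where H0 is false.

```python
import math
import random
from statistics import NormalDist, mean

rng = random.Random(0)
alpha = 0.05                                   # level of significance, fixed in advance
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for a two-sided test

def rejects_h0(true_mean, h0_mean=0.0, sigma=1.0, n=25):
    """Draw one sample and report whether the Z test rejects H0."""
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    z = (mean(sample) - h0_mean) / (sigma / math.sqrt(n))
    return abs(z) > z_crit

trials = 4000
# H0 is true: every rejection is a type I error
type1_rate = sum(rejects_h0(true_mean=0.0) for _ in range(trials)) / trials
# H0 is false (assumed true mean 0.5): every non-rejection is a type II error
type2_rate = sum(not rejects_h0(true_mean=0.5) for _ in range(trials)) / trials
print(type1_rate, type2_rate)
```

The observed type I error rate stays close to the chosen α, while the type II error rate depends on how far the true mean is from the hypothesized one and on the sample size.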
TEST OF SIGNIFICANCE
Parametric tests: Statistical tests in which assumptions are made about the underlying distribution
of the observed data (usually that it is normal).
Unpaired t test
Paired t test
Z test
ANOVA
Non-parametric tests: These are the equivalents of parametric tests and are used to analyse data that do not fit
a normal distribution. Many are based on the rank order of measurements rather than their values.
1) Sign test
2) McNemar test
3) Wilcoxon matched pairs test (or signed rank test)
4) Rank sum tests
a) Mann-Whitney test (U test)
b) Kruskal-Wallis test (H test)
5) Spearman’s rank correlation test
6) Kendall’s coefficient of concordance
7) Chi square test
Student ‘t’ test:
A very common test used in biomedical research.
Applied to test the significance of the difference between two means.
It has the advantage that it can be used for small samples.
Types: Paired ‘t’ test
Unpaired ‘t’ test
‘Z’ test: Used when the sample size is large (n > 30).
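As a rough sketch of how the two types of ‘t’ test are calculated, the functions below compute the test statistic and degrees of freedom from raw data. The pooled-variance form of the unpaired test is assumed here, and the small data sets are made up for illustration. The resulting value is compared with the tabulated t value at the chosen level of significance.

```python
import math
from statistics import mean, variance

def unpaired_t(a, b):
    """Unpaired t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of the difference
    return (mean(a) - mean(b)) / se, n1 + n2 - 2

def paired_t(before, after):
    """Paired t statistic on the within-pair differences, and its degrees of freedom."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    se = math.sqrt(variance(diffs) / n)              # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Made-up readings, for illustration only
t_unpaired, df_unpaired = unpaired_t([1, 2, 3], [2, 3, 4])
t_paired, df_paired = paired_t([1, 2, 3], [2, 4, 6])
print(t_unpaired, df_unpaired, t_paired, df_paired)
```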
ANOVA: (Analysis of variance):
When more than two independent groups must be compared on a continuous outcome,
we make use of ANOVA.
Types: 1 way ANOVA
2 way ANOVA
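A minimal sketch of the 1 way ANOVA calculation follows, assuming the usual F ratio of between-group to within-group mean squares. The three small groups are invented for illustration; the F value is then compared with the tabulated F for the two degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F statistic for 1 way ANOVA with its two degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the readings around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k

# Made-up readings from three independent groups
f_value, df_between, df_within = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])
print(f_value, df_between, df_within)
```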
Chi square test
• Chi-square is an important continuous probability distribution, first formulated by Helmert and then developed by Karl
Pearson.
• Chi square is a non parametric test not based on any summary values of population.
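The slides do not give the formula, so the sketch below assumes the usual Pearson form, the sum of (observed − expected)² / expected over all cells of a contingency table. The 2 × 2 table of counts is made up for illustration.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Made-up counts: rows = exposed / not exposed, columns = diseased / healthy
chi2, dof = chi_square_statistic([[10, 20], [20, 10]])
print(chi2, dof)
```

The statistic is calculated from observed counts only and uses no population mean or standard deviation.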
• To find whether or not there is a significant linear association between two variables, we calculate the coefficient of
correlation, which is represented by the symbol “r”.
• $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$
• The correlation coefficient r always lies between -1.0 and +1.0.
• Types of correlation:
• Perfect positive correlation:
The correlation coefficient r = +1, i.e. both variables rise or fall in the same proportion.
• Perfect negative correlation:
The correlation coefficient r = -1, i.e. the variables are inversely related: when one rises,
the other falls in the same proportion.
• Moderately positive correlation: The correlation coefficient lies in the range 0 < r < 1.
• Moderately negative correlation: The correlation coefficient lies in the range -1 < r < 0.
• Absolutely no correlation:
r = 0, indicating that no linear relationship exists between the 2 variables.
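The correlation coefficient can be computed directly from its definition (sum of the products of deviations divided by the square root of the product of the sums of squared deviations). The three tiny data sets below are made up to show the perfect positive, perfect negative and zero cases.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient r from paired readings x and y."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = math.sqrt(sum((xi - x_bar) ** 2 for xi in x)
                            * sum((yi - y_bar) ** 2 for yi in y))
    return numerator / denominator

r_positive = pearson_r([1, 2, 3], [2, 4, 6])   # y rises in step with x
r_negative = pearson_r([1, 2, 3], [6, 4, 2])   # y falls in step with x
r_none = pearson_r([1, 2, 3], [1, 2, 1])       # no linear relationship
print(r_positive, r_negative, r_none)
```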
Conclusion