Biostats 2
Measures of Dispersion:
• Standard Deviation (SD):
You probably won't have to calculate the standard deviation, but you must know how to interpret it. To
compute it, subtract the mean from each data point (which tells you how far that point is from the mean)
and square the result. Sum all of those squared differences and divide by the number of samples minus 1.
Finally, take the square root of that value. The result is the SD.
• Variance: The standard deviation squared. It is the same equation as the SD without the final square
root (i.e., square root of variance = standard deviation).
• Standard error of the mean (SEM): Tells us how precisely we know the true population mean. SEM = SD
divided by the square root of n (important for determining confidence intervals).
• Confidence Interval: Mean values are often reported with 95% confidence intervals (CIs)
o For example: a mean of 120mg/dl +/- 5mg/dl
o Range expected to contain the true population mean in 95% of repeated samples
o Confidence intervals are for estimating population mean from a sample data set
o Multiply the SEM by 1.96; the 95% confidence interval is the mean plus or minus that value, so
there is an upper and a lower limit and you are 95% confident the true mean lies between them.
o Don't confuse SD with CI. Confidence intervals don't describe the sample; a CI is an inferred range
for where the true population mean lies, whereas the SD tells you about the sample.
o When n is large (n>30), you can substitute sample standard deviation for the population
standard deviation.
o Commonly used confidence levels
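The dispersion measures above can be sketched in a few lines of Python. This is only an illustration: the readings are made-up values, the helper names `z_multiplier` and `confidence_interval` are hypothetical, and the z-based interval follows the "SEM times 1.96" rule from these notes (with a sample this small a t-based multiplier would normally be preferred). The loop prints the z multipliers for the commonly used 90%, 95%, and 99% confidence levels.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev, variance

# Hypothetical glucose readings in mg/dl (illustrative data, not from the notes).
readings = [118, 122, 120, 125, 115]

n = len(readings)
sample_mean = mean(readings)
sample_var = variance(readings)  # sum of squared deviations / (n - 1)
sample_sd = stdev(readings)      # square root of the variance
sem = sample_sd / sqrt(n)        # standard error of the mean


def z_multiplier(confidence):
    """Two-sided z multiplier for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def confidence_interval(center, standard_error, confidence=0.95):
    """Return (lower, upper) limits: center +/- z * standard_error."""
    margin = z_multiplier(confidence) * standard_error
    return center - margin, center + margin


print(f"mean = {sample_mean}, SD = {sample_sd:.2f}, SEM = {sem:.2f}")
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(sample_mean, sem, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f} (z = {z_multiplier(level):.3f})")
```

Note that the interval widens as the confidence level rises, and narrows as n grows because the SEM shrinks.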
Stella Yun
BIOSTATS
Hypothesis Testing
Determines whether differences in means are due to chance. Hypothesis testing calculates the probability
that an observed difference between two means arose by chance alone rather than reflecting a true
difference. That probability depends on the difference between the means, the scatter of the data, and the
number of subjects tested.
• Scatter: The scatter of the data points influences the likelihood that there is a true difference between means.
First plot: Probably not due to chance, because the data of the two groups are far apart (their means
are far apart) and the data points of each group are distributed tightly around their
respective means.
Second plot: Maybe due to chance, because there is a lot of overlap between the two groups.
• Number of samples: The number of data points influences the likelihood that there is a true difference
between means.
Null Hypothesis: (H0) Difference in means was due to chance; the true means are the same.
Alternative Hypothesis: (H1) Difference in means is real.
Given these two hypotheses, there are four possible outcomes of an experiment.
1. There is a difference in reality and our experiment detects it. This means the alternative hypothesis is
found true by our study.
2. There is no difference in reality and our experiment also finds no difference. This means the null
hypothesis is not rejected by our study.
3. There is no difference in reality but our study finds a difference. This is an error: type 1 (alpha) error
(false positive).
4. There is a difference in reality but our study misses it. This is an error – type 2 (beta) error (false
negative).
Power = probability of detecting a difference that truly exists, i.e., the likelihood that you will correctly
reject the null hypothesis.
• Power is increased when there is
o Increased sample size
o Large difference of means
o Less scatter of data (more precise measurements)
• Maximize power to detect a true difference
• In a study design, you have little/no control over the scatter of data or the difference between means. You
DO have control over the number of subjects. The number of subjects is chosen to give a high power;
this is called a power calculation.
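To see why sample size is the lever a study designer can pull, here is a rough sketch of a power calculation for a one-sided z-test. Everything in it is an assumption made for illustration: the function name `power_one_sided_z`, the null mean of 100, the true mean of 105, the SD of 15, and the choice of a z-test with a known SD. Real power calculations depend on the test actually planned.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided_z(mu_null, mu_true, sd, n, alpha=0.05):
    """Power of a one-sided z-test (H1: mean > mu_null) when the true mean is mu_true."""
    sem = sd / sqrt(n)
    # Smallest sample mean that leads to rejecting H0 at the chosen alpha.
    critical_mean = mu_null + STD_NORMAL.inv_cdf(1 - alpha) * sem
    # beta: chance the sample mean falls below that cutoff even though H1 is true.
    beta = STD_NORMAL.cdf((critical_mean - mu_true) / sem)
    return 1 - beta


# Hypothetical scenario: same true difference and scatter, increasing n.
for n in (10, 25, 50, 100):
    print(f"n = {n:3d}  power = {power_one_sided_z(100, 105, 15, n):.3f}")
```

With the difference and scatter held fixed, the printed power rises steadily with n, which is the reasoning behind choosing the number of subjects up front.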
Statistical Errors:
• Type 1 (alpha) error
o False positive
o Finding a difference/effect when there is none in reality
o Rejecting null hypothesis (H0) when you should not have
o Null hypothesis generally not rejected unless p < alpha (alpha is usually set at 0.05)
o Alpha is similar to, but different from, the p-value
§ P-value is calculated from the data when the groups are compared
§ Alpha is set by the study design
• Type 2 (beta) error
o False negative
o Finding no difference/effect when there is one in reality
o Accepting null hypothesis (H0) when you shouldn't have
o Can get a type 2 error if there are too few patients
Tests of significance
Data types:
• Quantitative variables:
o 1,2,3,4
• Categorical variables
o High, medium, low
o Positive, negative
o Yes, no
• Quantitative variables are often reported as a number
o E.g. mean age was 62 years old
• Categorical variables are often reported as percentages
o E.g. 40% of patients taking drug A
o 20% of patients are heavy exercisers
Tests
• T-test:
o Typically used instead of the z-test in certain situations
§ In hypothesis tests where the sample size is too small to permit a z-test (e.g., n < 30)
§ In hypothesis tests comparing means from two samples where the sample sizes are small
o Compares the MEAN quantitative values of two groups
§ Null hypothesis: two groups have the same mean value of the outcome
§ Alternative hypothesis: two groups have different mean values of the outcome
o Yields a p-value: the probability of seeing a difference in means at least this large by chance
alone if the null hypothesis is true (no difference between means)
o If p < 0.05 we usually reject the null hypothesis and state that the difference in means is
statistically significant. This doesn't mean the probability of a chance result is 0 percent, only that it is low enough
§ Example question:
• ANOVA
o “analysis of variance”
§ Null hypothesis: the means of all groups are the same
§ Alternative hypothesis: at least one group is different
o Used to compare more than two quantitative means
§ For example: consider the plasma level of creatinine determined in non-pregnant,
pregnant, and post-partum women
§ Three means determined!!
§ Cannot use t-test (because t-test = two means only) so use ANOVA instead
§ Yields a p-value like t-tests
o ANOVA calculations partition the observed variance (variability) of the data into two categories:
§ Within-group variance: If most of the variance is within group, then the null hypothesis
can’t be rejected
§ Between-group variance: if most of the variance is between groups, then the null
hypothesis is rejected
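The within-group versus between-group split can be made concrete with a small one-way ANOVA sketch. The function name `one_way_anova_f` and the creatinine values are hypothetical, chosen only to mirror the three-group example above. The code stops at the F statistic (between-group variance divided by within-group variance); turning F into a p-value needs an F-distribution table or a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)

    # Between-group: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group: how far each point sits from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical plasma creatinine values in mg/dl (illustrative only).
non_pregnant = [0.8, 0.9, 0.7, 0.8]
pregnant = [0.5, 0.6, 0.5, 0.6]
post_partum = [0.8, 0.7, 0.9, 0.8]

f_stat, df_b, df_w = one_way_anova_f([non_pregnant, pregnant, post_partum])
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means most of the variability lies between the groups, which is the situation in which the null hypothesis is rejected.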
• Chi-square
o Compares two or more categorical variables
o Use this test when the results are not hard numbers (i.e., the data are categorical)
o When asked to choose statistical test for a dataset always ask yourself whether data is
quantitative or categorical
o Categorical data is often in percentages
o Steps: Using the question of "Does career vary by neighborhood?" as an example
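As a rough sketch of how a chi-square calculation on a question like this usually proceeds: compute the expected count for each cell from the row and column totals, sum (observed - expected)^2 / expected over all cells, and compare the result with a critical value for (rows - 1) x (columns - 1) degrees of freedom. The counts below are invented for illustration and the function name `chi_square_statistic` is hypothetical; none of it comes from the original example.

```python
def chi_square_statistic(table):
    """Return (chi2, df) for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if career and neighborhood were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = two neighborhoods, columns = three careers.
careers_by_neighborhood = [
    [30, 20, 10],
    [20, 20, 20],
]

chi2, df = chi_square_statistic(careers_by_neighborhood)
# For df = 2, the critical value at alpha = 0.05 is about 5.991.
print(f"chi-square = {chi2:.2f}, df = {df}, significant at 0.05: {chi2 > 5.991}")
```

The inputs are raw counts, not percentages; percentages should be converted back to counts before running the test.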
• Paired T-test
o The Paired Samples t Test compares two means that are from the same individual, object, or
related units. The two means typically represent two different times (e.g., pre-test and post-
test with an intervention between the two time points) or two different but related conditions
or units (e.g., left and right ears, twins). The purpose of the test is to determine whether there
is statistical evidence that the mean difference between paired observations on a particular
outcome is significantly different from zero.
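A minimal sketch of the paired calculation: take the difference within each pair, then test whether the mean of those differences is far from zero relative to its standard error. The function name `paired_t_statistic` and the pre/post values are hypothetical. The code returns only the t statistic and degrees of freedom; the p-value would come from a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, df) for paired observations measured on the same subjects."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # Mean difference divided by the standard error of the differences.
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Hypothetical pre-test and post-test scores for five subjects.
pre_test = [10, 12, 14, 16, 18]
post_test = [11, 14, 15, 19, 20]

t_stat, df = paired_t_statistic(pre_test, post_test)
# For df = 4, the two-sided critical value at alpha = 0.05 is about 2.776.
print(f"t = {t_stat:.2f}, df = {df}, significant at 0.05: {abs(t_stat) > 2.776}")
```

Because each subject serves as their own control, only the spread of the differences matters, not the spread between subjects.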
Power
In order for us to reject the null hypothesis (which in this case is that there is no difference in serum
cholesterol between younger and older people), the mean value for serum cholesterol in the sample of older
patients must be greater than 195.1. If the true mean of the older population is 211, any sample mean that
falls below 195.1 produces a type II error. The sample mean still follows a normal distribution, so using the
same SD of 46 and sample size of 25, we can calculate the probability of getting a sample mean below 195.1.
The probability of the sample mean falling below 195.1 when the true mean is actually 211 is 0.042. We get
that by calculating the z-score of 195.1 against a mean of 211: z = (195.1 - 211) / (46 / sqrt(25)) = -1.73.
This is our probability of wrongly concluding that the null hypothesis is true when in fact it is false
(type II error, beta).
Power will therefore be 1 - beta = 0.958, a 95.8% chance of rejecting the null hypothesis when it is in fact false.
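The arithmetic in this example can be checked with a short script. It uses only the numbers given above (cutoff 195.1, true mean 211, SD 46, n = 25); the variable names are arbitrary, and the use of the standard normal distribution follows the z-score approach described in the text.

```python
from math import sqrt
from statistics import NormalDist

sd, n = 46, 25
cutoff = 195.1   # sample mean needed to reject the null hypothesis
true_mean = 211  # assumed true mean of the older population

sem = sd / sqrt(n)               # standard error of the mean
z = (cutoff - true_mean) / sem   # z-score of the cutoff under the true mean
beta = NormalDist().cdf(z)       # probability of a type II error
power = 1 - beta

print(f"SEM = {sem:.1f}, z = {z:.2f}, beta = {beta:.3f}, power = {power:.3f}")
```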