Biostats 2


BIOSTATS

Measures of central tendency:


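• Mean: average of all numbers (sum of the values divided by how many values there are)
• Median: Middle number of the data set when all values are lined up in order (with an even number of values, the average of the two middle numbers)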
• Mode: Most commonly found number
In a normal (symmetric) distribution, the mean, median, and mode are equal. However, when data are skewed, they are not
equal. (The tail tells you whether it’s a negative or positive skew.)
-Mode is the highest point (peak of curve), mean is always farthest away from the mode (pulled toward the tail), and median is in the
middle. This is how you figure out the order of the three values in skewed curves (positive skew: mode < median < mean; negative skew: mean < median < mode).
-Mode is least likely to be affected by outliers: adding an outlier changes the mean and can shift the median, but it
only affects the mode if it changes the most common number.
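A small sketch of the outlier point above, using made-up numbers (hypothetical data) and Python's standard-library `statistics` module:

```python
from statistics import mean, median, mode

def central_tendency(values):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(values), median(values), mode(values)

data = [2, 3, 3, 4, 5]        # hypothetical data set
with_outlier = data + [50]    # same data plus one extreme value

print(central_tendency(data))          # mean 3.4, median 3, mode 3
print(central_tendency(with_outlier))  # mean ~11.17, median 3.5, mode 3
```

Only the mean moves a lot; the median shifts by half a step and the mode stays the same.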

Measures of Dispersion:
• Standard Deviation (SD):

Probably won’t have to calculate standard deviation but must know how to interpret it. You’re
basically taking each point and subtracting the mean (which tells you how far it is from the mean), then
squaring that number. Then you take the sum of all of those squared differences and divide it by
the number of samples minus 1. Finally, you take the square root of the entire value. That gives you the
SD: SD = √[Σ(x − x̄)² / (n − 1)].
• Variance: Just standard deviation squared, so it is the same equation as SD but only the part under the
square root, without the square root itself (i.e., square root of variance = standard deviation)
• Standard error of the mean (SEM): This tells us how precisely we know the true population mean.
SEM = SD / √n. (Important for determining confidence intervals.)

o More samples → smaller SEM (the sample mean is closer to the true mean).
o Bigger SD means bigger SEM, so you need lots of samples (n) for a small SEM
o Small SD means small SEM, so you need fewer samples (n) for a small SEM
• Z-score:
o Z-score of 0 = mean
o Z-score of +1 = 1SD above mean
o Z-score of -1 = 1SD below the mean
o Formula: z = (x − μ)/SD, where μ is the mean
o Can also find the probability of falling between two values using a z-table
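The SD, variance, SEM, and z-score recipes above can be sketched in Python as follows. The data are hypothetical, and `statistics.NormalDist` stands in for a printed z-table; the function names are illustrative, not from the notes.

```python
import math
from statistics import NormalDist, stdev

def sample_sd(values):
    """SD with the n - 1 denominator: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    m = sum(values) / n
    squared_diffs = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / (n - 1))

def sample_variance(values):
    """Variance: the same calculation as the SD, without the final square root."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

def sem(values):
    """Standard error of the mean: SD / sqrt(n)."""
    return sample_sd(values) / math.sqrt(len(values))

def z_score(x, mean, sd):
    """How many SDs x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

data = [1, 2, 3, 4, 5]  # hypothetical measurements
sd = sample_sd(data)
print(sd, stdev(data))        # both ~1.581 (stdev also uses n - 1)
print(sample_variance(data))  # 2.5
print(sem(data))              # ~0.707
print(z_score(5, 3, sd))      # ~1.26, i.e. 5 is about 1.26 SD above the mean of 3

# A z-table lookup done in code: probability of a value between z = -1 and z = +1
std_normal = NormalDist()
print(std_normal.cdf(1) - std_normal.cdf(-1))  # ~0.683
```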
Stella Yun

• Confidence Interval: Mean values are often reported with 95% confidence intervals (CIs)
o For example: a mean of 120 mg/dL ± 5 mg/dL
o If the sampling were repeated many times, about 95% of the intervals calculated this way would contain the true population mean
o Confidence intervals are for estimating population mean from a sample data set

o Multiply the SEM by 1.96: the 95% confidence interval is the mean ± 1.96 × SEM, so there is an upper
and a lower limit.
o Don’t confuse SD with CI. Confidence intervals don’t describe the sample; a CI is an inferred range for
where the true population mean lies, whereas the SD tells you about the sample.

o When n is large (n>30), you can substitute sample standard deviation for the population
standard deviation.
o Commonly used confidence levels
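A minimal Python sketch of "mean ± z × SEM", assuming n is large enough (n > 30) to use the normal curve. It also computes the critical z for a few commonly used confidence levels. The SD of 25.5 mg/dL and n = 100 are hypothetical values, chosen so the result matches the 120 ± 5 mg/dL example.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z value, e.g. 0.95 -> ~1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def confidence_interval(mean, sd, n, confidence=0.95):
    """mean +/- z * SEM; assumes n is large enough (n > 30) to use the normal curve."""
    margin = z_critical(confidence) * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Critical values for some commonly used confidence levels
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")  # 1.645, 1.960, 2.576

# Hypothetical sample: mean 120 mg/dL, SD 25.5 mg/dL, n = 100 -> SEM 2.55
print(confidence_interval(120, 25.5, 100))  # roughly (115.0, 125.0), i.e. 120 +/- 5
```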

For group comparisons:


o Many studies report differences between groups
§ Can average the differences and calculate CIs
§ If the CI of the difference includes 0, no statistically significant difference is present

o Some studies report group means with CIs


§ If ranges overlap, no statistically significant difference
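A rough sketch of the "does the CI of the difference include 0?" check. It uses a large-sample normal approximation, and all group summaries are hypothetical.

```python
import math
from statistics import NormalDist

def difference_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, confidence=0.95):
    """CI for (mean_a - mean_b) using a large-sample normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se_diff = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    diff = mean_a - mean_b
    return diff - z * se_diff, diff + z * se_diff

def significant(ci):
    """Statistically significant only if the CI of the difference excludes 0."""
    low, high = ci
    return not (low <= 0 <= high)

# Hypothetical groups: SD 15 and n = 50 in each, so the SE of the difference is 3
ci_1 = difference_ci(130, 15, 50, 124, 15, 50)  # diff 6 -> about (0.12, 11.88)
ci_2 = difference_ci(130, 15, 50, 127, 15, 50)  # diff 3 -> about (-2.88, 8.88)
print(ci_1, significant(ci_1))  # excludes 0 -> significant
print(ci_2, significant(ci_2))  # includes 0 -> not significant
```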


Hypothesis Testing
Hypothesis testing determines whether a difference in means is likely due to chance. It mathematically calculates how
probable the observed difference would be if the two true means were actually equal (i.e., if the difference arose by chance
alone). The probabilities from hypothesis testing depend on the difference between the means, the scatter of the data, and
the number of subjects tested.
• Scatter: Scatter of data points influences likelihood that there is a true difference between means.

First plot: Probably not due to chance because the data of the two groups are very far apart; their means are
different/far apart, and the data points of each group are distributed tightly around their respective means.
Second plot: Maybe due to chance because there’s a lot of overlap between the two groups.
• Number of samples: The number of data points influences the likelihood that there is a true difference
between means.
Null Hypothesis (H0): The difference in means was due to chance; the true means are the same.
Alternative Hypothesis (H1): The difference in means is real.
Given these two hypotheses, there are four possible outcomes of an experiment.
1. There is a difference in reality and our experiment detects it. This means the alternative hypothesis is
found true by our study.
2. There is no difference in reality and our experiment also finds no difference. This means the null
hypothesis is found true by our study.
3. There is no difference in reality but our study finds a difference. This is an error: a type 1 (alpha) error
(false positive).
4. There is a difference in reality but our study misses it. This is an error: a type 2 (beta) error (false
negative).

Power = percent ability to detect a difference that truly exists; the power is the likelihood that you will reject
the null hypothesis appropriately (power = 1 − β).
• Power is increased when there is
o Increased sample size
o Large difference of means
o Less scatter of data (more precise measurements)
• Maximize power to detect a true difference


• In a study design, you have little/no control over the scatter of data or the difference between means. You
DO have control over the number of subjects. The number of subjects is chosen to give a high power;
this is called a power calculation.
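A minimal sketch of the power ideas above, assuming a two-sided, two-sample z-test with a known, equal SD in both groups. This is a textbook simplification; real t-test power calculations differ slightly. The function and parameter names (`power_two_sample`, `n_per_group`, `delta`) and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (known, equal SDs assumed)."""
    se = sd * math.sqrt(2 / n_per_group)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se
    # Probability that the test statistic lands in either rejection region
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Power calculation: subjects per group needed to detect a true difference delta."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# The three levers from the list above (hypothetical numbers)
print(power_two_sample(delta=5, sd=10, n_per_group=20))   # baseline, ~0.35
print(power_two_sample(delta=5, sd=10, n_per_group=80))   # more subjects
print(power_two_sample(delta=10, sd=10, n_per_group=20))  # bigger difference of means
print(power_two_sample(delta=5, sd=5, n_per_group=20))    # less scatter

print(n_per_group(delta=10, sd=20))  # 63 per group for 80% power at alpha = 0.05
```

Of the three levers, only the number of subjects is under the investigator's control, which is why the power calculation solves for n.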

Statistical Errors:
• Type 1 (alpha) error
o False positive
o Finding a difference/effect when there is none in reality
o Rejecting null hypothesis (H0) when you should not have
o Null hypothesis generally not rejected unless p < alpha (usually 0.05)
o Similar to (but different from) the p-value
§ P-value calculated from the study data by the statistical test
§ Alpha set by the study design
• Type 2 (beta) error
o False negative
o Finding no difference/effect when there is one in reality
o Accepting (failing to reject) the null hypothesis (H0) when you shouldn’t have
o Can get type 2 error if too few patients
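A small simulation sketch of both error types. It assumes normally distributed data and a two-sample z-test with a known SD, which keeps the code short. Effect sizes, sample sizes, and the seed are arbitrary illustrative choices. When H0 is true, about 5% of simulated studies still reject it (type 1). When a real effect exists, small studies miss it far more often (type 2).

```python
import random
from statistics import NormalDist, fmean

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided alpha = 0.05

def rejects_null(group_a, group_b, sd):
    """Two-sample z-test, assuming the population SD is known; True means H0 rejected."""
    n = len(group_a)
    z = (fmean(group_a) - fmean(group_b)) / (sd * (2 / n) ** 0.5)
    return abs(z) > Z_CRIT

def rejection_rate(true_diff, n, sd=1.0, trials=2000, seed=1):
    """Fraction of simulated studies (n subjects per group) that reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(true_diff, sd) for _ in range(n)]
        b = [rng.gauss(0.0, sd) for _ in range(n)]
        if rejects_null(a, b, sd):
            rejections += 1
    return rejections / trials

type_1_rate = rejection_rate(true_diff=0.0, n=20)        # H0 true: every rejection is a false positive
type_2_small = 1 - rejection_rate(true_diff=0.5, n=10)   # real effect, too few patients
type_2_large = 1 - rejection_rate(true_diff=0.5, n=100)  # same effect, many patients
print(type_1_rate, type_2_small, type_2_large)           # ~0.05, high, low
```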

Tests of significance
Data types:
• Quantitative variables:
o 1,2,3,4
• Categorical variables
o High, medium, low
o Positive, negative
o Yes, no
• Quantitative variables are often reported as a number
o E.g., mean age was 62 years
• Categorical variables are often reported as percentages
o E.g., 40% of patients taking drug A
o 20% of patients are heavy exercisers

Tests


• T-test:
o Typically used instead of the z-test in certain situations:
§ In a hypothesis test using samples where the sample size will not permit a z-test (e.g., n < 30)
§ In a hypothesis test comparing means from two samples where the sample sizes are small
o Compares two MEAN quantitative values
§ Null hypothesis: two groups have the same mean value of the outcome
§ Alternative hypothesis: two groups have different mean values of the outcome
o Yields a p-value: the probability of observing a difference in means at least as large as the one found if the
null hypothesis were true (no difference between means)
o If p < 0.05 we usually reject the null hypothesis and state that the difference in means is
statistically significant. This doesn’t mean the chance of a false positive is 0 percent, only that it is low enough.
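A minimal sketch of a two-sample t-test, assuming equal population variances (Student's pooled-variance version) and small hypothetical samples. The critical value 2.306 (two-sided alpha = 0.05, df = 8) is taken from a standard t-table, because Python's standard library has no t distribution.

```python
import math
from statistics import fmean, variance

def pooled_t_test(a, b):
    """Student's two-sample t statistic (pooled variance; assumes equal population variances)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (fmean(a) - fmean(b)) / se, df

# Hypothetical small samples (n < 30 each)
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t, df = pooled_t_test(group_a, group_b)

T_CRIT = 2.306  # two-sided alpha = 0.05 with df = 8, read from a t-table
print(t, df, abs(t) > T_CRIT)  # t = -2 with df = 8: not significant, cannot reject H0
```

If SciPy is available, `scipy.stats.ttest_ind` reports the p-value directly instead of requiring a table lookup.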

• ANOVA
o “analysis of variance”
§ Null hypothesis: the means of all groups are equal
§ Alternative hypothesis: at least one group mean is different
o Used to compare more than two quantitative means
§ For example: consider the plasma level of creatinine determined in non-pregnant,
pregnant, and post-partum women
§ Three means determined!!
§ Cannot use a t-test (a t-test compares two means only), so use ANOVA instead
§ Yields a p-value like t-tests
o ANOVA partitions the observed variance (variability) of the data into two categories:
§ Within-group variance: if most of the variance is within groups, then the null hypothesis
can’t be rejected
§ Between-group variance: if most of the variance is between groups, then the null
hypothesis is rejected
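A sketch of how the between-group versus within-group split becomes an F ratio in a one-way ANOVA. The three groups are toy values, not real creatinine data. The critical value 5.14 (alpha = 0.05, df = 2 and 6) is taken from a standard F-table.

```python
from statistics import fmean

def one_way_anova_f(*groups):
    """Return the F ratio (between-group MS / within-group MS) and its degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between-group variability: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: scatter of points around their own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, (k - 1, n_total - k)

# Three hypothetical groups (toy values)
f_ratio, dfs = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
F_CRIT = 5.14  # alpha = 0.05, df = (2, 6), read from an F-table
print(f_ratio, dfs, f_ratio > F_CRIT)  # F = 27 with df (2, 6): most variance is between groups, reject H0
```

With SciPy installed, `scipy.stats.f_oneway` returns the F statistic together with its p-value.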
• Chi-square
o Compares two or more categorical variables
o Must use this test if results are categories rather than hard numbers
o When asked to choose statistical test for a dataset always ask yourself whether data is
quantitative or categorical
o Categorical data is often in percentages


o Steps: Using the question “Does career vary by neighborhood?” as an example

o The chi-square table requires degrees of freedom to be characterized


o Degrees of freedom are calculated by multiplying the (number of columns -1) by the (number of
rows -1)
o For the example above, neighborhood and career each have two categories (a 2 × 2 table), so
degrees of freedom = (2 − 1) × (2 − 1) = 1
o Step 4: consult the chi-square table
§ For a value of 2.66 with one degree of freedom, the critical statistic is 3.84 at alpha =
0.05. In the critical value test, 2.66 < 3.84, and the p-value for 2.66 is > 0.10, so we fail to reject the null hypothesis.
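A sketch of the chi-square steps in Python. The 2 × 2 counts are hypothetical because the original table is not reproduced here. The p-value helper relies on the fact that a chi-square statistic with 1 degree of freedom is a squared standard normal variable, so it only applies when df = 1.

```python
import math

def chi_square_statistic(observed):
    """Pearson chi-square for a contingency table (rows x columns), no continuity correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1) x (columns - 1)
    return chi2, df

def p_value_one_df(chi2):
    """For df = 1 only: chi-square is a squared standard normal, so p = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))

CRITICAL_1DF = 3.84  # alpha = 0.05, df = 1

# Hypothetical 2 x 2 table: rows = neighborhood, columns = career
table = [[10, 20],
         [20, 10]]
chi2, df = chi_square_statistic(table)
print(chi2, df, chi2 > CRITICAL_1DF)  # ~6.67, 1, True -> reject H0

# The value from the notes: 2.66 with 1 df
print(p_value_one_df(2.66))  # ~0.103 (> 0.10), and 2.66 < 3.84 -> fail to reject H0
```

Note that SciPy's `scipy.stats.chi2_contingency` applies the Yates continuity correction to 2 × 2 tables by default (`correction=True`), so its statistic will be smaller than this uncorrected version.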

• Paired T-test
o The Paired Samples t Test compares two means that are from the same individual, object, or
related units. The two means typically represent two different times (e.g., pre-test and post-
test with an intervention between the two time points) or two different but related conditions
or units (e.g., left and right ears, twins). The purpose of the test is to determine whether there
is statistical evidence that the mean difference between paired observations on a particular
outcome is significantly different from zero.
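A minimal sketch of the paired t-test: work on the within-pair differences and test whether their mean is zero. The pre/post scores are hypothetical. The critical value 2.776 (two-sided alpha = 0.05, df = 4) comes from a standard t-table.

```python
import math
from statistics import fmean, stdev

def paired_t(before, after):
    """Paired t statistic: is the mean of the within-pair differences zero?"""
    diffs = [post - pre for pre, post in zip(before, after)]
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical pre-test / post-test scores for the same five subjects
pre = [10, 12, 9, 11, 13]
post = [12, 14, 10, 13, 15]
t, df = paired_t(pre, post)

T_CRIT_DF4 = 2.776  # two-sided alpha = 0.05 with df = 4, read from a t-table
print(t, df, abs(t) > T_CRIT_DF4)  # t = 9 with df = 4: mean difference differs from zero
```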


Factors affecting Type II Error

Power

In order for us to reject the null hypothesis (which in this case is that there is no difference in serum cholesterol
between younger and older people), the mean value for serum cholesterol in older patients must be greater
than 195.1. If the true mean of the older population is 211, any sample mean from the older population that falls
below 195.1 gives a type II error. The sampling distribution is still normal, so using the same SD of 46 and sample
size of 25 (SEM = 46/√25 = 9.2), we can calculate the probability of getting a sample mean below 195.1.

The probability of the sample mean falling below 195.1 when the true mean is actually 211 is 0.042; we get that by
calculating the z-score of 195.1 relative to a mean of 211 (z = (195.1 − 211)/9.2 ≈ −1.73). This is our probability of
wrongly concluding that the null hypothesis is true when in fact it’s false (type II error).

Power will therefore be 1 − β = 1 − 0.042 = 0.958, i.e., a 95.8% chance of rejecting the null hypothesis when it is in fact false.
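The arithmetic of this example can be checked in Python. All numbers come from the example above, and only the standard library is used.

```python
import math
from statistics import NormalDist

critical_mean = 195.1  # smallest sample mean that rejects H0 (from the example)
true_mean = 211        # actual mean serum cholesterol of the older population
sd, n = 46, 25

sem = sd / math.sqrt(n)                                        # 46 / 5 = 9.2
z = (critical_mean - true_mean) / sem                          # about -1.73
beta = NormalDist(mu=true_mean, sigma=sem).cdf(critical_mean)  # P(type II error)
power = 1 - beta
print(round(z, 2), round(beta, 3), round(power, 3))            # -1.73 0.042 0.958
```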
