Cheat Sheet 1
Numeric: discrete (1, 2, 3), continuous (1, 1.25, 2, 2.8)
Non-numeric: nominal (binary, polytomous), ordinal (low-medium-high)
1 numeric: box plot, histogram, stem&leaf plot
1 categorical: bar plot, pie chart
1 numeric + 1 categorical: side-by-side box plot, stem&leaf, histogram
2 numeric: scatter plot
Central tendency measures: mean, median and mode.
Mean - the arithmetic average of all values (sum of the values divided by n).
Median - the middle value of the data once it is sorted in order.
For even n - take the average of the (n/2)th and (n/2 + 1)th values
For odd n - take the (n/2 + 0.5)th value, i.e. the ((n + 1)/2)th value
Mode - the most frequently occurring value
Dispersion or spread of values
Range - the difference between the largest and smallest values
Variance - the average squared deviation of the values from their mean (the sample variance divides the sum of squared deviations by n - 1).
Standard deviation - the square root of the variance; roughly, the typical distance between the observations and the sample mean.
Pr(A and B) = Pr(A)*Pr(B|A)
Independent events: Pr(A and B) = Pr(A)*Pr(B)
Justification of independence: Pr(A|B) = Pr(A), Pr(B|A) = Pr(B)
Test validity measures
Sensitivity - “Among the diseased, how many would test positive?”
Specificity - “Among the non-diseased, how many would test negative?”
PPV - “Among those who tested positive, how many are actually diseased?”
NPV - “Among those who tested negative, how many are actually not diseased?”
Binomial distribution
-The study or experiment consists of a finite number of smaller experiments (trials), each of which has only two possible outcomes, such as dead/alive, diseased/non-diseased, success/failure, etc.
-The outcomes of the trials are independent.
-The probabilities of the outcomes remain the same from trial to trial.
For a single trial: Pr(X=1) = p; conversely, Pr(X=0) = 1-p
Pr(X=x) = C(n,x) * p^x * (1-p)^(n-x), with C(n,x) = n!/(x!(n-x)!), where n = number of trials/subjects, x = number of successes, p = probability of success.
Mean = n*p
Variance = n*p*(1-p)
Standard deviation = sqrt(n*p*(1-p))
Poisson distribution
-The probability of an event in a short interval is proportional to the length of the interval.
-An infinite number of events can occur in the interval.
-In any extremely small portion of the interval, the probability of more than one occurrence of the event is approximately zero.
-Whether or not an event occurs in an interval is independent of events in all other intervals. Apart from independence between observations, the probability of an event occurring in one interval should not affect that of other intervals.
-Mean and variance are equal to each other.
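As a quick numerical check of the test validity measures and the two distributions, here is a minimal Python sketch. It assumes the test results are summarized as the four cells of a 2x2 table (the names `tp`, `fp`, `fn`, `tn` and all counts are hypothetical), and it uses the standard binomial and Poisson probability formulas; the Poisson formula, with mean `lam`, is not written out on the sheet and is an addition here.

```python
from math import comb, exp, factorial, sqrt


def validity_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # among the diseased: share testing positive
        "specificity": tn / (tn + fp),  # among the non-diseased: share testing negative
        "ppv": tp / (tp + fp),          # among test positives: share actually diseased
        "npv": tn / (tn + fn),          # among test negatives: share actually not diseased
    }


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Pr(X = x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """Pr(X = x) = e^(-lam) * lam^x / x!, with lam the mean count per interval."""
    return exp(-lam) * lam**x / factorial(x)


# Hypothetical screening-test counts, for illustration only.
print(validity_measures(tp=90, fp=30, fn=10, tn=870))

# Binomial with n = 10 trials and success probability p = 0.2.
n, p = 10, 0.2
print(binomial_pmf(2, n, p))
print("mean:", n * p, "variance:", n * p * (1 - p), "sd:", sqrt(n * p * (1 - p)))

# Poisson with lam = 3: mean and variance computed from the pmf (sum truncated at x = 99).
lam = 3.0
pois_mean = sum(x * poisson_pmf(x, lam) for x in range(100))
pois_var = sum((x - pois_mean) ** 2 * poisson_pmf(x, lam) for x in range(100))
print(round(pois_mean, 6), round(pois_var, 6))
```

With `lam = 3`, the mean and variance computed from the pmf both come out at about 3, matching the Poisson property that the two are equal; truncating the sum at x = 99 loses a negligible amount of probability for such a small mean.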
CLT
The sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution, provided the sample size is sufficiently large (n≥25) or the population distribution is normal.
If the population distribution is normal, the sample mean is normally distributed for any sample size. If the population distribution is not normal, we need at least n≥25 to apply the CLT.
T-Distribution
α = 1 - confidence level
α and the confidence level are complementary to each other.
If α is 0.05, then the confidence level is 1-0.05 = 0.95, or 95%: construct a 95% CI.
If the confidence level is 0.99, then α is 1-0.99 = 0.01.
Hypothesis testing
The p-value is the probability of getting a sample statistic at least as extreme as the one observed (x), assuming H0 is true, i.e. by chance alone.
Pr(observing data at least this extreme | H0 is true)
If the p-value is less than 0.05 (p<0.05): a statistically significant result.
If the p-value is larger than 0.05 (p>0.05): not a statistically significant result.
Statistical importance is when the statistical test for a problem has a p-value less than α. Rejecting the null hypothesis and stating that the alternative hypothesis is likely true is a statistically significant result.
Biological/Clinical/Public Health importance is a finding that makes a difference in practice.
The significance level, or alpha (α), is the probability of rejecting H0 when H0 is true:
Pr(reject H0 | H0 is true) (Type I error)
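To see what α = Pr(reject H0 | H0 is true) means in practice, a small simulation sketch: draw many samples from a normal population whose mean really is μ0, run a two-sided z-test at α = 0.05 on each, and count how often H0 is (wrongly) rejected. The population mean, σ, sample size, number of simulations and seed are arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt


def type_i_error_rate(mu0=100.0, sigma=10.0, n=25, n_sims=5000, seed=1):
    """Share of simulated samples, drawn with H0 true (mean mu0), in which a
    two-sided z-test at alpha = 0.05 (sigma known) rejects H0."""
    rng = random.Random(seed)
    z_crit = 1.959964  # two-sided critical value for alpha = 0.05
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a Type I error: H0 is true but gets rejected
    return rejections / n_sims


print(type_i_error_rate())
```

Because every sample comes from a population where H0 holds, each rejection is a Type I error, so the printed rate should sit close to α = 0.05, up to simulation noise.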
P-value <α => Reject H0, accept HA
P-value >α=> fail to reject H0, cannot accept HA
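The p-value definition and decision rule can be sketched for the simplest case, a one-sample two-sided z-test with a known population standard deviation. The data values below are hypothetical, and the function names are illustrative, not from any particular course or library.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Two-sided p-value for a one-sample z-test of H0: mu = mu0 (sigma assumed known).

    This is Pr(a z-statistic at least as extreme as the observed one | H0 is true).
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Decision rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical data: sample mean 103 from n = 50, testing H0: mu = 100 with sigma = 10.
p_value = z_test_p_value(103, 100, 10, 50)
print(round(p_value, 4), decide(p_value))
```

Here a p-value exactly equal to α falls on the "fail to reject" side; the sheet's rules only cover the strict cases, so that tie-break is a choice made for the sketch.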
Critical values are the Z-statistics that correspond to the significance level.
To increase power (1 - β), decrease the Type II error (β):
1) Increasing α leads to a decrease in β.
2) Another way to decrease β and increase power is to increase the difference between μ0 and μA.
3) A further way to increase power is to decrease the standard deviation of the sample mean (σ/√n), e.g. by increasing the sample size n.
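The three ways to increase power can be checked with a minimal sketch, assuming a one-sided z-test of H0: μ = μ0 against a true mean μA > μ0 with σ known. All numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(mu0: float, mu_a: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided z-test of H0: mu = mu0 when the true mean is mu_a > mu0."""
    se = sigma / sqrt(n)                      # standard deviation of the sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value for alpha
    # H0 is rejected when (sample mean - mu0) / se > z_crit; under the true mean mu_a
    # that happens with probability 1 - Phi(z_crit - (mu_a - mu0) / se).
    return 1 - NormalDist().cdf(z_crit - (mu_a - mu0) / se)


# Hypothetical baseline: mu0 = 100, mu_a = 103, sigma = 10, n = 50, alpha = 0.05.
base = power_one_sided(100, 103, 10, 50)
print("baseline power:", round(base, 3))
print("1) alpha = 0.10:", round(power_one_sided(100, 103, 10, 50, alpha=0.10), 3))
print("2) mu_a = 105:", round(power_one_sided(100, 105, 10, 50), 3))
print("3) sigma = 5:", round(power_one_sided(100, 103, 5, 50), 3))
print("3) n = 100:", round(power_one_sided(100, 103, 10, 100), 3))
```

Each lever raises the power above the baseline: a larger α lowers the critical value, a larger μA - μ0 shifts the true distribution further from H0, and a smaller σ or larger n shrinks σ/√n.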
Sample size calculation:
|Z-statistic| > |Critical value| => Reject H0, accept HA
|Z-statistic| ≤ |Critical value| => fail to reject H0, cannot accept HA
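The critical-value rule can be sketched for a two-sided test, where the critical value is the standard normal quantile at 1 - α/2. The function names are hypothetical; only the two-sided case is shown, since a one-sided test also has to check the sign of Z, not just |Z|.

```python
from statistics import NormalDist


def critical_value(alpha: float) -> float:
    """Two-sided Z critical value: the standard normal quantile at 1 - alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_decision(z_stat: float, alpha: float = 0.05) -> str:
    """Reject H0 when |Z-statistic| exceeds the critical value."""
    return "reject H0" if abs(z_stat) > critical_value(alpha) else "fail to reject H0"


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(critical_value(alpha), 3))

print(z_decision(2.12), z_decision(-1.5))
```

For a given Z-statistic, this comparison leads to the same decision as comparing the two-sided p-value with α.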
A confidence interval is a range of values constructed to include the population mean with a stated level of confidence (e.g. 95%).
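A minimal sketch of a two-sided confidence interval for the population mean, assuming σ is known so the standard normal critical value applies. When σ is estimated from the sample, a t quantile with n - 1 degrees of freedom would replace z (the T-distribution case); Python's standard library has no t quantile, so that case is not shown. The data values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean: float, sigma: float, n: int, confidence: float = 0.95):
    """Two-sided CI for the population mean, assuming sigma is known."""
    alpha = 1 - confidence                   # alpha and the confidence level are complementary
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% interval
    margin = z * sigma / sqrt(n)             # critical value times SD of the sample mean
    return sample_mean - margin, sample_mean + margin


print(z_confidence_interval(103, 10, 50))
print(z_confidence_interval(103, 10, 50, confidence=0.99))
```

Raising the confidence level from 95% to 99% lowers α from 0.05 to 0.01, which raises the critical value and widens the interval.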
Factors that affect the required sample size include:
lower one-sided confidence interval: