Cheat Sheet 1

Uploaded by

aizharyk.zhabay
Copyright
© All Rights Reserved

Types of data
Numeric: discrete (1, 2, 3), continuous (1, 1.25, 2, 2.8)
Non-numeric: nominal (binary, polytomous), ordinal (low-medium-high)
1 numeric: box plot, histogram, stem & leaf plot
1 categorical: bar plot, pie chart
1 numeric + 1 categorical: side-by-side box plot, stem & leaf plot, histogram
2 numeric: scatter plot

Central tendency measures: mean, median and mode.
Mean - the arithmetic average of all values.
Median - the middle point of the ordered sequence.
For even n - take the average of the (n/2)th and (n/2 + 1)th values
For odd n - take the (n/2 + 0.5)th value
Mode - the most frequently occurring value

Dispersion or spread of values
Range - the difference between the largest and smallest values
Variance - the spread between numbers in a data set.
Standard deviation - the square root of the variance; roughly the typical difference between observations and the sample mean.
Most sensitive to outliers: mean, variance, sd, range
Not sensitive: median, mode, IQR, Q1, Q3

"Quantile" - percentiles, quartiles, tertiles (33%)
P25 - Q1 - lower hinge
P50 - Q2 - median
P75 - Q3 - upper hinge
IQR = Q3 - Q1
Upper fence = Upper hinge + 1.5*IQR
Lower fence = Lower hinge - 1.5*IQR

Dependent events: Pr(A and B) = Pr(B)*Pr(A|B)
Pr(A and B) = Pr(A)*Pr(B|A)
Independent events: Pr(A and B) = Pr(A)*Pr(B)
Justification of independence: Pr(A|B) = Pr(A)
Pr(B|A) = Pr(B)

Test validity measures
Sensitivity - “Among the diseased, how many would test positive?”
Specificity - “Among the non-diseased, how many would test negative?”
PPV - “Among those who tested positive, how many are actually diseased?”
NPV - “Among those who tested negative, how many are actually not diseased?”

Binomial distribution
-The study or experiment consists of a finite number of smaller experiments (trials), each of which has only two possible outcomes, such as dead/alive, diseased/non-diseased, success/failure, etc.
-The outcomes of the trials are independent.
-The probabilities of the outcomes of the trials remain the same from trial to trial.
For a single trial: Pr(X=1) = p and, conversely, Pr(X=0) = 1-p
Pr(X=x) = [n!/(x!(n-x)!)] * p^x * (1-p)^(n-x), where n = number of trials/subjects, x = number of successes, p = probability of success.
Mean = n*p
Variance = n*p*(1-p)
Standard deviation = sqrt(n*p*(1-p))

Poisson distribution
-The probability of an event in a short interval is proportional to the length of the interval.
-An infinite number of events can occur in the interval.
-In any extremely small portion of the interval, the probability of more than one occurrence of the event is approximately zero.
-Whether or not an event occurs in an interval is independent of events in all other intervals. Apart from independence between observations, in a Poisson distribution the probability of an event occurring in one interval should not affect that of other intervals.
-Mean and variance are equal to each other.
Pr(X=x) = e^(-λ) * λ^x / x!, where e is a constant (~2.718), λ is the average or expected number of occurrences of the random event in the interval, and x is the number of events in question.
The Poisson distribution is a good choice when there is a binomial random variable with very large n and very low probability.
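
The binomial and Poisson formulas can be checked numerically. The sketch below is only an illustration: the function names, the parameter name lam (standing in for λ) and the numbers n = 1000, p = 0.002 are assumptions chosen for the example, not values from the cheat sheet. It prints both probabilities side by side for a large n and a small p, where the two distributions should nearly agree.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Pr(X = x) for a binomial variable: n trials, success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """Pr(X = x) for a Poisson variable whose mean (and variance) is lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


# Hypothetical numbers: many trials, each with a very low probability.
n, p = 1000, 0.002
lam = n * p  # the binomial mean n*p is reused as the Poisson mean

# Columns: x, binomial probability, Poisson probability.
for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))
```

With a small n or a large p the two columns drift apart, which is why the approximation is reserved for very large n and very low probability.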
Probability - a measure of the uncertainty associated with the occurrence of events.

Probability distributions:
1. Discrete random variables have outcomes that can be counted.
Binomial distribution is used for binary variables, which have only two possible outcomes. Examples are diseased/non-diseased, dead/alive, infected/non-infected, etc.
Poisson probability distribution is used to predict counts of events or rates. For example, number of physician visits, vaccination rate during a year, number of suicide attempts in the last five months.
2. Continuous random variables can take any value and are not limited to integers.
Gaussian (Normal) probability distribution describes the probability function for continuous variables, where outcome values are not only integers. A bell-shaped curve is used to calculate the probability of having a value within a specific range.

Symmetric: Mean = Median = Mode
Left-skewed: Mean < Median < Mode
Right-skewed: Mean > Median > Mode
Population parameters: μ, σ², σ
Sample estimates, statistics: x̄, s², s

Characteristics of SRS (simple random sample):
1. A randomly selected sample usually has a distribution shape similar to that of the population it came from.
2. The sample statistics approximate the population parameters.
3. Variability in estimates always exists between different samples, even though they came from the same population - sampling variability.

Normal distribution
Continuous random variables - histogram, box plot, QQ plot
The “68-95-99.7” rule: about 68%, 95% and 99.7% of values fall within 1, 2 and 3 standard deviations of the mean.
To standardize any normal distribution: Z = (X - μ)/σ
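
Standardization and the 68-95-99.7 rule can be illustrated with Python's standard library. This is a minimal sketch: the helper names and the blood-pressure numbers (mean 120, standard deviation 15) are assumptions made up for the example.

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: its distance from the mean in standard deviations."""
    return (x - mu) / sigma


def prob_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)


# Hypothetical variable: systolic blood pressure ~ Normal(120, 15).
mu, sigma = 120.0, 15.0
z = z_score(150.0, mu, sigma)

# Pr(X < 150) computed on the standardized scale and on the original scale.
p_standardized = standard.cdf(z)
p_original = NormalDist(mu, sigma).cdf(150.0)
print(z, round(p_standardized, 4), round(p_original, 4))

# The three probabilities behind the 68-95-99.7 rule.
for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

Both probabilities agree because standardizing only rescales the axis. It does not change the area under the curve.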
Joint probability - the likelihood of a collection of events occurring at the same time.
Union probability - the probability that events A or B or … or X, or all of them together, occur.
Addition rule (Union Probability)
Mutually exclusive events: Pr(A or B) = Pr(A) + Pr(B)
Non-mutually exclusive events: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
Conditional Probability: Pr(A|B) = Pr(A and B)/Pr(B)
Multiplication rule (Joint Probability): rearranging the conditional probability formula gives Pr(A and B) = Pr(B)*Pr(A|B)

Factorial - the number of possible arrangements of n objects.
The formula is: n! = n(n-1)(n-2)(n-3)…(2)(1)
0! = 1
4! = 4*3! = 4*3*2! = 4*3*2*1 = 24
Permutations - the ways of arranging things in order.
nPr = n!/(n-r)!, where nPr is the number of permutations of n objects taken r at a time.
Combination - a selection of n objects taken r at a time without regard to order.
nCr = n!/(r!(n-r)!), where nCr is the number of combinations of n objects taken r at a time.
Binomial Distribution: nCx is the number of ways to obtain x successes in n trials.

Sampling Distribution of the Sample Mean
The mean of the Sampling Distribution of the Sample Mean is equal to the population mean: μ(x̄) = μ
The variance of the Sampling Distribution of the Sample Mean equals the population variance divided by the sample size: σ²(x̄) = σ²/n
The standard deviation of the Sampling Distribution of the Sample Mean, also known as the Standard Error (SE): SE = σ/√n
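
The properties of the sampling distribution of the sample mean can be explored by simulation. The sketch below is a rough illustration under stated assumptions: the population is a made-up right-skewed (exponential) data set, and the sample size of 40, the number of draws and the seeds are all arbitrary choices for the example.

```python
import random
import statistics


def sample_means(population, n, draws, seed=0):
    """Means of `draws` random samples of size n, drawn with replacement."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]


# Hypothetical right-skewed population with a mean of about 10.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 10) for _ in range(50_000)]

mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 40
means = sample_means(population, n, draws=5_000)

print("population mean:     ", round(mu, 3))
print("mean of sample means:", round(statistics.fmean(means), 3))
print("theoretical SE:      ", round(sigma / n**0.5, 3))
print("sd of sample means:  ", round(statistics.pstdev(means), 3))
```

The first two printed values should be close to each other, and so should the last two. A histogram of means would also look roughly bell-shaped even though the population is skewed.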
upper one-sided confidence interval:

two-sided confidence interval:

T-Distribution

CLT (Central Limit Theorem)
The sampling distribution of the sample statistic is approximately normal, regardless of the shape of the population distribution, given a sufficient sample size (n≥25) or a normally distributed population.
If the population distribution is normal, the sampling distribution is normal for any n. If the population distribution is not normal, we need at least n≥25 to apply the CLT.

Hypothesis testing
The P-value is the probability of getting a sample statistic (x̄) at least as extreme as the one observed by chance alone, given that H0 is true:
Pr(observing data | H0 is true)
If the p-value is less than 0.05 (p<0.05): a statistically significant result.
If the p-value is larger than 0.05 (p>0.05): not a statistically significant result.
Statistical importance: the statistical test for a problem has a p-value less than α. Rejecting the null hypothesis and stating that the alternative hypothesis is likely true is a statistically significant result.
Biological/Clinical/Public Health importance: a finding that makes a difference in practice.

α = 1 - confidence level
α and the confidence level are complementary to each other.
If α is 0.05, then the confidence level is 1-0.05, which is 0.95 or 95%: construct a 95% CI.
If the confidence level is 0.99, then α is 1-0.99 = 0.01.

The significance level, or alpha (α), is the probability of rejecting H0 when H0 is true (type I error):
Pr(reject H0 | H0 is true)
P-value < α => reject H0, accept HA
P-value ≥ α => fail to reject H0, cannot accept HA
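
The decision rule can be sketched for a one-sample z-test. This example rests on assumptions the cheat sheet does not state: the test is two-sided, the population standard deviation σ is known, and all numbers (H0 mean 100, sample mean 104, σ = 12, n = 36) are hypothetical.

```python
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test; returns (z, p_value, reject_h0)."""
    se = sigma / n**0.5                           # standard error of the mean
    z = (sample_mean - mu0) / se                  # z-statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_value, p_value < alpha            # reject H0 when p < alpha


# Hypothetical data: H0 states mu = 100.
z, p_value, reject_h0 = z_test(sample_mean=104, mu0=100, sigma=12, n=36)
print(round(z, 2), round(p_value, 4), reject_h0)
```

Lowering alpha to 0.01 with the same data reverses the decision, which shows that the conclusion depends on the significance level chosen before the test.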
Critical values are Z-statistics that correspond to the significance level.
To increase power - decrease Type II error (β), since power = 1 - β
1) Increasing α leads to a decrease in β
2) The other way to decrease β and increase power is to increase the difference between μ0 and μA
3) The next way to increase power is to reduce the standard deviation of the sample mean (the standard error), for example by increasing the sample size
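
The three ways of increasing power can be compared numerically. The sketch below assumes an upper-tailed one-sample z-test with known σ, which is one possible setting rather than the only one. The function name and all numbers are hypothetical.

```python
from statistics import NormalDist


def power_one_sided_z(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample z-test when the true mean is mu_a."""
    std_normal = NormalDist()
    se = sigma / n**0.5                     # standard deviation of the sample mean
    z_crit = std_normal.inv_cdf(1 - alpha)  # critical value for the chosen alpha
    shift = (mu_a - mu0) / se               # true difference in SE units
    beta = std_normal.cdf(z_crit - shift)   # Pr(fail to reject H0 | HA is true)
    return 1 - beta


baseline = power_one_sided_z(mu0=100, mu_a=103, sigma=12, n=36)
larger_alpha = power_one_sided_z(100, 103, 12, 36, alpha=0.10)
larger_difference = power_one_sided_z(100, 106, 12, 36)
larger_n = power_one_sided_z(100, 103, 12, 100)

print("baseline:         ", round(baseline, 3))
print("larger alpha:     ", round(larger_alpha, 3))
print("larger difference:", round(larger_difference, 3))
print("larger n:         ", round(larger_n, 3))
```

Each change raises power above the baseline, but a larger α buys that power at the cost of more type I errors.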
|Z-statistic| > |Critical value| => reject H0, accept HA
|Z-statistic| ≤ |Critical value| => fail to reject H0, cannot accept HA

Confidence interval includes the population mean with some level of confidence.
lower one-sided confidence interval:

Sample size calculation:
Factors that affect the required sample size include:
