Statistics Quick Guide With Graphs (1)
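Z = (X̄ - μ) / (σ / sqrt(n)), where X̄ is the sample mean, μ the hypothesized population mean, σ the known population standard deviation, and n the sample size.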
Step 2: Find Z-Critical Value
Step 4: Decision
If P < α → Reject H₀
If P > α → Fail to reject H₀
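The Z-test steps can be sketched in R as follows. The sample values, μ = 100, σ = 15 and α = 0.05 are hypothetical and chosen only for illustration. Step 3 is not shown in this guide, so here it is assumed to be "compute the P-value". Base R has no built-in one-sample Z-test, so the statistic is computed directly from the formula.

```r
# Hypothetical sample and parameters (for illustration only)
x     <- c(108, 112, 97, 115, 104, 110, 99, 118, 106, 111)
mu    <- 100   # hypothesized population mean
sigma <- 15    # known population standard deviation (assumed)
alpha <- 0.05  # significance level

n     <- length(x)
x_bar <- mean(x)

# Step 1: Z statistic
z <- (x_bar - mu) / (sigma / sqrt(n))

# Step 2: two-tailed Z-critical value (about 1.96 for alpha = 0.05)
z_crit <- qnorm(1 - alpha / 2)

# Step 3 (assumed): two-tailed P-value
p_value <- 2 * pnorm(-abs(z))

# Step 4: decision
decision <- if (p_value < alpha) "Reject H0" else "Fail to reject H0"
cat("Z =", round(z, 3), " P =", signif(p_value, 3), " ->", decision, "\n")
```

Comparing |Z| with `z_crit` and comparing P with α always give the same decision for a two-tailed test.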
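t = (X̄ - μ) / (s / sqrt(n)), where s is the sample standard deviation; under H₀ the statistic follows a t-distribution with n - 1 degrees of freedom.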
Step 4: Decision
If P < α → Reject H₀
If P > α → Fail to reject H₀
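A minimal sketch of the t-test steps in R uses a hypothetical sample and μ = 100. It computes the statistic from the formula and cross-checks the result against the built-in `t.test()`.

```r
# Hypothetical sample (for illustration only)
x     <- c(108, 112, 97, 115, 104, 110, 99, 118, 106, 111)
mu    <- 100   # hypothesized population mean
alpha <- 0.05

n <- length(x)
s <- sd(x)     # sample standard deviation (n - 1 denominator)

# t statistic with n - 1 degrees of freedom, two-tailed P-value
t_stat  <- (mean(x) - mu) / (s / sqrt(n))
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Cross-check against R's built-in one-sample t-test
res <- t.test(x, mu = mu)
stopifnot(isTRUE(all.equal(unname(res$statistic), t_stat)),
          isTRUE(all.equal(res$p.value, p_value)))

decision <- if (p_value < alpha) "Reject H0" else "Fail to reject H0"
decision
```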
If P < α:
“The sample mean is significantly different from the population mean. Since the P-value is less than 0.05, we reject H₀ and conclude that there is a statistically significant difference between the sample and population means.”
If P > α:
“There is not enough evidence to conclude a significant difference. Since the P-value is
greater than 0.05, we fail to reject H₀, meaning the sample mean is not significantly different
from the population mean.”
If P < α:
“The sample mean is significantly lower than the population mean. Since the P-value is less than 0.05, we reject H₀ and conclude that the sample mean is significantly lower.”
If P > α:
“There is not enough evidence to conclude that the sample mean is lower. Since the P-value
is greater than 0.05, we fail to reject H₀, meaning the sample mean is not significantly lower
than the population mean.”
If P < α:
“The sample mean is significantly higher than the population mean. Since the P-value is less than 0.05, we reject H₀ and conclude that the sample mean is significantly higher.”
If P > α:
“There is not enough evidence to conclude that the sample mean is higher. Since the P-value
is greater than 0.05, we fail to reject H₀, meaning the sample mean is not significantly higher
than the population mean.”
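In R, the three wordings above correspond to the `alternative` argument of `t.test()`: `"two.sided"` (different), `"less"` (lower) and `"greater"` (higher). The following sketch runs all three on a hypothetical sample against μ = 50.

```r
# Hypothetical sample (for illustration only)
x  <- c(48, 51, 47, 50, 46, 49, 45, 52, 47, 48)
mu <- 50   # hypothesized population mean

p_two    <- t.test(x, mu = mu, alternative = "two.sided")$p.value  # H1: mean != mu
p_lower  <- t.test(x, mu = mu, alternative = "less")$p.value       # H1: mean < mu
p_higher <- t.test(x, mu = mu, alternative = "greater")$p.value    # H1: mean > mu

round(c(two_sided = p_two, less = p_lower, greater = p_higher), 4)
```

The two one-tailed P-values sum to 1. The two-tailed P-value is twice the smaller one, so choose the direction before looking at the data.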
A normal distribution is a symmetric, bell-shaped curve centered around the mean (μ). The
probability density decreases as we move away from the mean.
Interpretation Example: If a Z-score is **+2**, the value is **2 standard deviations above the mean**, and the probability of a higher value is approximately **2.3%** (1 - 0.9772 = 0.0228 from the Z table). The often-quoted 2.5% upper tail corresponds to Z = 1.96.
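The upper-tail area beyond a Z-score can be checked directly with `pnorm()`. The tail beyond Z = 2 comes out near 2.3%, and the cutoff that leaves exactly 2.5% in the upper tail is about 1.96.

```r
# Upper-tail probability P(Z > 2) for a standard normal variable
p_above <- pnorm(2, lower.tail = FALSE)   # same as 1 - pnorm(2)
round(p_above, 4)                         # 0.0228, i.e. about 2.3%

# The z-value that leaves exactly 2.5% in the upper tail
z_975 <- qnorm(0.975)
round(z_975, 2)                           # 1.96, not 2
```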
- **Critical region (shaded area in tail(s))**: If the test statistic falls in this area, we reject \( H_0 \).
- **Non-critical region (center part of the curve)**: If the test statistic falls here, we fail to reject \( H_0 \).
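The boundaries of the critical region come from `qnorm()`. The sketch below uses α = 0.05 and a hypothetical observed statistic of 2.4 to check which region it falls in.

```r
alpha <- 0.05
z_obs <- 2.4   # hypothetical observed test statistic

# Critical values for each form of the alternative hypothesis
z_two   <- qnorm(1 - alpha / 2)  # two-tailed: reject if |Z| > 1.96
z_upper <- qnorm(1 - alpha)      # right-tailed: reject if Z > 1.645
z_lower <- qnorm(alpha)          # left-tailed: reject if Z < -1.645

in_critical_region <- abs(z_obs) > z_two
in_critical_region   # TRUE: 2.4 lies in the shaded tail of a two-tailed test
```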
A confidence interval (CI) is represented as a range on a number line. The wider the
interval, the more uncertainty in the estimate.
- **If the CI includes the hypothesized population mean (μ), we fail to reject \( H_0 \).**
- **If the CI excludes μ, we reject \( H_0 \).**
Example: If a 95% confidence interval for the mean is (72, 78) and μ = 70 lies outside this range, we reject \( H_0 \) at the 5% significance level and conclude the sample mean is significantly different from 70.
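The CI decision rule is a simple containment check. This sketch applies it to the interval (72, 78) and μ = 70 from the example.

```r
ci  <- c(72, 78)   # 95% CI from the example
mu0 <- 70          # hypothesized population mean

contains_mu0 <- mu0 >= ci[1] && mu0 <= ci[2]
decision <- if (contains_mu0) "Fail to reject H0" else "Reject H0"
decision   # "Reject H0", because 70 lies outside (72, 78)
```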
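| Z | P(Z ≤ z) (area to the left) |
|---|---|
| -3.00 | 0.0013 |
| -2.58 | 0.0049 |
| -2.33 | 0.0099 |
| -2.00 | 0.0228 |
| -1.96 | 0.0250 |
| -1.645 | 0.0500 |
| -1.00 | 0.1587 |
| 0.00 | 0.5000 |
| +1.00 | 0.8413 |
| +1.645 | 0.9500 |
| +1.96 | 0.9750 |
| +2.00 | 0.9772 |
| +2.33 | 0.9901 |
| +2.58 | 0.9951 |
| +3.00 | 0.9987 |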
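The Z table values can be regenerated with `pnorm()`, which returns the cumulative area to the left of z. The ±1.645 rows are the exact 5% and 95% points; `pnorm(1.64)` is about 0.9495.

```r
z <- c(-3, -2.58, -2.33, -2, -1.96, -1.645, -1, 0,
       1, 1.645, 1.96, 2, 2.33, 2.58, 3)

# pnorm() gives the cumulative area to the left of each z-value
z_table <- data.frame(z = z, p_left = round(pnorm(z), 4))
print(z_table, row.names = FALSE)
```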
```r
# Define values
X <- 60000 # Value to check
mu <- 50000 # Population mean
sigma <- 5000 # Standard deviation
# Compute Z-Score
z_score <- (X - mu) / sigma
z_score
```
This will return **2**, confirming that 60,000 is 2 standard deviations above the mean.
## 8️⃣ Graph Interpretation
### **Box Plot Interpretation**
A box plot shows the spread of data using quartiles and helps detect outliers.
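```r
# 100 lies more than 1.5 * IQR above the upper quartile, so it is drawn as a separate outlier point
data <- c(10, 20, 22, 25, 30, 35, 40, 50, 100)
boxplot(data, main='Box Plot', col='lightblue')
```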
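The numbers behind the box plot can be inspected with `boxplot.stats()`. By default it flags points more than 1.5 times the hinge spread (roughly the IQR) beyond the box as outliers. The sketch reuses the same data.

```r
data <- c(10, 20, 22, 25, 30, 35, 40, 50, 100)

stats <- boxplot.stats(data)
stats$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out     # points plotted individually as outliers
```

Here the hinges are 22 and 40. The upper fence is therefore 40 + 1.5 × 18 = 67, so the whisker stops at 50 and 100 is reported as an outlier.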
```r
# Pie chart: each slice's angle is proportional to its share of the total
labels <- c('A', 'B', 'C', 'D')
values <- c(40, 30, 20, 10)
pie(values, labels=labels, col=rainbow(4))
```
```r
# Bar chart: bar height shows the value for each category
categories <- c('A', 'B', 'C', 'D')
values <- c(10, 20, 30, 40)
barplot(values, names.arg=categories, col='blue', main='Bar Chart')
```
```r
# Histogram: each bar counts how many values fall in that bin
data <- c(10, 12, 15, 18, 22, 25, 30, 35, 40, 50, 55, 60)
hist(data, col='green', main='Histogram', xlab='Values')
```
- **For normally distributed data, approximately 68% of values lie within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ (the empirical rule).**
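The empirical-rule percentages are rounded areas under the standard normal curve. This sketch computes them exactly; the "95%" figure is more precisely 95.45%, and exactly 95% corresponds to ±1.96σ.

```r
# Probability mass of a standard normal within k standard deviations of the mean
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 4)   # 0.6827 0.9545 0.9973
```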
**R Code for a Normal Curve:**
```r
# 100 evenly spaced z-values from -4 to 4
x <- seq(-4, 4, length.out=100)
# Standard normal density at each z-value
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type='l', col='red', lwd=2, main='Normal Curve')
```
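```r
# Requires the 'moments' package: install.packages('moments')
library(moments)
data <- c(10, 12, 15, 20, 35, 50, 80)
skewness(data)  # positive value: long tail to the right (right-skewed)
```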
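If the `moments` package is not available, the moment-based skewness can be computed in base R. The helper `skew_base()` is a hypothetical name used only here. It divides the third central moment by the second central moment raised to the power 1.5. As far as I know, this is the same ratio `moments::skewness()` uses, but treat that equivalence as an assumption.

```r
data <- c(10, 12, 15, 20, 35, 50, 80)

# Hypothetical helper: moment-based sample skewness
skew_base <- function(x) {
  d <- x - mean(x)              # deviations from the mean
  mean(d^3) / mean(d^2)^1.5     # third central moment / (second central moment)^1.5
}
skew_base(data)   # positive: the long tail is on the right
```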
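A confidence interval gives a range of plausible values for the true population parameter. A 95% interval is built so that, over repeated samples, about 95% of such intervals would contain the true value.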
```r
data <- c(50, 52, 53, 54, 55, 56, 58, 60, 62, 64)
t.test(data, conf.level=0.95)
```
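The interval `t.test()` reports is the t-interval mean ± t(0.975, n - 1) × s / √n. The sketch below rebuilds it by hand from the same data and checks that the two agree.

```r
data <- c(50, 52, 53, 54, 55, 56, 58, 60, 62, 64)
n <- length(data)

# 95% t-interval: mean +/- t_{0.975, n-1} * s / sqrt(n)
margin    <- qt(0.975, df = n - 1) * sd(data) / sqrt(n)
ci_manual <- mean(data) + c(-1, 1) * margin

ci_builtin <- t.test(data, conf.level = 0.95)$conf.int
all.equal(ci_manual, as.numeric(ci_builtin))   # TRUE
```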
A Z-score tells how far a value is from the mean in terms of standard deviations.
```r
x <- seq(-4, 4, length=100)
y <- dnorm(x)
plot(x, y, type='l', col='blue')
abline(v=2, col='red', lwd=2, lty=2) # Marks Z = 2
```
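To turn a whole data set into Z-scores, subtract the mean and divide by the standard deviation. The scores below are hypothetical. Because σ is unknown here, the sketch assumes the sample SD (n - 1 denominator) stands in for it, which is what base R's `scale()` also does.

```r
x <- c(62, 70, 75, 81, 90)   # hypothetical scores

# z = (x - mean) / sd, using the sample SD
z <- (x - mean(x)) / sd(x)
round(z, 2)

# scale() returns the same values as a one-column matrix
all.equal(z, as.numeric(scale(x)))   # TRUE
```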
```r
x <- seq(-4, 4, length=100)
y <- dt(x, df=10)
plot(x, y, type='l', col='blue', lwd=2, main='T-Distribution')
```
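The t-distribution has heavier tails than the standard normal, so its critical values are larger, especially for small degrees of freedom. This sketch compares the 97.5th percentiles and overlays both curves. df = 10 is taken from the plot code.

```r
# 97.5th percentiles: the t cutoff is larger because of its heavier tails
t_crit <- qt(0.975, df = 10)   # about 2.23
z_crit <- qnorm(0.975)         # about 1.96

# Overlay the standard normal curve on the t density for comparison
x <- seq(-4, 4, length.out = 200)
plot(x, dt(x, df = 10), type = 'l', col = 'blue', lwd = 2,
     main = 'T-Distribution (df = 10) vs Standard Normal', ylab = 'Density')
lines(x, dnorm(x), col = 'red', lwd = 2, lty = 2)
legend('topright', legend = c('t, df = 10', 'Normal'),
       col = c('blue', 'red'), lty = c(1, 2), lwd = 2)
```

As the degrees of freedom grow, `qt(0.975, df)` approaches `qnorm(0.975)` and the two curves become nearly indistinguishable.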