Statistics Quick Guide With Graphs (1)

This document is a quick reference guide for statistics, hypothesis testing, and R code, covering key formulas, interpretations, and steps for Z-Tests and T-Tests. It includes detailed explanations of statistical concepts, graphical interpretations, and R code examples for various statistical analyses. Additionally, it provides templates for interpreting hypothesis testing results and visualizing data through different types of charts.

Uploaded by

nirmalameh02
📌 Quick Reference Guide – Statistics, Hypothesis Testing & R Code

This document serves as a comprehensive guide for quick calculations, statistical interpretations, hypothesis testing steps (Z-Test & T-Test), and corresponding R code for execution.

1️⃣ Quick Reference Table – Formulas, Interpretation & R Code

| Topic | Formula / Calculation | Interpretation | R Code |
|---|---|---|---|
| Mean (Average) | X̄ = ΣX / n | Central value of the dataset | `mean(c(5,10,15))` |
| Standard Deviation (σ) | Sample: s = sqrt(Σ (X - X̄)² / (n - 1)); population: σ = sqrt(Σ (X - μ)² / N) | Spread of data (higher = more spread out) | `sd(c(5,10,15))` (sample version, divides by n - 1) |
| Variance (σ²) | σ² = (Standard Deviation)² | Higher variance = greater inconsistency in data | `var(c(5,10,15))` (sample version, divides by n - 1) |
| Probability (Single Event) | P(A) = Favorable Outcomes / Total Outcomes | Likelihood of an event occurring | `choose(4,1)/choose(52,1)` |
| Binomial Probability | P(X = k) = C(n,k) p^k (1-p)^(n-k) | Probability of k successes in n trials | `dbinom(3,5,0.6)` |
| Expected Value (Binomial Mean) | E(X) = n * p | Average number of successes in n trials | `10 * 0.6` |
| Z-Score | Z = (X - μ) / σ | How far X is from the mean, in standard deviations | `(85 - 70) / 8` for the Z-score; `pnorm(85, mean=70, sd=8, lower.tail=FALSE)` for P(X > 85) |
| T-Test | t = (X̄ - μ) / (s / sqrt(n)) | Used when σ is unknown, typically with sample size < 30 | `t.test(sample, mu=50)` |
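Several of the formulas in the table can be checked by hand against R's built-in functions. The sketch below reuses the table's example values; the variable names are illustrative only. One point worth seeing in code: `sd()` and `var()` compute the sample versions (dividing by n - 1), whereas a population formula divides by n.

```r
x <- c(5, 10, 15)   # same example values as in the table
n <- length(x)

# Mean: sum of the values divided by the count
mean_manual <- sum(x) / n

# Sample standard deviation: this is what sd() computes (divides by n - 1)
sd_sample <- sqrt(sum((x - mean(x))^2) / (n - 1))

# Population standard deviation: divides by n instead
sd_population <- sqrt(sum((x - mean(x))^2) / n)

# Binomial probability of 3 successes in 5 trials with p = 0.6
binom_manual <- choose(5, 3) * 0.6^3 * (1 - 0.6)^(5 - 3)

# Compare the manual results with the built-in functions
all.equal(mean_manual, mean(x))
all.equal(sd_sample, sd(x))
all.equal(binom_manual, dbinom(3, 5, 0.6))
```

Each comparison should return `TRUE`; `sd_population` is smaller than `sd_sample` because it divides by n rather than n - 1.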

2️⃣ Step-by-Step Hypothesis Testing (Z-Test & T-Test)


### **Z-Test (Large Sample, n ≥ 30)**

Step 1: Compute Z-Value

Z = (X̄ - μ) / (σ / sqrt(n))
Step 2: Find Z-Critical Value (two-tailed)

For α = 0.05 → Z-critical = ±1.96

For α = 0.01 → Z-critical = ±2.58

Step 3: Compute P-Value (two-tailed)

P = 2 × (1 - P(Z < |computed Z|))

Step 4: Decision
If P < α → Reject H₀
If P > α → Fail to reject H₀
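The four Z-test steps can be sketched in R as follows. All input values here are hypothetical and chosen only to make the arithmetic easy to follow; the test is two-tailed.

```r
# Hypothetical inputs (not from the guide)
x_bar <- 52     # sample mean
mu0   <- 50     # hypothesized population mean
sigma <- 8      # known population standard deviation
n     <- 64     # sample size
alpha <- 0.05   # significance level

# Step 1: compute the Z-value
z <- (x_bar - mu0) / (sigma / sqrt(n))

# Step 2: two-tailed critical value
z_crit <- qnorm(1 - alpha / 2)

# Step 3: two-tailed p-value; abs() keeps it valid for negative z
p_value <- 2 * (1 - pnorm(abs(z)))

# Step 4: decision
decision <- if (p_value < alpha) "Reject H0" else "Fail to reject H0"
cat("z =", z, " critical = +/-", round(z_crit, 2),
    " p =", round(p_value, 4), " ->", decision, "\n")
```

With these inputs the standard error is 8 / sqrt(64) = 1, so z = 2, which lies beyond the ±1.96 critical value.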

### **T-Test (Small Sample, n < 30)**

Step 1: Compute T-Value

t = (X̄ - μ) / (s / sqrt(n))

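Step 2: Find T-Critical Value (from t-table, with n - 1 degrees of freedom)

Step 3: Compute P-Value

P = 2 × (1 - P(T < |computed t|)) for a two-tailed test, where T follows a t-distribution with n - 1 degrees of freedom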

Step 4: Decision
If P < α → Reject H₀
If P > α → Fail to reject H₀
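As a sketch of the T-test steps, the code below computes the statistic by hand and then compares it with `t.test()`. The sample is the ten-value vector used in the guide's confidence-interval example, and the hypothesized mean of 50 is taken from the quick reference table; treating them as a single worked example is an assumption made for illustration.

```r
x     <- c(50, 52, 53, 54, 55, 56, 58, 60, 62, 64)
mu0   <- 50     # hypothesized population mean
alpha <- 0.05   # significance level
n     <- length(x)

# Step 1: compute the T-value using the sample standard deviation
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Step 2: two-tailed critical value with n - 1 degrees of freedom
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Step 3: two-tailed p-value
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# Step 4: decision
decision <- if (p_value < alpha) "Reject H0" else "Fail to reject H0"

# Cross-check against the built-in test
res <- t.test(x, mu = mu0)
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_value)
```

Both comparisons should return `TRUE`, which confirms that `t.test()` performs the same calculation as the manual steps.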

3️⃣ Interpretation Based on Hypothesis Type


### **Two-Tailed Test**
- If P < α → Reject H₀ → Significant difference exists.

### **Left-Tailed Test**


- If P < α → Reject H₀ → Sample mean is significantly lower.

### **Right-Tailed Test**


- If P < α → Reject H₀ → Sample mean is significantly higher.
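The p-value depends on which tail(s) the alternative hypothesis points to. A minimal sketch for a Z statistic, using a hypothetical value of 1.8:

```r
z <- 1.8   # hypothetical test statistic

p_two_tailed   <- 2 * pnorm(abs(z), lower.tail = FALSE)  # H1: mean differs from mu0
p_left_tailed  <- pnorm(z)                               # H1: mean is lower than mu0
p_right_tailed <- pnorm(z, lower.tail = FALSE)           # H1: mean is higher than mu0

round(c(two = p_two_tailed, left = p_left_tailed, right = p_right_tailed), 4)
```

At α = 0.05 this statistic leads to rejection only in the right-tailed test (p ≈ 0.036); the two-tailed p-value is twice as large (≈ 0.072), so the same data do not support a two-sided conclusion.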

4️⃣ Final Interpretation Statement Templates


### **Two-Tailed Test Interpretation**

If P < α (e.g., 0.05):

“The sample mean differs from the population mean at the 5% significance level. Since the P-value is less than 0.05, we reject H₀ and conclude that the difference between the sample and population means is statistically significant.”

If P > α:
“There is not enough evidence to conclude a significant difference. Since the P-value is
greater than 0.05, we fail to reject H₀, meaning the sample mean is not significantly different
from the population mean.”

### **Left-Tailed Test Interpretation**

If P < α:

“The sample mean is lower than the population mean at the 5% significance level. Since the P-value is less than 0.05, we reject H₀ and conclude that the sample mean is significantly lower.”

If P > α:

“There is not enough evidence to conclude that the sample mean is lower. Since the P-value
is greater than 0.05, we fail to reject H₀, meaning the sample mean is not significantly lower
than the population mean.”

### **Right-Tailed Test Interpretation**

If P < α:

“The sample mean is higher than the population mean at the 5% significance level. Since the P-value is less than 0.05, we reject H₀ and conclude that the sample mean is significantly higher.”

If P > α:

“There is not enough evidence to conclude that the sample mean is higher. Since the P-value
is greater than 0.05, we fail to reject H₀, meaning the sample mean is not significantly higher
than the population mean.”
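The templates above can be produced programmatically. The function below is a hypothetical helper, not part of the guide; its name, arguments, and wording are assumptions that condense the templates into one sentence.

```r
# Hypothetical helper: turns a p-value into a one-sentence conclusion
interpret_test <- function(p_value, alpha = 0.05,
                           tail = c("two", "left", "right")) {
  tail <- match.arg(tail)
  claim <- switch(tail,
                  two   = "different from",
                  left  = "lower than",
                  right = "higher than")
  if (p_value < alpha) {
    sprintf("Since P = %.4f is less than %.2f, we reject H0: the sample mean is significantly %s the population mean.",
            p_value, alpha, claim)
  } else {
    sprintf("Since P = %.4f is not less than %.2f, we fail to reject H0: there is not enough evidence that the sample mean is %s the population mean.",
            p_value, alpha, claim)
  }
}

interpret_test(0.012, tail = "right")
interpret_test(0.240, tail = "two")
```

Passing the p-value and the tail type keeps the wording consistent with the decision rule, so the statement cannot contradict the test result.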

5️⃣ Graph Interpretation for Statistical Tests


### **Normal Distribution Curve Interpretation**

A normal distribution is a symmetric, bell-shaped curve centered around the mean (μ). The
probability density decreases as we move away from the mean.

- **68% of data** falls within 1 standard deviation (σ) of the mean.

- **95% of data** falls within 2 standard deviations (σ) of the mean.

- **99.7% of data** falls within 3 standard deviations (σ) of the mean.

Interpretation Example: If a Z-score is **+2**, the value is **2 standard deviations above the mean**, and the probability of a higher value is approximately **2.3%** (1 − 0.9772).
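The 68/95/99.7 percentages are rounded values that can be reproduced with `pnorm()`. A short check, assuming a standard normal distribution:

```r
k <- 1:3   # number of standard deviations from the mean

# Probability of falling within k standard deviations of the mean
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 4)

# Probability of a value more than 2 standard deviations above the mean
upper_tail_z2 <- pnorm(2, lower.tail = FALSE)
round(upper_tail_z2, 4)
```

The three coverage values round to 0.6827, 0.9545, and 0.9973. The exact upper-tail probability at Z = 2 is about 0.0228, a little below the 2.5% that the rounded 95% rule implies.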

### **Z-Test and T-Test Graph Interpretation**


In hypothesis testing, we use the standard normal distribution or t-distribution to
determine whether our test statistic falls in the rejection region.

- **Critical region (shaded area in tail(s))**: If the test statistic falls in this area, we reject H₀.

- **Non-critical region (center part of the curve)**: If the test statistic falls here, we fail to reject H₀.
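The boundaries of the critical region come from the quantile function. The sketch below finds them for a Z-test and checks a hypothetical test statistic against each type of rejection region.

```r
alpha <- 0.05
z     <- -2.1   # hypothetical test statistic

crit_two   <- qnorm(1 - alpha / 2)   # two-tailed: reject if |z| > crit_two
crit_left  <- qnorm(alpha)           # left-tailed: reject if z < crit_left
crit_right <- qnorm(1 - alpha)       # right-tailed: reject if z > crit_right

# TRUE means the statistic falls in the critical region for that test
in_critical_region <- c(two_tailed   = abs(z) > crit_two,
                        left_tailed  = z < crit_left,
                        right_tailed = z > crit_right)
in_critical_region
```

A statistic of -2.1 falls in the critical region for the two-tailed and left-tailed tests but not for the right-tailed test, since it lies in the lower tail.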

### **Confidence Interval Interpretation on Graphs**

A confidence interval (CI) is represented as a range on a number line. The wider the
interval, the more uncertainty in the estimate.

- **If the CI includes the hypothesized population mean (μ), we fail to reject H₀.**

- **If the CI does not include μ, we reject H₀.**

Example: If a 95% confidence interval for the mean is (72, 78), and μ = 70 is outside this range, we reject H₀ and conclude that the population mean differs significantly from 70.
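A t-based confidence interval can be built by hand as mean ± critical value × standard error. The sample below is hypothetical and is not the data behind the (72, 78) example; it only illustrates the calculation and the inclusion check.

```r
# Hypothetical sample and hypothesized mean (illustrative values only)
x          <- c(72, 74, 75, 76, 73, 77, 78, 75, 74, 76)
mu0        <- 70
conf_level <- 0.95
n          <- length(x)

# Margin of error: t critical value times the standard error
margin <- qt(1 - (1 - conf_level) / 2, df = n - 1) * sd(x) / sqrt(n)
ci <- c(lower = mean(x) - margin, upper = mean(x) + margin)
ci

# Is the hypothesized mean inside the interval?
mu0_inside <- unname(mu0 >= ci["lower"] && mu0 <= ci["upper"])
mu0_inside

# Cross-check with the interval reported by t.test()
all.equal(unname(ci), as.numeric(t.test(x, conf.level = conf_level)$conf.int))
```

Here `mu0_inside` is `FALSE`, so a two-tailed test at α = 0.05 would reject H₀ for this sample.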

6️⃣ Z-Score Lookup Table


This table provides the cumulative probability (area under the normal curve) for different
Z-scores. The values represent the probability that a random variable is **less than or equal
to** the given Z-score.

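| Z-Score | Cumulative Probability P(Z ≤ z) |
|---|---|
| -3.00 | 0.0013 |
| -2.58 | 0.0049 |
| -2.33 | 0.0099 |
| -2.00 | 0.0228 |
| -1.96 | 0.0250 |
| -1.64 | 0.0505 |
| -1.00 | 0.1587 |
| 0.00 | 0.5000 |
| +1.00 | 0.8413 |
| +1.64 | 0.9495 |
| +1.96 | 0.9750 |
| +2.00 | 0.9772 |
| +2.33 | 0.9901 |
| +2.58 | 0.9951 |
| +3.00 | 0.9987 |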
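The lookup table can be regenerated with `pnorm()`, which returns the cumulative probability for any Z-score. The column names below are illustrative.

```r
z <- c(-3, -2.58, -2.33, -2, -1.96, -1.64, -1, 0,
       1, 1.64, 1.96, 2, 2.33, 2.58, 3)

# Cumulative probability P(Z <= z), rounded to four decimals
z_table <- data.frame(z = z, cumulative_prob = round(pnorm(z), 4))
z_table

# Going the other way: Z-scores that cut off a given lower-tail probability
qnorm(c(0.05, 0.025, 0.005))
```

Note that `pnorm(-1.64)` rounds to 0.0505 rather than 0.0500; the Z-score that cuts off exactly 5% in the lower tail is about -1.645, as `qnorm(0.05)` shows.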

7️⃣ R Code for Z-Score Calculation


You can calculate Z-scores in R using the following command:

```r
# Define values
X <- 60000 # Value to check
mu <- 50000 # Population mean
sigma <- 5000 # Standard deviation

# Compute Z-Score
z_score <- (X - mu) / sigma
z_score
```

This will return **2**, confirming that 60,000 is 2 standard deviations above the mean.

8️⃣ Graph Interpretation

### **Box Plot Interpretation**

A box plot shows the spread of data using quartiles and helps detect outliers.

- **Median (Q2 line):** The middle value of the dataset.

- **Box (IQR = Q3 - Q1):** Contains the middle 50% of the data.

- **Whiskers:** Extend to the lowest and highest data points that lie within 1.5×IQR of the box (below Q1 and above Q3).

- **Outliers:** Values outside the whiskers.

**R Code to Generate a Box Plot:**

```r
data <- c(10, 20, 22, 25, 30, 35, 40, 50, 100)
boxplot(data, main='Box Plot', col='lightblue')
```
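The quantities drawn in the box plot can also be computed directly. This sketch applies the 1.5×IQR rule to the same data; note that `boxplot()` uses hinges, which can differ slightly from `quantile()` for some sample sizes, although they agree for this dataset.

```r
data <- c(10, 20, 22, 25, 30, 35, 40, 50, 100)

# Quartiles and interquartile range
q   <- quantile(data, c(0.25, 0.5, 0.75))
iqr <- IQR(data)

# Fences: points beyond these are flagged as outliers
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr

outliers <- data[data < lower_fence | data > upper_fence]
outliers

# The same outliers as identified by the box plot machinery
boxplot.stats(data)$out
```

Both approaches flag 100 as the only outlier: Q1 = 22 and Q3 = 40 give an IQR of 18, so the upper fence is 67.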

### **Pie Chart Interpretation**

A pie chart represents categorical data as proportional slices.


- **Larger slices indicate higher proportions.**

- **Good for visualizing part-to-whole relationships.**

**R Code to Create a Pie Chart:**

```r
labels <- c('A', 'B', 'C', 'D')
values <- c(40, 30, 20, 10)
pie(values, labels=labels, col=rainbow(4))
```

### **Bar Chart Interpretation**

A bar chart is used to compare values across different categories.

- **Height of bars represents frequency or value.**

- **Good for comparing groups.**

**R Code for a Bar Chart:**

```r
categories <- c('A', 'B', 'C', 'D')
values <- c(10, 20, 30, 40)
barplot(values, names.arg=categories, col='blue', main='Bar Chart')
```

### **Histogram Interpretation**

A histogram displays the distribution of numerical data.

- **Used for detecting skewness and spread.**

- **Higher bars indicate more frequent values.**

**R Code to Generate a Histogram:**

```r
data <- c(10, 12, 15, 18, 22, 25, 30, 35, 40, 50, 55, 60)
hist(data, col='green', main='Histogram', xlab='Values')
```

### **Normal Curve Interpretation**

A normal distribution curve is symmetric around the mean.

- **Bell-shaped curve with most values near the center.**

- **68% of data within 1σ, 95% within 2σ, 99.7% within 3σ.**

**R Code for a Normal Curve:**

```r
x <- seq(-4, 4, length=100)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type='l', col='red', lwd=2, main='Normal Curve')
```

### **Skewness Interpretation**

Skewness indicates the direction and extent of asymmetry in data.

- **Symmetric Data:** Skewness ≈ 0.

- **Right-Skewed (Positive):** Typically Mean > Median > Mode.

- **Left-Skewed (Negative):** Typically Mean < Median < Mode.

**R Code to Calculate Skewness:**

```r
library(moments)  # external package; install once with install.packages('moments')
data <- c(10, 12, 15, 20, 35, 50, 80)
skewness(data)
```
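If the `moments` package is not available, skewness can be computed in base R from the second and third central moments. This is the standard moment-based definition; that it matches `moments::skewness()` exactly is an assumption worth verifying on your own installation.

```r
data <- c(10, 12, 15, 20, 35, 50, 80)

# Deviations from the mean
dev <- data - mean(data)

# Second and third central moments (dividing by n)
m2 <- mean(dev^2)
m3 <- mean(dev^3)

# Moment-based skewness: positive means a longer right tail
skew_manual <- m3 / m2^(3 / 2)
skew_manual

# For right-skewed data the mean usually exceeds the median
mean(data) > median(data)
```

The result is positive for this dataset, consistent with the mean being pulled above the median by the large values 50 and 80.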

### **Confidence Interval Interpretation**

A confidence interval gives a range where the true population parameter is likely to fall.

- **If the CI includes the hypothesized value, we fail to reject H₀.**

- **If the CI excludes it, we reject H₀.**

**R Code to Compute a Confidence Interval:**

```r
data <- c(50, 52, 53, 54, 55, 56, 58, 60, 62, 64)
t.test(data, conf.level=0.95)
```

### **Z-Score Interpretation with Graph**

A Z-score tells how far a value is from the mean in terms of standard deviations.

- **Z = 0:** Value is exactly at the mean.

- **Z > 0:** Value is above the mean.

- **Z < 0:** Value is below the mean.


**R Code to Plot a Z-Score on a Normal Curve:**

```r
x <- seq(-4, 4, length=100)
y <- dnorm(x)
plot(x, y, type='l', col='blue')
abline(v=2, col='red', lwd=2, lty=2) # Marks Z = 2
```

### **T-Statistic Interpretation with Graph**

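A T-distribution is used when the population standard deviation is unknown and estimated from the sample, which matters most when the sample size is small (n < 30).

- **T-distribution has heavier tails (is wider) than the normal curve for small samples.**

- **For large samples, T-distribution approaches the standard normal distribution.**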

**R Code to Plot a T-Distribution:**

```r
x <- seq(-4, 4, length=100)
y <- dt(x, df=10)
plot(x, y, type='l', col='blue', lwd=2, main='T-Distribution')
```
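The difference between the two distributions shows up in the critical values. The sketch below compares two-tailed 5% critical values of the t-distribution for several illustrative degrees of freedom with the normal value.

```r
df <- c(5, 10, 30, 100)   # illustrative degrees of freedom

t_crit <- qt(0.975, df = df)   # t critical values
z_crit <- qnorm(0.975)         # normal critical value

data.frame(df = df,
           t_critical = round(t_crit, 3),
           z_critical = round(z_crit, 3))

# Heavier tails: the t density at 3 exceeds the normal density at 3
dt(3, df = 10) > dnorm(3)
```

The t critical values are always larger than the normal one and shrink toward it as the degrees of freedom grow, which is why small samples need stronger evidence to reject H₀.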
