STD Normal Dist Formul Proof

The document explains the properties of the normal distribution, specifically the 68-95-99.7 Rule, which states that approximately 68.27% of data lies within one standard deviation, 95.45% within two, and 99.73% within three. It provides mathematical proof using integration of the normal probability density function and cumulative distribution function. Additionally, it includes a Python simulation that verifies these percentages with real data, showing results that closely match the theoretical values.

Normal Distribution Explained

For the standard deviation, we know that about 68% of the data lies within 1
standard deviation and about 95% within 2 std. Is there any proof that 1
std contains that percentage of data, and so on?

Yes! The 68-95-99.7 Rule (Empirical Rule) comes from the properties of the normal
distribution and can be proven using integrals of the normal probability density function
(PDF).

1️⃣ The Standard Normal Distribution


A normal distribution is given by:

f(x) = (1 / (σ√(2π))) · e^(−(x−μ)² / (2σ²))

where:

μ is the mean
σ is the standard deviation
The area under the entire curve is 1 (since it's a probability distribution).
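This PDF can be written out directly in code, and the unit-area claim can be checked numerically. The sketch below uses NumPy; the helper name `normal_pdf` is just for illustration:

```python
import numpy as np

# Normal PDF, written out from the formula above
# (normal_pdf is an illustrative helper name)
def normal_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Riemann-sum approximation of the total area under the curve;
# the tails beyond ±10σ contribute a negligible amount
x = np.linspace(-10, 10, 200_001)
dx = x[1] - x[0]
area = np.sum(normal_pdf(x)) * dx
print(area)  # ≈ 1.0
```

The grid is deliberately wide and fine, so the approximation agrees with the exact area of 1 to many decimal places.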

2️⃣ Proof Using Integration


To find the probability that data lies within 1, 2, or 3 standard deviations, we integrate the
standard normal distribution:

P(a ≤ Z ≤ b) = ∫_{a}^{b} (1/√(2π)) e^(−z²/2) dz

where Z = (X − μ)/σ is the standard normal variable.
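Standardizing with Z = (X − μ)/σ is what lets one integral (or one table) serve every normal distribution. A quick sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
x = rng.normal(loc=100, scale=15, size=100_000)  # any normally distributed data

# Standardize: subtract the mean, divide by the standard deviation
z = (x - x.mean()) / x.std()

print(z.mean(), z.std())  # mean ≈ 0, std ≈ 1
```

After the transformation the data has mean ≈ 0 and standard deviation ≈ 1, so the same ±1, ±2, ±3 bounds apply regardless of the original μ and σ.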

📌 Computing the probabilities:


1. For ±1σ (68.27%)

P(−1 ≤ Z ≤ 1) = ∫_{−1}^{1} (1/√(2π)) e^(−z²/2) dz ≈ 0.6827

2. For ±2σ (95.45%)

P(−2 ≤ Z ≤ 2) = ∫_{−2}^{2} (1/√(2π)) e^(−z²/2) dz ≈ 0.9545

3. For ±3σ (99.73%)

P(−3 ≤ Z ≤ 3) = ∫_{−3}^{3} (1/√(2π)) e^(−z²/2) dz ≈ 0.9973

These values match the Empirical Rule.
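These integrals have no closed form, but they are easy to approximate numerically. The sketch below uses the trapezoidal rule (the function name `prob_within` is illustrative):

```python
import numpy as np

def prob_within(k, n=1_000_001):
    # Trapezoidal-rule approximation of the integral from -k to k
    # of the standard normal PDF (1/sqrt(2*pi)) * exp(-z**2 / 2)
    z = np.linspace(-k, k, n)
    pdf = np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)
    dz = z[1] - z[0]
    return np.sum((pdf[:-1] + pdf[1:]) / 2) * dz

for k in (1, 2, 3):
    print(f"P(-{k} <= Z <= {k}) ≈ {prob_within(k):.4f}")
# P(-1 <= Z <= 1) ≈ 0.6827
# P(-2 <= Z <= 2) ≈ 0.9545
# P(-3 <= Z <= 3) ≈ 0.9973
```

With a million grid points the quadrature error is far below the four decimal places shown, so the printed values match the Empirical Rule exactly.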

3️⃣ Alternative Proof Using the Cumulative Distribution Function (CDF)

Instead of integrating manually, we can use the CDF of the normal distribution, Φ(z), which is
precomputed:

P (−k ≤ Z ≤ k) = Φ(k) − Φ(−k)

Using standard normal tables:

Φ(1) − Φ(−1) = 0.8413 − 0.1587 = 0.6827 (~68%)
Φ(2) − Φ(−2) = 0.9772 − 0.0228 = 0.9545 (~95%)
Φ(3) − Φ(−3) = 0.99865 − 0.00135 = 0.9973 (~99.7%)
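The table values above can also be reproduced in code: the standard normal CDF can be written in terms of the error function as Φ(z) = ½(1 + erf(z/√2)), which is available in Python's standard library:

```python
from math import erf, sqrt

def Phi(z):
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

for k in (1, 2, 3):
    print(f"Phi({k}) - Phi({-k}) = {Phi(k) - Phi(-k):.4f}")
# Phi(1) - Phi(-1) = 0.6827
# Phi(2) - Phi(-2) = 0.9545
# Phi(3) - Phi(-3) = 0.9973
```

This is exactly what statistical libraries (and printed z-tables) precompute, which is why no manual integration is needed in practice.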

4️⃣ Real-Life Example & Verification


Would you like me to generate a dataset and plot how real data follows this rule? I can
simulate normally distributed data and verify the percentages within 1, 2, and 3 standard
deviations. 🚀
yess

python

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Generate a large sample of normally distributed data (mean=100, std=15)
mean, std, size = 100, 15, 10000
data = np.random.normal(mean, std, size)

# Compute the standard deviation ranges
one_std_range = (mean - std, mean + std)
two_std_range = (mean - 2*std, mean + 2*std)
three_std_range = (mean - 3*std, mean + 3*std)

# Count percentages within each range
within_one_std = np.mean((data >= one_std_range[0]) & (data <= one_std_range[1])) * 100
within_two_std = np.mean((data >= two_std_range[0]) & (data <= two_std_range[1])) * 100
within_three_std = np.mean((data >= three_std_range[0]) & (data <= three_std_range[1])) * 100

# Plot histogram with KDE
plt.figure(figsize=(10, 5))
sns.histplot(data, bins=50, kde=True, color='blue', edgecolor='black', alpha=0.6)

# Highlight standard deviation regions
plt.axvline(one_std_range[0], color='red', linestyle='dashed', label=f'±1σ ({within_one_std:.2f}%)')
plt.axvline(one_std_range[1], color='red', linestyle='dashed')
plt.axvline(two_std_range[0], color='orange', linestyle='dashed', label=f'±2σ ({within_two_std:.2f}%)')
plt.axvline(two_std_range[1], color='orange', linestyle='dashed')
plt.axvline(three_std_range[0], color='green', linestyle='dashed', label=f'±3σ ({within_three_std:.2f}%)')
plt.axvline(three_std_range[1], color='green', linestyle='dashed')

# Labels and legend
plt.title("Normal Distribution & Standard Deviation Ranges")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.legend()
plt.show()

# Return computed percentages
within_one_std, within_two_std, within_three_std


Result

(67.86999999999999, 95.56, 99.76)

Here’s the histogram of the normally distributed data along with the standard deviation
ranges. The computed percentages of data within each range are:

1 standard deviation (±1σ): ≈ 67.87% (Expected: 68.27%)

2 standard deviations (±2σ): ≈ 95.56% (Expected: 95.45%)

3 standard deviations (±3σ): ≈ 99.76% (Expected: 99.73%)

These values closely match the theoretical Empirical Rule, confirming that a normal
distribution follows these probabilities in real-world data. 🚀​
