Central Limit Theorem

Copyright
© All Rights Reserved

Lecture on Central Limit Theorem (CLT)

Introduction

The Central Limit Theorem (CLT) is one of the most fundamental results in probability theory
and statistics. It tells us that, under certain conditions, the distribution of the sum (or average)
of a large number of independent and identically distributed (i.i.d.) random variables with finite
mean and variance approaches a normal distribution, regardless of the shape of the original
distribution of the variables.

In simpler terms: No matter the shape of the original population distribution, the
distribution of the sample mean will tend to become more normal (bell-shaped) as the
sample size increases, as long as the random variables are independent and identically
distributed.

Formal Statement of the CLT

Let \(X_1, X_2, \dots, X_n\) be a random sample of size \(n\) from a population with mean
\(\mu\) and standard deviation \(\sigma\), where:

• \(\mu\) is the population mean
• \(\sigma\) is the population standard deviation

The sample mean is:

\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]

The Central Limit Theorem states that as \(n\) becomes large:

\[ \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \to N(0, 1) \]

This means that the distribution of the sample mean approaches a normal distribution with
mean \(\mu\) and standard deviation \(\frac{\sigma}{\sqrt{n}}\) (the standard error of the mean).
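The statement above can be checked empirically. The following is a minimal simulation sketch, not part of the original lecture: it assumes an exponential population with mean 1 and standard deviation 1 (chosen only because it is clearly skewed), and the function name `standardized_means`, the seed, and the trial count are illustrative choices.

```python
import random
import statistics


def standardized_means(n, trials, seed=0):
    """Draw `trials` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return (x_bar - mu) / (sigma / sqrt(n))
    for each sample."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    se = sigma / n ** 0.5
    return [
        (statistics.fmean(rng.expovariate(1.0) for _ in range(n)) - mu) / se
        for _ in range(trials)
    ]


z = standardized_means(n=50, trials=5000)

# If the CLT approximation holds, these should be close to 0 and 1.
print(round(statistics.fmean(z), 2), round(statistics.stdev(z), 2))

# Fraction of standardized means within one unit of zero;
# a standard normal gives about 0.68.
within_one = sum(abs(v) <= 1 for v in z) / len(z)
print(round(within_one, 2))
```

Rerunning with a smaller `n` (say 5) should show a visibly worse match, which is the point of the sample-size condition.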

Key Points of the CLT:


1. Sample Size: The larger the sample size \(n\), the closer the sampling distribution of the
sample mean is to a normal distribution, even if the original data are not normally
distributed.
2. Independence: The sampled observations must be independent of each other.
3. Identically Distributed: The random variables must come from the same distribution
(i.e., they must be i.i.d. random variables).
4. Approximation: For small sample sizes (a common rule of thumb is \(n < 30\)), the normal
approximation may be poor unless the original population is already approximately normal.
As the sample size increases, the approximation becomes more accurate.
5. Practical Use: The CLT is widely used in statistical inference, including hypothesis
testing and confidence intervals.

Example to Illustrate CLT

Suppose you are measuring the height of adult women in a certain country. The population of
heights is not normally distributed (perhaps it's skewed). However, by taking a sample of 50
women and calculating the sample mean, you can use the CLT to assert that the sampling
distribution of the sample mean will be approximately normal, with mean \(\mu\) (the population
mean) and standard deviation \(\frac{\sigma}{\sqrt{50}}\), where \(\sigma\) is the
population standard deviation.

Solved Problems on Central Limit Theorem

Here are 10 problems with solutions that demonstrate the application of the Central Limit
Theorem:

Problem 1: Basic CLT application

The average height of women in a certain country is 160 cm with a standard deviation of 10 cm.
A random sample of 36 women is selected. Find the probability that the sample mean height is
greater than 162 cm.

Solution:

1. The population mean is \(\mu = 160\) cm, and the population standard deviation is
\(\sigma = 10\) cm.
2. The sample size is \(n = 36\).
3. The sampling distribution of the sample mean \(\bar{X}\) will have a mean
\(\mu_{\bar{X}} = \mu = 160\) cm, and a standard deviation (standard error) of
\(\frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{36}} = \frac{10}{6} \approx 1.67\) cm.

We need to find the probability that \(\bar{X} > 162\). First, we standardize the
value using the z-score formula:

\[ z = \frac{162 - 160}{1.67} = \frac{2}{1.67} \approx 1.20 \]

Using the standard normal distribution table, the probability of \(z > 1.20\) is
approximately 0.1151. Therefore, the probability that the sample mean is greater than 162 cm is
0.1151 or 11.51%.
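The table lookup can be reproduced in code. This is a small sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_mean_greater` is an illustrative choice, and it assumes the normal approximation from the CLT applies.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_greater(mu, sigma, n, threshold):
    """P(sample mean > threshold) under the CLT normal approximation."""
    se = sigma / sqrt(n)  # standard error of the mean
    return 1.0 - NormalDist(mu, se).cdf(threshold)


p = prob_mean_greater(mu=160, sigma=10, n=36, threshold=162)
print(round(p, 4))
```

The same helper applies to any "sample mean exceeds a value" question by changing the four arguments.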

Problem 2: Applying the CLT to a non-normal distribution

The population distribution of incomes in a region is positively skewed with a mean income of
$40,000 and a standard deviation of $6,000. If you take a random sample of 50 households, what
is the probability that the average income for the sample is between $39,000 and $41,000?

Solution:

1. The population mean is \(\mu = 40,000\), and the population standard deviation is
\(\sigma = 6,000\).
2. The sample size is \(n = 50\).
3. The sampling distribution of the sample mean \(\bar{X}\) will have:
   o Mean \(\mu_{\bar{X}} = 40,000\)
   o Standard error \(\frac{\sigma}{\sqrt{n}} = \frac{6,000}{\sqrt{50}} \approx 848.53\)

We need to find the probability that \(39,000 \leq \bar{X} \leq 41,000\). First, we calculate
the z-scores for 39,000 and 41,000:

\[ z_1 = \frac{39,000 - 40,000}{848.53} = \frac{-1,000}{848.53} \approx -1.18 \]

\[ z_2 = \frac{41,000 - 40,000}{848.53} = \frac{1,000}{848.53} \approx 1.18 \]

Now, using the standard normal distribution table, the cumulative probability corresponding to
\(z_1 = -1.18\) is approximately 0.1190, and the cumulative probability corresponding to
\(z_2 = 1.18\) is approximately 0.8810.

Thus, the probability that the sample mean is between $39,000 and $41,000 is:

\[ P(39,000 \leq \bar{X} \leq 41,000) = P(Z \leq z_2) - P(Z \leq z_1) = 0.8810 - 0.1190 = 0.7620 \]

So, the probability is 0.7620 or 76.20%.
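A two-sided probability like this one is the difference of two cumulative probabilities, which can be sketched as follows. The helper name `prob_mean_between` is hypothetical. Because the code does not round the z-scores to two decimals, its result can differ from a table-based answer in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(mu, sigma, n, low, high):
    """P(low <= sample mean <= high) under the CLT normal approximation."""
    dist = NormalDist(mu, sigma / sqrt(n))  # sampling distribution of the mean
    return dist.cdf(high) - dist.cdf(low)


se = 6000 / sqrt(50)
p = prob_mean_between(mu=40_000, sigma=6_000, n=50, low=39_000, high=41_000)
print(round(se, 2), round(p, 4))
```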

Problem 3: Finding sample size with desired margin of error

A researcher wants to estimate the average age of employees in a large company. The population
standard deviation is known to be 5 years. If the researcher wants the margin of error to be no
more than 1 year, what sample size should be used to achieve this?

Solution:

The formula for the margin of error (ME) when estimating the population mean is:

\[ ME = z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]

Where:

• \(z_{\alpha/2}\) is the critical value corresponding to the desired confidence level
(e.g., for a 95% confidence level, \(z_{\alpha/2} = 1.96\)),
• \(\sigma\) is the population standard deviation,
• \(n\) is the sample size.

The problem does not state a confidence level, so we assume 95%. We then have:

• \(\sigma = 5\),
• \(ME = 1\),
• \(z_{\alpha/2} = 1.96\).

Rearranging the formula to solve for \(n\):

\[ 1 = 1.96 \times \frac{5}{\sqrt{n}} \]

\[ \sqrt{n} = \frac{1.96 \times 5}{1} = 9.8 \]

\[ n = 9.8^2 = 96.04 \]

Since the sample size must be an integer, and rounding down would give a margin of error slightly
above 1 year, we round up to \(n = 97\). Therefore, the required sample size is 97.
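The rearranged formula is easy to wrap in a function. The sketch below assumes a 95% confidence level by default, as in the solution; `required_sample_size` is an illustrative name, and the critical value is computed with `NormalDist().inv_cdf` rather than read from a table.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (known sigma)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. about 1.96 for 95%
    return ceil((z * sigma / margin) ** 2)  # always round up


n = required_sample_size(sigma=5, margin=1)
print(n)
```

Halving the margin of error roughly quadruples the required sample size, since \(n\) scales with \(1 / ME^2\).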

Problem 4: CLT and the shape of the distribution


You are sampling from a population where the data follows a uniform distribution between 0 and
10. If you take a sample of size 25, what is the distribution of the sample mean?

Solution:

The Central Limit Theorem tells us that regardless of the population's distribution (in this case,
uniform), the distribution of the sample mean will approach a normal distribution as the sample
size increases.

• The population mean is \(\mu = \frac{0 + 10}{2} = 5\).
• The population standard deviation is \(\sigma = \frac{b - a}{\sqrt{12}} = \frac{10}{\sqrt{12}} \approx 2.887\).
• The sample size is \(n = 25\).

The sampling distribution of the sample mean \(\bar{X}\) will be approximately normal with:

• Mean \(\mu_{\bar{X}} = 5\),
• Standard error \(\frac{\sigma}{\sqrt{n}} = \frac{2.887}{\sqrt{25}} \approx 0.577\).

Thus, the distribution of the sample mean is approximately normal with mean 5 and standard
deviation 0.577.
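These two numbers can be checked by simulation. The following sketch (seed and number of repetitions are arbitrary choices, not from the lecture) draws many samples of size 25 from a uniform distribution on [0, 10] and compares the spread of the sample means with the theoretical standard error.

```python
import random
import statistics
from math import sqrt

rng = random.Random(42)
n, trials = 25, 20_000

# One sample mean per trial, each from 25 uniform(0, 10) draws.
means = [
    statistics.fmean(rng.uniform(0, 10) for _ in range(n))
    for _ in range(trials)
]

theoretical_se = (10 - 0) / sqrt(12) / sqrt(n)  # (b - a) / sqrt(12) / sqrt(n)
observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)

print(round(theoretical_se, 3))
print(round(observed_mean, 2), round(observed_sd, 2))
```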

Problem 5: CLT and large sample sizes

The mean daily number of customers at a restaurant is 80, and the standard deviation is 20. What
is the probability that the sample mean of a sample of 100 days is greater than 82?

Solution:

1. The population mean is \(\mu = 80\), and the population standard deviation is
\(\sigma = 20\).
2. The sample size is \(n = 100\).
3. The sampling distribution of the sample mean \(\bar{X}\) will have:
   o Mean \(\mu_{\bar{X}} = 80\),
   o Standard error \(\frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{100}} = 2\).

We need to find the probability that \(\bar{X} > 82\). First, calculate the z-score:

\[ z = \frac{82 - 80}{2} = 1 \]

The probability of \(z > 1\) is approximately 0.1587. Therefore, the probability that the
sample mean is greater than 82 is 0.1587 or 15.87%.

Problem 6: Confidence Interval Using the CLT

Suppose the heights of adult males in a city are normally distributed with a mean of 70
inches and a standard deviation of 4 inches. A random sample of 64 adult males is selected.
Construct a 95% confidence interval for the sample mean.

Solution:

We are given:

• Population mean \(\mu = 70\),
• Population standard deviation \(\sigma = 4\),
• Sample size \(n = 64\),
• Desired confidence level = 95%.

The standard error (SE) of the sample mean is:

\[ SE = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{64}} = \frac{4}{8} = 0.5 \]

For a 95% confidence level, the critical value \(z_{\alpha/2}\) is 1.96 (for a normal
distribution).

Now, the margin of error (ME) is:

\[ ME = z_{\alpha/2} \times SE = 1.96 \times 0.5 = 0.98 \]

The 95% confidence interval for the sample mean is:

\[ \mu_{\bar{X}} \pm ME = 70 \pm 0.98 \]

Thus, the confidence interval is:

\[ [70 - 0.98, 70 + 0.98] = [69.02, 70.98] \]

So, the 95% confidence interval for the sample mean is (69.02, 70.98) inches. Because \(\mu\) is
known here, this is strictly the interval within which the sample mean falls with 95%
probability, rather than a confidence interval for an unknown parameter.
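The interval above is the central 95% region of the sampling distribution of the mean. A short sketch of that computation follows; the function name `central_interval` is an assumption for illustration, and the critical value comes from `NormalDist().inv_cdf` instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def central_interval(mu, sigma, n, level=0.95):
    """Interval centred on mu that contains the sample mean with
    probability `level`, assuming a normal sampling distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    me = z * sigma / sqrt(n)                   # margin of error
    return mu - me, mu + me


low, high = central_interval(mu=70, sigma=4, n=64)
print(round(low, 2), round(high, 2))
```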

Problem 7: Large Sample Size and Normal Approximation


A population of students has an average score of 85 on a standardized test with a standard
deviation of 12. If you select a sample of 250 students, what is the probability that the sample
mean will be between 83 and 87?

Solution:

We are given:

• Population mean \(\mu = 85\),
• Population standard deviation \(\sigma = 12\),
• Sample size \(n = 250\).

The sampling distribution of the sample mean \(\bar{X}\) will have:

• Mean \(\mu_{\bar{X}} = 85\),
• Standard error \(SE = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{250}} \approx 0.759\).

We need to find the probability that the sample mean \(\bar{X}\) is between 83 and 87. First,
we standardize the values using the z-score formula:

For \(\bar{X} = 83\):

\[ z_1 = \frac{83 - 85}{0.759} \approx -2.64 \]

For \(\bar{X} = 87\):

\[ z_2 = \frac{87 - 85}{0.759} \approx 2.64 \]

Using the standard normal distribution table:

• The cumulative probability at \(z_1 = -2.64\) is approximately 0.0041,
• The cumulative probability at \(z_2 = 2.64\) is approximately 0.9959.

Thus, the probability that \(83 \leq \bar{X} \leq 87\) is:

\[ P(83 \leq \bar{X} \leq 87) = P(Z \leq z_2) - P(Z \leq z_1) = 0.9959 - 0.0041 = 0.9918 \]

So, the probability is 0.9918 or 99.18%.

Problem 8: Sampling Distribution with Finite Population


The average monthly expenditure on groceries for a family in a town is $500 with a standard
deviation of $80. If you select a sample of 36 families, calculate the probability that the sample
mean will be less than $490. Assume that the population size is large enough that the finite
population correction factor can be ignored.

Solution:

We are given:

• Population mean \(\mu = 500\),
• Population standard deviation \(\sigma = 80\),
• Sample size \(n = 36\).

The sampling distribution of the sample mean \(\bar{X}\) will have:

• Mean \(\mu_{\bar{X}} = 500\),
• Standard error \(SE = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{36}} = \frac{80}{6} \approx 13.33\).

We need to find the probability that the sample mean \(\bar{X}\) is less than $490. First,
calculate the z-score for \(\bar{X} = 490\):

\[ z = \frac{490 - 500}{13.33} = \frac{-10}{13.33} \approx -0.75 \]

Using the standard normal distribution table, the probability of \(z < -0.75\) is
approximately 0.2266.

Thus, the probability that the sample mean is less than $490 is 0.2266 or 22.66%.

Problem 9: CLT and Non-Normal Distribution

A factory produces light bulbs, and the lifetime of each bulb is measured. The lifetime of the
bulbs follows an exponential distribution with a mean of 1,000 hours and a standard deviation of
1,000 hours. If you select a random sample of 36 bulbs, what is the probability that the average
lifetime of the bulbs in the sample exceeds 1,050 hours?

Solution:

We are given:

• Population mean \(\mu = 1000\),
• Population standard deviation \(\sigma = 1000\),
• Sample size \(n = 36\).

The sampling distribution of the sample mean \(\bar{X}\) will have:

• Mean \(\mu_{\bar{X}} = 1000\),
• Standard error \(SE = \frac{\sigma}{\sqrt{n}} = \frac{1000}{\sqrt{36}} \approx 166.67\).

We need to find the probability that \(\bar{X} > 1050\). First, calculate the z-score
for \(\bar{X} = 1050\):

\[ z = \frac{1050 - 1000}{166.67} = \frac{50}{166.67} \approx 0.30 \]

Using the standard normal distribution table, the probability of \(z > 0.30\) is
approximately 0.3821.

Thus, the probability that the sample mean exceeds 1,050 hours is 0.3821 or 38.21%.
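Since the exponential distribution is strongly skewed, it is worth seeing how close the normal approximation is at \(n = 36\). The sketch below is an addition to the lecture, not part of the original solution. It relies on a standard identity: the sum of \(k\) independent exponential variables with mean 1 exceeds \(x\) with the same probability that a Poisson variable with mean \(x\) is at most \(k - 1\). The function name `erlang_tail` is illustrative.

```python
from math import exp, sqrt
from statistics import NormalDist


def erlang_tail(k, x):
    """P(sum of k independent Exp(mean 1) variables > x),
    computed as P(Poisson(x) <= k - 1)."""
    term = exp(-x)  # Poisson probability of 0 events
    total = term
    for i in range(1, k):
        term *= x / i  # Poisson probability of i events
        total += term
    return total


n, mean, threshold = 36, 1000.0, 1050.0

# Sample mean > 1050 hours is the same event as sum > 36 * 1.05 mean-lifetimes.
exact = erlang_tail(n, n * threshold / mean)
approx = 1.0 - NormalDist(mean, mean / sqrt(n)).cdf(threshold)

print(round(exact, 3), round(approx, 3))
```

The exact value comes out a little below the normal approximation, which illustrates that for a skewed population the CLT answer at moderate \(n\) is an approximation rather than an exact probability.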

Problem 10: CLT and Confidence Interval for Proportions

A random sample of 400 voters is selected, and 60% of the sample is found to support a
particular candidate. Construct a 99% confidence interval for the proportion of voters in the
entire population who support this candidate.

Solution:

We are given:

• Sample proportion \(\hat{p} = 0.60\),
• Sample size \(n = 400\),
• Confidence level = 99%.

For a 99% confidence level, the critical value is \(z_{\alpha/2} = 2.576\).

The standard error for the proportion is:

\[ SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.60(1 - 0.60)}{400}} = \sqrt{\frac{0.24}{400}} = \sqrt{0.0006} \approx 0.0245 \]

The margin of error (ME) is:

\[ ME = z_{\alpha/2} \times SE = 2.576 \times 0.0245 \approx 0.0631 \]

Thus, the 99% confidence interval for the population proportion is:

\[ \hat{p} \pm ME = 0.60 \pm 0.0631 \]

So, the confidence interval is:

\[ [0.5369, 0.6631] \]

Thus, the 99% confidence interval for the proportion of voters supporting the candidate is
(0.5369, 0.6631).
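The interval can be recomputed directly from \(\hat{p}\), \(n\), and the confidence level. This is a minimal sketch of the normal-approximation interval used in the solution; `proportion_ci` is a hypothetical helper name, and the critical value is obtained from `NormalDist().inv_cdf` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, level):
    """Normal-approximation confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 2.576 for 99%
    se = sqrt(p_hat * (1 - p_hat) / n)         # standard error of p_hat
    me = z * se                                # margin of error
    return p_hat - me, p_hat + me


low, high = proportion_ci(p_hat=0.60, n=400, level=0.99)
print(round(low, 4), round(high, 4))
```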

Conclusion

The Central Limit Theorem is a powerful tool that allows us to make inferences about
population parameters based on sample data, even when the underlying population distribution is
not normal. It tells us that, under certain conditions (large sample size, independence, and
identical distribution), the sampling distribution of the sample mean is approximately normal,
allowing us to calculate probabilities and construct confidence intervals for population
parameters. The key takeaway is that the CLT is essential in statistical practice, especially
when dealing with large samples.
