
Balaji Statistics With R-Package Central Limit Theorem (CLT) : Solved Example

The central limit theorem states that the sampling distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution. According to the CLT, the mean of sample means equals the population mean, and the standard deviation of sample means equals the population standard deviation divided by the square root of the sample size. As a rule of thumb, the normal approximation is considered adequate for sample sizes of 30 or more. The CLT allows statisticians to use normal distributions to analyze sample means even if the underlying population is not normally distributed.

Uploaded by

Ashutosh Yadav

Balaji

Statistics with R-package


Module 5

Central Limit Theorem [CLT]


The Central Limit Theorem states that the sampling distribution of the sample mean approaches a
normal distribution as the sample size gets larger, no matter what the shape of the data
distribution. An essential component of the Central Limit Theorem is that the average of the sample
means will be the population mean.
Similarly, if you find the standard deviation of all of the sample means and multiply it by the
square root of the sample size, you will find the actual standard deviation for your population.

• The mean of the sample means is the same as the mean of the population, μx.
• The standard deviation of the sample means is equal to the standard deviation σx of the
population divided by the square root of the sample size n.
As a rule of thumb, the central limit theorem is applied for sufficiently large sample sizes (n ≥ 30). The formula for
the central limit theorem can be stated as follows:

μȲ = μx
and
σȲ = σx/√n
Where,
μx = Population mean
σx = Population standard deviation
Ȳ = Sample mean
μȲ = Mean of the sample means
σȲ = Standard deviation of the sample means (standard error, SE)
n = Sample size

Solved Example
Question: The record of weights of the male population follows the normal distribution. Its mean
and standard deviation are 70 kg and 15 kg respectively. If a researcher considers the records
of 50 males, then what would be the mean and standard deviation of the sample mean?

Solution:
Mean of the population μx = 70 kg
Standard deviation of the population σx = 15 kg
Sample size n = 50
Mean of the sample mean is given by:
μȲ = μx = 70 kg
Standard deviation of the sample mean is given by:
σȲ = σx/√n
σȲ = 15/√50 = SE
σȲ = 2.121 ≈ 2.1 kg = SE
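The arithmetic of the solved example can be checked with a few lines of R. This is a minimal sketch; the variable names mu_x, sigma_x, mean_of_sample_mean, and se are chosen here for illustration and do not come from the notes.

```r
# Values from the solved example
mu_x    <- 70   # population mean (kg)
sigma_x <- 15   # population standard deviation (kg)
n       <- 50   # sample size

mean_of_sample_mean <- mu_x     # mean of the sample means
se <- sigma_x / sqrt(n)         # standard error of the sample mean

mean_of_sample_mean
round(se, 3)
```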

The demonstration of CLT


Section-A
# SAMPLING DISTRIBUTION: SAMPLE MEAN
# Consider a normal population of 1000 students
# who appear for a written test of 100 marks:
# X1, X2, ..., X1000. The examiner says the mean is 62
# with sd = 12. Generate random samples
# of 40 students each.
pdata <- rnorm(1000, 62, 12)
hist(pdata)

abline(v = 62, col = "red")
iter <- 500   # number of samples
n <- 40       # size of each sample
am <- rep(NA, iter)
for (i in 1:iter) {
  d <- sample(pdata, n)   # draw one sample without replacement
  am[i] <- mean(d)        # store its arithmetic mean
}
hist(am, col = "red")
n1 <- length(am)
n1
# AM OF ALL 500 SAMPLE MEANS
sam <- mean(am)
sam
hist(am)
abline(v = sam, col = "green")
# SD OF 500 SAMPLE MEANS
ssd <- sd(am)
ssd
# IT IS KNOWN AS THE SE OF THE SAMPLE MEAN
# SE = sigma/sqrt(n)
se <- 12/sqrt(n)
se
# Conclusion: the probability distribution of the
# sample mean is approximately ND[MU, SE].
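To see how the standard error shrinks as the sample size grows, the experiment above can be repeated for several values of n. The sketch below is only an illustration: it draws each sample directly with rnorm() instead of from pdata, and the sample sizes 10, 40, and 160 and the seed are arbitrary choices that do not appear in the notes.

```r
set.seed(42)                  # arbitrary seed, only for reproducibility
iter  <- 500                  # number of samples per sample size
sizes <- c(10, 40, 160)       # sample sizes to compare (illustrative)

# SD of the simulated sample means for each sample size
ssd <- sapply(sizes, function(n) {
  sd(replicate(iter, mean(rnorm(n, mean = 62, sd = 12))))
})
se <- 12 / sqrt(sizes)        # theoretical SE = sigma/sqrt(n)

data.frame(n = sizes, ssd = ssd, se = se)
```

Each fourfold increase in n should roughly halve both ssd and se.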
Section-B

# Sampling distribution of the sample mean for a non-normal population

data(trees)
names(trees)
x <- trees$Girth
x
mu <- mean(x)
mu
sigma <- sd(x)
sigma
hist(x)
# x data is not ND
n <- 20       # size of each sample
iter <- 300   # number of samples

am <- rep(NA, iter)
for (i in 1:iter) {
  d <- sample(x, n)   # draw one sample without replacement
  am[i] <- mean(d)    # store its arithmetic mean
}
hist(am, col = "red")
n1 <- length(am)
n1
# AM OF ALL 300 SAMPLE MEANS
sam <- mean(am)
sam
hist(am, col = "green")
abline(v = sam, col = "red")
# SD OF 300 SAMPLE MEANS
ssd <- sd(am)
ssd
# IT IS KNOWN AS THE SE OF THE SAMPLE MEAN
# SE = sigma/sqrt(n)
se <- sigma/sqrt(n)
se
# Conclusion: the probability distribution of the
# sample mean is approximately ND[MU, SE].
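In Section-B the samples are drawn without replacement from only 31 girth values, so ssd will typically come out noticeably smaller than sigma/sqrt(n). One way to reconcile the two is the finite population correction sqrt(1 - n/N). The sketch below is an illustration under that assumption: the correction is standard sampling theory rather than something stated in these notes, and the names N, se_naive, se_fpc, and the seed are introduced here.

```r
set.seed(123)                 # arbitrary seed, only for reproducibility
x <- trees$Girth              # treated as a finite population of N values
N <- length(x)
n <- 20
iter <- 300

# Means of samples drawn without replacement, as in Section-B
am  <- replicate(iter, mean(sample(x, n)))
ssd <- sd(am)

se_naive <- sd(x) / sqrt(n)              # SE = sigma/sqrt(n)
se_fpc   <- se_naive * sqrt(1 - n / N)   # with finite population correction

c(ssd = ssd, se_naive = se_naive, se_fpc = se_fpc)
```

Sampling with replacement, sample(x, n, replace = TRUE), is another way to bring ssd close to the uncorrected formula.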
Testing of hypothesis on Population mean[s].
General idea:- [i] Null hypothesis H0
[ii] Alternate hypothesis H1
[iii] Level of significance α = P[reject H0 under
the assumption that it is true]
[iv] Critical/Rejection region for the given α
[v] Critical value and then Decision rule.

[i] #No sample data available: [CV Approach]

One sample µ=µ0

Ex:-1 An inventor has developed a new,
energy-efficient lawn mower engine. He claims
that the engine will run continuously for 5 hours
(300 minutes) on a single gallon of regular
gasoline. From his stock of 2000 engines, the
inventor selects a simple random sample of 50
engines for testing. The engines run for an
average of 295 minutes, with a standard
deviation of 20 minutes. Test the null hypothesis
that the mean run time is 300 minutes against
the alternative hypothesis that the mean run
time is not 300 minutes. Use a 0.05 level of
significance. (Assume that run times for the
population of engines are normally distributed.)
[a] X = Run time per engine in minutes.
The 2000 values in the population are unknown.

H0: µx=300 H1: µx≠300

µx=300, µx>300 and µx<300 are the possibilities.
Only one is true. Finding the truth with the
help of a random sample is called Testing of
hypothesis.
[b] GIVEN l.o.s = α = 0.05 = P[reject H0 under
the assumption that it is true]
Two sided test
[c] Z=(Ȳ-µ0)/se, where se=sd/sqrt(n).
The SND is used for the critical values.
The conclusion will be: accept H0 if Z lies in the acceptance
region [1-α]; reject H0 if Z lies in the rejection region [α].
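For Ex:-1 the test statistic and the two-sided critical value can be worked out as follows. This is a sketch that uses only the summary figures given in the example; the variable names are chosen here for illustration, and qnorm() supplies the SND critical value.

```r
# Summary figures from Ex:-1
mu0   <- 300    # hypothesised mean run time (minutes)
xbar  <- 295    # sample mean
s     <- 20     # sample standard deviation
n     <- 50     # sample size
alpha <- 0.05   # level of significance

se <- s / sqrt(n)              # standard error of the sample mean
z  <- (xbar - mu0) / se        # test statistic
cv <- qnorm(1 - alpha / 2)     # two-sided critical value from the SND

reject <- abs(z) > cv          # TRUE means Z lies in the rejection region
c(z = z, cv = cv)
reject
```

Z is about -1.77 and the critical values are about ±1.96, so Z lies in the acceptance region and H0: µx=300 is not rejected at the 0.05 level.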

EX:-2 Bon Air Elementary School has
1000 students. The principal of the school
thinks that the average IQ of students at
Bon Air is at least 110. To prove her point,
she administers an IQ test to 20 randomly
selected students. Among the sampled
students, the average IQ is 108 with a
standard deviation of 10. Based on these
results, should the principal accept or reject
her original hypothesis? Assume a
significance level of 0.05. (Assume that test
scores in the population of students are
normally distributed.)
H0: µx≥110 H1: µx<110
l.o.s = 0.05 Left sided SMALL SAMPLE T TEST
t=(Ȳ-µ0)/(sd/sqrt(n)); the t-distribution with n-1 degrees of freedom is used for the cv because n<30
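The left-sided t-test of EX:-2 can be carried out from the summary figures alone. A minimal sketch follows; the variable names are illustrative, and qt() and pt() give the critical value and the left-tail p-value of the t-distribution with n-1 degrees of freedom.

```r
# Summary figures from EX:-2
mu0   <- 110    # hypothesised mean IQ
xbar  <- 108    # sample mean
s     <- 10     # sample standard deviation
n     <- 20     # sample size
alpha <- 0.05   # level of significance

se      <- s / sqrt(n)               # standard error of the sample mean
t_stat  <- (xbar - mu0) / se         # test statistic
cv      <- qt(alpha, df = n - 1)     # left-sided critical value
p_value <- pt(t_stat, df = n - 1)    # left-tail p-value

reject <- t_stat < cv                # TRUE means t lies in the rejection region
c(t = t_stat, cv = cv, p = p_value)
reject
```

t is about -0.89, which is greater than the critical value of about -1.73, so H0: µx≥110 is not rejected.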

[ii] #Direct sample data available: [P-value Approach]

data(sleep)
names(sleep)
x <- sleep$extra
y <- sleep$group
z <- sleep$ID

# One-sample t-test of H0: µ=1.5
t.test(x, mu = 1.5)
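In the P-value approach the decision is made by comparing the p-value reported by t.test() with α. A minimal sketch, assuming α = 0.05 (the notes do not state a level for this example); res, alpha, and reject are names introduced here.

```r
data(sleep)
x <- sleep$extra
alpha <- 0.05                    # assumed level of significance

res <- t.test(x, mu = 1.5)       # one-sample t-test of H0: mu = 1.5
res$statistic                    # t value
res$parameter                    # degrees of freedom
res$p.value                      # p-value

reject <- res$p.value < alpha    # reject H0 only if the p-value is below alpha
reject
```

The sample mean of x is 1.54, very close to 1.5, so the p-value is large and H0 is not rejected.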

[iii] #Two samples test µ1=µ2

# Directly from sample data
with(sleep, t.test(x[y == 1], x[y == 2]))
plot(x ~ y)
Ex:-3 Within a school district, students were randomly
assigned to one of two Math teachers - Mrs. Smith and Mrs. Jones.
After the assignment, Mrs. Smith had 30 students, and Mrs. Jones
had 35 students.

At the end of the year, each class took the same standardized test.
Mrs. Smith's students had an average test score of 78, with a
standard deviation of 10; and Mrs. Jones' students had an average
test score of 85, with a standard deviation of 15.

Test the hypothesis that Mrs. Smith and Mrs. Jones are equally
effective teachers. Use a 0.10 level of significance. (Assume that
student performance is approximately normal.)

H0: µ1=µ2 H1: µ1≠µ2

      Mrs. Smith   Mrs. Jones
N     30           35
AM    78           85
SD    10           15
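Because only summary figures are available for Ex:-3, the test statistic has to be computed by hand. The sketch below uses the unpooled (Welch) standard error and the Welch-Satterthwaite degrees of freedom; that choice is an assumption, since the notes do not say whether the variances are to be pooled. The variable names are illustrative.

```r
# Summary figures from Ex:-3
n1 <- 30; m1 <- 78; s1 <- 10    # Mrs. Smith
n2 <- 35; m2 <- 85; s2 <- 15    # Mrs. Jones
alpha <- 0.10                   # level of significance

v1 <- s1^2 / n1
v2 <- s2^2 / n2
se <- sqrt(v1 + v2)                     # SE of the difference of means
t_stat <- (m1 - m2) / se                # test statistic under H0: mu1 = mu2

# Welch-Satterthwaite degrees of freedom
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

cv      <- qt(1 - alpha / 2, df)        # two-sided critical value
p_value <- 2 * pt(-abs(t_stat), df)     # two-sided p-value

reject <- abs(t_stat) > cv
c(t = t_stat, df = df, cv = cv, p = p_value)
reject
```

|t| is about 2.24, which exceeds the critical value of about 1.67, so H0: µ1=µ2 is rejected at the 0.10 level.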

Ex:-4 The Acme Company has developed a new battery. The
engineer in charge claims that the new battery will operate
continuously for at least 7 minutes longer than the old battery.

To test the claim, the company selects a simple random sample of
100 new batteries and 100 old batteries. The old batteries run
continuously for 190 minutes with a standard deviation of 20
minutes; the new batteries, 200 minutes with a standard deviation
of 40 minutes.

Test the engineer's claim that the new batteries run at least 7
minutes longer than the old. Use a 0.05 level of significance.
(Assume that there are no outliers in either sample.)

H0: µ1-µ2≤-7, H1: µ1-µ2>-7

      OLD [µ1]   NEW [µ2]
N     100        100
AM    190        200
SD    20         40
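A sketch for Ex:-4 under the hypotheses as stated in the notes (H0: µ1-µ2≤-7 against H1: µ1-µ2>-7). It assumes that the SND is used for the critical value because both samples are large (n = 100), following the reasoning of Ex:-1; a t-distribution could be used instead. The variable names are illustrative.

```r
# Summary figures from Ex:-4
n1 <- 100; m1 <- 190; s1 <- 20   # old batteries (mu1)
n2 <- 100; m2 <- 200; s2 <- 40   # new batteries (mu2)
d0    <- -7                      # hypothesised difference mu1 - mu2
alpha <- 0.05                    # level of significance

se <- sqrt(s1^2 / n1 + s2^2 / n2)    # SE of the difference of means
z  <- ((m1 - m2) - d0) / se          # test statistic at the boundary of H0
cv <- qnorm(1 - alpha)               # right-sided critical value from the SND

reject <- z > cv                     # TRUE means Z lies in the rejection region
c(z = z, cv = cv)
reject
```

Z is about -0.67, well below the critical value of about 1.645, so H0 is not rejected and the data do not contradict the engineer's claim.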

****************************************

TEST OF HYPOTHESIS ON PARAMETER, MEAN

Q1. What marks do you expect for this paper out of 100?
Ans is 80.

Q2. How confident are you?
Ans is 90%.

Q3. What is the meaning of the other 10%?
Ans is the possibility that the true value is either <80 or
>80. This means the probability of not getting the target is
0.10.
Q4. To know the true mark, what is to be done?
Ans is that it is known only after the experiment, i.e. the examination.

The above discussion indicates that one of the
three possibilities, <80, >80 or =80, is true, with
probabilities .05, .05 and .90.
Q5. A researcher wishes to reach the objective with a
high probability. But while doing the research
he may possibly get a different output. This
probability will be small. Based on the available
information, one has to verify
whether the answer is Yes or No for the result on the objective. This
process of deciding the truth is statistically called Test
of hypothesis.

Notations:
1. Null hypothesis
H0: Claim on a certain parameter. In the discussion,
Mark=80. It need not be true.
2. Alternate hypothesis
H1: It is the negation of H0. It may be one sided or
two sided.
3. X1, X2, X3, …, Xn is the available information: the sample
data.
4. In test of hypothesis, the set of values for which H0 is
accepted is known as the Acceptance region. The remaining
set of values is the Rejection region in the graph of
analysis. Their probabilities under H0 are denoted by 1-α and α. In a two sided
study the three probabilities are 1-α, α/2, α/2.
This α in general is called the LEVEL OF SIGNIFICANCE
in testing of hypothesis.
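For a two sided study on the SND, the boundaries of the acceptance region follow directly from α. The sketch below uses α = 0.10, matching the probabilities .05, .90 and .05 of the marks discussion; the variable names are introduced here for illustration.

```r
alpha <- 0.10                      # level of significance from the marks discussion

lower <- qnorm(alpha / 2)          # left critical value of the SND
upper <- qnorm(1 - alpha / 2)      # right critical value of the SND

# Probabilities of the three regions: alpha/2, 1 - alpha, alpha/2
p_left   <- pnorm(lower)
p_accept <- pnorm(upper) - pnorm(lower)
p_right  <- 1 - pnorm(upper)

c(lower = lower, upper = upper)
c(p_left = p_left, p_accept = p_accept, p_right = p_right)
```

The critical values come out at about ±1.645, and the three region probabilities add up to 1.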

Z=[Variable-AM]/sd for data values.
Z=[Sample mean-AM]/SE for sample mean data.
SE=sd/sqrt[n]
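The two Z formulas differ only in the divisor. The sketch below applies both to the population of the solved example (AM = 70 kg, sd = 15 kg, n = 50); the observed value 85 and the observed sample mean 73 are hypothetical numbers picked purely for illustration.

```r
am <- 70; sdev <- 15; n <- 50        # population mean, sd, and sample size

value   <- 85                        # hypothetical single data value
z_value <- (value - am) / sdev       # Z for a data value

sample_mean <- 73                    # hypothetical sample mean
se     <- sdev / sqrt(n)             # SE = sd/sqrt(n)
z_mean <- (sample_mean - am) / se    # Z for a sample mean

c(z_value = z_value, z_mean = z_mean)
```

A deviation of 3 kg in a sample mean gives a larger Z than a deviation of 15 kg in a single value, because the SE is much smaller than the sd.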
