Chapter 4: Stats and Prob


CHAPTER 4

ESTIMATION
PREPARED BY: ALMISAR A. HAWARI

STATISTICS AND PROBABILITY


[Concept map: Population parameters are estimated using either point estimates or interval estimates. Interval estimates are constructed with a specific level of confidence and margin of error.]
Estimation and Confidence Interval

• In statistics, the process of using sample data to calculate a number that can be
used to represent an unknown population parameter is called estimation. The number
that we obtain from a sample is then called a point estimate or, simply, an
estimate of the population parameter.
Definition:

A point estimate is a single-value estimate for the unknown value of
the population parameter.
Vocabulary

• Unbiased – The expected value, or the mean, of the estimates
obtained from samples of a given size is equal to the parameter being
estimated.
• Consistent – As the sample size increases, the value of the estimator
approaches the value of the parameter being estimated.
• Relatively Efficient – When we consider the concept of efficiency,
we refer to the sampling variability of an estimator. If two competing
estimators are both unbiased, the one with the smaller variance
(for a given sample size) is said to be relatively more efficient. The smaller
the variance of the estimator, the more concentrated the distribution of
the estimator is around the parameter being estimated and, therefore,
the better this estimator is.
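These properties can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it draws repeated samples from a hypothetical normal population (mean 50, standard deviation 10, both chosen arbitrarily) and compares the sample mean with the sample median as estimators of the population mean. The function name and all parameter values are made up for this example.

```python
import random
import statistics

def simulate_estimators(mu=50.0, sigma=10.0, n=25, trials=2000, seed=0):
    """Compare the sample mean and the sample median as estimators of mu.

    All defaults are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return {
        "avg_of_means": statistics.fmean(means),        # near mu if unbiased
        "avg_of_medians": statistics.fmean(medians),    # near mu if unbiased
        "var_of_means": statistics.variance(means),     # sampling variability
        "var_of_medians": statistics.variance(medians),
    }

result = simulate_estimators()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, both averages should land close to 50 (both estimators behave as unbiased here), while the variance of the sample means should come out smaller than that of the sample medians, which is what "relatively more efficient" refers to.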
Example 1.

• In a city, 5,238 private cars are registered. The owner of Petroshell
Gasoline wants to know the proportion of these cars that use diesel. How do
you propose to find a point estimate of the population parameter?
Solution:

• First, we need to define our population of interest. This is
the 5,238 registered private cars in the city. The variable
of interest is the type of fuel, which has two possible values:
diesel or non-diesel. The parameter of interest is the
population proportion p of the 5,238 cars that use diesel.
Since it is difficult to reach every car owner to ask whether
the car uses diesel or another fuel, a sample of owners
(say, n = 30 or more) may be selected at random. From the sample, we
can get the point estimate as the number of cars using diesel,
x, divided by n; that is, the point estimator is
$\hat{p} = \frac{x}{n}$
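As a minimal sketch of this point estimator, assume a hypothetical random sample of n = 30 owners. The responses below are invented for illustration and are not data from the example; the function name is also an assumption.

```python
# Hypothetical responses from n = 30 sampled car owners:
# 1 means the car uses diesel, 0 means it uses another fuel.
responses = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0,
             0, 1, 0, 0, 1, 0, 0, 0, 1, 0,
             1, 0, 0, 0, 1, 0, 0, 1, 0, 0]

def point_estimate_proportion(sample):
    """Return p_hat = x / n, where x is the number of 1s in the sample."""
    n = len(sample)
    x = sum(sample)
    return x / n

p_hat = point_estimate_proportion(responses)
print(f"n = {len(responses)}, x = {sum(responses)}, p_hat = {p_hat:.2f}")
```

With these made-up responses, 9 of the 30 sampled cars use diesel, so the point estimate is 9/30 = 0.30.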
Example 2.
• The principal of GLJ High School wants to know the English
proficiency of all the students in his school. Instead of
administering an examination to all the students, he
decided to take a random sample of 60 students. Their
scores in a standardized English proficiency test are as
follows:

46 52 58 63 65 67 73 74 74 75 85 90
47 55 59 64 66 68 73 74 74 75 88 91
49 56 60 64 66 70 73 74 75 78 90 91
49 57 60 64 67 72 73 74 75 82 90 91
49 57 62 64 67 73 74 74 75 83 90 93
Solution

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{46 + 47 + \cdots + 91 + 93}{60} = 70.28$
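The same computation can be checked in Python. This is a small sketch that types in the 60 scores listed in Example 2, row by row, and computes the sample mean; the variable names are chosen for illustration.

```python
import statistics

# The 60 English proficiency scores from Example 2, one table row per line.
scores = [
    46, 52, 58, 63, 65, 67, 73, 74, 74, 75, 85, 90,
    47, 55, 59, 64, 66, 68, 73, 74, 74, 75, 88, 91,
    49, 56, 60, 64, 66, 70, 73, 74, 75, 78, 90, 91,
    49, 57, 60, 64, 67, 72, 73, 74, 75, 82, 90, 91,
    49, 57, 62, 64, 67, 73, 74, 74, 75, 83, 90, 93,
]

n = len(scores)                    # sample size
x_bar = statistics.fmean(scores)   # point estimate of the population mean

print(f"n = {n}, sum = {sum(scores)}, x_bar = {x_bar:.2f}")
```

The scores sum to 4,217, so the sample mean is 4,217 / 60, which rounds to 70.28.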
Definition
• A confidence interval for a population parameter is an
interval estimate with an associated level of confidence that it
contains the unknown parameter. We call (1 − α)100% the
confidence level (or degree of confidence) of the interval.

• For example, an estimate of the population mean NAT
score of Grade 6 students is a value between 70 and 82,
that is,
$70 < \mu < 82$

There is a "chance" that the actual value of $\mu$ is contained
in this interval. We assign a degree of confidence (usually
90%, 95%, or 99%) before computing such an interval
estimate.
We think of the confidence level $1 - \alpha$ as
the area under the standard normal
curve between the two critical values $-z_{\alpha/2}$ and $+z_{\alpha/2}$.

If the degree of confidence is $1 - \alpha = 0.90$, then the area of the
unshaded region is also $1 - \alpha = 0.90$. The sum of the areas of
the shaded regions is $\alpha = 0.10$, so the area in each tail is
$\alpha/2 = 0.05$. The critical value separating the left tail is
$-z_{\alpha/2} = -z_{0.05} = -1.645$,
while the critical value separating the right tail is
$z_{\alpha/2} = z_{0.05} = 1.645$.
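Instead of reading a table, the critical value can be obtained from the inverse of the standard normal cumulative distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is an assumption made for this illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    # The right critical value leaves an area of alpha/2 in the upper tail.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For 90% and 95% this agrees with the table values 1.645 and 1.96. For 99% the computed value is about 2.576, while many tables (and the examples in this chapter) use the rounded value 2.575.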
• When $\mu$ is estimated using the statistic $\bar{x}$, the
sampling error is the difference $\bar{x} - \mu$. This difference
changes from sample to sample, so our interest is to
calculate the maximum value of the error e, which is
defined as follows.

Definition:
• The margin of error e is the maximum error of estimate,
given by
$e = z_{\alpha/2} \, \sigma_{\bar{x}} = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
Factors that Affect Confidence Intervals (CI)

• Population size: this does not usually affect the CI but can be a factor if
you are working with small and known groups of people.
• Sample size: the smaller your sample, the less confident you can be
that the results reflect the true population parameter.
• Percentage: extreme answers come with better accuracy. For example,
if 99 percent of voters are for gay marriage, the chances of error are
small. However, if 49.9 percent of voters are "for" and 50.1 percent are
"against," then the chances of error are bigger.
Example 3.

• Use the data in Example 2 and a 95% confidence level to find the margin of
error e for the estimation of the mean English proficiency score of all GLJ
High School students. Assume the population standard deviation is $\sigma = 1.25$.
• Solution:
• The confidence level is 95%, so $\alpha = 0.05$.
• $z_{\alpha/2} = z_{0.025} = 1.96$; this means that 95% of the area under the standard normal curve lies between $-1.96$ and $1.96$.
• Compute:
$e = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{1.25}{\sqrt{60}} = 0.3163$
This means that the principal can be 95% confident that the sample mean differs
from the population mean by at most about 0.3163.
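The arithmetic in this example can be reproduced with a short Python sketch. The function name `margin_of_error` is an assumption for illustration; the inputs are the values stated in Example 3.

```python
import math

def margin_of_error(z, sigma, n):
    """Return e = z * sigma / sqrt(n), the margin of error for a mean."""
    return z * sigma / math.sqrt(n)

# Values from Example 3: 95% confidence (z = 1.96), sigma = 1.25, n = 60.
e = margin_of_error(z=1.96, sigma=1.25, n=60)
print(f"e = {e:.4f}")
```

Because n appears under a square root, quadrupling the sample size only halves the margin of error.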
Example 4.

https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/confidence-level/
Confidence Interval for the Population Mean
Definition

• The confidence interval for the population
mean $\mu$ with margin of error e is

$\bar{x} - e < \mu < \bar{x} + e$


Large Samples (n ≥ 30) or the Population Variance $\sigma^2$ Is Known

Step 1. Find the sample statistics n and $\bar{x}$.

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Step 2. Select a confidence level that describes the uncertainty of the sampling
method. (Usually, 90%, 95%, or 99% is used.)

Step 3. Compute the margin of error e using the critical value $z_{\alpha/2}$ and the
known population standard deviation $\sigma$. Apply the formula

$e = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

If the population standard deviation is not known but n ≥ 30, use the
sample standard deviation s to estimate $\sigma$:

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Step 4. Formulate the confidence interval as

$\bar{x} - e < \mu < \bar{x} + e$
Example 5

• A social media analyst wanted to know the average
number of Facebook friends that young people of ages 13–16
have. She took a random sample of 100 children of
ages 13–16 and found that the average
number of friends they have is $\bar{x} = 120$, with a sample
standard deviation of s = 12.4. Construct a 95% confidence
interval for the mean number of friends of all Facebook users of
ages 13–16.
Solution
$n = 100$, $\bar{x} = 120$, $s = 12.4$, $z_{\alpha/2} = 1.96$

Since n ≥ 30, we use s to estimate $\sigma$. The margin of error is

$e = z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = \frac{(1.96)(12.4)}{\sqrt{100}} = 2.4304$

Thus, a 95% confidence interval for the mean number of friends is:
$\bar{x} - e < \mu < \bar{x} + e$
$120 - 2.4304 < \mu < 120 + 2.4304$
$117.5696 < \mu < 122.4304$
The above confidence interval is interpreted as follows: with a 95% confidence level, the
mean number of Facebook friends of young people ages 13–16 is between about 117.6 and
122.4.
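The four steps for a large-sample interval can be sketched in Python as follows. The function name `mean_confidence_interval` is an assumption for illustration, and the sketch uses the sample standard deviation s in place of $\sigma$, as the example does for n ≥ 30.

```python
import math

def mean_confidence_interval(x_bar, s, n, z):
    """Return (lower, upper, e) for the interval x_bar - e < mu < x_bar + e.

    Assumes a large sample (n >= 30), so s stands in for sigma.
    """
    e = z * s / math.sqrt(n)   # Step 3: margin of error
    return x_bar - e, x_bar + e, e   # Step 4: endpoints

# Values from Example 5: n = 100, x_bar = 120, s = 12.4, 95% confidence.
lower, upper, e = mean_confidence_interval(x_bar=120, s=12.4, n=100, z=1.96)
print(f"e = {e:.4f}")
print(f"{lower:.4f} < mu < {upper:.4f}")
```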
The t-Distribution Revisited
Small Samples (n < 30) and the Population Variance Is Unknown
Step 1. Find the values of n, $\bar{x}$, and s.

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Step 2. Select a confidence level that describes the uncertainty of the sampling
method.
Step 3. Compute the margin of error e using the critical value $t_{\alpha/2}$ at df = n − 1
degrees of freedom and the computed sample standard deviation s. Apply the formula

$e = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

Step 4. Find the left and right endpoints and form the confidence interval
$\bar{x} - e < \mu < \bar{x} + e$
Example 6

• Find the critical value $t_{\alpha/2}$ for a 95% confidence level when the sample
size is n = 20.

• Solution:
• With $\alpha = 0.05$, so that $\alpha/2 = 0.025$, and using the t-distribution table at
n − 1 = 20 − 1 = 19 degrees of freedom, we obtain $t_{0.025} = 2.093$.
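The table lookup can be cross-checked numerically. Python's standard library has no t-distribution, so the sketch below is a swapped-in numerical approach rather than the table method used above: it integrates the Student's t density with Simpson's rule and finds the critical value by bisection. The function names, the step count, and the search range are all assumptions made for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using symmetry and Simpson's rule on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Return t_{alpha/2}: the value with upper-tail area (1 - confidence) / 2."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0  # assumed search range, wide enough for typical cases
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example 6: 95% confidence, n = 20, so df = 19.
t_value = t_critical(0.95, df=19)
print(f"t = {t_value:.3f}")
```

In practice a statistics package or a printed table would normally be used; the point here is only that the table entry 2.093 corresponds to an upper-tail area of 0.025 at 19 degrees of freedom.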
Confidence Interval for the Population Proportion
• Let us consider this situation again: out of the 5,238 registered
private cars, the CEO of Petroshell Gasoline wants to know the
proportion of car owners who opt to use diesel. A point
estimate for this is the proportion of successes taken from a sample in
a binomial process. By obtaining a sample from the population and
recording a score of "1" for each confirmation that says, "Yes, I am
using diesel," and "0" otherwise, we can view the sample
proportion statistic $\hat{p}$ as the average of the responses:
$\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}$
where $x_i$ is either 1 or 0 and the sample size is n. Writing
$x = \sum_{i=1}^{n} x_i$, which counts the number of "Yes" responses, we have the
point estimate $\hat{p} = \frac{x}{n}$, which can be interpreted as the success probability, so
that the probability of failure is given by $\hat{q} = 1 - \hat{p}$.
Recall

• The binomial distribution can be approximated by a
normal distribution if
$np \geq 5$ and $nq \geq 5$

When $np \geq 5$ and $nq \geq 5$, the sampling distribution of $\hat{p}$ is
approximately normal with a mean of $\mu_{\hat{p}} = p$ and a standard
error of $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$.
Definition

• The confidence interval for a population proportion p is

$\hat{p} - e < p < \hat{p} + e$
where $e = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$
Proportion with $n\hat{p} \geq 5$ and $n\hat{q} \geq 5$

Step 1. Find the values of n, x, $\hat{p}$, and $\hat{q}$, where

$x = \sum_{i=1}^{n} x_i, \quad \hat{p} = \frac{x}{n}, \quad \hat{q} = 1 - \hat{p}$.

At this point, verify that $n\hat{p} \geq 5$ and $n\hat{q} \geq 5$. Once satisfied, proceed to the next step.

Step 2. Select a confidence level that describes the uncertainty of the sampling
method.
Step 3. Compute the margin of error e using the critical value $z_{\alpha/2}$.

$e = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$

Step 4. Find the left and right endpoints and form the confidence interval.
$\hat{p} - e < p < \hat{p} + e$
Example 7.

• The vice president of a private club feels that he
is now ready to become the top official in his
club. He is interested in knowing the proportion
of club members who will vote for him in case he
runs for president. A random sample of 50 club
members yielded 36 who said they will vote for
him. Construct a 90% confidence interval for the
proportion of club members who will vote for him
as president.
Solution
• We verify that the sampling distribution of $\hat{p}$ can be approximated by a
normal distribution. With $\hat{p} = \frac{36}{50} = 0.72$ and n = 50, we have

$n\hat{p} = (50)(0.72) = 36 > 5$ and $n\hat{q} = n(1 - \hat{p}) = 50(0.28) = 14 > 5$.

For a 90% confidence interval, $\alpha = 0.10$, so we take $z_{\alpha/2} = z_{0.05} = 1.645$ and
compute

$e = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.645 \sqrt{\frac{(0.72)(0.28)}{50}} = 0.1045 \approx 0.10$

We have
$\hat{p} - e < p < \hat{p} + e$
$0.72 - 0.10 < p < 0.72 + 0.10$
$0.62 < p < 0.82$
as the 90% confidence interval for the proportion of club members who will
vote for the vice president as president of the club.
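The steps for a proportion interval, including the check that the normal approximation applies, can be sketched in Python. The function name `proportion_confidence_interval` is an assumption for illustration; the inputs are the values from Example 7.

```python
import math

def proportion_confidence_interval(x, n, z):
    """Return (p_hat, e, lower, upper) for a population proportion.

    Raises ValueError when n*p_hat < 5 or n*q_hat < 5, since the normal
    approximation described in the text would not apply.
    """
    p_hat = x / n
    q_hat = 1 - p_hat
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation not justified")
    e = z * math.sqrt(p_hat * q_hat / n)   # margin of error
    return p_hat, e, p_hat - e, p_hat + e

# Example 7: 36 of 50 members in favor, 90% confidence (z = 1.645).
p_hat, e, lower, upper = proportion_confidence_interval(x=36, n=50, z=1.645)
print(f"p_hat = {p_hat:.2f}, e = {e:.4f}")
print(f"{lower:.2f} < p < {upper:.2f}")
```

Note that this sketch keeps the unrounded margin of error when forming the endpoints, whereas the worked solution rounds e to 0.10 first; both give 0.62 and 0.82 at two decimal places.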
Sample Size
• Minimum Sample Size for the Estimation of the Population
Mean ($\mu$)
• Given a (1 − α)100% confidence level and a margin of error
e, the minimum sample size n needed to estimate $\mu$ is

$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{e} \right)^2$
Example 8.

• Clear Water wants to estimate the mean number
of gallons of alkaline water ordered from them each
day. Suppose that they want 99% confidence
that the estimate is accurate to within 3
gallons, knowing that the standard deviation is
about 8 gallons. What is the minimum
sample size that they need in order to make this
estimate?
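Solution

• For a 99% confidence level, our critical value is $z_{\alpha/2} = 2.575$.
We have e = 3 and $\sigma = 8$, so
$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{e} \right)^2 = \left( \frac{(2.575)(8)}{3} \right)^2 = 47.15$.
Since the sample size must be a whole number, we round up: the minimum sample size is 48.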
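A short Python sketch of this formula follows. The function name `min_sample_size_mean` is an assumption for illustration; rounding up with `math.ceil` reflects that a sample size must be a whole number at least as large as the computed value.

```python
import math

def min_sample_size_mean(z, sigma, e):
    """Return the smallest whole n with z * sigma / sqrt(n) <= e."""
    raw = (z * sigma / e) ** 2
    return math.ceil(raw)   # always round up, never to the nearest integer

# Example 8: 99% confidence (z = 2.575), sigma = 8 gallons, e = 3 gallons.
raw_n = (2.575 * 8 / 3) ** 2
n_min = min_sample_size_mean(z=2.575, sigma=8, e=3)
print(f"computed n = {raw_n:.2f}, minimum sample size = {n_min}")
```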
• Minimum Sample Size for the Estimation of the
Population Proportion p
• Given a (1 − α)100% confidence level and a margin
of error e, the minimum sample size n needed to
estimate p is

$n = \hat{p}\hat{q} \left( \frac{z_{\alpha/2}}{e} \right)^2$
Example 9.

• You want to estimate the proportion of students in your
school who will support your "CLAY GO" (Clean As You Go)
campaign. Suppose you want 99% confidence for this
estimate, with accuracy within 4% of the population
proportion. Find the minimum sample size needed if
a. you do not have any preliminary estimate available;
b. a preliminary estimate gives $\hat{p} = 0.90$.
Compare the results you obtain in (a) and (b).
a. You do not have any preliminary estimate available.

Because you do not have any preliminary estimate, we
use $\hat{p} = \hat{q} = 0.5$ to allow for maximum variance. Using
$z_{0.005} = 2.575$ and e = 0.04, we solve for the minimum sample
size n as follows:

$n = \hat{p}\hat{q} \left( \frac{z_{\alpha/2}}{e} \right)^2 = 0.5(0.5) \left( \frac{2.575}{0.04} \right)^2 = 1{,}036.04 \approx 1{,}037$ students
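b. A preliminary estimate gives $\hat{p} = 0.90$.

• This time, we use the estimate $\hat{p} = 0.90$, so $\hat{q} = 0.10$. The minimum
sample size is as follows:

$n = \hat{p}\hat{q} \left( \frac{z_{\alpha/2}}{e} \right)^2 = 0.9(0.1) \left( \frac{2.575}{0.04} \right)^2 = 372.97 \approx 373$ students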
Compare the results you obtain in (a) and (b).

The results from (a) and (b) tell us that, without any preliminary estimate,
the minimum sample size must be 1,037 students. With the preliminary estimate
$\hat{p} = 0.9$, the sample size needs to be only at least 373. Therefore, you will need a
larger sample if you do not have any preliminary estimate.
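Both parts of Example 9 can be reproduced with one small Python function. The name `min_sample_size_proportion` and the convention of passing `p_hat=None` when no preliminary estimate exists are assumptions made for this sketch.

```python
import math

def min_sample_size_proportion(z, e, p_hat=None):
    """Return the minimum whole n for estimating a proportion within e.

    When no preliminary estimate is available (p_hat is None), use
    p_hat = 0.5, which maximizes p_hat * q_hat and so gives the largest n.
    """
    if p_hat is None:
        p_hat = 0.5
    q_hat = 1 - p_hat
    return math.ceil(p_hat * q_hat * (z / e) ** 2)

# Example 9: 99% confidence (z = 2.575), margin of error e = 0.04.
n_a = min_sample_size_proportion(z=2.575, e=0.04)              # part (a)
n_b = min_sample_size_proportion(z=2.575, e=0.04, p_hat=0.90)  # part (b)
print(f"(a) no preliminary estimate: n = {n_a}")
print(f"(b) p_hat = 0.90:            n = {n_b}")
```

The product $\hat{p}\hat{q}$ is 0.25 in part (a) but only 0.09 in part (b), which is why the required sample size drops from 1,037 to 373.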
Activity 1. (Choose your pair and put your answer on the
yellow pad)
1. Communication
Explain the difference between each of the following pairs:
a. $z_{\alpha/2}$ and $t_{\alpha/2}$
b. Confidence interval for a proportion and confidence interval for a mean
c. Point estimate and interval estimate
2. Synthesis
Make a flowchart that shows the steps in constructing an interval estimate for
the population mean $\mu$ and for the population proportion p.
3. Research
Conduct research on the life of the inventor of the t-distribution, W. S.
Gosset. Determine the circumstances that motivated him to formulate
this distribution. Why is this distribution also referred to as Student's
t-distribution?
THANK YOU
