Statistics 1 - Session 12 - Sampling Distribution, Estimation and Confidence Interval
and
the Central Limit Theorem
STATISTICS 1
Online Business
Even Semester 2022/2023
GOALS
• Explain why a sample is the only feasible way to learn about
a population.
• Describe methods to select a sample.
• Define and construct a sampling distribution of the sample
mean.
• Explain the central limit theorem.
• Use the Central Limit Theorem to find probabilities of
selecting possible sample means from a specified
population.
Why Sample the Population?
• The physical impossibility of checking all items in the
population.
• The cost of studying all the items in a population.
• The sample results are usually adequate.
• Contacting the whole population would often be time-
consuming.
• The destructive nature of certain tests.
Probability Sampling
• A probability sample is a sample selected such that
each item or person in the population being studied
has a known likelihood of being included in the
sample.
Methods of Probability Sampling
• Simple Random Sample: A sample formulated so that each
item or person in the population has the same chance of
being included.
• Systematic Random Sampling: The items or individuals of the
population are arranged in some order. A random starting
point is selected and then every kth member of the
population is selected for the sample.
• Stratified Random Sampling: A population is first divided into
subgroups, called strata, and a sample is selected from each
stratum.
• Cluster Sampling: A population is first divided into primary
units then samples are selected from the primary units.
Methods of Probability Sampling
• In a nonprobability sample, inclusion in the sample is based on
the judgment of the person selecting the sample.
• The sampling error is the difference between a sample
statistic and its corresponding population parameter.
Methods of Probability Sampling
• In a probability sample, all members of the population have a known
chance of being selected for the sample.
• There are several probability sampling methods.
• In a simple random sample, all members of the population have the
same chance of being selected for the sample.
• In a systematic sample, a random starting point is selected, and then
every kth item is selected for the sample.
• In a stratified sample, the population is divided into several groups, or
strata, and then a sample is selected from each stratum.
• In cluster sampling, the population is divided into primary units, and
then samples are drawn from the primary units.
• In nonprobability sampling, inclusion in the sample is based on the
judgement of the person conducting the sample. Nonprobability
samples may lead to biased results.
SIMPLE RANDOM SAMPLING
• Suppose a population consists of 845 employees of Nitra
Industries. A sample of 52 employees is to be selected from
the population.
• One way of ensuring that every employee has a chance of
being chosen is to first write the name of each one on a small
slip of paper and deposit all of the slips in a box. After they
have been thoroughly mixed, the first selection is made by
drawing a slip out of the box without looking at it. This
process is repeated until the sample of 52 is chosen.
• A more convenient method of selecting a random sample is to
use the identification number of each employee and a table of
random numbers.
SIMPLE RANDOM SAMPLING
(cont.)
50525 57454 28455 68226 34656 38884 39018
72507 53380 53827 42486 54465 71819 91199
34986 74297 00144 38676 89967 98869 39744
68851 27305 03759 04723 96108 78489 18910
06738 62879 03910 17350 49169 03850 19801
11448 10734 05837 24397 10420 16712 94496
First, choose a starting point in the table. Any starting point will
do.
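The slip-of-paper and random-number-table procedures can be imitated with a pseudorandom generator. The sketch below assumes the 845 employees carry identification numbers 1 to 845; that numbering, the function name, and the seed are illustrative choices and not part of the Nitra Industries example.

```python
import random

# Assumption: employee ID numbers 1..845 stand in for the slips of paper.
POPULATION_SIZE = 845
SAMPLE_SIZE = 52


def draw_simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct IDs; every ID is equally likely to be drawn."""
    rng = random.Random(seed)
    employee_ids = range(1, population_size + 1)
    # random.sample draws without replacement, like slips taken out of the box.
    return sorted(rng.sample(employee_ids, sample_size))


sample_ids = draw_simple_random_sample(POPULATION_SIZE, SAMPLE_SIZE, seed=12)
print(sample_ids)
```

Changing or omitting the seed gives a different, equally valid sample.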
SYSTEMATIC RANDOM
SAMPLING
• Suppose the population of interest consists of 2000 invoices
located in file drawers. Drawing a simple random sample
would first require numbering the invoices from 0000 to
1999. This would be a very time-consuming task.
• Instead, a systematic random sample could be selected by
simply going through the file drawers and selecting every
20th invoice for study.
• The first invoice should be chosen using a random process. If
the 10th invoice were chosen as the starting point, the sample
would consist of the 10th, 30th, 50th, 70th, ... invoices.
• Since the first item is chosen at random, all items have the
same likelihood of being selected for the sample. Thus, it is a
probability sample.
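A minimal sketch of the invoice example, assuming the invoices are counted 1 to 2000 in drawer order. The function name and the optional `start` argument are hypothetical conveniences for illustration.

```python
import random


def systematic_sample(population_size, k, start=None, seed=None):
    """Pick every k-th item, beginning at a random position within the first k."""
    if start is None:
        start = random.Random(seed).randint(1, k)  # random starting point
    return list(range(start, population_size + 1, k))


# Reproduce the slide's case: starting at the 10th invoice, step of 20.
from_tenth = systematic_sample(2000, 20, start=10)
print(from_tenth[:4])      # first four selected positions

# A fresh draw with a random starting point.
random_start = systematic_sample(2000, 20, seed=3)
print(len(random_start))   # number of invoices selected
```

With 2000 invoices and k = 20, any starting point from 1 to 20 yields 100 invoices.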
STRATIFIED RANDOM
SAMPLING
• A population is divided into subgroups, called strata, and a
sample is selected from each stratum.
• As the name implies, a proportional sampling procedure
requires that the number of items in each stratum be in the
same proportion as in the population.
• Suppose the objective of the study is to determine whether
firms with high returns on equity (a measure of profitability)
spent more of each sales dollar on advertising than firms
with a low return.
• Assume that the 352 firms were divided into five strata and that
50 firms are to be selected for intensive study. The number sampled
from each stratum is proportional to the stratum's share of the 352 firms.
STRATIFIED RANDOM
SAMPLING
Stratum   Profitability (return on equity)   Number of firms   Number sampled
1         30% or more                          8               8/352 x 50 = 1
2         20% – 30%                           35               35/352 x 50 = 5
3         10% – 20%                          189               189/352 x 50 = 27
4         0 – 10%                            115               115/352 x 50 = 16
5         Deficit                              5               5/352 x 50 = 1
Total                                        352               50
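The proportional allocation in the table can be checked with a few lines of arithmetic. This is a sketch using the stratum counts from the table; the dictionary labels are just names chosen for illustration.

```python
# Number of firms in each profitability stratum (from the table).
strata = {
    "30% or more": 8,
    "20% - 30%": 35,
    "10% - 20%": 189,
    "0 - 10%": 115,
    "Deficit": 5,
}
total_firms = sum(strata.values())
sample_size = 50

# Each stratum gets a share of the sample equal to its share of the population.
allocation = {
    name: round(count / total_firms * sample_size)
    for name, count in strata.items()
}

for name, number_sampled in allocation.items():
    print(f"{name:12s} {number_sampled}")
print("Total", sum(allocation.values()))
```

Rounding happens to sum to exactly 50 here; in general the rounded allocations may need a small manual adjustment to hit the target sample size.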
CLUSTER SAMPLING
• It is often employed to reduce the cost of sampling a
population scattered over a large geographic area.
• Suppose, you want to determine the views of industrialists in
Texas about state and federal environmental protection
policies.
• Selecting a random sample of industrialists in Texas and
personally contacting each one would be time-consuming and
very expensive. Instead, you could employ cluster sampling by
subdividing the state into small units, either counties or
regions. These are often called primary units.
• Suppose you divided Texas into 12 primary units, then selected
4 regions at random (say 2, 7, 4, and 12) and concentrated your
efforts in these primary units. You would then take a random sample
of industrialists in each of these regions and interview them.
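The two stages of the Texas example can be sketched as follows. The sampling frame (30 industrialists per primary unit) and the choice of 5 interviews per region are invented purely for illustration; the slide does not give these numbers.

```python
import random

rng = random.Random(7)  # fixed seed only so the sketch is repeatable

# Stage 1: choose 4 of the 12 primary units at random.
primary_units = list(range(1, 13))
chosen_units = sorted(rng.sample(primary_units, 4))

# Hypothetical sampling frame: 30 industrialists listed in every unit.
frame = {
    unit: [f"unit{unit}-industrialist{i}" for i in range(1, 31)]
    for unit in primary_units
}

# Stage 2: within each chosen unit, draw a random sample of 5 to interview.
interviews = {unit: rng.sample(frame[unit], 5) for unit in chosen_units}

for unit, names in interviews.items():
    print(unit, names)
```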
SAMPLING “ERROR”
• It is unlikely that the mean of a sample would be identical
to the population mean. Likewise, the sample standard
deviation or other measure computed from a sample would
probably not be exactly equal to the corresponding
population value.
• Therefore, we can expect some difference between a sample
statistic, such as the sample mean or sample standard deviation,
and the corresponding population parameter.
• The difference between a sample statistic and a population
parameter is called sampling error.
EXAMPLE
• Jane and Joe Miley operate the Foxtrot Inn, a bed and
breakfast in Tryon, North Carolina. There are eight rooms
available for rent at this B&B. Listed below is the number of
these eight rooms that was rented each day during June
2015.
• Find the mean of the population.
• Select three random samples of five days.
• Calculate the mean of each sample and compare it to the
population mean.
• What is the sampling error in each case?
EXAMPLE
June Rentals June Rentals June Rentals
1 0 11 3 21 3
2 2 12 4 22 2
3 3 13 4 23 3
4 2 14 4 24 6
5 3 15 7 25 0
6 4 16 0 26 4
7 2 17 5 27 1
8 3 18 3 28 1
9 4 19 6 29 3
10 7 20 2 30 3
EXAMPLE
Sample 1 Sample 2 Sample 3
4 3 0
7 3 0
4 2 3
3 3 3
1 6 3
Total 19 17 9
Sample mean 3.8 3.4 1.8
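The questions for the Foxtrot Inn example can be answered directly from the two tables. The sketch below takes the 30 daily rental counts as the population and the three listed samples as given; variable names are illustrative.

```python
# Rooms rented on each of the 30 days of June (the population).
rentals = [
    0, 2, 3, 2, 3, 4, 2, 3, 4, 7,   # June 1-10
    3, 4, 4, 4, 7, 0, 5, 3, 6, 2,   # June 11-20
    3, 2, 3, 6, 0, 4, 1, 1, 3, 3,   # June 21-30
]
population_mean = sum(rentals) / len(rentals)

# The three samples of five days shown in the table.
samples = {
    "Sample 1": [4, 7, 4, 3, 1],
    "Sample 2": [3, 3, 2, 3, 6],
    "Sample 3": [0, 0, 3, 3, 3],
}

# Sampling error = sample mean minus population mean.
sampling_errors = {
    name: sum(values) / len(values) - population_mean
    for name, values in samples.items()
}

print(f"Population mean: {population_mean:.2f}")
for name, error in sampling_errors.items():
    print(f"{name}: sampling error {error:+.2f}")
```

The population mean is 94/30, about 3.13 rooms per day, so the three sampling errors are roughly +0.67, +0.27, and -1.33.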
Sampling Distribution of the
Sample Means
• The sampling distribution of the sample mean is a
probability distribution consisting of all possible
sample means of a given sample size selected from
a population.
Sampling Distribution of the
Sample Means - Example
Tartus Industries has seven production employees (considered the
population). The hourly earnings of each employee are given in the table
below.
1. What is the population mean?
2. What is the sampling distribution of the sample mean for samples of
size 2?
3. What is the mean of the sampling distribution?
4. What observations can be made about the population and the sampling
distribution?
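Because the earnings table did not survive in this text, the sketch below uses placeholder hourly earnings for seven employees; they are assumptions, not necessarily the Tartus Industries figures. It enumerates every sample of size 2 and compares the mean of the sampling distribution with the population mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Placeholder hourly earnings (in dollars) for the seven employees.
hourly_earnings = [7, 7, 8, 8, 7, 8, 9]

population_mean = Fraction(sum(hourly_earnings), len(hourly_earnings))

# Every possible sample of two employees and its sample mean.
sample_means = [Fraction(a + b, 2) for a, b in combinations(hourly_earnings, 2)]

# The sampling distribution: each distinct sample mean with its probability.
counts = Counter(sample_means)
for mean in sorted(counts):
    probability = counts[mean] / len(sample_means)
    print(f"mean {float(mean):.2f}  probability {probability:.4f}")

mean_of_sample_means = sum(sample_means) / len(sample_means)
print(float(population_mean), float(mean_of_sample_means))
```

Two observations carry over to any data set: the mean of all possible sample means equals the population mean, and the sample means are less spread out than the individual population values.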
Central Limit Theorem
• For a population with a mean μ and a variance σ², the
sampling distribution of the means of all possible samples of
size n drawn from the population will be approximately
normally distributed, and the approximation improves as n increases.
• The mean of the sampling distribution is equal to μ and the
variance is equal to σ²/n.
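A small simulation can make the theorem concrete. The sketch below assumes a clearly non-normal population, an exponential distribution with μ = 1 and σ² = 1, and draws 5000 samples of size n = 30; the population, sample counts, and seed are illustrative choices only.

```python
import random
import statistics

rng = random.Random(2023)  # fixed seed so the run is repeatable

n = 30              # size of each sample
num_samples = 5000  # how many samples to draw

# Population: exponential with rate 1, so mu = 1 and sigma^2 = 1 (skewed).
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.3f}  (theory: 1.000)")
print(f"variance of sample means: {variance_of_means:.4f} (theory: {1 / n:.4f})")
```

The simulated values should land close to μ and σ²/n; a histogram of `sample_means` would look roughly bell-shaped even though the population is skewed.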
Using the Sampling
Distribution of the Sample Mean (Sigma
Known)
• If a population follows the normal distribution, the sampling
distribution of the sample mean will also follow the normal
distribution.
• To determine the probability a sample mean falls within a
particular region, use:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Using the Sampling Distribution of the
Sample Mean (Sigma Known) - Example
• The Quality Assurance Department for Cola, Inc., maintains records
regarding the amount of cola in its Jumbo bottle. The actual amount of
cola in each bottle is critical, but varies a small amount from one bottle
to the next. Cola, Inc., does not wish to underfill the bottles. On the
other hand, it cannot overfill each bottle. Its records indicate that the
amount of cola follows the normal probability distribution. The mean
amount per bottle is 31.2 ounces and the population standard deviation
is 0.4 ounces. At 8 A.M. today the quality technician randomly selected
16 bottles from the filling line. The mean amount of cola contained in
the bottles is 31.38 ounces.
• Is this an unlikely result? Is it likely the process is putting too much soda
in the bottles? To put it another way, is the sampling error of 0.18
ounces unusual?
Using the Sampling Distribution of the
Sample Mean (Sigma Known) - Example
Step 1: Find the z-value corresponding to the sample mean
of 31.38:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{31.38 - 31.20}{0.4 / \sqrt{16}} = 1.80$$
Using the Sampling Distribution of the
Sample Mean (Sigma Known) - Example
Step 2: Find the probability of observing a z equal to or
greater than 1.80:
$$P(z \ge 1.80) = 0.5000 - 0.4641 = 0.0359$$
Using the Sampling Distribution of the
Sample Mean (Sigma Known) - Example
• What do we conclude?
• It is unlikely, less than a 4 percent chance, we could select a
sample of 16 observations from a normal population with a
mean of 31.2 ounces and a population standard deviation of
0.4 ounces and find the sample mean equal to or greater
than 31.38 ounces.
• We conclude the process is putting too much cola in the
bottles.
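The two steps of the Cola, Inc. example can be reproduced numerically. This sketch uses Python's `statistics.NormalDist` in place of a printed z table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 31.2       # population mean, ounces
sigma = 0.4     # population standard deviation, ounces
n = 16          # bottles sampled
x_bar = 31.38   # observed sample mean, ounces

# Step 1: z-value of the sample mean, using the standard error sigma / sqrt(n).
standard_error = sigma / sqrt(n)
z = (x_bar - mu) / standard_error

# Step 2: probability of a sample mean this large or larger.
p_upper = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(Z >= z) = {p_upper:.4f}")
```

The tail probability is about 0.0359, which is the "less than a 4 percent chance" quoted in the conclusion.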
Using the Sampling
Distribution of the Sample Mean (Sigma
Unknown)
• If the population does not follow the normal distribution,
but the sample has at least 30 observations, the sample
means will approximately follow the normal distribution.
• To determine the probability a sample mean falls within a
particular region when σ is unknown, use the sample standard
deviation s in place of σ:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
Estimation
and
Confidence Intervals
STATISTICS 1
ONLINE BUSINESS
2021
GOALS
• Define a point estimate.
• Define level of confidence.
• Construct a confidence interval for the population mean
when the population standard deviation is known.
• Construct a confidence interval for a population mean when
the population standard deviation is unknown.
• Construct a confidence interval for a population proportion.
• Determine the sample size for attribute and variable
sampling.
Point and Interval Estimates
• A point estimate is the statistic, computed from sample
information, which is used to estimate the population
parameter.
• A confidence interval estimate is a range of values
constructed from sample data so that the population
parameter is likely to occur within that range at a specified
probability. The specified probability is called the level of
confidence.
Factors Affecting Confidence Interval
Estimates
The factors that determine the width of a confidence interval
are:
• The sample size, n.
• The variability in the population, usually σ estimated by s.
• The desired level of confidence.
Interval Estimates - Interpretation
• For a 95% confidence interval, about 95% of similarly constructed
intervals will contain the parameter being estimated.
• Also, 95% of the sample means for a specified sample size will lie within
1.96 standard errors of the hypothesized population mean.
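The "95% of similarly constructed intervals" interpretation can be illustrated by simulation. The sketch assumes a normal population with μ = 50 and σ = 10, samples of size 40, and 2000 repetitions; all of these numbers are made up for the illustration.

```python
import random
from math import sqrt
from statistics import fmean

rng = random.Random(95)  # fixed seed so the run is repeatable

mu, sigma = 50.0, 10.0   # assumed population parameters
n = 40                   # sample size
trials = 2000            # number of intervals constructed

margin = 1.96 * sigma / sqrt(n)  # half-width of each 95% interval

# Count the intervals x_bar +/- margin that contain the true mean mu.
hits = 0
for _ in range(trials):
    x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
    if x_bar - margin <= mu <= x_bar + margin:
        hits += 1

coverage = hits / trials
print(f"Share of intervals containing mu: {coverage:.3f}")
```

The share should come out near 0.95. Any single interval either contains μ or it does not; the 95% describes the procedure over many samples.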
Characteristics of the t-distribution
• It is, like the z distribution, a continuous distribution.
• It is, like the z distribution, bell-shaped and symmetrical.
• There is not one t distribution, but rather a family of t
distributions. All t distributions have a mean of 0, but their
standard deviations differ according to the sample size, n.
• The t distribution is more spread out and flatter at the
center than the standard normal distribution.
• As the sample size increases, however, the t distribution
approaches the standard normal distribution.
Comparing the z and t Distributions
when n is small
Confidence Interval Estimates for the
Mean
• Use the z distribution if the population standard deviation is known
or the sample is greater than 30.
• Use the t distribution if the population standard deviation is unknown
and the sample is less than 30.
When to Use the z or t Distribution for
Confidence Interval Computation
Confidence Interval for the Mean –
Example using the z-distribution
• A study involves selecting a random sample of 256 sales representatives
under the age of 35. One item of interest is their annual income. The
sample mean is $55,420, and the sample standard deviation is $2,050.
• What is the point estimate of mean income of all the representatives
under the age of 35?
• What is the 95% confidence interval for the population mean (rounded
to the nearest $10)?
• With 255 degrees of freedom, $t_{0.05/2;\,255} = 1.969 \approx 1.96$, so the z value 1.96 is used.
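The interval for the sales representatives can be worked out as below. The sketch uses the critical value 1.96 named on the slide and the formula x̄ ± z·s/√n; variable names are illustrative.

```python
from math import sqrt

n = 256          # sales representatives sampled
x_bar = 55420    # sample mean income, dollars (also the point estimate)
s = 2050         # sample standard deviation, dollars
z = 1.96         # critical value for 95% confidence

margin = z * s / sqrt(n)  # 1.96 * 2050 / 16
lower = x_bar - margin
upper = x_bar + margin

# Round each endpoint to the nearest $10, as the question asks.
rounded_interval = (round(lower, -1), round(upper, -1))

print(f"Point estimate: ${x_bar:,}")
print(f"95% CI: ${rounded_interval[0]:,.0f} to ${rounded_interval[1]:,.0f}")
```

The point estimate is $55,420 and the interval is about $55,170 to $55,670.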
Confidence Interval for the Mean –
Example using the t-distribution
• A tire manufacturer wishes to investigate the tread life of its tires.
• A sample of 10 tires driven 50,000 miles revealed a sample mean of
0.32 inch of tread remaining with a standard deviation of 0.09 inch.
• Construct a 95 percent confidence interval for the population mean.
• Would it be reasonable for the manufacturer to conclude that after
50,000 miles the population mean amount of tread remaining is 0.30
inches?
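A sketch of the tire calculation follows. It assumes the critical value t = 2.262 for 95% confidence with n − 1 = 9 degrees of freedom, read from a standard t table and hard-coded here because Python's standard library has no t distribution.

```python
from math import sqrt

n = 10          # tires sampled
x_bar = 0.32    # sample mean tread remaining, inches
s = 0.09        # sample standard deviation, inches
t_crit = 2.262  # assumed table value: 95% confidence, 9 degrees of freedom

margin = t_crit * s / sqrt(n)
lower = x_bar - margin
upper = x_bar + margin

print(f"95% CI: {lower:.3f} to {upper:.3f} inches")

# Is 0.30 inch a plausible value for the population mean?
contains_030 = lower <= 0.30 <= upper
print("0.30 inside the interval:", contains_030)
```

The interval runs from about 0.256 to 0.384 inch. Since 0.30 lies inside it, a population mean of 0.30 inch is a reasonable conclusion.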
Confidence Interval for a Population Proportion –
Example
• The endpoints of the confidence interval for the proportion of union
members in favor are 0.782 and 0.818. The lower endpoint is greater than
0.75. Hence, we conclude that the merger proposal will likely pass because
the interval estimate contains only values greater than 75 percent of the
union membership.
FINITE-POPULATION
CORRECTION FACTOR
• A finite population can be rather small; it could be all the
students registered for this class.
• It can also be very large, such as all senior citizens living in
Florida.
• For a finite population, where the total number of objects is
N and the size of the sample is n, the standard error is multiplied by
an adjustment called the finite-population correction factor:
$$\text{FPC} = \sqrt{\frac{N - n}{N - 1}}$$
• The usual rule is if the ratio of n/N is less than .05, the
correction factor is ignored.
EXAMPLE
• There are 250 families in Scandia, Pennsylvania. A poll of 40 families
reveals the mean annual church contribution is $450 with a standard
deviation of $75.
• Construct a 90% confidence interval for the mean annual contribution.
• First, note that the population is finite. That is, there is a limit to the
number of families in Scandia.
• Second, note that the sample constitutes more than 5 percent of
the population; that is, n/N = 40/250 = 0.16.
• The endpoints of the confidence interval are $432.03 and $467.97.
It is likely that the population mean falls within this interval.
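The Scandia endpoints can be reproduced with the finite-population correction. The sketch assumes a critical value of 1.65 for 90% confidence (a rounded table value); that choice is an assumption, made because it reproduces the endpoints quoted above.

```python
from math import sqrt

N = 250       # families in Scandia (population size)
n = 40        # families polled
x_bar = 450   # sample mean contribution, dollars
s = 75        # sample standard deviation, dollars
z = 1.65      # assumed rounded critical value for 90% confidence

fpc = sqrt((N - n) / (N - 1))          # finite-population correction factor
standard_error = s / sqrt(n) * fpc     # corrected standard error of the mean

margin = z * standard_error
lower = x_bar - margin
upper = x_bar + margin

print(f"n/N = {n / N:.2f}, FPC = {fpc:.4f}")
print(f"90% CI: ${lower:.2f} to ${upper:.2f}")
```

Without the correction the margin would be about $19.57 instead of about $17.97, so ignoring the factor here would give a noticeably wider interval.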
Finite-Population Correction
Factor
• A population that has a fixed upper bound is said to be
finite.
• For a finite population, where the total number of objects is
N and the size of the sample is n, the following adjustment
is made to the standard errors of the sample means and the
proportion:
$$\sqrt{\frac{N - n}{N - 1}}$$
• However, if n/N < .05, the finite-population correction factor
may be ignored.
Effects on FPC when n/N Changes
Observe that FPC approaches 1 when n/N becomes smaller
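The behaviour of the correction factor can be tabulated with a short sketch. The population size N = 1000 and the list of sample sizes are arbitrary values chosen for illustration.

```python
from math import sqrt


def fpc(N, n):
    """Finite-population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))


N = 1000  # assumed population size
sample_sizes = [10, 25, 50, 100, 200, 500]

fpc_values = [fpc(N, n) for n in sample_sizes]

print(" n     n/N    FPC")
for n, value in zip(sample_sizes, fpc_values):
    print(f"{n:4d}  {n / N:.3f}  {value:.4f}")
```

At n/N = 0.05 the factor is already about 0.975, which is why the correction is usually ignored below that ratio.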
Confidence Interval Formulas for Estimating
Means and Proportions with Finite Population
Correction
• C.I for the Mean (µ)