DOM105
Session 12
Population parameters and sample statistics
Population refers to the complete set of items under study.
A parameter is a characteristic of a population. E.g., 127 cm is the mean height of all students in a secondary school, and is a parameter of the population “all students in the school”.
A sample is a portion chosen from the population.
A statistic is a characteristic of a sample. E.g., 129 cm is the mean height of 100 randomly chosen students from that secondary school, and is a statistic of the sample of 100 students.
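A minimal Python sketch of the parameter/statistic distinction. It assumes a hypothetical, randomly generated population of 1,000 heights centred near 127 cm (the data are invented for illustration, not taken from the notes). The parameter is computed from every member; the statistic is computed only from a random sample of 100.

```python
import random
import statistics

random.seed(1)
# Hypothetical population: heights (cm) of 1,000 students, generated for illustration only.
population = [random.gauss(127, 8) for _ in range(1000)]

# Parameter: uses every member of the population.
population_mean = statistics.fmean(population)

# Statistic: uses a simple random sample of 100 students (without replacement).
sample = random.sample(population, 100)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.1f} cm")
print(f"statistic (sample mean):     {sample_mean:.1f} cm")
```

Rerunning with a different seed changes the statistic (a new sample) but the parameter of a fixed population would not change.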
Types of Sampling
Simple Random Sampling With/Without Replacement
Systematic Sampling
Stratified Sampling
Cluster Sampling
In systematic sampling, elements are selected from the population at uniform intervals, such as every kth element of an ordered list, or one element after every fixed period of time.
Systematic sampling is inappropriate when the elements follow a periodic (cyclical) pattern, because the sampling interval can line up with the pattern.
In stratified sampling, the population is divided into homogeneous strata, and then each stratum is sampled in proportion to its size (or sampled equally and weighted in proportion to its size).
In cluster sampling, the population is divided into clusters, each of which is representative of the population, and a random selection of clusters is chosen for sampling.
A stratum has low internal variance but differs greatly from other strata; clusters have high internal variance but are similar to each other.
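The four sampling schemes above can be sketched in Python on a hypothetical population of 100 numbered units. The stratum boundaries (60/30/10 units) and the split into consecutive clusters are arbitrary illustrations (real clusters should each resemble the whole population); this is a sketch, not a survey-design tool.

```python
import random

random.seed(0)
population = list(range(1, 101))  # hypothetical population: units numbered 1..100

# Simple random sampling without replacement: each unit appears at most once.
srs_without = random.sample(population, 10)

# Simple random sampling with replacement: a unit may be drawn more than once.
srs_with = random.choices(population, k=10)

# Systematic sampling: random start within the first interval, then every k-th unit.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling, proportional allocation (hypothetical strata of 60, 30, 10 units).
strata = {"A": population[:60], "B": population[60:90], "C": population[90:]}
n_total = 10
stratified = []
for units in strata.values():
    n_h = round(n_total * len(units) / len(population))  # stratum's share of the sample
    stratified.extend(random.sample(units, n_h))

# Cluster sampling: 10 clusters of 10 units; pick 2 clusters, keep every unit in them.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = random.sample(clusters, 2)
cluster_sample = [unit for cluster in chosen_clusters for unit in cluster]
```

Note the contrast: stratified sampling draws from every stratum, while cluster sampling takes whole clusters and skips the rest entirely.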
Sampling Distributions
The probability distribution of the means of all possible samples of a given size taken from a larger population is the sampling distribution of the sample mean.
You can also have distributions of the sample median, the sample
proportion, or some other statistic of the sample.
The standard deviation of the distribution of a sample statistic is called the
standard error of that sample statistic, e.g., the standard error of the
sample mean.
Sampling distributions help make sampling cost-effective: they show how the sampling error shrinks as the sample grows, so the sample size can be chosen to balance reducing error against reducing cost.
The standard error (S.E.) measures how much the possible sample means deviate from the population mean. The larger the sample, the more likely it is that any sample mean is close to the population mean, and so the smaller the S.E. In other words, the S.E. is the standard deviation of the sample mean; for samples of size n from a large population with std. dev. $\sigma$, it equals $\sigma/\sqrt{n}$.
This is different from the sample standard deviation s, which measures how individual data points deviate from the mean of that single sample.
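A small simulation can make the standard error concrete. This sketch assumes a hypothetical uniform population of 10,000 values and compares the observed std. dev. of many sample means with $\sigma/\sqrt{n}$; `standard_error_by_simulation` is an illustrative helper, not a library function.

```python
import random
import statistics

random.seed(42)
# Hypothetical population with a known standard deviation.
population = [random.uniform(0, 100) for _ in range(10_000)]
sigma = statistics.pstdev(population)

def standard_error_by_simulation(n, reps=2000):
    """Std. dev. of the means of `reps` random samples of size n."""
    means = [statistics.fmean(random.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

# Simulated S.E. next to the formula sigma / sqrt(n): both shrink as n grows.
for n in (10, 40, 160):
    print(n, round(standard_error_by_simulation(n), 2), round(sigma / n ** 0.5, 2))
```

Quadrupling n roughly halves the standard error, which is the cost/accuracy trade-off mentioned above.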
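Sampling Distribution of the Proportion
Applicable to a categorical variable with only 2 categories
Sample proportion: $p = \frac{X}{n} = \frac{\text{number of items in the category of interest}}{\text{sample size}}$
Population proportion: $\pi$
Standard error of the proportion: $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$
Z transformation: $Z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$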
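A minimal Python sketch of the Z transformation for a sample proportion. The counts (58 of 100) and the population proportion 0.5 are hypothetical values chosen for illustration, and `proportion_z` is a helper name introduced here.

```python
import math

def proportion_z(x, n, pi):
    """Z value of the sample proportion p = x/n when the population proportion is pi."""
    p = x / n
    se = math.sqrt(pi * (1 - pi) / n)  # standard error of the proportion
    return (p - pi) / se

# Hypothetical example: 58 "yes" answers out of 100 when pi = 0.5.
z = proportion_z(58, 100, 0.5)
print(round(z, 2))  # 1.6
```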
Central Limit Theorem
As the sample size gets large enough, the sampling distribution of the mean is approximately normally distributed, even if the population has a non-normal distribution.
Generally, n > 30 is sufficiently large.
The distribution of the sample mean is then normal, but only approximately
The approximation gets better and better as n becomes larger and larger
This holds regardless of the underlying distribution from which sampling is done (as long as it has a finite variance)
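The Central Limit Theorem can be checked by simulation. This sketch assumes an exponential population with mean 1 (heavily right-skewed) as an example of a non-normal distribution; the skewness of the sample means should move toward 0, the value for a normal distribution, as n grows.

```python
import random
import statistics

random.seed(7)

def sample_means(n, reps=3000):
    # Population: exponential with mean 1 (strongly right-skewed, clearly non-normal).
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(xs):
    # Sample skewness: mean of cubed standardized values (0 for a symmetric distribution).
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

for n in (2, 30, 100):
    means = sample_means(n)
    print(n, round(statistics.fmean(means), 3), round(skewness(means), 2))
```

The mean of the sample means stays near 1 for every n, while the skewness falls as n increases.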
Point and Interval Estimates
Estimation refers to approximating unknown population parameters, such as the mean and standard deviation, from a study of the sample.
Point estimate: a single statistic value that is the “best guess” for the parameter value, such as the population mean
Interval estimate: an interval of numbers around the point estimate that has a fixed “confidence level” of containing the parameter value; called a confidence interval.
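Point Estimators: it is most common to use the corresponding sample values
• Sample mean estimates population mean $\mu$: $\hat{\mu} = \bar{y} = \frac{\sum_i y_i}{n}$
• Sample std. dev. estimates population std. dev. $\sigma$: $\hat{\sigma} = s = \sqrt{\frac{\sum_i (y_i - \bar{y})^2}{n-1}}$
• Sample proportion $\hat{\pi}$ estimates population proportion $\pi$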
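The point estimators can be computed with Python's standard `statistics` module; `statistics.stdev` uses the n - 1 divisor, matching the formula for s, while `statistics.pstdev` divides by n. The data values and the category rule (value above 12) are hypothetical.

```python
import statistics

# Hypothetical sample data (values invented for illustration).
y = [12, 15, 9, 14, 10, 18, 11, 13]

mu_hat = statistics.fmean(y)     # sample mean, estimates mu
sigma_hat = statistics.stdev(y)  # divides by n - 1, estimates sigma

# Proportion: code y = 1 for "in category", 0 otherwise, then take the mean.
in_category = [1 if v > 12 else 0 for v in y]
pi_hat = statistics.fmean(in_category)

print(mu_hat, round(sigma_hat, 3), pi_hat)  # 12.75 2.915 0.5
```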
Confidence Interval for the Mean
In large random samples, the sample mean has approximately a normal sampling distribution with mean $\mu$ and standard error $\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$
Thus, $\bar{y} \pm z_{\alpha/2}\,\sigma_{\bar{y}}$ is the confidence interval, where $\alpha = 1 - \text{confidence level}$
$P(\mu - 1.96\,\sigma_{\bar{y}} \le \bar{y} \le \mu + 1.96\,\sigma_{\bar{y}}) = 0.95$
• We can be 95% confident that the sample mean lies within 1.96 standard errors of the (unknown) population mean
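If the sample mean is 100, the sample size is 50, and the pop. std. dev. is 30, what is the 90% confidence interval for the pop. mean?
Confidence interval: $\bar{y} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$, with $\bar{y} = 100$, $\sigma = 30$, $n = 50$, $\alpha = 1 - 0.9 = 0.1$
$z_{0.05} = 1.645$, so the interval is $100 \pm 1.645 \times \frac{30}{\sqrt{50}} = 100 \pm 6.98$, i.e., (93.02, 106.98)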
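The 90% interval from the example can be reproduced with a short sketch that uses `statistics.NormalDist` for the critical value $z_{\alpha/2}$; `z_interval` is a helper name introduced for illustration.

```python
import math
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence):
    """CI for the population mean when the population std. dev. sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value z_{alpha/2}
    margin = z * sigma / math.sqrt(n)        # z times the standard error
    return mean - margin, mean + margin

low, high = z_interval(100, 30, 50, 0.90)
print(f"({low:.2f}, {high:.2f})")  # (93.02, 106.98)
```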
Confidence Interval (small sample size)
The t distribution (table pg. 696)
Bell-shaped, symmetric about 0
Standard deviation a bit larger than 1 (slightly thicker tails than the standard normal distribution, which has mean = 0, standard deviation = 1)
Precise shape depends on degrees of freedom (df). For inference about mean,
df = n-1
Gets narrower and more closely resembles standard normal distribution as df
increases
(nearly identical when df > 30)
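α = 1 - confidence level is the total tail area; each tail (upper and lower) has area α/2
The CI for the mean has margin of error t(se), with se = $s/\sqrt{n}$; interval estimate: $\bar{y} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$
Excel commands: T.INV(p, df) returns the t value with left-tail area p; T.INV.2T(α, df) returns the two-sided critical value $t_{\alpha/2}$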
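If the sample mean is 100, the sample size is 50, and the sample std. dev. is 30, what is the 90% confidence interval for the pop. mean?
Pop. std. dev. unknown, so use the t-table with df = 49
t = T.INV.2T(0.1, 49) = 1.677
Interval: $100 \pm 1.677 \times \frac{30}{\sqrt{50}} = 100 \pm 7.11$, i.e., (92.89, 107.11)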
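Python's standard library has no t distribution, so this sketch computes the two-sided critical value by numerically integrating the t density (Simpson's rule) and bisecting on it, as a stand-in for the t-table or Excel's T.INV.2T (SciPy's `scipy.stats.t.ppf` would do the same job). The helper names `t_pdf`, `t_upper_tail`, `t_critical` and `t_interval` are hypothetical; the usage line reproduces the df = 49 example.

```python
import math

def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=1000):
    # P(T > t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule, even steps).
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

def t_critical(alpha, df):
    # Two-sided critical value: upper-tail area alpha/2 (what T.INV.2T(alpha, df) returns).
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha / 2:
            lo = mid  # too much area left in the tail: move right
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(mean, s, n, confidence):
    # CI for the mean when sigma is unknown: mean +/- t * s / sqrt(n), df = n - 1.
    t = t_critical(1 - confidence, n - 1)
    margin = t * s / math.sqrt(n)
    return mean - margin, mean + margin

t = t_critical(0.10, 49)
low, high = t_interval(100, 30, 50, 0.90)
print(round(t, 4), f"({low:.2f}, {high:.2f})")  # 1.6766 (92.89, 107.11)
```

The t interval is slightly wider than the z interval for the same numbers, reflecting the thicker tails of the t distribution.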
Confidence Interval for a Proportion
Recall that the sample proportion $\hat{\pi}$ is a mean when we let y = 1 for an observation in the category of interest, y = 0 otherwise
Recall that the population proportion $\pi$ is the mean $\mu$ of the probability distribution having $P(1) = \pi$ and $P(0) = 1 - \pi$
The standard deviation of this probability distribution is $\sigma = \sqrt{\pi(1-\pi)}$ (e.g., 0.50 when $\pi = 0.50$)
The standard error of the sample proportion is $\sigma_{\hat{\pi}} = \sigma/\sqrt{n} = \sqrt{\pi(1-\pi)/n}$
Proportions continued
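Thus, the confidence interval for a proportion is $\hat{\pi} \pm z_{\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$
If out of 100 people, 62 say they prefer Coke to Pepsi, what is the 99% confidence interval?
$\hat{\pi}$ = 62/100 = 0.62
Interval: $0.62 \pm 2.576\sqrt{\frac{0.62 \times 0.38}{100}} = 0.62 \pm 0.125$, i.e., (0.495, 0.745)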
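The Coke/Pepsi interval can be checked with a sketch of the large-sample (normal approximation) interval for a proportion; `proportion_interval` is an illustrative helper, not a library function.

```python
import math
from statistics import NormalDist

def proportion_interval(x, n, confidence):
    """Large-sample (normal approximation) CI for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)      # z_{alpha/2}
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)         # z times the estimated S.E.
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(62, 100, 0.99)
print(f"({low:.3f}, {high:.3f})")  # (0.495, 0.745)
```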
Determining Sample Size
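The sampling error for the mean is $E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
The sample size is $n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2}$
The sampling error for a proportion is $E = z_{\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}}$
Sample size: $n = \frac{z_{\alpha/2}^2\,\pi(1-\pi)}{E^2}$
If it is known from a survey that roughly 60% of the pop. prefers Coke to Pepsi, what sample size ensures that the sample proportion is within 0.125 (12.5 percentage points) of the population proportion 99.9% of the time?
$\pi \approx 0.6$, $E = 0.125$, $z_{0.0005} = 3.29$, so $n = \frac{3.29^2 \times 0.6 \times 0.4}{0.125^2} \approx 166.3$
Min. sample size: 167 (always round up)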
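The sample-size formulas translate directly into code; rounding up with `math.ceil` ensures the error bound is met rather than narrowly missed. The helper names are illustrative, and `statistics.NormalDist` supplies $z_{\alpha/2}$.

```python
import math
from statistics import NormalDist

def z_two_sided(confidence):
    # Critical value z_{alpha/2} for a two-sided interval.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_mean(sigma, error, confidence):
    # n = z^2 * sigma^2 / E^2, rounded up to the next whole unit.
    return math.ceil((z_two_sided(confidence) * sigma / error) ** 2)

def sample_size_proportion(pi, error, confidence):
    # n = z^2 * pi * (1 - pi) / E^2, rounded up to the next whole unit.
    return math.ceil(z_two_sided(confidence) ** 2 * pi * (1 - pi) / error ** 2)

print(sample_size_proportion(0.60, 0.125, 0.999))  # 167
```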
20 sampled households in a locality have an average annual income of 7.2 lakh, with a std. dev. of 80,000. What is the 99% confidence interval for the population average annual income?
df = n - 1 = 19, α = 1 - 0.99 = 0.01
The std. dev. of overtime for all workers in a company is 26 min. The allowable sampling error is 18. How many workers must be sampled to get a 90% confidence interval for the average overtime?
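If 6 out of 20 sampled workers are working overtime, what is the 95% confidence interval for the proportion of the company’s workers who are working overtime?
Conf Int = $\hat{\pi} \pm z_{\alpha/2}\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} = 0.3 \pm 1.96\sqrt{\frac{0.3 \times 0.7}{20}}$, with $\hat{\pi}$ = 6/20 = 0.3
Conf Int = 0.3 ± 0.20, i.e., (0.10, 0.50)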
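A sketch that works the three exercises above, under stated assumptions: the household mean is taken as 720,000 (7.2 lakh); the t critical value 2.861 is read from a t-table for df = 19 and α = 0.01 (two-sided) rather than computed; and the overtime question's error of 18 is read as the allowable sampling error E. This is a worked check under those readings, not an official answer key.

```python
import math
from statistics import NormalDist

def z_two_sided(confidence):
    # Critical value z_{alpha/2} for a two-sided interval.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Exercise 1: households, n = 20, mean 720,000, s = 80,000, 99% confidence.
# Sigma is unknown and n is small, so use t; 2.861 is the t-table value for df = 19.
t_19 = 2.861
se = 80_000 / math.sqrt(20)
income_ci = (720_000 - t_19 * se, 720_000 + t_19 * se)

# Exercise 2: overtime, sigma = 26, allowable error E = 18 (assumed reading), 90% confidence.
n_workers = math.ceil((z_two_sided(0.90) * 26 / 18) ** 2)

# Exercise 3: 6 of 20 workers on overtime, 95% confidence (normal approximation).
p_hat = 6 / 20
margin = z_two_sided(0.95) * math.sqrt(p_hat * (1 - p_hat) / 20)
overtime_ci = (p_hat - margin, p_hat + margin)

print([round(x) for x in income_ci], n_workers, [round(x, 3) for x in overtime_ci])
```

With only 20 workers the normal approximation in Exercise 3 is rough; it is used here because it is the method the notes give for proportions.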