
DOM105 2024 Session 12

The document discusses population parameters and sample statistics, different types of sampling methods, sampling distributions, point and interval estimates, and how to determine sample sizes. It provides information on key statistical concepts related to sampling and making inferences about populations from samples.

Uploaded by

nr871
DOM105

Session 12
Population parameters and sample statistics
A population is the set of all items that have been chosen for study.
A parameter is a characteristic of a population. E.g., if 127 cm is the mean height of all students in a secondary school, it is a parameter of the population "all students in the school".
A sample is a portion chosen from the population.
A statistic is a characteristic of a sample. E.g., if 129 cm is the mean height of 100 randomly chosen students from a secondary school, it is a statistic of that sample of 100 students.
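The parameter/statistic distinction can be sketched in Python. The population below is simulated (1,200 hypothetical students, heights drawn around 127 cm with an assumed spread of 8 cm); none of these numbers come from the slides beyond the 127 cm mean and the sample size of 100.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of all 1,200 students in a school.
population = [random.gauss(127, 8) for _ in range(1200)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a random sample of 100 students.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f} cm")
print(f"statistic (sample mean):     {sample_mean:.1f} cm")
```

The two printed values are close but generally not equal, which is why a statistic is only an estimate of the parameter.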
Types of Sampling
Simple Random Sampling With/Without Replacement

Systematic Sampling

Stratified Sampling

Cluster Sampling
In systematic sampling, elements are selected from the population at uniform intervals, such as after a certain time, or in a certain order.
Systematic sampling is inappropriate when the elements have sequential patterns.
In stratified sampling, the population is divided into homogeneous strata, then each stratum is sampled in proportion to its size (or the strata are sampled equally and weighted in proportion to their size).
In cluster sampling, the population is divided into clusters, each of which is representative of the population, and a random selection of clusters is chosen for sampling.
A stratum has low variance but differs greatly from other strata; clusters have high variance but are similar to each other.
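A minimal sketch of the four sampling methods, assuming a hypothetical population of 100 item IDs, two made-up strata of sizes 60 and 40, and ten made-up clusters of 10 items each:

```python
import random

random.seed(1)
population = list(range(1, 101))  # hypothetical population of 100 item IDs

# Simple random sampling without and with replacement.
srs_without = random.sample(population, 10)
srs_with = random.choices(population, k=10)

# Systematic sampling: random start, then every k-th element.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: assumed strata of sizes 60 and 40, sampled in proportion.
strata = {"A": population[:60], "B": population[60:]}
stratified = []
for members in strata.values():
    share = round(10 * len(members) / len(population))
    stratified.extend(random.sample(members, share))

# Cluster sampling: split into 10 clusters of 10, pick 2 whole clusters at random.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen = random.sample(clusters, 2)
cluster_sample = [item for cluster in chosen for item in cluster]
```

Note that stratified sampling draws from every stratum, while cluster sampling takes all items of only a few clusters.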
Sampling Distributions
The probability distribution of all possible means of samples of a given size taken from a larger population is the sampling distribution of the sample mean.
You can also have distributions of the sample median, the sample
proportion, or some other statistic of the sample.
The standard deviation of the distribution of a sample statistic is called the
standard error of that sample statistic, e.g., the standard error of the
sample mean.
Sampling distributions are used to make samples cost-effective by finding
the best balance between reducing errors in the sample and reducing cost.
The standard error (S.E.) is a measure of how much the means of all possible samples deviate from the population mean. The larger the sample, the more likely it is that any sample mean is close to the population mean, and so the smaller the S.E. In other words, the S.E. is the standard deviation of the sampling distribution of the sample mean.

This is different from the sample standard deviation S, which is the measure of how individual data points deviate from the mean of that single sample.
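The difference between the S.E. and S can be checked with a small simulation. This is a sketch under assumed values (normal population with mean 100 and std. dev. 30, samples of size 50); it compares the spread of many sample means with the usual formula σ/√n.

```python
import math
import random
import statistics

random.seed(2)
sigma, n = 30.0, 50  # assumed population std. dev. and sample size

# Draw many samples and record each sample mean.
sample_means = []
for _ in range(4000):
    sample = [random.gauss(100, sigma) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

empirical_se = statistics.stdev(sample_means)  # spread of the sample means
theoretical_se = sigma / math.sqrt(n)          # sigma / sqrt(n)
single_sample_sd = statistics.stdev(sample)    # S of the last sample only

print(f"S.E. from simulation: {empirical_se:.2f}")
print(f"sigma / sqrt(n):      {theoretical_se:.2f}")
print(f"S of one sample:      {single_sample_sd:.2f}")
```

The first two numbers should nearly agree, while S stays near the population std. dev. no matter how many samples are drawn.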
Sampling Distribution of the proportion

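Applicable in the case of a categorical variable with only 2 categories
Sample proportion: $\hat{\pi} = X/n$, where $X$ is the number of sampled items in the category of interest
Population proportion: $\pi$
Standard error of proportion: $\sigma_{\hat{\pi}} = \sqrt{\pi(1 - \pi)/n}$
Z transformation: $Z = (\hat{\pi} - \pi)/\sqrt{\pi(1 - \pi)/n}$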
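A short sketch of the Z transformation for a proportion, Z = (sample proportion − π) / sqrt(π(1 − π)/n). The function name `proportion_z` and the numbers (π = 0.60, n = 100, sample proportion 0.62) are illustrative assumptions, not values from this slide.

```python
import math
from statistics import NormalDist

def proportion_z(p_hat: float, pi: float, n: int) -> float:
    """Z-score of a sample proportion when the population proportion is pi."""
    standard_error = math.sqrt(pi * (1 - pi) / n)
    return (p_hat - pi) / standard_error

# Hypothetical numbers: pi = 0.60, n = 100, observed sample proportion 0.62.
z = proportion_z(0.62, 0.60, 100)

# Probability of a sample proportion at least this large (normal approximation).
upper_tail = 1 - NormalDist().cdf(z)
print(f"Z = {z:.3f}, upper-tail probability = {upper_tail:.3f}")
```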
Central Limit Theorem

As the sample size gets large enough, the sampling distribution of the mean is approximately normally distributed even if the population has a non-normal distribution.
• Generally n > 30 is sufficiently large.
The distribution of the sample mean is then normal, but only approximately.
The approximation gets better and better as n becomes larger and larger.
This holds regardless of the shape of the underlying distribution from which sampling is done (provided it has finite variance).
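The theorem can be illustrated by simulation. The sketch below assumes an exponential population (strongly right-skewed) and uses skewness as a rough yardstick of non-normality; the helper names `skewness` and `sample_means` are made up for this example.

```python
import random
import statistics

random.seed(3)

def skewness(values):
    """Sample skewness: 0 for a symmetric distribution such as the normal."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, repetitions=5000):
    """Means of `repetitions` samples of size n from an exponential population."""
    means = []
    for _ in range(repetitions):
        sample = [random.expovariate(1.0) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

skew_n1 = skewness(sample_means(1))    # distribution of single observations
skew_n40 = skewness(sample_means(40))  # distribution of means of 40 observations

print(f"skewness, n = 1:  {skew_n1:.2f}")
print(f"skewness, n = 40: {skew_n40:.2f}")
```

The skewness of the sample means shrinks towards 0 as n grows, even though the population itself stays skewed.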
Point and Interval Estimates
Estimation refers to the calculation of unknown population
parameters such as mean and standard deviation from a study of the
sample.

Point estimate: A single statistic value that is the "best guess" for the parameter value, such as the population mean.

Interval estimate: An interval of numbers around the point estimate that has a fixed "confidence level" of containing the parameter value. Called a confidence interval.
Point Estimators – Most common to use sample values

• Sample mean estimates population mean $\mu$:
$\hat{\mu} = \bar{y} = \dfrac{\sum y_i}{n}$

• Sample std. dev. estimates population std. dev. $\sigma$:
$\hat{\sigma} = s = \sqrt{\dfrac{\sum (y_i - \bar{y})^2}{n - 1}}$

• Sample proportion $\hat{\pi}$ estimates population proportion $\pi$
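The three point estimators can be computed with Python's standard library. The data below are hypothetical (8 made-up heights and 10 made-up yes/no answers coded as 1/0).

```python
import statistics

# Hypothetical sample of 8 heights in cm.
y = [125, 131, 128, 122, 135, 127, 130, 126]
# Hypothetical yes/no answers coded as 1/0.
answers = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

mu_hat = statistics.mean(y)           # sum of y_i divided by n
sigma_hat = statistics.stdev(y)       # sample std. dev., divides by n - 1
pi_hat = sum(answers) / len(answers)  # share of observations coded 1

print(mu_hat, sigma_hat, pi_hat)
```

`statistics.stdev` uses the n − 1 divisor; `statistics.pstdev` would divide by n and is meant for a full population.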
Confidence Interval for the Mean
In large random samples, the sample mean has approximately a normal sampling distribution with mean $\mu$ and standard error $\sigma_{\bar{y}} = \sigma/\sqrt{n}$.

Thus, $\bar{y} \pm z_{\alpha/2}\,\sigma_{\bar{y}}$ is the confidence interval, where α = (1 - confidence level). For a 95% confidence level:

$P(\mu - 1.96\,\sigma_{\bar{y}} \le \bar{y} \le \mu + 1.96\,\sigma_{\bar{y}}) = .95$

• We can be 95% confident that the sample mean lies within 1.96 standard errors of the (unknown) population mean
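If sample mean is 100, sample size is 50, pop. Stdev is 30, what is the 90% confidence interval of the pop. mean?
Confidence interval: $\bar{y} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, with $\sigma = 30$, $\alpha = 1 - 0.9 = 0.1$
$z_{0.05} = 1.645$, so the interval is $100 \pm 1.645 \times 30/\sqrt{50} = 100 \pm 6.98$, i.e. (93.02, 106.98)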
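The same calculation as a small Python sketch, using the normal quantile from the standard library. The function name `z_interval` is an assumption made for this example; the inputs (mean 100, σ = 30, n = 50, 90% confidence) are the ones from the question above.

```python
import math
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence):
    """Confidence interval for the mean when the population std. dev. is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    margin = z * sigma / math.sqrt(n)        # z times the standard error
    return mean - margin, mean + margin

low, high = z_interval(100, 30, 50, 0.90)
print(f"90% CI: ({low:.2f}, {high:.2f})")  # prints 90% CI: (93.02, 106.98)
```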
Confidence Interval (small sample size)
The t distribution (table pg. 696)
• Bell-shaped, symmetric about 0
• Standard deviation a bit larger than 1 (slightly thicker tails than the standard normal distribution, which has mean = 0, standard deviation = 1)
• Precise shape depends on degrees of freedom (df). For inference about a mean, df = n-1
• Gets narrower and more closely resembles the standard normal distribution as df increases (nearly identical when df > 30)
• α = 1 – confidence level is the total area in the two tails (α/2 in each tail)
• CI for the mean has margin of error t(se); interval estimate: $\bar{y} \pm t_{\alpha/2,\,df}\; s/\sqrt{n}$
• Excel command: t.inv(p,df) for a left-tail probability p; t.inv.2t(α,df) for the two-tailed critical value
If sample mean is 100, sample size is 50, sample Stdev is 30, what is the 90% confidence interval of the pop. mean?
Pop. Stdev unknown, so use the t-table, Df=49
=t.inv.2t(0.1,49)≈1.677
Interval: $100 \pm 1.677 \times 30/\sqrt{50} = 100 \pm 7.11$, i.e. (92.89, 107.11)
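A Python sketch of the same small-sample interval. The standard library has no t quantile, so this example computes one numerically (Simpson's rule for the CDF, bisection for the critical value); that numerical approach and the helper names are choices made for this sketch, not part of the slides, and the search range assumes the critical value lies below 100.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus Simpson's-rule integral of the pdf on [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Two-tailed critical value t with P(|T| > t) = alpha, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0  # assumed search range for the critical value
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(mean, s, n, confidence):
    """Confidence interval for the mean using the sample std. dev. s."""
    t = t_critical(1 - confidence, n - 1)
    margin = t * s / math.sqrt(n)
    return mean - margin, mean + margin

low, high = t_interval(100, 30, 50, 0.90)
print(f"90% CI: ({low:.2f}, {high:.2f})")
```

The interval is slightly wider than the one obtained with a known population Stdev, because the t critical value for df = 49 is a little larger than 1.645.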
Confidence Interval for a Proportion
• Recall that the sample proportion $\hat{\pi}$ is a mean when we let y = 1 for an observation in the category of interest, y = 0 otherwise

• Recall the population proportion $\pi$ is the mean µ of the prob. dist. having
$P(1) = \pi$ and $P(0) = 1 - \pi$

• The standard deviation of this probability distribution is
$\sigma = \sqrt{\pi(1 - \pi)}$ (e.g., 0.50 when $\pi = 0.50$)

• The standard error of the sample proportion is
$\sigma_{\hat{\pi}} = \sigma/\sqrt{n} = \sqrt{\pi(1 - \pi)/n}$
Proportions continued
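Thus, the confidence interval for a proportion is $\hat{\pi} \pm z_{\alpha/2}\sqrt{\hat{\pi}(1 - \hat{\pi})/n}$, with the sample proportion $\hat{\pi}$ used in place of the unknown $\pi$ in the standard error.

If out of 100 people, 62 say they prefer Coke to Pepsi, what is the 99% confidence interval?
$\hat{\pi}$ = 62/100 = 0.62
Interval: $0.62 \pm 2.576 \times \sqrt{0.62 \times 0.38/100} = 0.62 \pm 0.125$, i.e. (0.495, 0.745)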
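The Coke/Pepsi interval as a Python sketch. The function name `proportion_interval` is an assumption for this example; the standard error uses the sample proportion in place of the unknown population proportion.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(62, 100, 0.99)
print(f"99% CI: ({low:.3f}, {high:.3f})")  # prints 99% CI: (0.495, 0.745)
```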
Determining Sample Size
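The sampling error for the mean is Err. = $z_{\alpha/2}\,\sigma/\sqrt{n}$
The sample size is $n = (z_{\alpha/2}\,\sigma/\text{Err.})^2$

The sampling error for a proportion is Err. = $z_{\alpha/2}\sqrt{\pi(1 - \pi)/n}$
Sample size: $n = z_{\alpha/2}^2\,\pi(1 - \pi)/\text{Err.}^2$
If in a survey it is known that roughly 60% of the pop. prefers Coke to Pepsi, then what sample size ensures that the sample proportion is within 12.5% of the population prop. 99.9% of the time?
$\pi = 0.6$, Err. = 0.125, $\alpha = 0.001$, $z_{\alpha/2} = 3.29$
$n = 3.29^2 \times 0.6 \times 0.4/0.125^2 \approx 166.3$

Min. sample size: 167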
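Both sample-size formulas as a Python sketch, rounding up to the next whole unit. The function names are assumptions for this example, and the inputs for the mean case (σ = 30, Err. = 5, 95% confidence) are hypothetical; the proportion case uses the survey numbers above.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, err, confidence):
    """Smallest n with z * sigma / sqrt(n) <= err."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / err) ** 2)

def sample_size_proportion(pi, err, confidence):
    """Smallest n with z * sqrt(pi * (1 - pi) / n) <= err."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * pi * (1 - pi) / err ** 2)

n_survey = sample_size_proportion(0.6, 0.125, 0.999)
n_mean = sample_size_mean(30, 5, 0.95)  # hypothetical inputs
print(n_survey, n_mean)  # prints 167 139
```

Rounding up rather than to the nearest integer keeps the error at or below the target.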


20 sampled households in a locality have an average annual income of 7.2 lakh, with std. dev. 80,000. What is the 99% confidence interval for the population average annual income?
Df = n-1 = 19, α = 1 – 0.99 = 0.01

The std. dev. of the overtime of all workers in a company is 26 min. The allowed error in the average is 18. How many workers must be sampled to get a 90% confidence interval for the average overtime?
If 6 out of 20 sampled workers are working overtime, what is the 95% confidence interval for the proportion of the company's workers that are working overtime?
Conf Int =

Conf Int =
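A hedged worked sketch of the three exercises above, using the interval and sample-size formulas from the earlier slides. The critical values $t_{0.005,\,19} \approx 2.861$, $z_{0.05} \approx 1.645$ and $z_{0.025} \approx 1.96$ are standard table values; the error of 18 is assumed to be in minutes, the same units as the std. dev.; and the proportion interval uses the large-sample normal formula even though n = 20 is small.

```latex
% Household income: n = 20, \bar{y} = 7.2 lakh = 720{,}000, s = 80{,}000, df = 19
\bar{y} \pm t_{0.005,\,19}\,\frac{s}{\sqrt{n}}
  = 720{,}000 \pm 2.861 \times \frac{80{,}000}{\sqrt{20}}
  \approx 720{,}000 \pm 51{,}200
  \quad\Rightarrow\quad (6.69 \text{ lakh},\ 7.71 \text{ lakh})

% Overtime: \sigma = 26, Err. = 18 (assumed minutes), 90% confidence
n = \left(\frac{z_{0.05}\,\sigma}{\text{Err.}}\right)^2
  = \left(\frac{1.645 \times 26}{18}\right)^2
  \approx 5.65
  \quad\Rightarrow\quad n = 6

% Proportion working overtime: \hat{\pi} = 6/20 = 0.3, 95% confidence
\hat{\pi} \pm z_{0.025}\sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}
  = 0.3 \pm 1.96 \times \sqrt{\frac{0.3 \times 0.7}{20}}
  \approx 0.3 \pm 0.201
  \quad\Rightarrow\quad (0.099,\ 0.501)
```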
