MTPDF6 - Sampling Distribution and Point Estimation

Engineering Data Analysis

Sampling
Distributions
MPS Department | FEU Institute of Technology
Subtopic 1
OBJECTIVES

• Determine the sampling distribution of a given population
• Relate the sampling distribution to the Central Limit Theorem
Subtopic 1
Sampling Distribution and the Central Limit Theorem

• Types of Distribution
• Sampling Distribution
• Central Limit Theorem
Statistics is a Science of Inference

• Statistical inference: on the basis of sample statistics derived from limited and incomplete sample information, we
  – predict and forecast values of population parameters,
  – test hypotheses about values of population parameters, and
  – make decisions.
• We make generalizations about the characteristics of a population on the basis of observations of a sample, a part of that population.

https://app.prntscr.com/en/index.html
The Literary Digest Poll (1936)

• Unbiased sample: an unbiased, representative sample drawn at random from the entire population of Democrats and Republicans.
• Biased sample: a biased, unrepresentative sample drawn only from people who had phones and/or cars and/or were Digest readers, and so did not reflect the full population of Democrats and Republicans.
Sample Statistics as Estimators of Population Parameters

A sample statistic is a numerical measure of a summary characteristic of a sample. A population parameter is a numerical measure of a summary characteristic of a population.

• An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter.
• An estimate of a parameter is a particular numerical value of a sample statistic obtained through sampling.
• A point estimate is a single value used as an estimate of a population parameter.
Estimators

• The sample mean, X̄, is the most common estimator of the population mean, μ.
• The sample variance, s², is the most common estimator of the population variance, σ².
• The sample standard deviation, s, is the most common estimator of the population standard deviation, σ.
• The sample proportion, p̂, is the most common estimator of the population proportion, p.
Population and Sample Proportions

• The population proportion is equal to the number of elements in the population belonging to the category of interest, X, divided by the total number of elements in the population, N:

    p = X / N

• The sample proportion is the number of elements in the sample belonging to the category of interest, x, divided by the sample size, n:

    p̂ = x / n
Example: A Population Distribution, a Sample from a Population, and the Population and Sample Means

[Figure: frequency distribution of the population centered at the population mean (μ), with sample points drawn from it and the sample mean (X̄) marked.]
Sampling Distributions

• The sampling distribution of a statistic is the probability distribution of all possible values the statistic may assume, when computed from random samples of the same size, drawn from a specified population.
• The sampling distribution of X̄ is the probability distribution of all possible values the random variable X̄ may assume when a sample of size n is taken from a specified population.
Example

Uniform population of the integers from 1 to 8:

X   P(X)    X·P(X)   X-μ    (X-μ)²   P(X)(X-μ)²
1   0.125   0.125    -3.5   12.25    1.53125
2   0.125   0.250    -2.5    6.25    0.78125
3   0.125   0.375    -1.5    2.25    0.28125
4   0.125   0.500    -0.5    0.25    0.03125
5   0.125   0.625     0.5    0.25    0.03125
6   0.125   0.750     1.5    2.25    0.28125
7   0.125   0.875     2.5    6.25    0.78125
8   0.125   1.000     3.5   12.25    1.53125
    1.000   4.500                    5.25000

[Figure: Uniform Distribution (1,8), with P(X) = 0.125 for X = 1, ..., 8.]

E(X) = μ = 4.5
V(X) = σ² = 5.25
SD(X) = σ = 2.2913
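A minimal sketch (not part of the original slides) of how these population summaries can be checked in Python; the variable names are illustrative:

    # Population: the integers 1..8, each with probability 1/8
    values = list(range(1, 9))
    p = 1 / 8

    mean = sum(x * p for x in values)                  # E(X) = 4.5
    var = sum(p * (x - mean) ** 2 for x in values)     # V(X) = 5.25
    sd = var ** 0.5                                    # SD(X) ≈ 2.2913
    print(mean, var, round(sd, 4))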
There are 8·8 = 64 different but equally likely samples of size 2 that can be drawn (with replacement) from a uniform population of the integers from 1 to 8. Each of these samples has a sample mean. For example, the mean of the sample (1,4) is 2.5, and the mean of the sample (8,4) is 6.

Samples of Size 2 from Uniform (1,8):

     1    2    3    4    5    6    7    8
1   1,1  1,2  1,3  1,4  1,5  1,6  1,7  1,8
2   2,1  2,2  2,3  2,4  2,5  2,6  2,7  2,8
3   3,1  3,2  3,3  3,4  3,5  3,6  3,7  3,8
4   4,1  4,2  4,3  4,4  4,5  4,6  4,7  4,8
5   5,1  5,2  5,3  5,4  5,5  5,6  5,7  5,8
6   6,1  6,2  6,3  6,4  6,5  6,6  6,7  6,8
7   7,1  7,2  7,3  7,4  7,5  7,6  7,7  7,8
8   8,1  8,2  8,3  8,4  8,5  8,6  8,7  8,8

Sample Means from Uniform (1,8), n = 2:

     1    2    3    4    5    6    7    8
1   1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5
2   1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0
3   2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5
4   2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0
5   3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5
6   3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
7   4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5
8   4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0

The probability distribution of the sample mean is called the sampling distribution of the sample mean.
Sampling Distribution of the Mean

X̄    P(X̄)      X̄·P(X̄)    X̄-μ_X̄   (X̄-μ_X̄)²   P(X̄)(X̄-μ_X̄)²
1.0  0.015625  0.015625  -3.5    12.25      0.191406
1.5  0.031250  0.046875  -3.0     9.00      0.281250
2.0  0.046875  0.093750  -2.5     6.25      0.292969
2.5  0.062500  0.156250  -2.0     4.00      0.250000
3.0  0.078125  0.234375  -1.5     2.25      0.175781
3.5  0.093750  0.328125  -1.0     1.00      0.093750
4.0  0.109375  0.437500  -0.5     0.25      0.027344
4.5  0.125000  0.562500   0.0     0.00      0.000000
5.0  0.109375  0.546875   0.5     0.25      0.027344
5.5  0.093750  0.515625   1.0     1.00      0.093750
6.0  0.078125  0.468750   1.5     2.25      0.175781
6.5  0.062500  0.406250   2.0     4.00      0.250000
7.0  0.046875  0.328125   2.5     6.25      0.292969
7.5  0.031250  0.234375   3.0     9.00      0.281250
8.0  0.015625  0.125000   3.5    12.25      0.191406
     1.000000  4.500000                     2.625000

[Figure: the sampling distribution of X̄ for n = 2, plotted for X̄ = 1.0, 1.5, ..., 8.0.]

E(X̄) = μ_X̄ = 4.5
V(X̄) = σ²_X̄ = 2.625
SD(X̄) = σ_X̄ = 1.6202
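The sampling distribution above can be rebuilt by brute-force enumeration of the 64 samples; here is a short sketch (not from the original slides) using only the Python standard library:

    from itertools import product
    from collections import Counter

    population = range(1, 9)
    # All 64 ordered samples of size 2, drawn with replacement
    means = [(a + b) / 2 for a, b in product(population, repeat=2)]

    dist = {m: c / 64 for m, c in sorted(Counter(means).items())}
    mu = sum(m * p for m, p in dist.items())                # 4.5
    var = sum(p * (m - mu) ** 2 for m, p in dist.items())   # 2.625
    print(mu, var, round(var ** 0.5, 4))                    # SD ≈ 1.6202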
Properties of the Sampling Distribution of the Sample Mean

Comparing the population distribution and the sampling distribution of the mean:

• The sampling distribution is more bell-shaped and symmetric.
• Both have the same center.
• The sampling distribution of the mean is more compact, with a smaller variance.

[Figures: the Uniform Distribution (1,8) population and the sampling distribution of the mean for n = 2.]
Relationships between Population Parameters and the Sampling Distribution of the Sample Mean

The expected value of the sample mean is equal to the population mean:

    E(X̄) = μ_X̄ = μ

The variance of the sample mean is equal to the population variance divided by the sample size:

    V(X̄) = σ²_X̄ = σ² / n

The standard deviation of the sample mean, known as the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size:

    SD(X̄) = σ_X̄ = σ / √n
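A quick simulation sketch (not from the slides; it assumes NumPy is available) illustrating that the spread of sample means shrinks like σ/√n:

    import numpy as np

    rng = np.random.default_rng(0)
    sigma, n, reps = 2.2913, 4, 100_000           # population SD of Uniform{1..8}

    samples = rng.integers(1, 9, size=(reps, n))  # draws from Uniform{1..8}
    observed_se = samples.mean(axis=1).std()
    print(observed_se, sigma / np.sqrt(n))        # both close to ~1.146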
Sampling from a Normal Population

When sampling from a normal population with mean μ and standard deviation σ, the sample mean, X̄, has a normal sampling distribution:

    X̄ ~ N(μ, σ²/n)

This means that, as the sample size increases, the sampling distribution of the sample mean remains centered on the population mean, but becomes more compactly distributed around that population mean.

[Figure: the normal population together with the sampling distributions of the sample mean for n = 2, n = 4, and n = 16; larger n gives a taller, narrower curve.]
Example: IQ scores, population vs. sample. In a large population of adults, the mean IQ is 112 with standard deviation 20. Suppose 200 adults are randomly selected for a market research campaign. The distribution of the sample mean IQ is:

A) Exactly normal, mean 112, standard deviation 20
B) Approximately normal, mean 112, standard deviation 20
C) Approximately normal, mean 112, standard deviation 1.414
D) Approximately normal, mean 112, standard deviation 0.1

Answer: C) Approximately normal, mean 112, standard deviation 1.414.
Population distribution: N(μ = 112, σ = 20).
Sampling distribution for n = 200 is N(μ = 112, σ/√n = 1.414).
Example

• Hypokalemia is diagnosed when blood potassium levels are below 3.5 mEq/dl. Assume that a patient's measured potassium levels vary daily according to a normal distribution N(μ = 3.8, σ = 0.2). If only one measurement is made, what is the probability that this patient will be diagnosed with hypokalemia?
• Instead, if measurements are taken on 4 separate days and their mean is used, what is the probability of a correct diagnosis?
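A sketch of the computation (not from the slides; it assumes SciPy is available). With one measurement, the probability is P(X < 3.5) under N(3.8, 0.2); with the mean of 4 measurements, the standard deviation drops to 0.2/√4 = 0.1:

    from scipy.stats import norm

    # One measurement: z = (3.5 - 3.8) / 0.2 = -1.5
    p_one = norm.cdf(3.5, loc=3.8, scale=0.2)      # ≈ 0.0668 (misdiagnosis)

    # Mean of 4 measurements: standard error = 0.2 / sqrt(4) = 0.1, z = -3
    p_mean = norm.cdf(3.5, loc=3.8, scale=0.1)     # ≈ 0.0013
    print(p_one, p_mean, 1 - p_mean)               # correct diagnosis ≈ 0.9987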
The Central Limit Theorem

When sampling from a population with mean μ and finite standard deviation σ, the sampling distribution of the sample mean will tend to a normal distribution with mean μ and standard deviation σ/√n as the sample size becomes large (n > 30).

For "large enough" n:

    X̄ ~ N(μ, σ²/n)

[Figures: the sampling distribution of X̄ for n = 5, n = 20, and large n, becoming increasingly normal.]
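A small simulation sketch (not in the original deck; assumes NumPy) showing the Central Limit Theorem at work on a strongly skewed population:

    import numpy as np

    rng = np.random.default_rng(1)
    population = rng.exponential(scale=2.0, size=1_000_000)   # skewed population, mean 2, SD 2

    for n in (5, 20, 50):
        means = rng.choice(population, size=(10_000, n)).mean(axis=1)
        # As n grows, the sample means cluster near the population mean (2.0)
        # with spread close to sigma / sqrt(n) = 2.0 / sqrt(n).
        print(n, round(means.mean(), 3), round(means.std(), 3), round(2.0 / np.sqrt(n), 3))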
The Central Limit Theorem Applies to Sampling Distributions from Any Population

[Figure: sampling distributions of X̄ for normal, uniform, skewed, and general populations, shown for the population itself, n = 2, and n = 30; by n = 30 all four are approximately normal.]

https://slideplayer.com/slide/8348541/
Example

Mercury makes a 2.4-liter V-6 engine, the Laser XRi, used in speedboats. The company's engineers believe the engine delivers an average power of 220 horsepower and that the standard deviation of power delivered is 15 HP. A potential buyer intends to sample 100 engines (each engine is to be run a single time). What is the probability that the sample mean will be less than 217 HP?

    P(X̄ < 217) = P( (X̄ - μ)/(σ/√n) < (217 - 220)/(15/√100) )
                = P( Z < -3/1.5 )
                = P( Z < -2 ) = 0.0228
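The same answer can be reproduced with SciPy (a sketch, assuming scipy is available):

    from scipy.stats import norm

    mu, sigma, n = 220, 15, 100
    se = sigma / n ** 0.5                    # standard error = 1.5
    print(norm.cdf(217, loc=mu, scale=se))   # ≈ 0.0228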
Student's t Distribution

If the population standard deviation, σ, is unknown, replace it with the sample standard deviation, s. If the population is normal, the resulting statistic

    t = (X̄ - μ) / (s/√n)

has a t distribution with (n - 1) degrees of freedom.

• The t is a family of bell-shaped and symmetric distributions, one for each number of degrees of freedom.
• The expected value of t is 0.
• The variance of t is greater than 1, but approaches 1 as the number of degrees of freedom increases. The t is flatter and has fatter tails than the standard normal.
• The t distribution approaches a standard normal as the number of degrees of freedom increases.

[Figure: the standard normal density compared with t densities for df = 10 and df = 20.]
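A brief sketch (assumes SciPy) comparing upper-tail probabilities of the t distribution and the standard normal, showing the convergence as the degrees of freedom grow:

    from scipy.stats import t, norm

    for df in (5, 10, 20, 100):
        # P(T > 2) shrinks toward the normal tail P(Z > 2) ≈ 0.0228 as df increases
        print(df, round(t.sf(2.0, df), 4), round(norm.sf(2.0), 4))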
The Sampling Distribution of the Sample Proportion, p̂

The sample proportion is the percentage of successes in n binomial trials. It is the number of successes, X, divided by the number of trials, n:

    p̂ = X / n

As the sample size n increases, the sampling distribution of p̂ becomes approximately normal, centered at p, with standard deviation √( p(1 - p) / n ).

[Figures: binomial distributions for p = 0.3 with n = 2, n = 10, and n = 15; for n = 15 the horizontal axis is also labeled in terms of p̂ = 0/15, 1/15, ..., 15/15.]
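A simulation sketch (not in the slides; assumes NumPy) checking that the spread of p̂ matches √(p(1 − p)/n):

    import numpy as np

    rng = np.random.default_rng(2)
    p, n, reps = 0.3, 15, 100_000

    p_hat = rng.binomial(n, p, size=reps) / n
    print(p_hat.mean(), p_hat.std())       # ≈ 0.3 and ≈ 0.118
    print(np.sqrt(p * (1 - p) / n))        # theoretical SD ≈ 0.1183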
Degrees of Freedom

Consider a sample of size n = 4 containing the following data points:

    x₁ = 12   x₂ = 14   x₃ = 16   x₄ = ?

and for which the sample mean is x̄ = Σx/n = 14.

Given the values of three data points and the sample mean, the value of the fourth data point can be determined:

    x̄ = Σx/n = (12 + 14 + 16 + x₄)/4 = 14
    12 + 14 + 16 + x₄ = 56
    x₄ = 18
Degrees of Freedom

If only two data points and the sample mean are known:

    x₁ = 12   x₂ = 14   x₃ = ?   x₄ = ?   with x̄ = 14

the values of the remaining two data points cannot be uniquely determined:

    x̄ = Σx/n = (12 + 14 + x₃ + x₄)/4 = 14
    12 + 14 + x₃ + x₄ = 56
Degrees of Freedom

The number of degrees of freedom is equal to the total number of measurements (these
are not always raw data points), less the total number of restrictions on the
measurements. A restriction is a quantity computed from the measurements.

The sample mean is a restriction on the sample measurements, so after calculating the
sample mean there are only (n-1) degrees of freedom remaining with which to calculate
the sample variance. The sample variance is based on only (n-1) free data points:

    s² = Σ(x - x̄)² / (n - 1)
Example

A company manager has a total budget of $150,000 to be completely allocated to four different projects. How many degrees of freedom does the manager have?

    x₁ + x₂ + x₃ + x₄ = 150,000

The fourth project's budget can be determined from the total budget and the individual budgets of the other three. For example, if:

    x₁ = 40,000   x₂ = 30,000   x₃ = 50,000

then:

    x₄ = 150,000 - 40,000 - 30,000 - 50,000 = 30,000

So there are (n - 1) = 3 degrees of freedom.


References
• Elementary Statistics by Bluman
• https://app.prntscr.com/en/index.html
• https://slideplayer.com/slide/8348541/
Engineering Data Analysis

General Concept of
Point Estimation
MPS Department | FEU Institute of Technology
Subtopic 2
OBJECTIVES

• Apply classical methods in estimating parameters
• Define an unbiased estimator and identify common unbiased estimators
• Compute the standard error of a sample statistic
• Compute the mean squared error of an estimator
Subtopic 2
General Concept of Point Estimation

• Unbiased Estimator
• Variance of a Point Estimator
• Standard Error
• Mean Squared Error of an Estimator
Estimators and Their Properties

Desirable properties of estimators include:
• Unbiasedness
• Efficiency
• Consistency
• Sufficiency
Unbiasedness

An estimator is said to be unbiased if its expected value is equal to the population parameter it estimates.

For example, E(X̄) = μ, so the sample mean is an unbiased estimator of the population mean. Unbiasedness is an average or long-run property. The mean of any single sample will probably not equal the population mean, but the average of the means of repeated independent samples from a population will equal the population mean.

Any systematic deviation of the estimator from the population parameter of interest is called a bias.
Unbiased and Biased Estimators

An unbiased estimator is on target on average. A biased estimator is off target on average.

[Figure: sampling distributions of an unbiased and a biased estimator; the distance between the biased estimator's center and the true parameter is labeled the bias.]
Efficiency

An estimator is efficient if it has a relatively small variance (and standard deviation).

An efficient estimator is, on average, closer to the parameter being estimated. An inefficient estimator is, on average, farther from the parameter being estimated.
Consistency and Sufficiency

An estimator is said to be consistent if its probability of being close to the parameter it estimates increases as the sample size increases.

[Figure: consistency illustrated by sampling distributions for n = 10 and n = 100; the latter is much more concentrated around the parameter.]

An estimator is said to be sufficient if it contains all the information in the data about the parameter it estimates.
Properties of the Sample Mean
For a normal population, both the sample mean and sample median are
unbiased estimators of the population mean, but the sample mean is
both more efficient (because it has a smaller variance), and sufficient.
Every observation in the sample is used in the calculation of the sample
mean, but only the middle value is used to find the sample median.

In general, the sample mean is the best estimator of the population mean. The sample mean is the most efficient unbiased estimator of the population mean. It is also a consistent estimator.
Properties of the Sample Variance
The sample variance (the sum of the squared deviations from the
sample mean divided by (n-1)) is an unbiased estimator of the
population variance. In contrast, the average squared deviation
from the sample mean is a biased (though consistent) estimator of
the population variance.
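A simulation sketch (not in the slides; assumes NumPy) contrasting the unbiased estimator, which divides by (n − 1), with the biased one, which divides by n:

    import numpy as np

    rng = np.random.default_rng(3)
    sigma2, n, reps = 4.0, 5, 200_000

    samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    s2_unbiased = samples.var(axis=1, ddof=1).mean()   # ≈ 4.0
    s2_biased = samples.var(axis=1, ddof=0).mean()     # ≈ 4.0 * (n-1)/n = 3.2
    print(s2_unbiased, s2_biased)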
Degrees of Freedom

Consider a sample of size n = 4 containing the following data points:

    x₁ = 12   x₂ = 14   x₃ = 16   x₄ = ?

and for which the sample mean is x̄ = Σx/n = 14.

Given the values of three data points and the sample mean, the value of the fourth data point can be determined:

    x̄ = Σx/n = (12 + 14 + 16 + x₄)/4 = 14
    12 + 14 + 16 + x₄ = 56
    x₄ = 18
Degrees of Freedom

If only two data points and the sample mean are known:

    x₁ = 12   x₂ = 14   x₃ = ?   x₄ = ?   with x̄ = 14

the values of the remaining two data points cannot be uniquely determined:

    x̄ = Σx/n = (12 + 14 + x₃ + x₄)/4 = 14
    12 + 14 + x₃ + x₄ = 56
Degrees of Freedom
The number of degrees of freedom is equal to the total number of
measurements (these are not always raw data points), less the total number of
restrictions on the measurements. A restriction is a quantity computed from the
measurements.

The sample mean is a restriction on the sample measurements, so after


calculating the sample mean there are only (n-1) degrees of freedom remaining
with which to calculate the sample variance. The sample variance is based on
only (n-1) free data points:
    s² = Σ(x - x̄)² / (n - 1)
Example

A company manager has a total budget of $150,000 to be completely allocated to four different projects. How many degrees of freedom does the manager have?

    x₁ + x₂ + x₃ + x₄ = 150,000

The fourth project's budget can be determined from the total budget and the individual budgets of the other three. For example, if:

    x₁ = 40,000   x₂ = 30,000   x₃ = 50,000

then:

    x₄ = 150,000 - 40,000 - 30,000 - 50,000 = 30,000

So there are (n - 1) = 3 degrees of freedom.


Statistical Inference

• The field of statistical inference consists of those methods used to make decisions or to draw conclusions about a population.
• These methods utilize the information contained in a sample from the population in drawing conclusions.
• Statistical inference may be divided into two major areas:
  – Parameter estimation: point and interval
  – Hypothesis testing
General Concepts of Point Estimation

Unbiased Estimators
[Definition and worked-example slides appear as figures in the original deck.]

Variance of a Point Estimator
[Figure: the sampling distributions of two unbiased estimators, Θ̂₁ and Θ̂₂.]
Standard Error: Reporting a Point Estimate

[Definition and worked-example slides appear as figures in the original deck.]

The standard error of an estimator is the standard deviation of its sampling distribution; for the sample mean it is σ/√n, estimated by s/√n when σ is unknown.
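A small sketch (assumes NumPy; the data values are illustrative) of how a point estimate is typically reported together with its estimated standard error s/√n:

    import numpy as np

    data = np.array([12.0, 14.0, 16.0, 18.0])     # hypothetical sample
    x_bar = data.mean()
    se = data.std(ddof=1) / np.sqrt(len(data))    # estimated standard error of the mean
    print(f"point estimate = {x_bar:.2f}, standard error = {se:.2f}")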

Mean Squared Error of an Estimator

[Definition slide appears as a figure in the original deck.]

The mean squared error of an estimator Θ̂ of θ is MSE(Θ̂) = E[(Θ̂ − θ)²] = V(Θ̂) + (bias)². A biased estimator can therefore have a smaller mean squared error than an unbiased one if its variance is sufficiently small.

[Figure: a biased estimator Θ̂₁ that has smaller variance than the unbiased estimator Θ̂₂.]
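A simulation sketch (assumes NumPy) comparing MSE = variance + bias² for the unbiased sample variance and the biased version that divides by n; for a normal population the biased version actually has the smaller MSE:

    import numpy as np

    rng = np.random.default_rng(4)
    sigma2, n, reps = 1.0, 5, 200_000
    samples = rng.normal(0.0, 1.0, size=(reps, n))

    for ddof in (1, 0):                          # 1: unbiased s^2, 0: biased (divide by n)
        est = samples.var(axis=1, ddof=ddof)
        mse = np.mean((est - sigma2) ** 2)       # equals variance + bias^2
        print(ddof, round(mse, 3))               # ≈ 0.5 (unbiased) vs ≈ 0.36 (biased)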
Elementary Statistics by Bluman
