Chapter 5 Data Analysis 2018

Uploaded by

Kamil Ibra
STATISTICS AND DATA ANALYSIS
Faculty of Engineering, Semester 5
Teacher: Elsy Wehbe
2017-2018
Chapter 5: Sampling and Sampling Distribution

Learning Objectives

In this chapter, you learn:

• The concept of the sampling distribution
• To compute probabilities related to the sample mean and the sample proportion
• The importance of the Central Limit Theorem
Why Sample?

• Selecting a sample is less time-consuming and less costly than selecting every item in the population (census).
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.
A Sampling Process Begins With A Sampling Frame

• The sampling frame is a listing of items that make up the population
• Frames are data sources such as population lists, directories, or maps
• Inaccurate or biased results can occur if a frame excludes certain portions of the population
• Using different frames to generate data can lead to dissimilar conclusions
Types of Samples

• Non-Probability Samples
  • Judgment
  • Convenience
• Probability Samples
  • Simple Random
  • Systematic
  • Stratified
  • Cluster
Evaluating Survey Worthiness

• What is the purpose of the survey?
• Is the survey based on a probability sample?
• Coverage error – appropriate frame?
• Nonresponse error – follow up
• Measurement error – good questions elicit good responses
• Sampling error – always exists

Copyright ©2013 Pearson Education, Inc. publishing as Prentice Hall


Types of Survey Errors

• Coverage error or selection bias
  • Exists if some groups are excluded from the frame and have no chance of being selected
• Nonresponse error or bias
  • People who do not respond may be different from those who do respond
• Sampling error
  • Variation from sample to sample will always exist
• Measurement error
  • Due to weaknesses in question design, respondent error, and interviewer’s effects on the respondent
Types of Survey Errors (continued)

• Coverage error: excluded from frame
• Nonresponse error: follow up on nonresponses
• Sampling error: random differences from sample to sample
• Measurement error: bad or leading question
Sampling Distributions

• A sampling distribution is a distribution of all of the possible values of a sample statistic for a given size sample selected from a population.
• For example, suppose you sample 50 students from your college and compute their mean GPA. If you drew many different samples of 50, you would generally compute a different mean for each sample. We are interested in the distribution of the mean GPA from all possible samples of 50 students.
Developing a Sampling Distribution

• Assume there is a population of individuals A, B, C, D
• Population size N = 4
• Random variable, X, is age of individuals
• Values of X: 18, 20, 22, 24 (years)
Developing a Sampling Distribution (continued)

Summary Measures for the Population Distribution:

$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$

[Figure: population distribution P(x) over x = 18 (A), 20 (B), 22 (C), 24 (D); Uniform Distribution]
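
The two population summary measures can be checked numerically. The following is a minimal Python sketch; the variable names are illustrative and not part of the slides. It uses the population formulas, so it divides by N rather than N − 1.

```python
import math

# Ages of the four individuals A, B, C, D (this is the whole population)
ages = [18, 20, 22, 24]
N = len(ages)

# Population mean: sum of the values divided by N
mu = sum(ages) / N

# Population standard deviation: divide by N (not N - 1),
# because these four values are the entire population
sigma = math.sqrt(sum((x - mu) ** 2 for x in ages) / N)

print(mu)               # 21.0
print(round(sigma, 3))  # 2.236
```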
Developing a Sampling Distribution (continued)

Now consider all possible samples of size n = 2

16 possible samples (sampling with replacement):

1st Obs   2nd Observation
          18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24

16 Sample Means:

1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24
Developing a Sampling Distribution (continued)

Sampling Distribution of All Sample Means

16 Sample Means:

1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24

Sample Means Distribution (no longer uniform):

Sample mean   18     19     20     21     22     23     24
Probability   1/16   2/16   3/16   4/16   3/16   2/16   1/16
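
The distribution of the 16 sample means can be rebuilt by brute-force enumeration. This is a small illustrative Python sketch, not part of the original slides; it assumes ordered samples drawn with replacement, as in the table of 16 samples.

```python
from collections import Counter
from itertools import product

ages = [18, 20, 22, 24]

# All ordered samples of size 2 drawn with replacement: 4 * 4 = 16
samples = list(product(ages, repeat=2))

# Count how many samples produce each sample mean
counts = Counter((a + b) / 2 for a, b in samples)

# Probability of each sample mean = count / number of samples
distribution = {mean: counts[mean] / len(samples) for mean in sorted(counts)}

for mean, prob in distribution.items():
    print(mean, prob)
```

The probabilities rise from 1/16 at the extremes (18 and 24) to 4/16 at the center (21), which is why the distribution is no longer uniform.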
Developing a Sampling Distribution (continued)

Summary Measures of this Sampling Distribution:

$$\mu_{\bar{X}} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$$

$$\sigma_{\bar{X}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$

Note: Here we divide by 16 because there are 16 different samples of size 2.
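
As a check on these two summary measures, the sketch below (illustrative Python, with names chosen here rather than taken from the slides) averages the 16 sample means and computes their standard deviation, dividing by 16.

```python
import math
from itertools import product

ages = [18, 20, 22, 24]

# Means of all 16 ordered samples of size 2 (sampling with replacement)
sample_means = [(a + b) / 2 for a, b in product(ages, repeat=2)]

# Mean of the sampling distribution
mu_xbar = sum(sample_means) / len(sample_means)

# Standard deviation of the sampling distribution (divide by 16)
sigma_xbar = math.sqrt(
    sum((m - mu_xbar) ** 2 for m in sample_means) / len(sample_means)
)

print(mu_xbar)               # 21.0
print(round(sigma_xbar, 2))  # 1.58
```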
Comparing the Population Distribution to the Sample Means Distribution

Population (N = 4): $\mu = 21$, $\sigma = 2.236$
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$

[Figure: side-by-side plots of P(X) over X = 18, 20, 22, 24 (A, B, C, D) and of P(X̄) over X̄ = 18, 19, 20, 21, 22, 23, 24]
Sampling Distribution of the Mean: Standard Error of the Mean

• Different samples of the same size from the same population will yield different sample means
• A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

• Note that the standard error of the mean decreases as the sample size increases
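
A short numeric illustration of the standard error formula, written as a hedged Python sketch (the helper name `standard_error` is made up for this example). It reuses the age population 18, 20, 22, 24, whose variance is 5, and shows that σ/√n with n = 2 gives the 1.58 found by enumerating all samples drawn with replacement.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Population sd of the ages 18, 20, 22, 24 (variance 5, sd about 2.236)
sigma = math.sqrt(5)

print(round(standard_error(sigma, 2), 2))  # 1.58

# The standard error shrinks as the sample size grows
for n in (2, 4, 16, 64):
    print(n, round(standard_error(sigma, n), 3))
```

Quadrupling the sample size halves the standard error, because n enters through its square root.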
Sampling Distribution of the Mean: If the Population is Normal

• If a population is normal with mean μ and standard deviation σ, the sampling distribution of $\bar{X}$ is also normally distributed with

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Z-value for Sampling Distribution of the Mean

• Z-value for the sampling distribution of $\bar{X}$:

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Where: $\bar{X}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Sampling Distribution Properties

• $\mu_{\bar{x}} = \mu$ (i.e., $\bar{x}$ is unbiased)

[Figure: Normal Population Distribution centered at μ, and Normal Sampling Distribution of x̄ centered at μ_x̄ (has the same mean)]
Determining An Interval Including A Fixed Proportion of the Sample Means

Find a symmetrically distributed interval around µ that will include 95% of the sample means when µ = 368, σ = 15, and n = 25.

• Since the interval contains 95% of the sample means, 5% of the sample means will be outside the interval.
• Since the interval is symmetric, 2.5% will be above the upper limit and 2.5% will be below the lower limit.
• From the standardized normal table, the Z score with 2.5% (0.0250) below it is -1.96 and the Z score with 2.5% (0.0250) above it is 1.96.
Determining An Interval Including A Fixed Proportion of the Sample Means (continued)

• Calculating the lower limit of the interval

$$\bar{X}_L = \mu + Z \frac{\sigma}{\sqrt{n}} = 368 + (-1.96)\frac{15}{\sqrt{25}} = 362.12$$

• Calculating the upper limit of the interval

$$\bar{X}_U = \mu + Z \frac{\sigma}{\sqrt{n}} = 368 + (1.96)\frac{15}{\sqrt{25}} = 373.88$$

• 95% of all sample means of sample size 25 are between 362.12 and 373.88
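
The interval limits can be reproduced without a printed table. This is a minimal Python sketch using the standard library's `statistics.NormalDist`; the variable names (`coverage`, `lower`, `upper`) are illustrative choices, not notation from the slides.

```python
import math
from statistics import NormalDist

mu, sigma, n = 368, 15, 25
coverage = 0.95

# Z score that leaves (1 - coverage) / 2 in each tail, about 1.96
z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)

se = sigma / math.sqrt(n)     # standard error of the mean: 15 / 5 = 3
lower = mu - z * se
upper = mu + z * se

print(round(z, 2), round(lower, 2), round(upper, 2))  # 1.96 362.12 373.88
```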
Sampling Distribution of the Mean: If the Population is not Normal

• We can apply the Central Limit Theorem:
  • Even if the population is not normal,
  • …sample means from the population will be approximately normal as long as the sample size is large enough.

Properties of the sampling distribution:

$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
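
The Central Limit Theorem can be illustrated with a small simulation. The sketch below is an assumption-laden example, not part of the slides: it picks an exponential population with rate 1 (so μ = 1 and σ = 1, and the shape is strongly skewed), a sample size of 30, and 10,000 repeated samples with a fixed seed.

```python
import math
import random
import statistics

rng = random.Random(0)      # fixed seed so the run is repeatable

n = 30                      # sample size (assumed for this illustration)
num_samples = 10_000        # number of repeated samples

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1
mu, sigma = 1.0, 1.0

# Draw many samples of size n and record each sample mean
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
se = sigma / math.sqrt(n)

print("mean of sample means:", round(mean_of_means, 3), "vs mu =", mu)
print("sd of sample means:  ", round(sd_of_means, 3), "vs sigma/sqrt(n) =", round(se, 3))

# Share of sample means within 1.96 standard errors of mu;
# close to 0.95 when the normal approximation is reasonable
within = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / num_samples
print("share within 1.96 SE:", round(within, 3))
```

Even though the population is far from normal, the simulated sample means should center near μ with spread near σ/√n, and roughly 95% of them should fall within 1.96 standard errors of μ.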
Sample Mean Sampling Distribution: If the Population is not Normal (continued)

Sampling distribution properties:

• Central Tendency: $\mu_{\bar{x}} = \mu$
• Variation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

[Figure: a non-normal Population Distribution with mean μ, and the Sampling Distribution of x̄ centered at μ_x̄ for a smaller and a larger sample size (becomes normal as n increases)]
How Large is Large Enough?

• For most distributions, n ≥ 30 will give a sampling distribution that is nearly normal
• For fairly symmetric distributions, n ≥ 15 is usually sufficient
• For normal population distributions, the sampling distribution of the mean is always normally distributed

10 min Quiz

• Suppose a population has mean μ = 8 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.
• What is the probability that the sample mean is between 7.8 and 8.2?

Example (continued)
Solution:

• Even if the population is not normally distributed, the central limit theorem can be used (n ≥ 30)
• … so the sampling distribution of $\bar{X}$ is approximately normal
• … with mean $\mu_{\bar{x}} = 8$
• … and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
Example (continued)
Solution (continued):

$$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < Z < 0.4) = 0.6554 - 0.3446 = 0.3108$$

[Figure: Population Distribution of unknown shape (μ = 8) → Sample → Sampling Distribution of x̄ (μ_x̄ = 8, area between 7.8 and 8.2) → Standardize → Standard Normal Distribution (μ_z = 0, area between -0.4 and 0.4)]
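
The quiz answer can be verified with the standard normal CDF instead of a table lookup. This is an illustrative Python sketch with names chosen for this example; it relies on the Central Limit Theorem approximation described in the solution.

```python
import math
from statistics import NormalDist

mu, sigma, n = 8, 3, 36
se = sigma / math.sqrt(n)          # standard error of the mean: 3 / 6 = 0.5

# Standardize both limits of the interval
z_low = (7.8 - mu) / se
z_high = (8.2 - mu) / se

# Probability between the limits under the standard normal distribution
std_normal = NormalDist()
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(z_low, 1), round(z_high, 1), round(prob, 4))  # -0.4 0.4 0.3108
```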
Population Proportions

π = the proportion of the population having a characteristic of interest

• Sample proportion (p) provides an estimate of π:

$$p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$$

• 0 ≤ p ≤ 1
• p is approximately distributed as a normal distribution when n is large (assuming sampling with replacement from a finite population or without replacement from an infinite population)
Sampling Distribution of p

• Approximated by a normal distribution if:

$$n\pi \ge 5 \quad \text{and} \quad n(1 - \pi) \ge 5$$

where

$$\mu_p = \pi \quad \text{and} \quad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

(where π = population proportion)

[Figure: Sampling Distribution, P(p) plotted against p from 0 to 1]
Z-Value for Proportions

Standardize p to a Z value with the formula:

$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$
Example

• If the true proportion of voters who support Proposition A is π = 0.4, what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?
• i.e.: if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?
Example (continued)

• if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?

Find $\sigma_p$:

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$$

Convert to standardized normal:

$$P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44)$$
Example (continued)

• if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?

Utilize the cumulative normal table:
P(0 ≤ Z ≤ 1.44) = 0.9251 - 0.5000 = 0.4251

[Figure: Sampling Distribution of p with the area between 0.40 and 0.45, standardized to the Standardized Normal Distribution with the area 0.4251 between Z = 0 and Z = 1.44]
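
The Proposition A example can be worked end to end in a few lines. This is a hedged Python sketch: the names `pi_pop`, `prob_table`, and `prob_exact` are invented for illustration, and rounding Z to two decimals is done only to mimic a printed normal table.

```python
import math
from statistics import NormalDist

pi_pop, n = 0.4, 200        # population proportion and sample size

# Rule of thumb from the slides: n*pi and n*(1 - pi) should both be at least 5
normal_ok = n * pi_pop >= 5 and n * (1 - pi_pop) >= 5

# Standard error of the sample proportion
sigma_p = math.sqrt(pi_pop * (1 - pi_pop) / n)

# Standardize both limits of the interval
z_low = (0.40 - pi_pop) / sigma_p
z_high = (0.45 - pi_pop) / sigma_p

std_normal = NormalDist()

# Table-style answer: round Z to two decimals first, as a printed table requires
prob_table = std_normal.cdf(round(z_high, 2)) - std_normal.cdf(round(z_low, 2))

# Same calculation without rounding Z
prob_exact = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(normal_ok, round(sigma_p, 5), round(z_high, 2), round(prob_table, 4))
# True 0.03464 1.44 0.4251
```

Skipping the rounding of Z gives a slightly larger probability than the table-based 0.4251, because the unrounded Z is a little above 1.44.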
Example: binge drinking by college students

• Study by Harvard School of Public Health: 44% of college students binge drink.
• 244 college students surveyed; 36% admitted to binge drinking in the past week
• Assume the value 0.44 given in the study is the proportion p of college students that binge drink; that is, 0.44 is the population proportion p (written π in the earlier slides)
• Compute the probability that in a sample of 244 students, 36% or less have engaged in binge drinking.
Example: binge drinking by college students (cont.)

• Let $\hat{p}$ be the proportion in a sample of 244 that engage in binge drinking.
• We want to compute $P(\hat{p} \le .36)$
• $E(\hat{p}) = p = .44$; $SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{.44 \times .56}{244}} \approx .032$
• Since np = 244 × .44 = 107.36 and nq = 244 × .56 = 136.64 are both greater than 10, we can model the sampling distribution of $\hat{p}$ with a normal distribution, so …
Example: binge drinking by college students (cont.)

$$\hat{p} \sim N(.44, .032)$$

So

$$P(\hat{p} \le .36) = P\left(\frac{\hat{p} - .44}{.032} \le \frac{.36 - .44}{.032}\right) = P(z \le -2.5) = .0062$$
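
The binge drinking calculation can be reproduced as follows. This is an illustrative Python sketch with made-up variable names; it computes the probability twice, once with the standard deviation rounded to .032 as on the slide and once without rounding.

```python
import math
from statistics import NormalDist

p, n = 0.44, 244            # population proportion and sample size
q = 1 - p

# Success/failure check used on the slide: np and nq both greater than 10
assert n * p > 10 and n * q > 10

sd = math.sqrt(p * q / n)       # about 0.0318
sd_rounded = round(sd, 3)       # 0.032, the value used on the slide

std_normal = NormalDist()
prob_slide = std_normal.cdf((0.36 - p) / sd_rounded)   # z is about -2.5
prob_exact = std_normal.cdf((0.36 - p) / sd)           # z is about -2.52

print(round(prob_slide, 4))     # 0.0062
```

Without rounding the standard deviation, the probability comes out slightly smaller than .0062, so the conclusion is unchanged: a sample proportion of 36% or less would be very unusual if the true proportion were 44%.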
CENTRAL LIMIT THEOREM
Chapter Summary

• Examined survey worthiness and types of survey errors
• Described the sampling distribution of the mean
  • For normal populations
  • Using the Central Limit Theorem
• Described the sampling distribution of a proportion
• Calculated probabilities using sampling distributions
