Module 2 - Sample - Afterclass
Announcement
1. Thank you for your feedback. It helps me improve this course.
2. Homework 2 is due on Oct 5 at 10 am. It covers decision analysis and
statistics.
3. Hw1 solution has been released.
4. Groups have been formed.
5. Attendance and Participation
• If the number of participations >= 3 and the number of unexcused absences <= 2,
the score will be greater than 3.5 (out of 5).
o Please write your name and UID and submit the sticky notes at the end of
the class.
• More participation earns a bonus.
Objectives
1. Introduction to Statistics: Sample versus Population
Introduction to Statistics
Population
Sample
Population: size N. Sample: size n.
Introduction to Statistics
1. What is μ?
2. What is σ?
3. What is the sample mean x̄?
5.64
Notation: Population Versus Sample
Population (Size N): The mean μ is the true average (mean) over all N.
Sample (Size n): The sample mean x̄ is the average based only on the sample n.
Definitions
§ The sampling distribution of a statistic is the distribution of that statistic
across an arbitrarily large number of samples.
[Figure: Population and Sample]
Constructing our own Sampling Distribution
Let X be a discrete random variable giving the outcome when rolling a single
fair die. The table and graph below give the pmf of X.
x    P(X = x)
1    1/6
2    1/6
3    1/6
4    1/6
5    1/6
6    1/6
[Figure: bar chart of the pmf, P(X = x) against x = 1, ..., 6]
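As a quick check on the fair-die pmf, the population mean and standard deviation can be computed directly from the table. The following is a minimal Python sketch; the variable names are illustrative and not from the slides.

```python
from fractions import Fraction
import math

# pmf of a fair die: each face 1..6 has probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Population mean: sum of x * P(X = x)
mu = sum(x * p for x, p in pmf.items())

# Population variance: sum of (x - mu)^2 * P(X = x)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
sigma = math.sqrt(var)

print(f"mean = {float(mu)}, variance = {float(var):.4f}, sd = {sigma:.4f}")
```

The population distribution is flat (uniform), with its mean in the middle of the range 1 to 6.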
Constructing our own Sampling Distribution
X = Outcome of Rolling a Die
Step 1: Google “Roll Dice”, and click on the result (RANDOM.ORG)
Step 2: Roll a single die 15 times and record each outcome in a
single column in Excel.
Step 3: Calculate the sample mean of the n = 15 rolls (x̄). Write the
value of x̄ in the blank below.
Step 4: Share the value of x̄ with the class.
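The class exercise can also be imitated on a computer. The sketch below is only an illustration: it assumes 1,000 hypothetical students (a number chosen here, not from the slides), each rolling a fair die 15 times and reporting one sample mean, and it uses Python's random module in place of RANDOM.ORG.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable

def sample_mean_of_rolls(n=15):
    """Roll a fair die n times and return the sample mean."""
    rolls = [random.randint(1, 6) for _ in range(n)]
    return statistics.mean(rolls)

# Assumption: 1,000 hypothetical students each report one sample mean
sample_means = [sample_mean_of_rolls(15) for _ in range(1000)]

print("average of the sample means:", round(statistics.mean(sample_means), 2))
print("sd of the sample means:", round(statistics.stdev(sample_means), 2))
```

A histogram of sample_means approximates the sampling distribution of the sample mean for n = 15.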
Constructing our own Sampling Distribution
What do we observe?
                Population Distribution of X    Sampling Distribution of X̄
Shape           ____                            ____
Mean            ____                            ____
Random Samples
– The random variables X1, X2, …, Xn are a random sample of n observations if
o The random variables are independent, and
o The random variables have the same distribution.
– We will say that “X1, X2, …, Xn are iid”, where “iid” is short for “independent and
identically distributed”; this is another way of saying they are a random sample.
– Random samples are a critical assumption for situations where we are interested
in making statements about the population and not merely describing the sample
of observations.
o Scientific studies use random samples or, more generally, “probability samples”.
o Many business applications do not need or have random samples.
• Example: user-generated content such as product reviews
Conceptual Model for Random Samples (not required)
1. Obtain a sampling frame that lists all N members of the population
2. Use a randomization device (e.g. uniform random numbers or =RAND() in
Excel) to randomly select n of the N members of the population
§ Examples
– The U.S. Bureau of Labor Statistics conducts a monthly survey of about 60,000
households that are randomly sampled from all households in the U.S. to
estimate unemployment rates
o https://www.bls.gov/cps/cps_htgm.htm
– Polling firms randomly sample from voter registration rolls to estimate
preferences for candidates
– Marketing firms use random digit dialing of phone numbers
o They hope this mimics sampling from a sampling frame
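The two-step conceptual model (sampling frame, then randomization device) can be sketched in a few lines of Python. The frame of ID numbers and the sizes N = 1000 and n = 25 are hypothetical values chosen for illustration; random.sample stands in for the randomization device and draws without replacement.

```python
import random

random.seed(1)  # arbitrary seed, for a repeatable illustration

# Hypothetical sampling frame: ID numbers for all N members of the population
N = 1000
sampling_frame = list(range(1, N + 1))

# Randomization device: draw n distinct members, each equally likely
n = 25
sample_ids = random.sample(sampling_frame, n)

print(sorted(sample_ids))
```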
Normal Population Distribution
If the distribution of X is normal with mean μ_X and standard deviation σ_X,
then
X̄ ~ Normal(μ_X, σ_X^2/n)
The standard deviation of X̄, σ_X/√n, is called the standard error.
Standard Error
• The standard error captures how “off” the sample statistic is from the true population
value.
• The standard error decreases as n increases.
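Why the standard error shrinks as n grows can be seen from a short derivation. This is a sketch using the standard variance rules for independent, identically distributed observations X_1, ..., X_n, each with variance \sigma_X^2; it is not taken from the slides.

```latex
\begin{align*}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{n\,\sigma_X^2}{n^2}
   = \frac{\sigma_X^2}{n}, \\
\operatorname{SE}(\bar{X})
  &= \sqrt{\operatorname{Var}(\bar{X})}
   = \frac{\sigma_X}{\sqrt{n}}.
\end{align*}
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.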
Normal Population Distribution: Increasing the Sample Size
X is a normally distributed random variable that captures “Waiting Time”
X ~ Normal(35, 10^2), μ_X = 35, σ_X = 10
Sample size n = 5: X̄ ~ Normal(35, 10^2/5)
Sample size n = 30: X̄ ~ Normal(35, 10^2/30)
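To see what the larger sample size buys, the sketch below compares the standard error for n = 5 and n = 30 using the waiting-time values μ_X = 35 and σ_X = 10. The "within 5 of the mean" question is an illustrative addition, not part of the slides, and statistics.NormalDist is used here as one convenient way to evaluate normal probabilities.

```python
from statistics import NormalDist
import math

mu_x, sigma_x = 35, 10  # population mean and sd of waiting time, from the slide

results = {}
for n in (5, 30):
    se = sigma_x / math.sqrt(n)      # standard error of the sample mean
    xbar = NormalDist(mu_x, se)      # sampling distribution of the sample mean
    # Illustrative question (an assumption): chance the sample mean is within 5 of mu_x
    p_within = xbar.cdf(mu_x + 5) - xbar.cdf(mu_x - 5)
    results[n] = (se, p_within)
    print(f"n = {n:2d}: SE = {se:.2f}, P(30 < sample mean < 40) = {p_within:.3f}")
```

The sampling distribution keeps the same center but becomes narrower as n increases.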
Mini Dooper Example
Mini Dooper, a car manufacturer, hires a marketing firm to conduct a
customer satisfaction survey of a randomly selected sample of n=25 US
customers. One of the survey questions is “What price did you pay for your
recently purchased mini?” Mini Dooper knows that the selling price for all of
the cars sold in the US is normally distributed with a mean of $27,500, with a
standard deviation of $7,500.
X ~ Normal(27500, 7500^2)
X̄ ~ Normal(27500, (7500/√25)^2) = Normal(27500, 1500^2)
Mini Dooper Example
3. What is the standard deviation in selling price?
σ_X = $7,500
4. What is the standard error of the average selling price?
σ_X/√n = 7500/√25 = $1,500
5. What is the probability a random customer’s purchase is greater than
$20,000?
P(X > 20000) = 1 − P(X ≤ 20000)
= 1 − NORM.DIST(20000, 27500, 7500, 1) = 0.84
6. What is the probability the sample mean is greater than $20,000?
z = (20000 − 27500)/(7500/√25) = −7500/1500 = −5
P(X̄ > 20000) = P(Z > −5) = 1 − P(Z ≤ −5)
P(Z ≤ −5) is even smaller than P(Z ≤ −3.99) = 0.00003, so P(X̄ > 20000) ≈ 1.
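The Mini Dooper calculations can be checked numerically. The sketch below assumes Python's statistics.NormalDist as a stand-in for a cumulative normal function such as Excel's NORM.DIST(x, mean, sd, TRUE); the variable names are illustrative.

```python
from statistics import NormalDist
import math

mu, sigma, n = 27_500, 7_500, 25   # values stated in the example

# Q5: a single customer's price, X ~ Normal(mu, sigma^2)
p_single = 1 - NormalDist(mu, sigma).cdf(20_000)

# Q4 and Q6: the sample mean has standard error sigma / sqrt(n)
se = sigma / math.sqrt(n)
p_mean = 1 - NormalDist(mu, se).cdf(20_000)

print(f"SE = {se:.0f}")
print(f"P(X > 20000) = {p_single:.4f}")
print(f"P(sample mean > 20000) = {p_mean:.7f}")
```

The same cutoff of $20,000 is one standard deviation below the mean for a single customer but five standard errors below the mean for the average of 25 customers, which is why the second probability is so close to 1.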
The Central Limit Theorem
The Central Limit Theorem (or CLT) also states:
No matter the distribution of X, as long as the sample size n is “large enough”,
then approximately:
X̄ ~ Normal(μ_X, σ_X^2/n)
What is “large enough”?
• As a rule of thumb, we need at least n = 30.
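A small simulation can make the "large enough" idea concrete. The sketch below is an illustration under stated assumptions: the population is taken to be exponential with mean 1 (a right-skewed distribution chosen here, not from the slides), and skewness close to 0 is used as a rough sign that the sample means look bell-shaped.

```python
import random
import statistics

random.seed(42)  # arbitrary seed for repeatability

def sample_mean(n):
    """Mean of n draws from an exponential distribution with mean 1 (right-skewed)."""
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

def skewness(values):
    """Simple moment-based skewness; near 0 for a symmetric, bell-shaped sample."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

results = {}
for n in (5, 30):
    means = [sample_mean(n) for _ in range(5000)]  # 5,000 repeated samples
    results[n] = (statistics.mean(means), statistics.stdev(means), skewness(means))
    print(f"n = {n:2d}: mean = {results[n][0]:.3f}, "
          f"sd = {results[n][1]:.3f}, skewness = {results[n][2]:.2f}")
```

With n = 30 the sample means are noticeably less skewed than with n = 5, and their spread is close to σ_X/√n.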
Unknown Distribution of X: Sample Size Matters!
X is a random variable that captures “Hourly Ad Revenue on Facebook”
X has unknown distribution but known mean and standard deviation
X ~ unknown distribution, μ_X = $7.35, σ_X = $3.34
Sample size n = 5: the Central Limit Theorem can’t guarantee that X̄ is normally
distributed
Sample size n = 30: X̄ ~ Normal(7.35, 3.34^2/30) by the Central Limit Theorem
Risky Insurance Example
Based on market research, Risky Insurance knows that over all its homeowner’s
(HO) claims, the mean claim amount is $3,016 with a standard deviation of $227.
The distribution of claim amounts is unknown. They conduct a random survey of
homeowner claims (n = 100).
X̄ = sample mean of HO claims
X̄ ~ Normal(3016, (227/√100)^2) = Normal(3016, 22.7^2) by CLT
2. What is the probability the sample mean of HO claims will be less than
$3000?
P(X̄ < 3000) = P(Z < (3000 − 3016)/22.7) = P(Z < −0.70) = 0.24
Risky Insurance Example
3. What is the probability the mean of the random sample will be within
$10 of the population mean HO claim amount?
Population mean is $3,016, so we need to find P(3006 < X̄ < 3026).
P(3006 < X̄ < 3026)
= NORM.DIST(3026, 3016, 22.7, 1) − NORM.DIST(3006, 3016, 22.7, 1)
= 0.34
4. What is the probability a random sample of 400 customer claims will
have a sample mean within $2 of the population mean?
P(3014 < X̄ < 3018), where X̄ ~ Normal(3016, (227/√400)^2) by CLT
P(3014 < X̄ < 3018)
= NORM.DIST(3018, 3016, 11.35, 1) − NORM.DIST(3014, 3016, 11.35, 1)
= 0.14
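The Risky Insurance answers can be reproduced with a short script. This is a sketch that assumes the CLT normal approximation for the sample mean and uses statistics.NormalDist in place of a spreadsheet's cumulative normal function; the helper prob_mean_between is a hypothetical name introduced here.

```python
from statistics import NormalDist
import math

mu, sigma = 3016, 227   # population mean and sd of claim amounts, from the example

def prob_mean_between(lo, hi, n):
    """P(lo < sample mean < hi) using the CLT normal approximation."""
    xbar = NormalDist(mu, sigma / math.sqrt(n))
    return xbar.cdf(hi) - xbar.cdf(lo)

p_q2 = NormalDist(mu, sigma / math.sqrt(100)).cdf(3000)   # Q2: mean below $3000, n = 100
p_q3 = prob_mean_between(3006, 3026, 100)                 # Q3: within $10, n = 100
p_q4 = prob_mean_between(3014, 3018, 400)                 # Q4: within $2, n = 400

print(f"Q2: {p_q2:.2f}  Q3: {p_q3:.2f}  Q4: {p_q4:.2f}")
```

Note that Q4 uses a larger sample (smaller standard error) but also a much tighter window of $2, so its probability is lower than in Q3.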