Module 2 - Sample - Afterclass

This document provides an introduction to statistics concepts related to sampling and the central limit theorem. It discusses the differences between populations and samples, how to calculate sample means and standard deviations, and introduces the concept of sampling distributions. It also defines what constitutes a random sample and introduces the central limit theorem. The key objectives covered are introducing statistics and the differences between samples and populations, understanding sampling distributions, and describing and applying the central limit theorem.

Uploaded by

Vanessa Wong
Copyright
© All Rights Reserved
IIMT 2641 Introduction to Business Analytics

Module 2: Intro to Statistics


Topic 2: Sampling

Announcement
1. Thank you for your feedback. It helps me improve this course.
2. Homework 2 is due on Oct 5 at 10 am. It covers decision analysis and statistics.
3. The Hw1 solution has been released.
4. Groups have been formed.
5. Attendance and Participation
• If your number of participations is >= 3 and your number of unexcused absences is <= 2, the score will be greater than 3.5 (out of 5).
• Handed-in in-class practice problems count for participation.
• Sticky notes are distributed to students who participate in class.
o Please write your name and UID on the sticky note and submit it at the end of the class.
• More participation earns a bonus.
Objectives
1. Introduction to Statistics: Sample versus Population

2. Define and understand sampling distribution.

3. Describe and apply the Central Limit Theorem.

Introduction to Statistics

Population

Sample

                      Population    Sample
Size                  N             n
Mean                  μ             x̄
Standard Deviation    σ             s
Introduction to Statistics

How do I calculate the sample mean and sample standard deviation?


1. Sample Mean: For a given sample, you can calculate the mean as follows:

   x̄ = (1/n) Σ_{i=1}^{n} x_i

   Or, if the sample data is in a column of Excel, you can calculate it using the =average() function.

2. Sample Variance/Standard Deviation: For a given sample, you can calculate the variance and standard deviation of the sample:

   s^2 = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)^2,   s = √(s^2)

   Or, if the sample data is in a column of Excel, you can calculate it using the =var.s() or =stdev.s() functions.

Introduction to Statistics

Example: The height of HKUers is known to be normally distributed with a mean of 5.5 feet and standard deviation of 0.4 feet. Feng stands at the entrance of the KKL building and, with a special technology, documents the heights of 10 people who walk by. The heights are: {5, 6.5, 5.3, 5.9, 4.9, 6.2, 6.1, 5.3, 5.5, 5.7}.

1. What is μ? 5.5

2. What is σ? 0.4

3. What is the sample mean x̄? 5.64

4. What is the sample standard deviation s? 0.53
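The sample answers above can be checked outside Excel. The following is a minimal Python sketch, assuming the ten heights listed in the example; the variable names are illustrative, and Python's statistics module stands in for the =average() and =stdev.s() functions.

```python
import math
import statistics

# The ten heights (in feet) recorded in the example.
heights = [5, 6.5, 5.3, 5.9, 4.9, 6.2, 6.1, 5.3, 5.5, 5.7]
n = len(heights)

# Sample mean: sum of the observations divided by n.
x_bar = sum(heights) / n

# Sample variance: squared deviations from x_bar, divided by n - 1.
s_squared = sum((x - x_bar) ** 2 for x in heights) / (n - 1)
s = math.sqrt(s_squared)

# Cross-check against the library versions of the same formulas.
assert math.isclose(x_bar, statistics.mean(heights))
assert math.isclose(s, statistics.stdev(heights))

print(round(x_bar, 2), round(s, 2))  # 5.64 0.53
```

Note that μ = 5.5 and σ = 0.4 are given population parameters, while 5.64 and 0.53 are statistics computed from this one sample, so they need not match.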

Notation: Population Versus Sample

Population (Size N)
• The mean μ is the true average (mean) over all N.
• The variance σ^2 and standard deviation σ are the true variance and standard deviation over all N.
• μ, σ^2, σ are called parameters of a distribution.

Sample (Size n)
• The sample mean x̄ is the average based only on the sample n.
• The sample variance s^2 and sample standard deviation s are the variance and standard deviation estimates based only on the sample n.
• x̄, s^2, s are called statistics of a sample.
Definitions
• The sampling distribution of a statistic is the distribution of that statistic across an arbitrarily large number of samples.

• The sampling distribution of the mean is the distribution of all sample means theoretically possible from samples of size n.

[Figure: Population and Sample]
Constructing our own Sampling Distribution
Let X be a discrete random variable giving the outcome when rolling a single
fair die. Below gives a table and graph of the pmf of X.
x    P(X = x)
1    1/6
2    1/6
3    1/6
4    1/6
5    1/6
6    1/6

[Figure: bar chart of P(X = x) for x = 1, ..., 6, with every bar at height 1/6]

What is the expected value of X?

μ = (1/6) * 1 + (1/6) * 2 + (1/6) * 3 + (1/6) * 4 + (1/6) * 5 + (1/6) * 6 = 3.5
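As a quick check of the expected-value calculation, here is a small Python sketch. It assumes only the pmf from the table; exact fractions are used so that no rounding is involved, and the names pmf and expected_value are illustrative.

```python
from fractions import Fraction

# pmf of a fair die: each face 1..6 has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# A valid pmf must sum to 1.
assert sum(pmf.values()) == 1

# Expected value: sum of x * P(X = x) over all outcomes.
expected_value = sum(x * p for x, p in pmf.items())

print(float(expected_value))  # 3.5
```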

Constructing our own Sampling Distribution
X = Outcome of Rolling a Die
Step 1: Google “Roll Dice” and click on the result (RANDOM.ORG).
Step 2: Roll a single die 15 times and record each outcome in a single column in Excel.
Step 3: Calculate the sample mean x̄ of the n = 15 sample. Write the value of x̄ in the blank below.
Step 4: Share the value of x̄ with the class.

        Population (Size N)    Sample (Size n = 15)
Mean    μ = 3.5                x̄ = __________

Constructing our own Sampling Distribution
What do we observe?

                       Population Distribution of X    Sampling Distribution of X̄
Discrete/Continuous    Discrete                        Continuous (approximately)
Shape                  Flat, evenly distributed        Bell curve
Mean                   3.5                             3.5
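The in-class exercise can be imitated with a simulation. The sketch below is an illustration under assumptions, not the class data: it uses Python's pseudo-random generator instead of RANDOM.ORG, an arbitrary seed, and a hypothetical "class" of 10,000 students who each roll a die 15 times and report one x̄.

```python
import random
import statistics

random.seed(2641)  # arbitrary seed so the run is repeatable

def sample_mean_of_rolls(n):
    """Roll a fair die n times and return the sample mean."""
    rolls = [random.randint(1, 6) for _ in range(n)]
    return statistics.mean(rolls)

# Hypothetical class size: each "student" reports one sample mean.
class_size = 10_000
sample_means = [sample_mean_of_rolls(15) for _ in range(class_size)]

# Center and spread of the simulated sampling distribution of x-bar.
mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f}")
```

The mean of the simulated sample means should land close to 3.5, and their spread should be much smaller than the spread of single rolls, which matches the bell-shaped column in the table above.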

Constructing our own Sampling Distribution

What would the sampling distribution look like if the population distribution were different?

What would the sampling distribution look like if the samples were bigger?

Random Samples
– The random variables X1, X2, …, Xn are a random sample of n observations if
   • The random variables are independent, and
   • The random variables have the same distribution.
– We will say that “X1, X2, …, Xn are iid”, where “iid” is short for “independent and identically distributed”, or that they are a random sample.
– Random samples are a critical assumption for situations where we are interested in making statements about the population and not merely describing the sample of observations.
   • Scientific studies use random samples or, more generally, “probability samples”.
   • Many business applications do not need or have random samples.
      o Example: user-generated content such as product reviews
Conceptual Model for Random Samples (not required)
1. Obtain a sampling frame that lists all N members of the population.
2. Use a randomization device (e.g. uniform random numbers or =rand() in Excel) to randomly select n of the N members of the population.
• Examples
– The U.S. Bureau of Labor Statistics conducts a monthly survey of about 60,000 households that are randomly sampled from all households in the U.S. to estimate unemployment rates.
   • https://www.bls.gov/cps/cps_htgm.htm
– Polling firms randomly sample from voter registration rolls to estimate preferences for candidates.
– Marketing firms use random digit dialing of phone numbers.
   • They hope this mimics sampling from a sampling frame.
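The two-step model above can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the frame of 1,000 made-up member IDs, the sample size of 25, and the seed. Python's random.Random.sample plays the role of the randomization device and selects without replacement.

```python
import random

# Step 1: a sampling frame listing all N members (hypothetical IDs).
N = 1000
sampling_frame = [f"member_{i:04d}" for i in range(N)]

# Step 2: a randomization device picks n of the N members,
# each member at most once (sampling without replacement).
n = 25
rng = random.Random(0)  # arbitrary seed so the run is repeatable
sample = rng.sample(sampling_frame, n)

print(sample[:5])
```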

Normal Population Distribution
If the distribution of X is normal with mean μ_X and standard deviation σ_X, then

   X̄ ~ Normal(μ_X, (σ_X/√n)^2)

where σ_X/√n is the standard deviation of X̄ (the standard error).

– The random variables X1, X2, …, Xn are a random sample of n observations
   • The random variables are independent and X_i ~ N(μ_X, (σ_X)^2) for i = 1, 2, …, n
– X̄ is the sample mean of the random sample with n observations
   • X̄ = (1/n)(X_1 + X_2 + ⋯ + X_n)
– σ_X/√n is called the standard error. It is the standard deviation of X̄.
   • The standard error captures how “off” the sample statistics are from the true population value.
   • The standard error decreases as n increases.

Normal Population Distribution: Increasing the Sample Size
X is a normally distributed random variable that captures “Waiting Time”

Sample size n = 5
   X ~ Normal(35, 10^2), μ_X = 35, σ_X = 10
   X̄ ~ Normal(35, (10/√5)^2)

Sample size n = 30
   X ~ Normal(35, 10^2), μ_X = 35, σ_X = 10
   X̄ ~ Normal(35, (10/√30)^2)
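To see how much the larger sample tightens the distribution of X̄, the standard errors for the waiting-time example can be computed directly. This is a sketch using Python's statistics.NormalDist; the "within 5 of the mean" comparison is an illustrative choice and does not come from the slides.

```python
import math
from statistics import NormalDist

mu_x, sigma_x = 35, 10  # population mean and standard deviation of X

results = {}
for n in (5, 30):
    # Standard error: standard deviation of the sample mean.
    standard_error = sigma_x / math.sqrt(n)
    x_bar_dist = NormalDist(mu_x, standard_error)
    # Illustrative comparison: chance that x-bar falls within 5 of the mean.
    within_5 = x_bar_dist.cdf(40) - x_bar_dist.cdf(30)
    results[n] = (standard_error, within_5)
    print(f"n = {n:2d}: standard error = {standard_error:.2f}, "
          f"P(30 < x-bar < 40) = {within_5:.3f}")
```

The standard error falls from about 4.47 at n = 5 to about 1.83 at n = 30, so the sample mean is much more likely to land near 35 with the larger sample.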
Mini Dooper Example
Mini Dooper, a car manufacturer, hires a marketing firm to conduct a
customer satisfaction survey of a randomly selected sample of n=25 US
customers. One of the survey questions is “What price did you pay for your
recently purchased mini?” Mini Dooper knows that the selling price for all of
the cars sold in the US is normally distributed with a mean of $27,500, with a
standard deviation of $7,500.

1. What is the distribution of the selling price?

   X = selling price
   X ~ Normal(27500, 7500^2)

2. What is the distribution of the sample mean selling price?

   X̄ = sample mean selling price
   X̄ ~ Normal(27500, (7500/√25)^2)
Mini Dooper Example

3. What is the standard deviation in selling price?

   σ = 7500

4. What is the standard error of the average selling price?

   σ/√n = 7500/√25 = 1500

5. What is the probability a random customer’s purchase is greater than $20,000?

   P(X > 20000) = 1 − P(X ≤ 20000)
   = 1 − NORM.DIST(20000, 27500, 7500, 1) = 0.84

6. What is the probability the sample mean is greater than $20,000?

   z = (20000 − 27500)/1500 = −5
   P(X̄ > 20000) = 1 − P(X̄ ≤ 20000)
   = 1 − NORM.DIST(20000, 27500, 1500, 1) = 0.9999
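The Mini Dooper probabilities can be reproduced without Excel. The sketch below assumes the figures from the example (mean $27,500, standard deviation $7,500, n = 25) and uses statistics.NormalDist.cdf in the role of the cumulative NORM.DIST call; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 27_500, 7_500, 25

# Distribution of a single selling price X.
price = NormalDist(mu, sigma)

# Distribution of the sample mean: same center, spread sigma / sqrt(n).
standard_error = sigma / math.sqrt(n)
mean_price = NormalDist(mu, standard_error)

# P(X > 20000) and P(x-bar > 20000) as 1 minus the cumulative probability.
p_single = 1 - price.cdf(20_000)
p_mean = 1 - mean_price.cdf(20_000)

print(f"standard error   = {standard_error:.0f}")
print(f"P(X > 20000)     = {p_single:.2f}")
print(f"P(x-bar > 20000) = {p_mean:.7f}")
```

A single price of $20,000 is only one standard deviation below the mean, while a sample mean of $20,000 is five standard errors below it, which is why the second probability is so close to 1.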
The Central Limit Theorem
The Central Limit Theorem (or CLT) also states:
No matter the distribution of X, as long as the sample size n is “large enough”, then approximately:

   X̄ ~ Normal(μ_X, (σ_X/√n)^2)

What is “large enough”?
• As a rule of thumb, we need at least n = 30.
• The assumption that X1, X2, …, Xn is a random sample is critical for the central limit theorem to hold, along with large n.
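A simulation can make the theorem concrete. The sketch below is an illustration under assumptions that are not in the slides: the population is exponential with mean 1 and standard deviation 1 (a clearly non-normal, right-skewed choice), the seed is arbitrary, and 5,000 samples of size n = 30 are drawn.

```python
import math
import random
import statistics

rng = random.Random(30)  # arbitrary seed so the run is repeatable

# Assumed population: exponential with rate 1 (mean 1, sd 1).
population_mean = 1.0
population_sd = 1.0

n = 30               # sample size
num_samples = 5_000  # number of repeated samples

sample_means = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

# CLT prediction for the spread of x-bar.
predicted_se = population_sd / math.sqrt(n)

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)

print(f"predicted: mean {population_mean:.3f}, sd {predicted_se:.3f}")
print(f"observed:  mean {observed_mean:.3f}, sd {observed_sd:.3f}")
```

Even though individual draws are heavily skewed, the simulated sample means should center near the population mean with a spread near σ_X/√n.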

Unknown Distribution of X: Sample Size Matters!
X is a random variable that captures “Hourly Ad Revenue on Facebook”
X has unknown distribution but known mean and standard deviation

Sample size n = 5
   X ~ unknown distribution, μ_X = $7.35, σ_X = $3.34
   The Central Limit Theorem can’t guarantee that X̄ is normally distributed.

Sample size n = 30
   X ~ unknown distribution, μ_X = $7.35, σ_X = $3.34
   X̄ ~ Normal(7.35, (3.34/√30)^2) by the Central Limit Theorem
Risky Insurance Example
Based on market research, Risky Insurance knows that over all its homeowner’s
(HO) claims, the mean claim amount is $3,016 with a standard deviation of $227.
The distribution of claim amounts is unknown. They conduct a random survey of
homeowner claims (n = 100).

1. What is the sampling distribution of the sample mean amount of HO claims?

   X̄ = sample mean of HO claims
   X̄ ~ Normal(3016, (227/√100)^2) by CLT

2. What is the probability the sample mean of HO claims will be less than $3000?

   P(X̄ < 3000) = NORM.DIST(3000, 3016, 22.7, 1) = 0.24
Risky Insurance Example
3. What is the probability the mean of the random sample will be within $10 of the population mean HO claim amount?

   Population mean is $3,016, so we need to find P(3006 < X̄ < 3026).
   P(3006 < X̄ < 3026)
   = NORM.DIST(3026, 3016, 22.7, 1) − NORM.DIST(3006, 3016, 22.7, 1)
   = 0.34

4. What is the probability a random sample of 400 customer claims will have a sample mean within $2 of the population mean?

   P(3014 < X̄ < 3018), where X̄ ~ Normal(3016, (227/√400)^2) by CLT
   P(3014 < X̄ < 3018)
   = NORM.DIST(3018, 3016, 11.35, 1) − NORM.DIST(3014, 3016, 11.35, 1)
   = 0.14
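The Risky Insurance answers (questions 2 to 4) can be re-derived in Python. This is a sketch assuming the CLT approximation stated in the example; the helper sample_mean_dist is a hypothetical name, and statistics.NormalDist.cdf stands in for the cumulative NORM.DIST call.

```python
import math
from statistics import NormalDist

mu, sigma = 3016, 227  # population mean and sd of claim amounts

def sample_mean_dist(n):
    """Approximate distribution of the sample mean for sample size n (by CLT)."""
    return NormalDist(mu, sigma / math.sqrt(n))

# n = 100: standard error is 227 / 10 = 22.7.
d100 = sample_mean_dist(100)
p_less_3000 = d100.cdf(3000)
p_within_10 = d100.cdf(3026) - d100.cdf(3006)

# n = 400: standard error is 227 / 20 = 11.35.
d400 = sample_mean_dist(400)
p_within_2 = d400.cdf(3018) - d400.cdf(3014)

print(f"P(x-bar < 3000)        = {p_less_3000:.2f}")
print(f"P(3006 < x-bar < 3026) = {p_within_10:.2f}")
print(f"P(3014 < x-bar < 3018) = {p_within_2:.2f}")
```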

Today’s Objectives
1. Introduction to Statistics: Sample versus Population

2. Define and understand sampling distribution.

3. Describe and apply the Central Limit Theorem.
