Topic06 Written

The document discusses populations, samples, parameters, and statistics. It defines key terms like population, sample, parameter, and statistic. It also describes common statistics like mean, median, variance, and standard deviation. Methods for calculating these statistics from sample data are provided.

Uploaded by

oreowhite111
Copyright
© © All Rights Reserved

Topic 6: Sampling Distributions

6.1 Populations, Samples, and Processes


• We are constantly exposed to collections of facts, or data.

• The discipline of statistics provides methods for organizing and summarizing data,
both graphically and numerically (descriptive statistics), and for drawing conclusions
based on information contained in the data (inferential statistics).

• An investigation will typically focus on a well-defined collection of objects constituting a
population of interest. For example, a population might consist of all students on our
campus.

• When the desired information is available for all objects in the population, we have
a census. Constraints on time, money, and other resources usually make a census
impractical or infeasible, so a subset of the population – a sample – is selected instead.

• In a probability problem, properties of the population under study are assumed known.
Questions regarding a sample taken from the population are posed and answered.

• In a statistics problem, having obtained a sample from a population, an investigator
would frequently like to use sample information to draw some type of conclusion (make
inferences) about the population.

• The relationship between the two disciplines can be summarized by saying that proba-
bility reasons from the population to the sample, whereas inferential statistics
reasons from the sample to the population.

Example 6.1. Consider drivers’ use of manual lap belts in cars equipped with automatic
shoulder belt systems.

– In a probability question, we might assume that 50% of all drivers of cars equipped
in this way in a certain metropolitan area (the population) regularly use their lap
belt.
– We may want to ask questions about a sample selected from the population. For
example, how likely is it that a sample of 100 such drivers will include at least 70
who regularly use their lap belt?
– In inferential statistics, we have sample information available.
– We would like to use sample information to answer a question about the structure
of the entire population from which the sample was selected.
– For example, a sample of 100 drivers of such cars revealed that 65 regularly use
their lap belt. We may want to ask whether this provides substantial evidence for
concluding that more than 50% of all such drivers in this area regularly use their
lap belt.
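
The probability question in Example 6.1 can be answered numerically. The following is a minimal sketch, assuming that each of the 100 sampled drivers independently uses the lap belt with probability 0.5, so that the number of users follows a binomial distribution. The helper name `binom_upper_tail` is ours and does not come from the notes.

```python
from math import comb

def binom_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Probability question: the population proportion is assumed known (p = 0.5).
p_at_least_70 = binom_upper_tail(100, 0.5, 70)

# Inference question: how surprising is the observed count of 65 if p were really 0.5?
p_at_least_65 = binom_upper_tail(100, 0.5, 65)

print(f"P(X >= 70) = {p_at_least_70:.6f}")
print(f"P(X >= 65) = {p_at_least_65:.6f}")
```

A small value of P(X >= 65) would suggest that 65 users out of 100 is unlikely when only half of all drivers use the belt, which is the kind of reasoning that inferential statistics formalizes.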

• A variable is any characteristic whose value may change from one object to another in the
population. A variable that takes numerical values is called a quantitative variable; a
variable that takes non-numerical values is called a qualitative variable or categorical
variable. Examples include

x = market classification of a book (categorical)
y = age of the author of a book (quantitative)

Example 6.2. A manufacturer of computer chips claims that less than 10% of his products
are defective. When 1,000 chips were drawn from a large production, 7.5% were found to be
defective.

1. What is the population of interest?

2. What is the sample?

3. Explain briefly how the manufacturer can test the claim.

Example 6.3. For each of the following variables, decide if it is quantitative or categorical. If
it is quantitative, decide if it is discrete or continuous.

1. The number of miles joggers run per week.

2. The starting salaries of university graduates.

3. The months in which a company’s employees choose to take their vacations.

4. The grades received by students in a statistics course.



6.2 Population Parameters and Sample Statistics

Statistical inference is almost always directed toward drawing some type of conclusion about
one or more population parameters. To do so requires that an investigator obtain sample
data from each of the populations under study. Conclusions can then be based on the computed
values of various sample quantities or sample statistics.

                     Parameter    Statistic
Mean                 µ            X̄
Variance             σ²           S²
Standard deviation   σ            S
Proportion           p            P̂

Parameter:

• Target
• Unknown
• Constant - one single value

Statistic:

• Known
• Random variable - a list of possible values with associated probabilities

Use the statistic to infer the parameter.

We now look at various sample statistics.

The sample mean x̄ of observations x1, x2, ..., xn is given by

\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Example 6.4. Given the sample: 55, 73, 75, 80, 80, 85, 90, 92, 93, 98. Compute the sample
mean.
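
As an illustration, the formula above can be applied to the data of Example 6.4 in a few lines. This is only a sketch; the variable names are ours.

```python
sample = [55, 73, 75, 80, 80, 85, 90, 92, 93, 98]

# Sample mean: sum of the observations divided by the sample size n.
n = len(sample)
x_bar = sum(sample) / n

print(x_bar)  # 82.1
```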

The mean is greatly affected by the presence of even a single outlier (an unusually large or
small observation), making it an inappropriate measure of center under some circumstances.


The sample median is obtained by first ordering the n observations from smallest to
largest (with any repeated values included so that every sample observation appears in the
ordered list).

Then, look for the observation in the middle. With an odd number of observations there is a
single middle value (bracketed):

x x [x] x x

If there are two observations in the middle (n even), take their average.

x x [x x] x x

The sample median is sometimes denoted by x̃, and the population median by µ̃.

Example 6.5. Given the sample: 55, 73, 75, 80, 80, 85, 90, 92, 93, 98. Find the sample
median.
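
The ordering-and-middle-value procedure can be sketched as follows for the data of Example 6.5. The function name `sample_median` is ours; it is one straightforward way to code the rule, not the only one.

```python
def sample_median(data):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(data)  # repeated values are kept in the ordered list
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

sample = [55, 73, 75, 80, 80, 85, 90, 92, 93, 98]
print(sample_median(sample))  # 82.5
```

Here n = 10 is even, so the result is the average of the fifth and sixth ordered values, 80 and 85.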

Example 6.6. For the following two sets of data, compute the sample mean and the sample
median.

1. Data: 1, 1, 1, 1, 1.

2. Data: 1, 1, 1, 1, 100.

What can you conclude about the properties of the sample mean and the sample median?
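
A quick numerical check of Example 6.6, using the `mean` and `median` functions from Python's standard `statistics` module, is sketched below.

```python
from statistics import mean, median

datasets = ([1, 1, 1, 1, 1], [1, 1, 1, 1, 100])

# Compare the two measures of center on data with and without an outlier.
results = [(mean(data), median(data)) for data in datasets]
for data, (m, med) in zip(datasets, results):
    print(data, "mean =", m, "median =", med)
```

The single observation 100 pulls the mean from 1 up to 20.8, while the median stays at 1.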

The sample variance s² of observations x1, x2, ..., xn is given by

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} \]

Example 6.7. Show that

\[ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}. \]
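
One possible route to this identity is to expand the square and use the fact that the sum of the observations equals n times the sample mean. A sketch of the steps:

```latex
\begin{align*}
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
     \qquad \text{since } \sum_{i=1}^{n} x_i = n\bar{x} \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
   = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}.
\end{align*}
```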

Example 6.8. Given the sample: 55, 73, 75, 80, 80, 85, 90, 92, 93, 98. Compute the sample
variance.
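
Both forms of the sample variance formula can be checked against each other on the data of Example 6.8. The sketch below also compares them with `statistics.variance`, which uses the n − 1 divisor; the variable names are ours.

```python
from statistics import variance

sample = [55, 73, 75, 80, 80, 85, 90, 92, 93, 98]
n = len(sample)
x_bar = sum(sample) / n

# Definition: sum of squared deviations about the sample mean, divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Computational shortcut: (sum of squares - (sum)^2 / n) / (n - 1).
s2_shortcut = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)

print(round(s2_definition, 4), round(s2_shortcut, 4), round(variance(sample), 4))
```

All three values agree at about 157.43, so the sample standard deviation is about 12.55.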

Why is the denominator n − 1 but not n?

• The population variance, denoted by σ² (and thus σ is the population standard
deviation), can be computed by

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]

when the population is finite and consists of N values. Observe here that the divisor is
N and not N − 1.

• Note that σ² involves squared deviations about the population mean µ. If we
actually knew the value of µ, then we could define the sample variance as the average
squared deviation of the sample xi’s about µ.

• However, the value of µ is almost never known, so the sum of squared deviations about
x̄ must be used. But the xi’s tend to be closer to their average x̄ than to the population
average µ.

• To compensate for this, the divisor n − 1 is used rather than the sample size n. In other
words, if we used a divisor n in the sample variance, then the resulting quantity would
tend to underestimate σ² (produce estimated values that are too small on the average),
whereas dividing by the slightly smaller n − 1 corrects this underestimation.
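
The underestimation can be seen in a small simulation. The sketch below assumes a normal population with variance 4 and samples of size 5; these choices, the seed, and the variable names are ours and serve only as an illustration.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

MU, SIGMA = 0.0, 2.0   # assumed population: normal with variance 4
N, REPS = 5, 20000     # small samples, many repetitions

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(REPS):
    xs = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(xs) / N
    sxx = sum((x - x_bar) ** 2 for x in xs)  # squared deviations about x-bar
    sum_div_n += sxx / N
    sum_div_n_minus_1 += sxx / (N - 1)

avg_div_n = sum_div_n / REPS
avg_div_n_minus_1 = sum_div_n_minus_1 / REPS

print(f"true variance           : {SIGMA ** 2}")
print(f"average with divisor n  : {avg_div_n:.3f}")
print(f"average with divisor n-1: {avg_div_n_minus_1:.3f}")
```

Across many samples, the divisor-n version averages noticeably below the true variance, while the divisor-(n − 1) version averages close to it.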

6.3 Sampling Distributions


• Consider selecting two different samples of size n from the same population distribution.
The observations xi ’s in the second sample will virtually always differ at least a bit from
those in the first sample.

• For example, a first sample of n = 3 cars of a particular type might result in fuel efficiencies
x1 = 30.7, x2 = 29.4, x3 = 31.1, whereas a second sample may give x1 = 28.8, x2 = 30.0,
and x3 = 32.5.

• Before we obtain data, there is uncertainty about the value of each xi. Because of this
uncertainty, before the data become available we view each observation as a random
variable and denote the sample by X1, X2, ..., Xn.

• This variation in observed values in turn implies that the value of any function of the
sample observations, or statistic – such as the sample mean and sample standard
deviation – also varies from sample to sample. That is, prior to obtaining x1, x2, ..., xn,
there is uncertainty as to the value of x̄, the value of s, and so on.

• Any statistic, being a random variable, has a probability distribution. The probability
distribution of a statistic is sometimes referred to as its sampling distribution to
emphasize that it describes how the statistic varies in value across all samples that might
be selected.

• The probability distribution of any particular statistic depends not only on the population
distribution and the sample size n but also on the method of sampling. In our course, we
will be dealing with (simple) random samples.

Definition 6.1. The random variables X1 , X2 , . . . , Xn are said to form a (simple) random
sample of size n if

1. The Xi ’s are independent random variables.

2. Every Xi has the same probability distribution.

In other words, the random variables Xi ’s are independent and identically distributed (iid).

Example 6.9. A certain brand of MP3 player comes in three configurations: a model with 2
GB of memory, costing $80, a 4 GB model priced at $100, and an 8 GB version with a price
tag of $120. If 20% of all purchasers choose the 2 GB model, 30% choose the 4 GB model,
and 50% choose the 8 GB model, then the probability distribution of the cost X of a single
randomly selected MP3 player purchase is given by

x       80     100    120
p(x)    0.2    0.3    0.5

Suppose on a particular day only two MP3 players are sold. Let X1 = the revenue from the
first sale and X2 = the revenue from the second. Suppose that X1 and X2 are independent,
each with the probability distribution shown above so that X1 and X2 constitute a random
sample from the distribution.

Find the sampling distribution of X̄ and the sampling distribution of S².

x1    x2    p(x1, x2)    x̄    s²

x̄
P(X̄ = x̄)

s²
P(S² = s²)
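
The tables for Example 6.9 can be filled in by listing all nine (x1, x2) pairs. The sketch below does this enumeration with exact fractions, using the purchase probabilities stated in the example; the variable names are ours.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Distribution of a single purchase, as stated in Example 6.9.
pmf = {80: Fraction(20, 100), 100: Fraction(30, 100), 120: Fraction(50, 100)}

dist_mean = defaultdict(Fraction)  # sampling distribution of the sample mean
dist_var = defaultdict(Fraction)   # sampling distribution of the sample variance

for x1, x2 in product(pmf, repeat=2):
    prob = pmf[x1] * pmf[x2]                    # independence of X1 and X2
    x_bar = Fraction(x1 + x2, 2)
    s2 = (x1 - x_bar) ** 2 + (x2 - x_bar) ** 2  # divisor is n - 1 = 1
    dist_mean[x_bar] += prob
    dist_var[s2] += prob

for value, prob in sorted(dist_mean.items()):
    print(f"P(mean = {value}) = {float(prob):.2f}")
for value, prob in sorted(dist_var.items()):
    print(f"P(S^2 = {value}) = {float(prob):.2f}")
```

Pairs that give the same value of the statistic have their probabilities added, which is exactly how the sampling distribution is built by hand.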

6.3.1 The Distribution of the Sample Mean

The importance of the sample mean X̄ springs from its use in drawing conclusions about the
population mean µ. Some of the most frequently used inferential procedures are based on the
properties of the sampling distribution of X̄.

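Proposition 6.1. Let X1, X2, ..., Xn be a random sample from a distribution with mean
value $\mu_X = \mu$ and standard deviation $\sigma_X = \sigma$. Then,

(a) $E(\bar{X}) = \mu_{\bar{X}} = \mu$,

(b) $V(\bar{X}) = \sigma_{\bar{X}}^2 = \dfrac{\sigma^2}{n}$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.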
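
A short justification of Proposition 6.1 is sketched below. It relies only on linearity of expectation and, for the variance, on the independence of the Xi's assumed in Definition 6.1.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\,(n\mu) = \mu, \\[4pt]
V(\bar{X}) &= V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i)
            = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
            \qquad \text{(by independence)}.
\end{align*}
```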

Example 6.10. Let X1, X2, ..., Xn be a random sample of size n taken from a population with
mean µ and variance σ². Let T0 = X1 + X2 + · · · + Xn. Find E(T0) and V(T0).

Example 6.11. Let X1, X2, ..., X25 be a random sample of size 25 taken from a population
with mean µ = 28,000 and standard deviation σ = 5000. Also, let T0 = X1 + X2 + · · · + X25.

a. Find E(X̄) and $\sigma_{\bar{X}}$.

b. Find E(T0) and $\sigma_{T_0}$.
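
The arithmetic for Example 6.11 can be sketched as follows, using Proposition 6.1 for the sample mean and the corresponding results for the total, E(T0) = nµ and V(T0) = nσ². The variable names are ours.

```python
from math import sqrt

n, mu, sigma = 25, 28_000, 5_000

mean_xbar = mu              # E(X-bar) = mu
sd_xbar = sigma / sqrt(n)   # sigma / sqrt(n)
mean_total = n * mu         # E(T0) = n * mu
sd_total = sqrt(n) * sigma  # sqrt(n * sigma^2)

print(mean_xbar, sd_xbar, mean_total, sd_total)
```

Averaging shrinks the standard deviation by a factor of √n = 5, while summing inflates it by the same factor.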



6.3.2 The Central Limit Theorem (CLT)

Theorem 6.1. Let X1, X2, ..., Xn be a random sample of size n from a population
distribution with mean µ and variance σ². Then regardless of the population distribution of
X1, X2, ..., Xn, if n is sufficiently large (typically n > 30), X̄ has approximately a normal
distribution with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}}^2 = \dfrac{\sigma^2}{n}$.
T0 = X1 + X2 + · · · + Xn also has approximately a normal distribution with $\mu_{T_0} = n\mu$
and $\sigma_{T_0}^2 = n\sigma^2$.

• There are population distributions for which even an n of 40 or 50 does not suffice, but
such distributions are rarely encountered in practice. On the other hand, the rule of
thumb is often conservative; for many population distributions, an n much less than 30
would suffice. For example, in the case of a uniform population distribution, the CLT
gives a good approximation for n ≥ 12.

• If the Xi’s are normally distributed, so is X̄ for every sample size n.
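
The CLT can be illustrated by simulation. The sketch below assumes a strongly skewed population (exponential with rate 1, so µ = σ = 1) and samples of size 40; the population, the seed, and the variable names are our choices for illustration only.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is reproducible

N, REPS = 40, 20000
MU = SIGMA = 1.0  # an exponential population with rate 1 has mean 1 and sd 1

# Each entry is the mean of one random sample of size N.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(N)) / N for _ in range(REPS)
]

se = SIGMA / sqrt(N)  # theoretical standard deviation of the sample mean
# Fraction of sample means within 1.96 standard deviations of MU;
# a normal distribution would give about 0.95.
inside = sum(abs(m - MU) <= 1.96 * se for m in sample_means) / REPS

print(f"mean of sample means: {mean(sample_means):.4f} (theory {MU})")
print(f"sd of sample means  : {stdev(sample_means):.4f} (theory {se:.4f})")
print(f"fraction within 1.96 sd: {inside:.3f}")
```

Even though individual observations are far from normal, the simulated sample means have center and spread close to µ and σ/√n, and the coverage is close to the normal value.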



Example 6.12. The amount of a particular impurity in a batch of a certain chemical product
is a random variable with mean value 4.0 g and standard deviation 1.5 g.

a. If 50 batches are independently prepared, what is the approximate probability that the
sample average amount of impurity X̄ is between 3.5 g and 3.8 g?

b. Now consider randomly selecting 100 batches, and let T0 represent the total amount of
impurity in these batches. Find the mean and standard deviation of T0.

c. Find the approximate probability that this total is at most 425 g.
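
One way to carry out the calculations in Example 6.12 is with `NormalDist` from Python's standard `statistics` module. This is a sketch under the CLT normal approximation; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 4.0, 1.5  # grams, per batch

# (a) n = 50: X-bar is approximately normal with mean mu and sd sigma / sqrt(n).
xbar_dist = NormalDist(mu, sigma / sqrt(50))
p_a = xbar_dist.cdf(3.8) - xbar_dist.cdf(3.5)

# (b) n = 100: T0 has mean n * mu and sd sqrt(n) * sigma.
mean_total = 100 * mu
sd_total = sqrt(100) * sigma

# (c) P(T0 <= 425) under the normal approximation.
p_c = NormalDist(mean_total, sd_total).cdf(425)

print(f"(a) {p_a:.4f}  (b) mean {mean_total}, sd {sd_total}  (c) {p_c:.4f}")
```

Answers obtained from a printed normal table may differ slightly in the third decimal place because z-values are rounded there.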



Example 6.13. Let X = the number of different people sent text messages during a particular
day by a randomly selected student at a large university. Suppose the mean value of X is 7
and the standard deviation is 6 (values very close to those reported in the article “Cell Phone
Use and Grade Point Average Among Undergraduate University Students”, College Student J.,
2011: 544–551). Among 100 randomly selected such students, how likely is it that the sample
mean number of different people texted exceeds 5?

Notice that the distribution being sampled is discrete, but the CLT is applicable whether the
variable of interest is discrete or continuous.
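
For Example 6.13, the normal approximation from the CLT can be applied directly to the sample mean. A minimal sketch, with variable names of our choosing:

```python
from math import sqrt
from statistics import NormalDist

n, mu, sigma = 100, 7, 6

# By the CLT, X-bar is approximately normal with mean 7 and sd 6 / sqrt(100) = 0.6.
xbar_dist = NormalDist(mu, sigma / sqrt(n))
p_exceeds_5 = 1 - xbar_dist.cdf(5)

print(f"P(X-bar > 5) is approximately {p_exceeds_5:.4f}")
```

The value 5 lies more than three standard deviations of X̄ below the mean, so the probability is very close to 1.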

Exercises Sections 5.3 and 5.4 of textbook: 37, 38, 49, 51, 52, 53, 54
