Part 2

Uploaded by

Fadia Puan
Sampling Distributions & Sampling Techniques
Pop Quiz

– In a cake factory, the standard deviation of sugar per cup is 129 grams. What is
the mean if no more than 62% of cups contain less than 440 grams of sugar?
– What if no more than 62% of cups contain more than 440 grams of sugar?
– In a vitamin factory, the standard deviation of vitamin C is 16 milligrams. What is
the mean if more than 75% should contain 51 milligrams of vitamin C or more?
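
One possible way to approach these questions, sketched under an assumption the quiz does not state: the amounts are normally distributed. Each question then reduces to finding the mean at which the stated percentage is met exactly. The helper name `boundary_mean` is introduced here only for illustration.

```python
from statistics import NormalDist

def boundary_mean(x, sd, left_area):
    """Mean of a normal distribution for which P(X < x) equals left_area."""
    z = NormalDist().inv_cdf(left_area)  # z score with left_area to its left
    return x - z * sd                    # rearranged from z = (x - mean) / sd

# Q1: exactly 62% below 440 grams
mean_q1 = boundary_mean(440, 129, 0.62)
# Q2: exactly 62% above 440 grams, i.e. 38% below
mean_q2 = boundary_mean(440, 129, 0.38)
# Q3: exactly 75% at or above 51 milligrams, i.e. 25% below
mean_q3 = boundary_mean(51, 16, 0.25)

print(round(mean_q1, 1), round(mean_q2, 1), round(mean_q3, 1))
```

These are boundary values only. Under the same assumption, "no more than 62% below 440" suggests a mean at or above the first value, "no more than 62% above 440" a mean at or below the second, and "more than 75% at 51 or more" a mean above the third.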
Discrete & Continuous Distributions
– A random variable is discrete if its set of possible values is finite or countably infinite.
Examples:
1. Randomly selecting 25 people who consume soft drinks and determining how many people prefer diet
soft drinks
2. Counting the number of people who arrive at a store during a five-minute period

• A random variable is continuous if it can take on values at every point over a given interval.
Examples:
1. Measuring the time between customer arrivals at a retail outlet
2. Measuring the weight of grain in a grain elevator at different points of time

• Discrete distributions (binomial, Poisson, hypergeometric) are constructed from discrete random
variables.
• Continuous distributions (uniform, normal, exponential, and others) are constructed from
continuous random variables.
I. Discrete Distribution
– A histogram is the most common graphical way of describing a discrete
distribution.
• An executive is considering out-of-town business travel for a given Friday. She recognizes
that at least one crisis could occur on the day that she is gone and she is concerned about
that possibility. Table 5.2 shows a discrete distribution that contains the number of crises
that could occur during the day that she is gone and the probability that each number will
occur.
5.2 Describing a Discrete Distribution

Mean, Variance, and Standard Deviation of Discrete Distributions

– The mean or expected value of a discrete distribution is the long-run
average of occurrences.

$\mu = E(x) = \sum [x \cdot P(x)]$

where
$E(x)$ = long-run average
$x$ = an outcome
$P(x)$ = probability of that outcome
• In the long run, the mean or expected number
of crises on a given Friday for this executive is
1.15 crises.
• However, there will never be exactly 1.15
crises.
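
A minimal sketch of the calculation. Table 5.2 is not reproduced in this text, so the probabilities below are assumed values, chosen only so that they sum to 1 and give the 1.15 mean quoted above. The variance and standard deviation use the usual definitions, $\sigma^2 = \sum [(x - \mu)^2 \cdot P(x)]$ and $\sigma = \sqrt{\sigma^2}$.

```python
import math

# Assumed (hypothetical) distribution: number of crises -> probability.
crises = {0: 0.37, 1: 0.31, 2: 0.18, 3: 0.09, 4: 0.04, 5: 0.01}
assert math.isclose(sum(crises.values()), 1.0)  # probabilities must sum to 1

# Mean: each outcome weighted by its probability.
mean = sum(x * p for x, p in crises.items())
# Variance: squared deviation from the mean, weighted by probability.
variance = sum((x - mean) ** 2 * p for x, p in crises.items())
std_dev = math.sqrt(variance)

print(f"mean={mean:.2f} variance={variance:.2f} std_dev={std_dev:.2f}")
```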
II. Continuous Distributions
6.2 The Normal Distribution
Characteristics of the Normal Distribution

• It is a continuous distribution.
• It is a symmetrical distribution about its mean.
• It is asymptotic to the horizontal axis.
• It is unimodal.
• It is a family of curves.
• Area under the curve is 1.
Probability Density Function of the Normal Distribution

– Defines the normal curve for a given mean and standard deviation; probabilities are areas under that curve.
– Because the formula is difficult to use directly, it is common to use a table or a computer.
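
For reference, the standard normal density is $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$. The sketch below evaluates it directly and cross-checks the result against Python's standard library. The function name `normal_pdf` is an illustrative choice, not something from the slides.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Height of the standard normal curve at its mean.
height = normal_pdf(0.0, 0.0, 1.0)
# Cross-check the hand-written formula against the standard library.
assert math.isclose(height, NormalDist(0.0, 1.0).pdf(0.0))
print(height)
```

Note that the density gives the height of the curve, not a probability. A probability is the area under the curve over an interval, which is why tables or software are used.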

Standardized Normal Distribution

– The normal distribution is described by its mean and standard deviation.


– All normal distributions can be converted to a single distribution, the z distribution, using
the formula:

$z = \dfrac{x - \mu}{\sigma}$

– A z score is the number of standard deviations that a value, x, is above or below the
mean.
– The z distribution is a normal distribution with a mean of 0 and a standard deviation of 1.

Solving for Probabilities Using the Normal Curve


– Example: According to the U.S. Environmental Protection Agency (EPA), on average
there are 4.43 pounds of waste generated per person in the U.S. per day.
– Suppose waste generated per person per day in the U.S. is normally distributed
with a standard deviation of 1.32 pounds.
– If a U.S. person is randomly selected, what is the probability that the person generates more
than 6.00 pounds of waste per day?

– First, find the z value:

$z = \dfrac{x - \mu}{\sigma} = \dfrac{6.00 - 4.43}{1.32} = 1.19$

– Look the value up in the z table, which gives an area of .3830.
– Example, continued.
– .3830 is the area between the mean and the z value of 1.19 (x value of 6).
– Subtract from .5 to get the area in the upper tail: .5000 − .3830 = .1170.

• There is an 11.7% chance that a randomly selected person will generate more than 6
pounds of waste per day.
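
The table lookup above can be checked with a few lines of Python. This is a sketch that uses the standard library's `NormalDist` in place of the z table, so the result differs from the table answer only by rounding of z.

```python
from statistics import NormalDist

mu, sigma = 4.43, 1.32          # mean and standard deviation from the example
z = (6.00 - mu) / sigma         # number of standard deviations above the mean
upper_tail = 1.0 - NormalDist().cdf(z)  # area to the right of z

print(round(z, 2), round(upper_tail, 3))
```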
Using the Computer to Solve for Normal Distribution Probabilities

• Both Excel and Minitab can be used.

• For the waste generation problem given earlier, if a U.S. person is randomly selected, what is the
probability that the person generates between 5.30 and 6.50 pounds of waste per day?

• Both programs give the probability, 0.1965.
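
The slides use Excel and Minitab. As an alternative sketch, the same interval probability can be computed in Python as the difference of two cumulative areas.

```python
from statistics import NormalDist

waste = NormalDist(mu=4.43, sigma=1.32)
# P(5.30 < X < 6.50) = area below 6.50 minus area below 5.30
p_between = waste.cdf(6.50) - waste.cdf(5.30)

print(round(p_between, 4))
```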
III. Sampling Techniques

Reasons for Sampling


– The sample can save money.
– The sample can save time.
– For given resources, the sample can broaden the scope of the study.
– Because the research process is sometimes destructive, the sample can save product.
– If accessing the population is impossible, the sample is the only option.
Reasons for Taking a Census
• Eliminate the possibility that a randomly selected sample may not be
representative of the population.
• For the safety of the consumer.
• To benchmark data for future studies.
Frame
• List, map, or directory used in the sampling process to represent the
population.
• Also called the working population.
7.1 Sampling
Frame
– A frame is overregistered if it contains units that are not in the target
population.
– A frame is underregistered if it does not include some units that are in the
population.
Types of Sampling Designs


Random Versus Nonrandom Sampling


– In random sampling, every unit of the population has the same chance of being selected.
– In nonrandom sampling, not every unit of the population has the same chance of being
selected.
– Nonrandom sampling is generally NOT an appropriate technique for gathering data for statistical analysis.

Simple Random Sampling


– Each unit in the frame is numbered from 1 to N (the size of the population).
– A random number table or generator is used to select n items into the sample.
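
A minimal sketch of the generator route, assuming a frame of N = 30 numbered units and a sample of n = 6 (illustrative sizes). The fixed seed is there only to make the sketch reproducible.

```python
import random

N = 30   # frame size (assumed for illustration)
n = 6    # sample size (assumed for illustration)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
# Draw n distinct unit numbers from 1..N, each unit equally likely.
sample_ids = sorted(rng.sample(range(1, N + 1), n))

print(sample_ids)
```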
Simple Random Sampling, continued.
Example: From the population frame of companies in Table 7.3, select a simple
random sample of six companies.
– First, the companies were numbered from 1 to 30.
Example, continued:
– From the table of random numbers, two-digit numbers are selected, discarding any that are over 30.
– In the random number table, the first two digits are 91, which is unusable.
– The next two digits are 56, also unusable, as is 74, the pair after that.
– The fourth pair of digits is 25, which corresponds to Occidental Petroleum.
Example, continued:
– Continue moving across the rows until six two-digit numbers are selected.
– Sample will be:
– (25) Occidental Petroleum
– (27) Procter & Gamble
– (01) Alaska Airlines
– (04) Bank of America
– (02) Alcoa
– (29) Sears
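
The table-reading procedure can be sketched as below. The random number table is not reproduced in this text, so the digit string is a hypothetical stand-in that begins 91, 56, 74, 25 as the example describes. The function name `select_from_digits` is also illustrative.

```python
def select_from_digits(digits, frame_size, sample_size):
    """Read two-digit numbers left to right, keeping usable ones not yet chosen."""
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        # Discard numbers outside 1..frame_size and any repeats.
        if 1 <= number <= frame_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

# Hypothetical digit stream (assumption, not the actual table).
digits = "9156742595279583013404024863852988099730"
picked = select_from_digits(digits, frame_size=30, sample_size=6)
print(picked)  # [25, 27, 1, 4, 2, 29]
```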
Stratified Random Sampling
– Population is divided into nonoverlapping subpopulations (strata).
– Researcher selects a random sample from each.
– Can reduce sampling error, because sample will more closely match the population.
– More costly than a simple random sample.
– Strata are usually chosen based on available information about the population.

• Within each group, there should be homogeneity.
• Between groups, there should be heterogeneity.
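
A minimal sketch of stratified random sampling, assuming an equal sample size from every stratum (the slides do not specify an allocation rule). The frame, the age-group strata, and the function name `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, rng):
    """Draw a simple random sample of size per_stratum from every stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # nonoverlapping groups
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical frame: (id, age group) pairs.
frame = [(i, "under 35" if i % 3 else "35 and over") for i in range(1, 61)]
result = stratified_sample(frame, lambda u: u[1], per_stratum=5,
                           rng=random.Random(1))
for name, members in result.items():
    print(name, members)
```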

Systematic Sampling
– Every kth item is selected to produce a sample of size n from a population of size N.

Example: A business researcher wanted to sample Texas manufacturers as part of a management study.
– Wanted to sample 1,000 companies.
– Frame: the most recent edition of the Texas Manufacturers Register®, which listed 26,000 manufacturing
companies in alphabetic order.
– The value of k was 26 (26,000/1,000).
– Use random number table to choose the first element in the study.
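
The Texas example can be sketched as follows, with units identified by their position (1 to 26,000) in the alphabetical frame. A seeded generator stands in for the random number table here, which is an assumption made for reproducibility.

```python
import random

N, n = 26_000, 1_000
k = N // n                    # sampling interval: every 26th company

rng = random.Random(7)        # stand-in for the random number table
start = rng.randint(1, k)     # random starting point within the first k units
selected = list(range(start, N + 1, k))  # start, start + k, start + 2k, ...

print(k, start, len(selected))
```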

Cluster (or Area) Sampling

– The population is divided into nonoverlapping areas (clusters).
– Clusters should be internally heterogeneous.
– Example: states, cities
– If clusters are too large, a second set of clusters can be taken from the initial cluster (two-stage
sampling).
– Advantages: convenience, cost
– Disadvantages: may be less efficient than simple random sampling if the elements of the cluster
are similar
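
A minimal single-stage sketch: clusters are chosen at random and every unit in a chosen cluster enters the sample. The city names and unit labels are hypothetical.

```python
import random

# Hypothetical clusters (cities) and the units in each.
clusters = {
    "City A": ["a1", "a2", "a3"],
    "City B": ["b1", "b2"],
    "City C": ["c1", "c2", "c3", "c4"],
    "City D": ["d1", "d2", "d3"],
}

rng = random.Random(3)
# Pick whole clusters at random, not individual units.
chosen_clusters = rng.sample(sorted(clusters), 2)
# Every unit in a chosen cluster is included.
sample = [unit for c in chosen_clusters for unit in clusters[c]]

print(chosen_clusters, sample)
```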

Nonrandom Sampling
– Any method that does not involve a random selection process.

Convenience Sampling
– Selected for the convenience of the researcher.

Judgment Sampling
– Chosen by the judgement of the researcher.
– Since the probability of an element being selected cannot be determined, the sampling error
cannot be determined either.
– Can be biased due to systematic errors in judgment.

Quota Sampling
– Population subclasses, such as age or gender, are used as strata.
– Can be useful if no frame is available for the population.
– Can be less costly.
– But nonrandom, and thus probabilities cannot be calculated.

Snowball Sampling

Sampling Error
– Occurs when the sample is not representative of the population.

Nonsampling Error
– All errors other than sampling error.
– Missing data
– Recording errors
– Measurement errors
– Input processing errors
– Analysis errors
– Response errors
– And many more!
7.2 Sampling Distribution of $\bar{x}$
Suppose that a small, finite population contains only N = 8 numbers:
54 55 59 63 64 68 69 70

– Distribution of the population data:

– Suppose that all possible samples of size n = 2 are taken from this population.
Population:
54 55 59 63 64 68 69 70

All possible samples of n = 2:

– Then take the means of all of the samples.
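
The enumeration can be sketched in Python. The slides do not say how the samples are formed, so this assumes sampling with replacement where order matters, which gives 8 × 8 = 64 samples.

```python
from itertools import product
from statistics import mean

population = [54, 55, 59, 63, 64, 68, 69, 70]

# Assumption: sampling with replacement, ordered pairs -> 64 samples of n = 2.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

# The average of all sample means equals the population mean.
print(len(samples), mean(population), mean(sample_means))
```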


Means of the samples:

Distribution of the means of the samples:


– The distribution of the sample means looks different from the distribution of the original
population.

– Similarly, the histogram of a Poisson-distributed population differs from the histogram of its sample means.
The Central Limit Theorem
– If random samples of size n are repeatedly drawn from a population that has a mean of μ and a
standard deviation of σ, the sample means, $\bar{x}$, are approximately normally distributed for sufficiently
large sample sizes (n ≥ 30), regardless of the shape of the population distribution. If the population
is normally distributed, the sample means are normally distributed for any size sample.

– It can be shown that the mean of the sample means is the population mean:

$\mu_{\bar{x}} = \mu$

– The standard deviation of the sample means (the standard error of the mean) is:

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
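
A small simulation can illustrate these two results. The exponential population below is an assumption, chosen because it is clearly non-normal and has a known mean and standard deviation of 1. The sample size and number of repetitions are arbitrary.

```python
import math
import random
from statistics import mean, pstdev

rng = random.Random(0)        # fixed seed so the sketch is reproducible
mu = sigma = 1.0              # exponential population with rate 1: mean 1, sd 1
n, repetitions = 30, 5000

# Draw many samples of size n and record each sample mean.
sample_means = [sum(rng.expovariate(1.0) for _ in range(n)) / n
                for _ in range(repetitions)]

mean_of_means = mean(sample_means)   # should be close to mu
sd_of_means = pstdev(sample_means)   # should be close to sigma / sqrt(n)
expected_se = sigma / math.sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_se, 3))
```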