Obtaining Data
ENGINEERING
This course is designed for undergraduate engineering students with emphasis
on problem solving related to societal issues that engineers and scientists are
called upon to solve. It introduces different methods of data collection and the
1. The Internal Revenue Service wants to estimate the amount of personal deductions
taxpayers made based on the type of deduction: home office, state income tax,
property taxes, property losses, and charitable contributions. The amount claimed
in each of these categories varies greatly depending on the adjusted gross income
of the taxpayer. Therefore, a simple random sample would not be an efficient
design. The IRS decides to divide taxpayers into five groups based on their
adjusted gross incomes and then takes a simple random sample of taxpayers from
each of the five groups.
ANSWER: This is an example of stratified random sampling, with the five
adjusted gross income groups serving as the strata.
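The stratified design above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the taxpayer list, the rule that tags each taxpayer with an income group, and the sample size of 10 per stratum are all hypothetical and do not come from the IRS example.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample of n_per_stratum units from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        # Group the units by the stratum label returned by stratum_of.
        strata.setdefault(stratum_of(unit), []).append(unit)
    # Sample independently, without replacement, inside each stratum.
    return {label: rng.sample(units, n_per_stratum) for label, units in strata.items()}

# Hypothetical data: 1,000 taxpayer IDs, each tagged with one of five income groups.
taxpayers = [(taxpayer_id, taxpayer_id % 5) for taxpayer_id in range(1000)]
sample = stratified_sample(taxpayers, stratum_of=lambda t: t[1], n_per_stratum=10)

for group, units in sorted(sample.items()):
    print(group, len(units))
```

Every income group contributes the same number of taxpayers here; in practice the per-stratum sample sizes could differ.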
2. The USDA inspects produce for E. coli contamination. Each truck
carrying produce across the border is stopped for inspection. A
random sample of five containers is selected for inspection from the
hundreds of containers on the truck. Every apple in each of the five
containers is then inspected for E. coli.
ANSWER: This is a cluster sampling design with the clusters being the
containers and the individual apples being the measurement unit.
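A minimal sketch of the cluster design just described, assuming a hypothetical truck with 200 containers of 50 apples each (these counts are illustrative only): containers are sampled at random, and then every apple in a chosen container is inspected.

```python
import random

rng = random.Random(1)

# Hypothetical truck: 200 containers, each holding 50 apples labeled (container, apple).
truck = {c: [(c, a) for a in range(50)] for c in range(200)}

# Stage 1: simple random sample of 5 containers (the clusters).
chosen_containers = rng.sample(sorted(truck), 5)

# Stage 2: inspect every apple in each chosen container (no subsampling).
inspected = [apple for c in chosen_containers for apple in truck[c]]

print(len(chosen_containers), len(inspected))  # 5 containers, 250 apples
```

The randomness enters only at the container level, which is what distinguishes this design from a simple random sample of apples.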
DESIGN OF A SURVEY
There are basically three kinds of questions that may be asked:
• dichotomous
• multiple choice
• free answer
In the dichotomous question, the respondent is asked to select one of two responses,
usually "yes" and "no."
In the multiple choice question, the respondent is asked to select one of a number of
responses:
What is the likelihood of your using the following services for preventive health care
purposes in the next two years?
type of question often results in an underestimate of the average number of times a family visits the park
during a year because people often tend to underestimate the number of occurrences of a common event or
an event occurring far from the time of the interview. A possible remedy is to request respondents to use
Replication
Consider another experiment in which a researcher is testing various dose levels (treatments) of
a new drug on laboratory rats. If the researcher randomly assigned a single dose of the drug to
each rat, then the experimental unit would be the individual rat. Once the treatment is assigned
to an experimental unit, a single replication of the treatment has occurred.
Measurement unit
Distinct from the experimental unit is the measurement unit. This is the physical entity upon
which a measurement is taken. In many experiments, the experimental and measurement unit
are identical. In Example, the measurement unit is the container, the same as the experimental
unit. However, if the individual shrimp were weighed as opposed to obtaining the total
weight of all the shrimp in each container, the experimental unit would be the container, but
the measurement unit would be the individual shrimp.
example
Consider the following experiment. Four types of protective coatings for frying pans are to
be evaluated. Five frying pans are randomly assigned to each of the four coatings. The
abrasion resistance of the coating is measured at three locations on each of
the 20 pans. Identify the following items for this study: experimental design, treatments,
replications, experimental unit, measurement unit, and total number of measurements.
Experimental design: Completely randomized design.
Treatments: Four types of protective coatings.
Replication: There are five frying pans (replications) for each treatment.
Experimental unit: Frying pan, because coatings (treatments) are randomly assigned to the frying
pans.
Measurement unit: Particular locations on the frying pan.
Total number of measurements: 4 x 5 x 3 = 60 measurements in this experiment. The experimental
unit is the frying pan since each pan was randomly assigned to a coating. The measurement
unit is a location on the frying pan.
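The frying pan study can be mimicked with a short Python sketch that randomly deals 20 pans to the four coatings and counts the measurements. The coating labels A to D, the pan numbers, and the random seed are assumptions made for illustration.

```python
import random

rng = random.Random(2)
coatings = ["A", "B", "C", "D"]   # 4 treatments (labels are hypothetical)
pans = list(range(1, 21))         # 20 experimental units
locations_per_pan = 3             # measurement units per pan

# Completely randomized assignment: shuffle the pans, then deal 5 to each coating.
rng.shuffle(pans)
assignment = {
    coating: sorted(pans[i * 5:(i + 1) * 5]) for i, coating in enumerate(coatings)
}

# Every pan is measured at 3 locations.
total_measurements = sum(len(p) for p in assignment.values()) * locations_per_pan

print(assignment)
print(total_measurements)  # 60
```

Each coating ends up with five pans (the replications), and the count of 60 matches 4 x 5 x 3.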
Experimental Designs
SITUATION: ways in which the tires can be assigned to the four cars
Completely randomized design
A completely randomized design is used when we are interested in comparing t “treatments”
(in our case, t = 4, and the treatments are the brands of tire). For each of the treatments,
we obtain a sample of observations. The sample sizes could be different for the individual
treatments. For example, we could test 20 tires from Brands A, B, and C but only 12 tires
from Brand D.
Randomized block design
In our example, we would want to avoid having the comparison of the tire brands distorted by
the differences in the four cars. The experimental design used to accomplish this goal is called a
randomized block design because we want to “block” out any differences in the four cars to
obtain a precise comparison of the four brands of tires. In a randomized block design, each
treatment appears in every block.
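As a rough sketch of the blocking idea, the following Python snippet places one tire of every brand on each car and randomizes the order within the car. The car labels and the seed are hypothetical.

```python
import random

rng = random.Random(3)
brands = ["A", "B", "C", "D"]                 # treatments
cars = ["Car 1", "Car 2", "Car 3", "Car 4"]   # blocks (labels are hypothetical)

# Within each block (car), every brand appears exactly once, in a random order.
layout = {car: rng.sample(brands, k=len(brands)) for car in cars}

for car, order in layout.items():
    print(car, order)
```

Because every brand appears on every car, differences between cars affect all four brands equally and do not distort the brand comparison.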
Latin square design
A design having two blocking variables is called a Latin square design. In our example, the
two blocking variables are the “car” the tire is placed on and the “position” on the car.
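One simple way to build such a layout, sketched here under assumptions, is to start from a cyclic square and shuffle its rows and columns. The car and position labels are hypothetical, and this construction does not sample uniformly from all possible Latin squares.

```python
import random

rng = random.Random(4)
brands = ["A", "B", "C", "D"]
cars = ["Car 1", "Car 2", "Car 3", "Car 4"]                           # rows
positions = ["left front", "right front", "left rear", "right rear"]  # columns
t = len(brands)

# Cyclic square: cell (r, c) holds brand (r + c) mod t.
# Shuffling whole rows and whole columns keeps the Latin property.
row_order = rng.sample(range(t), t)
col_order = rng.sample(range(t), t)
square = [[brands[(r + c) % t] for c in col_order] for r in row_order]

for car, row in zip(cars, square):
    print(car, dict(zip(positions, row)))
```

Each brand appears exactly once on every car and exactly once in every wheel position.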
EXAMPLE FOR SAMPLING DESIGNS FOR SURVEYS
1.) Time magazine, in an article in the late 1950s, stated that “the average Yaleman,
class of 1924, makes $25,111 a year,” which, in today’s dollars, would be over
$150,000. Time’s estimate was based on replies to a sample survey questionnaire
mailed to those members of the Yale class of 1924 whose addresses were on file with
the Yale administration in the late 1950s.
a.) What is the survey’s population of interest?
The population of interest is the members of the Yale class of 1924.
b.) Were the techniques used in selecting the sample likely to produce a sample that was
representative of the population of interest?
Probably not. The questionnaire was mailed only to class members whose addresses were on file
with the Yale administration in the late 1950s, so members without an address on file had no
chance of being selected, and the sample may not represent the population of interest.
c.) What are the possible sources of bias in the procedures used to obtain the sample?
One possible source of bias is nonresponse: not every alumnus who received the questionnaire
would have replied. Another is response error, such as an alumnus misremembering his actual earnings.
d.) Based on the sources of bias, do you believe that Time’s estimate of the salary of a
1924 Yale graduate in the late 1950s is too high, too low, or nearly the correct value?
Some respondents tend not to report their exact earnings and may state a higher amount than
they actually earn in order to impress. Therefore, Time’s estimate of the salary of a 1924
Yale graduate is likely too high.
2.) The New York City school district is planning a survey of 1,000 of its
250,000 parents or guardians who have students currently enrolled. They
want to assess the parents’ opinion about mandatory drug testing of all
students participating in any extracurricular activities, not just sports. An
alphabetical listing of all parents or guardians is available for selecting the
sample. In each of the following descriptions of the method of selecting the
1,000 participants in the survey, identify the type of sampling method used
(simple random sampling, stratified sampling, or cluster sampling).
1.) A research specialist for a large seafood company plans to investigate bacterial growth on oysters
and mussels subjected to three different storage temperatures. Nine cold-storage units are available.
She plans to use three storage units for each of the three temperatures. One package of oysters and
one package of mussels will be stored in each of the storage units for 2 weeks. At the end of the
storage period, the packages will be removed and the bacterial count made for two samples from
each package. The treatment factors of interest are temperature (levels: 0, 5, 10°C) and seafood
(levels: oysters, mussels). She will also record the bacterial count for each package prior to placing
seafood in the cooler. Identify each of the following components of the experimental design.
a.) factors
Factors are the controlled variables compared in a study. The possible values that a factor can take are called
factor levels.
The factors are storage temperature and type of seafood. The factor levels are 0°C, 5°C, and 10°C for temperature
and oysters and mussels for seafood.
The experimental unit is the physical entity to which a treatment is randomly assigned. The measurement unit is the
physical entity on which a measurement is taken.
f.) replications
The replications are the three storage units per temperature, so each temperature-seafood combination is stored in
three packages.
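The structure of the seafood study can be tallied with a short Python sketch. All numbers are taken from the problem statement; the variable names are illustrative, and the tally covers only the end-of-storage counts, not the counts recorded before storage.

```python
from itertools import product

temperatures_c = [0, 5, 10]          # factor 1 levels (degrees C)
seafoods = ["oysters", "mussels"]    # factor 2 levels
units_per_temperature = 3            # storage units assigned to each temperature
samples_per_package = 2              # bacterial counts made per package

# Treatments are all combinations of the two factors.
treatments = list(product(temperatures_c, seafoods))

n_storage_units = len(temperatures_c) * units_per_temperature  # 9
n_packages = n_storage_units * len(seafoods)                   # 18 (one of each seafood per unit)
n_counts = n_packages * samples_per_package                    # 36

print(len(treatments), n_storage_units, n_packages, n_counts)  # 6 9 18 36
```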