Obtaining Data

This course is designed for undergraduate engineering students to solve societal problems through data analysis and problem solving. It introduces different methods for collecting data, including personal interviews, telephone interviews, self-administered questionnaires, and direct observation. Key considerations for data collection methods include sample representation, response rates, and potential for bias or error. Common sampling designs discussed include simple random sampling, stratified random sampling, cluster sampling, and systematic sampling. Designing effective surveys requires addressing issues such as question type, responses sought, and potential nonresponse or measurement problems.

Uploaded by

mac b
Copyright: © All Rights Reserved

ENGINEERING DATA ANALYSIS
This course is designed for undergraduate engineering students with emphasis
on problem solving related to societal issues that engineers and scientists are
called upon to solve. It introduces different methods of data collection and the
suitability of using a particular method for a given situation.


Obtaining Data
Methods of Data Collection
Having chosen a particular sample survey, how does one
collect the data?
1.) Personal interviews
Data are frequently obtained by personal interviews.
For example, we can use personal interviews with eligible
voters to obtain a sample of public sentiment toward a
community bond issue.
The procedure usually requires the interviewer to ask
prepared questions and to record the respondent’s answers.
The primary advantage of these interviews is that people will
usually respond when confronted in person.
2.) Telephone interviews
Information can also be obtained from persons in
the sample through telephone interviews. Surveys
conducted through telephone interviews are frequently
less expensive than personal interviews, owing to the
elimination of travel expenses.
A major problem with telephone surveys is that it
is difficult to find a list or directory that closely
corresponds to the population. Telephone directories
have many numbers that do not belong to households,
and many households have unlisted numbers.
3.) Self-administered questionnaires
Another useful method of data collection
is the self-administered questionnaire, to be
completed by the respondent. These
questionnaires are usually mailed to the
individuals included in the sample, although
other distribution methods can be used.
Response rates for this method tend to be low,
and a low response rate can introduce a
bias into the sample because the people who
answer questionnaires may not be
representative of the population of interest.
4.) Direct observation
The fourth method for collecting data is direct observation.
If we were interested in estimating the number of trucks that use a particular road during the 4–6 P.M.
rush hours, we could assign a person to count the number of trucks passing a specified point during this
period, or electronic counting equipment could be used.
The disadvantage in using an observer is the possibility of error in observation.
Designing Surveys/Experiments
Survey
A survey is a process of collecting data from existing population units, with no
particular control over factors that may affect the population characteristics of
interest in the study.
In order to precisely describe the components that are necessary for a sample to be
effective, the following definitions are required.
Target population: The complete collection of objects whose description is the
major goal of the study.
Sample: A subset of the target population.
Observation unit: The object upon which data are collected. In studies involving
human populations, the observation unit is a specific individual in the sampled
population. In ecological studies, the observation unit may be a sample of water from
a stream or an individual plant on a plot of land.
Sampling designs
Simple random sampling
The basic design (simple random sampling) consists of selecting a group of
n units in such a way that each sample of size n has the same chance of
being selected.
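As a minimal sketch of this definition, the Python snippet below draws a simple random sample with the standard library. The frame of 1,000 numbered units, the sample size of 50, and the name simple_random_sample are assumptions made for illustration; they are not part of the course material.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    return rng.sample(list(frame), n)

# Hypothetical sampling frame: units numbered 1 to 1,000.
frame = range(1, 1001)
srs = simple_random_sample(frame, 50, seed=1)
print(len(srs), len(set(srs)))  # 50 50: fifty units, no repeats
```

Sampling is done without replacement, so no unit can appear twice in the sample.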
Stratified random sampling
Stratified random sampling is a method of sampling that involves the
division of a population into smaller subgroups known as strata. A simple
random sample is then selected from each stratum.
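The idea can be sketched in Python as follows, assuming that each unit carries a label saying which stratum it belongs to and that a simple random sample of equal size is drawn within every stratum. The population, the five groups, and the name stratified_sample are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 1,000 (unit_id, group) pairs in five groups.
population = [(unit_id, unit_id % 5) for unit_id in range(1000)]
strat = stratified_sample(population, lambda unit: unit[1], 20, seed=0)
print({group: len(chosen) for group, chosen in strat.items()})
```

Equal sample sizes per stratum are only one choice; sizes proportional to the stratum sizes are another common option.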
Ratio estimation
Ratio estimation uses a known population total for an auxiliary variable to improve
the weighting from sample values to population estimates. The sample estimate for the
variable of interest is scaled by the ratio of the known population total to the
sample estimate of that same total.
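One common form of this idea is the ratio estimator of a population total, sketched below in Python. The variable names and all numbers are hypothetical; x stands for an auxiliary variable whose population total is assumed to be known, and y for the variable of interest.

```python
def ratio_estimate_total(y_sample, x_sample, x_population_total):
    """Estimate the population total of y as (sum of y / sum of x) * known total of x."""
    ratio = sum(y_sample) / sum(x_sample)
    return ratio * x_population_total

# Hypothetical sample of three units and a known population total for x.
y_sample = [12, 15, 9]   # variable of interest, measured on the sample
x_sample = [10, 14, 8]   # auxiliary variable, measured on the same units
estimate = ratio_estimate_total(y_sample, x_sample, x_population_total=3200)
print(estimate)  # 3600.0, since 36 / 32 = 1.125 and 1.125 * 3200 = 3600
```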
Cluster sampling
Although individual preferences are desired in the survey, a more economical
procedure, especially in urban areas, may be to sample specific families,
apartment buildings, or city blocks rather than individual voters. Individual
preferences can then be obtained from each eligible voter within the unit
sampled. This technique is called cluster sampling.
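A minimal Python sketch of the procedure, assuming the clusters are city blocks and that every voter in a selected block is interviewed. The block and voter labels and the name cluster_sample are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every member of each one."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {cluster_id: list(clusters[cluster_id]) for cluster_id in chosen}

# Hypothetical frame: 10 city blocks, each listing 4 eligible voters.
blocks = {f"block_{b}": [f"voter_{b}_{v}" for v in range(4)] for b in range(10)}
clus = cluster_sample(blocks, 3, seed=2)
print(sorted(clus), sum(len(voters) for voters in clus.values()))
```

The random selection is applied to the blocks, not to the individual voters, which is what distinguishes this design from a simple random sample of voters.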
Systematic sample
Sometimes, the names of persons in the population of interest are available in a
list, such as a registration list, or on file cards stored in a drawer. For this
situation, an economical technique is to draw the sample by selecting one name
near the beginning of the list and then selecting every tenth or fifteenth name
thereafter. If the sampling is conducted in this manner, we obtain a systematic
sample.
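The selection rule can be sketched in Python as below. Choosing the starting name at random among the first k names is an assumption of this sketch (the text only says "near the beginning of the list"), and the list of names is hypothetical.

```python
import random

def systematic_sample(units, k, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    start = rng.randrange(k)
    return units[start::k]

# Hypothetical registration list of 100 names, sampling every tenth name.
names = [f"name_{i}" for i in range(100)]
sys_sample = systematic_sample(names, 10, seed=3)
print(len(sys_sample))  # 10
```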
EXAMPLES
Identify the type of sampling design in each of the following situations.
1. The selection of 200 people to serve as potential jurors in a medical
malpractice trial is conducted by assigning a number to each of 140,000 registered
voters in the county. A computer software program is used to randomly select 200
numbers from the numbers 1 to 140,000. The people having these 200 numbers
are sent a postcard notifying them of their selection for jury duty.
ANSWER: A simple random sample is selected using the list of registered voters
as the sampling frame.
2. Suppose you are selecting microchips from a production line for
inspection for bent probes. As the chips proceed past the inspection
point, every 100th chip is selected for inspection.
ANSWER: This is an example of systematic random sampling. This type
of inspection should provide a representative sample of chips because
there is no reason to presume that there exists any cyclic variation in the
production of the chips. It would be very difficult in this situation to
perform simple random sampling because no sampling frame exists.
ASSIGNMENT:
Identify the type of sampling design in each of the following situations and briefly
explain.

1. The Internal Revenue Service wants to estimate the amount of personal deductions
taxpayers made based on the type of deduction: home office, state income tax,
property taxes, property losses, and charitable contributions. The amount claimed
in each of these categories varies greatly depending on the adjusted gross income
of the taxpayer. Therefore, a simple random sample would not be an efficient
design. The IRS decides to divide taxpayers into five groups based on their
adjusted gross incomes and then takes a simple random sample of taxpayers from
each of the five groups.
ANSWER: This is an example of stratified random sampling with the five
adjusted gross income groups serving as the strata.
2. The USDA inspects produce for E. coli contamination. As trucks
carrying produce cross the border, each truck is stopped for inspection. A
random sample of five containers is selected for inspection from the
hundreds of containers on the truck. Every apple in each of the five
containers is then inspected for E. coli.
ANSWER: This is a cluster sampling design with the clusters being the
containers and the individual apples being the measurement unit.
DESIGN OF A SURVEY
There are basically three kinds of questions that may be asked:
• dichotomous
• multiple choice
• free answer
In the dichotomous question, the respondent is asked to select one of two responses,
usually "yes" and "no."

For example, in a transportation study, a worker may be asked,

Did you drive a car to work this morning? YES ( ), NO ( ) .

In the multiple choice question, the respondent is asked to select one of a number of
responses:

What is the likelihood of your using the following services for preventive health care
purposes in the next two years?

(a) Dental check-up, (b) Eye exam, (c) General physical


In the free answer form, the respondent is asked to answer a question in his or her
own words in essay form:

What is your opinion of the dining hall food and service?

The difficulty with the free answer question is in classifying the responses.
This may not only be difficult and somewhat arbitrary, but it is also extremely time
consuming.
Problems Associated with Surveys
Survey nonresponse may result in a biased survey because the sample is not representative of
the population. It is stated in Judging the Quality of a Survey that in surveys of the general
population women are more likely to participate than men; that is, the nonresponse rate for males
is higher than for females.
Possible remedies for nonresponse are:
1. Offering an inducement for participating in the survey
2. Sending reminders or making follow-up telephone calls to the individuals who did not
respond to the first contact
3. Using statistical techniques to adjust the survey findings to account for the sample profile
differing from the population profile
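One such statistical technique is post-stratification weighting, sketched below in Python for the third remedy. This is only one possible adjustment, not necessarily the one the course has in mind. The respondent counts, the yes/no answers, and the 50/50 population split are hypothetical.

```python
def poststratified_mean(responses, population_share):
    """Weight each group's mean response by that group's known population share.

    responses: list of (group, value) pairs from the respondents.
    population_share: dict mapping group -> proportion of the population.
    """
    totals, counts = {}, {}
    for group, value in responses:
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    return sum(population_share[g] * totals[g] / counts[g] for g in counts)

# Hypothetical data: 30 male and 70 female respondents, 1 = "yes", 0 = "no".
responses = ([("male", 1)] * 12 + [("male", 0)] * 18
             + [("female", 1)] * 49 + [("female", 0)] * 21)
unweighted = sum(value for _, value in responses) / len(responses)
adjusted = poststratified_mean(responses, {"male": 0.5, "female": 0.5})
print(round(unweighted, 2), round(adjusted, 2))  # 0.61 0.55
```

Because women are overrepresented among the respondents in this made-up data set, the unweighted proportion leans toward the female response rate; the adjustment pulls it back toward the population profile.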
Measurement problems are the result of the respondents not providing the
information that the survey seeks. These problems often are due to the specific
wording of questions in a survey, the manner in which the respondent answers the
survey questions, and the fashion in which an interviewer phrases questions during
the interview.
Examples of specific problems and possible remedies are as follows:

1. Inability to recall answers to questions:
The interviewee is asked how many times he or she visited a particular city park during the past year. This
type of question often results in an underestimate of the average number of times a family visits the park
during a year because people often tend to underestimate the number of occurrences of a common event or
an event occurring far from the time of the interview. A possible remedy is to request respondents to use
written records or to consult with other family members before responding.


2. Leading questions:
The fashion in which an opinion question is posed may result in a response that
does not truly represent the interviewee’s opinion. Thus, the survey results may be
biased in the direction in which the question is slanted.
Example: “Do you support the state fining the chemical company, which is the
major employer of people in our community, considering that this fine may result
in their moving to another state?”
This type of question tends to elicit a “no” response and thus produces a distorted
representation of the community’s opinion on the imposition of the fine.
3. Unclear wording of questions:
An exercise club attempted to determine the number of times a person exercises
per week. The question asked of the respondent was, “How many times in the last
week did you exercise?” The word exercise has different meanings to different
individuals.
Experiment
An experiment is a process of collecting data about population characteristics when control
is exercised over some or all factors that may affect the characteristics of interest in the
study.
Terminology
Treatments
The treatments in an experimental study are the conditions constructed from the factors.
Factors and response variables
Controlled variables called factors are selected by the researchers for comparison. Response
variables are measurements or observations that are recorded but not controlled by the
researcher.
Designed experiment
A designed experiment is an investigation in which a specified framework is provided in order to
observe, measure, and evaluate groups with respect to a designated response.
Experimental unit
The experimental unit is the physical entity to which the treatment is randomly assigned or the
subject that is randomly selected from one of the treatment populations.

Replication
Consider another experiment in which a researcher is testing various dose levels (treatments) of
a new drug on laboratory rats. If the researcher randomly assigned a single dose of the drug to
each rat, then the experimental unit would be the individual rat. Once the treatment is assigned
to an experimental unit, a single replication of the treatment has occurred.
Measurement unit
Distinct from the experimental unit is the measurement unit. This is the physical entity upon
which a measurement is taken. In many experiments, the experimental and measurement unit
are identical. For example, in a study in which the total weight of all the shrimp in each
container is recorded, the measurement unit is the container, the same as the experimental
unit. However, if the individual shrimp were weighed instead, the experimental unit would
still be the container, but the measurement unit would be the individual shrimp.
Example
Consider the following experiment. Four types of protective coatings for frying pans are to
be evaluated. Five frying pans are randomly assigned to each of the four coatings. The
abrasion resistance of the coating is measured at three locations on each of
the 20 pans. Identify the following items for this study: experimental design, treatments,
replications, experimental unit, measurement unit, and total number of measurements.
Experimental design: Completely randomized design, because the frying pans are assigned to the
coatings entirely at random.
Treatments: Four types of protective coatings.
Replication: There are five frying pans (replications) for each treatment.
Experimental unit: Frying pan, because coatings (treatments) are randomly assigned to the frying
pans.
Measurement unit: Particular locations on the frying pan.
Total number of measurements: 4 x 5 x 3 = 60 measurements in this experiment.
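The random assignment in the frying pan example can be sketched in Python as follows. The coating labels, the pan numbering, and the function name are hypothetical; only the counts (4 coatings, 5 pans each, 3 locations per pan) come from the example.

```python
import random

def completely_randomized_assignment(unit_ids, treatments, reps, seed=None):
    """Shuffle the units and give the first `reps` to treatment 1, the next to 2, etc."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    units = list(unit_ids)
    if len(units) != len(treatments) * reps:
        raise ValueError("need exactly reps units per treatment")
    rng.shuffle(units)
    return {treatment: sorted(units[i * reps:(i + 1) * reps])
            for i, treatment in enumerate(treatments)}

coatings = ["coating_A", "coating_B", "coating_C", "coating_D"]  # hypothetical labels
pans = range(1, 21)                                              # pans numbered 1 to 20
assignment = completely_randomized_assignment(pans, coatings, reps=5, seed=7)

locations_per_pan = 3
total_measurements = sum(len(p) for p in assignment.values()) * locations_per_pan
print(total_measurements)  # 60
```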
Experimental Designs
SITUATION: Four brands of tires are to be compared using four cars. The designs below are
ways in which the tires can be assigned to the four cars.

Completely randomized design

A completely randomized design is used when we are interested in comparing t “treatments” (in our case,
t = 4, and the treatments are the brands of tire). For each of the treatments, we obtain a sample of
observations. The sample sizes could be different for the individual treatments. For example, we could
test 20 tires from Brands A, B, and C but only 12 tires from Brand D.
Randomized block design

In our example, we would want to avoid having the comparison of the tire brands distorted by
the differences in the four cars. The experimental design used to accomplish this goal is called a
randomized block design because we want to “block” out any differences in the four cars to
obtain a precise comparison of the four brands of tires. In a randomized block design, each
treatment appears in every block.
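A minimal Python sketch of this layout for the tire situation, assuming each car is a block that receives one tire of every brand in a freshly randomized order. The car and brand labels and the function name are hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Within every block, place all treatments in an independently shuffled order."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = order
    return layout

cars = ["car_1", "car_2", "car_3", "car_4"]  # blocks
brands = ["A", "B", "C", "D"]                # treatments
rbd = randomized_block_design(cars, brands, seed=11)
for car, order in rbd.items():
    print(car, order)
```

Every car carries all four brands, so differences between cars affect each brand equally and do not distort the comparison.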
Latin square design
A design having two blocking variables is called a Latin square design. In our example, the
two blocking variables are the “car” the tire is placed on and the “position” on the car.
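One way to build such a layout is sketched below in Python: start from a cyclic square, then shuffle its rows, its columns, and the brand labels. This is an illustrative construction under the assumption that rows stand for cars and columns for wheel positions; it is not claimed to be the procedure used in the course, and it does not select uniformly among all possible Latin squares.

```python
import random

def latin_square(treatments, seed=None):
    """Return a t x t grid in which each treatment occurs once per row and column."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    t = len(treatments)
    square = [[(row + col) % t for col in range(t)] for row in range(t)]
    rng.shuffle(square)                                  # permute the rows
    cols = list(range(t))
    rng.shuffle(cols)                                    # permute the columns
    square = [[row[c] for c in cols] for row in square]
    labels = list(treatments)
    rng.shuffle(labels)                                  # permute the labels
    return [[labels[value] for value in row] for row in square]

brands = ["A", "B", "C", "D"]
square = latin_square(brands, seed=5)
for car_number, row in enumerate(square, start=1):
    print(f"car_{car_number}", row)  # columns = four hypothetical wheel positions
```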
EXAMPLE FOR SAMPLING DESIGNS FOR SURVEY
1.) Time magazine, in an article in the late 1950s, stated that “the average Yaleman,
class of 1924, makes $25,111 a year,” which, in today’s dollars, would be over
$150,000. Time’s estimate was based on replies to a sample survey questionnaire
mailed to those members of the Yale class of 1924 whose addresses were on file with
the Yale administration in the late 1950s.
a.) What is the survey’s population of interest?
The population of interest is the Yale alumni of the class of 1924.

b.) Were the techniques used in selecting the sample likely to produce a sample that was
representative of the population of interest?
No. Questionnaires were mailed only to alumni whose addresses were on file with the Yale
administration in the late 1950s, so members of the class of 1924 without an address on file
had no chance of being selected. The sample is therefore unlikely to be representative of
the population of interest.

c.) What are the possible sources of bias in the procedures used to obtain the sample?
One possible source of bias is nonresponse by alumni who received the questionnaire. Another
is inaccurate reporting, since a respondent may not remember his actual earnings.
d.) Based on the sources of bias, do you believe that Time’s estimate of the salary of a
1924 Yale graduate in the late 1950s is too high, too low, or nearly the correct value?
Some respondents tend not to declare their exact earnings. They
may state a higher amount than their actual earnings in order to impress. Therefore,
the estimated salary of the class of 1924 is likely too high.
2.) The New York City school district is planning a survey of 1,000 of its
250,000 parents or guardians who have students currently enrolled. They
want to assess the parents’ opinion about mandatory drug testing of all
students participating in any extracurricular activities, not just sports. An
alphabetical listing of all parents or guardians is available for selecting the
sample. In each of the following descriptions of the method of selecting the
1,000 participants in the survey, identify the type of sampling method used
(simple random sampling, stratified sampling, or cluster sampling).

a.) SIMPLE RANDOM SAMPLING - the respondents were selected
randomly after each was assigned a number. There were no groupings or intervals.
Thus, the sampling method used is simple random sampling.
b.) The school district is also concerned that the parent or guardian’s opinion may differ
depending on the age and sex of the student. Each name is randomly assigned a number.
The names with numbers 1 through 1,000 are selected for the survey. The parent is asked
to fill out a separate survey for each of their currently enrolled children.
CLUSTER SAMPLING - in the given description, each selected parent or guardian
answers the survey for each of their currently enrolled children. Thus, each
parent or guardian represents a cluster of children.
c.) The schools are divided into five groups according to grade level taught at the school:
K–2, 3–5, 6–7, 8–9, 10–12. Five separate sampling frames are constructed, one for each
group. A simple random sample of 200 parents or guardians is selected from each group.
STRATIFIED SAMPLING - the schools were grouped into five strata based
on the grade level taught, and a simple random sample was selected from each stratum.
EXAMPLE FOR SAMPLING DESIGNS FOR EXPERIMENT

1.) A research specialist for a large seafood company plans to investigate bacterial growth on oysters
and mussels subjected to three different storage temperatures. Nine cold-storage units are available.
She plans to use three storage units for each of the three temperatures. One package of oysters and
one package of mussels will be stored in each of the storage units for 2 weeks. At the end of the
storage period, the packages will be removed and the bacterial count made for two samples from
each package. The treatment factors of interest are temperature (levels: 0, 5, 10°C) and seafood
(levels: oysters, mussels). She will also record the bacterial count for each package prior to placing
seafood in the cooler. Identify each of the following components of the experimental design.
a.) factors
Factors are controlled variables compared in a study. The possible values that these factors can take are called factor
levels.

factors = storage temperature and the type of seafood.

b.) factor levels

For temperature, the factor levels are 0°C, 5°C, and 10°C. For type of seafood, the factor levels are oysters and mussels.

c.) experimental and measurement unit

The experimental unit is the physical entity to which a treatment is assigned. Meanwhile, the measurement unit is the
physical entity on which the measurements in the study are taken.

experimental units = the storage units for temperature and the packages of seafood for type of seafood,

measurement units = the samples taken from each package.
d.) replications

Replication is the repetition of a treatment on separate experimental units under the same conditions.

In this study, each temperature is applied to three storage units, so there are three packages of each type of seafood per temperature.

f. replications
