BS - Ch1 - Data Collection
Chapter 1
Data collection
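Note: A numerical summary (such as a percentage) of all the individuals in a population is called a parameter; the same summary computed from a sample is called a statistic.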
Descriptive statistics consists of organizing and summarizing data through tables, graphs and
numerical summaries. (We will study these in Chapters 2 to 4.)
Inferential statistics uses methods that extend a result from a sample to the population, and measure
the reliability of the result. (We will study these in Chapters 5 to 11.)
1. Identify the research objective. (What questions should be asked? The questions must clearly
identify the population to be studied.)
2. Collect the data needed to answer the questions. (We typically look at a sample.)
3. Describe the data. (Use descriptive statistics.)
4. Perform inference. (Apply appropriate techniques to extend the results obtained from the
sample to the population.)
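A minimal Python sketch of steps 2 to 4, assuming a made-up population of N = 1000 students in which 600 work (coded 1) and 400 do not (coded 0); the seed and the sample size n = 50 are arbitrary choices. The proportion computed from the whole population is a parameter, and the proportion computed from the sample is a statistic, which step 4 uses as an estimate of the parameter. Measuring how reliable that estimate is belongs to the later chapters on inference.

```python
import random

# Hypothetical population: 1 = student works, 0 = does not (made-up values).
rng = random.Random(1)                 # fixed seed only for reproducibility
population = [1] * 600 + [0] * 400     # assumed: 600 of N = 1000 students work

# Parameter: a numerical summary of the whole population.
parameter = sum(population) / len(population)

# Step 2: collect data by taking a simple random sample of size n.
n = 50
sample = rng.sample(population, n)

# Step 3: describe the data with a statistic (the sample proportion).
statistic = sum(sample) / n

# Step 4: use the statistic as an estimate of the (usually unknown) parameter.
print(f"parameter p = {parameter:.2f}, statistic p-hat = {statistic:.2f}")
```

In practice the parameter is unknown; it is computed here only because the population is simulated.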
Example 3: Consider a fully-grown tomato plant and suppose we collect information about the
tomatoes harvested.
• Individuals: tomatoes
• Variables: weight, color, age, size (approx. circumference), etc.
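Qualitative vs. quantitative variables
Qualitative (categorical) variables - allow for classification of individuals based on some
attribute or characteristic.
Quantitative variables - provide numerical measures for which arithmetic operations, such as
adding and averaging, make sense.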
a) Nationality: qualitative
Quantitative variables are either discrete or continuous:
Discrete variables (counts) - have either a finite or a countable number of possible values.
Continuous variables (measurements) - have an infinite number of possible values that are not
countable. These may take on every possible value between any two values.
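To make these distinctions concrete, the sketch below stores a few tomatoes from the tomato example as Python records. All values are invented for illustration, and the seed count is a hypothetical extra variable (not in the notes' list) added so that a discrete variable appears. Averaging works for the quantitative weight, while for the qualitative color we can only tally categories.

```python
from collections import Counter
from statistics import mean

# Hypothetical records for three harvested tomatoes; all values are made up.
# "seeds" is an extra, illustrative variable not in the notes' list.
tomatoes = [
    {"weight_g": 152.3, "color": "red", "seeds": 112},
    {"weight_g": 98.7, "color": "yellow", "seeds": 85},
    {"weight_g": 131.0, "color": "red", "seeds": 97},
]

# Type of each variable, following the definitions above.
variable_types = {
    "weight_g": "quantitative, continuous",  # a measurement
    "color": "qualitative",                  # a category
    "seeds": "quantitative, discrete",       # a count
}

# Averaging makes sense for a quantitative variable ...
avg_weight = mean(t["weight_g"] for t in tomatoes)

# ... but for a qualitative variable we can only count each category.
color_counts = Counter(t["color"] for t in tomatoes)

print(f"average weight: {avg_weight:.1f} g")
print(color_counts)
```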
Definition: Random sampling is the process of using chance to select individuals from a
population to be included in the sample.
Note: If a sample is obtained by convenience, the results of the survey are meaningless.
Example 1: Suppose you want to find the proportion of students on your campus who work.
It is convenient to survey the students in your class. But do these students represent all the
students on campus?
Probably not! So any results reported from your survey will be misleading.
We will discuss four basic sampling techniques in this chapter. They are designed to eliminate any
selection bias introduced during the selection process.
• Simple random sampling (SRS)
• Stratified sampling
• Systematic sampling
• Cluster sampling
A sample of size n from a population of size N is obtained through simple random sampling, if
every possible sample of size n has an equally likely chance of occurring. Such a sample is called a
simple random sample.
Note: In other words, simple random sampling is like selecting names from a hat.
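Python's standard library can stand in for the hat: `random.sample(population, k)` draws k distinct items without replacement, so every subset of size k is equally likely, which matches the definition of an SRS above. The helper name `simple_random_sample` and the seed are illustrative choices, not part of the notes.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Return an SRS of size n: every subset of size n is equally likely."""
    rng = random.Random(seed)      # seed only for reproducibility
    return rng.sample(frame, n)    # sampling without replacement

frame = list(range(1, 25))         # individuals numbered 1 to N = 24
print(simple_random_sample(frame, 4, seed=2024))
```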
Procedure:
1. Obtain a frame that lists all the individuals in the population of interest (say there are N individuals).
2. Number the individuals in the frame 1 to N.
3. Pick a starting place in a table of random numbers. (One way is to close your eyes and place a
pencil on the table, so that the starting place is chosen at random.)
4. Read and write down numbers from the table until you obtain a sample of the desired size.
a. If N has k digits, read k-digit numbers from the table.
b. Numbers can be read from left to right or downward.
c. Ignore a number if it is repeated.
d. Ignore any number that is 0 or greater than N, since it does not label an individual.
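The table-reading rule can be sketched as a small function. Here the random number table is replaced by a hypothetical string of digits (assumed to be the digits read downward from the starting place), and `srs_from_random_digits` is a made-up helper name. Besides ignoring repeats, the sketch skips labels of 0 or greater than N, since they match no individual in the frame.

```python
def srs_from_random_digits(digits, N, n):
    """Read k-digit numbers from a string of random digits and keep the
    first n distinct labels between 1 and N."""
    k = len(str(N))                      # N has k digits, so read k at a time
    chosen = []
    for i in range(0, len(digits) - k + 1, k):
        label = int(digits[i:i + k])     # next k-digit number from the "table"
        if 1 <= label <= N and label not in chosen:  # skip out-of-range and repeats
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen                        # may be shorter than n if digits run out

# Hypothetical digits, as if read downward from a random number table.
digits = "0152074411463723"
print(srs_from_random_digits(digits, N=24, n=4))
```

With N = 24 the function reads two-digit numbers, so for these digits it keeps 01, 07, 11 and 23 and skips 52, 44, 46 and 37.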
[Table of random numbers (rows numbered, e.g. Row 13)]
Example 3: Obtain an SRS of size 4 from the following population using the above table of random
numbers. You may start at column 4, row 13 and read numbers downward.
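1 Bob    9 Frank    17 Heidi
Solution: Since N = 24 has two digits, we read two-digit numbers downward from the table, skipping
any number greater than 24 (such as 52, 44, 46 and 37). The sample is:
01: Bob
07: Dean
11: Elsa
23: Zack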
Technology
We can use Excel to obtain a simple random sample for Example 3.
1. Type 1 to 24 in a column of an Excel spreadsheet (say cells A1 to A24).
2. Go to “Data Analysis” under the “Data” tab.
3. Select “Sampling” and click OK.
4. Click inside the Input Range box and highlight cells A1 to A24. (You will see
$A$1:$A$24 inside the box.)
5. Under Sampling Method, select “Random” and enter the “Number of Samples” as 4.
6. Next, select “Output Range”, click inside the box and highlight the cell where you want to
start your output (say cell C1).
7. Click OK.
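Stratified Sampling: Obtained by separating the population into non-overlapping groups called
strata and then obtaining a simple random sample from each stratum.
Example 1: A market researcher selects 500 drivers under 30 years of age and 500 drivers
over 30 years of age.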
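A sketch of stratified sampling for the driver example, assuming the researcher has a sampling frame for each age group. The frame sizes (4,000 and 9,000 drivers), the ID format and the helper name `stratified_sample` are invented; the point is only that a separate simple random sample is drawn from every stratum.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical sampling frames of driver IDs for the two strata.
strata = {
    "under 30": [f"U{i}" for i in range(1, 4001)],  # assumed 4,000 drivers
    "over 30": [f"O{i}" for i in range(1, 9001)],   # assumed 9,000 drivers
}
sample = stratified_sample(strata, {"under 30": 500, "over 30": 500}, seed=7)
print({name: len(chosen) for name, chosen in sample.items()})
```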
Systematic Sampling: Obtained by selecting every k-th individual from the population. The
first individual selected corresponds to a random number between 1 and k.
Example 2: The manager of a food store wants to measure customer satisfaction
using a sample of 40. He decides to survey every 7th customer.
To start, he randomly determines a number between 1 and 7, say 5. He then surveys the 5th
customer exiting the store and every 7th customer thereafter until he has a sample of 40.
This survey will include customers 5, 12, 19, 26, …
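The selection rule for systematic sampling reduces to simple arithmetic: counting from j = 0, the j-th selected customer is number start + j × k. A short sketch, where `systematic_sample` is a hypothetical helper and the random start is drawn with Python's `random.randint(1, k)` when none is given:

```python
import random

def systematic_sample(k, n, start=None, seed=None):
    """Positions chosen by systematic sampling: a random start between 1 and k,
    then every k-th individual after it, until n individuals are selected."""
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k (inclusive)
    return [start + j * k for j in range(n)]

# Example 2: k = 7, n = 40, random start 5.
customers = systematic_sample(k=7, n=40, start=5)
print(customers[:4], customers[-1])
```

With k = 7, n = 40 and start = 5 the survey includes customers 5, 12, 19, 26, and so on, and the 40th customer surveyed is number 5 + 39 × 7 = 278.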
Cluster Sampling: Obtained by selecting all individuals within a randomly selected collection or
group of individuals.
Example 3: A sociologist wants to gather data regarding household income within the city of
Boston.
Each city block can be treated as a cluster: he obtains a simple random sample of the city blocks
and surveys all households on the selected blocks.
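A sketch of cluster sampling for the Boston example, assuming 100 city blocks with 20 households each; the block and household labels and the helper name `cluster_sample` are made up. Only the choice of blocks is random; once a block is chosen, every household on it is surveyed.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Select m clusters by simple random sampling and keep every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # SRS of cluster labels
    return {c: clusters[c] for c in chosen}    # survey all members of each chosen cluster

# Hypothetical city blocks, each a list of household IDs.
blocks = {f"block{b}": [f"b{b}-h{h}" for h in range(1, 21)] for b in range(1, 101)}
selected = cluster_sample(blocks, m=5, seed=3)
households = [h for members in selected.values() for h in members]
print(len(selected), len(households))
```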
Note: Stratified versus cluster sampling
• In stratified sampling, we divide the population into groups and obtain a simple random
sample from each group.
• In cluster sampling, we divide the population into groups, obtain a simple random
sample of the groups, and survey all individuals in the selected groups.
Convenience Sample: One in which the individuals in the sample are easily obtained.
Studies that use this type of sampling generally yield unreliable results.
Definition: If the results of the sample are not representative of the population, then the sample
has bias.
For example, a sample of customers at a shopping mall will almost surely over-represent
middle-class and retired people and under-represent the poor, so it is a biased sample.
1. Sampling bias means that the technique used to obtain the individuals to be in the sample tends
to favor one part of the population over another.
Undercoverage, a type of sampling bias, occurs when the proportion of one segment of the
population is lower in a sample than it is in the population.
2. Non-response bias exists when individuals selected to be in the sample who do not respond to
the survey have different opinions from those who do.
3. Response bias exists when the answers on a survey do not reflect the true feelings of the
respondent.
Example 1: Determine the type of bias that can occur in each of the following:
i. To conduct a study, a store manager selects the first 60 customers who enter his store on a
Saturday morning.
ii. To conduct a survey on sleeping habits, you obtain a simple random sample of 150 students.
One question on this survey is “How much sleep do you get?”
iii. To determine the public’s opinion of the police department, after selecting a cluster sample,
uniformed police officers go door to door to conduct the survey.
iv. To estimate the percentage of households that speak a foreign language, a polling
organization mails a questionnaire to 1000 randomly selected households. Of the 1000
households selected, 24 responded.
• Non-sampling errors result from sampling bias, non-response bias, response bias or data-entry
error. Such errors could also be present in a complete census of the population.
• Sampling errors result from using a sample to estimate information about a population. This
type of error occurs because a sample gives incomplete information about a population.
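Sampling error can be seen in a small simulation: repeated simple random samples from the same population give different sample means, and none of them needs to equal the population mean. The population of 5,000 incomes (in thousands) is generated at random for illustration only, and the sample size of 100 is an arbitrary choice.

```python
import random

# Hypothetical population: household incomes in thousands (made up, fixed by a seed).
rng = random.Random(10)
population = [rng.randint(20, 200) for _ in range(5000)]
mu = sum(population) / len(population)            # population mean (parameter)

# Several simple random samples of the same size give different estimates.
estimates = []
for _ in range(5):
    sample = rng.sample(population, 100)
    estimates.append(sum(sample) / len(sample))   # sample mean (statistic)

errors = [xbar - mu for xbar in estimates]        # sampling error of each sample
print(round(mu, 2), [round(e, 2) for e in errors])
```

Because every individual here is recorded without error, any difference between a sample mean and the population mean is pure sampling error; non-sampling errors would come on top of it.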