BS - Ch1 - Data Collection

The document provides an introduction to statistics, covering key concepts such as population, sample, and the distinction between parameters and statistics. It explains different types of variables (qualitative vs quantitative, discrete vs continuous) and outlines various sampling methods, including simple random sampling, stratified sampling, systematic sampling, and cluster sampling. Additionally, it discusses potential biases in sampling and the importance of ensuring representative samples to avoid misleading results.

Uploaded by

ethan19hg

Chapter 1
Data collection

1.1 Introduction to the Practice of Statistics

Definition: Statistics is the science of collecting, organizing, summarizing and analyzing information to draw conclusions or answer questions.
In addition, statistics is about providing a measure of confidence in any conclusions.

➢ Population: The entire group to be studied.
➢ Sample: A subset of the population that is being studied.
➢ Individual: A person/object of the population that is being studied.

Example 1: Suppose you want to check the statement “Exercising daily is important” by determining the percentage of students in your school who agree with it.
• Population: Every student at the school.
Surveying every student is difficult. An alternative is to pick 50 students and ask them.
• Sample: 50 students

A statistic is a numerical summary of a sample.
A parameter is a numerical summary of a population.

Example 2: Parameter versus statistic
Suppose the percentage of all students on your campus who have a job is 84.9%. This value represents
a parameter.
Suppose we pick a sample of 250 students and from this sample, we found 84.6% have a job. This
value represents a statistic.
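
The distinction can be illustrated with a small simulation. The sketch below is hypothetical: it builds a made-up population of 10,000 students in which 84.9% have a job, then draws a random sample of 250. The population size, the seed and the variable names are assumptions for illustration, not values from these notes.

```python
import random

# Hypothetical population: 10,000 students, 8,490 of whom have a job (84.9%).
# The population size and the seed are assumptions for illustration.
N = 10_000
population = [1] * 8_490 + [0] * (N - 8_490)   # 1 = has a job, 0 = no job

# Parameter: a numerical summary of the whole population.
parameter = sum(population) / N

# Statistic: a numerical summary of a sample of 250 students.
rng = random.Random(1)
sample = rng.sample(population, 250)           # drawn without replacement
statistic = sum(sample) / len(sample)

print(f"parameter = {parameter:.1%}")
print(f"statistic = {statistic:.1%}")
```

The parameter is fixed, while the statistic changes from one sample to the next.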

Percentage of all students = parameter; percentage of the sample = statistic.

Descriptive statistics consists of organizing and summarizing data through tables, graphs and
numerical summaries. (We will study these in Chapters 2 to 4.)
Inferential statistics uses methods that extend a result from a sample to the population, and measure
the reliability of the result. (We will study these in Chapters 5 to 11.)

The process of statistics

1. Identify the research objective. (What questions are to be answered? The questions must clearly
identify the population to be studied.)
2. Collect the data needed to answer questions. (We typically look at a sample)
3. Describe the data. (Use descriptive statistics)
4. Perform inference. (Apply appropriate techniques to extend the results obtained from the
sample to the population.)

Distinguish between Qualitative and Quantitative Variables

Variables are the characteristics of the individuals within the population.

Example 3: Consider a fully-grown tomato plant and suppose we collect information about the
tomatoes harvested.
➢ Individuals: Tomatoes
➢ Variables: Weight, color, age, size (approx. circumference), etc.

Variables

➢ Qualitative (categorical) variables: Allow for classification of individuals based on characteristics (for example, red or round).
➢ Quantitative variables: Provide numerical measures for which arithmetic operations, such as adding and averaging, make sense.

Example 4: Classify each of the following variables as qualitative or quantitative:

a) Nationality: Qualitative
b) Number of children: Quantitative
c) Level of education: Qualitative
d) Annual household income: Quantitative
e) Daily intake of whole grains in grams: Quantitative

Distinguish between Discrete and Continuous Variables

Quantitative variables

➢ Discrete variables: Have either a finite or a countable number of possible values.
(If you count to get the value of a quantitative variable, it is discrete.)

➢ Continuous variables: Have an infinite number of possible values that are not countable. These may take on every possible value between any two values.
(If you measure to get the value of a quantitative variable, it is continuous.)

Example 5: Classify each of the following quantitative variables as discrete or continuous:

a) Number of children: Discrete
b) Annual household income: Continuous
c) Daily intake of whole grains in grams: Continuous

➢ The list of observations a variable assumes is called data.
For example, variable: gender; data: male, female.

➢ Qualitative data: Observations corresponding to a qualitative variable.
➢ Quantitative data: Observations corresponding to a quantitative variable.
   o Discrete data
   o Continuous data

1.3 Simple Random Sampling

Definition: Random sampling is the process of using chance to select individuals from a
population to be included in the sample.

Note: If convenience is used to obtain a sample, the results of the survey are meaningless.

Example 1: Suppose you want to find the proportion of the students on your campus who work.
It is convenient to survey the students in your class. BUT do these students represent the overall
student body?
Probably not! So, any results reported from your survey will be misleading.

We will discuss four basic sampling techniques in this chapter. These are designed so that any
biases introduced during the selection process are eliminated.
• Simple random sampling (SRS)
• Stratified sampling
• Systematic sampling
• Cluster sampling

Obtaining a Simple Random Sample

A sample of size n from a population of size N is obtained through simple random sampling if
every possible sample of size n is equally likely to occur. Such a sample is called a
simple random sample.

Note: In other words, simple random sampling is like selecting names from a hat.

Example 2: Suppose a study group consists of 5 students:
Bob (B) Mike (M) Jan (J) Sean (S) Lucy (L)
2 of the students must go to the board to demonstrate a HW problem. List all possible samples of size
2 (without replacement).
Solution:
BM BJ BS BL
MJ MS ML
JS JL
SL
There are 10 possible samples.
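
The list of possible samples can be checked by enumeration. This is a minimal sketch using Python's standard library; the one-letter labels are the initials from the example, and the variable names are my own.

```python
from itertools import combinations
from math import comb

students = ["B", "M", "J", "S", "L"]

# Every unordered pair of distinct students (sampling without replacement).
samples = ["".join(pair) for pair in combinations(students, 2)]

print(samples)
# The count should agree with "5 choose 2".
print(len(samples), comb(5, 2))
```

Under simple random sampling, each of these pairs has the same chance of being the one selected.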

How to use a table of random numbers to obtain a simple random sample

Procedure:
1. Obtain a frame that lists all the individuals in the population of interest (say there are N individuals).
2. Number the individuals in the frame 1 to N.
3. Pick a starting place in the table. (To keep the choice random, close your eyes and place a pencil on
the table.)
4. Read and write down numbers until you obtain a sample of the desired size.
a. If N has k digits, read k-digit numbers from the table.
b. Read the numbers from left to right or downwards.
c. Skip any number that is larger than N, and ignore any number that is repeated.
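
The reading rule in step 4 can be sketched in code. The function name `read_sample` and the list of table entries below are hypothetical; in particular, the entries are made up and are NOT taken from the random number table used in these notes.

```python
def read_sample(entries, N, n):
    """Pick n distinct labels in 1..N from a sequence of table entries."""
    chosen = []
    for value in entries:
        if value < 1 or value > N:   # label is not in the frame: skip it
            continue
        if value in chosen:          # repeated number: ignore it
            continue
        chosen.append(value)
        if len(chosen) == n:         # stop once the sample is complete
            break
    return chosen

# Hypothetical two-digit entries read from a table (made up for illustration).
entries = [31, 8, 52, 8, 19, 0, 24, 3, 77, 15]
print(read_sample(entries, N=24, n=4))
```

Here 31, 52 and 0 are skipped because they are not labels in the frame, and the second 8 is ignored because it repeats.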

The following shows a random number table:

[Random number table not reproduced. Annotations: the number of digits read depends on the population size (2 digits here); the starting point is column 4, row 13.]

We skip 52 since it is larger than 24.

Example 3: Obtain an SRS of size 4 from the following population using the above table of random
numbers. You may start at column 4, row 13 and read numbers downward.

 1 Bob       9 Frank    17 Heidi
 2 Mike     10 Luke     18 Peter
 3 Alice    11 Elsa     19 Kate
 4 Sean     12 Greg     20 Nate
 5 Jake     13 Olive    21 Walter
 6 Carla    14 Ian      22 Tom
 7 Dean     15 Ryan     23 Zack
 8 Young    16 Vinci    24 Xu

Solution:
Population size: N = 24, so we read two-digit numbers and skip any number larger than 24 (52, 44, 46 and 37 are skipped).
Sample:
01: Bob
07: Dean
11: Elsa
23: Zack

Technology
We can use Excel to obtain a simple random sample for Example 3.
1. Type 1 to 24 in a column in an Excel spreadsheet (say cells 1 to 24 in column A).
2. Go to “Data analysis” under the “Data” tab.
3. Select “Sampling” and click OK.
4. Click inside the input range box and highlight the cells 1 to 24 in column A. (You will see
$A$1:$A$24 inside the box.)
5. Under the Sampling Method, select “Random” and enter the sample size (4 for Example 3) as the “Number of Samples”.
6. Next select the “Output Range”, click inside the box and highlight the cell where you want to
start your output (say cell 1 column C).
7. Click OK.
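
As an alternative to a spreadsheet, the same kind of sample can be drawn with Python's standard `random` module. This is a sketch, not the method used in these notes; the seed is arbitrary and is fixed only so the run is repeatable, and the names come from the population list in Example 3.

```python
import random

# Frame from Example 3, listed in label order 1 to 24.
frame = ["Bob", "Mike", "Alice", "Sean", "Jake", "Carla", "Dean", "Young",
         "Frank", "Luke", "Elsa", "Greg", "Olive", "Ian", "Ryan", "Vinci",
         "Heidi", "Peter", "Kate", "Nate", "Walter", "Tom", "Zack", "Xu"]

rng = random.Random(2024)       # arbitrary seed (an assumption)
srs = rng.sample(frame, 4)      # 4 distinct individuals, without replacement
print(srs)
```

`random.sample` never returns the same individual twice, which matches sampling without replacement.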


1.4 Other Effective Sampling Methods

Stratified Sampling: Obtained by separating the population into homogeneous, non-overlapping groups called strata and then obtaining a simple random sample from each
stratum.

Example 1: A market researcher selects 500 drivers under 30 years of age and 500 drivers
above 30 years of age.
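
A stratified sample like this one can be sketched as follows. Everything about the frame is hypothetical: the 5,000 drivers, their ages and the seed are made up, and placing drivers aged exactly 30 in the second stratum is my own assumption, since the example does not say where they belong.

```python
import random

rng = random.Random(0)  # arbitrary seed (an assumption)

# Hypothetical frame of 5,000 drivers with made-up ages between 18 and 80.
drivers = [{"id": i, "age": rng.randint(18, 80)} for i in range(5000)]

# Strata: non-overlapping groups that together cover the whole frame.
strata = {
    "under 30": [d for d in drivers if d["age"] < 30],
    "30 and over": [d for d in drivers if d["age"] >= 30],
}

# Take a simple random sample of 500 drivers from EACH stratum.
sample = {name: rng.sample(group, 500) for name, group in strata.items()}

for name, group in sample.items():
    print(name, len(group))
```

Every stratum contributes to the sample, which is what separates this design from cluster sampling.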

Systematic Sampling: Obtained by selecting every k-th individual from the population. The
first individual selected corresponds to a random number between 1 and k.

Example 2: The manager of a food store wants to measure customer satisfaction
using a sample of 40. He decides to survey every 7th customer.
To start, he randomly determines a number between 1 and 7, say 5. He then surveys the 5th
customer exiting the store and every 7th customer thereafter until he has a sample of 40.
This survey will include customer …
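
The customers reached by this scheme follow directly from the starting point and the step. A minimal sketch, with variable names of my own choosing:

```python
k = 7        # survey every 7th customer
start = 5    # starting point from the example; in general a random number from 1 to k
n = 40       # desired sample size

# The i-th surveyed customer is start + i*k, for i = 0, 1, ..., n - 1.
surveyed = [start + i * k for i in range(n)]

print(surveyed[:5], "...", surveyed[-1])
```

The first surveyed customers are 5, 12, 19, 26 and 33, and the 40th is customer 278, so at least 278 customers must exit the store before the sample is complete.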

Cluster Sampling: Obtained by selecting all individuals within a randomly selected collection of groups of individuals.

Example 3: A sociologist wants to gather data regarding household income within the city of
Boston.
This can be set up so that each city block is a cluster. He can then obtain a simple random
sample of the city blocks and survey all households on the blocks selected.
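
A cluster sample of this kind can be sketched as follows. The city is hypothetical: the 200 blocks, the number of households on each block, the choice of 10 blocks and the seed are all made up for illustration.

```python
import random

rng = random.Random(3)  # arbitrary seed (an assumption)

# Hypothetical city: 200 blocks, each with between 5 and 30 households.
blocks = {
    b: [f"block{b}-house{h}" for h in range(rng.randint(5, 30))]
    for b in range(200)
}

# Step 1: simple random sample of the clusters (city blocks).
chosen_blocks = rng.sample(sorted(blocks), 10)

# Step 2: survey ALL households on the chosen blocks.
surveyed = [house for b in chosen_blocks for house in blocks[b]]

print(len(chosen_blocks), "blocks,", len(surveyed), "households")
```

The number of households surveyed is not fixed in advance; it depends on which blocks are selected.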

Note: Stratified versus cluster sampling
• In stratified sampling, we divide the population into groups and obtain a simple random
sample from each group.
• In cluster sampling, we divide the population into groups, obtain a simple random
sample of the groups and survey all individuals in these groups.

Convenience Sample: One in which the individuals in the sample are easily obtained.
Studies that use this type of sampling generally yield unreliable results.

1.5 Bias in Sampling

Definition: If the results of the sample are not representative of the population, then the sample
has bias.

For example, a sample of customers at a shopping mall will almost surely over-represent the
middle-class and retired people and under-represent the poor. So, this is a biased sample.

Three sources of bias


1. Sampling bias
2. Non-response bias
3. Response bias

1. Sampling bias means that the technique used to obtain the individuals to be in the sample tends
to favor one part of the population over another.
Undercoverage, a type of sampling bias, occurs when the proportion of one segment of the
population is lower in the sample than it is in the population.

2. Non-response bias exists when individuals selected to be in the sample who do not respond to
the survey have different opinions from those who do.

Non-response can be reduced through call-backs or rewards/incentives.

3. Response bias exists when the answers on a survey do not reflect the true feelings of the
respondent.

Types of response bias:

a) Interviewer error: A good interviewer can obtain a truthful answer to “Have you ever
cheated on your taxes?”
b) Misrepresented answers
c) Words used in survey questions (“How much do you study?” vs “How many hours do
you study statistics each week?”)
d) Order of the questions or words within the questions

Example 1: Determine the type of bias that can occur in each of the following:
i. To conduct a study, a store manager selects the first 60 customers who enter his store on a
Saturday morning.

ii. To conduct a survey on sleeping habits, you obtain a simple random sample of 150 students.
One question on this survey is “How much sleep do you get?”

iii. To determine the public’s opinion of the police department, after selecting a cluster sample,
a uniformed police officer goes door to door to conduct the survey.

iv. To estimate the percentage of households that speak a foreign language, a polling
organization mails a questionnaire to 1000 randomly selected households. Of the 1000
households selected, 24 responded.

➢ Non-sampling errors result from sampling bias, non-response bias, response bias or data-entry error. Such
errors could also be present in a complete census of the population.

➢ Sampling errors result from using a sample to estimate information about a population. This
type of error occurs because a sample gives incomplete information about a population.
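
Sampling error can be seen by drawing several samples from the same population and comparing each statistic with the parameter. The population below is hypothetical: 10,000 households of which 30% speak a foreign language. The population size, the 30% figure, the sample size of 500 and the seed are all assumptions for illustration.

```python
import random

rng = random.Random(42)  # arbitrary seed (an assumption)

# Hypothetical population: 1 = speaks a foreign language, 0 = does not.
N = 10_000
population = [1] * 3_000 + [0] * 7_000
parameter = sum(population) / N          # true proportion in the population

# Draw 5 independent simple random samples and record each sampling error.
errors = []
for _ in range(5):
    sample = rng.sample(population, 500)
    statistic = sum(sample) / len(sample)
    errors.append(statistic - parameter)

print([f"{e:+.3f}" for e in errors])
```

Each sample is selected properly, yet the statistic still differs from the parameter because a sample carries incomplete information about the population.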
