1 Chapter 1 Lecture Notes

The document outlines Standard 1 on Sampling and Data, covering key topics such as definitions of statistics, types of data, sampling methods, and experimental design. It emphasizes the importance of understanding statistical vocabulary, levels of measurement, and ethical considerations in research. Various sampling techniques, including random, stratified, and cluster sampling, are discussed along with the significance of recognizing sampling errors and non-sampling errors.

Uploaded by

wqj68
Copyright
© All Rights Reserved

Standard 1:

Sampling and Data

Topics:
● Definitions of statistics, probability and key terms
● Data, Sampling and Variation in Data and Sampling
● Frequency Table and Levels of Measurement
● Experimental Design and Ethics

Objectives:
● Recognize and differentiate between key terms
● Apply various types of sampling methods to data collection
● Create and interpret frequency tables

Vocabulary:
Average Blinding Categorical Variable

Cluster Sampling Continuous Random Variable Control Group

Convenience Sampling Cumulative Relative Frequency Data

Discrete Random Variable Double-blinding Experimental Unit

Explanatory Variable Frequency Informed Consent

Institutional Consent Institutional Review Board Lurking Variable

Nonsampling Error Numerical Variable Parameter

Placebo Population Probability

Proportion Qualitative Data Quantitative Data

Random Sampling Relative Frequency Representative Sample

Response variable Sample Sampling Bias

Sampling Error Sampling with Replacement Sampling without Replacement

Statistics Stratified Sampling Systematic Sampling

Treatments Variable
Notes Chapter 1:

Statistics is the science of _____________, ________________,


________________, and _______________ data in order to
make a _______________.

There are two types of Data:



Probability is a mathematical tool used to study ___________________.


It deals with the chance of an event ___________________.
The probability of an event is always between _______ and _________.

In statistics we have a required common vocabulary that we must all use:

Population:

Parameter:

Sample:

Statistics:


How Population and Samples Relate:

Example:
We want to know the mean amount of money first year college students spend at Shenandoah
University on school supplies that do not include books. We randomly survey 100 first year students
at the college. Three of those students spent $150, $200, and $225, respectively.

What is the population?

What is the sample?

What is the parameter?

What is the statistic?

Statistical Variables: a characteristic of ____________ for each ___________ or __________ in a

_____________________.

Types of Variables:
● Numerical or Quantitative Data:

● Categorical or Qualitative Data:


In Statistics we will measure and work with lots of
Data!!! Know the difference!!

Qualitative data - are the result of ___________________ or ___________ attributes of a


population. Hair color, blood type, ethnic group, the car a person drives, and the street a person lives
on are examples of qualitative data. Qualitative data are generally described by __________ or
_____________.

Quantitative data - are always ___________ and are the result of _____________ or
____________ attributes of a population. Amount of money, pulse rate, weight, number of people
living in your town, and number of students who take statistics are examples of quantitative data.
Quantitative data may be either __________ or ________________.

Discrete data - all data that are the result of ___________ are called ____________________
data. These data take on only certain ___________ values. If you count the number of phone calls
you receive for each day of the week, you might get values such as zero, one, two, or three.

Continuous data - all data that are the result of ____________ are
_________________________ data assuming that we can measure accurately. If you and your
friends carry backpacks with books in them to school, the numbers of books in the backpacks are
discrete data and the weights of the backpacks are continuous data.

Example:

You go to the supermarket and purchase three cans of soup (19 ounces tomato bisque, 14.1 ounces
lentil, and 19 ounces Italian wedding), two packages of nuts (walnuts and peanuts), four different
kinds of vegetable (broccoli, cauliflower, spinach, and carrots), and two desserts (16 ounces Cherry
Garcia ice cream and two pounds, or 32 ounces, of chocolate chip cookies).

Which parts of the data set are quantitative discrete?

Which parts of the data set are quantitative continuous?

Which parts of the data set are qualitative?


Sampling: Gathering information about an entire population often costs too much or is virtually
impossible. Instead, we use a _________ of the population. A sample should have the
_________________________________ as the population it is representing. Most
statisticians use various methods of _______________________ in an attempt to achieve this
goal. The best method of sampling is a ___________________________.
Simple random sampling is a straightforward method for selecting a random sample:
● Give each member of the population a number.
● Use a random number generator to select a set of labels.
● These randomly selected labels identify the members of your sample.

Example:

Suppose Lisa wants to form a four-person study group (herself and three others) from her
pre-calculus class, which has 31 members including Lisa.
● To choose a simple random sample of size three from the other members of her class, Lisa
could put all 30 names in a hat, shake the hat, close her eyes, and pick out three names.
● Alternatively, Lisa could put her 30 classmates in order (maybe alphabetical), number them
01 to 30, and use a random number generator to select her three members (reading two-digit
numbers and ignoring 00, numbers greater than 30, and repeats).
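
The hat-drawing procedure above can be sketched in Python. This is only a minimal sketch: the roster labels and the fixed seed are assumptions made for illustration, not part of the notes. `random.Random.sample` draws without replacement, much like pulling names from a hat.

```python
import random

# Hypothetical roster: placeholder labels for Lisa's 30 classmates (not from the notes).
classmates = [f"Classmate {i:02d}" for i in range(1, 31)]

# A fixed seed is used only so the example is repeatable.
rng = random.Random(1)

# Draw three distinct names; every classmate is equally likely to be chosen.
study_partners = rng.sample(classmates, k=3)
print(study_partners)
```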

Stratified Sample:
To choose a stratified sample,
● ___________ the _______________ into groups called strata
● Take a ___________________ number from each stratum.

Example:

You could stratify (group) your college population by department and then choose a proportionate
simple random sample from each stratum (each department) to get a stratified random sample.
Suppose there are 5 departments, and you want to choose a sample of 50. You choose 10 from each
department using SRS sampling.
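
A stratified sample like the one in this example might be drawn as follows. The department names and the 40 members per department are made-up assumptions; taking 10 from each department is proportionate here only because every hypothetical department has the same size.

```python
import random

rng = random.Random(2)  # fixed seed only for repeatability

# Hypothetical population: 5 departments (strata) with 40 made-up members each.
departments = {
    name: [f"{name}-{i}" for i in range(1, 41)]
    for name in ["Math", "Biology", "History", "Art", "Music"]
}

per_stratum = 10  # 50 in total / 5 departments, as in the example

# Take a simple random sample inside every stratum.
stratified_sample = {
    name: rng.sample(members, k=per_stratum)
    for name, members in departments.items()
}

total = sum(len(chosen) for chosen in stratified_sample.values())
print(total)  # 50
```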
Cluster Sample:
To choose a cluster sample
● ________the ______________ into clusters (groups)
● ____________ select some of the clusters.
● All the members from these clusters are in the cluster sample.

Example:
If you randomly sample four departments from your college population, the four departments make
up the cluster sample. Divide your college faculty by department. The departments are the clusters.
Number each department, and then choose four different numbers using simple random sampling.
All members of the four departments with those numbers are the cluster sample.
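
The cluster procedure can be sketched the same way. The eight departments and their sizes below are hypothetical; the point is that whole clusters are chosen at random and every member of a chosen cluster is kept.

```python
import random

rng = random.Random(3)  # fixed seed only for repeatability

# Hypothetical faculty list: 8 departments (clusters) of different made-up sizes.
clusters = {
    f"Dept {d}": [f"Dept {d} member {i}" for i in range(1, 4 + d)]
    for d in range(1, 9)
}

# Randomly select four of the clusters.
chosen = rng.sample(sorted(clusters), k=4)

# Keep every member of each selected cluster.
cluster_sample = [person for dept in chosen for person in clusters[dept]]
print(chosen, len(cluster_sample))
```

Unlike the stratified sketch, the sample size here is not fixed in advance: it depends on which departments happen to be selected.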

Systematic Sample:
To choose a systematic sample:
● ___________ select a ______________
● Take every ________ piece of data from a listing of the population.

Example:
Suppose you have to do a phone survey. Your phone book contains 20,000 residence listings. You
must choose 400 names for the sample. Number the population 1–20,000 and then use a simple
random sample to pick a number that represents the first name in the sample. Then choose every
fiftieth name thereafter until you have a total of 400 names (you might have to go back to the
beginning of your phone list). Systematic sampling is frequently chosen because it is a simple method.
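
The phone-book example can be sketched in a few lines. The listings are represented only by their numbers 1 to 20,000, and the seed is an assumption for repeatability; the wrap-around mirrors going back to the beginning of the phone list.

```python
import random

rng = random.Random(4)  # fixed seed only for repeatability

population_size = 20_000
sample_size = 400
step = population_size // sample_size  # every 50th listing

# Random starting listing, numbered 1 to 20,000.
start = rng.randint(1, population_size)

# Take every 50th listing after the start, wrapping around to the top of the list.
selected = [((start - 1 + i * step) % population_size) + 1 for i in range(sample_size)]

print(step, len(set(selected)))  # 50 400
```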
Convenience Sampling:
A type of sampling that is _______________ is convenience sampling. Convenience sampling
involves using results that are _____________________________.

Example:
A computer software store conducts a marketing study by interviewing potential customers who
happen to be in the store browsing through the available software. The results of convenience
sampling may be very good in some cases and highly biased (favor certain outcomes) in others.

With Replacement or Without Replacement:


True random sampling is done ________ replacement. That is, once a member is picked, that member goes
_________ into the population and thus may be chosen _______________________________.

However, for practical reasons, in most populations, simple random sampling is done _________
replacement. Surveys are typically done without replacement. That is, a member of the population may be
chosen only once.

Most samples are taken from _______ populations and the sample tends to be ______ in comparison to the
population. Since this is the case, sampling without replacement is ___________________ the same as
sampling with replacement because the chance of picking the same individual more than once with
replacement is __________________.
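
To see how the chance of a repeat depends on the sizes involved, the sketch below computes the probability that sampling with replacement picks at least one member more than once. The function name and the two population sizes are assumptions chosen for illustration.

```python
def prob_repeat(population_size: int, sample_size: int) -> float:
    """Chance that sampling WITH replacement picks some member more than once."""
    p_all_distinct = 1.0
    for i in range(sample_size):
        # The (i + 1)-th pick must avoid the i members already chosen.
        p_all_distinct *= (population_size - i) / population_size
    return 1.0 - p_all_distinct


# Hypothetical sizes, chosen only for illustration.
print(prob_repeat(1_000_000, 10))  # large population, small sample
print(prob_repeat(50, 10))         # small population, same sample size
```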

Sampling Errors and Non-Sampling Errors:

When you analyze data, it is important to be aware of sampling _______ and non-sampling errors.
The actual ________ of sampling causes sampling errors. For example, the sample may not be large
enough.

Factors not related to the sampling process cause non-sampling errors. For example, a defective
counting device can cause a non-sampling error.

In reality, a sample will __________ be exactly representative of the population, so there will always
be ________ sampling error. As a rule, the _______ the sample, the _________ the sampling
error.
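
The relationship between sample size and sampling error can be explored with a small simulation. Everything here is an assumption made for illustration: the population is 10,000 made-up values, and "error" is measured as the average gap between a sample mean and the population mean over 200 repeated samples.

```python
import random

rng = random.Random(5)  # fixed seed only for repeatability

# Hypothetical population: 10,000 made-up values; its mean plays the role of the parameter.
population = [rng.uniform(0, 100) for _ in range(10_000)]
true_mean = sum(population) / len(population)


def average_error(sample_size: int, trials: int = 200) -> float:
    """Average absolute gap between the sample mean and the population mean."""
    gaps = []
    for _ in range(trials):
        sample = rng.sample(population, k=sample_size)
        sample_mean = sum(sample) / len(sample)
        gaps.append(abs(sample_mean - true_mean))
    return sum(gaps) / len(gaps)


small_n_error = average_error(10)
large_n_error = average_error(1_000)
print(small_n_error, large_n_error)
```

Comparing the two printed values shows how the typical error changes as the sample grows.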
Critical Evaluation:
We need to evaluate the statistical studies we read about critically and analyze them
____________________________ the results of the studies. Common problems to be aware of
include:

Example:
A study is done to determine the average tuition that San Jose State undergraduate students pay per semester.
Each student in the following samples is asked how much tuition he or she paid for the Fall semester. What is
the type of sampling in each case?
Variation:

Variation is present in ________ set of data.

Example:

16-ounce cans of beverage may contain more or less than 16 ounces of liquid. In one study, six
16-ounce cans were measured and produced the following amounts (in ounces) of beverage: 15.8, 16.1,
15.2, 14.8, 15.8, 15.9

Measurements of the amount of beverage in a 16-ounce can may vary because different people make
the measurements or because the exact amount, 16 ounces of liquid, was not put into the cans.
Manufacturers regularly run tests to determine if the amount of beverage in a 16-ounce can falls
within the desired range.
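
One way to summarize the variation in the six measurements is to compute their mean and their range (largest minus smallest). The choice of these two summaries is ours, not something the notes prescribe.

```python
import statistics

# Measured amounts (ounces) from the six cans in the example above.
amounts = [15.8, 16.1, 15.2, 14.8, 15.8, 15.9]

mean_amount = statistics.mean(amounts)
spread = max(amounts) - min(amounts)  # range: largest minus smallest

print(round(mean_amount, 2), round(spread, 2))  # 15.6 1.3
```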

Answers and Rounding off in Statistics:

In this course, I would like you to round to ______ decimal places, unless the answer automatically
rounds to one place or is not a decimal.

It is ______ necessary to reduce most fractions in this course. Especially in Probability Topics, the
chapter on probability, it is more helpful to leave an answer as an unreduced _____________.
Levels of Measurement:

The way a set of data is __________ is called its level of measurement. Correct statistical
procedures depend on a researcher being familiar with levels of measurement. Not every statistical
operation can be used with every set of data. Data can be classified into ______ levels of
measurement. They are (from lowest to highest level):

• _________ scale level


○ Data that is measured using a nominal scale is ______________ (categorical).
Categories, colors, names, labels and favorite foods along with yes or no responses are
examples of nominal level data. Nominal scale data are _____________________.

○ Example: Trying to order people according to their favorite food does not make any
sense. Putting veggie pizza first and vegan sushi second is not meaningful.

• ________ scale level


○ Data that is measured using an ordinal scale is similar to nominal scale data but there is
a big difference. The ordinal scale data __________ ordered.

○ Example: A list of the top five national parks in the United States is ordinal scale data.
The parks can be ranked from one to five, but we cannot measure differences between the
ranks.

• ________ scale level


○ Data that is measured using the interval scale is similar to ordinal level data because it
has a __________ ordering but there is a ___________ between data.
○ Example: Temperature scales like Celsius (C) and Fahrenheit (F) are measured by using
the interval scale.

• ________ scale level


○ Data that is measured using the ratio scale takes care of the ____ problem and gives
you the _____ information. Ratio scale data is like interval scale data, but it has a
___________ and ratios can be _________.
○ Example: Four multiple-choice statistics final exam scores are 80, 68, 20 and 92 (out of a
possible 100 points). The exams are machine-graded. The data can be put in order from
lowest to highest: 20, 68, 80, 92.
○ The differences between the data have meaning. The score 92 is more than the score 68
by 24 points. Ratios can be calculated. The smallest possible score is 0. So 80 is four times 20.
The score of 80 is four times better than the score of 20.

Level of       Put data in   Arrange data   Subtract      Determine if one data value
Measurement    categories    in order       data values   is a multiple of another

Nominal        Yes           No             No            No
Ordinal        Yes           Yes            No            No
Interval       Yes           Yes            Yes           No
Ratio          Yes           Yes            Yes           Yes
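
The last two rows of the table can be illustrated with a short sketch. The temperatures 20 and 10 degrees Celsius are values we picked for illustration; the exam scores 80 and 20 come from the ratio scale example.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Illustrative temperatures (not from the notes).
warm_c, cool_c = 20.0, 10.0
warm_f, cool_f = c_to_f(warm_c), c_to_f(cool_c)

# Interval scale: differences are meaningful (10 Celsius degrees = 18 Fahrenheit degrees).
print(warm_c - cool_c, warm_f - cool_f)  # 10.0 18.0

# Interval scale: the "ratio" depends on the unit, because zero is arbitrary.
print(warm_c / cool_c, warm_f / cool_f)  # 2.0 1.36

# Ratio scale: exam scores start at a true zero, so 80 really is four times 20.
print(80 / 20)  # 4.0
```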

Frequency:
Twenty students were asked how many hours they worked per day. Their responses, in hours, are as
follows:
5 6 3 3 2 4 7 5 2 3 5 6 5 4 4 3 5 2 5 3
The table below lists the different data values in ascending order and their frequencies.
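
A frequency table for these twenty responses can be tallied with `collections.Counter`. This is a minimal sketch; the printed layout is our own choice.

```python
from collections import Counter

# Hours worked per day, as reported by the twenty students above.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

# Count how many times each value occurs.
frequency = Counter(hours)

print("Value  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
```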

Relative Frequency:
A relative frequency is the ______ (fraction or proportion) of the number of times a value of the data
_______ in the set of all outcomes to the ______ number of outcomes. To find the relative
frequencies, ________ each frequency by the _______ number of students in the sample (in this
case, 20). Relative frequencies can be written as ___________, ____________, or __________.
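
Relative frequencies for the hours-worked data can be computed directly from the counts. The sketch uses `fractions.Fraction` (our choice) so that each result can be shown both as an exact fraction and as a decimal.

```python
from collections import Counter
from fractions import Fraction

# Hours worked per day, as reported by the twenty students.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

frequency = Counter(hours)
n = len(hours)  # 20 students in the sample

# Relative frequency of a value = its frequency compared with the sample size.
relative = {value: Fraction(count, n) for value, count in sorted(frequency.items())}

for value, rel in relative.items():
    print(value, rel, float(rel))
```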

Cumulative Relative Frequency:


Cumulative relative frequency is the _________________ of the previous relative frequencies. To
find the cumulative relative frequencies, ______ all the previous relative frequencies to the relative
frequency for the current row, as shown in the table. The sum of the values in the relative frequency
column of the table is _____. The last entry of the cumulative relative frequency column is _____,
indicating that ________________ percent of the data has been accumulated.
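
The running totals described above can be sketched with `itertools.accumulate`. As before, exact fractions are our own choice to avoid rounding noise in the running sums.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

# Hours worked per day, as reported by the twenty students.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

frequency = Counter(hours)
values = sorted(frequency)

relative = [Fraction(frequency[v], len(hours)) for v in values]
cumulative = list(accumulate(relative))  # running sum of the relative frequencies

print("Value  Rel.freq  Cum.rel.freq")
for v, r, c in zip(values, relative, cumulative):
    print(f"{v:>5}  {float(r):>8.2f}  {float(c):>12.2f}")
```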
Experiment:
The purpose of an experiment is to investigate the relationship between two variables. When one
variable causes change in another, we call the first variable the ________________ __________.
The affected variable is called the __________ ____________. In a randomized experiment, the
researcher manipulates values of the explanatory variable and measures the resulting changes in the
response variable. The different values of the explanatory variable are called _________________.
An ________________ ________ is a single object or individual to be measured.

Example:
Researchers want to investigate whether taking aspirin regularly reduces the risk of heart attack. Four
hundred men between the ages of 50 and 84 are recruited as participants. The men are divided
randomly into two groups: one group will take aspirin, and the other group will take a placebo. Each
man takes one pill each day for three years, but he does not know whether he is taking aspirin or the
placebo. At the end of the study, researchers count the number of men in each group who have had
heart attacks.

● What is the population?

● What is the sample?

● What are the experimental units?

● What is the explanatory variable?

● What is the response variable?

● What are the treatments?
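
The random division of the 400 men into two groups could be carried out as sketched below. The participant ID numbers and the equal split of 200 per group are assumptions; the example only says the men are divided randomly.

```python
import random

rng = random.Random(6)  # fixed seed only for repeatability

# Hypothetical participant IDs for the 400 recruited men.
participants = list(range(1, 401))

# Shuffle, then split: chance alone decides who gets which pill.
rng.shuffle(participants)
aspirin_group = participants[:200]   # assumption: equal group sizes
placebo_group = participants[200:]

print(len(aspirin_group), len(placebo_group))  # 200 200
```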


Ethics:
The U.S. Department of Health and Human Services oversees federal regulations of research studies
with the aim of protecting participants. When a university or other research institution engages in
research, it must ensure the ________ of all human subjects. For this reason, research institutions
establish oversight committees known as Institutional Review Boards _________. All planned
studies must be _________ in advance by the IRB. Key protections that are mandated by law include
the following:
• Risks to participants must be ____________ and ___________ with respect to projected
benefits.
• Participants must give _________ consent. This means that the risks of participation must be
________ explained to the subjects of the study. Subjects must consent in __________, and
researchers are required to keep documentation of their consent.
• Data collected from individuals must be guarded carefully to protect their privacy.
