Stat Notes ch1
Definitions
Data and Statistics
• Statistics is the science of collecting, organizing, analyzing, and interpreting data in
order to make decisions.
• A statistical study refers to any project consisting of or based on assembling,
classifying, and/or tabulating numerical data to present significant information
about a given subject.
• Data consists of information coming from observations, counts, measurements,
responses, or from any type of study.
Two types of data sets:
• Population: The collection of all outcomes, responses, measurements, or counts
that are of interest.
• Sample: A subset, or part, of a population.
Two types of numerical descriptions:
• Parameter: A numerical description of a population characteristic.
A number describing the entire population.
• Statistic: A numerical description of a sample characteristic.
A number describing a sample (a subset of a population).
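The distinction between a parameter and a statistic can be illustrated with a minimal sketch. The population below is made up for illustration (1,000 hypothetical commute times), and the names `population`, `sample`, and the seed are assumptions, not part of the notes.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 made-up commute times in minutes.
population = [rng.uniform(5, 60) for _ in range(1000)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample (here, 50 members chosen at random).
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

The two printed values will usually be close but not equal, which is why a statistic is treated as an estimate of the parameter rather than the parameter itself.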
Two branches of statistics:
• Descriptive Statistics is the branch of statistics that involves the organization,
summarization, and display of data.
• Inferential Statistics is the branch of statistics that involves using a sample to draw
conclusions about a population. A basic tool in the study of inferential statistics is
probability.
Case Study 1
A new study has confirmed that smoking e-cigarettes can have detrimental effects on
health. Researchers published the first longitudinal study of the effects of vaping in
the American Journal of Preventive Medicine, and findings showed that continuous use of
e-cigarettes increases one’s risk of developing chronic respiratory disease. The study
examined data from around 32,000 adults in the United States over a three-year period
between 2013 and 2016.
Prior to the study, none of the adults had any signs of lung disease, but by 2016,
researchers found that those who vaped were 30 percent more likely to develop a chronic
lung disease — including asthma, bronchitis, and emphysema. The study controlled for
combustible tobacco smoking, demographic, and clinical variables.
*Study taken from: https://people.com/health/new-study-vaping-linked-to-increased-risk-lung-disease/
Section 1.1: An Overview of Statistics—the Vocabulary of Statistics
1. Identify the sample in Case Study 1. 2. Identify the population in Case Study 1.
3. In Case Study 1, researchers found that those who vaped were 30 percent more
likely to develop a chronic lung disease. Does the “30 percent” represent a parameter or a
statistic? Explain your answer.
4. Determine which part of Case Study 1 represents the descriptive branch of statistics.
Case Study 2
A survey of 1060 parents of teenagers found that 63.6% of parents have checked their
teen’s social media profile.
6. Identify the sample in Case Study 2. 7. Identify the population in Case Study 2.
Case Study 3
The National Football League reported that player concussions in the 2018 regular season
were down 29 percent from the previous year. The league said there were 135
documented concussions in 2018, which was down from 190 documented concussions in
2017.
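The reported 29 percent can be checked from the two counts given above. This is a minimal sketch; the helper name `percent_decrease` is an assumption introduced for illustration.

```python
def percent_decrease(old, new):
    """Return the percent decrease going from old to new."""
    return (old - new) / old * 100

concussions_2017 = 190
concussions_2018 = 135

drop = percent_decrease(concussions_2017, concussions_2018)
print(f"{drop:.1f}% decrease")  # 28.9% decrease, which rounds to 29 percent
```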
8. In Case Study 3, the player concussions in 2018 were down 29 percent from 2017. Does
the “29 percent” represent a parameter or a statistic? Choose the correct statement
below.
o Statistic since this value is a numerical measurement describing a characteristic of a
population.
o Parameter since this value is a numerical measurement describing a characteristic of
a population.
o Statistic since this value is a numerical measurement describing a characteristic of
a sample.
o Parameter since this value is a numerical measurement describing a characteristic of
a sample.
Case Study 4
According to AAA on August 16, 2018, the national average of regular grade gasoline was
$2.854 per gallon.
9. How do you think the average price of gas in Case Study 4 was determined?
10. Identify the sample in Case Study 4. 11. Identify the population in Case Study 4.
12. Does the numerical description “average of $2.854 per gallon” in Case Study 4 represent a
parameter or a statistic? Explain your answer.
Case Study 5
A report from the Framingham Offspring Study suggests that marriage is truly
heartwarming. Scientists evaluated 3,682 men in the U.S. over a 10-year period. They
found that married men had a 46% lower rate of cardiovascular disease than unmarried
men.
13. Identify the sample in Case Study 5. 14. Identify the population in Case Study 5.
Section 1.3: Data Collection and Experimental Design
Sampling Techniques:
• Simple Random Sample: A sample in which every possible sample of the same size
has the same chance of being selected. This can be done by assigning a different
number to each member of the population, and then using a random number
generator to select the subjects.
• Stratified Sample: Members of the population are divided into two or more subsets
that share similar characteristics such as age, gender, ethnicity, etc. A sample is
then randomly selected from each subset.
• Cluster Sample: The population falls into naturally occurring subgroups, each
having similar characteristics. All of the members of one or more of the subgroups
(clusters) are selected.
• Systematic Sample: The members are ordered in some way, a starting number is
randomly selected, and then the subjects are selected at regular intervals from the
starting number. For example, every 5th person is selected.
• Convenience Sample: Consists only of members of the population that are easy to
get.
A biased sample is one that does not represent the population being studied. Because
convenience sampling does not ensure that the members are representative of the
population, it is a common source of biased samples.
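The sampling techniques listed above can be sketched in a few lines each. This is only an illustration under stated assumptions: the population of 100 members and its "group" labels are made up, the same label stands in for both strata and clusters to keep the example short, and all function names are hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 members, each tagged with a made-up group label.
population = [{"id": i, "group": "ABCD"[i % 4]} for i in range(100)]

def simple_random_sample(pop, n):
    # Every possible sample of size n has the same chance of being chosen.
    return rng.sample(pop, n)

def stratified_sample(pop, n_per_group):
    # Divide members into subsets by a shared characteristic,
    # then randomly select from each subset.
    strata = {}
    for member in pop:
        strata.setdefault(member["group"], []).append(member)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_group))
    return chosen

def cluster_sample(pop, n_clusters):
    # Randomly pick whole clusters and keep every member of each one.
    labels = sorted({member["group"] for member in pop})
    picked = set(rng.sample(labels, n_clusters))
    return [member for member in pop if member["group"] in picked]

def systematic_sample(pop, step):
    # Random starting position, then every step-th member after it.
    start = rng.randrange(step)
    return pop[start::step]

def convenience_sample(pop, n):
    # Takes whoever is easiest to reach (here, the first n); often biased.
    return pop[:n]
```

Note the difference between the stratified and cluster functions: the first takes some members from every group, while the second takes all members from some groups.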
3. You place a link to a survey on the ARC homepage. Those who want to participate in the
survey may click the link and answer the questions.
4. Every tenth person entering a mall is asked to name his or her favorite store.
5. Southwest Airlines wants to know if its customers are satisfied. The airline selects five flights
on a given day and asks everyone on those flights to complete a brief survey.
6. Using random digit dialing, researchers call 1400 people and ask what obstacles (if any)
keep them from exercising.
7. If you want to determine the average age of the 115 residents of a retirement
community, would you conduct a census or use a sampling technique?
Choose the correct answer below.
o The study would use cluster sampling because the residents of a retirement
community fall into naturally occurring subgroups.
o The study would use stratified sampling because it would be important to have
members from each segment of the population.
o The study would use a census, because the population is small enough for it to be
practical to record all of the responses.
o The study would use simple random sampling because it would be easy to randomly
select a smaller number of residents of the retirement community.
8. If you want to determine the average commute time for ARC students, would you
conduct a census or use a sampling technique?
10. To study the effects of music on driving habits, 100 drivers drove 500 miles while
listening to various types of music. The researchers had the first 25 listen to rock, the next
25 listen to hip-hop, the next 25 listen to classical, and the final 25 listen to no music.
12. A footwear company tested a new type of shoe design on subjects of similar athletic
ability. The researchers had half the subjects wear the new design, and the other half
wear the old design. The subjects performed athletic events, and the researchers measured their
athletic ability.
Designing an Experiment:
Three key elements of a well-designed experiment are control, randomization, and
replication.
Control: Experiments can be ruined by a variety of factors. It is therefore important to
control the following influential factors.
• Confounding Variable: Occurs when the researcher cannot tell the difference
between the effects of different factors on the variable.
In other words, it is a factor other than the one being studied that affects the
outcome of the experiment.
• Placebo Effect: Occurs when a subject reacts favorably to a placebo (fake
treatment). A technique to help minimize the placebo effect is called blinding. This
is where the subjects do not know whether they are receiving a treatment or a
placebo. In a double-blind experiment, neither the experimenter nor the subjects
know if the subjects are receiving treatment or a placebo.
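Randomization, named above as one of the three key elements, means assigning subjects to treatment groups by chance. A minimal sketch follows, reusing the four music conditions from exercise 10; the subject IDs, the seed, and the variable names are assumptions made for illustration.

```python
import random

rng = random.Random(2)  # fixed seed so the sketch is repeatable

subjects = list(range(1, 101))  # hypothetical subject IDs 1..100
treatments = ["rock", "hip-hop", "classical", "no music"]

# Shuffle the subjects, then cut the shuffled list into equal-sized groups,
# so that chance alone decides which treatment each subject receives.
rng.shuffle(subjects)
group_size = len(subjects) // len(treatments)
assignment = {
    treatment: sorted(subjects[i * group_size:(i + 1) * group_size])
    for i, treatment in enumerate(treatments)
}

for treatment, group in assignment.items():
    print(treatment, len(group), group[:5])
```

Assigning by shuffled order rather than by arrival order helps keep factors such as sign-up time from lining up with a particular treatment.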
Determine whether the survey question below is biased. If the question is biased,
suggest a better wording.
13. Why is drinking beer bad for you? Choose the best option below.
o The question is biased. The wording: "Do you think that drinking beer is bad for
you?" would be better.
o The question is biased. The wording: "Why is drinking beer good for you?" would
be better.
o The question is biased. The wording: "How do you think drinking beer affects your
health?" would be better.
o The question is biased. The wording: "Do you think that beer is good for you?"
would be better.