
Stat Notes ch1

The document provides an introduction to statistics, defining key concepts such as data, population, sample, parameter, and statistic. It outlines the two branches of statistics—descriptive and inferential—and discusses the importance of statistical studies, data collection methods, and sampling techniques. Additionally, it includes case studies to illustrate these concepts and emphasizes the design of experiments and observational studies.

Uploaded by

nvbr4s56jt

Introduction to Probability and Statistics

Chapter 1 Introduction to Statistics


Section 1.1: An Overview of Statistics—the Vocabulary of Statistics

Definitions
Data and Statistics
• Statistics is the science of collecting, organizing, analyzing, and interpreting data in
order to make decisions.
• A statistical study refers to any project consisting of or based on assembling,
classifying, and/or tabulating numerical data to present significant information
about a given subject.
• Data consists of information coming from observations, counts, measurements,
responses, or from any type of study.
Two types of data sets:
• Population: The collection of all outcomes, responses, measurements, or counts
that are of interest.
• Sample: A subset, or part, of a population.
Two types of numerical descriptions:
• Parameter: A numerical description of a population characteristic.
A number describing the entire population.
• Statistic: A numerical description of a sample characteristic.
A number describing a sample (a subset of a population).
Two branches of statistics:
• Descriptive Statistics is the branch of statistics that involves the organization,
summarization, and display of data.
• Inferential Statistics is the branch of statistics that involves using a sample to draw
conclusions about a population. A basic tool in the study of inferential statistics is
probability.
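The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of ages below is made up (randomly generated), and the names `population`, `sample`, `population_mean`, and `sample_mean` are hypothetical, not part of the notes.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people (made-up values for illustration).
rng = random.Random(0)
population = [rng.randint(18, 90) for _ in range(1000)]

# Parameter: a number describing the entire population.
population_mean = statistics.mean(population)

# Statistic: a number describing a sample (a subset of the population).
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

Computing the two means is descriptive statistics. Using the sample mean to estimate the population mean is an inferential step.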
Case Study 1
A new study has confirmed that smoking e-cigarettes can have detrimental effects on
health. Researchers published the first longitudinal study of the effects of vaping in
the American Journal of Preventive Medicine, and findings showed that continuous use of
e-cigarettes increases one’s risk of developing chronic respiratory disease. The study
examined data from around 32,000 adults in the United States over a three-year period
between 2013 and 2016.
Prior to the study, none of the adults had any signs of lung disease, but by 2016,
researchers found that those who vaped were 30 percent more likely to develop a chronic
lung disease — including asthma, bronchitis, and emphysema. The study controlled for
combustible tobacco smoking, demographic, and clinical variables.
*Study taken from: https://people.com/health/new-study-vaping-linked-to-increased-risk-lung-disease/


1. Identify the sample in Case Study 1.

2. Identify the population in Case Study 1.

3. In Case Study 1, researchers found that those who vaped were 30 percent more
likely to develop a chronic lung disease. Does the “30 percent” represent a parameter or a
statistic? Explain your answer.

4. Determine which part of Case Study 1 represents the descriptive branch of statistics.

5. What inference(s) may be drawn from Case Study 1?

Case Study 2
A survey of 1060 parents of teenagers found that 63.6% of parents have checked their
teen’s social media profile.
6. Identify the sample in Case Study 2. 7. Identify the population in Case Study 2.


Case Study 3
The National Football League reported that player concussions in the 2018 regular season
were down 29 percent from the previous year. The league said there were 135
documented concussions in 2018, which was down from 190 documented concussions in
2017.
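The reported 29 percent figure can be checked against the two counts given above. The short sketch below does that arithmetic. The variable names are hypothetical, and the counts are the ones quoted in the case study.

```python
# Counts quoted in Case Study 3.
concussions_2017 = 190
concussions_2018 = 135

# Percent change relative to the earlier year: (new - old) / old * 100.
percent_change = (concussions_2018 - concussions_2017) / concussions_2017 * 100

print(f"{percent_change:.1f}%")  # a negative value means a decrease
```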
8. In Case Study 3, the player concussions in 2018 were down 29 percent from 2017. Does
the “29 percent” represent a parameter or a statistic? Choose the correct statement
below.
o Statistic since this value is a numerical measurement describing a characteristic of a
population.
o Parameter since this value is a numerical measurement describing a characteristic of
a population.
o Statistic since this value is a numerical measurement describing a characteristic of
a sample.
o Parameter since this value is a numerical measurement describing a characteristic of
a sample.
Case Study 4
According to AAA on August 16, 2018, the national average of regular grade gasoline was
$2.854 per gallon.
9. How do you think the average price of gas in Case Study 4 was determined?

10. Identify the sample in Case Study 4. 11. Identify the population in Case Study 4.

12. Does the numerical description “average of $2.854 per gallon” in Case Study 4 represent a
parameter or a statistic? Explain your answer.


Case Study 5
A report from the Framingham Offspring Study suggests that marriage is truly
heartwarming. Scientists evaluated 3,682 men in the U.S. over a 10-year period. They
found that married men had a 46% lower rate of cardiovascular disease than unmarried
men.
13. Identify the sample in Case Study 5. 14. Identify the population in Case Study 5.

15. Identify the descriptive aspect of Case Study 5.

16. What inference could be drawn from Case Study 5?

17. What might this inference incorrectly imply?

Section 1.3: Data Collection and Experimental Design

Designing a statistical study:


• Identify the variable(s) of interest (the focus) and the population of the study.
• Develop a detailed plan for collecting data. If you use a sample, make sure the
sample is representative of the population.
• Collect the data.
• Describe the data, using descriptive statistics techniques.
• Interpret the data and make decisions about the population using inferential
statistics.
• Identify any possible errors.

A census is a count or measure of an entire population. Taking a census provides
complete information but is often costly and difficult to perform. Therefore, it is common
to take a sample of the population. To collect unbiased results, the researcher
must ensure that the sample is representative of the population. Below are some
common sampling techniques.

Sampling Techniques:
• Simple Random Sample: A sample in which every possible sample of the same size
has the same chance of being selected. This can be done by assigning a different
number to each member of the population, and then using a random number
generator to select the subjects.
• Stratified Sample: Members of the population are divided into two or more subsets
that share similar characteristics such as age, gender, ethnicity, etc. A sample is
then randomly selected from each subset.
• Cluster Sample: The population falls into naturally occurring subgroups, each
having similar characteristics. All of the members of one or more of the subgroups
(clusters) are selected.
• Systematic Sample: The members are ordered in some way, a starting number is
randomly selected, and then the subjects are selected at regular intervals from the
starting number. For example, every 5th person is selected.
• Convenience Sample: Consists only of members of the population that are easy to
get.
A biased sample is one that does not represent the population being studied. Because
convenience sampling does not ensure that the members are representative of the
population, it often produces biased samples.
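The four random sampling techniques above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure: the 100-member population, its "region" labels, the blocks of 10 used as clusters, and all function names are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 members, each with an id and a made-up region label.
population = [{"id": i, "region": "rural" if i % 2 == 0 else "urban"}
              for i in range(100)]

def simple_random_sample(members, n):
    # Every possible sample of size n has the same chance of being selected.
    return rng.sample(members, n)

def stratified_sample(members, key, n_per_stratum):
    # Divide members into strata by a shared characteristic,
    # then randomly select from each stratum.
    strata = {}
    for m in members:
        strata.setdefault(m[key], []).append(m)
    chosen = []
    for group in strata.values():
        chosen.extend(rng.sample(group, n_per_stratum))
    return chosen

def cluster_sample(clusters, n_clusters):
    # Randomly pick whole clusters and keep every member of each one.
    picked = rng.sample(clusters, n_clusters)
    return [m for cluster in picked for m in cluster]

def systematic_sample(members, k):
    # Random starting point, then every k-th member after it.
    start = rng.randrange(k)
    return members[start::k]

# Hypothetical clusters: consecutive blocks of 10 members.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]

srs = simple_random_sample(population, 8)
strat = stratified_sample(population, "region", 5)
clus = cluster_sample(clusters, 2)
syst = systematic_sample(population, 5)

print(len(srs), len(strat), len(clus), len(syst))
```

A convenience sample has no counterpart here because it involves no random selection. It simply takes whichever members are easiest to reach.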


Identify the type of sampling used in each problem below.


1. Chosen at random, 500 rural and 500 urban people aged 65 or older are asked about
their health. Select the correct answer below.
o Cluster sampling is used because the people are divided into groups, the groups are
chosen at random, and every person in each of the groups is sampled.
o Simple random sampling is used because people are chosen at random.
o Systematic sampling is used, because the people are selected from a list, with
a fixed interval between people on the list.
o Stratified sampling is used, because the people are divided into groups and are
chosen at random from these groups.
2. 100 students are each assigned a number, and then a computer randomly picks 8 different
numbers. The 8 students assigned the numbers picked by the computer are selected.

3. You place a link to a survey on the ARC homepage. Those who want to participate in the
survey may click the link and answer the questions.

4. Every tenth person entering a mall is asked to name his or her favorite store.

5. Southwest Airlines wants to know if its customers are satisfied. It selects five flights
on a given day and asks everyone on those flights to complete a brief survey.

6. Using random digit dialing, researchers call 1400 people and ask what obstacles (if any)
keep them from exercising.


7. If you want to determine the average age of the 115 residents of a retirement
community would you conduct a census or use a sampling technique?
Choose the correct answer below.
o The study would use cluster sampling because the residents of a retirement
community fall into naturally occurring subgroups.
o The study would use stratified sampling because it would be important to have
members from each segment of the population.
o The study would use a census, because the population is small enough for it to be
practical to record all of the responses.
o The study would use simple random sampling because it would be easy to randomly
select a smaller number of residents of the retirement community.
8. If you want to determine the average commute time for ARC students would you
conduct a census or use a sampling technique?

The two main types of statistical studies:


• Observational Study: The researcher observes and measures characteristics of
interest of a population but does not change existing conditions. The researcher
does not influence the responses.
• Experiment: The researcher deliberately applies a treatment before observing
responses. In a typical experiment, there are two groups. The treatment group
receives a treatment, while the control group does not receive the treatment. The
treatment group may be divided into subgroups, each receiving a different
treatment.

Determine whether each of the studies below is an observational study or an experiment.


9. In a survey of 1,000 adult males, 65% said they visit a dentist at least once per year.

10. To study the effects of music on driving habits, 100 drivers drove 500 miles while
listening to various types of music. The researchers had the first 25 listen to rock, the next
25 listen to hip-hop, the next 25 listen to classical, and the final 25 listen to no music.



11. To study predator-prey relationships in the Bering Sea, researchers looked at the
feeding habits of three species: kittiwakes, thick-billed murres, and seals.

12. A footwear company tested a new type of shoe design on subjects of similar athletic
ability. The researchers had half the subjects wear the new design, and the other half
wear the old design. Athletic events were performed, and the researchers measured their
athletic ability.

Designing an Experiment:
Three key elements of a well-designed experiment are control, randomization, and
replication.
Control: Experiments can be ruined by a variety of factors. It is therefore important to
control the following influential factors.
• Confounding Variable: Occurs when the researcher cannot tell the difference
between the effects of different factors on the variable.
In other words, it is a factor other than the one being studied that affects the
outcome of the experiment.
• Placebo Effect: Occurs when a subject reacts favorably to a placebo (fake
treatment). A technique to help minimize the placebo effect is called blinding. This
is where the subjects do not know whether they are receiving a treatment or a
placebo. In a double-blind experiment, neither the experimenter nor the subjects
know if the subjects are receiving treatment or a placebo.

Randomization: A process of randomly assigning subjects to different treatment groups.


• Completely Randomized Design: Subjects are assigned to different treatment
groups through random selection.
• Randomized Block Design: The experimenter divides the subjects with similar
characteristics into blocks, and then, within each block, randomly assigns
subjects to treatment groups.
• Matched-Pairs Design: Subjects are paired up according to similarity. Each
member of the pair receives a different treatment.
Replication: Repetition of an experiment under the same or similar conditions.
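Random assignment under a completely randomized design and a randomized block design can be sketched as below. This is an illustrative sketch only: the twelve subject labels, the two age blocks, and the function names `completely_randomized` and `randomized_block` are hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

def completely_randomized(subjects, treatments):
    # Shuffle all subjects, then deal them out to the treatment groups in turn.
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

def randomized_block(blocks, treatments):
    # Randomize separately inside each block of similar subjects,
    # then combine the per-block assignments.
    groups = {t: [] for t in treatments}
    for block in blocks.values():
        assigned = completely_randomized(block, treatments)
        for t, members in assigned.items():
            groups[t].extend(members)
    return groups

# Hypothetical subjects and blocks (made up for illustration).
subjects = [f"S{i}" for i in range(1, 13)]
treatments = ["treatment", "control"]

crd = completely_randomized(subjects, treatments)

blocks = {"under_40": subjects[:6], "40_and_over": subjects[6:]}
rbd = randomized_block(blocks, treatments)

print(crd)
print(rbd)
```

In the block design, each treatment group ends up with the same number of subjects from every block, so the blocking characteristic (age, in this made-up example) is balanced across groups.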


Determine whether the below survey question is biased. If the question is biased,
suggest a better wording.

13. Why is drinking beer bad for you? Choose the best option below.
o The question is biased. The wording: "Do you think that drinking beer is bad for
you?" would be better.
o The question is biased. The wording: "Why is drinking beer good for you?" would
be better.

o The question is biased. The wording: "How do you think drinking beer affects your
health?" would be better.

o The original question is not biased.

o The question is biased. The wording: "Do you think that beer is good for you?"
would be better.
