Stat 101 Exam 2: Important Formulas and Concepts 1
Important Formulas and Concepts
1 Chapter 8
1.1 Definitions
1. Population
The entire group of individuals or instances about whom we hope to learn.
2. Sample
A (representative) subset of a population, examined in the hope of learning about the
population.
3. Sample Survey
A study that asks questions of a sample drawn from some population in the hope of
learning something about the entire population.
4. Randomization
The best defense against bias is randomization, in which each individual is given a fair,
random chance of selection.
5. Census
A sample that consists of the entire population.
6. Population Parameter
A numerically valued attribute of a model for a population. Example: mean income
of all employed people in the USA
7. Sample statistic
Statistics or sample statistics are values that are calculated for sample data. Example:
mean income of employed people in a representative sample
8. Sampling Frame
A list of individuals from whom the sample is drawn. Individuals who may be in the
population of interest, but who are not in the sampling frame cannot be included in
any sample.
2 Chapter 9
1. Studies
2. Matching in Studies
In a retrospective or prospective study, participants who are similar in ways not under
study may be matched and then compared with each other on the variables of interest.
3. Experiments
(a) Factor
Variable whose levels are manipulated by the experimenter.
(b) Response Variable
Variable whose values are compared across different treatments.
(c) Experiment
Manipulates factor levels to create treatments, randomly assigns subjects to these
treatment levels, and then compares the responses of the subject groups across
treatment levels. Tries to assess effects of treatments.
(d) Levels
Specific values that the experimenter chooses for a factor.
(e) Treatment
Process, intervention, or other controlled circumstance applied to randomly assigned
experimental units.
(f) Block
When groups of experimental units are similar in a way that is not a factor
under study, it is often a good idea to gather them together into blocks and then
randomize the assignment of treatments within each block.
(a) Control
Control aspects of the experiment that we know may have an effect on the response,
but that are not the factors being studied.
(b) Randomize
Randomize subjects to treatments to even out effects that we cannot control.
(c) Replicate
Replicate over as many subjects as possible.
(d) Block
Reduce the effects of identifiable attributes of the subjects that cannot be controlled.
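The Randomize and Block principles above can be combined in a short sketch. This is only an illustration, not a prescribed procedure: the function name `randomize_within_blocks`, the unit labels, and the block labels are all hypothetical, and treatments are split as evenly as possible inside each block.

```python
import random


def randomize_within_blocks(blocks, treatments, seed=None):
    """Randomly assign treatments to units separately inside each block.

    blocks: dict mapping a block label to a list of unit identifiers.
    treatments: list of treatment labels.
    Returns a dict mapping each unit to its assigned treatment.
    """
    rng = random.Random(seed)
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)
        rng.shuffle(shuffled)  # random order within this block only
        for i, unit in enumerate(shuffled):
            # Cycle through the treatments so each block is split evenly.
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical units, grouped by an attribute that is not a factor under study.
blocks = {
    "block_1": ["u1", "u2", "u3", "u4"],
    "block_2": ["u5", "u6", "u7", "u8"],
}
assignment = randomize_within_blocks(blocks, ["treatment", "control"], seed=1)
```

Because the shuffle happens inside each block, every block contributes the same number of units to each treatment, so the blocking attribute cannot be confounded with the treatment.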
6. Statistically Significant
When an observed difference is too large for us to believe that it is likely to have
occurred naturally, we consider the difference to be statistically significant.
7. Types of Experiments
8. Control Treatment
Baseline treatment.
9. Control Group
Experimental units assigned to a baseline treatment level, typically either the default
treatment or a placebo treatment. Their responses provide a basis for comparison.
10. Blinding
Any individual associated with an experiment who is not aware of how subjects have
been allocated to treatment groups.
Single Blind: when either of the two above statements is blinded.
Double Blind: when both of the two above statements are blinded.
12. Placebo
A treatment known to have no effect.
(a) Confounding
When the levels of one factor are associated with the levels of another factor in
such a way that their effects cannot be separated, we say that these two factors
are confounded.
(b) Lurking Variable
A variable associated with both y and x that makes it appear that x may be
causing y.
• P(S) = 1
• The set of all possible outcomes of a trial must have probability = 1.
3. Complement Rule
4. Addition Rule
• For 2 disjoint events A and B, the probability that one or the other occurs is the
sum of the probabilities of the two events.
• P(A or B) = P(A) + P(B) where A and B are disjoint
• disjoint means mutually exclusive; there are no outcomes in common
5. Multiplication Rule
• For two independent events A and B, the probability that both A and B occur
is the product of the probabilities of the two events.
• P(A and B) = P(A)P(B) where A and B are independent
7. Conditional Probability
The conditional probability of the event B given the event A has occurred is
P(B | A) = P(A and B) / P(A).
9. Independent
Events A and B are independent when P(B | A) = P(B). Note: independent is not
the same as disjoint.
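As a quick check of the conditional probability and independence definitions, here is a minimal sketch using the animal shelter counts from the practice problems in this guide (24 dogs, 18 cats, 8 male dogs, 6 male cats). The helper name `conditional` is hypothetical, and exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction


def conditional(p_a_and_b, p_a):
    """Conditional probability P(B | A) = P(A and B) / P(A)."""
    return p_a_and_b / p_a


total = 24 + 18                       # all animals available for adoption
p_dog = Fraction(24, total)           # P(dog)
p_male = Fraction(8 + 6, total)       # P(male)
p_male_and_dog = Fraction(8, total)   # P(male and dog)

p_male_given_dog = conditional(p_male_and_dog, p_dog)

# Independence: P(male | dog) must equal P(male).
independent = p_male_given_dog == p_male

# Equivalent check through the multiplication rule.
product_rule_holds = p_male_and_dog == p_male * p_dog
```

Both checks agree, which is expected: the two conditions are equivalent whenever P(A) is not zero.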
10. Tree Diagram
A display of conditional events or probabilities that is helpful in thinking through
conditioning.
11. Bayes Rule
P(B | A) = P(A | B)P(B) / [P(A | B)P(B) + P(A | B^C)P(B^C)].
Since P(A | B)P(B) + P(A | B^C)P(B^C) = P(A), this may be simplified to read
P(B | A)P(A) = P(A | B)P(B).
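Bayes Rule can be written as a small function. This is a sketch only: the function name `bayes` and the input probabilities are hypothetical values chosen for illustration, not taken from any problem in this guide.

```python
def bayes(p_a_given_b, p_b, p_a_given_not_b):
    """Return P(B | A) from P(A | B), P(B), and P(A | B^C)."""
    p_not_b = 1 - p_b  # complement rule
    # Denominator: P(A) = P(A | B)P(B) + P(A | B^C)P(B^C)
    p_a = p_a_given_b * p_b + p_a_given_not_b * p_not_b
    return p_a_given_b * p_b / p_a


# Hypothetical inputs: P(A | B) = 0.9, P(B) = 0.1, P(A | B^C) = 0.2
p_b_given_a = bayes(p_a_given_b=0.9, p_b=0.1, p_a_given_not_b=0.2)
```

With these inputs the numerator is 0.09 and the denominator is 0.09 + 0.18 = 0.27, so P(B | A) is one third even though P(A | B) is 0.9.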
2. Success/Failure Condition
A Binomial Model is approximately Normal if we expect at least 10 successes and 10
failures, i.e. np ≥ 10 and n(1 − p) ≥ 10.
• µ = np
• σ = √(np(1 − p))
where
• (n choose k) = n! / (k!(n − k)!)
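The Binomial mean, standard deviation, Success/Failure Condition, and binomial coefficient above can be sketched as follows. The function names `binomial_summary` and `n_choose_k` are hypothetical, and n = 100, p = 0.3 is an illustrative case that is not from this guide.

```python
import math


def binomial_summary(n, p):
    """Return (mean, sd, normal_ok) for a Binomial model with n trials."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    # Success/Failure Condition: at least 10 expected successes and failures.
    normal_ok = n * p >= 10 and n * (1 - p) >= 10
    return mean, sd, normal_ok


def n_choose_k(n, k):
    """Binomial coefficient n! / (k!(n - k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


# Illustrative case (hypothetical numbers): the condition holds.
mean, sd, normal_ok = binomial_summary(100, 0.3)

# With n = 30 and p = 0.02, only 0.6 successes are expected, so the
# Normal approximation is not appropriate.
small_case_ok = binomial_summary(30, 0.02)[2]
```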
5 Chapter 15
5.1 Definitions
1. Sampling Distribution
Different random samples give different values of a statistic. The distribution of the
statistic over all possible samples is called the sampling distribution. The sampling
distribution model shows the behavior of the statistic over all possible samples of the
same size n.
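A simulation can make the idea of a sampling distribution concrete. The sketch below borrows the battery numbers used elsewhere in this guide (Normal population, mean 120, standard deviation 15, samples of size 50). Drawing 2000 random samples is an assumption that only approximates "all possible samples", and the function name `simulate_sample_means` is hypothetical.

```python
import random
import statistics


def simulate_sample_means(population_mean, population_sd, n, num_samples, seed=0):
    """Draw repeated samples of size n and return the list of sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(population_mean, population_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


means = simulate_sample_means(120, 15, n=50, num_samples=2000)
center = statistics.fmean(means)   # should sit near the population mean
spread = statistics.stdev(means)   # should sit near 15 / sqrt(50)
```

The spread of the simulated sample means is much smaller than the population standard deviation of 15, which is the main point of the sampling distribution of a mean.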
5. A study attempts to compare two sunscreens. Each of 50 subjects with varying skin
complexions will use both sunscreens—Screen A on one side of the body and Screen B
on the other side. For each subject, a coin is tossed to determine which side receives
Screen A and which receives Screen B. Researchers measure the amount of ultraviolet
light exposure over both treated areas for each subject. This is an example of:
6. For his Statistics class experiment, researcher J. Gilbert decided to study how parents’
income affects children’s performance on standardized tests like the SAT. He proposed
to collect information from a random sample of test takers and examine the relationship
between parental income and SAT score.
7. In 2002, the journal Science reported that a study of women in Finland indicated that
having sons shortened the life spans of mothers by about 34 weeks per son, but that
daughters helped to lengthen the mothers’ lives. The data came from church records
from the period 1640 to 1870.
8. Some people claim they can get relief from migraine headache pain by drinking a large
glass of ice water. Researchers plan to enlist several people who suffer from migraines
in a test. When a participant experiences a migraine headache, he or she will take a
pill that may be a standard pain reliever or a placebo. Half of each group will also
drink ice water. Participants will then report the level of pain relief they experience.
10. In a large Introductory Statistics lecture hall, the professor reports that 55% of the
students enrolled have never taken a Calculus course, 32% have taken only one semester
of Calculus, and the rest have taken two or more semesters of Calculus. The professor
randomly assigns students to groups of three to work on a project of the course. What
is the probability that the first group-mate you meet has studied
11. Continuation. What is the probability that of your other two group-mates,
12. A certain bowler can bowl a strike 70% of the time. If the bowls are independent,
what’s the probability that she
13. A check of dorms revealed that 38% had refrigerators, 52% had TVs and 21% had
both a TV and a refrigerator. What’s the probability that a randomly selected dorm
room has:
(a) P(US)
(b) Probability that a person completed education before college? Do not include
those who did not answer.
(c) Probability that a person is from France or did post graduate study.
(d) Probability that a person is from France and finished primary school.
15. An animal shelter states that it currently has 24 dogs and 18 cats available for adoption.
8 of the dogs and 6 of the cats are male. Find the conditional probability of:
16. Followup. The local animal shelter in reported that it currently has 24 dogs and 18
cats available for adoption; 8 of the dogs and 6 of the cats are male. Are being male
and being a dog independent events? Briefly justify your answer.
17. Police set up checkpoints to catch drunk drivers. Based on the initial stop, trained
officers can make the right decision 80% of the time. Suppose a checkpoint is set up at
a time when it is estimated that about 12% of people have been drinking. Questions
to answer:
(a) Suppose a person is stopped and is not drinking. What is the probability that he
is detained for further testing?
(b) What’s the probability that any given driver will be detained?
(c) What’s the probability that a driver who is detained has actually been drinking?
(d) What’s the probability that a driver who was released had actually been drinking?
18. A company’s records indicate that on any given day about 1% of their day-shift employees
and 2% of the night-shift employees will miss work. Sixty percent of the employees
work the day shift. What percent of employees are absent on any given day?
(a) What is the value of the missing probability in the table above?
20. A printing company ships boxes of paper to office stores. In each box, there are 30
reams of paper. However, in every box, they estimate that 2% of the reams of paper
are defective in some way. What is the probability that in a box, there will be exactly
4 reams of paper that need to be shipped back to the printing company? What is the
mean number of reams of paper that need to be shipped back? What is the standard
deviation?
21. The life span of a battery is normally distributed with a mean of 120 hours and a
standard deviation of 15 hours. A random sample of 50 batteries is collected and the
sample mean will be computed.
2. (a) Response bias if the students answer and lie. Nonresponse Bias if they do not
respond at all.
(b) Voluntary Response Bias
(c) Undercoverage. This method leaves out a lot of students.
3. (a) The group receiving the pill with inert ingredients will not experience the placebo
effect. This is false. They are given the placebo to induce the placebo effect so
they can be compared to the treatment group.
(b) This experiment includes blocking. This is false. The individuals were not
grouped first by some property or condition.
(c) The number of factors in the experiment is two. This is false. There is one factor,
the medicine, with two levels.
(d) This study is single blind. This is false. Both sets of participants were blinded.
(e) This study is double blind. This is true. Both sets of participants were blinded.
(f) The group receiving the medicine is the control group. This is false, the group
receiving the placebo is the control group.
4. This is a block design, as the horses were separated into groups before the treatments
were applied.
5. This is a matched pairs design. The pairs consist of the two sides of the subjects’
bodies.
8. (a) Experiment
(b) There are 2 factors - pain reliever and water temp. The pain reliever has 2 levels -
pain reliever or placebo. The water temperature has 2 levels - ice water or regular
water. Total, there are 4 treatments.
(c) Explanatory variable: pain reliever and water temp. Response variable: level of
pain relief.
9. (a) Experiment
(b) There is 1 factor - type of exercise. This factor has 2 levels - static stretching and
trunk stabilization exercises. In total, there are 2 treatments.
(c) Explanatory variable: type of exercise. Response variable: time before the athletes
were able to return to sports.
10. We are given that
P(no calculus) = 0.55,
P(1 semester) = 0.32.
• P(TV) = 0.52
• P(Refrigerator) = 0.38
• P(both) = P(TV and Refrigerator) = 0.21
Answers to questions:
18. Before we answer any questions, it may be useful to create a tree diagram.
Day (0.6)
    Absent (0.01): P(Day and Absent) = (0.6)(0.01) = 0.006
    Not Absent (0.99): P(Day and Not Absent) = (0.6)(0.99) = 0.594
Question to answer: What percent of employees are absent on any given day? Need
to calculate P(Absent). This is the denominator of Bayes Rule.
P (Absent) = P (Absent | Day) P (Day) + P (Absent | Night) P (Night)
= (0.01)(0.6) + (0.02)(0.4)
= 0.014
= 1.4%.
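The calculation above can be reproduced in a few lines. This is a minimal sketch using only the numbers given in the problem; the variable names are hypothetical.

```python
p_day = 0.6
p_night = 1 - p_day            # complement rule: employees work day or night
p_absent_given_day = 0.01
p_absent_given_night = 0.02

# Each branch of the tree: multiply along the branch.
p_day_and_absent = p_day * p_absent_given_day
p_night_and_absent = p_night * p_absent_given_night

# The branches are disjoint, so add them to get P(Absent).
p_absent = p_day_and_absent + p_night_and_absent
percent_absent = 100 * p_absent
```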
19. (a) What is the value of the missing probability in the table above? The total probability
must equal 1. Therefore, the missing value is 1 − 0.2 − 0.1 − 0.3 − 0.3 = 0.1.
20. This is an example of a Binomial Model problem. We are given that p = 0.02, n = 30.
We define “success” to be a ream of paper that needs to be shipped back to the
printing company. The probability that there will be exactly 4 reams of paper that
need to be shipped back is
P(X = 4) = (30 choose 4)(0.02)^4 (1 − 0.02)^(30−4)
= 27405(0.02^4)(0.98^26)
= 27405(1.6 × 10^−7)(0.5914)
= 0.0026
You can also calculate this probability on your calculator as binompdf(30, 0.02, 4) =
0.0026.
The mean is µ = np = 30(0.02) = 0.6. The standard deviation is σ = √(np(1 − p)) =
√(30(0.02)(0.98)) = 0.7668.
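For readers without a calculator that has binompdf, the same Binomial quantities can be checked in Python. This is a sketch under the problem's stated values (n = 30, p = 0.02); the function name `binom_pmf` is hypothetical and simply mirrors the formula for P(X = k).

```python
import math


def binom_pmf(n, p, k):
    """P(X = k) for a Binomial model: (n choose k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 30, 0.02
prob_four = binom_pmf(n, p, 4)        # probability of exactly 4 defective reams
mean = n * p                          # expected number shipped back
sd = math.sqrt(n * p * (1 - p))       # standard deviation

# Sanity check: the probabilities of all outcomes 0..n add up to 1.
total_probability = sum(binom_pmf(n, p, k) for k in range(n + 1))
```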
z = (122 − 120)/2.121 = 0.94.
So we want to calculate P(Z > 0.94) = normalcdf(0.94, 999) = 0.174.
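The z-score step above can also be checked with Python's standard library. This sketch uses the battery values from the problem (mean 120, standard deviation 15, n = 50) and the cutoff 122 from the calculation above; the variable names are hypothetical. Because the code does not round z to 0.94 first, its result can differ slightly from the calculator value in the third decimal place.

```python
import math
from statistics import NormalDist

mu, sigma, n = 120, 15, 50

# Standard deviation of the sample mean: sigma / sqrt(n).
standard_error = sigma / math.sqrt(n)

# Standardize the cutoff for the sample mean.
z = (122 - mu) / standard_error

# Upper-tail area of the standard Normal model beyond z.
p_exceeds = 1 - NormalDist().cdf(z)
```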
(f) By the 68-95-99.7 Rule, we know that between µ ± 3σ we have 99.7% of the total
area. However, since we are working with the sample mean, we want to calculate
µ ± 3σ/√n instead. Thus, our interval will be