MATH& 146: Midterm Synopsis: CHAPTER 1: Stats Starts Here
Parameter: A population’s parameters are the calculated number types (e.g. mean, median, IQR, proportion, standard
deviation, etc.) the experimenter wishes they had values of for the entire population. Unless a census is performed,
which is often not possible due to logistical and financial reasons, we use calculations from a sample, statistics, to
estimate the population parameters.
Statistic: A sample’s statistics are the calculated number types (e.g. mean, median, IQR, proportion, standard
deviation, etc.) the experimenter computes the values of using the data collected on variables from the members of the
sample. The more representative the sample is of the population, the better the statistic will reflect the value of the
parameter.
Good Sampling Strategies
Simple Random Sample
The best (theoretically, at least) kind of sample is the Simple Random Sample. A Simple Random Sample (SRS; of
size n) is chosen in such a way that every possible group (of size n) from the population has an equal chance of being
selected. The best example to get the idea of an SRS is drawing names out of a hat (or slips of paper out of a box…or
something equivalent).
Since every group has an equal chance of being selected, every individual also has an equal chance. This is not
reversible! If every individual has an equal chance, there is no guarantee that every group has an equal chance.
Theoretically, the simple random sample gives you the best chance of obtaining a sample that is representative of the
population. Also, this type of sample makes probability calculations fairly simple. In fact, all of the inference
procedures that you’ll learn in this course are based on the simple random sample.
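The "names out of a hat" idea can be sketched in Python. This is only an illustration: the population of 20 named people and the seed value are assumptions, not part of the course material.

```python
import random

# Hypothetical population of 20 individuals (an assumption for illustration).
population = [f"person_{i}" for i in range(1, 21)]

rng = random.Random(146)  # fixed seed only so the draw can be repeated
n = 5

# sample() draws n distinct members without replacement, so every
# possible group of size n has the same chance of being chosen.
srs = rng.sample(population, n)
print(srs)
```

Changing the seed (or removing it) gives a different, equally valid simple random sample.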
For example, I could divide the student body into Freshmen, Sophomores, Juniors and Seniors; then take an SRS from
each group; and finally combine these to make a single sample.
The point of this type is to ensure that known subgroups of the population are represented in the sample. We won’t talk
about why you would use this version instead of a simple random sample—that’ll come if you take any more statistics
courses at the University level.
Cluster sampling is a good way to get a representative slice of the population -- assuming, of course, that there isn’t
some other reason why the individuals are clustered together.
The first individual is selected randomly. After that, the system kicks in: pick every kth individual from the first one
selected. This method is useful in a lot of cases -- people entering a sporting event, bottles rolling off of an assembly
line.
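A minimal sketch of the "random start, then every kth individual" idea follows. The function name, the 100 numbered bottles, and the choice to pick the random start among the first k items are all assumptions made for illustration.

```python
import random

def systematic_sample(items, k, rng):
    """Pick a random starting point among the first k items,
    then take every kth item from there on."""
    start = rng.randrange(k)
    return items[start::k]

bottles = list(range(1, 101))  # hypothetical bottles numbered 1..100
sample = systematic_sample(bottles, 10, random.Random(1))
print(sample)
```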
Multistage Sampling
A multistage sample combines one or more of the above methods to form a sample. This is, actually, a widely used
technique. Example: Separate high school students by grade, then randomly choose two classrooms from each grade,
and finally randomly choose 10 students from each classroom; this involves the Stratified, Cluster and SRS sampling
strategies.
Selection Bias: This type of bias is introduced during the selection process where certain individuals are given greater
(than intended) probabilities of being selected, or are excluded from the selection process. Failing to include all
individuals in the selection process is often called undercoverage. Notice that a voluntary response sample (or even a
convenience sample) automatically suffers from selection bias! Using a good sampling method should minimize any
selection bias.
Measurement Bias: This type of bias is introduced when the measurement process tends to give results that differ
(systematically) from the population. For example, if a light meter is not properly calibrated, then the measurements it
gives will not be correct! A common source of measurement bias is wording bias—the way in which a question is
worded can often have an effect on the responses.
Some people use the terms measurement bias and response bias synonymously, but they don’t quite mean the same
thing. Measurement bias refers exclusively to problems with the measurement device, whereas response bias refers to
anything that might affect the measured results. For example, you are likely to get very different answers to a survey if
the person conducting the survey is disguised.
Nonresponse Bias: This type of bias is introduced when individuals (people, mostly) refuse to be measured and/or
refuse to answer questions. Telephone surveys suffer from this. This type of bias is almost unavoidable, so minimizing
its effects is important. There are a host of methods (beyond the scope of this course) to address this.
A study can never show a causal relationship between variables. There can always be other unknown variables such as
lurking or confounding variables. Studies can show trends and possible relationships that can be investigated through
experimentation.
An experiment is where the experimenter makes a change to one variable (variation is introduced) and changes in
another variable are observed (variation is measured). In an ideal situation, that’s all the variation there is…in reality,
there are other sources of variation. Identify them. Control them when you can. Account for them when you can’t.
An experiment has at least one explanatory variable, otherwise called a factor, and at least one response variable to
measure.
The keys to a well-run experiment are active and deliberate manipulation of factors and random assignment of subjects
to treatments.
Levels: The levels in an experiment are the specific values of the factors used. For example, for the explanatory
variable or factor “Aspirin Dosage,” the (three) levels might be 100 mg, 200 mg, and 300 mg.
Treatments: Each treatment is a different combination of levels, one from each factor, that a subject receives.
For example, suppose one factor in an experiment is “Aspirin Dosage” and it has three levels -- 100 mg, 200 mg, and
300 mg -- and a second factor in the same experiment is “Minutes of Exercise” and it has two levels -- 0 minutes or 30
minutes -- then there are 3 × 2 = 6 treatments in this experiment.
Treatment #1: 100 mg aspirin, 0 mins exercise
Treatment #2: 200 mg aspirin, 0 mins exercise
Treatment #3: 300 mg aspirin, 0 mins exercise
Treatment #4: 100 mg aspirin, 30 mins exercise
Treatment #5: 200 mg aspirin, 30 mins exercise
Treatment #6: 300 mg aspirin, 30 mins exercise
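The six treatments listed above are simply every pairing of an aspirin level with an exercise level. A short sketch, with variable names chosen only for illustration, reproduces the list:

```python
from itertools import product

aspirin_mg = [100, 200, 300]   # levels of the factor "Aspirin Dosage"
exercise_min = [0, 30]         # levels of the factor "Minutes of Exercise"

# Exercise is the outer loop so the numbering matches the list above.
treatments = list(product(exercise_min, aspirin_mg))

for number, (minutes, mg) in enumerate(treatments, start=1):
    print(f"Treatment #{number}: {mg} mg aspirin, {minutes} mins exercise")
```

The number of treatments is the product of the numbers of levels, here 3 × 2 = 6.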
Control treatments provide a baseline measurement; it could either be a previous treatment or no treatment.
In an experiment, factor levels are manipulated to create treatments. Subjects are randomly assigned to treatment
levels. The subject groups’ responses across treatment levels are compared.
NOTES:
A factor is the same thing as an explanatory variable, which in algebra, we called the independent variable (x).
A treatment is the name given to each specific set of values on the factors. For example, if the two factors are 'hair
color' and 'socioeconomic status (SES)' with values red, brown, black, and blonde for 'hair color' and poor, middle
class, and rich for SES, then one of the many treatments would be the “poor blonde”.
Blinding: Knowledge of which treatments are being applied can affect the interpretation of results. Two groups can
be blinded:
Influencers: Subjects, treatment administrators, technicians, for example.
Evaluators: Judges, examiners, for example.
When everyone in one group -- Influencers or Evaluators -- is blinded, the experiment is single blind.
When everyone in both groups -- Influencers and Evaluators -- is blinded, the experiment is double blind.
Placebo: A placebo is a “fake” treatment that looks like the real thing.
Placebo Effect: The placebo effect occurs when people respond positively to a placebo treatment because they
believe they will get an effect from it.
Four Principles of Experimental Design
Control: Sources of variation other than the factors are controlled by making the conditions as similar as possible
across treatment groups.
Randomize: Randomization allows the experimenter to equalize unknown or uncontrollable sources of variation.
Experimental units must be randomly assigned to treatments. Note that this doesn’t mean that it is necessary that
all subjects in the experiment are randomly chosen; the subjects must be appropriate for the experiment. It is the
administration of treatment that must be randomly assigned.
Replication: Treatment should be applied to more than one subject. And, the entire experiment should be repeated on
different groups of subjects.
Block: Subjects that have similar characteristics are grouped together, and then treatments are randomized within
each group/block. People are commonly blocked by age, political views, socioeconomic status,
religion, ethnicity, etc.
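As a rough sketch of the Randomize principle, the code below shuffles a set of subjects and deals them out to treatments in turn. The twelve subjects, the function name, and the seed are hypothetical; the three dosage labels reuse the aspirin levels mentioned earlier.

```python
import random

def randomly_assign(subjects, treatments, rng):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subjects = [f"subject_{i}" for i in range(1, 13)]  # hypothetical volunteers
treatments = ["100 mg", "200 mg", "300 mg"]
groups = randomly_assign(subjects, treatments, random.Random(0))

for treatment, members in groups.items():
    print(treatment, members)
```

To block, one would call the same function separately on each block of similar subjects.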
One way to deal with unknown extraneous variables is randomization, but that isn’t always the best answer. Often, a
better choice is to use a matched pairs design. This involves matching one experimental unit in the experimental
group with one experimental unit in the control group. These two units are either selected because they are identical or
very similar, or they are made to be very similar. Often, they are matched by things like age, exercise habits, income
level, etc.
Just because treatment groups have different results, this doesn’t mean that the results are important. If the differences
are larger than would be caused by randomization alone, then we say they are statistically significant.
Experimental results only apply to the group under study.
While a histogram looks like a bar chart, they are different. Besides the fact that a bar chart is used for data collected
on a categorical variable and a histogram is used to graphically represent data collected on a quantitative variable,
another difference is that there are no gaps between adjacent bins in a histogram.
The heights of the bars in a histogram represent the number of data values in each of the bins. The horizontal axis
consists of values of the quantitative variable; this axis is split into equal-sized bins which divide the numerical data
into ranges. A ‘good’ histogram has between 4 and 10 bins.
Histograms can show frequency or relative frequency; either choice will create similar looking distributions.
When you describe a quantitative variable’s distribution, think SOCS: Shape, Outliers, Center, Spread.
Shape: What does it look like? Unimodal, bimodal, uniform, symmetric? Is it skewed, and if so is it skewed left or
skewed right (the ‘tail’ points in the direction of the skew)?
Outliers: Are there data points that appear unusually small or large compared to the rest of the distribution? Not all
distributions have outliers. You can precisely determine whether or not a data value is an outlier by seeing if it lies
beyond the fences. See Tukey’s Rule.
Center: Mean, Median, Mode, and Midrange are four measures of center.
The mean, which is often called the average by many non-statisticians, is the sum of all the data values
divided by the number of data values. If the quantitative variable is named x, then the population’s mean is
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$.
The mean of a sample’s data set is denoted by the variable name (e.g. x, y, weight, etc.) with a bar over it (e.g.
$\bar{x}$, $\bar{y}$, $\overline{weight}$, etc.). Thus, for the quantitative variable named x, a sample’s mean is
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, which written less formally is $\bar{x} = \frac{\sum x}{n}$.
One can estimate the mean of a distribution from its histogram by finding the ‘fulcrum’ -- the place where the
distribution would balance; think of the balance point on a see-saw or teeter-totter.
The median is the middle value of an ordered data set. If there is an even number of data values, then the median
is the mean of the middle two values.
The median of a sample’s data set is denoted by the variable name (e.g. x, y, weight, etc.) with a tilde over it (e.g.
$\tilde{x}$, $\tilde{y}$, $\widetilde{weight}$, etc.).
One can estimate the median of a distribution from its histogram by estimating the location on the horizontal
axis where the area of the rectangles to the left equals the area of the rectangles to the right.
See https://www.youtube.com/watch?v=ik7LAJg432k
and https://www.youtube.com/watch?v=-haVXyCHMSs for illustrations.
The midrange is the mean of the smallest and largest values in the data set.
Spread: The dispersion of quantitative data. A measure of how spread-out the data in the data set is.
A data set that has been split into quartiles conveys where each 25% of the data lies. The lower quartile marks off
the lower 25% of the data and the upper quartile marks off the upper 25% of the data.
The InterQuartile Range (IQR) measures the spread or range of the middle 50% of the distribution;
IQR = Q3 − Q1, where Q1 is the median of the lower half of the ordered data set and Q3 is the median of the upper half
of the ordered data set.
The Five Number Summary consists of the following five numbers: Minimum, First Quartile (Q1), Median,
Third Quartile (Q3), and Maximum, where the median is denoted by $\tilde{\mu}$ if working with the population and it is
denoted by $\tilde{x}$ if working with a sample and the quantitative variable is named x.
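The "median of each half" description of Q1 and Q3 can be sketched as follows. The data values are made up, and one convention is assumed here that the notes do not spell out: when the number of values is odd, the median itself is left out of both halves (other textbooks and software may include it, which can change Q1 and Q3 slightly).

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.
    Assumption: for an odd count, the middle value is excluded from both halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return min(xs), median(lower), median(xs), median(upper), max(xs)

data = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical data set
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1
print(minimum, q1, med, q3, maximum, iqr)
```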
Two other important measures of spread are variance and standard deviation.
A deviation is the difference between two values. The variance and standard deviation both involve the deviation
between a specific value in the data set and the mean of the data set: $x_i - \mu$ and $x_i - \bar{x}$, for data collected in a
population and a sample, respectively, on a quantitative variable named x. In a population, the variance is denoted
by $\sigma^2$ and in a sample it is denoted by $s^2$.
The variance of a set of data collected on a quantitative variable named x in a population is the average of every data
value’s squared deviation from the mean: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$.
The variance of a set of data collected on a quantitative variable named x in a sample is the average (almost) of every
data value’s squared deviation from the mean: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$.
The standard deviation of a set of data is the square root of the variance:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ and $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ for
a population and a sample, respectively.
Note: Although the standard deviation of a sample’s data set is typically denoted by s, sometimes the variable for
which the standard deviation has been computed may appear as a subscript, such as $s_x$, $s_y$, and $s_{weight}$, for
example.
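To see the difference between dividing by N and dividing by n − 1, here is a small sketch on a made-up data set of five values. The hand computation is compared against Python's built-in `statistics` functions.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 6, 8, 10]  # hypothetical data set
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

pop_var = sum_sq_dev / n          # treat the data as a whole population
samp_var = sum_sq_dev / (n - 1)   # treat the data as a sample
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print(pop_var, samp_var, pop_sd, samp_sd)
print(pvariance(data), variance(data), pstdev(data), stdev(data))
```

Here the squared deviations are 16, 4, 0, 4, and 16, which sum to 40, so the population variance is 40/5 = 8 and the sample variance is 40/4 = 10.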
Other ways to display data collected on a quantitative variable include Dot Plots, Stem-and-Leaf Plots (also called
Stemplots), and Boxplots (also called Box & Whisker Plots). Boxplots are nearly as popular as histograms for visually
displaying quantitative data.
A Boxplot is a drawn-to-scale graph of the Five Number Summary. If an outlier exists in a set of data (see Tukey’s
Rule below for calculation details), and one chooses to distinguish it as a point separate from the boxes and whiskers of
a traditional boxplot, then this is called a Modified Boxplot.
Tukey’s Rule: An outlier is defined to be any value less than the Lower Fence or any value greater than the
Upper Fence, where Lower Fence = Q1 - 1.5(IQR) and Upper Fence = Q3 + 1.5(IQR).
For data sets containing more than 10 values, the computation of a quantitative variable’s descriptive statistics, as well
as the construction of a graph, is typically done using statistical functions within a particular technology type (e.g.
Excel, StatCrunch, R, a Texas Instruments calculator such as the TI-84, an app, etc.).
You should, however, be able to determine the mean, median, mode, midrange, range, 5-Number Summary, lower
fence, and upper fence of a data set comprised of no more than 10 values using paper-and-pencil calculations. You
should be able to compute the variance and standard deviation of a data set comprised of no more than 5 values using
paper-and-pencil calculations.
Boxplots help one see the lower 25% of a data set, the middle 50% of a data set, and the upper 25% of a data set. This
can be helpful when comparing multiple distributions as can identifying the locations of each distribution’s Five
Number Summary’s values (e.g. the minimums, the first quartiles, the medians, the third quartiles, and the
maximums).
When comparing distributions using histograms or boxplots, be sure to compare SOCS: Shape, Outliers, Center, Spread.
Example: The age distributions of the residents in two neighboring counties in Florida are quite different. The box-
plots show these distributions, where the ages are measured in years. Describe these populations.
[Figure: boxplots of the age distributions (in years) for County 1 and County 2]
Applying SOCS to describe the ages of the residents in these two counties, we see that each age distribution
is skewed-right. Consequently, we can conclude that in each county, the mean age is higher than the median
age of the residents.
The median age of County 1’s population is 44 years while the median age of County 2 is 29 years. Thus,
on average, the residents of County 1 are older than those people living in County 2. More specifically,
since the median (i.e. second quartile) of County 1, 44 years, is approximately equal to County 2’s third
quartile, 43 years, we can deduce that 50% of County 1’s population is at least 44 years old while only
25% of County 2’s population is 43 years or older.
The first, second, and third quartiles have approximately the same range -- 23 years, 21 years, and 20 years,
respectively -- in County 1. The same is true for the age distribution of County 2, where the range of the
first, second, and third quartiles are 15 years, 14 years, and 14 years, respectively.
Since the IQR of the ages for County 1 is 64 − 23 = 41 years while the IQR of the ages for County 2 is
43 − 15 = 28 years, we know that the ages of the middle 50% of the residents in County 1 are more dispersed
(i.e. more spread out) than the ages of the middle 50% of the residents in County 2.
The Upper Fence for County 1 is Q3 + (1.5)(IQR) = 64 + (1.5)(41) = 125.5 years and the Lower Fence for
County 1 is Q1 − (1.5)(IQR) = 23 − (1.5)(41) = −38.5 years. Since the minimum age in County 1 is 0 years
and the maximum age is 109 years, there is no resident whose age is either less than the Lower Fence of
−38.5 years or greater than the Upper Fence of 125.5 years. We can therefore conclude that there are no outliers in
the age distribution of County 1.
Similarly, the Upper Fence for County 2 is Q3 + (1.5)(IQR) = 43 + (1.5)(28) = 85 years and the Lower
Fence for County 2 is Q1 − (1.5)(IQR) = 15 − (1.5)(28) = −27 years. Since the minimum age in County 2
is 0 years and the maximum age is 108 years, there is at least one resident whose age is an outlier: although
there is no one whose age is less than the Lower Fence of −27 years, there is at least one person
(the person who is 108 years old) whose age is greater than the Upper Fence of 85 years.
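The fence calculations above can be double-checked with a short sketch of Tukey's Rule. The function names are illustrative; the quartiles and maximum ages are the ones quoted in the example.

```python
def tukey_fences(q1, q3):
    """Return (lower_fence, upper_fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """A value is an outlier if it lies beyond either fence."""
    lower, upper = tukey_fences(q1, q3)
    return value < lower or value > upper

print(tukey_fences(23, 64))     # County 1 fences: (-38.5, 125.5)
print(tukey_fences(15, 43))     # County 2 fences: (-27.0, 85.0)
print(is_outlier(109, 23, 64))  # County 1 maximum age: False
print(is_outlier(108, 15, 43))  # County 2 maximum age: True
```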
Basic Terms
A random experiment is any activity which has a measurable result. The term measurable is used in a very open
sense—it does not imply that numbers are involved! Thus, the results can be quantitative or qualitative.
A trial is each occasion on which we observe a random phenomenon.
An outcome is any (simple) result of one trial of a random experiment. It is the value of the random phenomenon during
a trial. When listing outcomes, we’ll use some short description (numbers, letters, words, or something similar) to
identify each outcome.
An event is a group (set) of related outcomes. We typically use capital Roman letters (e.g. A, C, etc.) to identify
events.
The sample space is the group (set) of all possible outcomes for a random experiment. Since this is sometimes called
the universe, the Roman capital U is often reserved to identify the sample space.
Law of Large Numbers: As the number of independent trials increases, the long-run relative frequency of repeated
events approaches a single value … it approaches the theoretical probability’s value.
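A quick simulation can illustrate the Law of Large Numbers. This sketch assumes a fair coin (theoretical probability of heads 0.5); the function name, seed, and numbers of flips are arbitrary choices for illustration.

```python
import random

def relative_frequency_of_heads(num_flips, rng):
    """Simulate num_flips fair coin flips and return the fraction that are heads."""
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips

rng = random.Random(2024)  # fixed seed only so the run can be repeated
for num_flips in (10, 100, 1_000, 10_000, 100_000):
    print(num_flips, relative_frequency_of_heads(num_flips, rng))
```

With few flips the relative frequency can be far from 0.5; with many flips it tends to settle close to 0.5.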
The (theoretical) probability of an event is defined as the number of outcomes in the event divided by the number of
outcomes in the universe: $P(E) = \frac{|E|}{|U|}$, where $|E|$ represents the number of outcomes in the event and $|U|$ represents
the number of outcomes in the universe. That is, the vertical bars mean “number of” or “size of,” not “absolute value.” This
formula works only if the outcomes in the universe are all equally likely.
Probability is a number (a fraction, a reduced fraction, a decimal, or a percentage) between 0 and 1 (0% and 100%) for
event A. Symbolically, $0 \le P(A) \le 1$. If $P(A) = 0$, then event A cannot occur; if $P(A) = 1$, then event A always
occurs.
The set of all possible outcomes of a trial must have probability equal to 1. In symbols, $P(U) = 1$.
$A^c$ represents the complement of event A and consists of all outcomes that are not in A. In symbols, $A \cup A^c = U$.
Multiplication or Product Rule: If events A and B are independent events, then the probability that both A and B
occur is the product of their individual probabilities. In symbols, $P(A \text{ and } B) = P(A \cap B) = P(A) \cdot P(B)$, where the
symbol $\cap$ means ‘intersection’.
Events are independent if the probability that one event occurs in no way affects the probability of the other event
occurring.
Examples
1. A bag contains four marbles—one each of black, white, red and blue. You reach in and pull out two.
(a) List the universe for this experiment.
(b) List the outcomes in the event “at least one marble is black.”
(c) What is the probability that at least one marble is black?
I elect to use a capital B to represent the outcome “marble is black,” a lowercase b to represent the outcome “marble is
blue,” a lowercase w to represent the outcome “marble is white,” and a lowercase r to represent the outcome
“marble is red.”
(a) The universe, U, is {Bw, wB, Br, rB, Bb, bB, wr, rw, wb, bw, rb, br}.
(b) The event “at least one marble is black” is {Bw, wB, Br, rB, Bb, bB}.
(c) The probability that at least one marble is black is 6/12 = 1/2.
Observation: To draw two marbles simultaneously from the bag is the same as drawing the first marble and then
drawing the second without having put the first one back into the bag … it is the same as drawing two marbles without
replacement.
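The universe and event for this example can also be listed by brute force. The sketch below treats the draw as two ordered picks without replacement, as in the observation above; the letter codes follow the ones chosen in the solution.

```python
from fractions import Fraction
from itertools import permutations

marbles = ["B", "w", "r", "b"]  # B = black, w = white, r = red, b = blue

# Ordered pairs of two different marbles: drawing without replacement.
universe = ["".join(pair) for pair in permutations(marbles, 2)]
at_least_one_black = [outcome for outcome in universe if "B" in outcome]

probability = Fraction(len(at_least_one_black), len(universe))
print(len(universe), len(at_least_one_black), probability)  # 12 6 1/2
```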
Besides describing the universe or sample space in list form, as was done above in part (a), one can also use a tree
diagram.
Tree diagrams are a way of showing combinations of two or more events. Tree diagrams are quite helpful when
attempting to show sequences of events and/or when attempting to calculate complicated probabilities.
Each branch is labelled at the end with its outcome and the probability is written alongside the line. Specifically, each
branch represents a possible outcome while an entire path, along multiple branches, represents an event. The
Multiplication Rule is applied along one pathway while the Addition Rule is applied across pathways. The sum of the
probabilities associated with each set of branches must equal 1. The sum of the probabilities associated with every
path must also equal 1.
I personally choose to create a tree diagram for problems with five or fewer sets of branches; otherwise, it becomes too
unwieldy. In those cases, I look to see if a different mathematical model applies, such as the Central Limit Theorem
for Proportions in this course or the Binomial Model in a second probability course.
Below is a tree diagram illustrating the experiment where you reach in and pull out two marbles from a bag containing
four marbles—one each of black, white, red and blue … where sampling is done without replacement. As you can see,
once a marble of a particular color is withdrawn on the first draw, it is impossible in this case for the second marble to
be of the same color since there is only one of each color marble in this problem. Hence, there are four branches
corresponding to the possible outcomes for the first marble and only three for the second marble.
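[Tree diagram: four first-draw branches (B, w, b, r), each with probability 1/4; each first-draw branch is followed by
three second-draw branches for the remaining colors, each with probability 1/3]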
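Thus, the probability of getting at least one black marble among the two randomly chosen marbles is …
$P(\text{at least one } B) = P(Bb \cup Bw \cup Br \cup bB \cup wB \cup rB)$
$= P(Bb) + P(Bw) + P(Br) + P(bB) + P(wB) + P(rB)$
$= \frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{3}$
$= \frac{1}{12} + \frac{1}{12} + \frac{1}{12} + \frac{1}{12} + \frac{1}{12} + \frac{1}{12}$
$= \frac{6}{12} = \frac{1}{2}$.
Since this problem involved the phrase “at least”, one could opt to approach the problem by using the complement.
That is, $P(\text{at least one } B) = 1 - P(\text{no } B)$
$= 1 - P(bw \cup br \cup wb \cup wr \cup rb \cup rw)$
$= 1 - [P(bw) + P(br) + P(wb) + P(wr) + P(rb) + P(rw)]$
$= 1 - [\frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{3}]$
$= 1 - [\frac{1}{12} + \frac{1}{12} + \frac{1}{12} + \frac{1}{12} + \frac{1}{12} + \frac{1}{12}]$
$= 1 - \frac{6}{12} = \frac{12}{12} - \frac{6}{12} = \frac{6}{12} = \frac{1}{2}$.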
Although in this particular example, using the complement in this problem involving “at least” involved just as many
calculations as not, it is often the case that using the complement in problems involving “at least” requires significantly
fewer calculations.
2. A bag contains four marbles: one each of black, white, red and blue. You reach in and pull out one marble,
observe its color, and replace it, before you draw a second marble.
I elect to use a capital B to represent the outcome “marble is black,” a lowercase b to represent the outcome “marble is
blue,” a lowercase w to represent the outcome “marble is white,” and a lowercase r to represent the outcome
“marble is red.”
(a) The universe, U, is {BB, ww, rr, bb, Bw, wB, Br, rB, Bb, bB, wr, rw, wb, bw, rb, br}.
(b) The event “at least one marble is black” is {BB, Bw, wB, Br, rB, Bb, bB}.
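[Tree diagram: four first-draw branches (B, w, b, r), each with probability 1/4; each first-draw branch is followed by
four second-draw branches (B, w, b, r), each with probability 1/4]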
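Thus, the probability of getting at least one black marble among the two randomly chosen marbles is …
$P(\text{at least one } B) = P(BB \cup Bb \cup Bw \cup Br \cup bB \cup wB \cup rB)$
$= P(BB) + P(Bb) + P(Bw) + P(Br) + P(bB) + P(wB) + P(rB)$
$= \frac{1}{4} \cdot \frac{1}{4} + \frac{1}{4} \cdot \frac{1}{4} + \frac{1}{4} \cdot \frac{1}{4} + \frac{1}{4} \cdot \frac{1}{4} + \frac{1}{4} \cdot \frac{1}{4} + \frac{1}{4} \cdot \frac{1}{4} + \frac{1}{4} \cdot \frac{1}{4}$
$= \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} = \frac{7}{16}$.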
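The same brute-force listing works when the first marble is replaced before the second draw. In this sketch the only change from the without-replacement case is that a color may repeat; the complement shortcut is shown as a cross-check.

```python
from fractions import Fraction
from itertools import product

marbles = ["B", "w", "r", "b"]  # B = black, w = white, r = red, b = blue

# Ordered pairs where the same marble may appear twice: drawing with replacement.
universe = ["".join(pair) for pair in product(marbles, repeat=2)]
at_least_one_black = [outcome for outcome in universe if "B" in outcome]

probability = Fraction(len(at_least_one_black), len(universe))
via_complement = 1 - Fraction(3, 4) ** 2  # 1 - P(no black on either draw)

print(len(universe), len(at_least_one_black), probability, via_complement)
```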
Sampling is done without replacement in the majority of circumstances since researchers don’t typically create a
sample from a population by drawing the first unit (person, TV, etc.), performing the study on them, and then returning
that unit back into the population so that it’s possible that it is drawn again.
As you can see from the previous two examples involving the four colored marbles, this has a sizable effect on the
probabilities since the denominator changes size from N to N − 1 after the first unit is drawn from the population of
size N. Specifically, in the examples involving marbles, the population size was N = 4 and this was the denominator
of the fractions associated with the first marble, but it dropped by 1 to 3 in those fractions associated with the second
marble in the case where sampling was drawn without replacement. Observe that in this case the sample size was
n = 2 while the population size was N = 4 , and so the sample size is 50% of the population size.
Fact: Whenever the sample size, n, is larger than 10% of the population size, N, the effect on the probabilities
associated with the second, third, …, etc. selection, when sampling is done without replacement, is great enough that it
will have a significant impact on the probabilities of the various events (sequences of outcomes). And so, the
probabilities associated with the same outcome (e.g. marble is blue) for each of those coming after the first marble’s
selection will be different from the probabilities associated with the first selection (e.g. 1/3 versus 1/4).
However, in those cases where the sample size, n, is at most 10% of the population size, N, the effect on the
probabilities associated with the second, third, …, etc. selection is negligible, and so we use the same probability for
each selection. That is, if $n \le 0.10N$, or equivalently, $10n \le N$, we can use the same probability value for each outcome
on subsequent selections that was used on the first selection. This is referred to as the 10% Condition.
So, for example, if we are told that 21% of all U.S. college students are soccer fans and we generate a sample by
randomly selecting 350 people from among all U.S. college students, then n = 350 and N is the number of all U.S.
college students. Although I don’t know the exact value of N, I do know that N is certainly larger than 3500. Thus,
$10n = 10(350) = 3500 \le N$, and so we can use the proportion 0.21 as the value of P(Is a soccer fan) for each
selected person. Consequently, if we wanted to know the probability that all 350 people sampled are soccer fans, we
would compute $P(F \cap F \cap \cdots \cap F)$, where F represents the outcome “is a soccer fan”, by computing
$(0.21)(0.21) \cdots (0.21) = (0.21)^{350} \approx 5.981\text{E}{-238}$ or $5.981 \times 10^{-238}$
(found by using this online calculator: https://keisan.casio.com/calculator).
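The same value can be reproduced without an online calculator. A minimal sketch, using the variable names below only for illustration, relies on the fact that the result is still large enough to fit in an ordinary Python float.

```python
p_fan = 0.21   # proportion of soccer fans, used for every selection (10% Condition)
n = 350        # sample size

# Multiplication Rule for n independent selections.
p_all_fans = p_fan ** n
print(f"{p_all_fans:.3e}")
```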
4. Two dice are rolled and the sum of the pips noted (those dots are called pips).
At first, you might think that the universe is {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}…and that’s not wrong, but it’s not useful.
The events in that universe aren’t as specific as possible. For example, there are multiple ways to roll a sum of five (4
and 1, 3 and 2). Furthermore, the dice are different…you can imagine that one is blue and the other is green. A blue 3
and a green 2 is different from a blue 2 and a green 3!
Thus, a more useful version of this universe is best shown in a table. In the margins of the table, I’ll list one die result
in blue and the other die result in green. The sum of the two dice is in the table’s body.
die |  1   2   3   4   5   6
----+------------------------
 1  |  2   3   4   5   6   7
 2  |  3   4   5   6   7   8
 3  |  4   5   6   7   8   9
 4  |  5   6   7   8   9  10
 5  |  6   7   8   9  10  11
 6  |  7   8   9  10  11  12
In this universe of 36 outcomes, four of them result in a sum of nine, so the probability of rolling a sum of nine
is 4/36 = 1/9.
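The 36-outcome universe can be built and counted in a few lines. This sketch, with names chosen for illustration, treats the two dice as distinguishable (one blue, one green), as the discussion above recommends.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every (blue, green) pair of faces: 6 * 6 = 36 equally likely outcomes.
rolls = list(product(range(1, 7), repeat=2))
sum_counts = Counter(blue + green for blue, green in rolls)

p_nine = Fraction(sum_counts[9], len(rolls))
print(len(rolls), sum_counts[9], p_nine)  # 36 4 1/9
```

The same `sum_counts` table gives the number of ways to roll any other sum, for example `sum_counts[5]`.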
5. Joe drives through a particular intersection once a day. The probability of the traffic signal at this
intersection being green when Joe reaches it is 0.35, the probability of that traffic signal being yellow when Joe
reaches it is 0.04, and the probability of that traffic signal being red when Joe reaches it is 0.61.
(a) What is the probability Joe will hit a red light on two consecutive days?
(b) What is the probability that Joe won’t see a red light until the third day?
(c) What is the probability that Joe will have to stop for a red light at least once during a five-day week?
This particular intersection’s traffic signal displaying a red light when Joe arrives at it on one day is independent of
that same intersection’s traffic signal displaying a red light when Joe arrives at it on the second day.
(a) By the Multiplication Rule, with R representing the outcome “light is red”, we see that
$P(R \cap R) = P(R) \cdot P(R) = (0.61)(0.61) = 0.3721$.
(b) By the Complement Rule, with R representing the outcome “light is red”, $P(R^c) = 1 - P(R) = 1 - 0.61 = 0.39$.
Thus, $P(R^c \cap R^c \cap R) = P(R^c) \cdot P(R^c) \cdot P(R) = (0.39)(0.39)(0.61) = 0.092781$.
(c) Since the phrase “at least” appears in this problem, I will consider using the Complement Rule.
Computing the probability that Joe will have to stop for a red light at least once during a five-day
week directly involves computing the probability of a sizable number of possible events:
$R \cap R^c \cap R^c \cap R^c \cap R^c$, $R^c \cap R \cap R^c \cap R^c \cap R^c$, $R^c \cap R^c \cap R \cap R^c \cap R^c$, $R^c \cap R^c \cap R^c \cap R \cap R^c$,
$R^c \cap R^c \cap R^c \cap R^c \cap R$, $R \cap R \cap R^c \cap R^c \cap R^c$, $R \cap R^c \cap R \cap R^c \cap R^c$, …, $R \cap R \cap R \cap R \cap R$.
Consequently, I elect to compute P(stop at least once for a red light in five days)
$= 1 - P(\text{no stops for a red light in five days})$
$= 1 - P(R^c \cap R^c \cap R^c \cap R^c \cap R^c)$
$= 1 - P(R^c) \cdot P(R^c) \cdot P(R^c) \cdot P(R^c) \cdot P(R^c)$
$= 1 - (0.39)^5 \approx 0.9910$.
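All three parts of the traffic-light example can be checked numerically. This sketch assumes, as the solution does, that each day's light is independent of the others and that P(R) = 0.61; the variable names are illustrative.

```python
p_red = 0.61
p_not_red = 1 - p_red  # Complement Rule

# (a) Red on two consecutive days.
p_two_reds = p_red * p_red

# (b) No red on days one and two, then red on day three.
p_first_red_on_day_3 = p_not_red * p_not_red * p_red

# (c) At least one red in five days = 1 - P(no red on any of the five days).
p_at_least_one_red = 1 - p_not_red ** 5

print(round(p_two_reds, 4), round(p_first_red_on_day_3, 6), round(p_at_least_one_red, 4))
```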
The Union of two events contains all outcomes that belong to either event (and, maybe both).
One nice way to think about the union is to use a Venn Diagram. In such a diagram, the universe is depicted as a
rectangle, outcomes are points within the rectangle, and events are shapes inside of the rectangle (commonly circles).
Here’s a Venn Diagram of the union (in yellow):
The word “or” gets used a lot when talking about unions, but it’s not being used in the same way that most people use
it in conversational English. Outside of statistics class, when you use the word “or”, you almost always are excluding
the possibility of both things (“do you want to go to the movies or do you want to go shopping?” usually means that
only one of the two is an option). The conversational “or” is (usually) an exclusive or (abbreviated xor), but the one we
use in statistics is an inclusive or.
To find the probability of the union, you need to sum up the outcomes in the two events. Note, though, that if you just
add the number of outcomes in the first event to the number of outcomes in the second event, then you might be
adding some outcomes twice—in the diagram above, that means adding the outcomes (area) of that almond-shaped
middle part twice. Thus, to properly add up the outcomes in the union, you need to adjust by subtracting the number of
outcomes that belong to both events.
FYI The reason the formula $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ requires that the probability of $A \cap B$ be
subtracted from the sum $P(A) + P(B)$ is because $A \cap B$ was included twice in that sum: once in $P(A)$ and a
second time in $P(B)$.
Observe that if events A and B are mutually exclusive (i.e. disjoint), then they cannot happen simultaneously, and so
$P(A \cap B) = 0$, leading to the Addition Rule for Disjoint Events we encountered in Chapter 13:
$P(A \cup B) = P(A) + P(B)$.
The Intersection of two events contains all outcomes that are shared by the events (that are in common). Here’s the
Venn Diagram of that:
Conditional Probability: The probability that one event will happen given that another event is known to have
happened. Remember that the contingency tables we encountered in chapter 2 showed relative frequencies and
conditional distributions.
When you know something has happened, that changes the universe; specifically, it shrinks it to the given event. The
part of the other (non-given) event remaining after this shrinkage is the intersection. Thus, the size of the other
event becomes the intersection, and the size of the universe becomes the given event.
The probability that event A will occur given that event B has already occurred is denoted by $P(A \mid B)$.
In a Venn Diagram, observe that with $P(A \mid B)$, event B has already occurred, and so the universe shrinks to
only include B. Since the only part of A that’s in B is the intersection $A \cap B$, we see that
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$.
Similarly, $P(B \mid A) = \frac{P(B \cap A)}{P(A)}$, and since $P(A \cap B) = P(B \cap A)$, we can also express
$P(B \mid A)$ as $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$.
Observe that to compute a conditional probability, divide the probability of both events occurring together by the
probability of the event following the “given that” symbol (the vertical line | ).
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ and $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
Example: Suppose we know the following information about 478 elementary school children.
$P(\text{Girl}) = \frac{251}{478} \approx 0.525$
$P(\text{Sports}) = \frac{90}{478} = \frac{45}{239} \approx 0.188$
Suppose we know that the student we select is a girl; then we are working with a conditional probability. Say we are
specifically interested in the probability that a student’s goal is to excel at sports given that the student is a girl; then
we wish to determine the value of $P(\text{Sports} \mid \text{Girl})$.
$P(\text{Sports} \mid \text{Girl}) = \frac{P(\text{Sports} \cap \text{Girl})}{P(\text{Girl})} = \frac{30}{251} \approx 0.120$.
Now, suppose we know that the student we select has the goal of excelling in sports and we wish to know the
probability that they are a girl. Then we seek a value of $P(\text{Girl} \mid \text{Sports})$.
$P(\text{Girl} \mid \text{Sports}) = \frac{P(\text{Girl} \cap \text{Sports})}{P(\text{Sports})} = \frac{P(\text{Sports} \cap \text{Girl})}{P(\text{Sports})} = \frac{30}{90} = \frac{1}{3} \approx 0.333$.
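Both conditional probabilities can be reproduced from the counts quoted in this example (478 children, 251 girls, 90 with the sports goal, 30 girls with the sports goal). This is a small sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Counts quoted in the example above.
total = 478
girls = 251
sports = 90
girls_and_sports = 30

p_girl = Fraction(girls, total)
p_sports = Fraction(sports, total)
p_girl_and_sports = Fraction(girls_and_sports, total)

# Conditional probability: joint probability divided by the probability of the given event.
p_sports_given_girl = p_girl_and_sports / p_girl
p_girl_given_sports = p_girl_and_sports / p_sports

print(p_sports_given_girl, float(p_sports_given_girl))
print(p_girl_given_sports, float(p_girl_given_sports))
```

The shared denominator of 478 cancels, which is why each answer reduces to a ratio of two counts from the table.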
General Multiplication Rule: This rule, which does not require that the two events are independent, is simply a
restatement of the Conditional Probability Rule. Solving $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ for $P(A \cap B)$, we discover that
$P(A \cap B) = P(A \mid B)\,P(B)$.
Recall that independence means that the outcome of one event does not influence the outcome of another.
Examples
Refer to the Gender/Goals example on the previous page in order to determine if having the goal of achieving good
grades is independent of gender.
A way to approach this problem is to compare the values of $P(\text{Grades} \mid \text{Girl})$ and $P(\text{Grades})$. One could also see if
$P(\text{Grades} \mid \text{Boy})$ equals $P(\text{Grades})$. Choosing the former approach, since $P(\text{Grades} \mid \text{Girl}) = \frac{130}{251} \approx 0.518$ and
$P(\text{Grades}) = \frac{247}{478} \approx 0.517$ (using the Grades total of 247 from the table) are nearly the same, having a
goal of achieving good grades is approximately independent of gender.
Again, refer to the Gender/Goals example on the previous page in order to determine if having the goal of excelling at
sports is independent of gender.
A way to approach this problem is to compare the values of $P(\text{Sports} \mid \text{Boy})$ and $P(\text{Sports})$. One could also see if
$P(\text{Sports} \mid \text{Girl})$ equals $P(\text{Sports})$. Choosing the former approach, since $P(\text{Sports} \mid \text{Boy}) = \frac{60}{227} \approx 0.264$ and
$P(\text{Sports}) = \frac{90}{478} \approx 0.188$ are not the same, nor approximately the same, having a goal of excelling at sports is
not independent of gender.
A bag holds five marbles—one each of red, orange, yellow, green and blue. You reach in and randomly select two
marbles. What’s the probability that you get either a red or a green marble?
Choosing to solve this problem by listing the sample space (the universe), rather than trying to use a formula, I
have {ro, ry, rg, rb, oy, og, ob, yg, yb, gb} for the universe. There are seven outcomes with either a red or a green.
Since there are ten equally likely outcomes in the universe, the probability is 7/10.
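The listing approach can be mirrored in code by enumerating every unordered pair of marbles and counting those that include a red or a green one. This is a sketch that assumes all pairs are equally likely; the single-letter labels follow the sample space written above.

```python
from fractions import Fraction
from itertools import combinations

marbles = ["r", "o", "y", "g", "b"]

# The universe: every unordered pair of distinct marbles.
universe = list(combinations(marbles, 2))

# Inclusive "or": keep pairs containing red, green, or both.
favorable = [pair for pair in universe if "r" in pair or "g" in pair]

probability = Fraction(len(favorable), len(universe))
print(len(favorable), len(universe), probability)
```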
According to the most recent U.S. Census, 6.7% of the population is aged 10 to 14 years, and 7.1% of the population is
aged 15 to 19 years. If you pick a person at random, what’s the probability that they are aged 10 to 19 years?
Since there is no possibility of being in both groups at once, $P(\text{10 to 14 years old} \cap \text{15 to 19 years old}) = 0$, and so
$P(\text{10 to 14 years old} \cup \text{15 to 19 years old})$
$= P(\text{10 to 14 years old}) + P(\text{15 to 19 years old}) - P(\text{10 to 14 years old} \cap \text{15 to 19 years old})$
= 6.7% + 7.1% − 0%
= 13.8% or 0.138.
In a statistics textbook, 48% of the pages have data displayed in a table, 27% have an equation, and 7% have both. What
percentage of all the pages in this textbook either have data displayed in a table or have an equation? That is, what is
the probability that a randomly selected page from this textbook has either a table displaying data or an equation?
[Venn Diagram: table only 0.41; both table and equation 0.07; equation only 0.20; neither 0.32]
Observe:
48% − 7% = 41% of the pages have data displayed in a table and don’t contain an equation.
27% − 7% = 20% of the pages have an equation and don’t display data in a table.
Thus, $P(\text{Table} \cup \text{Equation}) = P(\text{Table}) + P(\text{Equation}) - P(\text{Table} \cap \text{Equation})$
$= 0.48 + 0.27 - 0.07 = 0.68$.
FYI The probability of randomly selecting a page without a table or an equation is 1 − 0.68 = 0.32.
When stopping drivers for a suspected DWI/DUI, 78% of the suspected drivers get a breath test, 36% receive a blood
test, and 22% get both. Create both a contingency table comprised of probabilities and a Venn Diagram.
The 78% and 36% values are totals and so they should appear in the margins of a contingency table.
The 22% corresponds to the percentage of the suspected drivers that get both a breath test and a blood test, and so
this should appear in the “Yes” and “Yes” positions for “Breath Test” and “Blood Test” in a contingency table.
The cell in the lower right-hand corner of the table must contain 1.00.
So far, we have
                         Breath Test
                         Yes            No             Totals
Blood Test   Yes         0.22                          0.36
             No
             Totals      0.78                          1.00
The rest of the table can be determined by using addition and/or subtraction.
                         Breath Test
                         Yes            No             Totals
Blood Test   Yes         0.22           0.36 − 0.22    0.36
             No          0.78 − 0.22                   1.00 − 0.36
             Totals      0.78           1.00 − 0.78    1.00
And so …
                         Breath Test
                         Yes            No             Totals
Blood Test   Yes         0.22           0.14           0.36
             No          0.56                          0.64
             Totals      0.78           0.22           1.00
Which leads to …
                         Breath Test
                         Yes            No             Totals
Blood Test   Yes         0.22           0.14           0.36
             No          0.56           0.22 − 0.14    0.64
                                        or 0.64 − 0.56
             Totals      0.78           0.22           1.00
Therefore,
                         Breath Test
                         Yes            No             Totals
Blood Test   Yes         0.22           0.14           0.36
             No          0.56           0.08           0.64
             Totals      0.78           0.22           1.00
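The subtraction steps used to fill in the table can be written as a short routine. This is only a sketch: the function name `complete_table` and the dictionary keys are hypothetical, and exact fractions are used so that 0.36 − 0.22 comes out as exactly 0.14.

```python
from fractions import Fraction

def complete_table(p_both, p_blood, p_breath):
    """Fill in a 2x2 probability table from the joint cell and the two 'Yes' totals."""
    blood_only = p_blood - p_both      # blood test Yes, breath test No
    breath_only = p_breath - p_both    # blood test No, breath test Yes
    neither = 1 - (p_both + blood_only + breath_only)
    return {
        "both": p_both,
        "blood_only": blood_only,
        "breath_only": breath_only,
        "neither": neither,
    }

table = complete_table(Fraction("0.22"), Fraction("0.36"), Fraction("0.78"))
for cell, probability in table.items():
    print(cell, float(probability))
```

The four interior cells must sum to 1, which is the same check as the 1.00 in the lower right-hand corner of the table.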
Using the numbers obtained and placed into the probability contingency table to generate a Venn Diagram, the
information in this problem could be summarized as follows.
[Venn Diagram: Blood Test only 0.14; both Blood Test and Breath Test 0.22; Breath Test only 0.56; neither 0.08]
Example
Suppose we know that 44% of college students binge drink (alcohol), 37% drink moderately, and 19% abstain.
Furthermore, we also know that among binge drinkers, 17% have been in an alcohol-related accident while only 9% of
the moderate drinkers have been in an alcohol-related accident.
(a) Calculate the probability that a randomly selected college student has been in an alcohol-related accident.
(b) Calculate the probability that a randomly selected student who has been in an alcohol-related accident is a
binge drinker.
Here’s a tree diagram illustrating both the given and deduced probabilities.
Observe that a student who abstains from drinking, by definition, cannot have an alcohol-related accident.
Note that the conditional probabilities are located above the branches associated with the second set of tree
leaves. For example, P ( Accident Binge Drinker ) = 0.17 (this was given in the problem’s statement) and
P ( No Accident Moderate Drinker ) = 0.91 (this was determined using the fact that the sum of all the
probabilities across a set of branches must equal 1, and so 1 − 0.09 = 0.91 ).
Notice that the General Multiplication Rule is used to obtain the probabilities at the end of each pathway.
For example,
$P(\text{Binge Drinker} \cap \text{Accident})$
$= P(\text{Binge Drinker})\,P(\text{Accident} \mid \text{Binge Drinker})$
$= (0.44)(0.17)$
$= 0.0748$.
(a) The probability that a randomly selected college student has been in an alcohol-related accident equals
$P(\text{Binge Drinker} \cap \text{Accident}) + P(\text{Moderate Drinker} \cap \text{Accident}) + P(\text{Abstainer} \cap \text{Accident})$
$= 0.0748 + 0.0333 + 0$
$= 0.1081$.
(b) Calculate the probability that a randomly selected student who has been in an alcohol-related accident is
a binge drinker.
Recall $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ and $P(B \mid A) = \frac{P(B \cap A)}{P(A)}$.
Since $P(B \cap A) = P(A \cap B)$, we can also express $P(B \mid A)$ as $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$.
Since $P(A \cap B) = P(A \mid B)\,P(B)$, we can rewrite $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$ as $P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$.
If you have been given or have already computed the value of a conditional probability, say $P(A \mid B)$, and you wish to
determine the value of the conditional probability with the conditions switched, $P(B \mid A)$, use
$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$.
Example
Return to part (b) of the recent example involving college students who binge drink, drink moderately, or abstain,
and alcohol-related accidents. There we wished to calculate the probability that a randomly selected student who has
been in an alcohol-related accident is a binge drinker, and reversing the conditioning is exactly what we did.
From the tree diagram, we easily determined the value of $P(\text{Accident} \mid \text{Binge Drinker})$ as 0.17.
However, when we were asked to determine $P(\text{Binge Drinker} \mid \text{Accident})$, we did so by computing:
$P(\text{Binge Drinker} \mid \text{Accident}) = \frac{P(\text{Accident} \mid \text{Binge Drinker})\,P(\text{Binge Drinker})}{P(\text{Accident})}$
$= \frac{(0.17)(0.44)}{0.0748 + 0.0333 + 0} = \frac{0.0748}{0.1081} = \frac{0.0748 \times 10{,}000}{0.1081 \times 10{,}000} = \frac{748}{1081} \approx 0.6920$.
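The same two answers follow from multiplying along each branch of the tree and then reversing the conditioning. This is a minimal sketch using the percentages given in the problem; the dictionary names are illustrative, and abstainers are assigned an accident probability of 0 as noted in the example.

```python
# First-level branches: P(drinking level).
p_level = {"binge": 0.44, "moderate": 0.37, "abstain": 0.19}

# Second-level branches: P(accident | drinking level).
p_accident_given_level = {"binge": 0.17, "moderate": 0.09, "abstain": 0.0}

# General Multiplication Rule along each pathway: P(level and accident).
p_level_and_accident = {
    level: p_level[level] * p_accident_given_level[level] for level in p_level
}

# (a) Add the disjoint pathways that end in an accident.
p_accident = sum(p_level_and_accident.values())

# (b) Reverse the conditioning: P(binge | accident).
p_binge_given_accident = p_level_and_accident["binge"] / p_accident

print(round(p_accident, 4))
print(round(p_binge_given_accident, 4))
```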
In summary, my first step in addressing a probability problem is to determine the size of n, which is sometimes called
the "sample size" and other times is referred to as the "number of trials."
If the sample size is something small -- for me, this means n < 6 -- then I will typically use the tools in chapters 13 and
14. That is, I will either use …
I generally choose the tool in the order they appear in the list. Thus, for probability problems where n is at most five, I
will first see if I can draw a tree diagram. If not, then I'll see if I can use the probability laws themselves, applying
them directly. If not, then I'll draw a Venn Diagram. If a table is given, I will definitely use it, but I only create a table
for those problems that only involve two events.
So, for example, I’ll return once again to the recent example involving college students’ drinking level (binge drink,
drink moderately, and abstain) and alcohol-related accidents. Specifically, we are told that 44% of college students
binge drink (alcohol), 37% drink moderately, and 19% abstain. Furthermore, we also know that among binge drinkers,
17% have been in an alcohol-related accident while only 9% of the moderate drinkers have been in an alcohol-related
accident.
Choosing to convert these percentages to counts out of 10,000, where, for example, 44% corresponds to 4400 out of
10,000, and 17% of 4400 equals the whole number 748, here’s a table that communicates the information given in
the problem.
The calculations used to complete this table are: (0.44)(10000) = 4400, (0.37)(10000) = 3700, (0.19)(10000) = 1900,
(0.17)(4400) = 748, and (0.09)(3700) = 333.
Note that a student who abstains from drinking, by definition, cannot have an alcohol-related accident. Thus, zero of
the 1900 students who abstain from drinking have an alcohol-related accident.
The remaining cells of the table can then be completed using the fact that the sum of the parts must equal the total.
If we then wanted to compute the probability that a randomly selected college student has been in an alcohol-
related accident, we need only look at the right-most margin of the table; specifically, 1081 and 10000. Thus, the
probability that a randomly selected college student has been in an alcohol-related accident is $\frac{1081}{10000} = 0.1081$.
And, if we wished to calculate the probability that a randomly selected student who has been in an alcohol-related
accident is a binge drinker, we need only look at the numbers displayed in the table with a red font, and take the
appropriate ratio.
Thus, the probability that a randomly selected student who has been in an alcohol-related accident is a binge drinker is
$P(\text{Binge Drinker} \mid \text{Accident}) = \frac{748}{1081} \approx 0.6920$.
As expected, these values are the same as those computed previously using a tree diagram to summarize the
information given in the problem.
On the following pages, you will find more worked problems involving probability. I have arranged these by the
tool -- tree diagram, implementation of the rules directly, Venn diagram, and table -- I chose to use to solve them.
A Tree Diagram
Example 1: Suppose you just received a shipment of ten televisions. Two of the televisions are defective.
(a) If two televisions are randomly selected, compute the probability that both televisions work.
(b) What is the probability that at least one of the two televisions does not work?
Observation: Sampling is being done without replacement since the two TVs are selected
simultaneously, which is equivalent to selecting the first TV at random and
then randomly selecting the second TV without having replaced the first one
back into the shipment.
Consequently, since the n = 2 TVs are being selected without replacement, then these n = 2 trials are not
independent.
I find that a tree diagram helps me solve nearly every probability problem involving fewer than 5 trials.
So, I’ve created one for this problem; please see the next page.
Let D represent the event “TV is defective” and Dc represent the event “TV is not defective.”
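Tree diagram (first branch: First TV; second branch: Second TV):
First TV is D, $P(D) = \frac{2}{10}$
    Second TV is D, $P(D \mid D) = \frac{1}{9}$: D and D
    Second TV is Dc, $P(D^c \mid D) = \frac{8}{9}$: D and Dc
First TV is Dc, $P(D^c) = \frac{8}{10}$
    Second TV is D, $P(D \mid D^c) = \frac{2}{9}$: Dc and D
    Second TV is Dc, $P(D^c \mid D^c) = \frac{7}{9}$: Dc and Dc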
So, since 2 of the original 10 TVs were defective, and so 8 of the original 10 TVs were not defective, the probability that the first TV is defective is
$P(D) = \frac{2}{10}$, and the probability that the first TV is not defective is $P(D^c) = \frac{8}{10}$.
Now, if the first TV selected is defective, then there is only one defective TV left amongst the nine remaining TVs, and so
$P(D \mid D) = \frac{1}{9}$. I will leave it to you to deduce how I got the other three conditional probabilities associated with the second TV.
(a) Therefore, the probability that both TVs work (both TVs are Dc) equals the product of the two probabilities
along the bottom set of branches in the tree diagram. That is,
$P(D^c \text{ and } D^c) = P(D^c \cap D^c) = P(D^c)\,P(D^c \mid D^c) = \frac{8}{10} \cdot \frac{7}{9} = \frac{56}{90} \approx 0.622$.
Answer: If two televisions are randomly selected, the probability that both televisions work is approximately
0.622.
(b) Although I could multiply the pairs of probabilities along each of the top three sets of branches of the tree, and
then add those three products together in order to determine the probability that at least one of the two TVs
selected is defective, I choose, instead, to use the idea of the complement. I will compute the probability that
both TVs are not defective – by computing the product of the probabilities along the bottom branch of the tree
– and then subtract that from 1: $1 - \frac{56}{90} = \frac{34}{90} \approx 0.378$.
Answer: If two televisions are randomly selected, the probability that at least one of the two televisions does
not work is approximately 0.378.
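Both parts of the television example can be verified with exact fractions. This sketch assumes, as the observation above states, that the two TVs are drawn without replacement, so the second probability is conditioned on the first draw.

```python
from fractions import Fraction

total_tvs = 10
working_tvs = 8  # 10 TVs minus the 2 defective ones

# (a) Bottom branch of the tree: first TV works, then second TV works.
p_first_works = Fraction(working_tvs, total_tvs)
p_second_works_given_first = Fraction(working_tvs - 1, total_tvs - 1)
p_both_work = p_first_works * p_second_works_given_first

# (b) Complement Rule: at least one defective = 1 - P(both work).
p_at_least_one_defective = 1 - p_both_work

print(p_both_work, round(float(p_both_work), 3))
print(p_at_least_one_defective, round(float(p_at_least_one_defective), 3))
```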
Example 2: A study concluded that among people with a certain virus, 99.4% of tests conducted were (correctly)
positive, while for people without the virus, 97.9% of the tests were (correctly) negative. If 32% of
patients actually carry the virus, what’s the probability that a patient testing negative is truly free of the
virus? Round to three decimal places as needed.
I find that a tree diagram helps me solve nearly every probability problem involving fewer than 5 trials. So, I’ve
created one for this problem.
Let V represent the event “person has the virus” and Vc represent the event “person does not have the virus.”
Let + represent the event “test result is positive for the virus” and – represent the event “test result is negative for
the virus.”
Tree diagram (first branch: virus status; second branch: test result):
V, $P(V) = 0.32$
    +, $P(+ \mid V) = 0.994$: V and +
    –, $P(- \mid V) = 0.006$: V and –
Vc, $P(V^c) = 0.68$
    +, $P(+ \mid V^c) = 0.021$: Vc and +
    –, $P(- \mid V^c) = 0.979$: Vc and –
We are asked to find the probability that a patient testing negative is truly free of the virus.
That is, given (for a fact) that the patient’s test result is –, we want the probability that they don’t have the virus, Vc.
Thus, we want the conditional probability $P(V^c \mid -)$.
Since this conditional probability is in the reverse order of the one we do know the value of, $P(- \mid V^c) = 0.979$, we
must follow the “reversing the conditioning” procedures described in chapter 14 of the textbook.
That is, we must apply the fact that $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ to our events. Thus, $P(V^c \mid -) = \frac{P(V^c \cap -)}{P(-)}$.
In order to compute a value for $P(V^c \mid -) = \frac{P(V^c \cap -)}{P(-)}$, we must determine the values of $P(V^c \cap -)$ and $P(-)$.
From the tree diagram, we see that the fourth set of branches corresponds to $P(V^c \cap -)$,
and so $P(V^c \cap -) = P(V^c)\,P(- \mid V^c) = (0.68)(0.979) = 0.66572$.
The second and fourth sets of branches correspond to the two ways one can have a negative (–) test result (I’ve
highlighted these with the events in a red font).
Thus, $P(-) = P((V \text{ and } -) \text{ or } (V^c \text{ and } -)) = P((V \cap -) \cup (V^c \cap -))$
$= P(V \cap -) + P(V^c \cap -)$
$= P(V)\,P(- \mid V) + P(V^c)\,P(- \mid V^c)$
$= (0.32)(0.006) + (0.68)(0.979) = 0.00192 + 0.66572 = 0.66764$.
Therefore, $P(V^c \mid -) = \frac{P(V^c \cap -)}{P(-)} = \frac{0.66572}{0.66764} \approx 0.9971241987 \approx 0.997$.
Answer: The probability that a patient testing negative is truly free of the virus is approximately 0.997.
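The virus calculation follows the same pattern: multiply along the two branches that end in a negative test, add them, and divide. This is a sketch using the three percentages given in the problem; the variable names are illustrative.

```python
p_virus = 0.32
p_positive_given_virus = 0.994      # correct positives among people with the virus
p_negative_given_no_virus = 0.979   # correct negatives among people without the virus

p_no_virus = 1 - p_virus
p_negative_given_virus = 1 - p_positive_given_virus

# The two pathways of the tree that end in a negative test result.
p_virus_and_negative = p_virus * p_negative_given_virus
p_no_virus_and_negative = p_no_virus * p_negative_given_no_virus

# The pathways are disjoint, so P(negative) is their sum.
p_negative = p_virus_and_negative + p_no_virus_and_negative

# Reverse the conditioning: P(no virus | negative).
p_no_virus_given_negative = p_no_virus_and_negative / p_negative

print(round(p_negative, 5))
print(round(p_no_virus_given_negative, 3))
```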
Probability Rules
Example 1: There are two professors. The first is Professor Scedastic, whose class has 70 open seats and in which
Molly's chances of passing are 0.8. In contrast, with Professor Kurtosis, Molly's chances of
passing are 0.6 and there are 50 open seats in this professor’s class.
(a) If Molly gets randomly assigned to a class, what are her chances of passing?
(b) If Molly doesn’t pass the statistics class, what is the probability that she was randomly assigned
to Professor Kurtosis’ class?
Let S represent the event that a student is assigned to Professor Scedastic's class.
Let K represent the event that a student is assigned to Professor Kurtosis' class.
Let P represent the event that a student passes the statistics class.
Let PC represent the event that a student does not pass the statistics class.
(a) If Molly gets randomly assigned to a class, what are her chances of passing?
A Solution
There are two ways for Molly to pass statistics:
(1) Molly will either be assigned to Professor Scedastic's class AND then she passes
OR
(2) Molly will be assigned to Professor Kurtosis' class AND then she passes.
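Therefore, since the events (S and P) and (K and P) are disjoint (a.k.a. mutually exclusive) – they cannot both
happen at the same time – then, taking the chance of being assigned to a class to be proportional to its open seats
(70 + 50 = 120 seats in all),
$P(P) = P(S \cap P) + P(K \cap P) = P(S)\,P(P \mid S) + P(K)\,P(P \mid K)$
$= \frac{70}{120}(0.8) + \frac{50}{120}(0.6) = \frac{56}{120} + \frac{30}{120} = \frac{86}{120}$
= 43/60
≈ 0.717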
Answer: The probability that Molly passes the statistics class she is randomly assigned to is approximately
0.717.
(b) If Molly doesn’t pass the statistics class, what is the probability that she was randomly assigned to
Professor Kurtosis’ class?
A Solution
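$P(K \mid P^C) = \frac{P(K \cap P^C)}{P(P^C)} = \frac{P(K)\,P(P^C \mid K)}{1 - P(P)} = \frac{(50/120)(0.4)}{17/60} = \frac{20/120}{34/120}$
= 10/17
≈ 0.5882352941
≈ 0.588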
Answer: If Molly doesn’t pass the statistics class, the probability that she was randomly assigned to
Professor Kurtosis’ class is approximately 0.588.
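Both answers in the Molly example can be reproduced with exact fractions. This sketch assumes that the chance of being assigned to a class is proportional to its number of open seats (70 and 50 out of 120), which is the assumption consistent with the stated answers of 43/60 and 10/17; the dictionary names are illustrative.

```python
from fractions import Fraction

open_seats = {"Scedastic": 70, "Kurtosis": 50}
p_pass_given_class = {"Scedastic": Fraction(8, 10), "Kurtosis": Fraction(6, 10)}

# Assumption: random assignment is proportional to open seats.
total_seats = sum(open_seats.values())
p_class = {name: Fraction(seats, total_seats) for name, seats in open_seats.items()}

# (a) Add the two disjoint ways of passing.
p_pass = sum(p_class[name] * p_pass_given_class[name] for name in open_seats)

# (b) Reverse the conditioning: P(Kurtosis | did not pass).
p_fail = 1 - p_pass
p_kurtosis_and_fail = p_class["Kurtosis"] * (1 - p_pass_given_class["Kurtosis"])
p_kurtosis_given_fail = p_kurtosis_and_fail / p_fail

print(p_pass, round(float(p_pass), 3))
print(p_kurtosis_given_fail, round(float(p_kurtosis_given_fail), 3))
```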
A Venn Diagram
Problem 39 in Chapter 14.
Example: According to estimates from the federal government’s 2010 National Health Interview Survey, based on
face-to-face interviews in 16,676 households, approximately 63.6% of U.S. adults have both a landline
in their residence and a cell phone, 25.4% have only cell phone service but no landline, and 1.8% have no
telephone service at all.
(a) Polling agencies won’t phone cell phone numbers because customers object to paying for such calls.
What proportion of U.S. households can be reached by a landline call?
[Venn Diagram: cell phone only 0.254; both cell phone and landline 0.636; landline only ??]
A Solution
Let C represent the event “American adult has cell phone service.”
Let L represent the event “American adult has landline phone service.”
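$P(L) = ?? + 0.636$
$P(C \cup L) = 1 - 0.018 = 0.982$.
Thus, $?? = 0.982 - (0.254 + 0.636) = 0.982 - 0.89 = 0.092$.
[Venn Diagram: cell phone only 0.254; both 0.636; landline only 0.092; neither 0.018]
$P(C) = 0.254 + 0.636 = 0.89$
$P(L) = 0.092 + 0.636 = 0.728$, so about 72.8% of U.S. households can be reached by a landline call.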
(b) Are having a cell phone and having a landline independent? Explain.
A Solution:
Informally, two events are independent if knowing that one of the events has occurred, or not, does not change the
likelihood of the second event occurring, or not. For example, knowing a person is a professional basketball
player affects the likelihood of the person working out at least once a week during the season. Thus, the events
person is a professional basketball player and person works out at least once a week during the season are
not independent.
In this problem, we now know $P(C) = 0.89$, $P(L) = 0.728$, $P(C \cap L) = 0.636$, and since
$P(C \cap L) = P(L \cap C)$, we also know that $P(L \cap C) = 0.636$.
Using the (General) Multiplication Rule, rearranged as the Conditional Probability Rule, we see that
$P(C \mid L) = \frac{P(C \cap L)}{P(L)} = \frac{0.636}{0.728} = \frac{0.636 \times 1000}{0.728 \times 1000} = \frac{636}{728} = \frac{159}{182} \approx 0.8736$.
Therefore, since $P(C \mid L) \approx 0.8736$ is not equal to $P(C) = 0.89$, events C and L are not independent;
C and L are dependent events.
Alternatively, one can reach the same conclusion by comparing the values of $P(L \mid C)$ and $P(L)$:
$P(L \mid C) = \frac{P(L \cap C)}{P(C)} = \frac{0.636}{0.89} \approx 0.7146$, while
$P(L) = 0.728$. These probabilities are close, but they aren’t the same. Therefore, events C and L are not
independent; C and L are dependent events.
It is the second approach that leads to the values given in the answer to this problem in Appendix C of the
textbook.
Answer: No, having a cell phone and having a landline are not independent.
A Table
Example: Suppose 40 of 100 people are smokers. Suppose 70 of these 100 people are women and 14 of these
women are smokers. Address parts (a) – (f).
Choosing to organize the information in a table – a Contingency Table like we saw in chapter 2 -- I
create the following way to summarize what’s given in this problem.
(a) Suppose one person is randomly selected. What is the probability that the person is a woman?
A Solution
From the table, $P(W) = \frac{70}{100} = \frac{7}{10} = 0.7$.
Answer: The probability that the randomly selected person is a woman is 7/10 or 0.7.
(b) Suppose one woman is randomly selected. What is the probability that the woman is a smoker?
A Solution
From the table, $P(S \mid W) = \frac{14}{70} = \frac{1}{5} = 0.2$.
By a probability rule, $P(S \mid W) = \frac{P(S \cap W)}{P(W)} = \frac{14/100}{70/100} = \frac{14}{70} = \frac{1}{5} = 0.2$.
Answer: The probability that the randomly selected woman is a smoker is 1/5 or 0.2.
(c) Suppose one person is randomly selected. What is the probability that the person is a woman and a
smoker? That is, what is the probability that the person is a woman smoker?
A Solution
From the table, $P(W \cap S) = \frac{14}{100} = \frac{7}{50} = 0.14$.
By a probability rule, $P(W \cap S) = P(W)\,P(S \mid W) = \frac{70}{100} \cdot \frac{14}{70} = \frac{14}{100} = \frac{7}{50} = 0.14$.
Answer: The probability that the randomly selected person is a woman and a smoker is 7/50 or 0.14.
(d) Suppose one smoker is randomly selected. What is the probability that the smoker is a woman?
A Solution
From the table, $P(W \mid S) = \frac{14}{40} = \frac{7}{20} = 0.35$.
By a probability rule, $P(W \mid S) = \frac{P(W \cap S)}{P(S)} = \frac{14/100}{40/100} = \frac{14}{40} = \frac{7}{20} = 0.35$.
Answer: The probability that the randomly selected smoker is a woman is 7/20 or 0.35.
(e) Suppose one person is randomly selected. What is the probability that the person is a woman or a
smoker?
A Solution
From the table, $P(W \cup S) = \frac{14}{100} + \frac{56}{100} + \frac{26}{100} = \frac{96}{100} = \frac{24}{25} = 0.96$.
By a probability rule, $P(W \cup S) = P(W) + P(S) - P(W \cap S)$
$= \frac{70}{100} + \frac{40}{100} - \frac{14}{100}$
$= \frac{96}{100} = \frac{24}{25} = 0.96$.
Answer: The probability that the randomly selected person is a woman or a smoker is 24/25 or 0.96.
(f) Suppose two people are randomly selected. What is the probability that both are non-smokers?
A Solution
Recall:
                 Smoker (S)    Not a Smoker (Sc)    Total
Woman (W)        14            56                   70
Man (Wc)         26            4                    30
Total            40            60                   100
Since it wasn’t specified whether this sampling is done with replacement or without replacement,
the assumption is that it was done without replacement because the implication is that we end-up
with a sample comprised of two people … not one person that is then replaced back into the
population of 100 people before the second person is selected.
From the table, the probability that the first person selected is a non-smoker is
$P(S_1^c) = \frac{60}{100} = \frac{3}{5} = 0.6$.
Now, since sampling is done without replacement, there are only 59 non-smokers remaining among
99 people when the second person is randomly chosen. Thus, the probability that the second person
selected is a non-smoker, given that the first was, is $P(S_2^c \mid S_1^c) = \frac{59}{99} = 0.\overline{59} \approx 0.596$.
Therefore, the probability that both randomly selected people are non-smokers is
$P(S_1^c \cap S_2^c) = P(S_1^c)\,P(S_2^c \mid S_1^c)$
$= \frac{60}{100} \cdot \frac{59}{99}$
$= \frac{3540}{9900}$
$= \frac{59}{165}$
Answer: The probability that both randomly selected people are non-smokers is 59/165 or approximately 0.358.
NOTE: Had this problem been worded as “Suppose two people are randomly selected with
replacement. What is the probability that both are non-smokers?” then the probability
that the second person selected was a non-smoker would have been the same as for
the first person, and so,
$P(S_1^c \cap S_2^c) = P(S_1^c)\,P(S_2^c \mid S_1^c) = P(S^c)\,P(S^c) = \frac{60}{100} \cdot \frac{60}{100} = \frac{3600}{10000} = \frac{9}{25} = 0.36$.
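The difference between sampling without and with replacement in part (f) is easy to see side by side. This is a sketch using the counts from the table (60 non-smokers among 100 people); the variable names are illustrative.

```python
from fractions import Fraction

non_smokers = 60
people = 100

# Without replacement: the second draw comes from 59 non-smokers among 99 people.
p_both_without_replacement = (
    Fraction(non_smokers, people) * Fraction(non_smokers - 1, people - 1)
)

# With replacement: the second draw has the same probability as the first.
p_both_with_replacement = Fraction(non_smokers, people) ** 2

print(p_both_without_replacement, round(float(p_both_without_replacement), 3))
print(p_both_with_replacement, float(p_both_with_replacement))
```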
To quantify the distance, measured in standard deviations, that a data value is from the distribution’s
mean, we use a z-score. If y represents the name of the quantitative variable, then $z = \frac{y - \bar{y}}{s_y}$, where z measures the
number of $s_y$ increments that y is from $\bar{y}$. If x represents the name of the quantitative variable, then $z = \frac{x - \bar{x}}{s_x}$,
where z measures the number of $s_x$ increments that x is from $\bar{x}$. If weight represents the name of the quantitative
variable, then $z = \frac{weight - \overline{weight}}{s_{weight}}$, where z measures the number of $s_{weight}$ increments that weight is from $\overline{weight}$. A
positive z value indicates that the original data value is above the mean. A negative z value indicates that the original
data value is below the mean. The larger the absolute value of the z-score is, the more unusual the data value is.
Example: The average weight of a sample of fictional mammals called Morponias is 72 kilograms and the standard
deviation is 4 kilograms. Delgridge, a Morponia, weighs 82 kilograms. If I choose to define variable y to represent the
weight of a Morponia mammal in kilograms, then we’ve been given $\bar{y} = 72$ kilograms and $s_y = 4$ kilograms. The z-score
of Delgridge’s individual weight of y = 82 kilograms is $z = \frac{y - \bar{y}}{s_y} = \frac{82 - 72}{4} = 2.5$ standard deviations. This tells us
that Delgridge’s weight of 82 kilograms is 2.5 standard deviations
above this sample’s mean weight of 72 kilograms. That is, Delgridge’s weight is $2.5\,s_y = 2.5 \times 4 = 10$ kilograms
greater than 72 kilograms, which is indeed the case since 72 + 10 does equal 82! If you knew that Delgridge’s cousin,
Plwick, has a z-value of −0.5, then we can determine Plwick’s weight as follows:
$z = \frac{y - \bar{y}}{s_y} \Rightarrow z\,s_y = y - \bar{y} \Rightarrow z\,s_y + \bar{y} = y$, where with z = −0.5 standard deviations, $\bar{y} = 72$ kilograms, and $s_y = 4$
kilograms, we find $y = (-0.5)(4) + 72 = -2 + 72 = 70$ kilograms.
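The two directions of the z-score formula can be written as a pair of small functions. This is a sketch; the function names `z_score` and `value_from_z` are hypothetical, and the numbers are the Morponia values from the example.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / standard_deviation

def value_from_z(z, mean, standard_deviation):
    """Recover the original data value from its z-score."""
    return z * standard_deviation + mean

mean_weight = 72   # kilograms
sd_weight = 4      # kilograms

delgridge_z = z_score(82, mean_weight, sd_weight)
plwick_weight = value_from_z(-0.5, mean_weight, sd_weight)

print(delgridge_z)
print(plwick_weight)
```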
The phrase “shift the data” refers to moving the data set’s center, but keeping the spread the same. In contrast, “rescaling
the data” changes both the center and the spread. If you were to look at a histogram, for example, of data collected on a
quantitative variable, x, and then create a second histogram of the new variable w, where $w = x - \bar{x}$, the two histograms
would look the same and they would have the same spread, but the center of each would be different. Specifically, the
mean of the distribution of the original variable x would be $\bar{x}$ while the mean of the new distribution comprised of
shifted data values (i.e. the values of w) would be 0. If you were now to construct a third histogram of shifted and
rescaled x values, say of values of $z = \frac{x - \bar{x}}{s_x}$, then a comparison of the histogram of the x values with that of the z values
would show that the mean and standard deviation had changed from $\bar{x}$ and $s_x$ to 0 and 1, respectively. The mean and
standard deviation of a distribution of z-scores are always 0 and 1.
A Normal Distribution Model is a way to make assumptions about our distribution based on the shape of the distribution.
A distribution of a quantitative variable that is unimodal, symmetric, and bell-shaped is one for which a Normal
Distribution model is appropriate.
A Normal model is named based on its mean and standard deviation. Because it is a model, we use $\mu$ for the mean
and $\sigma$ for the standard deviation. These two numbers are referred to as the parameters of the model. The notation
$N(\mu, \sigma)$ is used to designate the specific Normal Distribution Model being applied. The formula to convert values of
a quantitative variable, say y, that is well-modeled by a Normal Distribution, to z-scores is $z = \frac{y - \mu}{\sigma}$, an equation
in the same form as is used for any distribution: $z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$. If variable y is well-modeled with a
Normal Distribution $N(\mu, \sigma)$, then when the values of y are converted to z-scores using $z = \frac{y - \mu}{\sigma}$, the
distribution of these standardized scores will be the Standard Normal Model $N(0, 1)$.
Unless specifically told that a Normal Model applies, one must verify that a set of data meets the “Nearly Normal
Conditions” before employing a Normal Model. These conditions can be checked by making a histogram and
verifying that it is unimodal, symmetric, and bell-shaped. You can also check to see if the mean, median, and mode
are approximately equal.
• In a Standard Normal Distribution Model, approximately 68% of the data falls within one standard deviation,
$z = \pm 1$, of the mean $z = 0$. That is, approximately 68% of the data lie in the interval $-1 < z < 1$. This means
that approximately 68% of the data in a Normal Distribution with mean $\mu$ and standard deviation $\sigma$ lie in the
interval $\mu - \sigma < y < \mu + \sigma$.
• In a Standard Normal Distribution Model, approximately 95% of the data falls within two standard deviations,
$z = \pm 2$, of the mean $z = 0$. That is, approximately 95% of the data lie in the interval $-2 < z < 2$. This means
that approximately 95% of the data in a Normal Distribution with mean $\mu$ and standard deviation $\sigma$ lie in the
interval $\mu - 2\sigma < y < \mu + 2\sigma$.
• In a Standard Normal Distribution Model, approximately 99.7% of the data falls within three standard
deviations, $z = \pm 3$, of the mean $z = 0$. That is, approximately 99.7% of the data lie in the interval $-3 < z < 3$.
This means that approximately 99.7% of the data in a Normal Distribution with mean $\mu$ and standard
deviation $\sigma$ lie in the interval $\mu - 3\sigma < y < \mu + 3\sigma$.
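The three percentages in this rule can be checked with software. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) as one possible tool; it is not the technology the course itself recommends, and the variable names are illustrative.

```python
from statistics import NormalDist

# The Standard Normal Model N(0, 1).
standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between z = -k and z = k, for k = 1, 2, 3.
areas = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, area in areas.items():
    print(f"within {k} standard deviation(s) of the mean: {area:.4f}")
```

The computed areas are slightly more precise than 68%, 95%, and 99.7%, which is why the rule is stated with the word “approximately.”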
To draw a Normal Distribution, place the mean on the horizontal axis at the center. To find the location that
corresponds to one standard deviation above the mean, move to the upper inflection point -- the point on the right half
of the graph where the graph changes from being an upside-down ‘U’ shape to a right-side-up ‘U’ shape. Label this
location on the horizontal axis $\mu + \sigma$ if the horizontal axis represents the non-standardized Normally Distributed
variable; name this location $z = 1$ if the horizontal axis represents the z-scores of a Standardized Normal Distribution.
The inflection point on the left half of a Normal Distribution’s curve corresponds to $\mu - \sigma$ and $z = -1$.
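To see that the curve really does switch from upside-down ‘U’ to right-side-up ‘U’ one standard deviation above the mean, here is a rough numerical check. It estimates the curvature just inside and just outside that location using a finite difference. The model N(100, 15) and the helper name `second_difference` are assumptions for illustration.

```python
from statistics import NormalDist

def second_difference(model, y, h=0.01):
    """Finite-difference estimate of the curve's second derivative at y."""
    return (model.pdf(y + h) - 2 * model.pdf(y) + model.pdf(y - h)) / h**2

mu, sigma = 100, 15                      # hypothetical parameters
model = NormalDist(mu=mu, sigma=sigma)

just_inside = second_difference(model, mu + sigma - 1)   # slightly left of mu + sigma
just_outside = second_difference(model, mu + sigma + 1)  # slightly right of mu + sigma

# Negative curvature = upside-down 'U'; positive curvature = right-side-up 'U'.
print(just_inside < 0, just_outside > 0)  # True True
```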
Calculations involving Normal Models can involve determining an area under a Normal Distribution’s curve or any of
the quantities (z, y, $\mu$, and $\sigma$) in the formula $z = \frac{y - \mu}{\sigma}$, since algebra can be used to rewrite this formula with each of
the quantities isolated: $z = \frac{y - \mu}{\sigma}$, $y = z\sigma + \mu$, $\sigma = \frac{y - \mu}{z}$, and $\mu = y - z\sigma$.
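The four rearrangements can be written as four small functions, sketched below. The function names and the example numbers (y = 130 in a hypothetical N(100, 15)) are illustrative assumptions.

```python
def z_from(y, mu, sigma):
    """z = (y - mu) / sigma"""
    return (y - mu) / sigma

def y_from(z, mu, sigma):
    """y = z * sigma + mu"""
    return z * sigma + mu

def sigma_from(y, mu, z):
    """sigma = (y - mu) / z   (only defined when z is not 0)"""
    return (y - mu) / z

def mu_from(y, z, sigma):
    """mu = y - z * sigma"""
    return y - z * sigma

# Hypothetical example: y = 130 in the model N(100, 15).
z = z_from(130, 100, 15)
print(z)                          # 2.0
print(y_from(z, 100, 15))         # 130.0
print(sigma_from(130, 100, z))    # 15.0
print(mu_from(130, z, 15))        # 100.0
```

Knowing any three of the four quantities is enough to recover the fourth, provided z is not 0 when solving for the standard deviation.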
Technology is used to compute areas under a Normal Distribution’s curve in statistical research (and it’s available, and
recommended, for use on our homework problems). A free, simple-to-use app is available at
https://s3-us-west-2.amazonaws.com/oervm/stats/probs.html. [Leave the radio button next to “Normal Distribution” selected, enter
values for the “Mean” and “Standard deviation”, then select the radio button next to the type of percentage problem
under the “Probability” heading within the “Calculate” options that you wish to compute -- $P(x < a)$, $P(x > a)$, or
$P(a < x < b)$ -- and enter the value(s) before clicking on the gray “Calculate” button.]
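If you prefer to script these area calculations, the three problem types can be sketched with `NormalDist` from Python's standard `statistics` module. This is an alternative to the app, not a description of how the app works; the model N(100, 15) and the cutoff values are made up for illustration.

```python
from statistics import NormalDist

model = NormalDist(mu=100, sigma=15)     # hypothetical Mean and Standard deviation

below = model.cdf(85)                    # area to the left of 85
above = 1 - model.cdf(120)               # area to the right of 120
between = model.cdf(115) - model.cdf(85) # area between 85 and 115

print(f"P(x < 85)        = {below:.4f}")
print(f"P(x > 120)       = {above:.4f}")
print(f"P(85 < x < 115)  = {between:.4f}")
```

`cdf` always returns the area to the left of a value, so a right-tail area is one minus the left-tail area, and a between-values area is the difference of two left-tail areas.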
Area approximations reported to the nearest ten-thousandth, based on z-scores rounded to the nearest
hundredth, can be obtained via a Z Table. Here’s a nice explanation of how to read a Z Table like the one that appears
in Appendix F of our textbook: https://www.youtube.com/watch?v=85G_PLBTX00.
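The sketch below mimics a Z Table lookup: it rounds the z-score to the nearest hundredth, then reports the left-tail area to four decimal places. It assumes the table lists areas to the left of z, which is the common layout; check this against the table in the textbook. The helper name `z_table_area` and the example value y = 120 in N(100, 15) are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)

def z_table_area(z):
    """Left-tail area as a Z Table would list it: z to 0.01, area to 0.0001."""
    return round(STANDARD_NORMAL.cdf(round(z, 2)), 4)

z = (120 - 100) / 15                     # hypothetical y = 120; z = 1.3333...
print(z_table_area(z))                   # table entry for z = 1.33 -> 0.9082
print(round(STANDARD_NORMAL.cdf(z), 4))  # technology, unrounded z -> 0.9088
```

The small gap between the two answers comes entirely from rounding z before looking it up, which is why table answers and app answers can differ in the last digits.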
Technology is used to determine the ‘cut score’ for a given percentile in statistical research (and it’s available, and
recommended, for use on our homework problems). A free, simple-to-use app is available at
https://s3-us-west-2.amazonaws.com/oervm/stats/probs.html. [Leave the radio button next to “Normal Distribution” selected, enter
values for the “Mean” and “Standard deviation”, then select the radio button next to the type of inverse probability
problem you wish to compute -- $P(x < ??)$, $P(x > ??)$, or $P(?? < x < ??)$ -- and enter the area value before clicking
on the gray “Calculate” button.]
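Cut scores can also be scripted. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module, which returns the value that has a given area to its left. The model N(100, 15) and the chosen areas are illustrative assumptions, not values from the course.

```python
from statistics import NormalDist

model = NormalDist(mu=100, sigma=15)     # hypothetical Mean and Standard deviation

# Value with 90% of the area below it (the 90th percentile).
cut_90 = model.inv_cdf(0.90)

# Value with 5% of the area above it, i.e. 95% of the area below it.
top_5 = model.inv_cdf(1 - 0.05)

# Bounds of the middle 50%: 25% of the area lies below `low`, 25% above `high`.
low, high = model.inv_cdf(0.25), model.inv_cdf(0.75)

print(round(cut_90, 2), round(top_5, 2), round(low, 2), round(high, 2))
```

Because `inv_cdf` works with left-tail areas only, a right-tail or middle-area problem must first be restated as one or two left-tail areas.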
A Z Table can be used to determine “cut scores” for specified percentiles. Once the area corresponding to the percentile, or as
close to it as you can get, is found in the body of the Z Table, the associated z-score can be translated to a value of the
original Normally Distributed variable by using the formula $z = \frac{y - \mu}{\sigma}$, solved for y: $y = \mu + z\sigma$.
Here’s a nice explanation of how to read a Z Table like the one that appears
in Appendix F of our textbook: https://www.youtube.com/watch?v=9KOJtiHAavE.
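The table-based procedure can be imitated in code: scan every z-score a table would list, find the tabled area closest to the target percentile, then convert that z-score back to y. The table range of -3.49 to 3.49, the helper name `closest_table_z`, and the model N(100, 15) are assumptions for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)

def closest_table_z(area):
    """Scan z = -3.49 ... 3.49 in steps of 0.01 for the tabled area nearest `area`."""
    candidates = [i / 100 for i in range(-349, 350)]
    return min(candidates,
               key=lambda z: abs(round(STANDARD_NORMAL.cdf(z), 4) - area))

mu, sigma = 100, 15            # hypothetical model
z = closest_table_z(0.90)      # 90th percentile
y = mu + z * sigma             # translate the z-score back to the original variable

print(z)            # 1.28  (tabled area 0.8997 is closer to 0.90 than 0.9015 at z = 1.29)
print(round(y, 1))  # 119.2
```

When two neighboring table entries are equally close to the target area, this scan keeps the first one it meets, so treat tie cases with care.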
Don’t round any values until all calculations have been completed. If you must round, do so only at the very last
step.
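A short illustration of why early rounding matters, using a made-up 90th-percentile problem in a hypothetical N(100, 15): rounding the z-score to one decimal place before converting back to y shifts the final answer.

```python
from statistics import NormalDist

mu, sigma = 100, 15                              # hypothetical model
z_exact = NormalDist().inv_cdf(0.90)             # 1.2815...

# Round only at the last step.
y_late = round(mu + z_exact * sigma, 1)

# Round the z-score first (too early), then finish the calculation.
y_early = round(mu + round(z_exact, 1) * sigma, 1)

print(y_late)   # 119.2
print(y_early)  # 119.5
```

The two answers differ by 0.3 even though both are "rounded to the nearest tenth".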