
MATH& 146: Midterm Synopsis: CHAPTER 1: Stats Starts Here

This document provides an overview of key concepts in sampling and statistics. It discusses different sampling methods like simple random sampling, stratified random sampling, and cluster sampling. It explains important terminology like population, sample, parameter, and statistic. It also covers potential sources of bias in sampling and measurement like selection bias, measurement bias, and nonresponse bias. The goal is to select samples that accurately represent the overall population and minimize bias.

Uploaded by

charity joy
Copyright
© All Rights Reserved

MATH& 146: Midterm Synopsis

CHAPTER 1: Stats Starts Here


• Data must have context to be useful
• Be able to identify the following in context:
o Who: The individual cases about whom the experimenter is recording characteristics. Typically, these are the
individuals in a subset (i.e. the sample) of the entire group of individuals the experimenter wishes to learn about
(the population). If the entire population is studied, the study is referred to as a census.
o What: The variable(s) that the experimenter is collecting information about -- categorical or
quantitative.
o How: The process by which the data are collected often makes the difference between useful and useless
information.
o When & Where: Provides context on the study.

CHAPTER 11: Sample Surveys


• A census is a data collection method where the entire population is measured. In most cases this is not possible,
and instead the experimenter studies a subset of the population called a sample.
• The method used to collect a sample is very important. Poor sampling techniques will ruin any attempt at
analysis!
• Population: This is the group of cases or individuals (e.g. people, cars, candies, etc.) that the experimenter wishes
to study, but usually it is impossible to study the entire population for logistical and/or financial reasons. The size
of the population is denoted by N.
• Sampling Frame: Since you can’t measure every individual in the population, you’ll need to select a
representative sample instead. Often, the method used to select a sample will not actually be able to reach every
individual. The part of the population that you can reach is called the sampling frame. You want the sampling frame
to be as large as possible (the same as the population, if you can).
• Sample: The sample is the group of individuals that are actually selected and measured. The size of the sample is
denoted by n. Ultimately, we will use information about the sample to say something interesting (and
mathematically defensible) about the population. Perhaps (hopefully) we’ll even get some sort of answer to the
question that started the whole process. We want the sample to be representative of the population.
Randomization is when the experimenter selects the sample subjects as randomly as possible in order to protect against
the influence of all features of the population, by making sure that, on average, the sample looks like the entire
population.
The sample needs to be large enough to be representative. Think of a pot of minestrone soup. If you wish to
sample the soup, using a tiny spoon wouldn’t produce a sample big enough to be representative of the entire pot of
soup … it wouldn’t be a sample containing beans, carrots, tomatoes, onions, and pasta. Using a tablespoon or a
soup ladle, however, would produce a sample big enough to get a miniature version of what’s in the entire pot.

Parameter: A population’s parameters are the numerical summaries (e.g. mean, median, IQR, proportion, standard
deviation, etc.) whose values the experimenter would like to know for the entire population. Unless a census is
performed, which is often not possible for logistical and financial reasons, we use values calculated from a sample,
called statistics, to estimate the population parameters.
Statistic: A sample’s statistics are the numerical summaries (e.g. mean, median, IQR, proportion, standard
deviation, etc.) that the experimenter computes from the data collected on the members of the sample. The more
representative the sample is of the population, the better the statistic will reflect the value of the
parameter.
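To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population is made-up data (N = 1000 hypothetical ages), so for once we can actually compute the parameter (mu) and compare it with the statistic (x_bar) from one simple random sample of n = 50. With real data, only the statistic would be available.

```python
import random
import statistics

random.seed(146)  # fixed seed so the run is reproducible

# Hypothetical population: ages of N = 1000 people (made-up data).
population = [random.randint(18, 80) for _ in range(1000)]

# Parameter: a number describing the whole population (requires a census).
mu = statistics.mean(population)

# Statistic: the same kind of number, computed only from a sample of n = 50.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter mu = {mu:.2f}, statistic x_bar = {x_bar:.2f}")
```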
Good Sampling Strategies
Simple Random Sample
The best (theoretically, at least) kind of sample is the Simple Random Sample. A Simple Random Sample (SRS; of
size n) is chosen in such a way that every possible group (of size n) from the population has an equal chance of being
selected. The best example to get the idea of an SRS is drawing names out of a hat (or slips of paper out of a box…or
something equivalent).

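Since every group has an equal chance of being selected, every individual also has an equal chance. This is not
reversible! If every individual has an equal chance, there is no guarantee that every group has an equal chance. For
example, if a class of 30 is split into three fixed rows of 10 and one row is chosen at random, every student has a 1/3
chance of being selected, but a group of 10 that mixes students from different rows has no chance at all.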

Theoretically, the simple random sample gives you the best chance of obtaining a sample that is representative of the
population. Also, this type of sample makes probability calculations fairly simple. In fact, all of the inference
procedures that you’ll learn in this course are based on the simple random sample.
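Drawing names out of a hat can be mimicked in Python with `random.sample`, which picks n items without replacement so that every group of size n is equally likely. The sketch below (the five names are hypothetical) also repeats the draw 10,000 times to check the claim above that every individual ends up with the same chance, here n/N = 2/5.

```python
import random
from collections import Counter

def simple_random_sample(population, n, rng=random):
    """Draw n individuals without replacement; every group of size n is equally likely."""
    return rng.sample(population, n)

names = ["Ana", "Ben", "Cai", "Dee", "Eli"]   # hypothetical population, N = 5
rng = random.Random(0)                        # seeded for a reproducible run
trials = 10_000
counts = Counter()
for _ in range(trials):
    counts.update(simple_random_sample(names, 2, rng))

# Each individual should be chosen in roughly n/N = 2/5 = 40% of the samples.
for name in names:
    print(name, counts[name] / trials)
```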

Stratified Random Sample


To take a Stratified Sample, first divide the population into homogeneous groups (strata). Now take an SRS from each
group. Combine these individual SRSs into a single sample from the population.

For example, I could divide the student body into Freshmen, Sophomores, Juniors and Seniors; then take an SRS from
each group; and finally combine these to make a single sample.

The point of this type is to ensure that known subgroups of the population are represented in the sample. We won’t talk
about why you would use this version instead of a simple random sample—that’ll come if you take any more statistics
courses at the University level.
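A minimal sketch of the stratified procedure, assuming a hypothetical student body of 100 students split evenly across the four class years and an equal number (5) taken from each stratum. In practice the number drawn from each stratum is often made proportional to the stratum's size; this sketch does not do that.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, rng=random):
    """Group the population into strata, take an SRS from each, and combine them."""
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))  # SRS within the stratum
    return sample

# Hypothetical student body: 100 (student_id, class_year) pairs, 25 per class year.
years = ["Freshman", "Sophomore", "Junior", "Senior"]
students = [(i, years[i % 4]) for i in range(100)]

sample = stratified_sample(students, lambda s: s[1], 5, random.Random(1))
print(len(sample))  # 4 strata x 5 students each = 20
```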

Cluster Random Sampling


Cluster Samples are like stratified samples in that there are strata in the population. The difference is that a cluster
sample uses naturally occurring groups that are heterogeneous (“all mixed up”). The idea is that each group (each
cluster) is individually representative of the population—so we take a sample of those clusters (to help deal with
random variation in the various clusters).

Cluster sampling is a good way to get a representative slice of the population -- assuming, of course, that there isn’t
some other reason why the individuals are clustered together.
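Here is a sketch of one-stage cluster sampling, where every member of each chosen cluster is measured. The homerooms are hypothetical and are simply assumed to be heterogeneous slices of the school; the code cannot check that assumption for you.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly choose whole clusters, then include every member of each one."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical clusters: 10 homerooms of 20 students, each assumed to be a
# "mixed-up" (heterogeneous) slice of the school.
homerooms = {f"R{r}": [f"R{r}-S{s}" for s in range(1, 21)] for r in range(1, 11)}

sample = cluster_sample(homerooms, 3, random.Random(2))
print(len(sample))  # 3 clusters x 20 students = 60
```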

Systematic Random Sampling


In a Systematic Random Sample, there is some “stream” of individuals from which to select. The method involves
selecting every kth individual (every 10th, or every 50th, for example).

The first individual is selected randomly. After that, the system kicks in … pick every kth individual from the first one
selected. This method is useful in a lot of cases -- people entering a sporting event, bottles rolling off of an assembly
line.
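A sketch of systematic sampling, assuming the individuals arrive as an ordered list (a hypothetical stream of 500 fans) and k = 50. The random start is drawn from the first k positions, and Python's slice step handles the "every kth" part.

```python
import random

def systematic_sample(stream, k, rng=random):
    """Pick a random starting point among the first k, then every kth individual."""
    start = rng.randrange(k)   # random start: a position from 0 to k - 1
    return stream[start::k]    # slice step k = "every kth individual"

# Hypothetical stream: 500 fans entering a stadium, numbered in order of arrival.
fans = list(range(1, 501))
sample = systematic_sample(fans, 50, random.Random(3))
print(sample)  # 10 fans, each 50 arrivals apart
```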

Multistage Sampling
A multistage sample combines two or more of the above methods to form a sample. This is, actually, a widely used
technique. Example: Separate high school students by grade, then randomly choose two classrooms from each grade,
and finally randomly choose 10 students from each classroom; this involves the Stratified, Cluster and SRS sampling
strategies.
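The high-school example above can be sketched in Python as three nested stages. The school layout (4 grades, 6 classrooms per grade, 30 students per classroom) is invented for illustration.

```python
import random

def multistage_sample(school, rooms_per_grade, students_per_room, rng=random):
    """Stage 1: stratify by grade. Stage 2: randomly pick classrooms (clusters)
    within each grade. Stage 3: take an SRS of students from each chosen classroom."""
    sample = []
    for grade, classrooms in school.items():                          # strata
        for room in rng.sample(list(classrooms), rooms_per_grade):    # clusters
            sample.extend(rng.sample(classrooms[room], students_per_room))  # SRS
    return sample

# Hypothetical school: grade -> {classroom name -> list of student IDs}.
school = {
    grade: {f"{grade}-room{r}": [f"{grade}-room{r}-s{s}" for s in range(30)]
            for r in range(6)}
    for grade in (9, 10, 11, 12)
}
sample = multistage_sample(school, 2, 10, random.Random(4))
print(len(sample))  # 4 grades x 2 classrooms x 10 students = 80
```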

Bad Methods of Sampling


Convenience Sampling
When the subjects are chosen based on ease of access, then you have a Convenience Sample. For example, if I want to
know about the opinions of Shoreline CC students, and I decide to sample from students in my classes only, then I’ve
chosen my sample from those that are easiest to reach -- that’s a convenience sample. It’s a pretty good bet that
convenience samples are not representative of the population.

Voluntary Response Sampling


When the subjects select themselves for the sample, then you have a Voluntary Response Sample. For example, call-in
polls use voluntary response (since the caller decides to call in, and become part of the sample). Voluntary Response
is bad -- there is often some reason why the subjects join the sample, and this reason may have some effect on what
you are trying to measure.

Problems with Measurement


Bias: Something that creates results that are different from what they should be; in other words, something that
systematically favors certain outcomes or measurements over others.

Selection Bias: This type of bias is introduced during the selection process where certain individuals are given greater
(than intended) probabilities of being selected, or are excluded from the selection process. Failing to include all
individuals in the selection process is often called undercoverage. Notice that a voluntary response sample (or even a
convenience sample) automatically suffers from selection bias! Using a good sampling method should minimize any
selection bias.

Measurement Bias: This type of bias is introduced when the measurement process tends to give results that differ
(systematically) from the true values in the population. For example, if a light meter is not properly calibrated, then
the measurements it gives will not be correct! A common source of measurement bias is wording bias: the way in which
a question is worded can often have an effect on the responses.
Some people use the terms measurement bias and response bias synonymously, but they don’t quite mean the same
thing. Measurement bias refers exclusively to problems with the measurement device, whereas response bias refers to
anything that might affect the measured results; for example, you are likely to get very different answers to a survey if
the person conducting the survey is disguised.

Nonresponse Bias: This type of bias is introduced when individuals (people, mostly) refuse to be measured and/or
refuse to answer questions. Telephone surveys suffer from this. This type of bias is almost unavoidable, so minimizing
its effects is important. There are a host of methods (beyond the scope of this course) to address this.

CHAPTER 12: Experiments and Observational Studies


An observational study is when we observe a group of subjects and note what happens to them over time.
A retrospective study identifies subjects and looks at their pasts.
A prospective study identifies subjects and collects data as events unfold.

An observational study can never show a causal relationship between variables; there can always be other unknown
variables, such as lurking or confounding variables, at work. Observational studies can show trends and possible
relationships that can then be investigated through experimentation.

An experiment is where the experimenter makes a change to one variable (variation is introduced) and changes in
another variable are observed (variation is measured). In an ideal situation, that’s all the variation there is…in reality,
there are other sources of variation. Identify them. Control them when you can. Account for them when you can’t.
An experiment has at least one explanatory variable, otherwise called a factor, and at least one response variable to
measure.
The keys to a well-run experiment are active and deliberate manipulation of factors and random assignment of subjects
to treatments.
Levels: The levels in an experiment are the specific values of the factors used. For example, for the explanatory
variable or factor “Aspirin Dosage,” the (three) levels might be 100 mg, 200 mg, and 300 mg.
Treatments: Each treatment represents a different combination of levels from all the factors that a subject receives.
For example, suppose one factor in an experiment is “Aspirin Dosage” and it has the three levels -- 100 mg, 200 mg,
and 300 mg -- and a second factor in the same experiment is “Minutes of Exercise” and it has two levels -- 0 minutes or
30 minutes -- then there are 3 × 2 = 6 treatments in this experiment:
Treatment #1: 100 mg aspirin, 0 mins exercise
Treatment #2: 200 mg aspirin, 0 mins exercise
Treatment #3: 300 mg aspirin, 0 mins exercise
Treatment #4: 100 mg aspirin, 30 mins exercise
Treatment #5: 200 mg aspirin, 30 mins exercise
Treatment #6: 300 mg aspirin, 30 mins exercise
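Because each treatment is one combination of levels, the full list of treatments is the Cartesian product of the factors' level lists. A quick sketch with `itertools.product`, using the aspirin and exercise levels from the example above (the listing order differs from the numbering above):

```python
from itertools import product

aspirin_mg = [100, 200, 300]   # levels of the factor "Aspirin Dosage"
exercise_min = [0, 30]         # levels of the factor "Minutes of Exercise"

# Every combination of one level from each factor is a treatment.
treatments = list(product(aspirin_mg, exercise_min))
print(len(treatments))  # 3 x 2 = 6
for dose, minutes in treatments:
    print(f"{dose} mg aspirin, {minutes} mins exercise")
```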

A control treatment provides a baseline measurement; it could be either a previous treatment or no treatment at all.

In an experiment, factor levels are manipulated to create treatments. Subjects are randomly assigned to treatment
levels. The subject groups’ responses across treatment levels are compared.

NOTES:

A factor is the same thing as an explanatory variable, which in algebra, we called the independent variable (x).

The values of a factor are referred to as levels.

A treatment is the name given to each specific set of values on the factors. For example, if the two factors are 'hair
color' and 'socioeconomic status (SES)' with values red, brown, black, and blonde for 'hair color' and poor, middle
class, and rich for SES, then one of the many treatments would be the “poor blonde”.

Blinding: Knowledge of which treatments are being applied can affect the interpretation of results. Two groups can
be blinded:
Influencers: Subjects, treatment administrators, technicians, for example.
Evaluators: Judges, examiners, for example.

When everyone in one group -- Influencers or Evaluators -- is blinded, the experiment is single blind.
When everyone in both groups -- Influencers and Evaluators -- is blinded, the experiment is double blind.

Placebo: A placebo is a “fake” treatment that looks like the real thing.
Placebo Effect: The placebo effect is a result when people respond positively to a placebo treatment because they
believe they will get an effect from it.
Four Principles of Experimental Design
Control: Sources of variation other than the factors are controlled by making the conditions as similar as possible
across treatment groups.

Randomize: Randomization allows the experimenter to equalize unknown or uncontrollable sources of variation.
Experimental units must be randomly assigned to treatments. Note that this doesn’t mean that all subjects in the
experiment must be randomly chosen; the subjects must be appropriate for the experiment. It is the administration of
treatments that must be randomly assigned.
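One simple way to randomize the assignment of treatments is to shuffle the subjects and then deal them out to the treatments in turn. This is a sketch under assumed names: the 12 volunteers and the three treatment labels are hypothetical.

```python
import random

def randomly_assign(subjects, treatments, rng=random):
    """Shuffle the subjects, then deal them out to the treatments in turn,
    so group sizes differ by at most one."""
    shuffled = list(subjects)      # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical volunteers: chosen because they suit the study, not at random.
volunteers = [f"subject{i}" for i in range(1, 13)]
groups = randomly_assign(volunteers, ["placebo", "100 mg", "200 mg"], random.Random(5))
for treatment, members in groups.items():
    print(treatment, members)
```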

Replication: Treatment should be applied to more than one subject. And, the entire experiment should be repeated on
different groups of subjects.

Block: Subjects that have similar characteristics are grouped together into blocks, and then treatments are randomized
within each block. People are frequently blocked by age, political views, socioeconomic status, religion, ethnicity, etc.
One way to deal with unknown extraneous variables is randomization, but that isn’t always the best answer. Often, a
better choice is to use a matched pairs design. This involves matching one experimental unit in the experimental
group with one experimental unit in the control group; these two units are either selected because they are identical or
very similar, or they are made to be very similar. Often, they are matched by things like age, exercise habits, income
level, etc.
Just because treatment groups have different results doesn’t mean that the results are important. If the differences
are larger than would be caused by randomization alone, then we say they are statistically significant.
Experimental results only apply to the group under study.

Three Main Types of Experiments


Completely Randomized Design: This is a design where all treatments are assigned at random.
Randomized Block Design: In this design, subjects are first blocked by some variable, then assigned treatments
(randomly) within each block. Thus, the experiment is split up into lots of little experiments (one per block).
Matched Pairs Design: This could be thought of as a very special kind of block design, where each block consists of
exactly two individuals that have been paired by one or more variables—in other words, they are similar or identical in
some manner. Within each pair, one individual is (randomly) assigned to the treatment, and the other to the control.
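A randomized block design runs a separate random assignment inside each block, and a matched pairs design is the special case where each block holds exactly two individuals and the two "treatments" are treatment and control. A minimal sketch, with hypothetical pairs:

```python
import random

def randomized_block_design(blocks, treatments, rng=random):
    """Randomly assign treatments separately within each block."""
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Matched pairs: blocks of exactly two similar (hypothetical) subjects.
pairs = {"pair1": ["Ann", "Amy"], "pair2": ["Bob", "Bill"], "pair3": ["Cy", "Cal"]}
assignment = randomized_block_design(pairs, ["treatment", "control"], random.Random(6))
print(assignment)
```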

CHAPTER 2: Displaying and Describing Categorical Data


An important step to use with any kind of data is to make a picture. Follow the “Area Principle” -- the magnitude of
area should correspond to the value it represents.
Pie Charts (the whole group of cases shown as a circle), Bar Charts, and Segmented Bar Charts are commonly used tools
to graphically convey data collected on a categorical variable. Bars in bar charts should be the same width, and bars
corresponding to adjacent categories should not touch. In a Segmented Bar Chart, each bar represents a whole and all
bars are the same height, so that the relative frequencies of the different ‘values’ of a categorical variable can be
compared across two or more different groups.
Titles often accompany graphs; axes should be labeled; scales should be given.
Be aware of area and scales. It is easy to misrepresent data using inaccurate graphs or scales.
Another important step is to organize the data. With data collected on a single categorical variable, we do this by
either constructing a frequency table (records total counts in each category) or a relative frequency table (records
proportions or percentages in each category).
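A frequency table and a relative frequency table for a single categorical variable can be built with Python's `collections.Counter`. The snack responses here are made up for illustration.

```python
from collections import Counter

# Hypothetical responses on one categorical variable (favorite snack).
responses = ["chips", "fruit", "chips", "candy", "fruit", "chips", "nuts", "chips"]

freq = Counter(responses)                                     # frequency table (counts)
n = len(responses)
rel_freq = {category: count / n for category, count in freq.items()}  # proportions

for category, count in freq.most_common():
    print(f"{category:<6}{count:>3}{rel_freq[category]:>8.1%}")
```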
A Contingency Table, also known as a Two-Way Table, is a table that contains information about two different
categorical variables. The numbers inside the table give the count for each combination of categories; dividing a row
(or column) of these counts by its total gives a conditional distribution. The category totals, often written along the
right side and bottom of the table, are the marginal distributions. We can use a contingency table to see if there is an
association (relationship) between the variables or if they are independent. If the conditional distribution of one
variable is the same for all categories of the other variable, we can say that the variables are independent.
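The sketch below, on a made-up two-way table, computes each row's conditional distribution (row counts divided by the row total) and the column marginal totals. In this invented table the two conditional distributions come out identical, which is the pattern described above for independent variables; real data rarely match exactly.

```python
# Hypothetical two-way table: rows = class year, columns = favorite study spot.
table = {
    "Freshman": {"Library": 30, "Home": 50, "Cafe": 20},
    "Senior":   {"Library": 15, "Home": 25, "Cafe": 10},
}

# Conditional distribution of study spot for each class year: divide by the row total.
cond = {
    row: {col: count / sum(cells.values()) for col, count in cells.items()}
    for row, cells in table.items()
}

# Marginal totals for study spot: the column totals along the bottom of the table.
columns = list(next(iter(table.values())))
col_totals = {col: sum(cells[col] for cells in table.values()) for col in columns}

for row, dist in cond.items():
    print(row, {col: f"{p:.0%}" for col, p in dist.items()})
print("Column totals:", col_totals)
```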

CHAPTER 3: Displaying and Summarizing Quantitative Data


A histogram is the primary means of displaying data collected on quantitative variables.

While a histogram looks like a bar chart, they are different. Besides the fact that a bar chart is used for data collected
on a categorical variable and a histogram is used to graphically represent data collected on a quantitative variable,
another difference is that there are no gaps between adjacent bins in a histogram.

The height of each bar in a histogram represents the number of data values in that bin. The horizontal axis
consists of values of the quantitative variable; this axis is split into equal-sized bins which divide the numerical data
into ranges. A ‘good’ histogram has between 4 and 10 bins.

Histograms can show frequency or relative frequency; either choice will create similar looking distributions.
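Counting how many values fall in each equal-width bin is the arithmetic behind a histogram's bar heights. A minimal sketch, assuming hypothetical exam scores, bins 10 points wide starting at 50, and bins that include their left edge but not their right edge:

```python
import math

def histogram_counts(data, bin_width, start):
    """Count values in equal-width bins [start, start + w), [start + w, start + 2w), ...
    Assumes start <= min(data)."""
    n_bins = math.floor((max(data) - start) / bin_width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // bin_width)] += 1
    return counts

# Hypothetical exam scores.
scores = [52, 61, 64, 68, 70, 71, 75, 77, 78, 82, 85, 88, 91, 96]
counts = histogram_counts(scores, bin_width=10, start=50)
for i, c in enumerate(counts):
    left = 50 + 10 * i
    print(f"[{left}, {left + 10}): {'*' * c}")
```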

When you describe a quantitative variable’s distribution, think SOCS: Shape, Outliers, Center, Spread.
Shape: What does it look like? Unimodal, bimodal, uniform, symmetric? Is it skewed, and if so is it skewed left or
skewed right (the ‘tail’ points in the direction of the skew)?

Outliers: Are there data points that appear unusually small or large compared to the rest of the distribution? Not all
distributions have outliers. You can precisely determine whether or not a data value is an outlier by seeing if it lies
beyond the fences. See Tukey’s Rule.

Center: Mean, Median, Mode, and Midrange are four measures of center.

The mean, which is often called the average by many non-statisticians, is the sum of all the data values divided by
the number of data values. If the quantitative variable is named x, then the population’s mean is
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, which written less formally is $\mu = \frac{\sum x}{N}$.

The mean of a sample’s data set is denoted by the variable name (e.g. x, y, weight, etc.) with a bar over it (e.g. $\bar{x}$,
$\bar{y}$, $\overline{weight}$, etc.). Thus, for a quantitative variable named x, a sample’s mean is
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, which written less formally is $\bar{x} = \frac{\sum x}{n}$.
One can estimate the mean of a distribution from its histogram by finding the ‘fulcrum’ -- the place where the
distribution would balance; think of the balance point on a see-saw or teeter-totter.
The median is the middle value of an ordered data set. If there is an even number of data values, then the median
is the mean of the middle two values.

The median of a sample’s data set is denoted by the variable name (e.g. x, y, weight, etc.) with a tilde over it (e.g.
$\tilde{x}$, $\tilde{y}$, $\widetilde{weight}$, etc.).

One can estimate the median of a distribution from its histogram by estimating the location on the horizontal
axis where the area of the rectangles to the left equals the area of the rectangles to the right.

The mode is the most frequent value in the set of data.


One can estimate the mode(s) of a distribution by finding the tallest rectangle(s).

See https://www.youtube.com/watch?v=ik7LAJg432k
and https://www.youtube.com/watch?v=-haVXyCHMSs for illustrations.

The midrange is the mean of the smallest and largest values in the data set.
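The four measures of center can be computed by hand in a few lines of Python; the sketch below does so for a small hypothetical data set and cross-checks the mean and median against Python's `statistics` module. `statistics.multimode` returns every value tied for most frequent.

```python
from statistics import mean, median, multimode

data = [4, 8, 6, 5, 3, 8, 9]                 # hypothetical sample, n = 7

x_bar = sum(data) / len(data)                # mean: sum of values / number of values
ordered = sorted(data)
mid = len(ordered) // 2
if len(ordered) % 2 == 1:
    x_tilde = ordered[mid]                   # odd n: the middle value
else:
    x_tilde = (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of the middle two
modes = multimode(data)                      # most frequent value(s)
midrange = (min(data) + max(data)) / 2       # mean of the smallest and largest values

print(x_bar, x_tilde, modes, midrange)
print(mean(data), median(data))              # cross-check with the statistics module
```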

Spread: The dispersion of quantitative data. A measure of how spread-out the data in the data set is.

The minimum and maximum values are often reported.

The range equals Maximum - Minimum.

A data set that has been split into quartiles shows where each 25% (quarter) of the data lies: about the lowest 25% of
the data fall below the first quartile, Q1, and about the highest 25% of the data fall above the third quartile, Q3.

The InterQuartile Range (IQR) measures the spread, or range, of the middle 50% of the distribution:
$IQR = Q_3 - Q_1$, where $Q_1$ is the median of the lower half of the ordered data set and $Q_3$ is the median of the
upper half of the ordered data set.

The Five Number Summary includes the following five numbers: Minimum, First Quartile ($Q_1$), Median,
Third Quartile ($Q_3$), and Maximum, where the median is denoted by $\tilde{\mu}$ if working with the population and
by $\tilde{x}$ if working with a sample and the quantitative variable is named x.

Two other important measures of spread are variance and standard deviation.

A deviation is the difference between two values. The variance and standard deviation both involve the deviation
between a specific value in the data set and the mean of the data set: $x_i - \mu$ and $x_i - \bar{x}$, for data collected in
a population and a sample, respectively, on a quantitative variable named x. In a population, the variance is denoted
by $\sigma^2$ and in a sample it is denoted by $s^2$.

The variance of a set of data collected on a quantitative variable named x in a population is the average of every data
value’s squared deviation from the mean: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$.

The variance of a set of data collected on a quantitative variable named x in a sample is the average (almost) of every
data value’s squared deviation from the mean: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$.

The standard deviation of a set of data is the square root of the variance:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ and $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
for a population and a sample, respectively.

Note: Although the standard deviation of a sample’s data set is typically denoted by s, sometimes the variable for
which the standard deviation has been computed may appear as a subscript, such as $s_x$, $s_y$, and $s_{weight}$, for
example.

In summary, $\text{variance} = (\text{standard deviation})^2$ and $\text{standard deviation} = \sqrt{\text{variance}}$.
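The two variance formulas differ only in the divisor: N for a population and n − 1 for a sample. This sketch works through a hypothetical five-value data set (2, 4, 4, 5, 10), small enough to check by hand: the mean is 5, the squared deviations are 9, 1, 1, 0, and 25, which sum to 36, so σ² = 36/5 = 7.2 and s² = 36/4 = 9.

```python
import math
import statistics

data = [2, 4, 4, 5, 10]                          # hypothetical data set, 5 values

n = len(data)
x_bar = sum(data) / n                            # 25 / 5 = 5
squared_devs = [(x - x_bar) ** 2 for x in data]  # 9, 1, 1, 0, 25
ss = sum(squared_devs)                           # sum of squared deviations = 36

pop_var = ss / n           # sigma^2: treat the data as a whole population (divide by N)
samp_var = ss / (n - 1)    # s^2: treat the data as a sample (divide by n - 1)
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print(pop_var, samp_var, pop_sd, samp_sd)
print(statistics.pvariance(data), statistics.variance(data))  # library cross-check
```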

Other ways to display data collected on a quantitative variable include Dot Plots, Stem-and-Leaf Plots (also called
Stemplots), and Boxplots (also called Box & Whisker Plots). Boxplots are nearly as popular as histograms for visually
displaying quantitative data.

A Boxplot is a drawn-to-scale graph of the Five Number Summary. If an outlier exists in a set of data (see Tukey’s
Rule below for calculation details), and one chooses to distinguish it as a point separate from the boxes and whiskers
of a traditional boxplot, then this is called a Modified Boxplot.

Tukey’s Rule: An outlier is defined to be any value less than the Lower Fence or any value greater than the
Upper Fence, where Lower Fence = Q1 - 1.5(IQR) and Upper Fence = Q3 + 1.5(IQR).
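Here is a sketch of the Five Number Summary and Tukey's fences for a hypothetical data set of nine values. It finds Q1 and Q3 as the medians of the lower and upper halves; when n is odd it leaves the overall median out of both halves, which is an assumed convention, and other software may compute quartiles slightly differently.

```python
def median_of(values):
    """Median of a list: the middle value, or the mean of the middle two."""
    v = sorted(values)
    m = len(v) // 2
    return v[m] if len(v) % 2 == 1 else (v[m - 1] + v[m]) / 2

def five_number_summary(data):
    """(Min, Q1, Median, Q3, Max), with Q1 and Q3 the medians of the lower and
    upper halves; for odd n the overall median is left out of both halves."""
    v = sorted(data)
    half = len(v) // 2
    lower = v[:half]
    upper = v[half + len(v) % 2:]   # skip the middle value when n is odd
    return v[0], median_of(lower), median_of(v), median_of(upper), v[-1]

def tukey_fences(q1, q3):
    """Lower Fence = Q1 - 1.5(IQR), Upper Fence = Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [3, 7, 8, 9, 10, 11, 12, 13, 30]        # hypothetical data, n = 9
mn, q1, med, q3, mx = five_number_summary(data)
lower_fence, upper_fence = tukey_fences(q1, q3)
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print((mn, q1, med, q3, mx), "IQR =", q3 - q1)
print("fences:", lower_fence, upper_fence, "outliers:", outliers)
```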

For data sets containing more than 10 values, the computation of a quantitative variable’s descriptive statistics, as well
as the construction of a graph, is typically done using statistical functions within a particular technology (e.g.
Excel, StatCrunch, R, a Texas Instruments calculator such as the TI-84, an app, etc.).

You should, however, be able to determine the mean, median, mode, midrange, range, 5-Number Summary, lower
fence, and upper fence of a data set comprised of no more than 10 values using paper-and-pencil calculations. You
should be able to compute the variance and standard deviation of a data set comprised of no more than 5 values using
paper-and-pencil calculations.

CHAPTER 4: Understanding and Comparing Distributions


A timeplot is a graph of a quantitative variable (on the vertical axis) with time on the horizontal axis.

Boxplots help one see the lower 25% of a data set, the middle 50% of a data set, and the upper 25% of a data set. This
can be helpful when comparing multiple distributions, as can identifying the locations of each distribution’s Five
Number Summary values (i.e. the minimums, the first quartiles, the medians, the third quartiles, and the
maximums).
When comparing distributions using histograms or boxplots, be sure to compare SOCS: Shape, Outliers, Center, Spread.

Example: The age distributions of the residents in two neighboring counties in Florida are quite different. The box-
plots show these distributions, where the ages are measured in years. Describe these populations.

[Figure: boxplots of residents’ ages (in years) for County 1 and County 2]

Applying SOCS to describe the ages of the residents in these two counties, we see that each age distribution
is skewed right. Consequently, we would expect that in each county the mean age is higher than the median
age of the residents.

The median age of County 1’s population is 44 years while the median age of County 2’s population is 29 years.
Thus, typically, the residents of County 1 are older than those people living in County 2. More specifically,
since the median (i.e. second quartile) of County 1, 44 years, is approximately equal to County 2’s third
quartile, 43 years, we can deduce that about 50% of County 1’s residents are at least 44 years old while only
about 25% of County 2’s residents are 43 years or older.

In County 1, the first, second, and third quarters of the data (from the minimum to Q1, from Q1 to the median, and
from the median to Q3) span approximately the same range -- 23 years, 21 years, and 20 years, respectively. The
same is true for the age distribution of County 2, where these ranges are 15 years, 14 years, and 14 years, respectively.

Since the IQR of the ages for County 1 is 64 – 23 = 41 years while the IQR of the ages for County 2 is
43 – 15 = 28 years, we know that the ages of the middle 50% of the residents in County 1 are more dispersed
(i.e. more spread out) than the ages of the middle 50% of the residents in County 2.

The Upper Fence for County 1 is Q3 + (1.5)(IQR) = 64 + (1.5)(41) = 125.5 years and the Lower Fence for
County 1 is Q1 – (1.5)(IQR) = 23 – (1.5)(41) = −38.5 years. Since the minimum age in County 1 is 0 years
and the maximum age is 109 years, no resident’s age is less than the Lower Fence of −38.5 years or greater
than the Upper Fence of 125.5 years, so we can conclude that there are no outliers in the age distribution of
County 1.

Similarly, the Upper Fence for County 2 is Q3 + (1.5)(IQR) = 43 + (1.5)(28) = 85 years and the Lower
Fence for County 2 is Q1 – (1.5)(IQR) = 15 – (1.5)(28) = −27 years. Since the minimum age in County 2
is 0 years and the maximum age is 108 years, there is at least one resident whose age is an outlier: although
no one’s age is less than the Lower Fence of −27 years, at least one person (the person who is 108 years old)
has an age greater than the Upper Fence of 85 years.
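The fence arithmetic for the two counties can be checked with a short script. The quartiles, minimums, and maximums are the values read from the boxplots in the example above; `tukey_fences` is a hypothetical helper name. Note that comparing only the minimum and maximum with the fences tells us whether any outlier exists, not how many there are.

```python
def tukey_fences(q1, q3):
    """Return (Lower Fence, Upper Fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values (in years) read from the two boxplots in the example above.
counties = {
    "County 1": {"min": 0, "q1": 23, "q3": 64, "max": 109},
    "County 2": {"min": 0, "q1": 15, "q3": 43, "max": 108},
}

has_outlier = {}
for name, c in counties.items():
    lower, upper = tukey_fences(c["q1"], c["q3"])
    has_outlier[name] = c["min"] < lower or c["max"] > upper
    print(f"{name}: IQR = {c['q3'] - c['q1']}, fences = ({lower}, {upper}), "
          f"outlier present: {has_outlier[name]}")
```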

CHAPTER 13: From Randomness to Probability

Basic Terms
A random experiment is any activity which has a measurable result. The term measurable is used in a very open
sense—it does not imply that numbers are involved! Thus, the results can be quantitative or qualitative.
A trial is each occasion on which we observe a random phenomenon.
An outcome is any (simple) result of one trial of a random experiment. It is the value of the random phenomenon during
a trial. When listing outcomes, we’ll use some short description (numbers, letters, words, something) to identify each
outcome.
An event is a group (set) of related outcomes. We typically use capital Roman letters (e.g. A, C, etc.) to identify
events.
The sample space is the group (set) of all possible outcomes for a random experiment. Since this is sometimes called
the universe, the Roman capital U is often reserved to identify the sample space.

There are three different types of probability.


Subjective probability is probability based on personal beliefs. You hear people say things like “I think your
chances of winning tomorrow’s race are 70-30.” This is the least useful type of probability.
Empirical probability is based on actual observations: we flipped a coin 500 times and 489 of the flips came up
heads, so we say that the probability of heads is 489/500. In symbols,
$P(E) = \frac{\text{number of times } E \text{ occurs}}{\text{total number of trials}}$. This is a useful way to do things,
but not exactly how we approach probability. Remember, empirical probability is based on observation.
Theoretical probability: think of this as empirical probability with an infinite number of trials. For example, we’d
expect that out of an infinite number of flips, half of them should land on heads. This approach is also called a
frequentist approach (where probability is defined in terms of a long-term frequency), and this is the approach we use.
Theoretical probability is, therefore, based on a mathematical model, not observation.

Law of Large Numbers: As the number of independent trials increases, the long-run relative frequency of repeated
events approaches a single value … it approaches the theoretical probability’s value.
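A quick simulation illustrates the Law of Large Numbers. It assumes a fair coin (probability of heads 0.5) and uses Python's pseudo-random generator with a fixed seed, so the exact relative frequencies printed are just one run; the pattern to look for is that they settle closer to 0.5 as the number of flips grows.

```python
import random

rng = random.Random(146)   # fixed seed: one reproducible run of simulated flips
heads = 0
checkpoints = (10, 100, 1_000, 10_000, 100_000)
for flips in range(1, 100_001):
    heads += rng.random() < 0.5          # one fair-coin flip: True (1) counts as heads
    if flips in checkpoints:
        # empirical probability = times heads occurred / total number of trials
        print(f"{flips:>7} flips: relative frequency of heads = {heads / flips:.4f}")
```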
The (theoretical) probability of an event is defined as the number of outcomes in the event divided by the number of
outcomes in the universe: $P(E) = \frac{|E|}{|U|}$, where $|E|$ represents the number of outcomes in the event and $|U|$
represents the number of outcomes in the universe. That is, the vertical bars mean “number of” or “size of,” not
“absolute value.” This formula works only if the outcomes in the universe are all equally likely.
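When the outcomes are equally likely, |E|/|U| can be computed by listing the universe and counting. A sketch for rolling two fair dice (an example chosen for illustration), with `Fraction` keeping the answer exact:

```python
from fractions import Fraction
from itertools import product

# Universe for rolling two fair six-sided dice: 36 equally likely ordered outcomes.
U = list(product(range(1, 7), repeat=2))
E = [roll for roll in U if sum(roll) == 7]   # event: the dice sum to 7

p = Fraction(len(E), len(U))                 # P(E) = |E| / |U|
print(len(E), len(U), p)                     # 6 36 1/6
```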

Properties of Theoretical Probability

Probability is a number (a fraction, a reduced fraction, a decimal, or a percentage) between 0 and 1 (0% and 100%) for
event A. Symbolically, $0 \le P(A) \le 1$. If $P(A) = 0$, then event A cannot occur; if $P(A) = 1$, then event A always
occurs.
The set of all possible outcomes of a trial must have probability equal to 1. In symbols, $P(U) = 1$.

$A^c$ represents the complement of event A and consists of all outcomes in the universe that are not in A; together,
A and $A^c$ make up the whole universe ($A \cup A^c = U$).

The Complement Rule: $P(A) = 1 - P(A^c)$, or equivalently, $P(A^c) = 1 - P(A)$.


Addition Rule: If events A and B are disjoint, then the probability that one or the other occurs is the sum of their individual probabilities. In symbols, $P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$, where the symbol $\cup$ means ‘union’.

Events are disjoint or mutually exclusive if they have no outcomes in common.

Multiplication or Product Rule: If events A and B are independent events, then the probability that both A and B occur is the product of their individual probabilities. In symbols, $P(A \text{ and } B) = P(A \cap B) = P(A) \cdot P(B)$, where the symbol $\cap$ means ‘intersection’.

Events are independent if the probability that one event occurs in no way affects the probability of the other event
occurring.
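These rules can be checked by brute-force counting on a small universe. The sketch assumes a single roll of a fair six-sided die, so the equally-likely formula $|E|/|U|$ applies; the events A, B, C and the helper `P` are illustrative choices. Exact fractions (Python's `fractions.Fraction`) keep the comparisons free of rounding error.

```python
from fractions import Fraction

# Universe for one roll of a fair six-sided die; all outcomes equally likely.
U = {1, 2, 3, 4, 5, 6}


def P(event: set) -> Fraction:
    """Theoretical probability |E| / |U| (valid only for equally likely outcomes)."""
    return Fraction(len(event & U), len(U))


A = {2, 4, 6}  # "an even number is rolled"
B = {1, 2}     # "a 1 or a 2 is rolled"
C = {1, 3}     # "a 1 or a 3 is rolled" (shares no outcomes with A)

A_complement = U - A

# Complement Rule: P(A) = 1 - P(A^c)
assert P(A) == 1 - P(A_complement)

# Addition Rule for disjoint events: A and C have no outcomes in common.
assert A & C == set()
assert P(A | C) == P(A) + P(C)

# Multiplication Rule: P(A and B) equals P(A) * P(B) here,
# so A and B are independent.
assert P(A & B) == P(A) * P(B)

print(P(A), P(A | C), P(A & B))  # 1/2 5/6 1/6
```

In Python, `|` and `&` on sets are union and intersection, so the code reads much like the symbolic rules.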

Important: Many statistics methods require an Independence Assumption.

Examples

1. A bag contains four marbles—one each of black, white, red and blue. You reach in and pull out two.
(a) List the universe for this experiment.
(b) List the outcomes in the event “at least one marble is black.”
(c) What is the probability that at least one marble is black?

I elect to use a capital B to represent the outcome “marble is black,” a lowercase b to represent the outcome “marble is blue,” a lowercase w to represent the outcome “marble is white,” and a lowercase r to represent the outcome “marble is red.”

(a) The universe, U, is {Bw, wB, Br, rB, Bb, bB, wr, rw, wb, bw, rb, br}.
(b) The event “at least one marble is black” is {Bw, wB, Br, rB, Bb, bB}.
(c) The probability that at least one marble is black is $\frac{6}{12} = \frac{1}{2}$.

Observation: To draw two marbles simultaneously from the bag is the same as drawing the first marble and then
drawing the second without having put the first one back into the bag … it is the same as drawing two marbles without
replacement.

Besides describing the universe (or sample space) in list form, as was done above in part (a), one can also use a tree diagram.

Tree diagrams are a way of showing combinations of two or more events. Tree diagrams are quite helpful when
attempting to show sequences of events and/or when attempting to calculate complicated probabilities.

Each branch is labelled at its end with its outcome, and its probability is written alongside it. Specifically, each branch represents a possible outcome, while an entire path, along multiple branches, represents an event. The Multiplication Rule is applied along one path, while the Addition Rule is applied across paths. The sum of the probabilities on each set of branches leaving a single point must equal 1. The sum of the probabilities of all the complete paths must also equal 1.
I personally choose to create a tree diagram for problems with five or fewer sets of branches; beyond that, the diagram becomes too unwieldy. For larger problems, I look to see if a different mathematical model applies, such as the Central Limit Theorem for Proportions in this course or the Binomial Model in a second probability course.
Below is a tree diagram illustrating the experiment where you reach in and pull out two marbles from a bag containing
four marbles—one each of black, white, red and blue … where sampling is done without replacement. As you can see,
once a marble of a particular color is withdrawn on the first draw, it is impossible in this case for the second marble to
be of the same color since there is only one of each color marble in this problem. Hence, there are four branches
corresponding to the possible outcomes for the first marble and only three for the second marble.

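[Tree diagram: the first draw has four branches (B, w, r, b), each with probability 1/4; from each of these, the second draw has three branches, one for each remaining color, each with probability 1/3.]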
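Thus, the probability of getting at least one black marble among the two randomly chosen marbles is …

$$\begin{aligned}
P(\text{at least one } B) &= P(Bb \cup Bw \cup Br \cup bB \cup wB \cup rB)\\
&= P(Bb) + P(Bw) + P(Br) + P(bB) + P(wB) + P(rB)\\
&= \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{4}\cdot\tfrac{1}{3}\\
&= \tfrac{1}{12} + \tfrac{1}{12} + \tfrac{1}{12} + \tfrac{1}{12} + \tfrac{1}{12} + \tfrac{1}{12}\\
&= \tfrac{6}{12} = \tfrac{1}{2}.
\end{aligned}$$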

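Since this problem involves the phrase “at least”, one could opt to approach the problem by using the complement. That is,

$$\begin{aligned}
P(\text{at least one } B) &= 1 - P(\text{no } B)\\
&= 1 - P(bw \cup br \cup wb \cup wr \cup rb \cup rw)\\
&= 1 - \left[P(bw) + P(br) + P(wb) + P(wr) + P(rb) + P(rw)\right]\\
&= 1 - \left[\tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{4}\cdot\tfrac{1}{3} + \tfrac{1}{4}\cdot\tfrac{1}{3}\right]\\
&= 1 - \left[\tfrac{1}{12} + \tfrac{1}{12} + \tfrac{1}{12} + \tfrac{1}{12} + \tfrac{1}{12} + \tfrac{1}{12}\right]\\
&= 1 - \tfrac{6}{12} = \tfrac{12}{12} - \tfrac{6}{12} = \tfrac{6}{12} = \tfrac{1}{2}.
\end{aligned}$$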

Although in this particular example using the complement required just as many calculations as the direct approach, in problems involving “at least” the complement often requires significantly fewer calculations.
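Both approaches to Example 1 can be reproduced by listing the universe programmatically. This is a minimal sketch, assuming the same single-letter color codes as above and treating the two draws as an ordered pair, which matches the twelve-outcome list in part (a); `itertools.permutations` produces exactly the ordered pairs of different marbles.

```python
from fractions import Fraction
from itertools import permutations

# One marble of each color: B = black, w = white, r = red, b = blue.
marbles = ["B", "w", "r", "b"]

# Two draws without replacement: ordered pairs of *different* marbles.
universe = list(permutations(marbles, 2))  # 4 * 3 = 12 outcomes
at_least_one_black = [pair for pair in universe if "B" in pair]
no_black = [pair for pair in universe if "B" not in pair]

direct = Fraction(len(at_least_one_black), len(universe))
via_complement = 1 - Fraction(len(no_black), len(universe))

print(len(universe), direct, via_complement)  # 12 1/2 1/2
```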

2. A bag contains four marbles: one each of black, white, red and blue. You reach in and pull out one marble, observe its color, and replace it before you draw a second marble.

(a) List the universe for this experiment.


(b) List the outcomes in the event “at least one marble is black.”
(c) What is the probability that at least one marble is black?

I elect to use a capital B to represent the outcome “marble is black,” a lowercase b to represent the outcome “marble is blue,” a lowercase w to represent the outcome “marble is white,” and a lowercase r to represent the outcome “marble is red.”
(a) The universe, U, is {BB, ww, rr, bb, Bw, wB, Br, rB, Bb, bB, wr, rw, wb, bw, rb, br}.
(b) The event “at least one marble is black” is {BB, Bw, wB, Br, rB, Bb, bB}.

(c) The probability that at least one marble is black is $\frac{7}{16}$.


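[Tree diagram: the first draw has four branches (B, w, r, b), each with probability 1/4; because the first marble is replaced, each of these is followed by the same four branches (B, w, r, b) for the second draw, again each with probability 1/4.]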

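Thus, the probability of getting at least one black marble among the two randomly chosen marbles is …

$$\begin{aligned}
P(\text{at least one } B) &= P(BB \cup Bb \cup Bw \cup Br \cup bB \cup wB \cup rB)\\
&= P(BB) + P(Bb) + P(Bw) + P(Br) + P(bB) + P(wB) + P(rB)\\
&= \tfrac{1}{4}\cdot\tfrac{1}{4} + \tfrac{1}{4}\cdot\tfrac{1}{4} + \tfrac{1}{4}\cdot\tfrac{1}{4} + \tfrac{1}{4}\cdot\tfrac{1}{4} + \tfrac{1}{4}\cdot\tfrac{1}{4} + \tfrac{1}{4}\cdot\tfrac{1}{4} + \tfrac{1}{4}\cdot\tfrac{1}{4}\\
&= \tfrac{1}{16} + \tfrac{1}{16} + \tfrac{1}{16} + \tfrac{1}{16} + \tfrac{1}{16} + \tfrac{1}{16} + \tfrac{1}{16} = \tfrac{7}{16}.
\end{aligned}$$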
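Changing one line of the listing approach switches it from sampling without replacement to sampling with replacement. In this illustrative sketch, `itertools.product(..., repeat=2)` allows the same marble to appear in both draws, giving the sixteen ordered outcomes listed in part (a).

```python
from fractions import Fraction
from itertools import product

marbles = ["B", "w", "r", "b"]  # black, white, red, blue

# With replacement, the same marble can be drawn twice: 4 * 4 = 16 ordered outcomes.
universe = list(product(marbles, repeat=2))
at_least_one_black = [pair for pair in universe if "B" in pair]

p = Fraction(len(at_least_one_black), len(universe))
print(len(universe), p)  # 16 7/16
```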

Sampling is done without replacement in the majority of circumstances since researchers don’t typically create a
sample from a population by drawing the first unit (person, TV, etc.), performing the study on them, and then returning
that unit back into the population so that it’s possible that it is drawn again.

As you can see from the previous two examples involving the four colored marbles, this has a sizable effect on the probabilities, since the denominator changes from N to N − 1 after the first unit is drawn from a population of size N. Specifically, in the marble examples the population size was N = 4, and this was the denominator of the fractions associated with the first marble; it dropped by 1 to 3 in the fractions associated with the second marble when sampling was done without replacement. Observe that here the sample size was n = 2 while the population size was N = 4, so the sample size was 50% of the population size.

Fact: Whenever the sample size, n, is larger than 10% of the population size, N, and sampling is done without replacement, the change in the probabilities for the second, third, and later selections is large enough to have a noticeable impact on the probabilities of the various events (sequences of outcomes). So the probability of a given outcome (e.g. marble is blue) on the selections after the first will differ from the probability of that outcome on the first selection (e.g. $\frac{1}{3}$ versus $\frac{1}{4}$).

However, in those cases where the sample size, n, is at most 10% of the population size, N, the effect on the probabilities for the second, third, and later selections is negligible, and so we use the same probability for each selection. That is, if $n \le 0.10N$, or equivalently, $10n \le N$, we can use the same probability value for each outcome on subsequent selections that was used on the first selection. This is referred to as the 10% Condition.
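The effect of the shrinking denominator can be seen by comparing the probability for the first selection, K/N, with the probability for the second selection after a success has been removed, (K − 1)/(N − 1). The populations below are hypothetical (21% successes, chosen to echo the soccer example that follows), and the helper names are illustrative. The marble case corresponds to N = 4 and K = 1: after the black marble is drawn, the chance that the second marble is black is 0/3.

```python
from fractions import Fraction


def ten_percent_condition(n: int, N: int) -> bool:
    """True when the sample is at most 10% of the population (10n <= N)."""
    return 10 * n <= N


def second_selection_probability(N: int, K: int) -> Fraction:
    """P(second selection is a success | first selection was a success)
    when sampling without replacement from N units, K of them successes."""
    return Fraction(K - 1, N - 1)  # one success and one unit have been removed


# Hypothetical populations in which 21% of the units are successes.
for N in (100, 1_000, 100_000):
    K = N * 21 // 100
    first = Fraction(K, N)  # 21/100 for each of these populations
    second = second_selection_probability(N, K)
    print(f"N = {N:>7}: first = {float(first):.5f}, second = {float(second):.5f}")
```

As N grows with the sample held fixed, the second-selection probability becomes practically indistinguishable from the first, which is why reusing one value is reasonable when the 10% Condition holds.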

So, for example, suppose we are told that 21% of all U.S. college students are soccer fans, and we generate a sample by randomly selecting 350 people from among all U.S. college students. Then n = 350 and N is the number of all U.S. college students. Although I don’t know the exact value of N, I do know that N is certainly larger than 3500. Thus, $10n = 10(350) = 3500 \le N$, and so we can use the proportion 0.21 as the value of P(is a soccer fan) for each selected person. Consequently, if we wanted to know the probability that all 350 people sampled are soccer fans, we would compute $P(F \cap F \cap F \cap \cdots \cap F)$, where F represents the outcome “is a soccer fan”, by computing
$(0.21) \cdot (0.21) \cdot (0.21) \cdots (0.21) = (0.21)^{350} \approx 5.981\text{E}{-238}$, or $\approx 5.981 \times 10^{-238}$
(found by using this online calculator
https://keisan.casio.com/calculator).
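The same computation can be done in a few lines of Python instead of an online calculator. This sketch assumes, as the 10% Condition permits, that the probability 0.21 is reused for every one of the 350 selections; the variable names are illustrative.

```python
import math

p_fan = 0.21  # P(is a soccer fan), reused for every selection under the 10% Condition
n = 350

# Multiplication Rule applied across 350 (approximately) independent selections.
prob_all_fans = p_fan ** n
print(f"{prob_all_fans:.3e}")  # 5.981e-238

# The base-10 logarithm of the same probability, computed without forming the
# tiny number itself; useful when p ** n would underflow to 0.0 for much larger n.
log10_prob = n * math.log10(p_fan)
print(round(log10_prob, 4))
```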

3. A single die is rolled once.

(a) List the outcomes in the experiment.


(b) List the outcomes in the event “an even number is rolled.”
(c) What is the probability that an even number is rolled?

(a) The universe is {1, 2, 3, 4, 5, 6}.


(b) The event “an even number” is {2, 4, 6}.
(c) The probability of rolling an even number is $\frac{3}{6} = \frac{1}{2}$.

4. Two dice are rolled and the sum of the pips noted (those dots are called pips).

What is the probability that a sum of nine is rolled?

At first, you might think that the universe is {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, and that's not wrong, but it's not useful: those outcomes are not equally likely, and they aren't as specific as possible. For example, there are multiple ways to roll a sum of five (4 and 1, 3 and 2). Furthermore, the dice are different: you can imagine that one is blue and the other is green. A blue 3 and a green 2 is different from a blue 2 and a green 3!
Thus, a more useful version of this universe is best shown in a table. In the margins of the table, I list one die's result across the top and the other die's result down the left side. The sum of the two dice is in the table's body.

1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12
In this universe of 36 outcomes, four of them result in a sum of nine, so the probability of rolling a sum of nine is $\frac{4}{36} = \frac{1}{9}$.
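The 36-outcome table can also be generated rather than written out by hand. The sketch below lists ordered pairs (first die, second die), which captures the same distinction as the blue and green dice above, and counts the pairs that sum to nine.

```python
from fractions import Fraction
from itertools import product

# 36 equally likely ordered outcomes: (first die, second die).
universe = list(product(range(1, 7), repeat=2))
sum_is_nine = [roll for roll in universe if sum(roll) == 9]

print(sum_is_nine)                                # [(3, 6), (4, 5), (5, 4), (6, 3)]
print(Fraction(len(sum_is_nine), len(universe)))  # 1/9
```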

5. Joe drives through a particular intersection once a day. The probability that the traffic signal at this intersection is green when Joe reaches it is 0.35, the probability that it is yellow when Joe reaches it is 0.04, and the probability that it is red when Joe reaches it is 0.61.

(a) What is probability Joe will hit a red light on two consecutive days?

(b) What is the probability that Joe won’t see a red light until the third day?

(c) What is the probability that Joe will have to stop for a red light at least once during a five-day week?

We assume that whether this intersection's traffic signal is red when Joe arrives on one day is independent of whether it is red when he arrives on any other day.

(a) By the Multiplication Rule, with R representing the outcome “light is red,” we see that
$P(R \cap R) = P(R) \cdot P(R) = (0.61)(0.61) = 0.3721$.

(b) By the Complement Rule, with R representing the outcome “light is red,” $P(R^c) = 1 - P(R) = 1 - 0.61 = 0.39$.
Thus, $P(R^c \cap R^c \cap R) = P(R^c) \cdot P(R^c) \cdot P(R) = (0.39)(0.39)(0.61) = 0.092781$.

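(c) Since the phrase “at least” appears in this problem, I will consider using the Complement Rule. Computing the probability that Joe will have to stop for a red light at least once during a five-day week directly involves computing the probabilities of a sizable number of possible events (31 of them: every five-day sequence except $R^c \cap R^c \cap R^c \cap R^c \cap R^c$):
$R \cap R^c \cap R^c \cap R^c \cap R^c,\; R^c \cap R \cap R^c \cap R^c \cap R^c,\; R^c \cap R^c \cap R \cap R^c \cap R^c,\; R^c \cap R^c \cap R^c \cap R \cap R^c,$
$R^c \cap R^c \cap R^c \cap R^c \cap R,\; R \cap R \cap R^c \cap R^c \cap R^c,\; R \cap R^c \cap R \cap R^c \cap R^c,\; \ldots,\; R \cap R \cap R \cap R \cap R.$
Consequently, I elect to compute

$$\begin{aligned}
P(\text{stop at least once for a red light in five days}) &= 1 - P(\text{no stops for a red light in five days})\\
&= 1 - P(R^c \cap R^c \cap R^c \cap R^c \cap R^c)\\
&= 1 - \left[P(R^c) \cdot P(R^c) \cdot P(R^c) \cdot P(R^c) \cdot P(R^c)\right]\\
&= 1 - \left[(0.39)(0.39)(0.39)(0.39)(0.39)\right]\\
&= 1 - (0.39)^5 \approx 1 - 0.0090224\\
&\approx 0.9909776.
\end{aligned}$$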
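The three parts of the traffic-light problem reduce to a few multiplications. This sketch simply restates the arithmetic above, under the same assumption that different days are independent; the variable names are illustrative.

```python
p_red = 0.61
p_not_red = 1 - p_red  # Complement Rule: 0.39

# Days are assumed independent, so the Multiplication Rule applies.
p_a = p_red * p_red                  # (a) red on two consecutive days
p_b = p_not_red * p_not_red * p_red  # (b) first red light on day 3
p_c = 1 - p_not_red ** 5             # (c) at least one red light in five days

print(round(p_a, 4), round(p_b, 6), round(p_c, 7))  # 0.3721 0.092781 0.9909776
```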
CHAPTER 14: Probability Rules!

The Union of two events contains all outcomes that belong to either event (and possibly to both).

One nice way to think about the union is to use a Venn Diagram. In such a diagram, the universe is depicted as a rectangle, outcomes are points within the rectangle, and events are shapes inside of the rectangle (commonly circles).
Here’s a Venn Diagram of the union (in yellow):
[Figure: Venn diagram of two overlapping events A and B, with the union A ∪ B shaded in yellow.]

The word “or” gets used a lot when talking about unions, but it’s not being used in the same way that most people use it in conversational English. Outside of statistics class, when you use the word “or”, you are almost always excluding the possibility of both things (“do you want to go to the movies or do you want to go shopping?” usually means that only one of the two is an option). The conversational “or” is (usually) an exclusive or (abbreviated xor), but the one we use in statistics is an inclusive or.

To find the probability of the union, you need to count up the outcomes in the two events. Note, though, that if you just add the number of outcomes in the first event to the number of outcomes in the second event, then you might be counting some outcomes twice; in the diagram above, that means adding the outcomes (area) of the almond-shaped middle part twice. Thus, to properly count the outcomes in the union, you need to adjust by subtracting the number of outcomes that belong to both events.

General Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

FYI The reason the formula requires that the probability of $A \cap B$ be subtracted from the sum $P(A) + P(B)$ is that $A \cap B$ was included twice in that sum: once in $P(A)$ and a second time in $P(B)$.

Observe that if events A and B are mutually exclusive (i.e. disjoint), then they cannot happen simultaneously, and so $P(A \cap B) = 0$, leading to the Addition Rule for Disjoint Events we encountered in Chapter 13:
$P(A \cup B) = P(A) + P(B)$.
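The double counting that the General Addition Rule corrects is easy to see on a small example. This sketch assumes one roll of a fair die with two overlapping events, A = “even” and B = “greater than 3”; the events and the helper `P` are illustrative choices, not from the text.

```python
from fractions import Fraction

U = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}           # even number
B = {4, 5, 6}           # number greater than 3


def P(event: set) -> Fraction:
    """Equally likely outcomes: |E| / |U|."""
    return Fraction(len(event), len(U))


naive = P(A) + P(B)                # counts the overlap {4, 6} twice
general = P(A) + P(B) - P(A & B)   # General Addition Rule

assert general == P(A | B)         # both equal 4/6 = 2/3
assert naive != P(A | B)
print(P(A), P(B), P(A & B), P(A | B))  # 1/2 1/2 1/3 2/3
```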

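The Intersection of two events contains all outcomes that are shared by the events (the outcomes they have in common). Here’s the Venn Diagram of that:
[Figure: Venn diagram of two overlapping events A and B, with the intersection A ∩ B shaded.]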
Conditional Probability: the probability that one event will happen given that another event is known to have happened. Remember that the contingency tables we encountered in Chapter 2 showed relative frequencies and conditional distributions.
When you know something has happened, that changes the universe; specifically, it shrinks the universe to the given event. The only part of the other (non-given) event that remains after this shrinkage is the intersection. Thus, the intersection takes the place of the other event, and the given event takes the place of the universe.
The probability that event A will occur given that event B has already occurred is denoted by $P(A \mid B)$.
In a Venn Diagram, observe that with $P(A \mid B)$, event B has already occurred, so the universe shrinks to include only B; since the only part of A that is in B is the intersection $A \cap B$, we see that
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$.

Similarly, $P(B \mid A) = \dfrac{P(B \cap A)}{P(A)}$, and since $P(A \cap B) = P(B \cap A)$, we can also express $P(B \mid A)$ as $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$.

Observe that to compute a conditional probability, divide the probability of both events occurring together by the probability of the event following the “given that” symbol (the vertical line |):
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ and $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$

Example: Suppose we know the following information about 478 elementary school children.

Gender ↓ / Goals →   Grades   Popularity   Sports   Totals
Boy                  117      50           60       227
Girl                 130      91           30       251
Totals               247      141          90       478
From the table, we can deduce the following.

$P(\text{Girl}) = \dfrac{251}{478} \approx 0.525$
$P(\text{Sports}) = \dfrac{90}{478} = \dfrac{45}{239} \approx 0.188$
Suppose we know that the student we select is a girl; then we are working with a conditional probability. Say we are specifically interested in the probability that a student's goal is to excel at sports, given that the student is a girl; then we wish to determine the value of $P(\text{Sports} \mid \text{Girl})$.
$P(\text{Sports} \mid \text{Girl}) = \dfrac{P(\text{Sports} \cap \text{Girl})}{P(\text{Girl})} = \dfrac{30/478}{251/478} = \dfrac{30}{251} \approx 0.120$.

Now, suppose we know that the student we select has the goal of excelling in sports, and we wish to know the probability that the student is a girl. Then we seek the value of $P(\text{Girl} \mid \text{Sports})$.
$P(\text{Girl} \mid \text{Sports}) = \dfrac{P(\text{Girl} \cap \text{Sports})}{P(\text{Sports})} = \dfrac{P(\text{Sports} \cap \text{Girl})}{P(\text{Sports})} = \dfrac{30}{90} = \dfrac{1}{3} \approx 0.333$.

Observe that $P(\text{Sports} \mid \text{Girl}) \ne P(\text{Girl} \mid \text{Sports})$.
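Because both conditional probabilities above are ratios of counts, they can be read straight off the table. This is a minimal sketch using the table's counts; the dictionary layout and the helper names `gender_total` and `goal_total` are illustrative assumptions.

```python
from fractions import Fraction

# Counts from the Gender/Goals table of 478 elementary school children.
counts = {
    ("Boy", "Grades"): 117, ("Boy", "Popularity"): 50, ("Boy", "Sports"): 60,
    ("Girl", "Grades"): 130, ("Girl", "Popularity"): 91, ("Girl", "Sports"): 30,
}
total = sum(counts.values())


def gender_total(gender: str) -> int:
    """Row total for one gender."""
    return sum(v for (g, _), v in counts.items() if g == gender)


def goal_total(goal: str) -> int:
    """Column total for one goal."""
    return sum(v for (_, gl), v in counts.items() if gl == goal)


# P(A | B) = P(A and B) / P(B); the grand total 478 cancels,
# leaving (count in both) / (count in the given category).
p_sports_given_girl = Fraction(counts[("Girl", "Sports")], gender_total("Girl"))
p_girl_given_sports = Fraction(counts[("Girl", "Sports")], goal_total("Sports"))

print(total, p_sports_given_girl, p_girl_given_sports)  # 478 30/251 1/3
```

Swapping which category is “given” changes the denominator, which is exactly why the two conditional probabilities differ.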

General Multiplication Rule: This rule, which does not require that the two events be independent, is simply a restatement of the Conditional Probability Rule. Solving $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ for $P(A \cap B)$, we discover that
$P(A \cap B) = P(A \mid B) \cdot P(B)$.

Recall that independence means that the outcome of one event does not influence the outcome of another.

Formal definition of independent events: A and B are independent if $P(A) = P(A \mid B)$.

Independent ≠ Disjoint. A and B are independent if $P(A) = P(A \mid B)$, while A and B are disjoint if $P(A \cap B) = 0$.

If events are disjoint (i.e. mutually exclusive) and each has a nonzero probability, they cannot be independent: knowing that one occurred tells us that the other did not.
Solving $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$ for $P(A \cap B)$, we discover that $P(A \cap B) = P(B \mid A) \cdot P(A)$.

Examples
Refer to the Gender/Goals example above in order to determine if having the goal of achieving good grades is independent of gender.

A way to approach this problem is to compare the values of $P(\text{Grades} \mid \text{Girl})$ and $P(\text{Grades})$. One could also see if $P(\text{Grades} \mid \text{Boy})$ equals $P(\text{Grades})$. Choosing the former approach, since $P(\text{Grades} \mid \text{Girl}) = \dfrac{130}{251} \approx 0.518$ and $P(\text{Grades}) = \dfrac{247}{478} \approx 0.517$ are approximately the same, having a goal of achieving good grades is (approximately) independent of gender.

Again, refer to the Gender/Goals example above in order to determine if having the goal of excelling at sports is independent of gender.

A way to approach this problem is to compare the values of $P(\text{Sports} \mid \text{Boy})$ and $P(\text{Sports})$. One could also see if $P(\text{Sports} \mid \text{Girl})$ equals $P(\text{Sports})$. Choosing the former approach, since $P(\text{Sports} \mid \text{Boy}) = \dfrac{60}{227} \approx 0.264$ and $P(\text{Sports}) = \dfrac{90}{478} \approx 0.188$ are not the same, nor approximately the same, having a goal of excelling at sports is not independent of gender.
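Both independence checks come down to comparing a conditional probability with the corresponding marginal probability. The sketch below only computes the numbers; deciding whether two values are “approximately the same” remains a judgment call, and no particular cutoff is implied. The table layout and function names are illustrative.

```python
# Gender/Goals counts: rows are genders, columns are goals.
table = {
    "Boy": {"Grades": 117, "Popularity": 50, "Sports": 60},
    "Girl": {"Grades": 130, "Popularity": 91, "Sports": 30},
}
total = sum(sum(row.values()) for row in table.values())  # 478


def p_goal(goal: str) -> float:
    """Marginal probability P(goal)."""
    return sum(row[goal] for row in table.values()) / total


def p_goal_given_gender(goal: str, gender: str) -> float:
    """Conditional probability P(goal | gender)."""
    return table[gender][goal] / sum(table[gender].values())


# Independence would require P(goal | gender) == P(goal).
for goal, gender in [("Grades", "Girl"), ("Sports", "Boy")]:
    print(f"P({goal} | {gender}) = {p_goal_given_gender(goal, gender):.3f}, "
          f"P({goal}) = {p_goal(goal):.3f}")
```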

A bag holds five marbles: one each of red, orange, yellow, green and blue. You reach in and randomly select two marbles. What’s the probability that you get either a red or a green marble?
Choosing to solve this problem by listing the sample space (universe), rather than trying to use a formula, I have {ro, ry, rg, rb, oy, og, ob, yg, yb, gb} for the universe. There are seven outcomes with either a red or a green marble (ro, ry, rg, rb, og, yg, gb). Since there are ten outcomes in the universe, the probability is $\frac{7}{10}$.
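Because the two marbles are drawn at once, the listing above uses unordered pairs. A sketch of the same count, assuming the single-letter color codes above, uses `itertools.combinations`, which produces each unordered pair exactly once.

```python
from fractions import Fraction
from itertools import combinations

colors = ["r", "o", "y", "g", "b"]  # red, orange, yellow, green, blue

# Two marbles drawn at once: unordered pairs, C(5, 2) = 10 outcomes.
universe = list(combinations(colors, 2))
red_or_green = [pair for pair in universe if "r" in pair or "g" in pair]

print(len(universe), len(red_or_green), Fraction(len(red_or_green), len(universe)))  # 10 7 7/10
```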

According to the most recent U.S. Census, 6.7% of the population is aged 10 to 14 years, and 7.1% of the population is
aged 15 to 19 years. If you pick a person at random, what’s the probability that they are aged 10 to 19 years?

Since there is no possibility of being in both groups at once, $P(\text{10 to 14 years old} \cap \text{15 to 19 years old}) = 0$, and so

$$\begin{aligned}
P(\text{10 to 14 years old} \cup \text{15 to 19 years old}) &= P(\text{10 to 14 years old}) + P(\text{15 to 19 years old}) - P(\text{10 to 14 years old} \cap \text{15 to 19 years old})\\
&= 6.7\% + 7.1\% - 0\%\\
&= 13.8\% \text{ or } 0.138.
\end{aligned}$$

In a statistics textbook, 48% of the pages have data displayed in a table, 27% have an equation, and 7% have both. What percentage of all the pages in this textbook have either data displayed in a table or an equation? That is, what is the probability that a randomly selected page from this textbook has either a table displaying data or an equation?

[Venn diagram: Table only 0.41; both Table and Equation 0.07; Equation only 0.20; neither 0.32.]
Observe:
48% − 7% = 41% of the pages have data displayed in a table and don’t contain an equation.
27% − 7% = 20% of the pages have an equation and don’t display data in a table.
Thus, $P(\text{Table} \cup \text{Equation}) = P(\text{Table}) + P(\text{Equation}) - P(\text{Table} \cap \text{Equation})$
$= 0.48 + 0.27 - 0.07 = 0.68$.
FYI The probability of randomly selecting a page with neither a table nor an equation is $1 - 0.68 = 0.32$.

When stopping drivers for a suspected DWI/DUI, 78% of the suspected drivers get a breath test, 36% receive a blood test, and 22% get both. Create both a contingency table comprised of probabilities and a Venn Diagram.

The 78% and 36% values are totals and so they should appear in the margins of a contingency table.
The 22% corresponds to the percentage of the suspected drivers that get both a breath test and a blood test, and so
this should appear in the “Yes” and “Yes” positions for “Breath Test” and “Blood Test” in a contingency table.
The cell in the lower right-hand corner of the table must contain 1.00.
So far, we have (? marks a cell not yet filled in)

Blood Test ↓ / Breath Test →   Yes    No    Totals
Yes                            0.22   ?     0.36
No                             ?      ?     ?
Totals                         0.78   ?     1.00
The rest of the table can be determined by using addition and/or subtraction.

Blood Test ↓ / Breath Test →   Yes           No            Totals
Yes                            0.22          0.36 − 0.22   0.36
No                             0.78 − 0.22   ?             1.00 − 0.36
Totals                         0.78          1.00 − 0.78   1.00

And so …

Blood Test ↓ / Breath Test →   Yes    No     Totals
Yes                            0.22   0.14   0.36
No                             0.56   ?      0.64
Totals                         0.78   0.22   1.00

Which leads to …

Blood Test ↓ / Breath Test →   Yes    No                           Totals
Yes                            0.22   0.14                         0.36
No                             0.56   0.22 − 0.14 or 0.64 − 0.56   0.64
Totals                         0.78   0.22                         1.00

Therefore,

Blood Test ↓ / Breath Test →   Yes    No     Totals
Yes                            0.22   0.14   0.36
No                             0.56   0.08   0.64
Totals                         0.78   0.22   1.00
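The table-filling steps follow one pattern: every blank cell is a total minus the known parts. The sketch below packages that pattern for any two events; the function name `complete_two_way_table` is a hypothetical helper, and rounding to ten decimal places is only there to hide floating-point noise.

```python
def complete_two_way_table(p_a: float, p_b: float, p_both: float) -> dict:
    """Fill in a 2x2 probability table for events A (rows) and B (columns),
    given P(A), P(B), and P(A and B), using 'the parts add up to the total'."""
    cells = {
        ("yes", "yes"): p_both,                  # A and B
        ("yes", "no"): p_a - p_both,             # A but not B
        ("no", "yes"): p_b - p_both,             # B but not A
        ("no", "no"): 1 - (p_a + p_b - p_both),  # neither: 1 - P(A or B)
    }
    # Round away tiny floating-point noise (e.g. from 0.36 - 0.22).
    return {cell: round(p, 10) for cell, p in cells.items()}


# A = "gets a blood test" (rows), B = "gets a breath test" (columns).
dwi = complete_two_way_table(p_a=0.36, p_b=0.78, p_both=0.22)
for (blood, breath), p in dwi.items():
    print(f"blood test {blood:>3}, breath test {breath:>3}: {p:.2f}")
```

The “neither” cell uses the General Addition Rule, which is the same value the table obtains as 0.22 − 0.14 or 0.64 − 0.56.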

Using the numbers obtained and placed into the probability contingency table to generate a Venn Diagram, the information in this problem could be summarized as follows.
[Venn diagram: Blood Test only 0.14; both tests 0.22; Breath Test only 0.56; neither 0.08.]
Example
Suppose we know that 44% of college students binge drink (alcohol), 37% drink moderately, and 19% abstain.
Furthermore, we also know that among binge drinkers, 17% have been in an alcohol-related accident while only 9% of
the moderate drinkers have been in an alcohol-related accident.

(a) Calculate the probability that a randomly selected college student has been in an alcohol-related accident.
(b) Calculate the probability that a randomly selected student who has been in an alcohol-related accident is a
binge drinker.

Here’s a tree diagram illustrating both the given and deduced probabilities (first the level of drinking, then whether the student has been in an alcohol-related accident).

• Binge Drinker (0.44)
o Accident (0.17): $P(\text{Binge Drinker} \cap \text{Accident}) = (0.44)(0.17) = 0.0748$
o No Accident (0.83): $P(\text{Binge Drinker} \cap \text{No Accident}) = (0.44)(0.83) = 0.3652$
• Moderate Drinker (0.37)
o Accident (0.09): $P(\text{Moderate Drinker} \cap \text{Accident}) = (0.37)(0.09) = 0.0333$
o No Accident (0.91): $P(\text{Moderate Drinker} \cap \text{No Accident}) = (0.37)(0.91) = 0.3367$
• Abstains From Drinking (0.19)
o Accident (0): $P(\text{Abstains} \cap \text{Accident}) = (0.19)(0) = 0$
o No Accident (1): $P(\text{Abstains} \cap \text{No Accident}) = (0.19)(1) = 0.19$

Observe that a student who abstains from drinking, by definition, cannot have an alcohol-related accident.
Note that the conditional probabilities are written on the branches associated with the second set of tree leaves. For example, $P(\text{Accident} \mid \text{Binge Drinker}) = 0.17$ (this was given in the problem’s statement) and $P(\text{No Accident} \mid \text{Moderate Drinker}) = 0.91$ (this was determined using the fact that the sum of all the probabilities across a set of branches must equal 1, and so $1 - 0.09 = 0.91$).

Notice that the General Multiplication Rule is used to obtain the probabilities at the end of each pathway. For example,
$P(\text{Binge Drinker} \cap \text{Accident}) = P(\text{Binge Drinker}) \cdot P(\text{Accident} \mid \text{Binge Drinker}) = (0.44) \cdot (0.17) = 0.0748$.
(a) The probability that a randomly selected college student has been in an alcohol-related accident equals
$P(\text{Binge Drinker} \cap \text{Accident}) + P(\text{Moderate Drinker} \cap \text{Accident}) + P(\text{Abstainer} \cap \text{Accident})$
$= 0.0748 + 0.0333 + 0$
$= 0.1081$.

(b) Calculate the probability that a randomly selected student who has been in an alcohol-related accident is a binge drinker.

$P(\text{Binge Drinker} \mid \text{Accident}) = \dfrac{P(\text{Binge Drinker} \cap \text{Accident})}{P(\text{Accident})} = \dfrac{0.0748}{0.1081} = \dfrac{0.0748}{0.1081} \cdot \dfrac{10{,}000}{10{,}000} = \dfrac{748}{1081} \approx 0.6920$.
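The tree-diagram calculation can be mirrored in code: multiply along each path, add across the paths that end in an accident, then divide. This sketch uses the percentages from the problem; the dictionary names are illustrative.

```python
# First-level branches: P(drinking level).
p_level = {"binge": 0.44, "moderate": 0.37, "abstain": 0.19}
# Second-level branches: P(accident | drinking level).
p_accident_given = {"binge": 0.17, "moderate": 0.09, "abstain": 0.0}

# General Multiplication Rule along each "accident" path.
p_path = {lvl: p_level[lvl] * p_accident_given[lvl] for lvl in p_level}

# (a) Addition Rule across the disjoint "accident" paths.
p_accident = sum(p_path.values())

# (b) P(binge | accident) = P(binge and accident) / P(accident).
p_binge_given_accident = p_path["binge"] / p_accident

print(round(p_accident, 4), round(p_binge_given_accident, 4))  # 0.1081 0.692
```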

Reversing the Conditions in a Conditional Probability

Recall $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ and $P(B \mid A) = \dfrac{P(B \cap A)}{P(A)}$.
Since $P(B \cap A) = P(A \cap B)$, we can also express $P(B \mid A)$ as $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$.

Since $P(A \cap B) = P(A \mid B) \cdot P(B)$, we can rewrite $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$ as $P(B \mid A) = \dfrac{P(A \mid B) \cdot P(B)}{P(A)}$.

If you have been given or have already computed the value of a conditional probability, say $P(A \mid B)$, and you wish to determine the value of the conditional probability with the conditions switched, $P(B \mid A)$, use
$P(B \mid A) = \dfrac{P(A \mid B) \cdot P(B)}{P(A)}$.

Example
Returning to part (b) of the recent example involving college students who binge drink, drink moderately, or abstain, and alcohol-related accidents: when we calculated the probability that a randomly selected student who has been in an alcohol-related accident is a binge drinker, we were reversing the conditions in a conditional probability.

From the tree diagram, we easily determined the value of $P(\text{Accident} \mid \text{Binge Drinker})$ as 0.17.

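However, when we were asked to determine $P(\text{Binge Drinker} \mid \text{Accident})$, we did so by computing:

$$\begin{aligned}
P(\text{Binge Drinker} \mid \text{Accident}) &= \frac{P(\text{Binge Drinker} \cap \text{Accident})}{P(\text{Accident})}\\
&= \frac{P(\text{Binge Drinker}) \cdot P(\text{Accident} \mid \text{Binge Drinker})}{P(\text{Accident})}\\
&= \frac{P(\text{Binge Drinker}) \cdot P(\text{Accident} \mid \text{Binge Drinker})}{P(\text{Binge Drinker} \cap \text{Accident}) + P(\text{Moderate Drinker} \cap \text{Accident}) + P(\text{Abstainer} \cap \text{Accident})}\\
&= \frac{(0.44) \cdot (0.17)}{0.0748 + 0.0333 + 0} = \frac{0.0748}{0.1081} = \frac{0.0748}{0.1081} \cdot \frac{10{,}000}{10{,}000} = \frac{748}{1081} \approx 0.6920.
\end{aligned}$$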

In summary, my first step in addressing a probability problem is to determine the size of n, which is sometimes called the "sample size" and other times is referred to as the "number of trials."

If the sample size is small (for me, this means n < 6), then I will typically use the tools in Chapters 13 and 14. That is, I will use one of …

(1) a tree diagram,
(2) the probability laws,
(3) a Venn diagram, or
(4) a table.

I generally choose the tool in the order it appears in the list. Thus, for probability problems where n is at most five, I will first see if I can draw a tree diagram. If not, then I'll see if I can use the probability laws themselves, applying them directly. If not, then I'll draw a Venn Diagram. If a table is given, I will definitely use it, but I only create a table for those problems that involve just two events.

So, for example, I’ll return once again to the recent example involving college students’ drinking level (binge drink, drink moderately, and abstain) and alcohol-related accidents. Specifically, we are told that 44% of college students binge drink (alcohol), 37% drink moderately, and 19% abstain. Furthermore, we also know that among binge drinkers, 17% have been in an alcohol-related accident, while only 9% of the moderate drinkers have been in an alcohol-related accident.

Choosing to convert these percentages to counts out of 10,000 (for example, 44% corresponds to 4400 out of 10,000, and 17% of 4400 equals the whole number 748), here’s a table that communicates the information given in the problem.

College Student’s Level of Drinking (? marks a cell not yet filled in)

Alcohol-Related Accident ↓ / Drinking →   binge drink   drink moderately   abstain   Totals
Yes                                       748           333                0         ?
No                                        ?             ?                  ?         ?
Totals                                    4400          3700               1900      10000

The calculations used to complete this table are: (0.44)(10000) = 4400, (0.37)(10000) = 3700, (0.19)(10000) = 1900,
(0.17)(4400) = 748, and (0.09)(3700) = 333.
Note that a student who abstains from drinking, by definition, cannot have an alcohol-related accident. Thus, zero of
the 1900 students who abstain from drinking have an alcohol-related accident.
The remaining cells of the table can then be completed using the fact that the sum of the parts must equal the total.

College Student’s Level of Drinking

Alcohol-Related Accident ↓ / Drinking →   binge drink   drink moderately   abstain   Totals
Yes                                       748           333                0         1081
No                                        3652          3367               1900      8919
Totals                                    4400          3700               1900      10000

If we then wanted to compute the probability that a randomly selected college student has been in an alcohol-related accident, we need only look at the right-most margin of the table; specifically, at 1081 and 10000. Thus, the probability that a randomly selected college student has been in an alcohol-related accident is $\dfrac{1081}{10000} = 0.1081$.

And, if we wished to calculate the probability that a randomly selected student who has been in an alcohol-related accident is a binge drinker, we need only look at two numbers in the table, 748 (binge drinkers who have been in an alcohol-related accident) and 1081 (all students who have been in an alcohol-related accident), and take the appropriate ratio.

College Student’s Level of Drinking

Alcohol-Related Accident ↓ / Drinking →   binge drink   drink moderately   abstain   Totals
Yes                                       748           333                0         1081
No                                        3652          3367               1900      8919
Totals                                    4400          3700               1900      10000

Thus, the probability that a randomly selected student who has been in an alcohol-related accident is a binge drinker is
$P(\text{Binge Drinker} \mid \text{Accident}) = \dfrac{748}{1081} \approx 0.6920$.

As expected, these values are the same as those computed previously using a tree diagram to summarize the information given in the problem.
On the following pages, you will find more worked problems involving probability. I have arranged these by the tool (tree diagram, direct implementation of the rules, Venn diagram, or table) I chose to use to solve them.

A Tree Diagram
Example 1: Suppose you just received a shipment of ten televisions. Two of the televisions are defective.

(a) If two televisions are randomly selected, compute the probability that both televisions work.

(b) What is the probability that at least one of the two televisions does not work?

Round to three decimal places as needed.

Observation: Sampling is being done without replacement since the two TVs are selected
simultaneously, which is equivalent to selecting the first TV at random and
then randomly selecting the second TV without having replaced the first one
back into the shipment.

Consequently, since the n = 2 TVs are being selected without replacement, these n = 2 trials are not independent.

I find that a tree diagram helps me solve nearly every probability problem involving fewer than 5 trials.

So, I’ve created one for this problem; it appears below.
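Let D represent the event “TV is defective” and $D^c$ represent the event “TV is not defective.”

Tree diagram (first TV, then second TV):
• First TV is D: $P(D) = \frac{2}{10}$
o Second TV is D: $P(D \mid D) = \frac{1}{9}$, giving “D and D”
o Second TV is $D^c$: $P(D^c \mid D) = \frac{8}{9}$, giving “D and $D^c$”
• First TV is $D^c$: $P(D^c) = \frac{8}{10}$
o Second TV is D: $P(D \mid D^c) = \frac{2}{9}$, giving “$D^c$ and D”
o Second TV is $D^c$: $P(D^c \mid D^c) = \frac{7}{9}$, giving “$D^c$ and $D^c$”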

So, since 2 of the original 10 TVs were defective and 8 of the original 10 TVs were not defective, the probability that the first TV is defective is $P(D) = \frac{2}{10}$, and the probability that the first TV is not defective is $P(D^c) = \frac{8}{10}$.

Now, if the first TV selected is defective, then there is only one defective TV left among the nine remaining TVs, and so $P(D \mid D) = \frac{1}{9}$. I will leave it to you to deduce how I got the other three conditional probabilities associated with the second TV.
(a) Therefore, the probability that both TVs work (both TVs are $D^c$) equals the product of the two probabilities along the bottom set of branches in the tree diagram. That is,
$P(D^c \text{ and } D^c) = P(D^c \cap D^c) = P(D^c) \cdot P(D^c \mid D^c) = \frac{8}{10} \cdot \frac{7}{9} = \frac{56}{90} \approx 0.622$.

Answer: If two televisions are randomly selected, the probability that both televisions work is approximately
0.622.

(b) Although I could multiply the pairs of probabilities along each of the top three sets of branches of the tree, and then add those three products together in order to determine the probability that at least one of the two TVs selected is defective, I choose, instead, to use the idea of the complement. I will compute the probability that both TVs are not defective (by computing the product of the probabilities along the bottom branch of the tree) and then subtract that from 1.

$$\begin{aligned}
P(\text{at least one TV is defective}) &= 1 - P(\text{neither TV is defective})\\
&= 1 - P(D^c \text{ and } D^c)\\
&= 1 - P(D^c \cap D^c)\\
&= 1 - \frac{56}{90},
\end{aligned}$$
where $\frac{56}{90}$ is the result of the first part of this problem, and so
$P(\text{at least one TV is defective}) = 1 - \frac{56}{90} = \frac{90}{90} - \frac{56}{90} = \frac{34}{90} \approx 0.378$.

Answer: If two televisions are randomly selected, the probability that at least one of the two televisions does
not work is approximately 0.378.
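Exact fractions make it easy to check both parts of Example 1. This sketch assumes the same shipment (10 TVs, 2 defective) and sampling without replacement; note that Python's `Fraction` reduces 56/90 to 28/45 and 34/90 to 17/45, which are the same values.

```python
from fractions import Fraction

N, defective = 10, 2
working = N - defective

# Without replacement: the second denominator drops from 10 to 9.
p_both_work = Fraction(working, N) * Fraction(working - 1, N - 1)  # 8/10 * 7/9
p_at_least_one_defective = 1 - p_both_work                         # Complement Rule

print(p_both_work, round(float(p_both_work), 3))                            # 28/45 0.622
print(p_at_least_one_defective, round(float(p_at_least_one_defective), 3))  # 17/45 0.378
```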

Example 2: A study concluded that among people with a certain virus, 99.4% of tests conducted were (correctly)
positive, while for people without the virus, 97.9% of the tests were (correctly) negative. If 32% of
patients actually carry the virus, what’s the probability that a patient testing negative is truly free of the
virus? Round to three decimal places as needed.

I find that a tree diagram helps me solve nearly every probability problem involving fewer than 5 trials. So, I’ve
created one for this problem.

Let V represent the event “person has the virus” and $V^c$ represent the event “person does not have the virus.”

Let + represent the event “test result is positive for the virus” and − represent the event “test result is negative for the virus.”

Since we’re told $P(V) = 0.32$, then $P(V^c) = 1 - 0.32 = 0.68$.

We were also given the conditional probabilities $P(+ \mid V) = 0.994$ and $P(- \mid V^c) = 0.979$.


NOTE: Although the values of $P(- \mid V)$ and $P(+ \mid V^c)$ were not given, we can see from the tree diagram that each of these conditional probabilities is the complement of a probability we do have. Thus, $P(- \mid V) = 1 - P(+ \mid V) = 1 - 0.994 = 0.006$ and $P(+ \mid V^c) = 1 - P(- \mid V^c) = 1 - 0.979 = 0.021$.

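A complete tree diagram (Person’s Actual Condition → Test Results), displaying all of the information, is summarized below.

• Has the virus, $P(V) = 0.32$
  o Positive test, $P(+ \mid V) = 0.994$: outcome V and +
  o Negative test, $P(- \mid V) = 0.006$: outcome V and −
• Does not have the virus, $P(V^c) = 0.68$
  o Positive test, $P(+ \mid V^c) = 0.021$: outcome $V^c$ and +
  o Negative test, $P(- \mid V^c) = 0.979$: outcome $V^c$ and −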
We are asked to find the probability that a patient testing negative is truly free of the virus.

That is, given (for a fact) that the patient’s test result is −, we want the probability that they don’t have the virus, $V^c$. Thus, we want the conditional probability $P(V^c \mid -)$.
Since this conditional probability is in the reverse order of the one whose value we do know, $P(- \mid V^c) = 0.979$, we must follow the “reversing the conditioning” procedure described in chapter 14 of the textbook.

That is, we must apply the fact that $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ to our events. Thus, $P(V^c \mid -) = \frac{P(V^c \cap -)}{P(-)}$.

In order to compute a value for $P(V^c \mid -) = \frac{P(V^c \cap -)}{P(-)}$, we must determine the values of $P(V^c \cap -)$ and $P(-)$.

From the tree diagram, we see that the fourth set of branches corresponds to $P(V^c \cap -)$, and so $P(V^c \cap -) = P(V^c) \cdot P(- \mid V^c) = (0.68)(0.979) = 0.66572$.

The second and fourth sets of branches correspond to the two ways one can have a negative (−) test result: the outcomes V and −, and $V^c$ and −.

Thus,
$$\begin{aligned}
P(-) &= P\big((V \text{ and } -) \text{ or } (V^c \text{ and } -)\big) = P\big((V \cap -) \cup (V^c \cap -)\big) \\
&= P(V \cap -) + P(V^c \cap -) \\
&= P(V) \cdot P(- \mid V) + P(V^c) \cdot P(- \mid V^c) \\
&= (0.32)(0.006) + (0.68)(0.979) \\
&= 0.00192 + 0.66572 \\
&= 0.66764.
\end{aligned}$$

Therefore,
$$P(V^c \mid -) = \frac{P(V^c \cap -)}{P(-)} = \frac{0.66572}{0.66764} \approx 0.9971241987 \approx 0.997.$$

Answer: The probability that a patient testing negative is truly free of the virus is approximately 0.997.
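The whole “reversing the conditioning” calculation can be packaged as a small Python function. This is a minimal sketch, assuming only the three given values are known; the function and parameter names are hypothetical, although “sensitivity” and “specificity” are the standard names for $P(+ \mid V)$ and $P(- \mid V^c)$.

```python
def prob_free_given_negative(prevalence, sensitivity, specificity):
    """Return (P(Vc | -), P(-)) using the tree diagram and Bayes-style reversal.

    prevalence = P(V), sensitivity = P(+ | V), specificity = P(- | Vc).
    """
    p_vc = 1 - prevalence                      # P(Vc), a complement
    p_neg_given_v = 1 - sensitivity            # P(- | V), a complement
    p_v_and_neg = prevalence * p_neg_given_v   # second set of branches
    p_vc_and_neg = p_vc * specificity          # fourth set of branches
    p_neg = p_v_and_neg + p_vc_and_neg         # P(-): add the two negative branches
    return p_vc_and_neg / p_neg, p_neg         # P(Vc | -) = P(Vc and -) / P(-)

answer, p_neg = prob_free_given_negative(0.32, 0.994, 0.979)
print(round(p_neg, 5), round(answer, 3))       # 0.66764 0.997
```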
Probability Rules
Example 1: There are two professors. The first is Professor Scedastic, whose class has 70 open seats and in which Molly's chances of passing are 0.8. In contrast, with Professor Kurtosis, Molly's chances of passing are 0.6, and there are 50 seats in this professor’s class.

(a) If Molly gets randomly assigned to a class, what are her chances of passing?

(b) If Molly doesn’t pass the statistics class, what is the probability that she was randomly assigned
to Professor Kurtosis’ class?

Let S represent the event that a student is assigned to Professor Scedastic's class.
Let K represent the event that a student is assigned to Professor Kurtosis' class.
Let P represent the event that a student passes the statistics class.
Let PC represent the event that a student does not pass the statistics class.

Observation: Here’s what we are given in this problem.

P(S) = 70/120 P(K) = 50/120 P(P | S) = 0.8

P(PC | S) = 0.2 P(P | K) = 0.6 P(PC | K) = 0.4

(a) If Molly gets randomly assigned to a class, what are her chances of passing?

A Solution
There are two ways for Molly to pass statistics:

(1) Molly will either be assigned to Professor Scedastic's class AND then she passes
OR
(2) Molly will be assigned to Professor Kurtosis' class AND then she passes.

Therefore, since the events (S and P) and (K and P) are disjoint (a.k.a. mutually exclusive) – they cannot both
happen at the same time – then …

P(P) = P((S and P) OR (K and P))

= P(S) * P(P | S) + P(K) * P(P | K)

= (70/120) x 0.8 + (50/120) x 0.6

= 43/60

= 0.71666666 .... (infinitely many sixes)

~ 0.717

Answer: The probability that Molly passes the statistics class she is randomly assigned to is approximately
0.717.
(b) If Molly doesn’t pass the statistics class, what is the probability that she was randomly assigned to
Professor Kurtosis’ class?

A Solution

P(K | PC) = [P(K and PC)] / P(PC)

= [(50/120) * (0.4)] / [1 - P(P)]

= [0.16666....] / [1 - 0.716666 ...]

= [0.16666....] / [0.2833 ....]

= 10/17

~ 0.5882352941

~ 0.588

Answer: If Molly doesn’t pass the statistics class, the probability that she was randomly assigned to
Professor Kurtosis’ class is approximately 0.588.
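Both parts of this example can be checked with exact fractions. The Python sketch below is illustrative only (variable names are hypothetical) and assumes, as the solution does, that random assignment is in proportion to open seats, so $P(S) = 70/120$ and $P(K) = 50/120$.

```python
from fractions import Fraction

p_s = Fraction(70, 120)             # P(S): share of the 120 open seats
p_k = Fraction(50, 120)             # P(K)
p_pass_given_s = Fraction(8, 10)    # P(P | S)
p_pass_given_k = Fraction(6, 10)    # P(P | K)

# (a) Add the two disjoint ways to pass: (S and P) or (K and P).
p_pass = p_s * p_pass_given_s + p_k * p_pass_given_k

# (b) Reverse the conditioning: P(K | PC) = P(K and PC) / P(PC).
p_k_and_fail = p_k * (1 - p_pass_given_k)
p_k_given_fail = p_k_and_fail / (1 - p_pass)

print(p_pass, round(float(p_pass), 3))                   # 43/60 0.717
print(p_k_given_fail, round(float(p_k_given_fail), 3))   # 10/17 0.588
```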
A Venn Diagram
Problem 39 in Chapter 14.

Example: According to estimates from the federal government’s 2010 National Health Interview Survey, based on face-to-face interviews in 16,676 households, approximately 63.6% of U.S. adults have both a landline in their residence and a cell phone, 25.4% have only cell phone service but no landline, and 1.8% have no telephone service at all.

(a) Polling agencies won’t phone cell phone numbers because customers object to paying for such calls. What proportion of U.S. households can be reached by a landline call?

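[Figure: Venn diagram of cell phone service (C) and landline service (L). Regions: C only = 0.254; both C and L = 0.636; L only = ??; neither = 0.018. Also noted: 0.254 + 0.636 = 0.89.]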

A Solution
Let C represent the event “American adult has cell phone service.”

Let L represent the event “American adult has landline phone service.”

The following information is given in the problem and/or can be deduced:

$P(C \cap L) = 0.636$

$P(C) = 0.254 + 0.636 = 0.89$

$P(L) = ?? + 0.636$

$P(C \cup L) = 1 - 0.018 = 0.982$.

Solving for $P(L)$ in the General Addition Rule, we have
$$P(C \cup L) = P(C) + P(L) - P(C \cap L) \implies P(L) = P(C \cup L) + P(C \cap L) - P(C),$$
and so, with the information from this problem (listed above) entered into the equation,
$$P(L) = 0.982 + 0.636 - 0.89 = 0.728.$$

Answer: Approximately 72.8% of households can be reached by a landline call.

(b) Are having a cell phone and having a landline independent? Explain.

A Solution:
Informally, two events are independent if knowing whether one of the events has occurred does not change the likelihood of the second event occurring. For example, knowing that a person is a professional basketball player affects the likelihood of the person working out at least once a week during the season. Thus, the events “person is a professional basketball player” and “person works out at least once a week during the season” are not independent.

Mathematically, two events A and B are independent if either $P(A) = P(A \mid B)$ or $P(B) = P(B \mid A)$.

In this problem, we now know: $P(C) = 0.89$, $P(L) = 0.728$, $P(C \cap L) = 0.636$, and since $P(C \cap L) = P(L \cap C)$, we also know that $P(L \cap C) = 0.636$.
Using the (General) Multiplication Rule, we see that
$$P(C \mid L) = \frac{P(C \cap L)}{P(L)} = \frac{0.636}{0.728} = \frac{0.636}{0.728} \cdot \frac{1000}{1000} = \frac{636}{728} = \frac{159}{182} \approx 0.8736.$$

Therefore, since $P(C \mid L) \approx 0.8736$ is not equal to $P(C) = 0.89$, events C and L are not independent; C and L are dependent events.

Alternatively, one can reach the same conclusion by comparing the values of $P(L \mid C)$ and $P(L)$:
$$P(L \mid C) = \frac{P(L \cap C)}{P(C)} = \frac{0.636}{0.89} = \frac{0.636}{0.89} \cdot \frac{1000}{1000} = \frac{636}{890} = \frac{318}{445} \approx 0.7146,$$
a value that is not equal to $P(L) = 0.728$. These probabilities are close, but they aren’t the same. Therefore, events C and L are not independent; C and L are dependent events.

It is the second approach that leads to the values given in the answer to this problem in Appendix C of the
textbook.

Answer: No, having a cell phone and having a landline are not independent.
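The Venn-diagram bookkeeping and both independence checks can be reproduced with exact fractions. This is a minimal Python sketch for checking the arithmetic (variable names are hypothetical); `Fraction` accepts decimal strings such as `"0.636"`, which avoids floating-point rounding.

```python
from fractions import Fraction

p_c_and_l = Fraction("0.636")   # both cell phone and landline
p_c_only = Fraction("0.254")    # cell phone only
p_neither = Fraction("0.018")   # no telephone service at all

p_c = p_c_only + p_c_and_l           # P(C) = 0.89
p_c_or_l = 1 - p_neither             # P(C or L) = 0.982
p_l = p_c_or_l + p_c_and_l - p_c     # General Addition Rule solved for P(L)

# Independence checks: compare each conditional with the unconditional value.
p_c_given_l = p_c_and_l / p_l
p_l_given_c = p_c_and_l / p_c

print(float(p_l))                                   # 0.728
print(p_c_given_l, round(float(p_c_given_l), 4))    # 159/182 0.8736
print(p_l_given_c, round(float(p_l_given_c), 4))    # 318/445 0.7146
print("independent?", p_l_given_c == p_l)           # independent? False
```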

A Table
Example: Suppose 40 of 100 people are smokers. Suppose 70 of these 100 people are women and 14 of these
women are smokers. Address parts (a) – (f).

Let W represent the event "person is a woman."


Let S represent the event "person is a smoker."

Choosing to organize the information in a table (a Contingency Table like those we saw in chapter 2), I summarize what’s given in this problem as follows.

Smoker (S) Not a Smoker (Sc) Total


Woman (W) 14 56 70
Man (Wc) 26 4 30
Total 40 60 100

(a) Suppose one person is randomly selected. What is the probability that the person is a woman?

A Solution

From the table, $P(W) = \frac{70}{100} = \frac{7}{10} = 0.7$.

Answer: The probability that the randomly selected person is a woman is 7/10 or 0.7.
(b) Suppose one woman is randomly selected. What is the probability that the woman is a smoker?

A Solution

From the table, $P(S \mid W) = \frac{14}{70} = \frac{1}{5} = 0.2$.

By a probability rule, $P(S \mid W) = \frac{P(S \cap W)}{P(W)} = \frac{14/100}{70/100} = \frac{14}{70} = \frac{1}{5} = 0.2$.

Answer: The probability that the randomly selected woman is a smoker is 1/5 or 0.2.

(c) Suppose one person is randomly selected. What is the probability that the person is a woman and a
smoker? That is, what is the probability that the person is a woman smoker?

A Solution

From the table, $P(W \cap S) = \frac{14}{100} = \frac{7}{50} = 0.14$.

By a probability rule, $P(W \cap S) = P(W) \cdot P(S \mid W) = \frac{70}{100} \cdot \frac{14}{70} = \frac{14}{100} = \frac{7}{50} = 0.14$.

Answer: The probability that the randomly selected person is a woman and a smoker is 7/50 or 0.14.

(d) Suppose one smoker is randomly selected. What is the probability that the smoker is a woman?

A Solution

From the table, $P(W \mid S) = \frac{14}{40} = \frac{7}{20} = 0.35$.

By a probability rule, $P(W \mid S) = \frac{P(W \cap S)}{P(S)} = \frac{14/100}{40/100} = \frac{14}{40} = \frac{7}{20} = 0.35$.

Answer: The probability that the randomly selected smoker is a woman is 7/20 or 0.35.
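Parts (a) through (d) all come from the same four cell counts, so they can be computed from one small data structure. The Python sketch below is one illustrative way to do it (the dictionary layout and the helper `prob` are hypothetical); it also confirms the Multiplication Rule used in part (c).

```python
from fractions import Fraction

# Cell counts from the contingency table: (sex, smoking status) -> count.
counts = {
    ("W", "S"): 14, ("W", "Sc"): 56,
    ("Wc", "S"): 26, ("Wc", "Sc"): 4,
}
total = sum(counts.values())   # 100 people

def prob(pred):
    """Fraction of all people whose (sex, smoking) cell satisfies pred."""
    return Fraction(sum(n for cell, n in counts.items() if pred(cell)), total)

p_w = prob(lambda cell: cell[0] == "W")            # (a) P(W)
p_s = prob(lambda cell: cell[1] == "S")            # P(S)
p_w_and_s = prob(lambda cell: cell == ("W", "S"))  # (c) P(W and S)
p_s_given_w = p_w_and_s / p_w                      # (b) P(S | W)
p_w_given_s = p_w_and_s / p_s                      # (d) P(W | S)

print(p_w, p_s_given_w, p_w_and_s, p_w_given_s)    # 7/10 1/5 7/50 7/20
```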
(e) Suppose one person is randomly selected. What is the probability that the person is a woman or a
smoker?

A Solution

Smoker (S) Not a Smoker (Sc) Total


Woman (W) 14 56 70
Man (Wc) 26 4 30
Total 40 60 100

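From the table, adding the counts in the cells that lie in the Woman row or the Smoker column (14, 56, and 26),

$$P(W \cup S) = \frac{14}{100} + \frac{56}{100} + \frac{26}{100} = \frac{96}{100} = \frac{24}{25} = 0.96.$$

By a probability rule,
$$P(W \cup S) = P(W) + P(S) - P(W \cap S) = \frac{70}{100} + \frac{40}{100} - \frac{14}{100} = \frac{96}{100} = \frac{24}{25} = 0.96.$$
Answer: The probability that the randomly selected person is a woman or a smoker is 24/25 or 0.96.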
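The two routes to $P(W \cup S)$ (adding the qualifying cells directly versus using the General Addition Rule) can be compared in a few lines of Python. This is an illustrative sketch using the table’s counts; the names are hypothetical.

```python
from fractions import Fraction

counts = {("W", "S"): 14, ("W", "Sc"): 56, ("Wc", "S"): 26, ("Wc", "Sc"): 4}
total = sum(counts.values())

# Method 1: add every cell in the Woman row OR the Smoker column (each cell once).
union_count = sum(n for (sex, smoke), n in counts.items() if sex == "W" or smoke == "S")
p_union_direct = Fraction(union_count, total)

# Method 2: General Addition Rule, P(W or S) = P(W) + P(S) - P(W and S).
p_w = Fraction(14 + 56, total)
p_s = Fraction(14 + 26, total)
p_w_and_s = Fraction(14, total)
p_union_rule = p_w + p_s - p_w_and_s

print(p_union_direct, p_union_rule)   # 24/25 24/25
```

Subtracting $P(W \cap S)$ in Method 2 is what prevents the 14 women smokers from being counted twice.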

(f) Suppose two people are randomly selected. What is the probability that both are non-smokers?

A Solution

Recall:
Smoker (S) Not a Smoker (Sc) Total
Woman (W) 14 56 70
Man (Wc) 26 4 30
Total 40 60 100

Since it wasn’t specified whether this sampling is done with or without replacement, we assume it was done without replacement, because the implication is that we end up with a sample made up of two different people, not one person who is replaced back into the population of 100 people before the second person is selected.

From the table, the probability that the first person selected is a non-smoker is

$$P(S_1^c) = \frac{60}{100} = \frac{3}{5} = 0.6.$$

Now, since sampling is done without replacement, there are only 59 non-smokers remaining among 99 people when the second person is randomly chosen; thus, given that the first person was a non-smoker, the probability that the second person selected is a non-smoker is $P(S_2^c \mid S_1^c) = \frac{59}{99} = 0.\overline{59} \approx 0.596$.

Therefore, the probability that both randomly selected people are non-smokers is

$$P(S_1^c \cap S_2^c) = P(S_1^c) \cdot P(S_2^c \mid S_1^c) = \frac{60}{100} \cdot \frac{59}{99} = \frac{3540}{9900} = \frac{59}{165}.$$

Answer: The probability that both randomly selected people are non-smokers is 59/165, or approximately 0.358.

NOTE: Had this problem been worded as “Suppose two people are randomly selected with replacement. What is the probability that both are non-smokers?” then the probability that the second person selected was a non-smoker would have been the same as for the first person, and so
$$P(S_1^c \cap S_2^c) = P(S_1^c) \cdot P(S_2^c \mid S_1^c) = P(S^c) \cdot P(S^c) = \frac{60}{100} \cdot \frac{60}{100} = \frac{3600}{10000} = \frac{9}{25} = 0.36.$$
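The difference between sampling without and with replacement shows up in just one factor, as this minimal Python sketch illustrates (names are hypothetical; the counts are the table’s 60 non-smokers among 100 people).

```python
from fractions import Fraction

non_smokers, population = 60, 100

# Without replacement: the second draw comes from 59 non-smokers among 99 people.
p_without = Fraction(non_smokers, population) * Fraction(non_smokers - 1, population - 1)

# With replacement: the second draw has the same probability as the first.
p_with = Fraction(non_smokers, population) ** 2

print(p_without, round(float(p_without), 3))   # 59/165 0.358
print(p_with, float(p_with))                   # 9/25 0.36
```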

CHAPTER 5: The Standard Deviation as a Ruler and Understanding Data

To quantify the distance, measured in standard deviations, that a data value lies from the mean of its distribution, we use a z-score. If y represents the name of the quantitative variable, then $z = \frac{y - \bar{y}}{s_y}$, where z measures the number of $s_y$ increments that y is from $\bar{y}$. If x represents the name of the quantitative variable, then $z = \frac{x - \bar{x}}{s_x}$, where z measures the number of $s_x$ increments that x is from $\bar{x}$. If weight represents the name of the quantitative variable, then $z = \frac{\text{weight} - \overline{\text{weight}}}{s_{\text{weight}}}$, where z measures the number of $s_{\text{weight}}$ increments that weight is from $\overline{\text{weight}}$. A positive z value indicates that the original data value is above the mean. A negative z value indicates that the original data value is below the mean. The larger the absolute value $|z|$ of the z-score is, the more unusual the data value is.

Example: The average weight of a sample of fictional mammals called Morponias is 72 kilograms and the standard deviation is 4 kilograms. Delgridge, a Morponia, weighs 82 kilograms. If I choose to define variable y to represent the weight of a Morponia mammal in kilograms, then we’ve been given $\bar{y} = 72$ kilograms and $s_y = 4$ kilograms. The z-score of Delgridge’s individual weight of $y = 82$ kilograms is
$$z = \frac{y - \bar{y}}{s_y} = \frac{82 - 72}{4} = 2.5$$
standard deviations. This tells us that Delgridge’s weight of 82 kilograms is 2.5 standard deviations above this sample’s mean weight of 72 kilograms. That is, Delgridge’s weight is $2.5 \cdot s_y = 2.5 \cdot 4 = 10$ kilograms greater than 72 kilograms, which is indeed the case since 72 + 10 does equal 82! If you knew that Delgridge’s cousin, Plwick, has a z-value of −0.5, then we can determine Plwick’s weight as follows:
$$z = \frac{y - \bar{y}}{s_y} \implies z s_y = y - \bar{y} \implies z s_y + \bar{y} = y,$$
where, with $z = -0.5$ standard deviations, $\bar{y} = 72$ kilograms, and $s_y = 4$ kilograms, we find $y = (-0.5)(4) + 72 = -2 + 72 = 70$ kilograms.
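The z-score formula and its rearrangement $y = z s_y + \bar{y}$ translate directly into two small functions. This is a sketch for checking the Morponia arithmetic; the function names are hypothetical.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

def from_z(z, mean, sd):
    """Invert the z-score formula: y = z * sd + mean."""
    return z * sd + mean

mean_wt, sd_wt = 72, 4                 # Morponia sample: mean 72 kg, SD 4 kg
print(z_score(82, mean_wt, sd_wt))     # Delgridge: 2.5
print(from_z(-0.5, mean_wt, sd_wt))    # Plwick: 70.0
```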

The phrase “shift the data” refers to moving the data set’s center while keeping the spread the same. In contrast, “rescaling the data” changes both the center and the spread. If you were to look at a histogram, for example, of data collected on a quantitative variable, x, and then create a second histogram of the new variable w, where $w = x - \bar{x}$, the two histograms would have the same shape and the same spread, but the center of each would be different. Specifically, the mean of the distribution of the original variable x would be $\bar{x}$, while the mean of the new distribution comprised of shifted data values (i.e. the values of w) would be 0. If you were now to construct a third histogram of shifted and rescaled x values, say of values of $z = \frac{x - \bar{x}}{s_x}$, then comparing the histogram of the x values with that of the z values would show that the mean and standard deviation had changed from $\bar{x}$ and $s_x$ to 0 and 1, respectively. The mean and standard deviation of a distribution of z-scores are always 0 and 1, respectively.
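A quick numerical illustration of shifting versus standardizing, using Python’s standard `statistics` module. The data values below are hypothetical, chosen only so the arithmetic is easy to follow ($\bar{x} = 8$, $s_x = 5$).

```python
import statistics

x = [3.0, 5.0, 7.0, 9.0, 16.0]     # hypothetical data, for illustration only
x_bar = statistics.mean(x)         # 8.0
s_x = statistics.stdev(x)          # sample standard deviation: 5.0

w = [xi - x_bar for xi in x]            # shifted: center moves to 0, spread unchanged
z = [(xi - x_bar) / s_x for xi in x]    # shifted and rescaled: z-scores

print(statistics.mean(w), statistics.stdev(w), s_x)   # mean of w is 0; SD of w equals s_x
print(statistics.mean(z), statistics.stdev(z))        # approximately 0 and 1
```

The z-scores’ mean may print as a tiny number such as 2e-17 rather than exactly 0 because of floating-point rounding.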

A Normal Distribution Model is a way to make assumptions about our distribution based on the shape of the distribution.
A distribution of a quantitative variable that is unimodal, symmetric, and bell-shaped is one for which a Normal
Distribution model is appropriate.

A Normal model is named based on its mean and standard deviation. Because it is a model, we use $\mu$ for the mean and $\sigma$ for the standard deviation. These two numbers are referred to as the parameters of the model. The notation $N(\mu, \sigma)$ is used to designate the specific Normal Distribution Model being applied. The formula to convert values of a quantitative variable, say y, that is well-modeled by a Normal Distribution, to z-scores is $z = \frac{y - \mu}{\sigma}$ … an equation in the same form as is used for any distribution, $z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$. If variable y is well-modeled with a Normal Distribution $N(\mu, \sigma)$, then when the values of y are converted to z-scores using $z = \frac{y - \mu}{\sigma}$, the distribution of these standardized scores will be the Standard Normal Model $N(0, 1)$.

Unless specifically told that a Normal Model applies, one must verify that a set of data meets the “Nearly Normal
Conditions” before employing a Normal Model. These conditions can be checked by making a histogram and
verifying that it is unimodal, symmetric, and bell-shaped. You can also check to see if the mean, median, and mode
are approximately equal.

The Empirical Rule (68-95-99.7 Rule)

• In a Standard Normal Distribution Model, approximately 68% of the data fall within one standard deviation ($|z| = 1$) of the mean ($z = 0$). That is, approximately 68% of the data lie in the interval $-1 < z < 1$. This means that approximately 68% of the data in a Normal Distribution with mean $\mu$ and standard deviation $\sigma$ lie in the interval $\mu - \sigma < y < \mu + \sigma$.
• In a Standard Normal Distribution Model, approximately 95% of the data fall within two standard deviations ($|z| = 2$) of the mean ($z = 0$). That is, approximately 95% of the data lie in the interval $-2 < z < 2$. This means that approximately 95% of the data in a Normal Distribution with mean $\mu$ and standard deviation $\sigma$ lie in the interval $\mu - 2\sigma < y < \mu + 2\sigma$.
• In a Standard Normal Distribution Model, approximately 99.7% of the data fall within three standard deviations ($|z| = 3$) of the mean ($z = 0$). That is, approximately 99.7% of the data lie in the interval $-3 < z < 3$. This means that approximately 99.7% of the data in a Normal Distribution with mean $\mu$ and standard deviation $\sigma$ lie in the interval $\mu - 3\sigma < y < \mu + 3\sigma$.
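The 68-95-99.7 percentages are rounded versions of exact Normal-curve areas. One way to see the exact values is Python’s standard-library `statistics.NormalDist` class (Python 3.8+), sketched below; the N(72, 4) model at the end is hypothetical and only shows that the same area applies to any $N(\mu, \sigma)$.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # the Standard Normal Model N(0, 1)

areas = {}
for k in (1, 2, 3):
    # Area under the curve between z = -k and z = +k
    areas[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {areas[k]:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973

# The same area holds for any N(mu, sigma); e.g. a hypothetical N(72, 4):
model = NormalDist(mu=72, sigma=4)
print(f"{model.cdf(76) - model.cdf(68):.4f}")   # 0.6827
```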

To draw a Normal Distribution, place the mean on the horizontal axis at the center. To find the location that corresponds to one standard deviation above the mean, move to the upper inflection point: the point on the right half of the graph where the graph changes from being an upside-down ‘U’ shape to a right-side-up ‘U’ shape. Label this location on the horizontal axis $\mu + \sigma$ if the horizontal axis represents the non-standardized Normally Distributed variable; label it $z = 1$ if the horizontal axis represents the z-scores of a Standardized Normal Distribution. The inflection point on the left half of a Normal Distribution’s curve corresponds to $\mu - \sigma$ and $z = -1$.
Calculations involving Normal Models can involve determining an area under a Normal Distribution’s curve or any of the quantities (z, y, $\mu$, and $\sigma$) in the formula $z = \frac{y - \mu}{\sigma}$, since algebra can be used to rewrite this formula with each of the quantities isolated: $z = \frac{y - \mu}{\sigma}$, $y = z\sigma + \mu$, $\sigma = \frac{y - \mu}{z}$, and $\mu = y - z\sigma$.
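Each isolated form of the formula can be written as its own small function. This is a minimal sketch with hypothetical function names; the round-trip check uses a hypothetical N(72, 4) model and confirms that each rearrangement recovers the value it started from.

```python
def solve_z(y, mu, sigma):
    return (y - mu) / sigma     # z = (y - mu) / sigma

def solve_y(z, mu, sigma):
    return z * sigma + mu       # y = z*sigma + mu

def solve_sigma(y, mu, z):
    return (y - mu) / z         # sigma = (y - mu) / z, requires z != 0

def solve_mu(y, z, sigma):
    return y - z * sigma        # mu = y - z*sigma

# Hypothetical N(72, 4) model with an observed value y = 82.
z = solve_z(82, 72, 4)
print(z, solve_y(z, 72, 4), solve_sigma(82, 72, z), solve_mu(82, z, 4))
# 2.5 82.0 4.0 72.0
```

Note that $\sigma = \frac{y - \mu}{z}$ is undefined when $z = 0$ (that is, when $y = \mu$), so that case cannot determine $\sigma$.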
Technology is used to compute areas under a Normal Distribution’s curve in statistical research (and it’s available, and recommended, for use on our homework problems). A free, simple-to-use app is available at https://s3-us-west-2.amazonaws.com/oervm/stats/probs.html. [Leave the radio button next to “Normal Distribution” selected, enter values for the “Mean” and “Standard deviation”, then select the radio button next to the type of percentage problem under the “Probability” heading within the “Calculate” options that you wish to compute (an area to the left of a value, an area to the right of a value, or an area between two values), and enter the value(s) before clicking on the gray “Calculate” button.]
Area approximations reported to the nearest ten-thousandth, involving approximate z-scores reported to the nearest hundredth, can be obtained via a Z Table. Here’s a nice explanation of how to read a Z Table like the one that appears in Appendix F of our textbook: https://www.youtube.com/watch?v=85G_PLBTX00.
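As an alternative to the web app, Python’s standard library can compute the same kinds of areas with `statistics.NormalDist.cdf` (Python 3.8+). This sketch is not a description of how the app works, and the model N(72, 4) and the cut-offs 64 and 80 are hypothetical; substitute the mean, standard deviation, and values from your own problem. For a continuous model like the Normal, $P(y < a)$ and $P(y \le a)$ are equal, so strict versus non-strict inequalities do not matter.

```python
from statistics import NormalDist

# Hypothetical model N(72, 4); replace with the mean and SD from your problem.
model = NormalDist(mu=72, sigma=4)

left = model.cdf(80)                      # area to the left:  P(y < 80)
right = 1 - model.cdf(80)                 # area to the right: P(y > 80)
between = model.cdf(80) - model.cdf(64)   # area between:      P(64 < y < 80)

print(f"{left:.4f} {right:.4f} {between:.4f}")   # 0.9772 0.0228 0.9545
```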

Technology is used to determine the ‘cut score’ for a given percentile in statistical research (and it’s available, and recommended, for use on our homework problems). A free, simple-to-use app is available at https://s3-us-west-2.amazonaws.com/oervm/stats/probs.html. [Leave the radio button next to “Normal Distribution” selected, enter values for the “Mean” and “Standard deviation”, then select the radio button next to the type of inverse probability problem you wish to compute (an area to the left of the unknown value, an area to the right of it, or an area between two unknown values), and enter the area value before clicking on the gray “Calculate” button.]

A Z Table can be used to determine “cut scores” for specified percentiles. Once the corresponding percentile, or as close to it as you can get, is found in the Z Table, the matching z-score can be translated to a value of the original Normally Distributed variable by using the formula $z = \frac{y - \mu}{\sigma}$ (solved for y, $y = z\sigma + \mu$). Here’s a nice explanation of how to read a Z Table like the one that appears in Appendix F of our textbook: https://www.youtube.com/watch?v=9KOJtiHAavE.
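Cut scores can also be computed with `statistics.NormalDist.inv_cdf` from Python’s standard library, either directly from the model or, mirroring the Z Table approach, by finding z first and then translating with $y = z\sigma + \mu$. The model N(72, 4) and the 90th percentile below are hypothetical choices for illustration.

```python
from statistics import NormalDist

model = NormalDist(mu=72, sigma=4)        # hypothetical N(72, 4)

# Cut score for the 90th percentile, computed directly from the model ...
cut_direct = model.inv_cdf(0.90)

# ... or, as with a Z Table: find z first, then translate with y = z*sigma + mu.
z_90 = NormalDist().inv_cdf(0.90)         # standard Normal, about 1.2816
cut_from_z = z_90 * 4 + 72

print(round(z_90, 4), round(cut_direct, 2), round(cut_from_z, 2))   # 1.2816 77.13 77.13
```

Following the advice below, the rounding is done only in the final `print`; the unrounded `z_90` is what feeds the translation step.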

Don’t round any values until all calculations have been generated. Wait to round at the very last step if you must
round.
