
QMT 11 Notes

This document contains lecture notes on business statistics. It introduces key concepts like population and samples, variables and data. It covers topics in descriptive statistics like frequency distributions, measures of central tendency and variation. It also discusses probability, random variables, probability distributions, sampling distributions and confidence intervals. Hypothesis testing and applications like ANOVA and regression are presented. Tables of statistical values are included in an appendix. The notes thank contributors and contain examples to illustrate statistical concepts.


Business Statistics

QMT 11 Lecture Notes


Semester II, AY 2018-2019

Bisenio, Zachary Nazarene S.

Special Thanks to:

• Yves James Cataluña, II-BS MAC, Ateneo de Manila University (for the first semester
QMT 11 notes)

• Emmanuel Jerrico Perez, BS Industrial Engineering 2018, University of the Philippines
Diliman (for doing chapters 4 and 5)

• Christian Paul Chan Shio, Ph.D., Ateneo de Manila University (for being my first
statistics professor and the 2016 second semester lecture notes in MA 151)
Contents

1 Introduction to Business Statistics 3


1.1 Population and Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Variables and Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Descriptive Statistics 7
2.1 Definition of Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Constructing Frequency Distributions and Histograms . . . . . . . . . . . . . . . 7
2.3 Describing Central Tendency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Measures of Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5 Measures of Relative Standing and Boxplots . . . . . . . . . . . . . . . . . . . . 27
2.6 Other Data Visualization Methods . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.7 Problem Set 1 (Due the Next Session) . . . . . . . . . . . . . . . . . . . . . . . 35

3 Introduction to Probability 39
3.1 Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.4 Elementary Probability Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Conditional Probability and Independence . . . . . . . . . . . . . . . . . . . . . 48
3.6 Applications of Conditional Probabilities: Introduction to Decision Theory . . . 54
3.7 Problem Set 2 (Due the Next Session) . . . . . . . . . . . . . . . . . . . . . . . 61

4 Random Variables 65
4.1 Introduction to Random Variables and Probability Distribution Functions . . . . 65
4.2 Improper Integrals and Binomial Theorem (Optional) . . . . . . . . . . . . . . . 69
4.3 Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.4 Expected Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.6 Problem Set 3 (Due the Next Session) . . . . . . . . . . . . . . . . . . . . . . . 75

5 Special Distributions 78
5.1 The Bernoulli Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.2 Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.3 Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.4 Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.5 Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.6 Problem Set 4 (Due the Next Session) . . . . . . . . . . . . . . . . . . . . . . . 84

6 Sampling Distribution and Confidence Intervals 86
6.1 Unbiased Estimate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.2 The Sampling Distribution of the Sample Mean . . . . . . . . . . . . . . . . . . 88
6.3 The Sampling Distribution of the Sample Proportion . . . . . . . . . . . . . . . 90
6.4 Confidence Intervals for a Population Mean: σ Known . . . . . . . . . . . . . . 91
6.5 Confidence Intervals for a Population Mean: σ Unknown . . . . . . . . . . . . . 94
6.6 Confidence Intervals for a Population Proportion . . . . . . . . . . . . . . . . . . 96
6.7 Sample Size Determination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.8 Problem Set 5 (Due the Next Session) . . . . . . . . . . . . . . . . . . . . . . . 98

7 Hypothesis Testing and Applications 101


7.1 Null and Alternative Hypotheses and Errors in Hypothesis Testing . . . . . . . . 101
7.2 One-Tailed Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.3 Two-Tailed Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
7.4 Problem Set 6 (Due the Next Session) . . . . . . . . . . . . . . . . . . . . . . . 105
7.5 Two-Population Hypothesis Testing (Independent Sampling) . . . . . . . . . . . 107
7.6 One-Way Analysis of Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.7 Chi-Square Tests for Independence . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.8 Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
7.9 Problem Set 7 (Due the Next Session) . . . . . . . . . . . . . . . . . . . . . . . 128

A Tables 133

Chapter 1

Introduction to Business Statistics

1.1 Population and Samples


Do you still remember your ACC 10 lesson on Receivables? If you recall, companies study
the receivables owed to them by clients in terms of the number of days these are outstanding.
This analysis requires an aging schedule similar to that of Zilmann Company below.

Customer Total Not Yet Due 1-30 DPD 31-60 DPD 61-90 DPD Over 90 DPD
Arndt $22,000.00 - $10,000.00 $12,000.00 - -
Blair $40,000.00 $40,000.00 - - - -
Chase $57,000.00 $16,000.00 $6,000.00 - $35,000.00 -
Drea $34,000.00 - - - - $34,000.00
Eleanor $132,000.00 $96,000.00 $16,000.00 $14,000.00 - $6,000.00
Total   $285,000.00   $152,000.00   $32,000.00   $26,000.00   $35,000.00   $40,000.00
EPU 3% 6% 13% 25% 60%
TEBD $42,610.00 $4,560.00 $1,920.00 $3,380.00 $8,750.00 $24,000.00
Table 1: Zilmann Company’s Aging Schedule

From Table 1, we have the following:


• We are presented with information on the outstanding receivables of all the current
customers of Zilmann Company.

– For instance, the company observes that it has a $57,000 receivable from Chase: $16,000
is not yet due, $6,000 is 1-30 days past due, and $35,000 is 61-90 days past due.

• We are also given insights about groups of customers.

– For example, accounts that are not yet due are estimated to be 3% uncollectible. Thus,
an allowance for doubtful accounts of $4,560.00 (3% of $152,000.00) must be made.

Such information and observations are called data. All five customers together make up
the population. Since measurements were collected from every customer, the collection is a
census. When only a group from the population is analyzed, that group is called a sample.
¹DPD: Days Past Due; EPU: Estimated Percentage Uncollectible; TEBD: Total Estimated Bad Debts

Definition 1.1.1. (Triola, 2018)

• Data are collections of observations, such as measurements, genders, or survey responses.

• A population is the complete collection of all measurements or data that are being con-
sidered.

• A census is the collection of data from every member of the population.

• A sample is a subcollection of members from a population.

These terms are the jargon of people who specialize in the field of Statistics, which is
historically the science of the "state".

Example 1.1.1. (Triola, 2018) In the journal article, ”Residential Carbon Monoxide Detector
Failure Rates in the United States” (by Ryan and Arnold, American Journal of Public Health,
Vol. 101, No. 10), it was stated that there are 38 million carbon monoxide detectors installed
in the United States. When 30 of them were randomly selected and tested, it was found that 12
of them failed to provide an alarm in hazardous carbon monoxide conditions.

The population is the 38 million carbon monoxide detectors in the United States. The
sample is the 30 carbon monoxide detectors that were selected and tested.
The objective is to use the sample data as a basis for drawing a conclusion about the
population of all carbon monoxide detectors, and methods of statistics are helpful in drawing
such conclusions.

Example 1.1.2. (Berenson, Krehbiel, and Levine, 2012) In 2008, a university in the mid-
western United States surveyed its full-time first-year students after they completed their first
semester. Surveys were electronically distributed to all 3,727 students, and responses were ob-
tained from 2,821 students. Of the students surveyed, 90.1% indicated that they had studied
with other students, and 57.1% indicated that they had tutored another student. The report also
noted that 61.3% of the students surveyed came to class late at least once, and 45.8% admitted
to being bored in class at least once.

The population is all 3,727 full-time first-year students to whom the survey was distributed.
The sample is the 2,821 students who responded to the survey.
A sample must be chosen so that it best represents the population of study. One of the best
ways to achieve this goal is to do random sampling.
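As a quick illustration of simple random sampling, here is a minimal Python sketch using the standard `random` module. The ID numbers are hypothetical; the 3,727 students of the example above are used only for concreteness:

```python
import random

# Hypothetical population: ID numbers of all 3,727 first-year students
population = list(range(1, 3728))

random.seed(2019)  # fixed seed so the sketch is reproducible
# Simple random sample of 30 students, drawn without replacement
sample = random.sample(population, k=30)
```

Because `random.sample` draws without replacement, no student can appear twice in the sample.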

1.2 Variables and Data


Now that you are familiar with populations and samples, this section discusses how the data
collected in a study are classified. Note that data values can be either quantitative or qualitative.

Definition 1.2.1. (Bowerman, Murphree, and O’Connell, 2014); (Triola, 2018)

• A variable is a characteristic of an item or individual.

– A variable is quantitative (or numerical) if its data consist of numbers representing
counts or measurements.
– A variable is categorical (or qualitative or attribute) if its data consist of names
or labels (not numbers that represent counts or measurements).

• A ratio variable is a quantitative variable measured on a scale such that ratios of its
values are meaningful and there is an inherently defined zero value.

• An interval variable is a quantitative variable where ratios of its values are not
meaningful and there is not an inherently defined zero value.

• An ordinal variable is a qualitative variable for which there is a meaningful ordering,
or ranking, of the categories.

• A nominative variable is a qualitative variable for which there is no meaningful
ordering, or ranking, of the categories.

• A response variable is the variable of interest in a study; it is influenced by
manipulated variables called factors.

Implied from Definition 1.2.1 is that a datum (the singular of data) is a value associated
with a variable.
Variables such as salary, height, weight, time, and distance are ratio variables. For exam-
ple, a distance of zero miles is ”no distance at all,” and a town that is 30 miles away is ”twice
as far” as a town that is 15 miles away.
Temperature (on the Fahrenheit scale) is an interval variable. For example, zero degrees
Fahrenheit does not represent ”no heat at all,” just that it is very cold. Thus, there is no inher-
ently defined zero value since the temperature where there is no heat is arbitrary. Furthermore,
ratios of temperatures are not meaningful. For example, it makes no sense to say that 60◦ is
twice as warm as 30◦ . In practice, there are very few interval variables other than temperature.
Almost all quantitative variables are ratio variables.
The measurements of an ordinal variable may be nonnumerical or numerical. For example,
a student may be asked to rate the teaching effectiveness of a college professor as excellent,
good, average, poor, or unsatisfactory. Here, one category is higher than the next one; that is,
”excellent” is a higher rating than ”good,” ”good” is a higher rating than ”average,” and so
on. Therefore, teaching effectiveness is an ordinal variable having nonnumerical measurements.
On the other hand, if (as is often done) we substitute the numbers 4, 3, 2, 1, and 0 for the
ratings excellent through unsatisfactory, then teaching effectiveness is an ordinal variable having
numerical measurements.
A person’s gender, the color of a car, and an employee’s state of residence are examples of
nominative variables.
To concretize response variables, in studies of diet and cholesterol, patients’ diets are not
under the analyst’s control. Patients are often unwilling or unable to follow prescribed diets;
doctors might simply ask patients what they eat and then look for associations between the
factor diet and the response variable cholesterol.

Example 1.2.1. (Anderson, Sweeney, and Williams, 2011)

Company   Exchange   Ticker Symbol   Market Cap ($ millions)   Price/Earnings Ratio   Gross Profit Margin (%)
DeWolfe Companies AMEX DWL 36.4 8.4 36.7
North Coast Energy OTC NCEB 52.5 6.2 59.3
Hansen Natural Corp. OTC HANS 41.1 14.6 44.8
MarineMax, Inc. NYSE HZO 111.5 7.2 23.8
Nanometrics Incorporated OTC NANO 228.6 38.0 53.3

Table 2: Data Set for Five Shadow Stocks

Table 2 shows a data set containing information for five of the shadow stocks tracked
by the American Association of Individual Investors. Shadow stocks are common stocks of
smaller companies that are not closely followed by Wall Street analysts. Each company is
analyzed in terms of exchange (nominative), ticker symbol (nominative), market cap
(ratio), price/earnings ratio (ratio), and gross profit margin (%) (ratio).

Example 1.2.2. (Triola, 2018)

• (Ratio) Class Times: The times of 50 minutes and 100 minutes for a statistics class
(0 minutes represents no class time, and 100 minutes is twice as long as 50 minutes.)

• (Interval) Years: The years 1492 and 1776 can be arranged in order, and the difference
of 284 years can be found and is meaningful. However, time did not begin in the year 0,
so the year 0 is arbitrary instead of being a natural zero starting point representing ”no
time”.

• (Ordinal) Course Grades: A college professor assigns grades of A, B, C, D, or F.
These grades can be arranged in order, but we can't determine differences between the
grades. For example, we know that A is higher than B (so there is an ordering), but we
cannot subtract B from A (so the difference cannot be found).

• (Nominal) Coded Survey Responses: For an item on a survey, respondents are given
a choice for possible answers, and they are coded as follows: ”I agree” is coded as 1; ”I
disagree” is coded as 2; ”I don’t care” is coded as 3; ”I refuse to answer” is coded as
4; "Go away and stop bothering me" is coded as 5. The numbers 1, 2, 3, 4, and 5 don't
measure or count anything.

Chapter 2

Descriptive Statistics

2.1 Definition of Descriptive Statistics


In the first chapter, we have tackled the following topics:

• definitions and examples of data, population, census, and sample

• Statistics as the science about the ”state”

• definitions and examples of variables

• definitions and examples of ratio, interval, ordinal, and nominative variables

Statistics studies the "state" of interest using data and censuses. Now, what we will
attempt to answer is: what do the data reveal about that "state"?

Definition 2.1.1. (Bowerman, Murphree, and O’Connell, 2014)


Descriptive statistics is the science of describing the important aspects of a set of mea-
surements.

2.2 Constructing Frequency Distributions and Histograms


In a nutshell, the numbers in this field describe the "state". Part of the analysis is how
frequently a specific measurement occurs in the data set.

Definition 2.2.1. (Berenson, Krehbiel, and Levine, 2012)

• A frequency distribution summarizes numerical values by tallying them into a set of
numerically ordered classes.

• Classes are groups that represent a range of values, called a class interval. Each value
can be in only one class and every value must be contained in one of the classes.

When we wish to summarize the proportion (or fraction) of items in each class, we employ
the relative frequency for each class.

Definition 2.2.2. (Bowerman, Murphree, and O’Connell, 2014)
Let f_c be the frequency of a class. If the data set consists of n observations, we define the
relative frequency of a class (RFC) as follows:

RFC = f_c / n    (2.1)
The percent frequency of a class is obtained by multiplying the relative frequency by
100. Listings of both frequencies are referred to as a relative frequency distribution and a
percent frequency distribution, respectively.

Example 2.2.1. (Triola, 2018)

Time (sec) Frequency


60-119 7
120-179 22
180-239 14
240-299 2
300-359 5

Table 3: McDonald’s Dinner Service Times

Table 3 shows a frequency distribution of service times (in seconds) of McDonald's
dinners. The data set is grouped according to classes/class intervals of service times.
From the table, there are a total of 50 McDonald's dinners. Hence, n = 50. Thus, the
relative frequency can be obtained by dividing each frequency by 50. Multiplying each ratio
by 100 yields the percent frequency. Table 4 presents such listings.

Time (sec)   Frequency   Relative Frequency   Percent Frequency
60-119       7           0.14                 14.00%
120-179      22          0.44                 44.00%
180-239      14          0.28                 28.00%
240-299      2           0.04                 4.00%
300-359      5           0.10                 10.00%

Table 4: Frequency Distributions of McDonald’s Dinner Service Times

From the study, one can describe McDonald's dinner service times; for instance, 44.00% of
dinners are served within 120 to 179 seconds. This happens to be the most frequent service
time among the 50 samples. Furthermore, only 4.00% are served within 240 to 299 seconds,
the least frequent occurrence in the data set.
Now that you are familiar with frequency distributions, let us proceed with their construction.

Definition 2.2.3. (Bowerman, Murphree, and O’Connell, 2014), Constructing Frequency Dis-
tributions and Histograms

• Step 1: Find the number of classes. One rule for finding an appropriate number of
classes says that the number of classes should be the smallest whole number K that makes
the quantity 2^K greater than the number of measurements in the data set.

• Step 2: Find the class length. Let L be the largest measurement in the data set and
S the smallest measurement. The class length CL is defined as the ratio of the range of
the data set to the number of classes K. Mathematically,

CL = (L − S) / K    (2.2)
• Step 3: Form nonoverlapping classes of equal width. We can form the classes
of the frequency distribution by defining the boundaries of the classes. To find the first
class boundary, we find the smallest data value. This value is the lower boundary of the
first class. Adding the class length to this lower boundary, we obtain the upper boundary
of the first class, which is also the lower boundary of the second class. Similarly, the upper
boundary of the second class and the lower boundary of the third class equals the class
length added to the lower boundary of the second class. The process continues until the
K th class. In cases where the largest measurement is not contained in the last class, we
simply add another class.

• Step 4: Count the number of measurements in each class. Having formed the
classes, count the number of measurements that fall into each class, and proceed with the
frequency distributions.

• Step 5: Graph the histogram. We can graphically portray the distribution of the data
by drawing a histogram. The histogram can be constructed using the frequency, relative
frequency, or percent frequency distribution. To set up the histogram, we draw rectangles
that correspond to the classes. The base of the rectangle corresponding to a class
represents the class interval. The height of the rectangle can represent the class
frequency, relative frequency, or percent frequency.

To expand on the first step, the rule is equivalent to repeatedly halving the sample until a
part with one element remains; the number of halvings becomes a possible number of classes.
For instance, for a sample of six measurements, K = 3 classes is appropriate, since 2^3 = 8
covers all six measurements while 2^2 = 4 does not.
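The first two steps of Definition 2.2.3 can be sketched in Python (a quick illustration; the function names below are my own, not the textbook's):

```python
def number_of_classes(n):
    """Smallest whole number K such that 2**K covers the n measurements."""
    K = 1
    while 2 ** K < n:
        K += 1
    return K

def class_length(largest, smallest, K):
    """CL = (L - S) / K, per equation (2.2)."""
    return (largest - smallest) / K

K = number_of_classes(50)    # 2**6 = 64 covers 50 measurements, so K = 6
CL = class_length(9, 0, K)   # (9 - 0) / 6 = 1.5
```

These are exactly the values used in the worked example that follows.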

Example 2.2.2. (Triola, 2018)


5 0 1 0 2 0 5 0 5 0
3 8 5 0 5 0 5 6 0 0
0 0 0 0 8 5 5 0 4 5
0 0 4 0 0 0 0 0 8 0
9 5 3 0 5 0 0 0 5 8

Table 5: Last Digits of the Weights of Respondents

Weights of respondents were recorded as part of the California Health Interview Survey. The
last digits of weights from 50 randomly selected respondents are presented in Table 5.

• Step 1: Find the number of classes. Note that the sample size n = 50. The smallest
number of classes K appropriate for the sample is 6 since K = 6 is the smallest K such
that 50 ≤ 2^6 = 64.

• Step 2: Find the class length. The largest measurement is L = 9; the smallest,
S = 0. Thus, the appropriate class length is CL = (9 − 0)/6 = 1.5.

• Step 3: Form nonoverlapping classes of equal width. The first class is formed
with the smallest measurement S = 0 as the lower boundary. Its upper boundary is the
sum of the class length CL = 1.5 and S = 0, which is 1.5. This will also be the lower
boundary of the next class. Its upper boundary will be 1.5+CL = 3. The process continues
until K = 6 classes are formed. The summary of the classes is presented in Table 6.

Classes
[0, 1.5]
(1.5, 3]
(3, 4.5]
(4.5, 6]
(6, 7.5]
(7.5, 9]

Table 6: Classes of The Last Digits of Weights from 50 Randomly Selected Respondents

To understand the classes, ”[0, 1.5]” is the class of all data from 0 to 1.5. In other words,
those that belong to this class are data greater than or equal to 0 but less than or equal to
1.5. Moreover, ”(1.5, 3]” is the class of all data that are greater than 1.5 but less than or
equal to 3. This insight applies to all other classes.

• Step 4: Count the number of measurements in each class. Table 7 presents


the frequency distributions of the given data set. Note that the second column contains
results of counting measurements that belong to each class.

Classes Frequency Relative Percent


Frequency Frequency
[0, 1.5] 27 0.54 54.00%
(1.5, 3] 3 0.06 6.00%
(3, 4.5] 2 0.04 4.00%
(4.5, 6] 13 0.26 26.00%
(6, 7.5] 0 0 0.00%
(7.5, 9] 5 0.1 10.00%
Total 50 1 100%

Table 7: The Frequency Distributions of The Last Digits of Weights from 50 Randomly
Selected Respondents

• Step 5: Graph the histogram. Figure 1 presents a frequency histogram of the given
data set. From the figure, most respondents, specifically 27 out of 50, have weights
with 0 or 1 as their last digit.

Figure 1: A Frequency Histogram of the 50 Participants' Weights in Terms of Their Last Digits
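The binning and counting of Steps 3 and 4 can likewise be sketched in Python for the data of Table 5 (a quick check, with my own variable names; it reproduces the Frequency column of Table 7):

```python
import math

# Last digits of the 50 recorded weights (Table 5), read row by row
digits = [5, 0, 1, 0, 2, 0, 5, 0, 5, 0,
          3, 8, 5, 0, 5, 0, 5, 6, 0, 0,
          0, 0, 0, 0, 8, 5, 5, 0, 4, 5,
          0, 0, 4, 0, 0, 0, 0, 0, 8, 0,
          9, 5, 3, 0, 5, 0, 0, 0, 5, 8]

K, S, CL = 6, 0, 1.5          # number of classes, smallest value, class length
freq = [0] * K
for x in digits:
    # the first class [S, S + CL] is closed; later classes (lo, hi] are half-open
    k = 0 if x <= S + CL else math.ceil((x - S) / CL) - 1
    freq[k] += 1
# freq now holds the counts for [0,1.5], (1.5,3], (3,4.5], (4.5,6], (6,7.5], (7.5,9]
```

The resulting list matches Table 7: 27, 3, 2, 13, 0, and 5 measurements per class.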

The red line in Figure 1 is called a frequency polygon, which presents a line graph of
the frequencies of each class.
Another form of data visualization is called an ogive, which is a line graph of the cumula-
tive frequencies of data sets. If we continue from Example 2.2.2, the cumulative frequency
distribution of the data set is given in Table 8. Figure 2 presents an ogive of the given
cumulative frequencies.

Classes    Frequency   Relative Frequency   Percent Frequency   Cumulative Percent Frequency
[0, 1.5]   27          0.54                 54.00%              54.00%
(1.5, 3]   3           0.06                 6.00%               60.00%
(3, 4.5]   2           0.04                 4.00%               64.00%
(4.5, 6]   13          0.26                 26.00%              90.00%
(6, 7.5]   0           0.00                 0.00%               90.00%
(7.5, 9]   5           0.10                 10.00%              100.00%
Total      50          1.00                 100%

Table 8: The Frequency Distributions (Including the Cumulative Frequency Distribution) of


The Last Digits of Weights from 50 Randomly Selected Respondents

One insight gained from the cumulative frequency table is that 90.00% of participants have
weights with last digits less than or equal to 6. Another is that 100.00% of the last digits
are less than or equal to 9; in other words, the whole data set is a sample of numbers less
than or equal to 9.
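The cumulative column of Table 8 is just a running total of the percent frequencies; a minimal Python sketch:

```python
from itertools import accumulate

# Percent frequencies from Table 8, in class order
percent = [54.00, 6.00, 4.00, 26.00, 0.00, 10.00]

# Running total down the classes gives the cumulative percent frequencies
cumulative = list(accumulate(percent))
```

The result, 54, 60, 64, 90, 90, 100, matches the last column of Table 8.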

Figure 2: An Ogive of the 50 Participants' Weights in Terms of Their Last Digits

In the past examples, we observe that classes come in the form of intervals. However,
classes need not take that form. Consider the next example.

Example 2.2.3. (Bowerman, Murphree, and O’Connell, 2014)


Below, we give the overall dining experience ratings (Outstanding, Very Good, Good, Aver-
age, or Poor) of 30 randomly selected patrons at a restaurant on a Saturday evening.

Outstanding Good Very Good Very Good Outstanding Good


Outstanding Outstanding Outstanding Very Good Very Good Average
Very Good Outstanding Outstanding Outstanding Outstanding Very Good
Outstanding Good Very Good Outstanding Very Good Outstanding
Good Very Good Outstanding Very Good Good Outstanding

Table 9: The Overall Dining Experience Ratings of 30 Randomly Selected Patrons at a


Restaurant on a Saturday Evening

Going back to Definition 2.2.3, we can observe that the classes of the given data set are
the experience ratings themselves. Hence, finding the class length will no longer be necessary.
As a result, there are five classes for the five ratings. Table 10 presents relevant data in terms
of the frequency of the given ratings, arranged from the most frequent to the least.

Classes           Frequency   Relative Frequency   Percent Frequency   Cumulative Percent Frequency
Outstanding (O)   14          0.4667               46.67%              46.67%
Very Good (VG)    10          0.3333               33.33%              80.00%
Good (G)          5           0.1667               16.67%              96.67%
Average (A)       1           0.0333               3.33%               100.00%
Poor (P)          0           0.0000               0.00%               100.00%
Total             30          1.0000               100%

Table 10: The Frequency Distributions of The Overall Dining Experience Ratings of 30
Randomly Selected Patrons
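For categorical data like Table 9's ratings, the tallying can be sketched with Python's `collections.Counter` (the abbreviations O, VG, G, and A are mine):

```python
from collections import Counter

# Table 9 ratings, read row by row
# O = Outstanding, VG = Very Good, G = Good, A = Average
ratings = ("O G VG VG O G "
           "O O O VG VG A "
           "VG O O O O VG "
           "O G VG O VG O "
           "G VG O VG G O").split()

counts = Counter(ratings)                                 # class frequencies
rel = {r: round(c / len(ratings), 4) for r, c in counts.items()}  # relative frequencies
```

The tallies match Table 10: 14 Outstanding, 10 Very Good, 5 Good, and 1 Average, with a relative frequency of 0.4667 for Outstanding.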

For numerical data, we visualize through a histogram and an ogive. For qualitative data,
we visualize through a frequency bar chart and a Pareto chart. Figure 3 and Figure 4
present the frequency bar chart and the Pareto chart of the 30 experience ratings, respectively.

Figure 3: The Frequency Bar Chart of the 30 Dinner Ratings


Figure 4: The Pareto Chart of the 30 Dinner Ratings

Bowerman, Murphree, and O’Connell (2014) explain that Pareto charts are representa-
tions of the Pareto principle, which states that most effects are due to a few causes. In
terms of Example 2.2.3, most of the overall dinner ratings are due to only one class, the
”Outstandings”.
Our previous discussions and examples tend to describe the ”state” in terms of one variable.
Consider the next example that analyzes the ”state” in terms of two variables.

Example 2.2.4. (Bowerman, Murphree, and O’Connell, 2014)


The marketing department at the Rola-Cola Bottling Company is investigating the attitudes
and preferences of consumers toward Rola-Cola and a competing soft drink, Koka-Cola. Forty
randomly selected shoppers are given a ”blind taste-test” and are asked to give their preferences.
The results are given in Table 11.

Shopper Cola Sweetness Shopper Cola Sweetness
Preference Preference Preference Preference
1 Koka Very Sweet 21 Koka Very Sweet
2 Rola Sweet 22 Rola Not So Sweet
3 Koka Not So Sweet 23 Rola Not So Sweet
4 Rola Sweet 24 Koka Not So Sweet
5 Rola Very Sweet 25 Koka Sweet
6 Rola Not So Sweet 26 Rola Very Sweet
7 Koka Very Sweet 27 Koka Very Sweet
8 Rola Very Sweet 28 Rola Sweet
9 Koka Sweet 29 Rola Not So Sweet
10 Rola Very Sweet 30 Koka Not So Sweet
11 Rola Sweet 31 Koka Sweet
12 Rola Not So Sweet 32 Rola Very Sweet
13 Rola Very Sweet 33 Koka Sweet
14 Koka Very Sweet 34 Rola Sweet
15 Koka Not So Sweet 35 Rola Not So Sweet
16 Rola Sweet 36 Koka Very Sweet
17 Koka Not So Sweet 37 Rola Not So Sweet
18 Rola Very Sweet 38 Rola Very Sweet
19 Rola Sweet 39 Koka Not So Sweet
20 Rola Sweet 40 Rola Sweet

Table 11: Rola-Cola Bottling Company Survey Results

To present the frequency distribution of two variables such as the ones presented above, we
will be constructing a contingency table similar to that of Table 12. Let the row variable
be the customers’ cola preferences; column, their sweetness preferences.

        Sweet (S)   Very Sweet (VS)   Not So Sweet (NSW)   Total
Koka    4           6                 6                    16
Rola    9           8                 7                    24
Total   13          14                13                   40

Table 12: The Contingency Table of the Rola-Cola Bottling Company Survey Results

The goal of this contingency table is, for instance, to count the number of customers who
prefer both Koka-Cola and sweet cola drinks. As a result, we see that out of the 16 customers
who prefer Koka-Cola, 4 of them prefer sweet cola drinks. In other words, through the table,
the analyst wishes to answer the question, "Is there a relationship between the two variables?"
Hence, bar charts for two-variable analyses differ from one-variable ones, as shown in Figure 5.

Figure 5: The Bar Chart that Visualizes The Contingency Table of the Rola-Cola Bottling
Company Survey Results

From the graph, one can conclude that most customers who prefer Koka-Cola also prefer either
very sweet or not-so-sweet cola drinks, whereas most customers who prefer Rola-Cola prefer
sweet cola drinks. These associations between the two preferences do not imply causality,
however.
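The cell counts of Table 12 can be reproduced in Python by tallying the 40 (cola, sweetness) pairs of Table 11. The two-letter codes below are my own encoding of the survey responses:

```python
from collections import Counter

# Table 11 responses, shopper 1 through 40; the first letter is the cola
# preference (K = Koka, R = Rola), the second is the sweetness preference
# (S = Sweet, V = Very Sweet, N = Not So Sweet)
pairs = ("KV RS KN RS RV RN KV RV KS RV "
         "RS RN RV KV KN RS KN RV RS RS "
         "KV RN RN KN KS RV KV RS RN KN "
         "KS RV KS RS RN KV RN RV KN RS").split()

table = Counter(pairs)   # cell counts of the contingency table
```

The tallies agree with Table 12: the Koka row is 4, 6, 6 and the Rola row is 9, 8, 7.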

For more information on other forms of data visualization, you may refer to the reference
section of your syllabus.

2.3 Describing Central Tendency


In addition to describing the shape of the distribution of a sample or population of measure-
ments, we also describe the data set’s central tendency.

Definition 2.3.1. (Triola, 2018)


A measure of center is a value at the center or middle of a data set.

To determine the center, one must recognize the difference between a parameter and a
statistic.

Definition 2.3.2. (Triola, 2018)

• A parameter is a numerical measurement describing some characteristic of a population.

• A statistic is a numerical measurement describing some characteristic of a sample.

From this definition, we now distinguish between the population mean µ and the sample
mean x̄.

Definition 2.3.3. (Bowerman, Murphree, and O’Connell, 2014)

• The population mean µ is the average of the population measurements. Thus,

µ = (Σ_{i=1}^{N} x_i)/N = (x_1 + x_2 + ... + x_N)/N

where N is the population size.

• The sample mean x̄ is defined to be

x̄ = (Σ_{i=1}^{n} x_i)/n = (x_1 + x_2 + ... + x_n)/n    (2.3)
and is the point estimate of the population mean µ, where n is the sample size.

Example 2.3.1. (Weiers, 2008)

$31.69    56.69    65.50    83.50    56.88
 72.06   121.44    97.00    42.25    71.88
 70.63    35.81    83.19    43.63    40.06

Table 13: The Closing Prices of 15 of the Stocks Held by an Institutional Investor

Table 13 presents a list of closing prices for 15 of the stocks held by an institutional investor.
What was the average closing price for this sample of stocks?
SOLUTION: Let xi be the closing price of the ith stock, where i ∈ {1, 2, ..., 15}. We are
looking for the sample mean x. To do this, we will apply equation (2.3).

    x̄ = (Σ_{i=1}^{15} xi) / n = ($31.69 + $56.69 + $65.50 + ... + $40.06) / 15 ≈ $64.81

∴ The average closing price of the sample is approximately $64.81.
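The hand computation above can be mirrored in a few lines of Python (the snippet is not part of the original notes; it simply re-runs equation (2.3) on the Table 13 prices):

```python
# Sample mean, equation (2.3): sum the measurements, divide by n.
prices = [31.69, 56.69, 65.50, 83.50, 56.88,
          72.06, 121.44, 97.00, 42.25, 71.88,
          70.63, 35.81, 83.19, 43.63, 40.06]

sample_mean = sum(prices) / len(prices)
print(round(sample_mean, 2))  # 64.81
```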

One variation of the average is the weighted mean. When different x data values are
assigned different weights w, we can compute a weighted mean.

Definition 2.3.4. (Bowerman, Murphree, and O’Connell, 2014)


The weighted mean equals

    x̄ = (Σ wi xi) / (Σ wi)        (2.4)

where

• xi = the value of the ith measurement

• wi = the weight applied to the ith measurement

Example 2.3.2. Computing Quality Point Index (QPI) for the Semester

16
Course Code Course Title Units Letter Grade
ACC 30 Introduction to Managerial Accounting 3 C
MA 19 Applied Calculus for Business 6 D
TH 121 An Introduction to Doing Catholic Theology 3 A
PSY 101 General Psychology 3 C+
PS 1 Introductory Physics I, Lecture 3 F
PS 2 Introductory Physics II, Laboratory 1 B+
Table 14: The Academic Performance of a Sophomore at the End of His Second Semester in
the Ateneo

Presented in Table 14 is the academic performance of a student in the Ateneo at the end
of his second semester. The letter grades represent the quality points the student was able to
garner during the semester. If the student receives an A, the quality point in that subject is 4;
B+, 3.5; B, 3; C+, 2.5; C, 2; D, 1; F, 0. His QPI is the weighted average of his academic
performance with respect to the total units he has completed over the semester. Hence, using
equation (2.4),
    x̄ = (Σ wi xi) / (Σ wi) = [(3·2) + (6·1) + (3·4) + (3·2.5) + (3·0) + (1·3.5)] / (3 + 6 + 3 + 3 + 3 + 1) ≈ 1.84

The QPI that will be shown in his AISIS account is 1.84. Sadly, he did not reach the QPI
retention for sophomores of 1.9. I am not sure if he is still here in the Ateneo, though. :(
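As a check on the QPI arithmetic, equation (2.4) can be sketched in Python (an illustration, not part of the notes; quality points are weighted by the units per course):

```python
# Weighted mean, equation (2.4): quality points weighted by units.
grades = [2, 1, 4, 2.5, 0, 3.5]  # C, D, A, C+, F, B+
units  = [3, 6, 3, 3,   3, 1]

qpi = sum(w * x for w, x in zip(units, grades)) / sum(units)
print(round(qpi, 2))  # 1.84
```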

The weighted mean also applies to analyzing frequencies and classes. The disadvantage,
however, is that the classes are usually intervals. Hence, statisticians use the midpoints of each
class since the value is the average of the extreme measurements in the intervals.

Definition 2.3.5. (Bowerman, Murphree, and O’Connell, 2014)


The sample mean for grouped data is defined as

    x̄ = (Σ fi Mi) / (Σ fi) = (Σ fi Mi) / n        (2.5)

The population mean for grouped data is defined as

    µ = (Σ fi Mi) / N        (2.6)
where

• fi = the frequency for class i

• Mi = the midpoint for class i


• n = Σ fi = the sample size

• N = the population size

Example 2.3.3. Continuation of Example 2.2.2 - The Last Digits of the Weights of Respon-
dents

Consider the resulting frequency table Table 7. As previously mentioned, we will consider
the midpoints of each class interval for the computation of the sample mean.
Hence, for the first class, its midpoint is 1; the second, 3; the third, 5; the fourth, 7; the
fifth, 9; the last, 11. Thus, applying equation (2.5),
    x̄ = (Σ fi Mi) / (Σ fi) = [(27·1) + (3·3) + (14·5) + (1·7) + (5·9) + (0·11)] / (27 + 3 + 14 + 1 + 5 + 0) = 158/50 = 3.16

∴ The sample mean of the last-digit data set is 3.16.
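Equation (2.5) is easy to script; the sketch below (not from the original notes) recomputes the grouped mean from the class frequencies and midpoints listed above:

```python
# Sample mean for grouped data, equation (2.5): weight each class
# midpoint by its class frequency, then divide by the sample size.
freqs     = [27, 3, 14, 1, 5, 0]
midpoints = [1, 3, 5, 7, 9, 11]

n = sum(freqs)  # sample size
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
print(round(grouped_mean, 2))
```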

Another way to determine the value at the center of a data set is to find the actual middle.
Definition 2.3.6. (Triola, 2018; Bowerman, Murphree, and O’Connell, 2014)

The median x̃ of a data set is the measure of center that is the middle value when the
original data values are arranged in order of increasing (or decreasing) magnitude.
The median is found as follows:
• If the number of measurements is odd, the median is the middlemost measurement in
the ordering,
• If the number of measurements is even, the median is the average of the two mid-
dlemost measurements in the ordering.
Example 2.3.4. (Weiers, 2008)
With the same data in Table 13, what is the median closing price?
SOLUTION: We will be arranging the given data set in ascending order. Table 15
presents this order (with arrangement from left to right).

31.69 35.81 40.06 42.25 43.63


56.69 56.88 65.50 70.63 71.88
72.06 83.19 83.50 97.00 121.44

Table 15: The Closing Prices of 15 of the Stocks Held by an Institutional Investor in
Ascending Order

The sample size is 15. Hence, the middle measurement, the 8th datum, is $65.50.
∴ The median closing price of the sample is $65.50.
One insight here is that 50% of the closing prices are at most $65.50; the other 50%
are at least $65.50.

Example 2.3.5. (Anderson, Sweeney, and Williams, 2011)

Consider a sample with data values of 10, 20, 21, 17, 16, and 12. Compute the median.
SOLUTION: We will be arranging the data in ascending order. Thus,

10 12 16 17 20 21

Table 16: The Data Set in Ascending Order

Since the sample size is 6, the third and fourth measurements, 16 and 17, are the ”middle”
of the set. Hence, the median is the average of the two values.

∴ The median of the sample is (16 + 17)/2 = 16.50.
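The two cases of Definition 2.3.6 translate directly into code; here is a small Python sketch (illustrative, not part of the notes):

```python
# Median per Definition 2.3.6: sort the data, then take the middle
# value (odd n) or the average of the two middle values (even n).
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([10, 20, 21, 17, 16, 12]))  # 16.5
```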

Another way to determine the central value of a data set is to find the most frequent
measurement.
Definition 2.3.7. (Triola, 2018)
The mode of a data set is the value(s) that occur(s) with the greatest frequency.

Example 2.3.6. (Anderson, Sweeney, and Williams, 2011)

Consider a sample with data values of 53, 55, 70, 58, 64, 57, 53, 69, 57, 68, and 53.
Compute the mode.
SOLUTION: We need to find the most frequent data value. Table 17 presents the frequency
table of the given data set.

Value Frequency
53 3
55 1
57 2
58 1
64 1
68 1
69 1
70 1

Table 17: The Frequency Table of the Given Data Set

The measurement 53, which occurs three times, is the most frequent in the sample.


∴ The mode of the sample is 53.
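A frequency count like Table 17 can be automated with the standard library (a sketch, not part of the notes; a list is returned because a data set may be multimodal):

```python
from collections import Counter

# Mode per Definition 2.3.7: the value(s) with the greatest frequency.
def modes(values):
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]))  # [53]
```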

After these measures of central tendency, an analyst wishes to see if the values change in
the presence of extreme values or outliers.

Definition 2.3.8. (Triola, 2018)


A statistic is resistant if the presence of extreme values (outliers) does not cause it to change
very much.

One must note that a disadvantage of the mean is that just one extreme value (outlier) can
change the value of the mean substantially. Given the above-mentioned definition, the mean is
not resistant.
The median, on the other hand, does not change by large amounts when we include just a
few extreme values since the measure does not directly use every data value. For instance, if
the largest value is changed to a much larger value, the median does not change. Hence, the
median is a resistant measure of center.
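The contrast between the two measures can be seen with a quick experiment (an illustrative sketch, not part of the notes):

```python
# Resistance check: replace the largest value with an extreme outlier.
# The mean shifts dramatically; the median does not move at all.
data   = [10, 12, 16, 17, 20, 21]
skewed = [10, 12, 16, 17, 20, 2100]  # largest value made extreme

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

print(mean(data), mean(skewed))      # 16.0 362.5
print(median(data), median(skewed))  # 16.5 16.5
```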

2.4 Measures of Variation
Now that we are able to describe the ”state” in terms of its measure of center, analysts also want
to gain insights on the differences among data values. For instance, describing the differences
among service times of cashiers in a fast food restaurant will reveal their efficiency in satisfying
customers’ orders. Hence, the manager must look for strategies to reduce variation to maximize
customer satisfaction. Thus, we proceed to the different measures of variation.

Definition 2.4.1. (Triola, 2018)


The range R of a set of data values is the difference between the maximum data value max
and the minimum data value min. Mathematically,

R = max − min (2.7)

One must note that since the range uses extreme values, any changes in the maximum and
the minimum will effectively alter the difference. Hence, the range is not resistant. Moreover,
as a consequence of equation (2.7), the range does not take into account every non-extreme
datum. Hence, the measure fails to reflect differences among values.

Example 2.4.1. (Berenson, Krehbiel, and Levine, 2012)


24 23 22 21 22 22 18 18 26
26 26 19 19 19 21 21 21 21
21 18 19 21 22 22 16 16
Table 18: The Data Set of The Overall Miles per Gallon of 26 2010 Small SUVs

The table above contains the overall miles per gallon (MPG) of 2010 small SUVs. Since
the maximum MPG is 26, and the minimum is 16, the range of the data set is the difference
between 26 and 16, which is 10 MPG.

Another way to measure the variation among data values is to determine how far each point
is from the center. To do this, statisticians use the standard deviation.

Definition 2.4.2. (Triola, 2018; Bowerman, Murphree, and O’Connell, 2014)

The standard deviation of a set of sample (population) values, denoted by s (σ), is a


measure of how much data values deviate away from the mean.
Let s be the sample standard deviation and σ be the population standard deviation. Moreover,
let n be the sample size; N , population size. Then,
    s = √[ Σ (x − x̄)² / (n − 1) ]        (2.8)

    σ = √[ Σ (x − µ)² / N ]        (2.9)

Suppose we are given grouped data. Let fi and Mi be the frequency and the midpoint of the
i-th class, respectively. Let n and N be the sample and population size, respectively. Then,

    s = √[ Σ fi (Mi − x̄)² / (n − 1) ]        (2.10)

    σ = √[ Σ fi (Mi − µ)² / N ]        (2.11)

Definition 2.4.3. The variance of a sample s2 and a population σ 2 is the square of their
respective standard deviations.

The explanation for s to have a denominator of n − 1 instead of n will be discussed in later
chapters. For now, note that dividing by n − 1 is what makes the sample variance an unbiased
estimator of the population variance. In either equation, notice that the formulation is
similar to getting the distance between any two points in the Cartesian plane. Remember
MA 11? ;)

Example 2.4.2. (Bowerman, Murphree, and O’Connell, 2014)


In order to control costs, a company wishes to study the amount of money its sales force spends
entertaining clients. The following is a random sample of six entertainment expenses (dinner
costs for four people) from expense reports submitted by members of the sales force.

$157 $132 $109 $145 $125 $139


Table 19: The Six Entertainment Expenses from Expense Reports Submitted by Members of
the Sales Force

In order to compute the sample variance s², we need to find the sample mean x̄. Let xi be
the expense of the ith member of the sales force, where i ∈ {1, 2, ..., 6}. Thus, by equation
(2.3),

    x̄ = (Σ_{i=1}^{6} xi) / n = ($157 + $132 + $109 + $145 + $125 + $139) / 6 = $134.5

Given the sample mean, we can proceed with calculating the sample standard deviation s.
By equation (2.8),

    s = √[ Σ (xi − x̄)² / (n − 1) ]
      = √{ [($157 − $134.5)² + ($132 − $134.5)² + ... + ($139 − $134.5)²] / (6 − 1) } ≈ $16.63

∴ The sample standard deviation s is approximately $16.63. The sample variance s² is 276.7.
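Equation (2.8) can be verified with a short script (illustrative Python, not part of the notes):

```python
import math

# Sample standard deviation, equation (2.8), for the six expenses.
expenses = [157, 132, 109, 145, 125, 139]

n = len(expenses)
xbar = sum(expenses) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in expenses) / (n - 1))
print(round(xbar, 1), round(s, 2))  # 134.5 16.63
```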

Example 2.4.3. (Bowerman, Murphree, and O’Connell, 2014)

The following frequency distribution summarizes the weights of 195 fish caught by anglers
participating in a professional bass fishing tournament.

Weight (Pounds) Frequency
1-3 53
4-6 118
7-9 21
10-12 3
Table 20: The Weights of 195 Fish Caught by Anglers in a Professional Bass Fishing
Tournament

We want to calculate for the sample standard deviation s. To do that, we will be needing
the sample mean x for grouped data.
Note that there are four classes. The midpoint of the first class (1-3) is 2; the second (4-6),
5; the third (7-9), 8; the last (10-12), 11. Hence, applying equation (2.5),
    x̄ = (Σ fi Mi) / n = [(53·2) + (118·5) + (21·8) + (3·11)] / 195 = 4.6

Now that we have x̄, applying equation (2.10),

    s = √[ Σ fi (Mi − x̄)² / (n − 1) ]
      = √{ [53(2 − 4.6)² + 118(5 − 4.6)² + 21(8 − 4.6)² + 3(11 − 4.6)²] / (195 − 1) } ≈ 1.96

∴ The sample standard deviation s is approximately 1.96 pounds. The sample variance s² is 3.83.
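The grouped-data version, equation (2.10), follows the same pattern (a sketch using the fish-weight frequency table, not part of the notes):

```python
import math

# Grouped-data standard deviation, equation (2.10), for the
# fish-weight frequency table.
freqs     = [53, 118, 21, 3]
midpoints = [2, 5, 8, 11]

n = sum(freqs)                                           # 195
xbar = sum(f * m for f, m in zip(freqs, midpoints)) / n  # grouped mean
s2 = sum(f * (m - xbar) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
print(round(xbar, 1), round(math.sqrt(s2), 2))  # 4.6 1.96
```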

From Example 2.4.2, one can conclude that the typical deviation of the data points from the
sample mean is $16.63. In layman’s terms, each datum is, roughly speaking, $16.63 away from
the sample mean of $134.5.
One application of the standard deviation can be seen in analyzing a normal curve, a
symmetrical, bell-shaped graph similar to that in Figure 6. If the sample or the population
behaves similarly to a normal curve, then the given set is said to be normally distributed.
From Figure 6, approximately 68.26% of the sample is within the interval [x − s, x + s]. In
other words, that percentage lies within one standard deviation from the sample mean.
Moreover, approximately 95.44% is within [x − 2s, x + 2s] or two standard deviations from
the sample mean. Lastly, approximately 99.73% is within [x − 3s, x + 3s] or three standard
deviations from the mean. The said intervals are known as tolerance intervals, indicating a
certain percentage of the sample belongs to these. The same principle applies to a population
with mean µ and standard deviation σ. We, now, have the following results.

Definition 2.4.4. (Bowerman, Murphree, and O’Connell, 2014), The Empirical Rule for a
Normally Distributed Population

If a population (sample) has mean µ (x) and standard deviation σ (s), then

• 68.26% of the population (sample) measurements are within (plus or minus) one standard
deviation of the mean and thus lie in the interval [µ − σ, µ + σ] ([x − s, x + s]);

• 95.44% of the population (sample) are within (plus or minus) two standard deviations of
the mean and thus lie in the interval [µ − 2σ, µ + 2σ] ([x − 2s, x + 2s]);

• 99.73% of the population (sample) are within (plus or minus) three standard deviations
of the mean and thus lie in the interval [µ − 3σ, µ + 3σ] ([x − 3s, x + 3s]);

Figure 6: The Normal Curve (Empirical Rule)

Example 2.4.4. (Triola, 2018)

IQ scores have a bell-shaped distribution with a mean of 100 and a standard deviation of 15.
What percentage of IQ scores are between 70 and 130?
SOLUTION: Since the IQ scores behave in a bell-shaped manner, the population (to be
assumed) is normally distributed. Hence, we can apply the empirical rule. Since µ = 100 and
σ = 15, we can observe that

70 = 100 − 30 = µ − 2σ
130 = 100 + 30 = µ + 2σ

Thus, the interval [70, 130] is observed to be [µ − 2σ, µ + 2σ]. Hence, the problem is looking
for the percentage of the population that lies within two standard deviations from the mean. By
the empirical rule, 95.44% of the population are between 70 and 130.
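The reasoning in Example 2.4.4 amounts to converting the interval endpoints into "numbers of standard deviations" and looking up the coverage; the coverage table below just restates the empirical rule (illustrative Python, not part of the notes):

```python
# Empirical rule lookup for the IQ example: [70, 130] is mean ± 2 sd.
mu, sigma = 100, 15
coverage = {1: 68.26, 2: 95.44, 3: 99.73}  # percent within k sd's

k = (130 - mu) / sigma  # also equals (mu - 70) / sigma
print(k, coverage[int(k)])  # 2.0 95.44
```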

Unfortunately, the empirical rule only applies to normally distributed data sets. Nonetheless,
the next theorem addresses that limitation.

Proposition 2.4.1. (Bowerman, Murphree, and O’Connell, 2014), Chebyshev’s Theorem

Consider any population (sample) that has mean µ (x̄) and standard deviation σ (s). Then,
for any value of k greater than 1, at least 100(1 − 1/k²)% of the population measurements lie
in the interval [µ − kσ, µ + kσ].

As much as I want to show the reason behind this theorem, let’s just try to accept the
proposition as it is for now. If ever you are interested about the proof, you can always enroll
in a masters class in Statistics (which I doubt you will huhu)! Okay, time for an example.

Example 2.4.5. (Berenson, Krehbiel, and Levine, 2012)

Consider a population of 1,024 mutual funds that primarily invest in large companies. You
have determined that µ, the mean one-year total percentage return achieved by all the funds, is
8.20 and that σ, the standard deviation, is 2.75.
According to the Chebyshev rule, at least 93.75% of these funds are expected to have one-year
total returns between what two amounts?
SOLUTION: By Chebyshev’s theorem,

    100(1 − 1/k²) = 93.75

Solving for the unknown, k = 4. Hence, at least 93.75% of the 1,024 mutual funds lie within
four standard deviations from the population mean of 8.20. Thus, the tolerance interval is
[µ − 4σ, µ + 4σ] = [8.20 − 4(2.75), 8.20 + 4(2.75)] = [−2.8, 19.2]. Therefore, at least 93.75% of
the population are expected to have total returns that lie in the interval [−2.8, 19.2].
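The algebra above can be sketched in a few lines (illustrative Python, not part of the notes; we recover k from the stated percentage and rebuild the tolerance interval):

```python
import math

# Chebyshev's theorem: at least 100(1 - 1/k^2)% of measurements lie
# within k standard deviations of the mean.
mu, sigma, pct = 8.20, 2.75, 93.75

k = math.sqrt(1 / (1 - pct / 100))
interval = (round(mu - k * sigma, 1), round(mu + k * sigma, 1))
print(k, interval)  # 4.0 (-2.8, 19.2)
```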

Now, consider two samples or populations with different means. As a result, their standard
deviations may be different. Hence, to effectively compare the distributions between the two
data sets, the coefficient of variation will be used.

Definition 2.4.5. (Triola, 2018)

The coefficient of variation (or CV) for a set of nonnegative sample or population data,
expressed as a percent, describes the standard deviation relative to the mean, and is given by
the following:

    CVsample = (s / x̄) · 100%        (2.12)

    CVpopulation = (σ / µ) · 100%        (2.13)

Example 2.4.6. (Triola, 2018)

Consider two data sets. For the Verizon data speeds, the sample mean x is 17.60 Mbps
while the sample standard deviation s is 16.02 Mbps. For earthquake magnitudes, x = 2.572

and s = 0.651. Note that we want to compare variation among data speeds to variation among
earthquake magnitudes.
Here, we have different scales and different units of measurement, so we use the coefficients
of variation:

Verizon Data Speeds: CV = (s / x̄) · 100% = (16.02 Mbps / 17.60 Mbps) · 100% = 91.0%

Earthquake Magnitudes: CV = (s / x̄) · 100% = (0.651 / 2.572) · 100% = 25.3%

We can now see that the Verizon data speeds (with CV = 91.0%) vary considerably more
than earthquake magnitudes (with CV = 25.3%).
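Both coefficients of variation can be reproduced with a one-line helper (illustrative Python, not part of the notes):

```python
# Coefficient of variation, equations (2.12)/(2.13): the standard
# deviation expressed as a percentage of the mean, so data sets with
# different units become directly comparable.
def cv(mean, sd):
    return sd / mean * 100

print(round(cv(17.60, 16.02), 1))  # 91.0
print(round(cv(2.572, 0.651), 1))  # 25.3
```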

Moreover, we can compare two data sets in terms of their relationships. In the previous
sections, we have introduced contingency tables to determine the relationships between two
variables in terms of frequencies. Now, we will approach the analysis of such relations mathe-
matically.

Definition 2.4.6. (Berenson, Krehbiel, and Levine, 2012)

The covariance measures the strength of the linear relationship between numerical variables
(X and Y ). The sample covariance sxy is given by:
    sxy = Σ (xi − x̄)(yi − ȳ) / (n − 1)        (2.14)

The sample covariance is calculated by using a sample of n pairs of observed values of x


and y. Consider the following example.

Example 2.4.7. (Berenson, Krehbiel, and Levine, 2012)

The following is a set of data from a sample of n = 11 items:

X 7 5 8 3 6 10 12 4 9 15 18
Y 21 15 24 9 18 30 36 12 27 45 54
Table 21: Two Sample Data Sets Each with 11 Items

To calculate the sample covariance sxy, we must compute the sample means x̄ and ȳ. The
reader must note that x̄ ≈ 8.82 and ȳ ≈ 26.45. Thus, if we let xi and yi be the ith observation
in the two distributions, applying equation (2.14),

    sxy = Σ_{i=1}^{11} (xi − x̄)(yi − ȳ) / (n − 1)
        = [(7 − 8.82)(21 − 26.45) + (5 − 8.82)(15 − 26.45) + ... + (18 − 8.82)(54 − 26.45)] / (11 − 1) ≈ 65.29

To illustrate what this number means, consider Figure 7.

Figure 7: The Scatter Plot of the Two Sample Data Sets

Figure 7 presents the scatter plot of X and Y, which is a simple graph that can be used to
study the relationship between two variables. In the figure, the blue line represents the sample
mean of the X data set; the red, the sample mean of the Y data set. As you can see, the two
lines form four quadrants. The first quadrant, the upper right region, presents the set of all
coordinates whose x and y values are greater than x and y, respectively. The second quadrant,
the upper left region, presents the set of all coordinates whose x values are less than x, but their
y values are greater than y. Moreover, the third quadrant, the lower left region, presents the set
of all coordinates whose x and y values are less than x and y, respectively. Lastly, the fourth
quadrant, the lower right region, presents the set of all coordinates whose x values are greater
than x, but their y values are less than y.
Observe that all data points lie in the first and third quadrants, which indicates that for every
increase (decrease) in X, there is an increase (decrease) in Y. This means that X and Y observe
a positive relationship. This makes sense since a positive sxy means that, on average, the
relationship between X and Y is positive.

However, one problem with using the covariance as a measure of the strength of the linear
relationship between x and y is that the value of the covariance depends on the units in which x
and y are measured. A measure of the strength of the linear relationship between x and y that
does not depend on the units in which x and y are measured is the correlation coefficient.

Definition 2.4.7. (Bowerman, Murphree, and O’Connell, 2014)

The sample correlation coefficient is denoted as r and is defined as follows:

    r = sxy / (sx sy)        (2.15)

Here, sxy is the previously defined sample covariance, sx is the sample standard deviation
of the sample of x values, and sy is the standard deviation of the sample of y values.

The coefficient of correlation measures the relative strength of a linear relationship
between two numerical variables. The values of the coefficient of correlation range from −1

for a perfect negative correlation to +1 for a perfect positive correlation. Perfect in this case
means that if the points were plotted on a scatter plot, all the points could be connected with
a straight line.
If we go back to Figure 7, on average, we can say that two variables are positively correlated
if most data points lie in the first and third quadrant. Moreover, two variables are negatively
correlated if most data points lie in the second and fourth quadrant. In these two regions, there
exists an inverse relationship between the analyzed two variables.

Example 2.4.8. Continuation of Example 2.4.7

If we continue from Example 2.4.7, the reader must note that the sample standard deviations
sx and sy are approximately 4.6652 and 13.9957, respectively. Thus, applying equation (2.15),

    r = sxy / (sx sy) = 65.29 / (4.6652 · 13.9957) ≈ 1

Therefore, since r = 1, X and Y have a perfect positive correlation. This means that
X and Y can be modelled linearly.
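Equations (2.14) and (2.15) are sketched below for the Table 21 data (illustrative Python, not part of the notes; since Y = 3X exactly, r should come out to 1):

```python
import math

# Sample covariance (2.14) and correlation coefficient (2.15).
X = [7, 5, 8, 3, 6, 10, 12, 4, 9, 15, 18]
Y = [21, 15, 24, 9, 18, 30, 36, 12, 27, 45, 54]

n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / (n - 1)
sx = math.sqrt(sum((x - xbar) ** 2 for x in X) / (n - 1))
sy = math.sqrt(sum((y - ybar) ** 2 for y in Y) / (n - 1))
r = sxy / (sx * sy)
print(round(r, 4))  # 1.0
```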

2.5 Measures of Relative Standing and Boxplots


In the past sections, we have been describing the ”state” in terms of central tendencies and
variations among data sets. Now, we will try to measure the location of points in the data set.

Definition 2.5.1. (Triola, 2018)

A z-score (or standard score or standardized value) is the number of standard devi-
ations that a given value x is above or below the mean. The z-score is calculated by using one
of the following:

    zsample = (x − x̄) / s        (2.16)

    zpopulation = (x − µ) / σ        (2.17)

Note that a positive z-score says that x is above (greater than) the mean, while a negative
z-score says that x is below (less than) the mean. For instance, a z-score equal to 2.3 says that
x is 2.3 standard deviations above the mean. Similarly, a z-score equal to −1.68 says that x is
1.68 standard deviations below the mean. A z-score equal to zero says that x equals the mean.
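The sign convention described above is easy to see in code (illustrative Python, not part of the notes; the IQ values are borrowed from the empirical-rule example):

```python
# z-score, equations (2.16)/(2.17): signed number of standard
# deviations between a value and the mean.
def z_score(x, mean, sd):
    return (x - mean) / sd

print(z_score(130, 100, 15))  # 2.0  (above the mean)
print(z_score(70, 100, 15))   # -2.0 (below the mean)
print(z_score(100, 100, 15))  # 0.0  (equal to the mean)
```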
Another useful measure of relative standing involves the location of each point with respect
to other values within the data set. This is called percentiles.

Definition 2.5.2. (Triola, 2018)

Percentiles are measures of location, denoted by P1 , P2 ,...,P99 , which divide a set of data
into 100 groups with about 1% of the values in each group.

Let n(x) be the number of x values in the data set. To find the percentile P (x∗ ) (rounded
up to the nearest whole number) of a data value x∗ :

    P(x∗) = [n(x ≤ x∗) / n(x)] · 100        (2.18)

Note that n(x ≤ x∗ ) is the number of all x values less than or equal to x∗ . To concretize
the equation, the 40th percentile, denoted by P40 , has about 40% of the data values below it
and 60% of the data values above it. Furthermore, the 50th percentile, denoted by P50 , has
about 50% of the data values below it and about 50% of the data values above it, so the 50th
percentile is the same as the median.

Example 2.5.1. (Bowerman, Murphree, and O’Connell, 2014)

Thirteen internists in the Midwest are randomly selected, and each internist is asked to
report last year’s income. The incomes obtained (in thousands of dollars) are 152, 144, 162,
154, 146, 241, 127, 141, 171, 177, 138, 132, 192.
To effectively find the 90th percentile, we must arrange the data in ascending order. Hence,

127 132 138 141 144 146 152 154 162 171 177 192 241
Table 22: Last Year’s Income (in thousands of dollars) of Thirteen Internists in the Midwest

Note that n(x) = 13 and P(x∗) = 90. We are looking for n(x ≤ x∗). Then, manipulating
equation (2.18),

    n(x ≤ x∗) = [P(x∗)][n(x)] / 100 = (90 · 13)/100 = 11.7 ≈ 12

Hence, x∗ is the 12th data point, which is 192.

Note that if n(x ≤ x∗ ) is an integer i, the percentile is the average of the ith and
(i + 1)th terms. Remember that n(x ≤ x∗ ) is the number of terms x that are less than or
equal to x∗ . That means we have to include x∗ in counting. That is the reason for getting the
average of the ith and (i + 1)th terms to include x∗ in the counting.

Definition 2.5.3. (Triola, 2018)

Quartiles are measures of location, denoted by Q1 , Q2 , and Q3 , which divide a set of data
into four groups with about 25% of the values in each group.

Note that the first quartile Q1 has the same value as the 25th percentile P25. It separates the
bottom 25% of the sorted values from the top 75%. The second quartile Q2 is the same as the
50th percentile P50 and the median. It separates the bottom 50% of the sorted values from the
top 50%. The third quartile Q3 is the same as the 75th percentile P75. It separates the bottom
75% of the sorted values from the top 25%.

Example 2.5.2. Continuation of Example 2.5.1

Note that since n(x) = 13 is odd, the median is the 7th data point, which is 152. To prove
this, the median is the second quartile Q2 and the 50th percentile. Thus, with P(x∗) = 50,

    n(x ≤ x∗) = [P(x∗)][n(x)] / 100 = (50 · 13)/100 = 6.5 ≈ 7

Hence, the 7th data point, 152, is the median, indeed.


If we want to find the first quartile Q1, that is the same as looking for the 25th percentile
P25. Hence, with P(x∗) = 25,

    n(x ≤ x∗) = [P(x∗)][n(x)] / 100 = (25 · 13)/100 = 3.25 ≈ 4

Hence, the fourth data point is the first quartile Q1, which is 141.
If we want to find the third quartile Q3, that is the same as looking for the 75th percentile
P75. Hence, with P(x∗) = 75,

    n(x ≤ x∗) = [P(x∗)][n(x)] / 100 = (75 · 13)/100 = 9.75 ≈ 10

Hence, the 10th data point is the third quartile Q3, which is 171.
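The percentile rule of equation (2.18), together with the rounding convention stated earlier (round the position up when it is fractional; average the ith and (i + 1)th values when it is whole), can be sketched as follows (illustrative Python, not part of the notes):

```python
import math

# Percentile lookup following equation (2.18) and the stated
# rounding rule. Assumes the data are already sorted ascending.
def percentile(sorted_data, p):
    pos = p * len(sorted_data) / 100
    if pos != int(pos):                  # fractional position: round up
        return sorted_data[math.ceil(pos) - 1]
    i = int(pos)                         # whole: average i-th and (i+1)-th
    return (sorted_data[i - 1] + sorted_data[i]) / 2

incomes = [127, 132, 138, 141, 144, 146, 152, 154, 162, 171, 177, 192, 241]
print(percentile(incomes, 25), percentile(incomes, 50), percentile(incomes, 75))
# 141 152 171
```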

We next define the interquartile range, denoted by IQR, to be the difference between
the third quartile Q3 and the first quartile Q1 . Mathematically,

IQR = Q3 − Q1 (2.19)

This quantity can be interpreted as the length of the interval that contains the middle 50
percent of the measurements. For instance, given Example 2.5.2, the interquartile range of
13 incomes is Q3 − Q1 = 30. This says that we estimate that the middle 50% of all incomes
fall within a range that is $30,000 long. Now, we are ready to construct a box plot.

Definition 2.5.4. (Bowerman, Murphree, and O’Connell, 2014), Constructing a Box-and-


Whiskers Display (Box Plot)

1. Draw a box that extends from the first quartile Q1 to the third quartile Q3 . Also, draw a
vertical line through the box located at the median P50 or Q2 .

2. Determine the values of the lower and upper limits. The lower limit is located
1.5(IQR) below Q1 , and the upper limit is located 1.5(IQR) above Q3 . That is, the
lower and upper limits are Q1 − 1.5(IQR) and Q3 + 1.5(IQR), respectively.

3. Draw whiskers as dashed lines that extend below Q1 and above Q3 . Draw one whisker
from Q1 to the smallest measurement that is between the lower and upper limits. Draw
the other whisker from Q3 to the largest measurement that is between the lower and upper
limits.

4. A measurement that is less than the lower limit or greater than the upper limit is an
outlier. Plot each outlier using the symbol ∗.

You might be asking, ”1.5(IQR)?” Well, John Tukey, the statistician behind this convention,
chose such an allowance so that about 99.3% of the measurements of a normally distributed
data set fall within the interval [Q1 − 1.5(IQR), Q3 + 1.5(IQR)]. This tolerance interval flags
a reasonable number of measurements (if any) as outliers in analyzing any given data set. If
you recall the empirical rule, 95.44% of the population is within two standard deviations of
the mean. However, 4.56% as outliers is too large, making the tolerance interval too exclusive.
Moreover, 99.73% of the population is within three standard deviations of the mean. Having
0.27% as outliers is too small, making the tolerance interval too inclusive. Hence, using the
tolerance interval [Q1 − 1.5(IQR), Q3 + 1.5(IQR)] as basis balances the two extreme cases.

Example 2.5.3. Continuation of Example 2.5.1 and Example 2.5.2

We have established that Q1 = 141, Q2 = 152, Q3 = 171, and IQR = 30.


The limits of the sample are calculated as follows:

Lower Limit: Q1 − 1.5(IQR) = 141 − 1.5(30) = 96

Upper Limit: Q3 + 1.5(IQR) = 171 + 1.5(30) = 216

Since the lower limit is 96, we can say that the minimum is the smallest measurement of
the data set. Hence, the minimum is 127.
Furthermore, the supposed maximum of the data set is 241. However, the measurement is
an outlier since it is beyond the upper limit of 216. Hence, the maximum data value that is less
than the upper limit is 192.
Note that we have identified all outliers, which is the measurement 241.
Now that we have all ingredients, we can construct the boxplot of the thirteen internists’
incomes. Figure 8 is modified with constant lines in order to visualize clearly the quartiles,
the median, and the limits as shown in Figure 9.

Figure 8: The Boxplot of Last Year’s Incomes of Thirteen Internists

Figure 9: The Boxplot of Last Year’s Incomes of Thirteen Internists (with Emphasis on
Elements)

From Figure 9, the lower blue line represents the first quartile of 141; the upper, the third
quartile of 171. The green line represents the median of 152 while the reds are the limits of 96
(lower) and 216 (upper). The black lines in between the blues and the reds are the minimum
(127) and the maximum (192) of the data set. The outlier (241) is represented by a hollowed
circle.
Through the given boxplot, one can conclude that 99.3% of the sample measurements lie in
the interval [127, 192]. It is expected that 25% of the sample has incomes at least $127,000 and
at most $141,000. Moreover, 25% of the sample has incomes at least $171,000 and at most
$192,000. There are a lot more insights that can be gained from the boxplot. You think you can
find something? :)

We construct boxplots based on five significant data points - Q1 , Q2 , Q3 , the maximum, and
the minimum. These information make up what statisticians call the five-number summary.

Example 2.5.4. (Bowerman, Murphree, and O’Connell, 2014)

Construct a box-and-whiskers display of the following 12 household incomes.

7,524 11,070 18,211 26,817 36,551 41,286


49,312 57,283 72,814 90,416 135,540 190,250
Table 23: Twelve Household Incomes

SOLUTION: Note that the incomes are arranged in ascending order, tailored for our
convenience. Now, we need to find Q1 , Q2 , Q3 , IQR, and the limits. Note that n(x) = 12.

• Finding Q1 : Note that Q1 = P25 , implying that P(x∗) = 25.

      n(x ≤ x∗) = [P(x∗)][n(x)] / 100 = (25 · 12)/100 = 3

  Hence, we get the average of the 3rd and the 4th terms, which is (18,211 + 26,817)/2 = 22,514.
  Thus, the first quartile is 22,514.

• Finding the median (Q2 ): Verify that the median is 45,299.

• Finding Q3 : Verify that the third quartile of the set is 81,615.

• Finding IQR:

      IQR = Q3 − Q1 = 81,615 − 22,514 = 59,101

  Therefore, the interquartile range is 59,101.

• Finding the limits:

Lower Limit: Q1 − 1.5(IQR) = 22, 514 − 1.5(59, 101) = −66, 137.5

Upper Limit: Q3 + 1.5(IQR) = 81, 615 + 1.5(59, 101) = 170, 266.5

Therefore, the lower limit is -66,137.5; the upper, 170,266.5.

• Finding the maximum and the minimum: Since all values are greater than the lower limit,
the minimum is 7,524. However, the supposed maximum of 190,250 is beyond the up-
per limit of 170,266.5. Hence, the maximum datum that is less than the upper limit is
135,540.

Note that the only outlier in the data set is 190,250.
Figure 10 presents the boxplot of the given household incomes.

Figure 10: The Boxplot of Twelve Household Incomes
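The numbers in this example can be double-checked with a short script (a sketch, not part of the notes; it reuses the quartiles already computed above):

```python
# Tukey fences and outlier detection for the 12 household incomes,
# using the quartiles found in the worked example.
incomes = [7524, 11070, 18211, 26817, 36551, 41286,
           49312, 57283, 72814, 90416, 135540, 190250]

q1, q3 = 22514, 81615
iqr = q3 - q1                                  # 59101
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey fences
outliers = [x for x in incomes if x < lower or x > upper]
print(lower, upper, outliers)  # -66137.5 170266.5 [190250]
```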

2.6 Other Data Visualization Methods
So far, you have been exposed to a histogram (p. 10), a frequency polygon (p. 10), an
ogive (p. 11), a frequency bar chart (pp. 12, 14), a Pareto chart (p.12), a scatter plot
(p. 24), and a boxplot (p. 29). Now, allow me to introduce to you two more interesting data
visualizations statisticians use in their analyses.

Definition 2.6.1. (Triola, 2018)

A dot plot consists of a graph of quantitative data in which each data value is plotted as a
point (or dot) above a horizontal scale of values. Dots representing equal values are stacked.
Example 2.6.1. (Bowerman, Murphree, and O’Connell, 2014)

The following data consist of the number of students who were absent in a professor’s statis-
tics class each day during the last month.

2 0 3 1 2 5 8 0 1 4
1 10 6 2 2 0 3 6 0 1

Table 24: A 20-Day Sample of the Daily Number of Absent Students in Class

Constructing a dot plot requires placing a dot above the corresponding measurement on the
x-axis. Whenever there is a repeated measurement, the analyst must stack another dot above a
previously drawn one. Figure 11 presents the dot plot of the daily number of absent students,
with ”x-values” ranging from 0 to 10.

Figure 11: The Dot Plot of the Daily Number of Absent Students in Class

From the figure, one can say that the set is trimodal; the sample has three modes (0, 1, 2).
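As a sketch, the same dot plot can be produced as text from the Table 24 data:

```python
from collections import Counter
from statistics import multimode

absences = [2, 0, 3, 1, 2, 5, 8, 0, 1, 4,
            1, 10, 6, 2, 2, 0, 3, 6, 0, 1]

counts = Counter(absences)
# One row per value on the horizontal scale; repeated values stack as extra dots.
for value in range(min(absences), max(absences) + 1):
    print(f"{value:2d} | " + "o" * counts[value])

print("modes:", sorted(multimode(absences)))  # the set is trimodal
```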

Definition 2.6.2. (Bowerman, Murphree, and O’Connell, 2014)

The stem-and-leaf display is a kind of graph that places the measurements in order from
smallest to largest, and allows the analyst to simultaneously see all of the measurements in the
data set and see the shape of the data set’s distribution.

Definition 2.6.3. (Bowerman, Murphree, and O’Connell, 2014), Constructing a Stem-and-Leaf Display

1. Decide what units will be used for the stems and the leaves. Each leaf must be a single
digit and the stem values will consist of appropriate leading digits.

2. Place the stem values in a column to the left of a vertical line with the smallest value at
the top of the column and the largest value at the bottom.

3. To the right of the vertical line, enter the leaf for each measurement into the row
corresponding to the stem value. Each leaf should be a single digit; these can be rounded
values that were originally more than one digit if we are using an appropriately defined
leaf unit.

4. Rearrange the leaves so that they are in increasing order from left to right.

Example 2.6.2. (Bowerman, Murphree, and O’Connell, 2014)

Recall that 65 purchasers have participated in a survey and have rated the XYZ-Box video
game system. The composite ratings that have been obtained are as follows:

39 38 40 40 40 46 43 38 44 44 44
45 42 42 47 46 45 41 43 46 44 42
38 46 45 44 41 45 40 36 48 44 47
42 44 44 43 43 46 43 44 44 46 43
42 40 42 45 39 43 44 44 41 39 45
41 36 46 45 43 47 41 45 45 41
Table 25: The Ratings of 65 Purchasers on the XYZ-Box Video Game System

Note that the ratings range from 36 to 48. Hence, the analyst must count the frequencies of
each data point in the range. Figure 12 presents the stem-and-leaf display of the 65 ratings.

Figure 12: The Stem-and-Leaf Display of 65 Ratings on the XYZ-Box Video Game System
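As a sketch of the four construction steps, using only the first row of Table 25 (stems are the tens digits; leaves, the units digits):

```python
from collections import defaultdict

ratings = [39, 38, 40, 40, 40, 46, 43, 38, 44, 44, 44]  # first row of Table 25

stems = defaultdict(list)
for r in sorted(ratings):          # sorting puts each leaf row in increasing order
    stems[r // 10].append(r % 10)  # tens digit is the stem, units digit the leaf

for stem in sorted(stems):
    print(f"{stem} | " + " ".join(str(leaf) for leaf in stems[stem]))
```

For this subset, the display is `3 | 8 8 9` and `4 | 0 0 0 3 4 4 4 6`.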

2.7 Problem Set 1 (Due the Next Session)


1. (Bhattacharyya and Johnson, 2010) A student at the University of Wisconsin surveyed
40 students in her dorm concerning their participation in extracurricular activities during
the past week. The data on number of activities are:

1 5 0 1 4 3 0 2 1 6 1 1 0 0
2 0 0 3 1 2 1 2 2 2 2 2 1 0
2 2 3 4 2 7 2 2 3 3 1 1

Table 26: The Number of Activities 40 Students Have Participated in the Past Week

Present these data in a frequency table and in a relative frequency bar chart (not a
histogram).

2. (Ang, 2011) Use the following data to construct a frequency distribution table with 4
classes of equal size:

28.53 75.53 21.15 7.13 82.84 52.15 18.24 17.87 45.58 51.21
38.72 57.39 42.89 19.18 28.91 31.18 44.41 39.52 65.46 29.39
16.86 18.15 29.37 42.81 74.12 62.84 28.26 9.29 18.91 49.52

Table 27: A Sample of 30 Numerical Data

3. (Ang, 2005) A start-up company manufacturing mobile phone car chargers is greatly
concerned with the growing number of defective products being produced by its sole as-
sembly line. To be able to initiate changes to improve production efficiency, the company
gathered data regarding the number of defective products and their corresponding loss
expense. Provided herein is a sample obtained showing 10-week figures for number of
defective products and $ loss expense per unit.

Week # of Defective Chargers $ Loss Expense per Unit
1 13 153.85
2 18 430.56
3 12 125.00
4 14 246.43
5 15 314.67
6 12 87.50
7 10 80.00
8 14 235.71
9 15 306.67
10 13 173.08

Table 28: The 10-Week Figures for Number of Defective Products and $ Loss Expense
per Unit

a. Compute the mean and standard deviation for the sample data on the number of
defective chargers.
b. Compute the mean and standard deviation for the sample data on the total $ loss
expense.
c. Using the computed mean and standard deviation in part a as estimates of the mean
and standard deviation of the weekly number of defective products for a given year,
utilize Chebyshev’s theorem to determine the range within which 75% of the weekly
number of defective products must fall.
d. Assume that the distribution of the weekly number of defective products is bell-
shaped. Using the computed mean and standard deviation in part a as estimates of
the mean and standard deviation of the weekly number of defective products for a
given year, use the empirical rule to determine the range within which 95% of the
weekly number of defective products must fall.

4. (Ang, 2012) A stock analyst is following two stocks: A and B. Suppose a random sample
of daily closing prices of the stocks shows that their sample standard deviations are
sA = $0.60 and sB = $6.00. Suppose that further analysis shows that the mean prices of
the stocks were $2.40 and $120.00, respectively. Which stock is more volatile? Explain
your answer.

5. (Anderson, Sweeney, and Williams, 2011) The cost of consumer purchases such as single-
family housing, gasoline, Internet services, tax preparation, and hospitalization were pro-
vided in The Wall-Street Journal (January 2, 2007). Sample data typical of the cost of
tax-return preparation by services such as H&R Block are shown below.

120 230 110 115 160


130 150 105 195 155
105 360 120 120 140
100 115 180 235 255

Table 29: Sample Data of Cost of Tax-Return Preparation by Services Presented in The
Wall-Street Journal (January 2, 2007)

a. Compute the mean, median, and mode.
b. Compute the first and third quartiles.
c. Compute and interpret the 90th percentile.

6. (Bowerman, Murphree, and O’Connell, 2014)


The Data and Story Library website (a website devoted to applications of statistics) gives
a histogram of the ages of a sample of 60 CEOs. We present the data in the form of a
frequency distribution below.

Age (Years) Frequency


28-32 1
33-37 3
38-42 3
43-47 13
48-52 14
53-57 12
58-62 9
63-67 1
68-72 3
73-77 1

Table 30: The Frequency Distribution of the Ages of a Sample of 60 CEOs

Calculate the (approximate) sample mean, variance, and standard deviation of these data.

7. (Triola, 2018) A larger sample of 50 sleep times (hours) has a mean of 6.3 hours and a
standard deviation of 1.4 hours. What is the z-score for a sleep time of 5 hours?
8. (Triola, 2018) The prediction errors (minutes) that are differences between actual erup-
tion times and predicted eruption times are listed as follows: 4, -7, 0, 1, -1, 1, -4, -7, 22,
7, -5, 1. Positive numbers correspond to eruptions that occurred later than predicted,
and negative numbers correspond to eruptions that occurred before they were predicted.
Find the a. mean ; b. median; c. mode; d. range; e. standard deviation; f. variance;
g. Q1 ; h. Q3 . Afterwards, construct a boxplot, and include the values of the 5-number
summary.
9. (Bhattacharyya and Johnson, 2010) Heating and combustion analyses were performed in
order to study the composition of moon rocks. Recorded here are the determinations of
hydrogen (H) and carbon (C) in parts per million (ppm) for 11 specimens.

Hydrogen 120 82 90 8 38 20 2.8 66 2.0 20 85


(ppm)
Carbon 105 110 99 22 50 50 7.3 74 7.7 45 51
(ppm)

Table 31: The Hydrogen and Carbon Determination in Parts per Million for 11
Specimens

Calculate r.

10. (Ang, 2005) For the last 15 years, there has been a steady decline in employment op-
portunities around the world, thus translating to an ever-increasing global rate of unem-
ployment. Provided hereafter is year 2003 and year 2004, employment-generation figures
(i.e., number of jobs created on a per annual basis) for 20 cities across the globe.

City Employment Generation (’000)


2003 2004
Copenhagen 75 54
Hong Kong 92 74
Johannesburg 84 57
Kuala Lumpur 64 39
London 64 46
Madrid 86 68
Manila 81 72
Melbourne 61 50
Milan 73 48
Montreal 93 75
Moscow 66 50
New York 64 52
Paris 77 55
Rio de Janeiro 80 61
Rome 81 54
Seoul 64 50
Singapore 90 75
Sydney 68 55
Tokyo 79 59
Vancouver 57 43

Table 32: Employment Generation among 20 International Cities in 2003 and 2004

a. Prepare a stem-and-leaf display for employment-generation in year 2003.


b. Prepare a stem-and-leaf display for employment-generation in year 2004.
c. Prepare a dot plot for employment-generation in year 2003.
d. Prepare a dot plot for employment-generation in year 2004.
e. Use the stem-and-leaf display from part b to determine the number of cities having
employment-generation figures greater than 80,000.

Chapter 3

Introduction to Probability

In Chapter 2, we tackled the following topics:

• introduction to Descriptive Statistics

• definition of frequency distribution, relative frequency distribution, and per-


cent frequency distribution and the construction of frequency distributions using
classes/class intervals

• construction of a histogram using frequency tables

• data visualization forms such as a frequency polygon, an ogive, a frequency bar


chart, a Pareto chart, a dot plot, a stem-and-leaf display, and contingency tables

• describing central tendency through the mean, the median, and the mode

• differentiating parameter from a statistic

• definition of population mean, sample mean, and weighted mean

• definition of resistant measures

• understanding variations such as range, standard deviation, and variance

• applying the concept of variance in the empirical rule and Chebyshev’s Theorem

• understanding covariance and correlation coefficient

• determining the relative standing of each data point through z-scores, percentiles,
quantiles, interquartile range, and boxplots

Now, we will explore the concept of chance. The world we live in can seem chaotic because
every event is subject to possibilities. Hence, we must speak of a degree of certainty; in other
words, how likely is an event to happen? That is probability in a nutshell. Before we delve
into this topic, let’s stroll down memory lane. Remember this from your elementary and/or
high school days?

3.1 Counting
Definition 3.1.1. Multiplication Counting Rule (Triola, 2018)
For a sequence of events in which the first event can occur n1 ways, the second event can
occur n2 ways, the third event can occur n3 ways, and so on, the total number of outcomes is
n1 · n2 · n3 ...

In other words, multiplication counting happens when a task is completed only through
multiple steps. Otherwise, the task will not be complete if only one step is considered for
counting.
For instance, consider a three-item true-or-false pop quiz. There are three questions to be
answered in order to complete the quiz. For the first question, there are two possible outcomes;
the second, also two; the third, also two. Using the multiplication counting rule, the total
number of experimental outcomes is (n1 )(n2 )(n3 ) = (2)(2)(2) = 8.
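The pop-quiz example can be checked by brute-force enumeration; a small sketch:

```python
from itertools import product

# Each of the three questions has two possible answers: True or False.
outcomes = list(product(["T", "F"], repeat=3))
print(len(outcomes))  # 2 * 2 * 2 = 8
```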

Definition 3.1.2. (Triola, 2018)

• The factorial symbol (!) denotes the product of decreasing positive whole numbers.
For example, 4! = 4 · 3 · 2 · 1 = 24.
• Factorial Rule: The number of different arrangements of n different items when all n
of them are selected is n!.

Proposition 3.1.1. (Triola, 2018)


0! = 1     (3.1)

Proof. Note that 5! = 5 · 4 · 3 · 2 · 1 = 5 · 4!. Therefore, 5 = 5!/4!. Moreover, 6 = 6!/5!.
Furthermore, 7 = 7!/6!. Hence, for any positive integer n, n = n!/(n − 1)!. If n = 1, then
1 = 1!/0! = 1/0!. Multiplying both sides of 1 = 1/0! by the denominator 0! yields 0! = 1.

Example 3.1.1. (Triola, 2018)

A statistics researcher must personally visit the presidents of the Gallup, Nielsen, Harris,
Pew, and Zogby polling companies. How many different travel itineraries are possible?
SOLUTION: For those 5 different presidents, the number of different travel itineraries is
5! = 5 · 4 · 3 · 2 · 1 = 120.

Note that this solution could have been done by applying the multiplication counting rule.
The first person can be any one of the 5 presidents, the second person can be any one of the 4
remaining presidents, and so on. The result is again 5 · 4 · 3 · 2 · 1 = 120. Use of the factorial
rule has the advantage of including the factorial symbol, which is sure to impress.
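Both approaches can be checked programmatically; a quick sketch enumerating the itineraries:

```python
from itertools import permutations
from math import factorial

presidents = ["Gallup", "Nielsen", "Harris", "Pew", "Zogby"]
itineraries = list(permutations(presidents))  # every possible visiting order

print(len(itineraries))  # 120
print(factorial(5))      # the factorial rule gives the same count
```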

Now, we will revisit the difference between a permutation and a combination.


Definition 3.1.3. (Triola, 2018)

• Permutations of items are arrangements in which different sequences of the same items
are counted separately. (The letter arrangements of abc, acb, bac, bca, cab, and cba are
all counted separately as six different permutations.)

• Combinations of items are arrangements in which different sequences of the same items
are counted as being the same. (The letter arrangements of abc, acb, bac, bca, cab, and
cba are all considered to be same combination.)

I bet you remember these words to know the difference between the two. In permutations,
the order matters. In combinations, the order does not matter.

Definition 3.1.4. (Triola, 2018)

• Permutations Rule: When n different items are available and r of them are selected
without replacement, the number of different permutations (order counts) is given by

nPr = n!/(n − r)!     (3.2)
• Combinations Rule: When n different items are available, but only r of them are
selected without replacement, the number of different combinations (order does not matter)
is found as follows:
 
nCr = n!/((n − r)! · r!)     (3.3)

Example 3.1.2. (Triola, 2018)

In a horse race, a trifecta bet is won by correctly selecting the horses that finish first and
second and third, and you must select them in the correct order. The 140th running of the
Kentucky Derby had a field of 19 horses. How many different trifecta bets are possible?
SOLUTION: There are n = 19 horses available, and we must select r = 3 of them without
replacement. The number of different sequences of arrangements is found as shown:
19P3 = 19!/(19 − 3)! = 5814

Example 3.1.3. (Triola, 2018)

In California’s Fantasy 5 lottery game, winning the jackpot requires that you select 5 different
numbers 1 to 39, and the same 5 numbers must be drawn in the lottery. The winning numbers
can be drawn in any order, so order does not make a difference. How many different lottery
tickets are possible?
SOLUTION: There are n = 39 different numbers available, and we must select r = 5 of
them without replacement (because the selected numbers must be different). Because order does
not count, we need to find the number of different possible combinations. We get

39C5 = 39!/((39 − 5)! · 5!) = 39!/(34! · 5!) = 575,757
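For reference, Python's math module (3.8+) implements both rules directly as math.perm and math.comb, so the two examples can be verified:

```python
from math import comb, perm

print(perm(19, 3))  # trifecta bets: 19!/(19-3)! = 5814
print(comb(39, 5))  # lottery tickets: 39!/(34! * 5!) = 575757
```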

3.2 Sets
(Aczel and Sounderpandian, 2008)
Remember this lesson from before you entered college? :D
To understand probability, some familiarity with sets and with operations involving sets is
useful.
Definition 3.2.1. A set is a collection of elements.

The elements of a set may be people, horses, desks, cars, files in a cabinet, or even numbers.
We may define our set as the collection of all horses in a given pasture, all people in a room,
all cars in a given parking lot a given time, all the numbers between 0 and 1, or all integers.
The number of elements in a set may be infinite.
A set may also have no elements.

Definition 3.2.2. The empty set is the set containing no elements. It is denoted by ∅.

We now define the universal set.

Definition 3.2.3. The universal set is the set containing everything in a given context. We
denote the universal set by S.

Given a set A, we may define its complement.


Definition 3.2.4. The complement of set A is the set containing all the elements in the
universal set S that are not members of set A. We denote the complement of A by Ā. The set
Ā is often called "not A".

A Venn diagram is a schematic drawing of sets that demonstrates the relationships be-
tween different sets. In a Venn diagram, sets are shown as circles, or other closed figures, within
a rectangle corresponding to the universal set, S. Figure 12 is a Venn diagram demonstrating
the relationship between a set A and its complement Ā.
As an example of a set and its complement, consider the following. Let the universal set S
be the set of all students at a given university. Define A as the set of all students who own a car
(at least one car). The complement of A, or Ā, is thus the set of all students at the university
who do not own a car.
Sets may be related in a number of ways. Consider two sets A and B within the context of
the same universal set S. (We say that A and B are subsets of the universal set S.) If A and
B have some elements in common, we say they intersect.

Definition 3.2.5. The intersection of A and B, denoted by A ∩ B, is the set containing all
elements that are members of both A and B.

When we want to consider all the elements of two sets A and B, we look at their union.

Definition 3.2.6. The union of A and B, denoted A ∪ B, is the set containing all elements
that are members of either A or B or both.

As you can see from these definitions, the union of two sets contains the intersection of the
two sets. Figure 12 presents Venn diagrams showing two sets A and B, their intersection
A ∩ B, and their union A ∪ B.
As an example of the union and intersection of sets, consider again the set of all students
at a university who own a car. This is set A. Now define set B as the set of all students at the
university who own a bicycle. The universal set S is, as before, the set of all students at the
university. A ∩ B is the intersection of A and B —it is the set of all students at the university
who own both a car and a bicycle. A ∪ B is the union of A and B —it is the set of all students
at the university who own either a car or a bicycle or both.
Two sets may have no intersection: they may be disjoint. In such a case, we say that
the intersection of the two sets is the empty set ∅. In symbols, when A and B are disjoint,
A ∩ B = ∅. As an example of two disjoint sets, consider the set of all students enrolled in
a business program at a particular university and all the students at the university who are
enrolled in an art program. (Assume no student is enrolled in both programs.) A Venn diagram
of two disjoint sets is shown in Figure 12.

Figure 12: Venn Diagrams of Set Operations between Sets A and B
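Python's built-in set type mirrors these operations; a small sketch with illustrative elements:

```python
S = set(range(1, 11))   # universal set for this example
A = {2, 4, 6, 8, 10}    # e.g., students who own a car
B = {1, 2, 3, 4}        # e.g., students who own a bicycle

complement_A = S - A    # "not A": elements of S outside A
both = A & B            # intersection A ∩ B
either = A | B          # union A ∪ B

print(complement_A)
print(both)
print(either)
print(A.isdisjoint({1, 3, 5}))  # True: the two sets share no elements
```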

3.3 Probability
(Aczel and Sounderpandian, 2008)
In probability theory, we make use of the idea of a set and of operations involving sets. We
will now provide some basic definitions of terms relevant to the computation of probability.

These are an experiment, a sample space, and an event.

Definition 3.3.1.

• An experiment is a process that leads to one of several possible outcomes.

• An outcome of an experiment is some observation or measurement.

Drawing a card out of a deck of 52 cards is an experiment. One outcome of the experiment
may be that the queen of diamonds is drawn.
A single outcome of an experiment is called a basic outcome or an elementary event. Any
particular card drawn from a deck is a basic outcome.

Definition 3.3.2. The sample space is the universal set S pertinent to a given experiment.
The sample space is the set of all possible outcomes of an experiment.

The sample space for the experiment of drawing a card out of a deck is the set of all cards
in the deck. The sample space for an experiment of reading the temperature is the set of all
numbers in the range of temperatures.

Definition 3.3.3. An event is a subset of a sample space. It is a set of basic outcomes. We


say that the event occurs if the experiment gives rise to a basic outcome belonging to the event.

For example, the event ”an ace is drawn out of a deck of cards” is the set of the four aces
within the sample space consisting of all 52 cards. This event occurs whenever one of the four
aces (the basic outcomes) is drawn.
The sample space for the experiment of drawing a card out of a deck of 52 cards is shown
in Figure 13. The figure also shows event A, the event that an ace is drawn.
In this context, for a given experiment, we have a sample space with equally likely basic
outcomes. When a card is drawn out of a well-shuffled deck, every one of the cards (the basic
outcomes) is as likely to occur as any other. In such situations, it seems reasonable to define
the probability of an event as the relative size of the event with respect to the size of the sample
space. Since a deck has 4 aces and 52 cards, the size of A is 4, and the size of the sample space
is 52. Therefore, the probability of A is equal to 4/52.
The rule we use in computing probabilities, assuming equal likelihood of all basic outcomes,
is as follows:

Definition 3.3.4.

The probability of an event A, P(A), is given by:

P(A) = n(A)/n(S)     (3.4)
where

• n(A) = the number of elements in the set of the event A

• n(S) = the number of elements in the sample space S

The probability of drawing an ace is P(A) = n(A)/n(S) = 4/52.

Figure 13: Sample Space for Drawing a Card

Example 3.3.1. (Johnson and Bhattacharyya, 2010)

A letter is chosen at random from the word "TEAM". What is the probability that it is a
vowel?
SOLUTION: Let V be the event that a vowel will be chosen from the word "TEAM". Let
S be the sample space containing all the letters that can be chosen from the word "TEAM".
Note that V = {E, A} and S = {T, E, A, M}. Hence, applying equation (3.4), we have

P(V) = n(V)/n(S) = 2/4 = 1/2 = 50%.

∴ There is a 50% chance that a vowel will be chosen from the word "TEAM".
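Equation (3.4) can be applied exactly with fractions; a sketch of the "TEAM" example:

```python
from fractions import Fraction

sample_space = set("TEAM")
vowels = {letter for letter in sample_space if letter in "AEIOU"}

# P(V) = n(V)/n(S)
p_vowel = Fraction(len(vowels), len(sample_space))
print(p_vowel)  # 1/2
```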

Note that an event A is a subset of S, the collection of all possible outcomes. This means
that A contains only elements that are also in S. Therefore, S must have at least as many
elements as A has. Mathematically,

n(A) ≤ n(S) ⇒ n(A)/n(S) ≤ 1 ⇒ P(A) ≤ 1

Note that we can divide both sides of the inequality by n(S) since n(S) is always positive.
This means that the highest value any probability can be is 1, which implies that the
event A is the sample space S.
Moreover, the empty set ∅ is always a subset of any event A. Suppose the contradiction is
true. Then, there must be an element in the empty set that is not in A. However, the empty
set has no elements. Therefore, it is impossible that ∅ has an element that is not in A. Hence,
the empty set is a subset of any event A.
Following logic similar to the previous inequality,

n(∅) ≤ n(A) ⇒ 0 ≤ n(A) ⇒ 0 ≤ n(A)/n(S) ⇒ 0 ≤ P(A)

This means that the lowest value any probability can be is 0, which implies that the
event A is impossible to occur given the sample space S. In summary,

Proposition 3.3.1. (Bowerman, Murphree, and O’Connell, 2014)

• The probability assigned to each sample space outcome must be between 0 and 1. That
is, if E represents a sample space outcome and if P (E) represents the probability of this
outcome, then 0 ≤ P (E) ≤ 1.

• The probabilities of all of the sample space outcomes must sum to 1.

The second bullet point holds because the union of all the sample space outcomes is S itself.
Note that by equation (3.4), P(S) = n(S)/n(S) = 1.

3.4 Elementary Probability Rules


(Bowerman, Murphree, and O’Connell, 2014)
We can often calculate probabilities by using formulas called probability rules. We will
begin by presenting the simplest probability rule: the rule of complements.
Figure 12 presents the Venn diagram depicting the complement Ā of an event A. In any
probability situation, either an event A or its complement Ā must occur. Therefore, we have

P(A) + P(Ā) = 1

This implies the following result:

Proposition 3.4.1. The Rule of Complements: Consider an event A. Then, the proba-
bility that A will not occur is

P(Ā) = 1 − P(A)     (3.5)

Example 3.4.1. (Triola, 2018)

In a recent year, there were 3,000,000 skydiving jumps and 21 of them resulted in death.
Find the probability of not dying when making a skydiving jump.
SOLUTION: Among 3,000,000 jumps, there were 21 deaths, so it follows that the other
2,999,979 jumps were survived. We get

P(not dying when making a skydiving jump) = 2,999,979/3,000,000 = 0.999993

∴ The probability of not dying when making a skydiving jump is 0.999993.
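A quick check of this computation using exact fractions and the rule of complements:

```python
from fractions import Fraction

p_dying = Fraction(21, 3_000_000)
p_surviving = 1 - p_dying  # rule of complements: P(not A) = 1 - P(A)

print(round(float(p_surviving), 6))  # 0.999993
```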

Let’s return to Figure 12. The Venn diagram for the union of A and B contains three
events: A, B, and A ∩ B. With this, we have the following general result:

Proposition 3.4.2. The Addition Rule: Let A and B be events. Then, the probability
that A or B (or both) will occur is

P (A ∪ B) = P (A) + P (B) − P (A ∩ B) (3.6)

The reasoning behind this result is that, as the Venn diagram for the union in Figure 12
shows, computing P(A) + P(B) counts A ∩ B twice. We correct for this by subtracting
P(A ∩ B).
There is a possibility that A and B have no elements in common.

Definition 3.4.1. Two events A and B are mutually exclusive if they have no sample space
outcomes in common. In this case, the events A and B cannot occur simultaneously, and thus

P (A ∩ B) = 0 (3.7)

Example 3.4.2. Consider randomly selecting a card from a standard deck of 52 playing cards.
Let J be the event that the randomly selected card is a Jack; Q, Queen; R, a red card (that is,
a diamond or a heart).
Because there is no card that is both a jack and a queen, the events J and Q are mutually
exclusive. On the other hand, there are two cards that are both jacks and red cards —the jack
of diamonds and the jack of hearts —so the events J and R are not mutually exclusive.

As a consequence of equation (3.7), we now have the following result.

Proposition 3.4.3. Let A and B be mutually exclusive events. Then, the probability that
A or B will occur is

P (A ∪ B) = P (A) + P (B) (3.8)

Example 3.4.3. (Continuation of Example 3.4.2)

We have established that J and Q are mutually exclusive events. Since P(J) = 4/52 (the
52-card deck has Jacks of Diamonds, Spades, Clubs, and Hearts) and P(Q) = 4/52 (the 52-card
deck has Queens of Diamonds, Spades, Clubs, and Hearts), applying equation (3.8), the
probability of randomly getting a Jack or a Queen is

P(J ∪ Q) = P(J) + P(Q) = 4/52 + 4/52 = 8/52 = 2/13

∴ There is a 2-out-of-13 chance that either a Jack or a Queen will be picked from
a standard 52-card deck.

Fortunately, equation (3.8) can be extended to more than two mutually exclusive events
such as the event of getting a Jack, a Queen, or a King.

Proposition 3.4.4. The events A1 , A2 ,...AN are mutually exclusive if no two of the events
have any sample space outcomes in common. In this case, no two of the events can occur
simultaneously, and
P(A1 ∪ A2 ∪ ... ∪ AN) = Σ_{i=1}^{N} P(Ai)     (3.9)

Going back to the Jack-Queen-King event, note that there are four Kings in a standard
52-card deck. Hence, P(K) = 4/52. Thus,

P(J ∪ Q ∪ K) = P(J) + P(Q) + P(K) = 3 · (4/52) = 12/52 = 3/13

This means that there is a 3-out-of-13 chance that either a Jack, a Queen, or a King will
be picked out of the deck.
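As a sketch, the addition rule can be cross-checked against a direct count over a full deck (the rank and suit labels below are my own):

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 cards

jacks = {c for c in deck if c[0] == "J"}
reds = {c for c in deck if c[1] in ("hearts", "diamonds")}

def p(event):
    return Fraction(len(event), len(deck))

# Addition rule: P(J ∪ R) = P(J) + P(R) - P(J ∩ R), since J and R overlap.
lhs = p(jacks | reds)
rhs = p(jacks) + p(reds) - p(jacks & reds)
print(lhs, rhs)  # both 7/13
```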

3.5 Conditional Probability and Independence


Now, let’s take it up a notch. Consider a fair die being tossed, with A defined as the event "6
appears". Clearly, P(A) = 1/6. Suppose that the die has already been tossed by someone who
refuses to tell us whether or not A occurred but does enlighten us to the extent of confirming
that the event B ("even number appears") occurred. What are the chances of A now? Here,
common sense can help us: there are three equally likely even numbers making up the event
B, one of which satisfies the event A, so the "updated" probability is 1/3.
What just happened? The probability suddenly changed given that another event has
occurred. The chance of an event occurring has been "conditioned" by a certain piece of
information. That is conditional probability in a nutshell.

Definition 3.5.1. (Larsen and Marx, 2018)

Let A and B be any two events defined on S such that P (A) > 0. The conditional
probability of B, assuming that A has already occurred, is written as P (B|A) and is given by

P(B|A) = P(B ∩ A)/P(A)     (3.10)

Example 3.5.1. (Triola, 2018)

a. If 1 of the 555 test subjects is randomly selected, find the probability that the subject had
a positive test result, given that the subject actually uses drugs. This means that we have
to find P (positive test result|subject uses drugs).

b. If 1 of the 555 test subjects is randomly selected, find the probability that the subject
actually uses drugs, given that he or she had a positive test result. This means that we
have to find P (subject uses drugs|positive test result).

Positive Test Result Negative Test Result


(Test shows drug use.) (Test shows no drug use.)
Subject Uses Drugs 45 5
(True Positive) (False Negative)
Subject Does Not Use Drugs 25 480
(False Positive) (True Negative)
Table 33: Results from Drug Tests of Job Applicants

SOLUTION:

a. • Intuitive Approach: We want P (positive test result|subject uses drugs), the proba-
bility of getting someone with a positive test result, given that the selected subject uses
drugs. Here is the key point: if we assume that the selected subject actually uses
drugs, we are dealing only with the 50 subjects in the first row of Table 33. Among
those 50 subjects, 45 had positive test results, so we get this result:

P(positive test result|subject uses drugs) = 45/50 = 0.900
• Formal Approach: The same result can be found by using the formula for P (B|A)
given with the formal approach. We use the following notation.

P (B|A) = P (positive test result|subject uses drugs)

where B = positive test result and A = subject uses drugs.


In the following calculation, we use P(subject uses drugs and had a positive test
result) = 45/555 and P(subject uses drugs) = 50/555 to get the following results:

P(B|A) = P(B ∩ A)/P(A) = (45/555)/(50/555) = 0.900

By comparing the intuitive approach to the formal approach, it should be clear that
the intuitive approach is much easier to use, and it is also less likely to result in er-
rors. The intuitive approach is based on an understanding of conditional probability,
instead of manipulation of a formula, and understanding is so much better.

b. Here, we want P (subject uses drugs|positive test result). This is the probability that the
selected subject uses drugs, given that the subject had a positive test result. If we assume
that the subject had a positive test result, we are dealing with the 70 subjects in the first
column of Table 33. Among those 70 subjects, 45 use drugs, so

P(subject uses drugs|positive test result) = 45/70 = 0.643

Again, the same result can be found by applying the formula for conditional probability,
but we will leave that for those with a special fondness for manipulations with formulas.
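The row and column restrictions of the intuitive approach translate directly into arithmetic; a sketch using the counts from Table 33:

```python
# Counts from Table 33.
true_pos, false_neg = 45, 5     # subjects who use drugs
false_pos, true_neg = 25, 480   # subjects who do not use drugs

users = true_pos + false_neg      # 50 subjects in the first row
positives = true_pos + false_pos  # 70 subjects in the first column

p_pos_given_user = true_pos / users      # P(positive | uses drugs)
p_user_given_pos = true_pos / positives  # P(uses drugs | positive)

print(round(p_pos_given_user, 3))  # 0.9
print(round(p_user_given_pos, 3))  # 0.643
```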

Let’s try to extend conditional probabilities. Suppose you have n events in consideration,
A1, A2, ..., An, such that every outcome in the sample space S belongs to one and only one
of the Ai's; that is, the Ai's are mutually exclusive, and their union is S. Refer to Figure 14.

Figure 14: Partition of S with an event B

Let B, as pictured, denote any event defined on S. From Figure 14, one can deduce that

B = (B ∩ A1 ) ∪ (B ∩ A2 ) ∪ ... ∪ (B ∩ An )

Note that from Figure 14, (B ∩ A1 ), (B ∩ A2 ), ..., (B ∩ An ) are also mutually exclusive.
Hence, by equation (3.9),
P(B) = P(B ∩ A1) + P(B ∩ A2) + ... + P(B ∩ An) = Σ_{i=1}^{n} P(B ∩ Ai)     (3.11)

Note that equation (3.10) can be manipulated as

P (B ∩ A) = P (B|A)P (A) (3.12)

Hence, for i ∈ {1, 2, ..., n},

P (B ∩ Ai ) = P (B|Ai )P (Ai ) (3.13)

Therefore, from equation (3.11),
P(B) = Σ_{i=1}^{n} P(B|Ai)P(Ai)     (3.14)

Proposition 3.5.1. Let A1 , A2 , ..., An be a set of events defined over S such that they are
mutually exclusive, have positive probabilities, and their union is S. For any event B,
P(B) = Σ_{i=1}^{n} P(B|Ai)P(Ai)     (3.15)

To visualize this proposition, the reader is recommended to use a tree diagram.
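As a sketch of Proposition 3.5.1 with a made-up partition (the probabilities below are purely illustrative, not taken from these notes):

```python
from fractions import Fraction

# Hypothetical partition A1, A2, A3 of S, and the conditional chances of B.
p_A = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]  # must sum to 1
p_B_given_A = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

# Law of total probability: P(B) = sum of P(B|Ai) * P(Ai) over the partition.
p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))
print(p_B)  # 17/1000
```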

Example 3.5.2. (Larsen and Marx, 2018)

Urn I contains two red chips and four white chips; urn II, three red and one white. A chip
is drawn at random from urn I and transferred to urn II. Then a chip is drawn from urn II.
What is the probability that the chip drawn from urn II is red?
SOLUTION: Let’s utilize a tree diagram as shown in Figure 15.

Figure 15: The Tree Diagram for the Two-Urn Problem

Let’s analyze the decision tree. The probability of getting a red chip from urn I is 2/6 since
there are two red chips out of 6. If I picked up a red chip from urn I and placed it in urn II,
the probability of getting a red chip from urn II will be (3+1)/(4+1) = 4/5. Originally, urn II
has three red chips and one white chip. After the addition of one red chip from urn I, there will
be a total of five chips, four of them being red and one being white. Therefore, the probability
of getting a white chip from urn II will be 1/5.
Moreover, the probability of getting a white chip from urn I is 4/6 since there are four white
chips out of 6. If I picked a white chip from urn I and placed it in urn II, the probability of
getting a red chip from urn II will be 3/(4+1) = 3/5. Originally, urn II has three red chips and
one white chip. After the addition of one white chip from urn I, there will be a total of five
chips, three of them being red and two being white. Therefore, the probability of getting a
white chip from urn II will be 2/5.
Let B be the event that a red chip will be drawn from urn II. This event will happen depending
on two cases. The first one is that a red chip is drawn from urn II given a red chip is drawn

from urn I. The second one is that a red chip is drawn from urn II given a white chip is drawn
from urn I. In either case, the event B will depend on the chip I will draw from urn I. Let AR
be the event that a red chip will be drawn from urn I; AW , the event that a white chip will be
drawn from urn I.
Hence, the probability of getting a red chip from urn II given that a red chip was drawn from
urn I is given by P (B|AR ) = 4/5. Moreover, the probability of getting a red chip from urn II
given that a white chip was drawn from urn I is given by P (B|AW ) = 3/5.
Furthermore, the probability of getting a red chip from urn I is given by P (AR ) = 2/6. Also,
the probability of getting a white chip from urn I is given by P (AW ) = 4/6.
Therefore, applying equation (3.15),

P (B) = P (B|AR )P (AR ) + P (B|AW )P (AW ) = (4/5)(2/6) + (3/5)(4/6) = 2/3

.˙. There is a two-thirds chance of drawing a red chip from urn II.
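The same computation can be checked with exact fractions. A minimal sketch (the event names follow the solution above):

```python
from fractions import Fraction as F

# Law of total probability, equation (3.15), for the two-urn problem:
# P(B) = P(B|AR) P(AR) + P(B|AW) P(AW)
p_AR, p_AW = F(2, 6), F(4, 6)                  # chip drawn from urn I
p_B_given_AR, p_B_given_AW = F(4, 5), F(3, 5)  # red from urn II after the transfer

p_B = p_B_given_AR * p_AR + p_B_given_AW * p_AW
print(p_B)  # → 2/3
```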

What if instead of P (B|A), you are asked for P (A|B)? Don’t worry! If we do some few
changes in equation (3.10),

P (A|B) = P (A ∩ B) / P (B) (3.16)

Note that P (A ∩ B) = P (B ∩ A). Hence, by substituting equation (3.12) to equation (3.16),

P (A|B) = P (B|A)P (A) / P (B) (3.17)

What if you are asked P (Aj |B) for some event Aj ? Easy! We can do some few changes in
equation (3.17).

P (Aj |B) = P (B|Aj )P (Aj ) / P (B) (3.18)

Since we have established equation (3.15),

P (Aj |B) = P (B|Aj )P (Aj ) / ∑_{i=1}^{n} P (B|Ai )P (Ai ) (3.19)

Proposition 3.5.2. (Larsen and Marx, 2018), Bayes’ Theorem

With the same initial conditions as those of Proposition 3.5.1, and for any event B
defined on S with P (B) > 0,

P (Aj |B) = P (B|Aj )P (Aj ) / ∑_{i=1}^{n} P (B|Ai )P (Ai ) (3.20)

Example 3.5.3. Continuation of Example 3.5.2

If we continue the analysis from the two-urn problem, suppose you were asked to find the
probability of getting a white chip from urn I given that you drew a red chip from urn II.
Mathematically, we are to look for P (AW |B). Applying equation (3.20),

P (AW |B) = (3/5)(4/6) / (2/3) = 3/5

.˙. There is a three-fifths chance of drawing a white chip from urn I on the condition
that a red chip was drawn from urn II.
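Equation (3.20) translates almost word for word into code. A minimal sketch reusing the urn probabilities (the function name `bayes` is ours):

```python
from fractions import Fraction as F

def bayes(j, priors, likelihoods):
    """P(A_j | B) via equation (3.20): prior times likelihood over the total."""
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[j] * likelihoods[j] / total

priors = [F(2, 6), F(4, 6)]       # P(AR), P(AW)
likelihoods = [F(4, 5), F(3, 5)]  # P(B|AR), P(B|AW)
print(bayes(1, priors, likelihoods))  # → 3/5
```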

So far, we have been discussing conditional probabilities, the chances that a certain event
will happen given what has already occurred. However, there are events whose chances are
unchanged regardless of whether another event occurred or not.

Definition 3.5.2. (Bowerman, Murphree, and O’Connell, 2014)

Two events A and B are independent if and only if

1. P (A|B) = P (A), or, equivalently,

2. P (B|A) = P (B)

Here, we assume that P (A) and P (B) are greater than 0.

In other words, if P (A|B) = P (A), the chance that A happens is the same whether or not B
occurred. For instance, in the Philippines, the sun will rise from the east regardless of my
attendance in school.
Note that by manipulating equation (3.16), we have P (A ∩ B) = P (A|B)P (B). If A and B
are independent, then P (A|B) = P (A). Thus,

Proposition 3.5.3. (Bowerman, Murphree, and O’Connell, 2014)

If A and B are independent events, then

P (A ∩ B) = P (A)P (B) (3.21)

This proposition can be extended to more than two events.

Proposition 3.5.4. (Bowerman, Murphree, and O’Connell, 2014)

If A1 , A2 , ..., AN are independent events, then

P (A1 ∩ A2 ∩ ... ∩ AN ) = P (A1 )P (A2 )...P (AN ) (3.22)

Example 3.5.4. (Larsen and Marx, 2018)

Suppose that P (A ∩ B) = 0.2, P (A) = 0.6, and P (B) = 0.5. Are A and B independent?

SOLUTION: Let’s proceed with applying Proposition 3.5.3. However,

P (A ∩ B) = 0.2 ≠ 0.3 = (0.6)(0.5) = P (A)P (B)

.˙. A and B are not independent.
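This check is mechanical enough to automate. A minimal sketch (the small tolerance guards against floating-point rounding):

```python
# Independence check for Example 3.5.4: does P(A ∩ B) equal P(A) P(B)?
p_a, p_b, p_a_and_b = 0.6, 0.5, 0.2
independent = abs(p_a_and_b - p_a * p_b) < 1e-12  # tolerance for float rounding
print(independent)  # → False, since 0.2 != 0.3
```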

3.6 Applications of Conditional Probabilities: Introduction to Decision Theory
Now that you have seen a tree diagram in Figure 15, let’s get to the nitty-gritty of what it
does in making a decision.
Consider a situation close to home: your student life! Every day, you have two choices:
either you go to school or you cut your classes for the day. It is a constant struggle.
You would rather sleep at home for the rest of the day after studying 20 chapters last night
for a QMT exam. However, if you do cut your classes, you will be lost in the next meeting since
everyone else has gone ahead of you. On the other hand, if you go to school, you may not be
able to absorb all of the information your professor will be teaching you for the day. In fact,
he/she might be annoyed by you for sleeping in class. Nonetheless, you will be saving your
cuts. Who knows? You could save the cuts for the final weeks of the semester and use the time
to prepare for the exams. Great idea, right?
In summary, you are trying to answer three questions in order to make a good decision.

• What are my choices? (e.g., cut or not cut?)

• With my choice, what could possibly happen? (e.g., restful day, left behind, cannot
absorb, more cuts for future cramming)

• With my choice, what is the payoff? (e.g., rest always more important than studying)

In a nutshell, decision making requires alternatives, states of nature, and payoffs. This
is the foundation of Decision Theory. The tree diagram used in Decision Theory is called a
decision tree.
Notice that the states of nature are dependent on your chosen alternative, and the payoffs
are calculated based on what could possibly happen. However, we have to stress ”possibly
happen” since we have no clear idea of what really is going to happen with a decision.
Hence, we have to consider three possibilities: a specific event or state of nature will occur
(certainty), we have no information at all of the likelihoods of the states of nature (uncertainty),
or we can somehow gauge the likelihoods of the states of nature (risk).

Example 3.6.1. (Bowerman, Murphree, and O’Connell, 2014)

Consider the following example that involves a capacity-planning problem in which a com-
pany must choose to build a small, medium, or large production facility. The payoff obtained
will depend on whether future demand is low, moderate, or high, and the payoffs are given in
the following table.

Possible Future Demand


Alternatives Low Moderate High
Small facility $10 $10 $10
Medium Facility 7 12 12
Large Facility -4 2 16
Table 34: Payoff Table of Future Demand Given Size of Facility1

To better visualize the problem, Figure 16 presents the decision tree of the given capacity-
planning problem. Note that the hollowed circle at the left side of Figure 16 denotes the point
of decision.

Figure 16: The Decision Tree for the Capacity-Planning Problem

a. Suppose with certainty that the demand will be low. As a rational decision maker,
the best alternative will be that with the highest payoff of $10 million, building a small
facility. If the demand will be moderate, building a medium facility will yield the
highest payoff of $12 million. If the demand will be high, then building a large facility
would be optimal with a gain of $16 million.

b. Suppose you have no information of the likelihood of the nature of state. Hence, you
could choose the alternative either with the maximum worst possible payoff (maximin
criterion) or with the maximum best possible payoff (maximax criterion).
(Footnote to Table 34: the payoffs are in present value in $ millions.)

The worst possible payoff for building a small facility is $10; medium facility, $7; large
facility, ($4). Hence, building a small facility would be the best option under the maximin
criterion.

Furthermore, the best possible payoff for building a small facility is $10; medium facility,
$12; large facility, $16. Hence, building a large facility would be the best option under the
maximax criterion.

You may have observed that the maximin criterion is a pessimistic approach while the
maximax is optimistic.

c. Suppose you have an idea of the risk of each state of nature. Now, the company assigns
prior probabilities of 0.3, 0.5, and 0.2 to low, moderate, and high demands, respectively.
This means that if I choose to build a large facility, then I am expecting 0.3 of ($4),
0.5 of $2, and 0.2 of $16 will happen. Thus, there is an expectation of (0.3)(−$4) +
(0.5)($2) + (0.2)($16) = $3. The same is true for building other facilities. Hence,

Small Facility: Expected Value = (0.3)($10) + (0.5)($10) + (0.2)($10) = $10

Medium Facility: Expected Value = (0.3)($7) + (0.5)($12) + (0.2)($12) = $10.5

Large Facility: Expected Value = (0.3)(−$4) + (0.5)($2) + (0.2)($16) = $3

As a rational decision maker, I would want to choose the alternative with the highest
expected payoff, which is to build a medium facility with payoff $10.5 million. This
approach uses what analysts call the expected monetary value criterion.

d. Suppose we can find out which state of nature will occur. Analysts say that we have ob-
tained perfect information. This is different from the certainty assumption since per-
fect information can help us ”find out” the state of nature that will occur while the criterion
assumes that we ”know” what is going to happen. For instance, to ”find out” if the de-
mand is low, moderate, or high, the company can hire a research and development team in
the market. Another would be to install software in the company systems to gain access
to information about global markets. In either strategy, a cost is involved to obtain perfect
information. The most one should pay for it is the expected value of perfect information
(EVPI), which is the difference between the expected payoff under certainty and the
expected payoff under risk.

To calculate the expected payoff under certainty, that is the same as applying the ex-
pected monetary value criterion onto the certainty assumption in a. Using the prior proba-
bilities and the results in a, the expected payoff under certainty is (0.3)($10)+(0.5)($12)+
(0.2)($16) = $12.2. Given that the expected payoff under risk is the result of c, which
is $10.5 million, the EVPI is $12.2 million − $10.5 million = $1.7 million.
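Parts a through d can be reproduced in a few lines. A minimal sketch using the payoffs of Table 34 and the prior probabilities above (the variable names are ours):

```python
# Decision criteria for the capacity-planning problem (Table 34, $ millions).
# States of nature are ordered: low, moderate, high demand.
payoffs = {
    "small":  [10, 10, 10],
    "medium": [7, 12, 12],
    "large":  [-4, 2, 16],
}
priors = [0.3, 0.5, 0.2]  # P(low), P(moderate), P(high)

maximin = max(payoffs, key=lambda a: min(payoffs[a]))  # best worst-case payoff
maximax = max(payoffs, key=lambda a: max(payoffs[a]))  # best best-case payoff

# Expected monetary value criterion (part c)
emv = {a: sum(p * v for p, v in zip(priors, row)) for a, row in payoffs.items()}
best = max(emv, key=emv.get)

# Expected value of perfect information (part d):
# expected payoff under certainty minus expected payoff under risk
certainty = sum(p * max(row[i] for row in payoffs.values())
                for i, p in enumerate(priors))
evpi = certainty - emv[best]

print(maximin, maximax, best)               # → small large medium
print(round(emv[best], 1), round(evpi, 1))  # → 10.5 1.7
```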

When we use the expected monetary value criterion to choose the best alternative based on
expected values computed using prior probabilities, as we did above, we call the analysis
prior decision analysis. Often, however, sample information can be obtained to help

us make decisions. In such a case, we compute expected values by using posterior probabilities,
and we call the analysis posterior decision analysis.

Example 3.6.2. (Bowerman, Murphree, and O’Connell, 2014)

An oil company wishes to decide whether to drill for oil on a particular site. The company
has assigned prior probabilities 0.7, 0.2, and 0.1 to the states of nature S1 = no oil, S2 = some
oil, and S3 = much oil, respectively. Table 35 presents the payoff table of the given scenario.

State of Nature
Alternatives S1 = no oil S2 = some oil S3 = much oil
Drill ($700,000) $500,000 $2,000,000
Do not Drill $0 $0 $0
Table 35: Payoff Table of the Oil Company Case

To visualize the given case study, Figure 17 presents its decision tree.

Figure 17: The Decision Tree for a Prior Analysis of the Oil Company Case

Using the prior probabilities, the expected monetary value associated with drilling is

(0.7)(−$700, 000) + (0.2)($500, 000) + (0.1)($2, 000, 000) = −$190, 000

while the expected monetary value associated with not drilling is

(0.7)($0) + (0.2)($0) + (0.1)($0) = $0

Therefore, prior analysis tells us that the oil company should not drill. Hence, we cross out
the alternative of drilling.
Suppose the oil company has obtained more information about the drilling site by perform-
ing a seismic experiment with three possible readings - low, medium, and high. The resulting
probabilities are listed below:

P (high|none) = 0.04 P (high|some) = 0.02 P (high|much) = 0.96


P (medium|none) = 0.05 P (medium|some) = 0.94 P (medium|much) = 0.03
P (low|none) = 0.91 P (low|some) = 0.04 P (low|much) = 0.01

To better visualize the conditional probabilities, Figure 18 presents a tree diagram incor-
porating the new information.

Figure 18: A Tree Diagram of the Oil Company Case

Given such information, we will now revise the prior probabilities to posterior probabil-
ities (since these are likelihoods after new information) by using Bayes’ Theorem. Applying
equation (3.15),

P (high) = P (none)P (high|none) + P (some)P (high|some) + P (much)P (high|much)


= (0.7)(0.04) + (0.2)(0.02) + (0.1)(0.96) = 0.128

Applying Bayes’ Theorem defined by equation (3.20),

P (none|high) = P (none)P (high|none) / P (high) = (0.7)(0.04) / 0.128 = 0.21875

P (some|high) = P (some)P (high|some) / P (high) = (0.2)(0.02) / 0.128 = 0.03125

P (much|high) = P (much)P (high|much) / P (high) = (0.1)(0.96) / 0.128 = 0.75

The same applies when the seismic experiment results in medium and low readings. The
resulting decision tree that combines Figure 17 and Figure 18 is shown in Figure 19.

Figure 19: The Decision Tree with Posterior Probabilities of the Oil Company Case

In this decision tree, Bowerman, Murphree, and O’Connell use squares to indicate decision
points and circles for possible states of nature. Moreover, the authors use two slashes to
indicate that an alternative should not be chosen.
We can now use the decision tree to determine the alternative (drill or do not drill) that
should be selected given that the seismic experiment has been performed and has resulted in

a particular outcome. First, suppose that the seismic experiment results in a high reading.
Looking at the branch of the decision tree corresponding to a high reading, the expected monetary
values associated with the ”drill” and ”do not drill” alternatives are

Drill: (0.21875)(−$700, 000) + (0.03125)($500, 000) + (0.75)($2, 000, 000) = $1, 362, 500

Do Not Drill: (0.21875)($0) + (0.03125)($0) + (0.75)($0) = $0

These expected monetary values are placed on the decision tree corresponding to the ”drill”
and ”do not drill” alternatives. They tell us that, if the seismic experiment results in
a high reading, then the company should drill and the expected payoff will be
$1,362,500. The double slash placed through the ”do not drill” branch (at the very bottom of
the decision tree) blocks off that branch and indicates that the company should drill if a high
reading is obtained.
Next, suppose that the seismic experiment results in a medium reading. Looking at the
branch corresponding to a medium reading, the expected monetary values are

Drill: (0.15487)(−$700, 000) + (0.83186)($500, 000) + (0.01327)($2, 000, 000) = $334, 061

Do Not Drill: (0.15487)($0) + (0.83186)($0) + (0.01327)($0) = $0

Therefore, if the seismic experiment results in a medium reading, the oil com-
pany should drill, and the expected payoff will be $334,061.
Finally, suppose that the seismic experiment results in a low reading. Looking at the branch
corresponding to a low reading, the expected monetary values are

Drill: (0.98607)(−$700, 000) + (0.01238)($500, 000) + (0.00155)($2, 000, 000) = −$680, 959

Do Not Drill: (0.98607)($0) + (0.01238)($0) + (0.00155)($0) = $0

Therefore, if the seismic experiment results in a low reading, the oil company
should not drill on the site.
We can summarize the results of our posterior analysis as follows:

Outcome of Seismic Experiment   Probability of Outcome   Decision   Expected Payoff
High 0.128 Drill $1,362,500
Medium 0.226 Drill $334,061
Low 0.646 Do Not Drill $0
Table 36: Results of the Posterior Analysis of the Oil Case Study

If we carry out the seismic experiment, we now know what action should be taken for each
possible outcome (low, medium, or high). However, there is a cost involved when we conduct the
seismic experiment. If, for instance, it costs $100,000 to perform the seismic experiment, we
need to investigate whether it is worth it to perform the experiment. This will depend on the ex-
pected worth of the information provided by the experiment. Naturally, we must decide whether

the experiment is worth it before our posterior analysis is actually done. Therefore, when we
assess the worth of the sample information, we say that we are performing a preposterior
analysis.
In order to assess the worth of the sample information, we compute the expected payoff
of sampling. To calculate this result, we find the expected payoff and the probability of each
sample outcome (that is, at each possible outcome of the seismic experiment). Looking at Table
36, the expected payoff of sampling, which is denoted EPS, is

(0.646)($0) + (0.226)($334, 061) + (0.128)($1, 362, 500) = $249, 898

To find the worth of the sample information, we compare the expected payoff of sampling
to the expected payoff of no sampling, which is denoted EPNS. The EPNS is the
expected payoff of the alternative that we would choose by using the expected
monetary value criterion with the prior probabilities. Recalling that we summarized
our prior analysis in the tree diagram of Figure 17, we found that (based on the prior probabil-
ities) we should choose not to drill and that the expected payoff of this action is $0. Therefore,
EPNS = $0.
We compare the EPS and the EPNS by computing the expected value of sample infor-
mation, which is denoted by EVSI and is defined to be the expected payoff of sampling
minus the expected payoff of no sampling. Therefore,

EVSI = EPS − EPNS = $249, 898 − $0 = $249, 898

The EVSI is the expected gain from conducting the seismic experiment, and the oil company
should pay no more than this amount to carry out the seismic experiment. If the experiment
costs $100,000, then it is worth the expense to conduct the experiment. Moreover, the differ-
ence between the EVSI and the cost of sampling is called the expected net gain of
sampling, which is denoted by ENGS. Here,

ENGS = EVSI − $100, 000 = $249, 898 − $100, 000 = $149, 898

As long as the ENGS is greater than $0, it is worthwhile to carry out the seismic experiment.
That is, the oil company should carry out the seismic experiment before it chooses
whether or not to drill. Then, as discussed earlier, our posterior analysis says
that if the experiment gives a medium or high reading, the oil company should
drill, and if the experiment gives a low reading, the oil company should not drill.
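The whole posterior and preposterior analysis can be reproduced programmatically. A minimal sketch (priors, likelihoods, and payoffs are those given above; note that carrying full precision yields EPS ≈ $249,900 and ENGS ≈ $149,900, while the text's figures of $249,898 and $149,898 carry a little rounding error from the hand-computed posteriors):

```python
# Posterior and preposterior analysis of the oil-drilling case.
states = ["none", "some", "much"]
prior = {"none": 0.7, "some": 0.2, "much": 0.1}
likelihood = {  # P(reading | state) from the seismic experiment
    "high":   {"none": 0.04, "some": 0.02, "much": 0.96},
    "medium": {"none": 0.05, "some": 0.94, "much": 0.03},
    "low":    {"none": 0.91, "some": 0.04, "much": 0.01},
}
drill = {"none": -700_000, "some": 500_000, "much": 2_000_000}

eps = 0.0
for reading, like in likelihood.items():
    p_reading = sum(prior[s] * like[s] for s in states)              # eq. (3.15)
    posterior = {s: prior[s] * like[s] / p_reading for s in states}  # Bayes
    emv_drill = sum(posterior[s] * drill[s] for s in states)
    eps += p_reading * max(emv_drill, 0.0)  # "do not drill" pays $0

evsi = eps - 0.0       # EPNS = $0: prior analysis said do not drill
engs = evsi - 100_000  # cost of the seismic experiment
print(round(eps), round(engs))  # → 249900 149900
```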

3.7 Problem Set 2 (Due the Next Session)


1. (Ang, 2011) Ten weightlifters are competing in a team weight-lifting contest. Of the
lifters, 3 are from the United States, 4 are from Russia, 2 are from China, and 1 is from
Canada. If the scoring takes account of the countries that the lifters represent but not
their individual identities, how many different outcomes are possible from the point of
view of scores? How many different outcomes correspond to results in which the U.S. has
1 competitor in the top 3 and 2 in the bottom 3?

2. (Triola, 2018) A common computer programming rule is that names of variables must be
between one and eight characters long. The first character can be any of the 26 letters,
while successive characters can be any of the 26 letters of any of the 10 digits. For
example, allowable variable names include A, BBB, and M3477K. How many different
variable names are possible? (Ignore the difference between uppercase and lowercase
letters.)

3. (Larsen and Marx, 2018) A telemarketer is planning to set up a phone bank to bilk
widows with a Ponzi scheme. His past experience (prior to his most recent incarceration)
suggests that each phone will be in use half the time. For a given phone at a given time,
let 0 indicate that the phone is available and let 1 indicate that a caller is on the line.
Suppose that the telemarketer’s ”bank” is comprised of four telephones.

a. Write out the outcomes in the sample space.


b. What outcomes would make up the event that exactly two phones are being used?
c. Suppose the telemarketer had k phones. How many outcomes would allow for the
possibility that at most one more call could be received? (Hint: How many lines
would have to be busy?)

4. (Berenson, Krehbiel, and Levine, 2012) A sample of 500 respondents in a large metropoli-
tan area was selected to study consumer behavior. Among the questions asked was ”Do
you enjoy shopping for clothing?” Of 240 males, 136 answered yes. Of 260 females, 224
answered yes. Construct a contingency table to evaluate the probabilities. What is the
probability that a respondent chosen at random

a. enjoys shopping for clothing?


b. is a female and enjoys shopping for clothing?
c. is a female or enjoys shopping for clothing?
d. is a male or a female?

5. (Aczel and Sounderpandian, 2008) One of the greatest problems in marketing research
and other survey fields is the problem of nonresponse to surveys. In home interviews, the
problem arises when the respondent is not home at the time of the visit or, sometimes,
simply refuses to answer questions. A market researcher believes that a respondent will
answer all questions with probability 0.94 if found at home. He further believes that the
probability that a given person will be found at home is 0.65. Given this information,
what percentage of the interviews will be successfully completed?

6. (Bowerman, Murphree, and O’Connell, 2014) A product is assembled using 10 different


components, each of which must meet specifications for five different quality characteris-
tics. Suppose that there is a 0.9973 probability that each individual specification will be
met. Assuming that all 50 specifications are met independently, find the probability that
the product meets all 50 specifications.

7. (Larsen and Marx, 2018) Given that P (A) + P (B) = 0.9, P (A|B) = 0.5, and P (B|A) =
0.4, find P (A).

8. (Larsen and Marx, 2018) Suppose that two cards are drawn simultaneously from a stan-
dard fifty-two-card poker deck. Let A be the event that both are either a jack, queen,
king, or ace of hearts, and let B be the event that both are aces. Are A and B inde-
pendent? (Note: There are 1326 equally likely ways to draw two cards from a poker
deck.)

9. Refer to Figure 18 in Example 3.6.2. Verify that the probability of getting a medium
reading is 0.226, and the probability of getting a low reading is 0.646. Given these values,
show that the chances of having much oil given that the reading is medium is 0.01327,
and the probability of no oil given that the reading is low is 0.98607.

10. (Bowerman, Murphree, and O’Connell, 2014) A firm designs and manufactures automatic
electronic control devices that are installed at customers’ plant sites. The control devices
are shipped by truck to customers’ sites; while in transit, the devices sometimes get
out of alignment. More specifically, a device has a prior probability of 0.10 of getting
out of alignment during shipment. When a control device is delivered to the customer’s
plant site, the customer can install the device. If the customer installs the device, and
if the device is in alignment, the manufacturer of the control device will realize a profit
of $15,000. If the customer installs the device, and if the device is out of alignment,
the manufacturer must dismantle, realign, and reinstall the device for the customer. This
procedure costs $3,000, and therefore the manufacturer will realize a profit of $12,000. As
an alternative to customer installation, the manufacturer can send two engineers to the
customer’s plant site to check the alignment of the control device, to realign the device
if necessary before installation, and to supervise the installation. Because it is less costly
to realign the device before it is installed, sending the engineers costs $500. Therefore, if
the engineers are sent to assist with the installation, the manufacturer realizes a profit of
$14,500 (this is true whether or not the engineers must realign the device at the site).

Before a control device is installed, a piece of test equipment can be used by the customer
to check the device’s alignment. The test equipment has two readings, ”in” or ”out” of
alignment. Given that the control device is in alignment, there is a 0.8 probability that
the test equipment will read ”in”. Given that the control device is out of alignment, there
is a 0.9 probability that the test equipment will read ”out”.

a. Identify and list each of the following for the control device situation:
• The firm’s alternative actions,
• The states of nature.
• The possible results of sampling (that is, of information gathering).
b. Write out the payoff table for the control device situation.
c. Construct a decision tree for a prior analysis of the control device situation. Then,
determine whether the engineers should be sent, assuming that the piece of test
equipment is not employed to check the device’s alignment. Also, find the expected
monetary value associated with the best alternative action.
d. Set up probability revision tables to
• Find the probability that the test equipment ”reads in”, and find the posterior
probabilities of in alignment and out of alignment given that the test equipment
”reads in”.

• Find the probability that the test equipment ”reads out”, and find the posterior
probabilities of in alignment and out of alignment given that the test equipment
”reads out”.
e. Construct a decision tree for a posterior and preposterior analysis of the control
device situation.
f. Carry out a posterior analysis of the control device problem. That is, decide whether
the engineers should be sent, and find the expected monetary value associated with
either sending or not sending (depending on which is best) the engineers assuming:
• The test equipment ”reads in”.
• The test equipment ”reads out”.
g. Carry out a preposterior analysis of the control device problem by finding:
• The expected monetary value associated with using the test equipment; that is,
find the EPS.
• The expected monetary value associated with not using the test equipment; that
is, find the EPNS.
• The expected value of sample information, EVSI.
• The maximum amount that should be paid for using the test equipment.

Chapter 4

Random Variables

In the third chapter, we have tackled the following topics:


• counting principles and methods such as permutations and combinations
• understanding sets and their relationships
• measuring the likelihood of events through the concept of probability
• extending the rules of probability in conditional probabilities, Bayes’ Theorem, and
independence
• applying conditional probabilities in decision theory
Congratulations for accomplishing your first long test in QMT! I hope that you are still
motivated to study the course (even though it’s a lot of math huhu). I just want to say I am
proud of you for going through the process. :) Take your time learning, okay? :)

4.1 Introduction to Random Variables and Probability Distribution Functions
Consider a random experiment on the outcomes of three coins after being tossed one at a time.
Using the counting techniques we have learned in the third chapter, there are eight outcomes.
Suppose we want to count the number of heads of each outcome. The results are listed in the
table below.

Outcome Number of Heads


TTT 0
HTT 1
THT 1
TTH 1
HHT 2
HTH 2
THH 2
HHH 3
Table 36: The Results of Counting the Number of Heads of Outcomes after Tossing Three
Coins One at a Time

If we analyze the results, it seems that for each possible outcome, there is a corresponding
number of heads. In other words, there is a mapping or a pairing between an outcome of the
experiment and the number of heads. Sounds like a function, right? This function is actually
called a random variable.

Definition 4.1.1. (Perez, 2019)


A random variable is a function that assigns a real number to each outcome in the sample
space of a random experiment.

Random variables are usually represented by a capital letter. If we continue from the
experiment and let X be the random variable defined as the number of heads and Y as the
number of tails, we have the following results.

Outcome X Y
TTT 0 3
HTT 1 2
THT 1 2
TTH 1 2
HHT 2 1
HTH 2 1
THH 2 1
HHH 3 0
Table 37: The Results of Counting the Number of Heads and Tails of Outcomes after Tossing
Three Coins One at a Time (with Random Variables)

We can extend the analysis of the experiment by mapping events to their likelihood of
occurrence. For instance, we are asked to find the probability of having two heads in the
random experiment. Only three events out of eight satisfy what we are looking for — {HHT,
HTH, THH}. Hence, the probability of having two heads after tossing three coins is 3/8. The
results are summarized in the table below.

Number of Heads (X = x)    Probability of Event P (X = x)
0                          1/8
1                          3/8
2                          3/8
3                          1/8

Table 38: The Mapping of Events from the Random Experiment to their Probabilities of
Occurrence

Notice that this is also a function between events and their corresponding probabilities.
This is called a probability distribution function.
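Such a mapping is easy to build by brute-force enumeration. A minimal sketch for the three-coin experiment (using exact fractions; the variable names are ours):

```python
from collections import Counter
from fractions import Fraction as F
from itertools import product

# Enumerate the 8 outcomes of tossing three coins; X = number of heads
outcomes = list(product("HT", repeat=3))
counts = Counter(o.count("H") for o in outcomes)
pmf = {x: F(c, len(outcomes)) for x, c in sorted(counts.items())}
print(pmf[2])  # → 3/8, matching Table 38
```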

Definition 4.1.2. (Perez, 2019)

A function that provides (or is used to calculate) probabilities for the possible values in the
range of random variable is called a probability distribution function.

In the random experiment, the range (different from that we have learned in Chapter 2)
of X is {0, 1, 2, 3}. Hence, the probability distribution function PX (x) of X, where x is a
possible value in the range of X, is given by
1
 ,x = 0
8





 3

 ,x = 1
8


PX (x) = 3 , x = 2 (4.1)
8



1




 ,x = 3
8



0 otherwise

Notice that from the PDF above, one can ”count” the range of X. We can say that there are
four possible events — when there are no heads, one head, two heads, or three heads. However,
suppose we have gY (y) defined as
1
 ,0 ≤ y < 1
8





 3

 ,1 ≤ y < 2
8


gY (y) = 3 , 2 ≤ y < 3 (4.2)
8



1


,3 ≤ y






 8
0 otherwise

The range of Y cannot be ”counted” since there are a lot of numbers that are greater than
or equal to 0 but less than 1, for example. Random variables with a function like that of
equation (4.1) are called discrete while those with a PDF like that of equation (4.2) are called
continuous.

Definition 4.1.3. (Perez, 2019)


Discrete random variables are random variables with countably finite or countably infi-
nite ranges. Let X be a discrete random variable with a probability mass function (PMF)
PX (x) = P (X = x).

• PX (x) ≥ 0 for any value of x

• ∑x PX (x) = 1

Continuous random variables are random variables whose range is an interval of real
numbers, either finite (e.g., [0, 2]) or infinite (e.g., [2, ∞)). Let Y be a continuous random
variable with a probability density function (PDF) fY (y).

• fY (y) ≥ 0 for any value of y

• P (a ≤ Y ≤ b) = ∫_a^b fY (y) dy

• The total area under the curve of fY (y) should be equal to 1; ∫_{−∞}^{∞} fY (y) dy = 1

The definitions of each random variable satisfy the axioms of probability that we have
learned in the previous chapter. However, don't panic over the integral. HAHHA!

Example 4.1.1. (Weiss, 2017)

When two balanced dice are rolled, 36 equally likely outcomes are possible. Let Y denote the
sum of the dice. Find the probability distribution of Y.
SOLUTION: Intuitively, the least sum is 2, and the greatest is 12. The results of the
experiment are in Table 39. Note that the outcomes are in coordinate form (x, y), where x is
the face of the first die and y is the face of the second die.

Y Outcomes Number of Outcomes


2 (1,1) 1
3 (2,1), (1,2) 2
4 (1,3), (3,1), (2,2) 3
5 (1,4), (4,1), (2,3), (3,2) 4
6 (1,5), (5,1), (2,4), (4,2), (3,3) 5
7 (1,6), (6,1), (2,5), (5,2), (3,4), (4,3) 6
8 (2,6), (6,2), (3,5), (5,3), (4,4) 5
9 (3,6), (6,3), (4,5), (5,4) 4
10 (4,6), (6,4), (5,5) 3
11 (5,6), (6,5) 2
12 (6,6) 1
TOTAL 36
Table 39: The Sums of the Faces of Two Dice

From Table 39, we can already find the probability distribution PY (y) of Y .

             1/36 , y = 2, 12
             2/36 , y = 3, 11
             3/36 , y = 4, 10
  PY (y) =   4/36 , y = 5, 9                                                 (4.3)
             5/36 , y = 6, 8
             6/36 , y = 7
             0    otherwise

Note that the probabilities sum to 1.
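Table 39 and equation (4.3) can be checked the same way by enumerating the 36 ordered pairs. This is my own sketch, not from the original text:

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely (first die, second die) outcomes.
rolls = list(product(range(1, 7), repeat=2))

# P_Y(y) = (number of ordered pairs whose faces sum to y) / 36.
pmf_sum = {y: Fraction(sum(1 for a, b in rolls if a + b == y), len(rolls))
           for y in range(2, 13)}

print(pmf_sum[7])             # 1/6, i.e., 6/36 as in equation (4.3)
print(sum(pmf_sum.values()))  # 1
```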

Example 4.1.2. (Larsen and Marx, 2018)

For the random variable Y with PDF fY (y) = 2/3 + (2/3)y, 0 ≤ y ≤ 1, find P (3/4 ≤ Y ≤ 1).
SOLUTION: By a property of a PDF of a continuous random variable Y ,

  ∫_{3/4}^{1} fY (y)dy = ∫_{3/4}^{1} (2/3 + (2/3)y)dy

                       = [(2/3)y + (2/3) · y²/2] evaluated from 3/4 to 1

                       = [(2/3)(1) + (1/3)(1)²] − [(2/3)(3/4) + (1/3)(3/4)²]

                       = 1 − 0.6875

                       = 0.3125

.˙. P (3/4 ≤ Y ≤ 1) = 0.3125.
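When an antiderivative is not handy, the same probability can be approximated numerically. This sketch (mine, for illustration; `midpoint_integral` is a hypothetical helper name) uses a midpoint Riemann sum:

```python
def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with a midpoint Riemann sum."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# PDF from Example 4.1.2: f_Y(y) = 2/3 + (2/3)y on [0, 1].
f_Y = lambda y: 2 / 3 + (2 / 3) * y

p = midpoint_integral(f_Y, 0.75, 1.0)
print(round(p, 4))  # 0.3125, matching the exact computation
```

The midpoint rule is exact for linear integrands, so the agreement here is to floating-point precision.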

4.2 Improper Integrals and Binomial Theorem (Optional)


Definition 4.2.1. (Stewart, 2015)
a. If ∫_a^t f (x)dx exists for every number t ≥ a, then

     ∫_a^∞ f (x)dx = lim_{t→∞} ∫_a^t f (x)dx                                 (4.4)

   provided this limit exists (as a finite number).

b. If ∫_t^b f (x)dx exists for every number t ≤ b, then

     ∫_{−∞}^b f (x)dx = lim_{t→−∞} ∫_t^b f (x)dx                             (4.5)

   provided this limit exists (as a finite number).

c. If both ∫_a^∞ f (x)dx and ∫_{−∞}^a f (x)dx exist, then we define

     ∫_{−∞}^∞ f (x)dx = ∫_{−∞}^a f (x)dx + ∫_a^∞ f (x)dx                     (4.6)

In part c, any real number a can be used.

Definition 4.2.2. (Binomial Theorem)

If n is a positive integer, then

  (x + y)^n = Σ_{k=0}^{n} C(n, k) x^k y^{n−k}
            = y^n + C(n, 1) x y^{n−1} + C(n, 2) x² y^{n−2} + ... + C(n, n−1) x^{n−1} y + x^n   (4.7)

where C(n, k) denotes the number of combinations of n items taken k at a time.

4.3 Cumulative Distribution Function
One application of the probability distribution function is the cumulative distribution func-
tion.
Definition 4.3.1. (Chan Shio, 2015)

If X is a random variable, then the function F defined on all real numbers by FX (x) =
P (X ≤ x) = Σ_{i ≤ x} P (X = i) (if X is discrete) or FX (x) = P (X ≤ x) = ∫_{−∞}^{x} fX (u)du (if X
is continuous) is called the cumulative distribution function (CDF) of X.
Example 4.3.1. (Chan Shio, 2015)

Consider the experiment of flipping a fair coin twice, and let X be the number of tails that
come up. Calculate FX (x).
SOLUTION: The results of the random experiment are given in Table 40.

  Outcome    X = x    P (X = x)
  TT         2        1/4
  HT         1        1/4
  TH         1        1/4
  HH         0        1/4
  TOTAL      4        1

Table 40: The Results of the Two-Coin Experiment

Hence,

                           0                             , x < 0
                           1/4                           , 0 ≤ x < 1
  FX (x) = P (X ≤ x) =     1/4 + 1/4 + 1/4 = 3/4         , 1 ≤ x < 2
                           1/4 + 1/4 + 1/4 + 1/4 = 1     , x ≥ 2

               0    , x < 0
               1/4  , 0 ≤ x < 1
.˙. FX (x) =
               3/4  , 1 ≤ x < 2
               1    , x ≥ 2

Example 4.3.2. (Larsen and Marx, 2018)

If Y is an exponential random variable, fY (y) = λe^{−λy} , y ≥ 0, find FY (y).

SOLUTION: Note that if y < 0, fY (y) = 0. Hence, FY (y) = 0 for y < 0. We proceed with
y ≥ 0.

  FY (y) = ∫_{−∞}^{y} fY (u)du

         = ∫_{0}^{y} λe^{−λu} du

         = [λ · e^{−λu}/(−λ)] evaluated from 0 to y

         = [−e^{−λu}] evaluated from 0 to y

         = −e^{−λy} + e^{−λ(0)}

         = 1 − e^{−λy}

               1 − e^{−λy}  , y ≥ 0
.˙. FY (y) =
               0            , y < 0
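As a sanity check on the closed form FY (y) = 1 − e^{−λy}, we can compare it against numerical integration of the density. This sketch is my own; the rate λ = 2 below is an arbitrary value chosen only for the check:

```python
import math

lam = 2.0  # hypothetical rate parameter chosen only for this check

def cdf_closed(y):
    """CDF derived in Example 4.3.2."""
    return 1 - math.exp(-lam * y) if y >= 0 else 0.0

def cdf_numeric(y, n=10_000):
    """Midpoint Riemann sum of the exponential density over [0, y]."""
    h = y / n
    return sum(lam * math.exp(-lam * (i + 0.5) * h) for i in range(n)) * h

for y in (0.5, 1.0, 3.0):
    assert abs(cdf_closed(y) - cdf_numeric(y)) < 1e-6

print(round(cdf_closed(1.0), 4))  # 0.8647
```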

Notice that for any random variable X and real numbers a and b, P (a < X ≤ b) = P (X ≤
b) − P (X ≤ a) = FX (b) − FX (a).

4.4 Expected Value


Since the outcomes are random, what is to be expected in the experiment is also of interest.
For instance, given the results in Table 40, what is the expected number of tails?
Let's test each case. There is a 1/4 probability that there are two tails after flipping a fair
coin twice. That is equivalent to saying that we expect 1/4 of 2, or (1/4) · 2 = 1/2, tails to happen.
Furthermore, we expect (1/2) · 1 = 1/2 tails to happen for the case of one tail. Lastly, we expect
(1/4) · 0 = 0 tails to happen for the case of no tails. In total, we expect that there will be
1/2 + 1/2 + 0 = 1 tail in the experiment. This is the motivation for the definition of an expected
value.

Definition 4.4.1. (Perez, 2019)

The expected value E(X) is the weighted average value. For a discrete random variable
X,
X
E(X) = xPX (x) (4.8)
x

For a continuous random variable X,


Z ∞
E(X) = xfX (x)dx (4.9)
−∞

Example 4.4.1. (Bowerman, Murphree, and O’Connell, 2014)

The following table summarizes investment outcomes and corresponding probabilities for a
particular oil well. Let X be the random variable associated with the investment outcomes.

x = the outcome in $ P(X = x)
-$40,000 (no oil) 0.25
$10,000 (some oil) 0.70
$70,000 (much oil) 0.05
Table 41: The Results of The Investments on an Oil Well
Find the expected monetary outcome.
SOLUTION: E(X) = Σx x PX (x) = Σx x P (X = x) = (−$40,000)(0.25) + ($10,000)(0.70) +
($70,000)(0.05) = $500.
.˙. One can expect that investments will have an average outcome of $500.
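The weighted average in Example 4.4.1 is a direct translation of equation (4.8); a minimal sketch of mine:

```python
# Investment outcomes (in dollars) and their probabilities, from Table 41.
pmf = {-40_000: 0.25, 10_000: 0.70, 70_000: 0.05}

expected = sum(x * p for x, p in pmf.items())
print(round(expected, 2))  # 500.0
```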

Example 4.4.2. (Larsen and Marx, 2018)

Let the random variable Y have the uniform distribution over [a, b]; that is, fY (y) = 1/(b − a) for
a ≤ y ≤ b. Find E(Y ).

SOLUTION: E(Y ) = ∫_{a}^{b} y · 1/(b − a) dy = [1/(b − a)] ∫_{a}^{b} y dy = [1/(b − a)] · [y²/2] from a to b = [1/(b − a)] · (b² − a²)/2

Hence, E(Y ) = [1/(b − a)] · (b² − a²)/2 = (b + a)/2.

.˙. E(Y ) = (b + a)/2.

We can extend the expectation to a function of a random variable.


Definition 4.4.2. (Perez, 2019)

The expected value of a function h(X) for a discrete random variable X with PMF
PX (x) is given by
X
E[h(X)] = h(x)PX (x) (4.10)
x

The expected value of a function h(X) for a continuous random variable X with PDF
fX (x) is given by
Z ∞
E[h(X)] = h(x)fX (x)dx (4.11)
−∞

Example 4.4.3. (Larsen and Marx, 2018)

Suppose that X is a random variable whose PMF is nonzero only for the three values −2, 1,
and 2:

  k        P (X = k)
  −2       5/8
  1        1/8
  2        2/8
  TOTAL    1

Table 42: The Probability Distribution Function of X

Find E(X²).
SOLUTION: E(X²) = Σx x² P (X = x) = (−2)²(5/8) + (1)²(1/8) + (2)²(2/8)

Hence, E(X²) = 4(5/8) + 1(1/8) + 4(2/8) = 20/8 + 1/8 + 8/8 = 29/8.

.˙. E(X²) = 29/8.
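The computation of E(X²) in Example 4.4.3 follows equation (4.10) with h(x) = x²; a short sketch of mine using exact fractions:

```python
from fractions import Fraction

# PMF from Table 42: values -2, 1, 2 with probabilities 5/8, 1/8, 2/8.
pmf = {-2: Fraction(5, 8), 1: Fraction(1, 8), 2: Fraction(2, 8)}

# Apply h(x) = x^2 before weighting by the probabilities.
e_x2 = sum(k**2 * p for k, p in pmf.items())
print(e_x2)  # 29/8
```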

Example 4.4.4. (Larsen and Marx, 2018)

If Y has probability density function

  fY (y) = 2y, 0 ≤ y ≤ 1,                                                    (4.12)

find E[(Y − 2/3)²].

SOLUTION: E[(Y − 2/3)²] = ∫_0^1 (y − 2/3)²(2y)dy = ∫_0^1 (y² − (4/3)y + 4/9)(2y)dy

Hence,

  E[(Y − 2/3)²] = ∫_0^1 (2y³ − (8/3)y² + (8/9)y)dy = [2 · y⁴/4 − (8/3) · y³/3 + (8/9) · y²/2] from 0 to 1

  E[(Y − 2/3)²] = [2 · 1/4 − (8/3) · 1/3 + (8/9) · 1/2] − 0 = 2/4 − 8/9 + 8/18 = (18 − 32 + 16)/36 = 1/18

.˙. E[(Y − 2/3)²] = 1/18.

Note that for all real numbers a and b, E(aX + b) = aE(X) + b. If X is discrete with PMF
PX (x), E(aX + b) = Σx (ax + b)PX (x) = Σx [ax PX (x) + b PX (x)] = a Σx x PX (x) + b Σx PX (x) =
aE(X) + b. This is due to the definition of an expected value and a property of the PMFs of
discrete random variables. The same rule applies to continuous random variables. This means
that expected values are linear.

4.5 Variance
One application of the expected value is the variance.
Definition 4.5.1. (Perez, 2019)
The variance is the expected square deviation of X from its expected value. For a discrete
random variable X with PMF PX (x) and E(X) = µ, we have:

  V (X) = E[(X − µ)²] = Σx (x − µ)² PX (x)                                   (4.13)

If X is continuous with PDF fX (x) and E(X) = µ,

  V (X) = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² fX (x)dx                         (4.14)

Note that if V (X) = E[(X − µ)2 ], and expected values are linear,
V (X) = E[X 2 − 2µX + µ2 ] = E(X 2 ) − 2µE(X) + µ2 = E(X 2 ) − 2µ2 + µ2 = E(X 2 ) − µ2

Proposition 4.5.1. (Perez, 2019)

Let X be any random variable.

V (X) = E(X 2 ) − [E(X)]2 (4.15)


Example 4.5.1. (Larsen and Marx, 2018)

A certain hospitalization policy pays a cash benefit for up to five days in the hospital. It pays
$250 per day for the first three days and $150 per day for the next two. The number of days of
hospitalization, X, is a discrete random variable with probability function P (X = k) = (1/15)(6 − k),
for k = 1, 2, 3, 4, 5. Find V ar(X).

SOLUTION: To apply equation (4.15), we need E(X²) and E(X).

  E(X) = Σk k P (X = k) = (1) · (1/15)(6 − 1) + (2) · (1/15)(6 − 2) + ... + (5) · (1/15)(6 − 5)

Hence,

  E(X) = [(1)(5) + (2)(4) + ... + (5)(1)]/15 = 7/3.

Moreover,

  E(X²) = Σk k² P (X = k) = (1)² · (1/15)(6 − 1) + (2)² · (1/15)(6 − 2) + ... + (5)² · (1/15)(6 − 5)

Thus,

  E(X²) = [(1)(5) + (4)(4) + ... + (25)(1)]/15 = 7

Using equation (4.15), V (X) = 7 − (7/3)² = 14/9.

.˙. V (X) = 14/9.
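The shortcut V (X) = E(X²) − [E(X)]² from Proposition 4.5.1 translates directly to code; this sketch of mine reproduces Example 4.5.1 with exact fractions:

```python
from fractions import Fraction

# PMF of the number of hospitalization days: P(X = k) = (6 - k)/15, k = 1..5.
pmf = {k: Fraction(6 - k, 15) for k in range(1, 6)}
assert sum(pmf.values()) == 1  # valid PMF

mean = sum(k * p for k, p in pmf.items())        # E(X)   = 7/3
mean_sq = sum(k**2 * p for k, p in pmf.items())  # E(X^2) = 7
variance = mean_sq - mean**2                     # equation (4.15)
print(variance)  # 14/9
```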

Example 4.5.2. Let the random variable Y have the uniform distribution over [a, b]; that is,
fY (y) = 1/(b − a), a ≤ y ≤ b. Find V ar(Y ).

SOLUTION: We know E(Y ) = (b + a)/2 from Example 4.4.2. Now, we find E(Y ²).

  E(Y ²) = ∫_{a}^{b} y² · 1/(b − a) dy = [1/(b − a)] ∫_{a}^{b} y² dy = [1/(b − a)] · [y³/3] from a to b = [1/(b − a)] · (b³ − a³)/3

Therefore, factoring the difference of two cubes,

  E(Y ²) = [1/(b − a)] · (b³ − a³)/3 = [1/(b − a)] · [(b − a)(b² + ab + a²)/3] = (b² + ab + a²)/3

Using equation (4.15),

  V ar(Y ) = (b² + ab + a²)/3 − [(b + a)/2]² = (b² + ab + a²)/3 − (b² + 2ab + a²)/4 = [4(b² + ab + a²) − 3(b² + 2ab + a²)]/12

Thus,

  V ar(Y ) = (b² − 2ab + a²)/12 = (b − a)²/12

.˙. V (Y ) = (b − a)²/12.

Note that the standard deviation of any random variable X is the square root of the
variance, similar to what we have discussed in Chapter 2.
Moreover, if we let X be any random variable and a and b be any real numbers, then
V ar(aX + b) = a² V ar(X). By the linearity of expected values, E[(aX + b)²] = E[a²X² +
2abX + b²] = a² E(X²) + 2ab E(X) + b² and E(aX + b) = aE(X) + b. By equation (4.15),
V ar(aX + b) = E[(aX + b)²] − [E(aX + b)]² = a² E(X²) + 2ab E(X) + b² − [aE(X) + b]² =
a² E(X²) + 2ab E(X) + b² − a² [E(X)]² − 2ab E(X) − b² = a² E(X²) − a² [E(X)]² = a² [E(X²) −
[E(X)]²] = a² V ar(X).
Moreover, for any two independent random variables X and Y , V ar(aX + bY ) =
a² V ar(X) + b² V ar(Y ). If we remove the independence assumption, then there must be an
extra term involving the covariance of the two random variables.
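The rule V ar(aX + b) = a² V ar(X) can be spot-checked numerically on the coin-toss PMF of equation (4.1); the values a = 3 and b = 5 below are arbitrary choices of mine:

```python
from fractions import Fraction

pmf_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def variance(pmf):
    """Variance of a discrete random variable given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

a, b = 3, 5
# Y = aX + b carries the same probabilities, attached to transformed values.
pmf_y = {a * x + b: p for x, p in pmf_x.items()}

print(variance(pmf_x), variance(pmf_y))  # 3/4 27/4
assert variance(pmf_y) == a**2 * variance(pmf_x)
```

The shift b drops out entirely; only the scale factor a affects the spread.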

4.6 Problem Set 3 (Due the Next Session)


1. (Larsen and Marx, 2018) An urn contains five balls numbered 1 to 5. Two balls are
drawn simultaneously.

a. Let X be the larger of the two numbers drawn. Find PX (x).


b. Let V be the sum of the two numbers drawn. Find PV (v)

2. (Anderson, Sweeney, and Williams, 2011) A psychologist determined that the number
of sessions required to obtain the trust of a new patient is either 1, 2, or 3. Let X be
a random variable indicating the number of sessions required to gain the patient’s trust.
The following probability function has been proposed.

  PX (x) = x/6 , for x = 1, 2, or 3                                          (4.16)
a. Is this probability function valid? Explain.
b. What is the probability that it takes exactly 2 sessions to gain the patient’s trust?
c. What is the probability that it takes at least 2 sessions to gain the patient’s trust?

3. (Bisenio, 2019) Let X be a Bernoulli random variable with PMF


             p      , x = 1
  PX (x) =                                                                   (4.17)
             1 − p  , x = 0

a. Find the CDF FX (x) of X.


b. Show that E(X) = p.
c. Show that V ar(X) = p(1 − p).

4. (Larsen and Marx, 2018) For persons infected with a certain form of malaria the length
of time spent in remission is described by the continuous pdf

  fY (y) = (1/9) y² , 0 ≤ y ≤ 3,                                             (4.18)
where Y is measured in years. What is the probability that a malaria patient’s remission
lasts longer than one year?

5. (Weiss, 2017) Suppose T and Z are random variables.

a. If P (T > 0.3) = 0.1 and P (T < −3.0) = 0.05, determine P (−3.0 ≤ T ≤ 0.3).
b. Suppose that P (−0.8 ≤ Z ≤ 0.8) = 0.90 and also suppose that P (Z > 0.8) =
P (Z < −0.8). Find P (Z > 0.8).

6. (Chan Shio, 2015) Let Y have the PDF

fY (y) = 3(1 − y)2 , 0 < y < 1. (4.19)

Find the variance of W , where W = 12 − 5Y .

7. (Triola, 2018) In the Ohio Pick 4 lottery, you can bet $1 by selecting four digits, each
between 0 and 9 inclusive. If the same four numbers are drawn in the same order, you
win and collect $5,000.

a. How many different selections are possible?


b. What is the probability of winning?
c. If you win, what is your net profit?
d. Find the expected value for a $1 bet.
e. If you bet $1 on the pass line in the casino dice game of craps, the expected value is
-1.4¢. Which bet is better in the sense of producing a higher expected value: a $1
bet in the Ohio Pick 4 lottery or a $1 bet on the pass line in craps?

8. (Triola, 2018) The probability of getting the first success on the k th trial is given by
fX (k) = p(1 − p)k−1 , where X is a discrete random variable, and p is the probability
of success on any one trial. Subjects are randomly selected for the National Health and
Nutrition Examination Survey conducted by the National Center for Health Statistics,
Centers for Disease Control and Prevention. The probability that someone is a universal
donor (with group O and type Rh negative blood) is 0.06. Find the probability that the
first subject to be a universal blood donor is the fifth person selected.

9. (Bisenio, 2018) Let X represent the number of customers arriving in Coco from 3:00PM to
   4:00PM, with an average of λ. X follows a Poisson distribution, where the probability
   of having k customers in the same time frame is given by:

     P (X = k) = e^{−λ} λ^k / k! , k = 0, 1, 2, 3, 4, ...                    (4.20)

   a. Assume that Σ_{k=0}^{∞} e^{−λ} λ^k / k! = 1. Show that E(X) = λ.
   b. Assume that Σ_{k=0}^{∞} e^{−λ} λ^k / k! = 1. Show that V ar(X) = λ.

10. (Berenson, Krehbiel, and Levine, 2012) The manager of the commercial mortgage de-
partment of a large bank has collected data during the past two years concerning the
number of commercial mortgages approved per week. The results from these two years
(104 weeks) indicated the following:

Number of Commercial Frequency
Mortgages Approved
0 13
1 25
2 32
3 17
4 9
5 6
6 1
7 1

a. Compute the expected number of mortgages approved per week.


b. Compute the standard deviation.

Chapter 5

Special Distributions

In the fourth chapter, we have tackled the following topics:

• understanding random variables and probability distribution functions

• distinguishing discrete from continuous random variables

• extending probability distribution functions to cumulative distribution functions

• finding the expected values and variances of random variables

In this chapter, we will be extending what we have learned in the previous chapter to
different special distributions. In other words, there is nothing much new. Heheh!

5.1 The Bernoulli Random Variable


Definition 5.1.1. (Larsen and Marx, 2018)

Let X be defined as:



       1 if an event occurs with probability p
  X =                                                                        (5.1)
       0 otherwise
X defined in this way on an individual trial is called a Bernoulli random variable with
expected value p and variance p(1 − p).

We have seen this in the problem set in Chapter 4! Hopefully, you got the expected value
and variance right!
One application is the case of a food processing company. It markets a soft cheese spread
that is sold in a plastic container with an ”easy pour” spout. Although this spout works
extremely well and is popular with consumers, it is expensive to produce. Because of the
spout’s high cost, the company has developed a new, less expensive spout. While the new,
cheaper spout may alienate some purchasers, a company study shows that its introduction
will increase profits if fewer than 10% of the cheese spread’s current purchasers are lost. That
means, a purchaser can either remain buying the cheese spread upon the introduction of the
new spout or stop. Either event will have a chance of occurring. This situation is a business
example of the Bernoulli random variable.

Example 5.1.1. Let X be a Bernoulli random variable associated with the acceptance of your
request to transfer from the Ateneo de Manila University to the National University of Singapore.
If the probability of your successful entry is 1/100, what are E(X) and V ar(X)?

SOLUTION: Using Definition 5.1.1, X is Bernoulli with expected value 1/100 and variance
(1/100)(1 − 1/100) = 99/10000.
ance 100 (1 − 100 ) = 10000 .

Suppose we have n Bernoulli experiments. Each trial is independent with the same expected
value p and variance p(1 − p). Thus, the total number of successes in n Bernoulli experiments
will have an expected value of np and a variance of np(1 − p). These results are summarized in
a behavior known as the binomial distribution.

5.2 Binomial Distribution


Definition 5.2.1. (Perez, 2019)
Let X be a discrete random variable. It is said to follow a binomial distribution if
it consists of n independent and identical Bernoulli trials, each resulting in a success (with
probability p) or a failure (with probability 1 − p). Its PMF PX (k) is defined as:
 
  PX (k) = C(n, k) p^k (1 − p)^{n−k} , k = 0, 1, 2, 3, ..., n                (5.2)
The random variable X with this PMF is said to be binomially distributed with mean
np and variance np(1 − p).

The binomial distribution determines the probability of having k successes in n Bernoulli


trials with p as the probability of success. The combination term represents the number of ways
to arrange k successes among n trials. It is included in the equation because, in general, each
sample space outcome describing the occurrence of k successes in n trials represents a different
arrangement of k success in n trials.

Example 5.2.1. (Perez, 2019)


Define X as the number of students that pass their final exams. The probability that a student
passes the exam is p = 0.70. Considering 5 students, what is the PMF of this random variable
X? Find the probability of each outcome.
SOLUTION: Let us, first, get the sample space of X.

Outcomes X=k
No Students Pass: 0
1 Student Passes: 1
2 Students Pass: 2
3 Students Pass: 3
4 Students Pass: 4
5 Students Pass: 5
Table 43: The Sample Outcomes of X

To obtain the PMF, we should note that each student has a 0.70 probability to pass and a 0.30
probability to fail, and that there are 5 independent trials. Using equation (5.2),

  PX (k) = C(5, k) (0.70)^k (0.30)^{5−k} , k = 0, 1, 2, 3, 4, 5              (5.3)

or

             C(5, 0) (0.70)^0 (0.30)^5 ≈ 0.0024 , k = 0
             C(5, 1) (0.70)^1 (0.30)^4 ≈ 0.0283 , k = 1
             C(5, 2) (0.70)^2 (0.30)^3 ≈ 0.1323 , k = 2
  PX (k) =   C(5, 3) (0.70)^3 (0.30)^2 ≈ 0.3087 , k = 3                      (5.4)
             C(5, 4) (0.70)^4 (0.30)^1 ≈ 0.3602 , k = 4
             C(5, 5) (0.70)^5 (0.30)^0 ≈ 0.1681 , k = 5

As a result, X is said to be binomially distributed with mean (5)(0.70) = 3.5 and variance
(5)(0.70)(0.30) = 1.05.
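The entries of equation (5.4) can be reproduced with the standard library's math.comb; a minimal sketch of mine:

```python
from math import comb

n, p = 5, 0.70  # parameters from Example 5.2.1

def binom_pmf(k):
    """Binomial PMF, equation (5.2)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Print each probability rounded to four decimals.
for k in range(n + 1):
    print(k, round(binom_pmf(k), 4))

print(n * p, round(n * p * (1 - p), 2))  # mean 3.5, variance 1.05
```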

5.3 Poisson Distribution


Suppose we want to find out the probability of having k occurrences in an interval or region.
For instance, what are the chances that five cars will arrive at a car wash in three hours? What
is the probability of having no typhoons in the Philippines within six months? What are the
odds of having two maple leaves fall in one square yard? Random variables associated with
such scenarios follow a Poisson distribution ("pwa-son").

Definition 5.3.1. (Perez, 2019)


Let X be a discrete random variable. X is said to follow a Poisson distribution if the
probability of the event’s occurrence is the same for any two intervals of equal length, and
whether the event occurs in any interval is independent of whether the event occurs in any
other nonoverlapping interval. Its PMF PX (k) is defined as:

  PX (k) = e^{−µ} µ^k / k! , k = 0, 1, 2, ...                                (5.5)
where µ > 0 is the mean (or expected) number of occurrences of the event in the specified
interval.

Again, you have seen this in the problem set of the previous chapter. Any Poisson random
variable has mean and variance both equal to µ. The distribution determines the probability of
having k occurrences in a given time interval or region.

Example 5.3.1. (Perez, 2019)

The number of telephone calls that arrive at a phone exchange is often modelled as a Poisson
random variable. Assume that on average, there are 10 calls per hour.
a. What is the probability that there are exactly 5 calls in one hour?
b. What is the probability that there are 15 calls in 2 hours?
SOLUTION:
a. From the problem, we are given µ = 10 and k = 5. We are asked to find P (X = 5) =
PX (5). Using these values we get:

  PX (5) = e^{−10} 10^5 / 5! ≈ 0.0378                                        (5.6)

b. For one hour, the average number of calls is 10. Thus, for two hours, the average number
of calls is 20. We are asked to find P (X = 15) = PX (15). Using these values, we get:

  PX (15) = e^{−20} 20^{15} / 15! ≈ 0.0516                                   (5.7)

In part a, X observes a Poisson distribution with mean and variance 10; in part b, the mean
and the variance are both 20.

5.4 Uniform Distribution


Suppose that events between two values are equally likely. Then, their behavior follows a
uniform distribution.
Definition 5.4.1. (Perez, 2019)

Let X be a continuous random variable. X is said to be uniformly distributed if given


real numbers a and b, its PDF fX (x) is defined as:

 1

,a ≤ x ≤ b
fX (x) = b − a (5.8)
0 otherwise

b+a (b−a)2
with mean 2
and variance 12
.
We have shown in the previous chapter the expected value and the variance of a uniformly
distributed random variable.
Example 5.4.1. (Perez, 2019)

Suppose you need to graph the probability distribution function of a uniform random variable
with a lower bound of 0 and an upper bound of 100. Let X be the associated random variable.
Its PDF is defined as:

 1 1

= , 0 ≤ x ≤ 100
fX (x) = 100 − 0 100 (5.9)
0 , otherwise

Its graph is presented in Figure 20.

Figure 20: The Graph of a Uniformly Distributed X

Moreover, E(X) = (100 + 0)/2 = 50 and V ar(X) = (100 − 0)²/12 = 2500/3.

5.5 Normal Distribution


Now, we will be moving on to the most common distribution observed in most data sets we see
in this world. It is called the normal (or Gaussian) distribution.

Definition 5.5.1. (Perez, 2019)

Let X be a continuous random variable. X is said to be normally distributed with mean


µ and variance σ² if its PDF fX (x) is given by:

  fX (x) = [1/(σ√(2π))] e^{−(1/2)((x−µ)/σ)²}                                 (5.10)

for all real numbers x.

The normal curve is given in Figure 21.

Figure 21: The Normal Curve

The normal distribution has a bell-shaped curve with parameters µ and σ 2 . This means
that the data points cluster around the mean. µ describes the center or the average of the
data. The normal curve is symmetric around this value. This means that the area under the
normal curve to the right of the mean equals the area under the normal curve to the left of
the mean. Each of these areas is equal to 0.5. The total area under the curve is 1. Moreover,
σ (the standard deviation) describes the spread of the data around the mean. The tails of the
normal curve extend to infinity in both directions and never touch the horizontal axis.
The distribution is said to be standard normal if µ = 0 and σ = 1. To evaluate prob-
abilities, statisticians use the z-table presented in Figures A1 and A2. The first column
and the first row present the z-scores (recall from Chapter 2) of data points. Hence, if X
is normally distributed, finding P (X < x) is the same as getting P ((X − µ)/σ < (x − µ)/σ) = P (Z < z),
where Z = (X − µ)/σ.
Note that the z-tables only show the cumulative distribution function of a normal random
variable. In other words, for all normally distributed X, the table only gives P (X < x). Thus,
necessary transformations (such as complements) must be performed.

Example 5.5.1. (Perez, 2019)

In an IE 27 class, the long exam scores are normally distributed with mean 73 and standard
deviation 7. What is the probability that a randomly picked student has a failing score (that is,
below 60)?
SOLUTION: Let X be the random variable associated to the exam scores of students in
an IE 27 class. We are being asked P (X < 60). We will find the equivalent Z using z-scores.
With µ = 73 and σ = 7,
  P (X < 60) = P ((X − 73)/7 < (60 − 73)/7) = P (Z < −1.86)
Using the z-tables, the probability is located at the row with heading ”-1.8” and at the column
with heading ”0.06”, which is 0.0314.
.˙. The probability that a randomly picked student has a failing score is 0.0314.
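Instead of reading Figures A1 and A2, the standard normal CDF can be computed from the error function in Python's math module; a sketch of mine reproducing Example 5.5.1:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 73, 7
z = (60 - mu) / sigma        # = -1.857..., tabled as -1.86
print(round(phi(-1.86), 4))  # 0.0314, matching the z-table lookup
```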

We can actually approximate a normal behavior using the binomial distribution. This is
possible if the sample size is large enough to approximate the normal distribution.
Definition 5.5.2. (Perez, 2019)
Consider a binomial random variable X, where n is the number of trials performed and
p is the probability of success on each trial. If n and p have values so that np ≥ 5 and
n(1 − p) ≥ 5, then X is approximately normally distributed with mean µ = np and standard
deviation σ = √(np(1 − p)).
Example 5.5.2. (Perez, 2019)

The manufacturing of semiconductor chips produces products with a 2% chance of being
defective. Assume that the chips are independent and that a lot contains 1,000 chips. What is
the probability that more than 25 chips are defective?
SOLUTION: Let X be the number of defective chips. We need to find P (X > 25).
Computing for the mean and the standard deviation, we get µ = (1000)(0.02) = 20 and σ =
√(1000(0.02)(0.98)) ≈ 4.43. Note that np = 20 ≥ 5 and n(1 − p) = 980 ≥ 5. Using these values,
we get:

  P (X > 25) = P ((X − 20)/4.43 > (25 − 20)/4.43) = P (Z > 1.13) = 1 − P (Z ≤ 1.13).

Based on the z-tables, the probability can be found in the row with heading "1.1" and the
column with heading "0.03", which is 0.8708.
.˙. The probability that more than 25 chips are defective is 1 − 0.8708 = 0.1292.
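The same standard-normal-CDF helper verifies the normal approximation in Example 5.5.2. As in the notes, the z-score is rounded to two decimals before the lookup and no continuity correction is applied (a sketch of mine):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 1000, 0.02
mu = n * p                     # 20
sigma = sqrt(n * p * (1 - p))  # ~ 4.43

z = round((25 - mu) / sigma, 2)  # 1.13, as read from the z-table
print(round(1 - phi(z), 4))      # 0.1292
```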

5.6 Problem Set 4 (Due the Next Session)


1. (Bowerman, Murphree, and O’Connell, 2014) The January 1986 mission of the Space
Shuttle Challenger was the 25th such shuttle mission. It was unsuccessful due to an
explosion caused by an O-ring seal failure.

a. According to NASA, the probability of such a failure in a single mission was 1/60,000.
Using this value of p and assuming all missions are independent, calculate the prob-
ability of no mission failures in 25 attempts. Then, calculate the probability of at
least one mission failure in 25 attempts.
b. According to a study conducted for the Air Force, the probability of such a failure
in a single mission was 1/35. Recalculate the probability of no mission failures in 25
attempts and the probability of at least one mission failure in 25 attempts.
c. How small must p be made in order to ensure that the probability of no mission
failures in 25 attempts is 0.999?

2. (Larsen and Marx, 2018) An investment analyst has tracked a certain blue-chip stock for
the past six months and found that on any given day, it either goes up a point or goes
down a point. Furthermore, it went up on 25% of the days and down on 75%. What is
the probability that at the close of trading four days from now, the price of the stock will
be the same as it is today? Assume that the daily fluctuations are independent events.

3. (Triola, 2018) The recent rate of car fatalities was 33,561 fatalities for 2,969 billion miles
traveled (based on data from the National Highway Traffic Safety Administration). Find
the probability that for the next billion miles traveled, there will be at least one fatality.
What does the result indicate about the likelihood of at least one fatality?

4. (Berenson, Krehbiel, and Levine, 2012) Assume that the number of network errors ex-
perienced in a day on a local area network (LAN) is distributed as a Poisson random
variable. The mean number of network errors experienced in a day is 2.4. What is the
probability that in any given day

a. zero network errors will occur?


b. exactly one network error will occur?
c. two or more network errors will occur?
d. fewer than three network errors will occur?

5. (Bowerman, Murphree, and O’Connell, 2014) A weather forecaster predicts that the May
rainfall in a local area will be between three and six inches but has no idea where within
the interval the amount will be. Let X be the amount of May rainfall in the local area,
and assume that X is uniformly distributed over the interval three to six inches.

a. Write the formula for the probability curve of X.
b. What is the probability that May rainfall will be at least four inches? At least five
inches?

6. (Bowerman, Murphree, and O’Connell, 2014) Suppose that an airline quotes a flight
time of 2 hours, 10 minutes between two cities. Furthermore, suppose that historical
flight records indicate that the actual flight time between the two cities, X, is uniformly
distributed between 2 hours and 2 hours, 20 minutes. Letting the time unit be one minute,
a. Calculate the mean flight time and the standard deviation of the flight time.
b. Find the probability that the flight time will be within one standard deviation of
the mean.
7. (Triola, 2018) A professor gives a test, and the scores are normally distributed with a
   mean of 60 and a standard deviation of 12. She plans to curve the scores.

a. If she curves by adding 15 to each grade, what is the new mean and standard
deviation?
b. If the grades are curved so that grades of B are given to scores above the bottom
70% and below the top 10%, find the numerical limits for a grade of B.

8. (Weiss, 2017) A preliminary behavioral study of the Jingdong black gibbon, a primate
endemic to the Wuliang Mountains in China, found that the mean song bout duration in
the wet season is 12.59 minutes with a standard deviation of 5.31 minutes. [SOURCE: L.
Sheeran et al., ”Preliminary Report on the Behavior of the Jingdong Black Gibbon (Hy-
lobates concolor jingdongensis),” Tropical Biodiversity, Vol. 5(2), pp. 113-125] Assuming
that song bout is normally distributed, determine the percentage of song bouts that have
durations within

a. one standard deviation to either side of the mean.


b. two standard deviations to either side of the mean.
c. three standard deviations to either side of the mean.

9. (Weiss, 2017) According to the Australian Bureau of Statistics, during a particular year,
the country’s population was 17,843,000, out of which 22.6% were born overseas. If
a sample of 200 people is selected at random out of the total population, what is the
probability that the number of overseas births is
a. fewer than 65?
b. between 30 and 40, inclusive?
c. less than 30 or more than 50?
10. (Larsen and Marx, 2018) A sell-out crowd of 42,200 is expected at Cleveland’s Jacobs
Field for next Tuesday’s game against the Baltimore Orioles, the last before a long road
trip. The ballpark’s concession manager is trying to decide how much food to have on
hand. Looking at records from games played earlier in the season, she knows that, on the
average, 38% of all those in attendance will buy a hot dog. How large an order should
she place if she wants to have no more than a 20% chance of demand exceeding supply?

Chapter 6

Sampling Distribution and Confidence


Intervals

In the fifth chapter, we have tackled four known distributions in the field of statistics: the
binomial, Poisson, uniform, and normal distributions.
At this point, if you are reading this part of the lecture, I want to give you big ups for
your perseverance and your persistence to be the excellent version of yourself in class. Round
of applause, everybody! I just want to say to keep going! You are almost there!

6.1 Unbiased Estimate


If you can still recall Chapter 1, we discussed the difference between a population and
a sample. One of the goals of a statistician is to choose a sample that best represents the
population of study. For instance, if you want to study the favorite Netflix series of Ateneans,
you can do a random sampling of students from the four main colleges of the Loyola Schools.
To know that your sample data is sufficient to be a representative of the population, the sample
mean must somehow be near to the population mean. If this criterion is satisfied, the estimate
of the population mean is said to be unbiased.

Definition 6.1.1. (Larsen and Marx, 2018)

Suppose that Y1, Y2, ..., Yn is a random sample from the continuous pdf fY(y; θ), where θ is an unknown parameter. An estimator θ̂ (which is a function of the random sample) is said to be unbiased (for θ) if E(θ̂) = θ for all θ.
Similarly, suppose that X1, X2, ..., Xn is a random sample from the discrete pdf pX(k; θ). An estimator θ̂ (which is a function of the random sample) is said to be unbiased (for θ) if E(θ̂) = θ for all θ.

In other words, the estimator is expected to represent the true value of a measurement of
the population.

Example 6.1.1. Suppose that X1 , X2 , ..., Xn is a random sample (from a population) of inde-
pendent and identically distributed variables with population mean µ and variance σ 2 .
Recall from Chapter 2 that the sample mean X is defined to be

X = (X1 + X2 + ... + Xn)/n = (Σ_{i=1}^n Xi)/n    (6.1)
We want to show that E (X) = µ, proving that the sample mean is unbiased. In fact,
applying the theorems related to expected values, as discussed in Chapter 5,

E(X) = E[(X1 + X2 + ... + Xn)/n]
E(X) = (1/n) E(X1 + X2 + ... + Xn)
E(X) = (1/n) [E(X1) + E(X2) + ... + E(Xn)]
E(X) = (1/n) [µ + µ + ... + µ]
E(X) = (1/n) [nµ]
E(X) = µ
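To see this result empirically, here is a minimal simulation sketch in Python (the in-class demonstrations use R); all the numbers below (µ = 50, σ = 10, n = 25, and the number of trials) are illustrative assumptions, not values from the notes.

```python
import random
import statistics

# Illustrative values only: population mean 50, sd 10, samples of size 25.
random.seed(1)
mu, sigma, n, trials = 50.0, 10.0, 25, 4000

sample_means = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

# Averaging the sample means over many samples recovers mu, as E(X) = mu predicts.
avg_of_means = statistics.mean(sample_means)
print(round(avg_of_means, 2))  # close to mu = 50
```

Any one sample mean wanders around µ, but their long-run average settles at µ; that is exactly what unbiasedness says.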

Example 6.1.2. Suppose X1 , X2 , ..., Xn be the same random sample as defined in Example
6.1.1.
Recall from Chapter 2 that the sample variance S 2 is defined to be
S² = [(X1 − X)² + (X2 − X)² + ... + (Xn − X)²]/(n − 1) = [Σ_{i=1}^n (Xi − X)²]/(n − 1)    (6.2)
Note that S 2 is also unbiased.

E(S²) = E{[(X1 − X)² + (X2 − X)² + ... + (Xn − X)²]/(n − 1)}
E(S²) = [1/(n − 1)] E[(X1 − X)² + (X2 − X)² + ... + (Xn − X)²]
E(S²) = [1/(n − 1)] E[Σ_{i=1}^n (Xi − X)²]
E(S²) = [1/(n − 1)] E[Σ_{i=1}^n (Xi² − 2Xi X + X²)]
E(S²) = [1/(n − 1)] E[Σ_{i=1}^n Xi² − 2X Σ_{i=1}^n Xi + Σ_{i=1}^n X²]
E(S²) = [1/(n − 1)] E[Σ_{i=1}^n Xi² − 2X(nX) + nX²] [1]
E(S²) = [1/(n − 1)] E[Σ_{i=1}^n Xi² − 2nX² + nX²]
E(S²) = [1/(n − 1)] E[Σ_{i=1}^n Xi² − nX²]
E(S²) = [1/(n − 1)] {E[Σ_{i=1}^n Xi²] − E[nX²]}
E(S²) = [1/(n − 1)] {Σ_{i=1}^n E(Xi²) − n E(X²)}
E(S²) = [1/(n − 1)] {Σ_{i=1}^n (σ² + µ²) − n(σ²/n + µ²)} [2]

[1] Note that X = (Σ_{i=1}^n Xi)/n, so Σ_{i=1}^n Xi = nX; inside the sum, X is constant (which makes its square also constant).
[2] Recall that Var(Xi) = E(Xi²) − [E(Xi)]². Thus, σ² = E(Xi²) − µ², implying that E(Xi²) = σ² + µ². The same is true for the sample mean. The statement Var(X) = σ²/n will be verified in the next section.

E(S²) = [1/(n − 1)] {n(σ² + µ²) − σ² − nµ²} [3]
E(S²) = [1/(n − 1)] (nσ² + nµ² − σ² − nµ²)
E(S²) = [1/(n − 1)] (nσ² − σ²)
E(S²) = [1/(n − 1)] σ²(n − 1)
E(S²) = σ²

That is actually the reason the denominator is n − 1 instead of n: it makes the sample variance S² an unbiased estimate of the population variance σ².
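The effect of the divisor is easy to see in a quick simulation (a sketch with assumed values, not part of the notes): dividing the sum of squared deviations by n systematically underestimates σ², while dividing by n − 1 is correct on average.

```python
import random
import statistics

# Assumed values for illustration: normal population with sigma^2 = 9, n = 5.
random.seed(2)
mu, sigma, n, trials = 0.0, 3.0, 5, 20000

est_unbiased, est_biased = [], []
for _ in range(trials):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(x)
    ss = sum((xi - xbar) ** 2 for xi in x)
    est_unbiased.append(ss / (n - 1))  # the sample variance S^2
    est_biased.append(ss / n)          # the divide-by-n alternative

print(round(statistics.mean(est_unbiased), 2))  # near sigma^2 = 9
print(round(statistics.mean(est_biased), 2))    # near 9(n - 1)/n = 7.2
```

The divide-by-n estimator falls short by the factor (n − 1)/n, which matches the algebra above.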

6.2 The Sampling Distribution of the Sample Mean


Previously, the sample mean X has already been introduced. It has been shown that E(X) = µ.
Now, we want to show the variance of the sample mean. With the same conditions presented
in Example 6.1.1,

Var(X) = Var[(X1 + X2 + ... + Xn)/n]
Var(X) = (1/n²) Var(X1 + X2 + ... + Xn)
Var(X) = (1/n²) [Var(X1) + Var(X2) + ... + Var(Xn)]
Var(X) = (1/n²) [σ² + σ² + ... + σ²]
Var(X) = (1/n²) [nσ²]
Var(X) = σ²/n

Henceforth, Var(X) = σ²/n.
Also, if the sampled population is normally distributed, then the possible sample means themselves follow a normal curve. We now have the following proposition.

Proposition 6.2.1. (Bowerman, Murphree, and O’Connell, 2014)

Assume that the population from which we will randomly select a sample of n measurements
has mean µ and standard deviation σ. Then, the population of all possible sample means:

1. has a normal distribution, if the sampled population has a normal distribution.

2. has mean E(X) = µ.

3. has variance Var(X) = σ²/n (thus, standard deviation σ/√n).

[3] Note that σ² + µ² is constant.

Example 6.2.1. (Bowerman, Murphree, and O’Connell, 2014)

When a pizza restaurant’s delivery process is operating effectively, pizzas are delivered in an
average of 45 minutes with a standard deviation of 6 minutes. To monitor its delivery process,
the restaurant randomly selects five pizzas (n = 5) each night and records their delivery times.

a. For the sake of argument, assume that the population of all delivery times on a given
evening is normally distributed with a mean of µ = 45 minutes and a standard deviation
of σ = 6 minutes. (That is, we assume that the delivery process is operating effectively.)
Find the mean and the standard deviation of the population of all possible sample means,
and calculate an interval containing 99.73% of all possible sample means.

b. Suppose that the mean of the five sampled delivery times on a particular evening is x = 55
minutes. Using the interval that you calculated in a, what would you conclude about
whether the restaurant’s delivery process is operating effectively? Why?

SOLUTION:

a. Let µx and σx be the expected value and the standard deviation of the population of all
possible sample means, respectively.

By Proposition 6.2.1, µx = 45 minutes, and σx = 6/√5 ≈ 2.6833 minutes. Since the sampling distribution is normally distributed, we can apply the empirical rule as defined in Definition 2.4.4.

As a result of the rule, 99.73% of the population of all possible sample means are within three standard deviations of the mean. Thus, 99.73% lies in the interval [µx − 3σx, µx + 3σx] = [45 − 3(6/√5), 45 + 3(6/√5)] ≈ [36.9502, 53.0498].

b. Suppose x = 55 minutes. This value is beyond the tolerance interval presented in a; in fact, the sample mean is greater than the upper limit of 53.0498 minutes. This suggests that the population mean may be greater than 45 minutes, and hence that the delivery process may be operating ineffectively.

It is also possible that the sampled population does not follow a normal distribution, in which case Proposition 6.2.1 does not guarantee that the sample means are normally distributed. The Central Limit Theorem reconciles this possibility.

Proposition 6.2.2. (Triola, 2018), The Central Limit Theorem

For all samples of the same size n with n > 30, the sampling distribution of X can be
approximated by a normal distribution with mean µ and standard deviation √σn .

In other words, if the sample size n is sufficiently large, then the population of all possible
sample means is approximately normally distributed.
The threshold n > 30 is most commonly used since it is around this sample size that the normal curve becomes visible. To show this, a set of R codes will be presented in class (please do remind me if I have forgotten about this huhu).
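Since the R demonstration belongs in class, here is a comparable sketch in Python, with an assumed and clearly non-normal population: the standard deviation of the sample means shrinks like σ/√n, just as the Central Limit Theorem predicts.

```python
import random
import statistics

# Assumed population for illustration: Exponential(1), which is skewed,
# with mean 1 and standard deviation 1.
random.seed(3)

def sd_of_sample_means(n, trials=5000):
    """Empirical standard deviation of the sample mean over many samples of size n."""
    means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

for n in (5, 30, 100):
    # The CLT predicts sd = sigma / sqrt(n) = 1 / sqrt(n).
    print(n, round(sd_of_sample_means(n), 3), round(1 / n ** 0.5, 3))
```

Even though each individual observation is far from normal, the distribution of the sample mean tightens around the population mean as n grows past 30.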

Example 6.2.2. (Bowerman, Murphree, and O’Connell, 2014)

Suppose that we will take a random sample of size n from a population having mean µ
and standard deviation σ. For each of the following situations, find the mean, variance, and
standard deviation of the sampling distribution of the sample mean x. In which cases, must we
assume that the population is normally distributed? Why?

a. µ = 10, σ = 2, n = 25

b. µ = 500, σ = 0.5, n = 100

SOLUTION:
a. Given this information and Proposition 6.2.1, the population of all possible sample means has a distribution with mean 10, variance (2)²/25 = 4/25 = 0.16, and standard deviation 2/5 = 0.4. However, since n = 25 < 30, we cannot use the results of the Central Limit Theorem. Thus, we must assume that the sampled population is normally distributed in order to conclude that the sampling distribution is normal.

b. Given this information and Proposition 6.2.1, the population of all possible sample means has a distribution with mean 500, variance (0.5)²/100 = 0.25/100 = 0.0025, and standard deviation 0.5/10 = 0.05. Since n = 100 > 30, by the Central Limit Theorem, the population of all possible sample means is approximately normally distributed, whatever the shape of the sampled population.

6.3 The Sampling Distribution of the Sample Proportion
To continue from the cheese-spout story (Chapter 5, p. 78), if we let p be the true proportion
of all current purchasers who would stop buying the cheese spread if the new spout were used,
profits will increase as long as p is less than 0.10.
Suppose that (after trying the new spout) 63 of 1,000 randomly selected purchasers say that they would stop buying the cheese spread if the new spout were used. The point estimate of the population proportion p is the sample proportion p̂ = 63/1000 = 0.063. This sample proportion says that we estimate that 6.3% of all current purchasers would stop buying the cheese spread if the new spout were used. Suppose that from another set of 1,000 randomly selected purchasers, 58 would stop buying the cheese spread if the new spout were used. Hence, p̂ = 58/1000 = 0.058. Lastly, if 72 of another 1,000 randomly selected purchasers would stop buying the cheese spread, then p̂ = 72/1000 = 0.072.
Therefore, the proportion changes depending on the responses of randomly selected partic-
ipants. This means p̂ is also a random variable. The probability distribution of this random
variable is called the sampling distribution of the sample proportion p̂.
Definition 6.3.1. (Bowerman, Murphree, and O’Connell, 2014)

The population of all possible sample proportions

a. Approximately has a normal distribution, if the sample size n is large.

b. Has mean E(p̂) = p.

c. Has variance Var(p̂) = p(1 − p)/n (thus, standard deviation √(p(1 − p)/n)).

Parts b and c of this definition are actually some parts of the results of the Bernoulli random
variable.

Example 6.3.1. (Bowerman, Murphree, and O’Connell, 2014)

Suppose p = 0.5 and n = 2500. The sample proportion p̂ is distributed with mean 0.5 and variance (0.5)(1 − 0.5)/2500 = 0.0001 (thus, with standard deviation 0.01). Since the sample size n = 2500 is large (both np̂ and n(1 − p̂) far exceed 5), p̂ is approximately normally distributed.
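A quick simulation sketch (using the values of Example 6.3.1; the number of repetitions is an arbitrary assumption) confirms that p̂ is centred at p with standard deviation √(p(1 − p)/n) = 0.01:

```python
import random
import statistics

# Values from Example 6.3.1: p = 0.5, n = 2500; 2000 repetitions assumed.
random.seed(4)
p, n, trials = 0.5, 2500, 2000

p_hats = []
for _ in range(trials):
    # One sample: count how many of n Bernoulli(p) purchasers say "yes".
    successes = sum(1 for _ in range(n) if random.random() < p)
    p_hats.append(successes / n)

print(round(statistics.mean(p_hats), 3))   # near p = 0.5
print(round(statistics.stdev(p_hats), 4))  # near sqrt(p(1 - p)/n) = 0.01
```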

6.4 Confidence Intervals for a Population Mean: σ Known


As you may have noticed in the previous sections, the sample mean X depends on the results
of the random sampling X1 , X2 , ..., Xn of analysts. Theoretically, we expect that the average of
all samples is the population mean, the one that we are looking for. However, the sample mean
X is not necessarily the population mean µ (unless you are lucky). One way to estimate the
true population mean is to ”guess” an interval of values from the sample mean, which contains
the population mean.
If you recall Chapter 2, we ”guess” such intervals through the empirical rule or Chebyshev’s
theorem. For instance, if the population is normally distributed, by the empirical rule, 95.44%
of values lie within two standard deviations from the population mean. In other words, 95.44%
of values lie in the interval [µ − 2σ, µ + 2σ]. Since the population of all possible sample means
is normally distributed with mean µ and standard deviation σX , the same principle applies;
95.44% of all sample means lie in the interval [µ−2σX , µ+2σX ]. Equivalently, there is a 95.44%
probability that the sample mean is within two standard deviations from the population mean,
or we are 95.44% confident that the given interval contains the population mean.
As a result, statisticians utilize the concepts of a confidence interval and the confidence
level.

Definition 6.4.1. (Triola, 2018)

• A confidence interval (or interval estimate) is a range (or an interval) of values used
to estimate the true value of a population parameter. A confidence interval is sometimes
abbreviated as CI.

• The confidence level is the probability 1 − α (such as 0.9544 or 95.44%) that the confi-
dence interval actually does contain the population parameter, assuming that the estima-
tion process is repeated a large number of times. (The confidence level is also called the
degree of confidence, or the confidence coefficient.)

From the definition, α is the probability that the population parameter is beyond the con-
fidence interval. To illustrate, refer to Figure 22.

Figure 22: The Confidence Interval at Confidence Level 95% (Triola, 2018)

The readers must note that all sampling distributions are assumed to be normally dis-
tributed. Hence, for convenience of analysis, we take the corresponding z-scores of the limits.
By doing so, we make a symmetric normal curve similar to that of the figure above.
The 95% (1 − α = 95%) confidence interval is actually the area under the curve between the two z-score limits z_{α/2} and −z_{α/2}. Recall from Chapter 2 that z-scores are defined as

z = (x − µ)/σ    (6.3)

Since we are inferring on the population mean based on the sample mean,

z = (x − µ)/(σ/√n)    (6.4)

where n is the sample size. You might be wondering, "Why α/2?" The area not covered by the confidence level is α = 5%. Since the normal curve is symmetric, the area of each of the two tails is 2.5%. Since the population data values x are assumed to be normally distributed, the z-scores are also normally distributed. Hence, given a random variable Z that is normally distributed and a confidence level 1 − α,

P(−z_{α/2} ≤ Z ≤ z_{α/2}) = 1 − α    (6.5)

In the case of the figure above, when α = 0.05,

P(−z_{0.025} ≤ Z ≤ z_{0.025}) = 1 − 0.05 = 0.95    (6.6)

This is also equivalent to

P(Z ≤ z_{0.025}) = 0.95 + 0.025 = 0.975    (6.7)

since, from Figure 22, the 2.5% tail area at the leftmost part of the graph also lies below z_{0.025}.
In general,

P(Z ≤ z_{α/2}) = 1 − α + α/2 = 1 − α/2    (6.8)
If we combine equations (6.4) and (6.8),

P(Z ≤ (x − µ)/(σ/√n)) = 1 − α/2    (6.9)

From equations (6.8) and (6.9),

(x − µ)/(σ/√n) = z_{α/2}    (6.10)

which implies that

x = µ + (σ/√n) z_{α/2}    (6.11)

Note that this value is the right-hand limit of the confidence interval. Since our best estimate of the population mean µ is the sample mean x, the right-hand limit xR of the 100(1 − α)% confidence interval is

xR = x + (σ/√n) z_{α/2}    (6.12)

It can be shown that the left-hand limit xL of the 100(1 − α)% confidence interval is

xL = x − (σ/√n) z_{α/2}    (6.13)

The term (σ/√n) z_{α/2} is known in statistics as the margin of error. Loosely speaking, this is an "allowance" of error around the sample mean to estimate the population mean.
Given such results, we now have the following proposition regarding the confidence interval
for a population mean µ and a known standard deviation σ.

Definition 6.4.2. (Bowerman, Murphree, and O’Connell, 2014)

Suppose that the sampled population is normally distributed with mean µ and known standard
deviation σ. Then, a 100(1 − α)% confidence interval for µ is
[x − z_{α/2} σ/√n, x + z_{α/2} σ/√n]    (6.14)

Example 6.4.1. (Bowerman, Murphree, and O’Connell, 2014)

Consider the trash bag problem. Suppose that an independent laboratory has tested trash bags
and has found that no 30-gallon bags that are currently on the market have a mean breaking
strength of 50 pounds or more. On the basis of these results, the producer of the new, improved
trash bag feels sure that its 30-gallon bag will be the strongest such bag on the market if the
new trash bag’s mean breaking strength can be shown to be at least 50 pounds. The mean of
the sample of 40 trash bag breaking strengths is x = 50.575. If we let µ denote the mean of the
breaking strengths of all possible trash bags of the new type and assume that σ = 1.65:

a. Calculate the 95% and 99% confidence intervals for µ.

b. Using the 95% confidence interval, can we be 95% confident that µ is at least 50 pounds? Explain.

c. Using the 99% confidence interval, can we be 99% confident that µ is at least 50 pounds? Explain.

SOLUTION:

a. Since we are looking for the 95% confidence interval for µ, α = 5% = 0.05. Thus, based on the z-table in Figure A2, z0.025 = 1.96. This means that the 95% confidence interval for µ is [50.575 − 1.96(1.65/√40), 50.575 + 1.96(1.65/√40)] ≈ [50.0637, 51.0863].

Since we are looking for the 99% confidence interval for µ, α = 1% = 0.01. Thus, based on the z-table in Figure A2, z0.005 = 2.575. This means that the 99% confidence interval for µ is [50.575 − 2.575(1.65/√40), 50.575 + 2.575(1.65/√40)] ≈ [49.9032, 51.2468].

b. If the 95% confidence interval for µ is [50.0637, 51.0863], we can be 95% confident that
the population mean breaking strength is at least 50 pounds.

c. Unfortunately, we cannot say the same for the 99% confidence interval. Since the interval is [49.9032, 51.2468], there is evidence that the population mean breaking strength can be less than 50 pounds.
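The interval arithmetic in a is easy to check with a short script; this is an illustrative sketch in which Python's statistics.NormalDist supplies the z critical values in place of the z-table in Figure A2.

```python
from statistics import NormalDist

# Values from Example 6.4.1: x = 50.575, sigma = 1.65, n = 40.
xbar, sigma, n = 50.575, 1.65, 40

def z_interval(conf):
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # z_{alpha/2}
    margin = z * sigma / n ** 0.5             # margin of error
    return xbar - margin, xbar + margin

lo95, hi95 = z_interval(0.95)
lo99, hi99 = z_interval(0.99)
print(round(lo95, 4), round(hi95, 4))  # about 50.0637 51.0863
print(round(lo99, 4), round(hi99, 4))  # about 49.9030 51.2470 (the rounded table value 2.575 gives [49.9032, 51.2468])
```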

6.5 Confidence Intervals for a Population Mean: σ Unknown
Suppose that the population standard deviation is unknown. The closest that we can get is the sample standard deviation s (recall Chapter 2). Hence, altering the formula for the z-scores yields the following equation:

t = (x − µ)/(s/√n)    (6.15)

If the sampled population is normally distributed, then for any sample size n, this sampling distribution is what is called a t distribution. The standard deviation of the distribution depends on the number of degrees of freedom (denoted by df). In equation (6.15), t is said to have n − 1 degrees of freedom, owing to the divisor in the unbiased sample standard deviation. If we refer to Figure 23, the t distribution is almost like a normal curve; the only difference is that the t distribution is more spread out than the normal. Note that as the degrees of freedom increase, the distribution becomes less spread out.

Figure 23: Comparisons among Normal and t Distributions (Bowerman, Murphree, and
O’Connell, 2014)

Definition 6.5.1. Suppose that the sampled population is normally distributed with mean µ
and unknown standard deviation σ. Then, a 100(1-α)% confidence interval for µ is
[x − t_{α/2} s/√n, x + t_{α/2} s/√n]    (6.16)

Note that the margin of error here is t_{α/2} s/√n.

Example 6.5.1. (Berenson, Krehbiel, and Levine, 2012)


Listed below are the cost per ounce ($) for a sample of 14 dark chocolate bars:

0.68 0.72 0.92 1.14 1.42 0.94 0.77


0.57 1.51 0.57 0.55 0.86 1.41 0.90

a. Construct a 95% confidence interval estimate for the population cost per ounce ($) of dark
chocolate bars.

b. What assumption do you need to make about the population distribution to construct the
interval in a?

SOLUTION:

a. We are given n = 14 and α = 5%. The sample mean x is equal to $0.93, while the sample standard deviation s is $0.33. Moreover, with n − 1 = 13 degrees of freedom, using the t-table in Figure A3, t_{α/2} = 2.160. Hence, the 95% confidence interval around the population mean µ is [0.93 − 2.160(0.33/√14), 0.93 + 2.160(0.33/√14)] ≈ [$0.74, $1.11].

b. To do the process in a, we must assume that the sampled population of costs per ounce is (at least approximately) normally distributed.
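For reference, here is a sketch recomputing part a; since Python's standard library has no t-table, the critical value t_{0.025, 13} = 2.160 is taken from the table in Figure A3, exactly as in the solution.

```python
import statistics

# Cost per ounce ($) of the 14 dark chocolate bars in Example 6.5.1.
costs = [0.68, 0.72, 0.92, 1.14, 1.42, 0.94, 0.77,
         0.57, 1.51, 0.57, 0.55, 0.86, 1.41, 0.90]

n = len(costs)               # 14
xbar = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (divisor n - 1)
t = 2.160                    # t_{alpha/2} with n - 1 = 13 degrees of freedom

margin = t * s / n ** 0.5
print(round(xbar - margin, 2), round(xbar + margin, 2))  # 0.74 1.11
```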

6.6 Confidence Intervals for a Population Proportion
Now that we are exposed to the confidence intervals for the population mean, think you can guess those for the population proportion? ;) Hint: Definition 6.4.1 on page 91!

Definition 6.6.1. (Bowerman, Murphree, and O’Connell, 2014)

If the sample size n is large, a 100(1-α)% confidence interval for the population
proportion p is
[p̂ − z_{α/2} √(p̂(1 − p̂)/n), p̂ + z_{α/2} √(p̂(1 − p̂)/n)]    (6.17)

where p̂ is the sample proportion. Here n should be considered large if both np̂ and n(1 − p̂) are at least 5.

From the definition, the margin of error in relation to the sample proportion is z_{α/2} √(p̂(1 − p̂)/n).

Example 6.6.1. (Berenson, Krehbiel, and Levine, 2012)

Have you ever negotiated a pay raise? According to an Accenture survey, 52% of U.S. work-
ers have (J. Yang and K. Carter, ”Have You Ever Negotiated a Pay Raise?” www.usatoday.com,
May 22, 2009).

a. Suppose that the survey had a sample size of n = 500. Construct a 95% confidence interval
for the proportion of all U.S. workers who have negotiated a pay raise.

b. Based on a, can you claim that more than half of all U.S. workers have negotiated a pay
raise?

SOLUTION:

a. We are given p̂ = 52%, α = 5%, and n = 500. Thus, using the z-table, z_{α/2} = 1.96. With this information, the 95% confidence interval for the proportion of all U.S. workers who have negotiated a pay raise is [0.52 − 1.96√((0.52)(1 − 0.52)/500), 0.52 + 1.96√((0.52)(1 − 0.52)/500)] ≈ [0.4762, 0.5638].

b. Since the confidence interval in a contains proportions below 0.5, we cannot claim that more than half of all U.S. workers have negotiated a pay raise.
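As a quick check of the arithmetic in a, here is a sketch using statistics.NormalDist for the z critical value instead of the z-table:

```python
from statistics import NormalDist

# Values from Example 6.6.1: p-hat = 0.52, n = 500, 95% confidence.
p_hat, n, conf = 0.52, 500, 0.95

z = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.96
se = (p_hat * (1 - p_hat) / n) ** 0.5     # sqrt(p-hat(1 - p-hat)/n)
lo, hi = p_hat - z * se, p_hat + z * se
print(round(lo, 4), round(hi, 4))  # 0.4762 0.5638
```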

6.7 Sample Size Determination


Let's recall all of the margins of error that we have discussed so far. Let E_{known σ}, E_{unknown σ}, and E_{proportion} be the margins of error for known σ, unknown σ, and a proportion, respectively.

E_{known σ} = z_{α/2} σ/√n    (6.18)

E_{unknown σ} = t_{α/2} s/√n    (6.19)

E_{proportion} = z_{α/2} √(p̂(1 − p̂)/n)    (6.20)

Suppose that the sample size n is unknown. Hence, with some manipulation, we have the following:

n_{known σ} = (σ z_{α/2} / E_{known σ})²    (6.21)

n_{unknown σ} = (s t_{α/2} / E_{unknown σ})²    (6.22)

n_{proportion} = (√(p̂(1 − p̂)) z_{α/2} / E_{proportion})²    (6.23)
Example 6.7.1. (Berenson, Krehbiel, and Levine, 2012)

An advertising agency that serves a major radio station wants to estimate the mean amount
of time that the station’s audience spends listening to the radio daily. From past studies, the
standard deviation is estimated as 45 minutes. What sample size is needed if the agency wants
to be 90% confident of being correct to within ± 5 minutes?
SOLUTION: We are given σ = 45, E_{known σ} = 5, and α = 10% (thus, z_{α/2} = 1.645). Using equation (6.21), the required sample size to have a margin of error of 5 is (σ z_{α/2} / E_{known σ})² = [(45)(1.645)/5]² = 219.1880 ≈ 220. In cases of a non-integer answer, the result must be rounded up.

Example 6.7.2. (Berenson, Krehbiel, and Levine, 2012)

What proportion of Americans get most of their news from the Internet? According to a
poll conducted by Pew Research Center, 40% get most of their news from the Internet (Data
extracted from ”Drill Down,” The New York Times, January 5, 2009). To conduct a follow-up
study that would provide 99% confidence that the point estimate is correct to within ± 0.04 of
the population proportion, how many people need to be sampled?
SOLUTION: We are given p̂ = 40%, E_{proportion} = 0.04, and α = 1% (thus, z_{α/2} = 2.575). Using equation (6.23), the required sample size to have a margin of error of 0.04 is (√(p̂(1 − p̂)) z_{α/2} / E_{proportion})² = [√((0.40)(0.60))(2.575)/(0.04)]² ≈ 994.5938 ≈ 995.

Example 6.7.3. (Bowerman, Murphree, and O’Connell, 2014)

Regard the sample of 10 sales figures for which s = 32.866 as a preliminary sample. How
large a sample of sales figures is needed to make us 95% confident that x, the sample mean
sales dollars per square foot, is within a margin of error of $ 10 of µ, the population mean sales
dollars per square foot for all Whole Foods supermarkets?
SOLUTION: We are given s = 32.866, E_{unknown σ} = $10, df = 10 − 1 = 9, and α = 5% (thus, t_{α/2} = 2.262). Using equation (6.22), the required sample size to have a margin of error of $10 is (s t_{α/2} / E_{unknown σ})² = [(32.866)(2.262)/(10)]² ≈ 55.2687 ≈ 56.
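The sample-size formulas can be wrapped in small helper functions, as in this sketch (the helper names are my own, and exact z values from statistics.NormalDist are used, so a result may differ by one unit from an answer based on a rounded table value):

```python
import math
from statistics import NormalDist

def n_for_mean(sigma, margin, conf):
    """Sample size for estimating a mean with known sigma; always round up."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.ceil((sigma * z / margin) ** 2)

def n_for_proportion(p_hat, margin, conf):
    """Sample size for estimating a proportion; always round up."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.ceil(p_hat * (1 - p_hat) * (z / margin) ** 2)

print(n_for_mean(45, 5, 0.90))             # Example 6.7.1: 220
print(n_for_proportion(0.40, 0.04, 0.99))  # Example 6.7.2: 996 (995 with the table value z = 2.575)
```

Rounding up (math.ceil) is the convention here: rounding down would leave the margin of error larger than the target.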

6.8 Problem Set 5 (Due the Next Session)
1. (Berenson, Krehbiel, and Levine, 2012) The diameter of a brand of Ping-Pong balls is
approximately normally distributed, with a mean of 1.30 inches and a standard deviation
of 0.04 inch. If you select a random sample of 16 Ping-Pong balls,

a. what is the sampling distribution of the mean?


b. what is the probability that the sample mean is less than 1.28 inches?
c. what is the probability that the sample mean is between 1.31 and 1.33 inches?
d. The probability is 60% that the sample mean will be between what two values,
symmetrically distributed around the population mean?

2. (Weiss, 2017) An ethanol railroad tariff is a fee charged for shipments of ethanol on
public railroads. The Agricultural Marketing Service publishes tariff rates for railroad-
car shipments of ethanol in the Biofuel Transportation Database. Assuming that the
standard deviation of such tariff rates is $1,150, determine the probability that the mean
tariff rate of 500 randomly selected railroad-car shipments of ethanol will be within $100
of the mean tariff rate of all railroad-car shipments of ethanol.

3. (Ang, 2012) A garment manufacturer sells its cloth scraps to small businesses. The cloth
scraps are baled. The weights of the bales are normally distributed with a mean weight
of 50.8 kilograms. The garment manufacturer only sells batches of 25 bales each time and
the garment manufacturer workers randomly select them from their warehouse. Based on
the company records from sales of several batches of 25 bales each, 15% of the time the
mean weight of a bale is more than 51.5 kilograms.

a. What is the population standard deviation of the weights of the bales of scrap cloth?
b. Suppose a buyer buys 25 bales, what is the probability that the mean weight of the
bales is more than 52 kilograms?
c. Suppose a buyer randomly selects a bale from the warehouse, what is the probability
that it is less than 51.5 kilograms?

4. (Bowerman, Murphree, and O’Connell, 2014) In an article in Marketing Science, Silk and
Berndt investigate the output of advertising agencies. They describe ad agency output
by finding the shares of dollar billing volume coming from various media categories such
as network television, spot television, newspapers, radio, and so forth.

a. Suppose that a random sample of 400 U.S. advertising agencies gives an average per-
centage share of billing volume from network television equal to 7.46%, and assume
that σ = 1.42%. Calculate a 95% confidence interval for the mean percentage share
of billing volume from network television for the population of all U.S. advertising
agencies.
b. Suppose that a random sample of 400 U.S. advertising agencies gives an average per-
centage share of billing volume from network television equal to 12.44%, and assume
that σ = 1.55%. Calculate a 95% confidence interval for the mean percentage share
of billing volume from network television for the population of all U.S. advertising
agencies.

c. Compare the confidence intervals in parts a and b. Does it appear that the mean percentage share of billing volume from spot television commercials for U.S. advertising agencies is greater than the mean percentage share of billing volume from network television? Explain.

5. (Carlson, Newbold, and Thorne, 2013) How much do students pay, on the average, for
textbooks during the first semester of college? From a random sample of 400 students
the mean cost was found to be $ 357.75, and the sample standard deviation was $ 37.89.
Assuming that the population is normally distributed, find the margin of error of a 95%
confidence interval for the population mean.

6. (Larsen and Marx, 2018) How long does it take to fly from Atlanta to New York’s La-
Guardia airport? There are many components of the time elapsed, but one of the more
stable measurements is the actual in-airtime. For a sample of eighty-three flights between
these destinations on Fridays in October, the time in minutes (y) gave the following
results:
Σ_{t=1}^{83} y_t = 8,622
Σ_{t=1}^{83} y_t² = 899,750

Find a 99% confidence interval for the average flight time.

7. (Ang, 2011) You were assigned to conduct a study about middle-aged working people’s
size preference for tablet computers. A random sample of 300 middle-aged employees
shows that 120 of them prefer a 10” tablet while 180 prefer a 7” tablet.

a. Construct a 96% confidence interval for the true proportion of employees who prefer
a 10” tablet.
b. If you want to obtain a 98% confidence interval which is within 5% of the true
population mean, what would be your minimum required sample size?

8. (Triola, 2018) An investor is considering funding of a new video game. She wants to know
the worldwide percentage of people who play video games, so a survey is being planned.
How many people must be surveyed in order to be 90% confident that the estimated
percentage is within three percentage points of the true population percentage?

a. Assume that nothing is known about the worldwide percentage of people who play
video games.
b. Assume that about 16% of people play video games (based on a report by Spil
Games).

9. (Weiss, 2017) Researcher Sudhinta Sinha discussed how people adjust to prison life in
the article ”Adjustment and mental health problem in prisoners” (Industrial Psychiatry
Journal, Vol. 19, No. 2, pp. 101-104). A sample of 37 sentenced adult male prisoners
had an average age of 36.7 years. Assume that, for the sentenced adult male prisoners,
the population standard deviation of age is 8.0 years.

a. Find the margin of error, E.

b. Determine the sample size required to have a margin of error of 1.2 year and a 90%
confidence level.
c. Find a 90% confidence interval for µ if a sample of the size determined in part (b) yields a mean of 32.5 years.

10. (Berenson, Krehbiel, and Levine, 2012) An advertising agency that serves a major radio
station wants to estimate the mean amount of time that the station’s audience spends
listening to the radio daily. From past studies, the standard deviation is estimated as 45
minutes.

a. What sample size is needed if the agency wants to be 90% confident of being correct
to within ± 5 minutes?
b. If 99% confidence is desired, how many listeners need to be selected?

Chapter 7

Hypothesis Testing and Applications

In the sixth chapter, we have tackled the following topics:

• understanding unbiased estimates

• analyzing the distributions of the sample mean and the sample proportion

• inferring the population mean through confidence intervals with known and un-
known standard deviations

• inferring the population proportion through confidence intervals

• determining the appropriate sample size at a given significance level

Well, this is it. The final chapter! Wooo! You are almost there! Keep on going!
This chapter is a continuation in inferring population parameters through point estimates.
We do that by using confidence intervals most of the time. Now, we will use such intervals to
test claims.

7.1 Null and Alternative Hypotheses and Errors in Hypothesis Testing
Suppose a friend tells you, "I feel like I am going to fail QMT 11." You see him in one corner of Matteo Ricci Hall, teary-eyed and hopeless about passing QMT 11. You, as a friend, want to be a strong support system and find evidence that he may somehow pass the course. You may ask, "How are your quizzes? Are they good?" Another would be, "Are there any chances that you can make your F into a D at least?" These questions are actually the motivation for the next topic of this course, which is to find evidence that a certain claim is false. This is what statisticians call hypothesis testing.
In hypothesis testing, the analyst states the null hypothesis, denoted by H0 , the statement
being tested. Often, the null hypothesis is a statement of “no difference” or “no effect”. The
null hypothesis is not rejected unless there is convincing sample evidence that it is false.
The alternative, or research, hypothesis, denoted by Ha , is a statement that will be
accepted only if there is convincing sample evidence that it is true.

For example, if H0 : ”Students studied an average of 14.6 hours per week”, Ha : ”Students
studied an average of more than 14.6 hours per week.” The study wishes to test whether
students spent 14.6 hours per week for studying or not.
Another example would be H0 : µ = 0.5 vs Ha : µ ≠ 0.5. The study wishes to test the claim that the population mean is 0.5.
One last example would be H0 : σ = 0 vs Ha : σ > 0. The study wants to test the claim
that the population standard deviation is 0.
From the examples, one can deduce the conclusions of the hypothesis testing: reject H0 or
do not reject H0 .
However, there are risks in hypothesis testing. They are summarized in Table 44.

Conclusion \ True Nature H0 is True H0 is False


Reject H0 Type I Error Correct Decision
Fail to Reject H0 Correct Decision Type II Error
Table 44: The Possible Results in a Hypothesis Testing

From the table above, a type I error occurs if the analyst rejects the null hypothesis when
it is true. The probability α of a type I error is called the significance level.
Furthermore, a type II error occurs if the analyst does not reject the null hypothesis when
it is false. The probability 1 − β of rejecting H0 when it is false is called the power of the test.
There are two ways to do hypothesis testing. Consider the following normal curve with
α = 5%

Figure 24: The Normal Curve with a 95% Confidence Interval

If you recall from the previous chapter, the 95% confidence interval indicates that the analyst
is 95% confident that the interval (in this case, [−1.96, 1.96]) contains the standardized value
of the population parameter. To reject the null hypothesis, we must find evidence that the test
statistic lies beyond this interval: either less than the lower limit (in the figure, −1.96)
or greater than the upper limit (1.96).

In other words, given the interval [a, b], we reject H0 if the z-score z is found
to be either z > b or z < a.
Another way to reject H0 is to check the p-value. It is defined to be the probability of
observing a value of the test statistic at least as contradictory to H0 (and supportive of Ha)
as the value actually observed. In other words, it is the probability that the test statistic
lies at least as far out in the tail(s) as the observed one. We reject H0 if the p-value is
less than α; for a two-tailed test, the p-value is twice the one-tail area, which is equivalent
to comparing the one-tail area with α/2. To have a tail area smaller than the significance
level is to say that the test value is beyond the confidence interval.
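To make the two decision rules concrete, here is a minimal sketch of a one-sample z test in Python, using only the standard library (`math.erf` gives the standard normal CDF). The numbers are those of Example 7.2.1 below; the variable names are ours.

```python
from math import erf, sqrt

def normal_cdf(z):
    """P(Z < z) for a standard normal variable, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# One-sample z test of H0: mu = 80 vs Ha: mu > 80 (data of Example 7.2.1)
xbar, mu0, sigma, n, alpha = 85, 80, 20, 100, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))      # test statistic
p_value = 1 - normal_cdf(z)               # upper-tail area under the z curve

reject_by_critical_value = z > 1.645      # z* for a right-tailed test at 5%
reject_by_p_value = p_value < alpha       # the two rules agree
```

Both rules lead to the same decision, as they must: the critical value marks exactly the point whose tail area equals α.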
Now, let’s apply them!

7.2 One-Tailed Tests


Example 7.2.1. (Bowerman, Murphree, and O’Connell, 2014)
Suppose that we wish to test H0 : µ = 80 versus Ha : µ > 80, where σ is known to equal 20.
Also, suppose that a sample of n = 100 measurements randomly selected from the population
has a mean of x = 85.

a. Calculate the value of the test statistic z

b. By comparing z with a critical value, test H0 versus Ha at α = 0.05.

c. Calculate the p-value for testing H0 versus Ha .

d. What is the conclusion of the given hypothesis testing?

SOLUTION:

a. Under H0 , µ = 80. The corresponding z-score is

   z = (85 − 80)/(20/√100) = 5/2 = 2.5

b. At a 5% significance level, we want to find the limit z∗ such that
   P((X − µ)/(σ/√n) < z∗) = 95%. Verify that z∗ = 1.645. This means that
   2.5 = z > z∗ = 1.645.

c. P (Z < 2.5) = 0.9938. Hence, P (Z ≥ 2.5) = 1 − 0.9938 = 0.0062. This means that the
   p-value is less than 5%, indicating there is evidence that the population mean is greater
   than 80.

d. As a result of the two methods, we reject H0 .

Example 7.2.2. (Bowerman, Murphree, and O’Connell, 2014)


Suppose that we wish to test H0 : µ = 20 versus Ha : µ < 20, where σ is known to equal
7. Also, suppose that a sample of n = 49 measurements randomly selected from the population
has a mean of x = 18.

a. Calculate the value of the test statistic z

b. By comparing z with a critical value, test H0 versus Ha at α = 0.01.

c. Calculate the p-value for testing H0 versus Ha .

d. What is the conclusion of the given hypothesis testing?

SOLUTION:

a. Under H0 , µ = 20. The corresponding z-score is

   z = (18 − 20)/(7/√49) = −2/1 = −2

b. At a 1% significance level, we want to find the limit z∗ such that
   P(z∗ < (X − µ)/(σ/√n)) = 99%. Verify that z∗ = −2.33. This means that
   −2 = z > z∗ = −2.33.

c. P (Z < −2) = 0.0228. This means that the p-value is greater than 1%, indicating there
   is insufficient evidence that the population mean is below 20.

d. As a result of the two methods, we do not reject H0 .

7.3 Two-Tailed Tests


Example 7.3.1. (Bowerman, Murphree, and O’Connell, 2014)
Suppose that we wish to test H0 : µ = 40 versus Ha : µ ≠ 40, where σ is known to equal 18.
Also, suppose that a sample of n = 81 measurements randomly selected from the population has
a mean of x = 35.

a. Calculate the value of the test statistic z

b. By comparing z with a critical value, test H0 versus Ha at α = 0.05.

c. Calculate the p-value for testing H0 versus Ha .

d. What is the conclusion of the given hypothesis testing?

SOLUTION:

a. Under H0 , µ = 40. The corresponding z-score is

   z = (35 − 40)/(18/√81) = −5/2 = −2.5

b. At a 5% significance level, we want to find the limits such that
   P(−z∗ < (X − µ)/(σ/√n) < z∗) = 95%. Verify that z∗ = 1.96. This means that
   −2.5 = z < −z∗ = −1.96.

c. Since the normal curve is symmetric, P (Z < −2.5) = P (Z > 2.5) = 0.0062. This means
   that the p-value of 2(0.0062) = 0.0124 is less than 5%, indicating there is evidence that
   the population mean is not 40.

d. As a result of the two methods, we reject H0 .
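The two-tailed p-value computation above can be sketched in a couple of lines, exploiting the symmetry of the normal curve (standard library only):

```python
from math import erf, sqrt

def normal_cdf(z):
    """P(Z < z) for a standard normal variable."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# H0: mu = 40 vs Ha: mu != 40, with sigma = 18, n = 81, xbar = 35
z = (35 - 40) / (18 / sqrt(81))            # test statistic
p_value = 2 * normal_cdf(-abs(z))          # double one tail by symmetry
```

Because the rejection region is split between the two tails, the tail area is doubled before it is compared with α.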

To continue the analysis, we have the following interpretations.
If the p-value for testing H0 is less than:

• 0.10, we have some evidence that H0 is false.

• 0.05, we have strong evidence that H0 is false.

• 0.01, we have very strong evidence that H0 is false.

• 0.001, we have extremely strong evidence that H0 is false.
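These thresholds can be folded into a small helper for reporting results; the function name is ours, not standard terminology:

```python
def evidence_strength(p_value):
    """Translate a p-value into the verbal scale listed above."""
    if p_value < 0.001:
        return "extremely strong"
    if p_value < 0.01:
        return "very strong"
    if p_value < 0.05:
        return "strong"
    if p_value < 0.10:
        return "some"
    return "insufficient"
```

For instance, the two-tailed p-value 0.0124 from Example 7.3.1 corresponds to strong evidence against H0.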

7.4 Problem Set 6 (Due the Next Session)


1. (Weiss, 2017) A study by M. Chen et al. titled ”Heat Stress Evaluation and Worker
Fatigue in a Steel Plant” (American Industrial Hygiene Association, Vol. 64, pp. 352-
359) assessed fatigue in steel-plant workers due to heat stress. A random sample of 29
casting workers had a mean post-work heart rate of 78.3 beats per minute (bpm). At
the 5% significance level, do the data provide sufficient evidence to conclude that the
mean post-work heart rate for casting workers exceeds the normal resting heart rate of 72
bpm? (To add more depth to the analysis, use the interpretations at the end of Section
7.3.) Assume that the population standard deviation of post-work heart rates for casting
workers is 11.2 bpm.

2. (Triola, 2018) The drug OxyContin (oxycodone) is used to treat pain, but it is dangerous
because it is addictive and can be lethal. In clinical trials, 227 subjects were treated with
OxyContin and 52 of them developed nausea (based on data from Purdue Pharma L.P.).
Use a 0.05 significance level to test the claim that more than 20% of OxyContin users
develop nausea.
a. Construct a 95% confidence interval around the sample proportion p̂, which is 52/227.
Using this interval, test if 20% of OxyContin users develop nausea.
b. Does the rate of nausea appear to be too high?

3. (Berenson, Krehbiel, and Levine, 2012) You are the manager of a restaurant for a fast-food
franchise. Last month, the mean waiting time at the drive-through window for branches in
your geographical region, as measured from the time a customer places an order until the
time the customer receives the order, was 3.7 minutes. You select a random sample of 64
orders. The sample mean waiting time is 3.57 minutes, with a sample standard deviation
of 0.8 minute. At the 0.05 level of significance, is there evidence that the population mean
waiting time is different from 3.7 minutes? (To add more depth to the analysis, use the
interpretations at the end of Section 7.3.)

4. (Larsen and Marx, 2018) As a class research project, Rosaura wants to see whether the
stress of final exams elevates the blood pressures of freshmen women. When they are
not under any untoward duress, healthy eighteen-year-old women have systolic blood
pressures that average 120mm Hg with a standard deviation of 12mm Hg. If Rosaura
finds that the average blood pressure for the fifty women in Statistics 101 on the day
of the final exam is 125.2, what should she conclude? Set up, and test an appropriate

hypothesis. (To add more depth to the analysis, use the interpretations at the end of
Section 7.3.)

5. (Larsen and Marx, 2018) Commercial fishermen working certain parts of the Atlantic
Ocean sometimes find their efforts hindered by the presence of whales. Ideally, they
would like to scare away the whales without frightening the fish. One of the strategies
being experimented with is to transmit underwater the sounds of a killer whale. On the
fifty-two occasions that technique has been tried, it worked twenty-four times (that is,
the whales immediately left the area). Experience has shown, though, that 40% of all
whales sighted near fishing boats leave of their own accord, probably just to get away
from the noise of the boat.

a. Let p = P (Whale leaves area after hearing sounds of killer whale). Test H0 : p = 0.40
versus H1 : p > 0.40 at the α = 0.05 level of significance. Can it be argued on
the basis of these data that transmitting underwater predator sounds is an effective
technique for clearing fishing waters of unwanted whales?
b. Calculate the P-value for these data. For what values of α would H0 be rejected?

6. (Triola, 2018) A clinical trial was conducted to test the effectiveness of the drug zopiclone
for treating insomnia in older subjects. Before treatment with zopiclone, 16 subjects had
a mean wake time of 102.8 min. After treatment with zopiclone, the 16 subjects had a
mean wake time of 98.9 min and a standard deviation of 42.3 min (based on data from
”Cognitive Behavioral Therapy vs Zopiclone for Treatment of Chronic Primary Insomnia
in Older Adults,” by Sivertsen et al., Journal of the American Medical Association, Vol.
295, No. 24). Assume that the 16 sample values appear to be from a normally distributed
population, and test the claim that after treatment with zopiclone, subjects have a mean
wake time of less than 102.8 min. Does zopiclone appear to be effective? (To add more
depth to the analysis, use the interpretations at the end of Section 7.3.)

7. (Weiss, 2017) Alligators perform a spinning maneuver, referred to as a ”death roll”, to


subdue their prey. Videos were taken of juvenile alligators performing this maneuver in
a study for the article ”Death Roll of the Alligator, Mechanics of Twist and Feeding in
Water” (Journal of Experimental Biology, Vol. 210, pp. 2811-2818) by F. Fish et al. One
of the variables measured was the degree of the angle between the body and head of the
alligator while performing the roll. A sample of 20 rolls yielded the following data, in
degrees.

58.6 58.7 57.3 54.5 52.9


59.5 29.4 43.4 31.8 52.3
42.7 34.8 39.2 61.3 60.4
51.5 42.8 57.5 43.6 47.6

At the 5% significance level, do the data provide sufficient evidence to conclude that, on
average, the angle between the body and head of an alligator during a death roll is
greater than 45◦ ? (To add more depth to the analysis, use the interpretations at the end
of Section 7.3.)

8. (Berenson, Krehbiel, and Levine, 2012) In a recent year, the Federal Communications
Commission reported that the mean wait for repairs for Verizon customers was 36.5
hours. In an effort to improve this service, suppose that a new repair service process
was developed. This new process, used for a sample of 100 repairs, resulted in a sample
mean of 34.5 hours and a sample standard deviation of 11.7 hours. By constructing a
95% confidence interval around the sample mean, is there evidence that the population
mean wait for repairs is less than 36.5 hours? (To add more depth to the analysis, use the
interpretations at the end of Section 7.3.)

9. (Ang, 2012) The Gibbs Baby Food Company wishes to compare the weight gain of infants
using their brand versus their competitor’s. A sample of 15 babies using the Gibbs
products revealed a mean weight gain of 7.6 pounds in the first 3 months after birth.
The standard deviation of the sample was 2.3 pounds. A sample of 10 babies using the
competitor’s brand revealed a mean increase in weight of 8.1 pounds, with a standard
deviation of 2.9 pounds. At the 0.05 level of significance, can we conclude that babies
using the Gibbs brand gained less weight? (To add more depth to the analysis, use the
interpretations at the end of Section 7.3.)

10. (Triola, 2018) In a survey of 3005 adults aged 57 through 85 years, it was found that
81.7% of them used at least one prescription medication (based on data from ”Use of
Prescription and Over-the-Counter Medications and Dietary Supplements Among Older
Adults in the United States,” by Qato et al., Journal of the American Medical Association,
Vol. 200, No. 24). Use a 0.01 significance level to test the claim that more than 3/4 of
adults use at least one prescription medication. Does the rate of prescription use among
adults appear to be high? (To add more depth to the analysis, use the interpretations at
the end of Section 7.3.)

7.5 Two-Population Hypothesis Testing (Independent Sampling)
Now, let’s extend the hypothesis testing from one population to two.
A bank manager has developed a new system to reduce the time customers spend waiting
to be served by tellers during peak business hours. We let µ1 denote the population mean
customer waiting time during peak business hours under the current system. To estimate µ1 ,
the manager randomly selects n1 = 100 customers and records the length of time each customer
spends waiting for service. The manager finds that the mean and the variance of the waiting
times for these 100 customers are x1 = 8.79 minutes and s21 = 4.8237. We let µ2 denote the
population mean customer waiting time during the peak business hours for the new system.
During a trial run, the manager finds that the mean and the variance of the waiting times for
a random sample of n2 = 100 customers are x2 = 5.14 minutes and s22 = 1.7927.
Such comparisons between two independent systems are one kind of study that calls for
two-population hypothesis testing. Let X1 and X2 be the random variables associated with
the average waiting time during peak business hours in the current and the new systems,
respectively. We want to know the sampling distribution of X1 − X2 .
From the previous chapter, we know that E(X1 ) = µ1 and E(X2 ) = µ2 . By the linearity of
expected values, E(X1 − X2 ) = µ1 − µ2 .

Moreover, we know that Var(X1 ) = σ1²/n1 and Var(X2 ) = σ2²/n2 . Hence,
Var(X1 − X2 ) = σ1²/n1 + σ2²/n2 .
Thus, we have the sampling distribution of X1 − X2 .

Definition 7.5.1. (Weiss, 2017)

Suppose that X1 and X2 are normally distributed random variables on each of the two
populations. Then, for independent samples of size n1 and n2 from the two populations,

• µX1−X2 = µ1 − µ2 ,

• σX1−X2 = √(σ1²/n1 + σ2²/n2), and

Thus, the test statistic for the difference between two independent sample means is of the
form

Z = [(X1 − X2 ) − (µ1 − µ2 )] / √(σ1²/n1 + σ2²/n2)    (7.1)

Most of the time, we do not know the population variances. Hence, one assumption an
analyst can make is that the two variances are equal, estimated by pooling the sample
variances. Given s1² and s2² as the sample variances of the two samples, the pooled estimate
sp² is the weighted average of the individual sample variances. Mathematically,

sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)    (7.2)

Since sp² is a pooled estimate, σ1² = σ2² = sp² will be assumed. Thus, we have the following:

t = [(X1 − X2 ) − (µ1 − µ2 )] / √(sp²(1/n1 + 1/n2))    (7.3)

Since we are using sample estimates, the t-distribution will be used. Hence, the corresponding
100(1 − α)% confidence interval around the difference of sample means will be

[(x1 − x2 ) − tα/2 √(sp²(1/n1 + 1/n2)), (x1 − x2 ) + tα/2 √(sp²(1/n1 + 1/n2))]    (7.4)

Note that tα/2 is based on (n1 + n2 − 2) degrees of freedom.

Definition 7.5.2. (Bowerman, Murphree, and O’Connell, 2014)

Suppose we have randomly selected independent samples from two normally distributed pop-
ulations having equal variances. Then, a 100(1 − α)% confidence interval for µ1 − µ2 is given
in (7.4) where the pooled estimate s2p is given in equation (7.2) and t α2 is based on (n1 + n2 − 2)
degrees of freedom.
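A sketch of equations (7.2) and (7.4) in code; the critical value `t_crit` must still be looked up in a t table at n1 + n2 − 2 degrees of freedom, and the function names are ours:

```python
from math import sqrt

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Equation (7.2): weighted average of the two sample variances."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def equal_variances_ci(x1, x2, s1_sq, n1, s2_sq, n2, t_crit):
    """Equation (7.4): equal-variances CI for mu1 - mu2."""
    sp2 = pooled_variance(s1_sq, n1, s2_sq, n2)
    margin = t_crit * sqrt(sp2 * (1 / n1 + 1 / n2))
    return (x1 - x2) - margin, (x1 - x2) + margin
```

For instance, with x̄1 = 240, x̄2 = 210, s1² = 25, s2² = 36, n1 = n2 = 7 and t = 2.179 at 12 degrees of freedom (the data of Example 7.5.1), this returns approximately [23.57, 36.43].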

If the variances are unequal, then the pooled estimate will no longer be used. Instead, we
have the following 100(1 − α)% confidence interval:

[(x1 − x2 ) − tα/2 √(s1²/n1 + s2²/n2), (x1 − x2 ) + tα/2 √(s1²/n1 + s2²/n2)]    (7.5)

In equation (7.5), tα/2 is based on

df = (A + B)² / [A²/(n1 − 1) + B²/(n2 − 1)]    (7.6)

degrees of freedom, where A = s1²/n1 and B = s2²/n2 .
As a rule of thumb, if the sample sizes n1 and n2 are equal, and the larger sample variance is
not more than three times the smaller sample variance, the analyst can use the equal variances
procedure.
Otherwise, if the sample sizes n1 and n2 are equal, and the larger sample variance is more
than three times the smaller sample variance, the analyst can use the unequal variances
procedure. The same procedure can be used if both the sample sizes and the sample variances
differ substantially.
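The degrees-of-freedom formula (7.6) is easy to get wrong by hand; here is a sketch, assuming the round-down convention used in these notes (the function name is ours):

```python
from math import floor

def unequal_variances_df(s1_sq, n1, s2_sq, n2):
    """Equation (7.6), rounded down to an integer."""
    a, b = s1_sq / n1, s2_sq / n2
    return floor((a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1)))
```

With s1² = 5.2², n1 = 35, s2² = 8.5², n2 = 40 (the summaries of Example 7.5.2 below), this gives 65.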

Example 7.5.1. (Bowerman, Murphree, and O’Connell, 2014)

Suppose we have taken independent, random samples of sizes n1 = 7 and n2 = 7 from two
normally distributed populations having means µ1 and µ2 , and suppose we obtain x1 = 240,
x2 = 210, s1 = 5, and s2 = 6. Calculate a 95% confidence interval for µ1 − µ2 using:

a. the equal variances procedure

b. the unequal variances procedure

SOLUTION:

a. Note that s1² = 25 and s2² = 36, which makes the latter not more than three times the
   former. Moreover, since n1 = n2 = 7, the equal variances procedure can be used.

   The pooled estimate is sp² = [(7 − 1)(25) + (7 − 1)(36)] / (7 + 7 − 2)
   = [(6)(25) + (6)(36)] / 12 = 61/2 = 30.5.

   Note that tα/2 is based on 7 + 7 − 2 = 12 degrees of freedom. Hence, by Figure A3,
   t0.025 = 2.179.

   This means that the 95% confidence interval around the difference of sample means will
   be

   [(240 − 210) − 2.179 √(30.5(1/7 + 1/7)), (240 − 210) + 2.179 √(30.5(1/7 + 1/7))]
   ≈ [23.5676, 36.4324]

b. Note that from equation (7.6), A = 25/7 and B = 36/7. Thus, tα/2 is based on

   df = (25/7 + 36/7)² / [(25/7)²/(7 − 1) + (36/7)²/(7 − 1)] ≈ 11 (round down)

   degrees of freedom. With df = 11, t0.025 = 2.201 from Figure A3.

   This means that the 95% confidence interval around the difference of sample means will
   be

   [(240 − 210) − 2.201 √(25/7 + 36/7), (240 − 210) + 2.201 √(25/7 + 36/7)]
   ≈ [23.5027, 36.4973]

In hypothesis testing, for a real number D0 ,

H0 : µ1 − µ2 = D0

Ha : µ1 − µ2 > D0 or µ1 − µ2 < D0 or µ1 − µ2 ≠ D0

Example 7.5.2. (Anderson, Sweeney, and Williams, 2011)

Consider the following hypothesis test.

H0 : µ1 − µ2 = 0

Ha : µ1 − µ2 ≠ 0

The following results are from independent samples taken from two populations: n1 = 35,
n2 = 40, x1 = 13.6, x2 = 10.1, s1 = 5.2, and s2 = 8.5.

a. What is the value of the test statistic?


b. What is the degrees of freedom for the t distribution?
c. What is the p-value?
d. At α = 0.05, what is your conclusion?

SOLUTION:

a. Here, we will be using equation (7.1) with σ1² = s1² and σ2² = s2²:

   t = [(13.6 − 10.1) − 0] / √((5.2)²/35 + (8.5)²/40) ≈ 2.18

b. Since the sample sizes and the sample variances are different, we will assume that the
   population variances are unequal. Thus, we will be using equation (7.6). With
   A = (5.2)²/35 and B = (8.5)²/40, df ≈ 65 (round down).

c. Since the test is two-tailed, the p-value is 2P (t > 2.18). From Figure A4, the value
   2.18 lies between 1.997 (the t0.025 point) and 2.385 (the t0.01 point) at df = 65.
   Hence, 0.01 < P (t > 2.18) < 0.025, so the p-value lies between 0.02 and 0.05.

d. Since the p-value is below α = 0.05, we have strong evidence that H0 is false. Hence, we
   reject H0 .
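The test statistic and degrees of freedom can be recomputed directly from the sample summaries as a check (standard library only):

```python
from math import floor, sqrt

# Example 7.5.2: n1 = 35, n2 = 40, xbar1 = 13.6, xbar2 = 10.1, s1 = 5.2, s2 = 8.5
a = 5.2 ** 2 / 35
b = 8.5 ** 2 / 40

t = ((13.6 - 10.1) - 0) / sqrt(a + b)                   # test statistic, equation (7.1)
df = floor((a + b) ** 2 / (a ** 2 / 34 + b ** 2 / 39))  # equation (7.6), rounded down
```

A statistic this far out in a t distribution with 65 degrees of freedom gives a two-tailed p-value below 0.05.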

7.6 One-Way Analysis of Variance
In many statistical studies, a variable of interest, called the response variable (or dependent
variable), is identified. Then, data are collected that tell us about how one or more factors (or
independent variables) influence the variable of interest. If we cannot control the factor(s)
being studied, we say that the data obtained are observational. For example, suppose that
in order to study how the size of a home relates to the sales price of the home, a real estate
agent randomly selects 50 recently sold homes and records the square footages and sales prices
of these homes. Because the real estate agent cannot control the sizes of the randomly selected
homes, we say that the data are observational.
If we can control the factors being studied, we say that the data are experimental. Fur-
thermore, in this case the values, or levels, of the factor (or combination of factors) are called
treatments. The purpose of most experiments is to compare and estimate the effects
of the different treatments on the response variable. For example, suppose that an oil
company wishes to study how three different gasoline types (A, B, and C) affect the mileage ob-
tained by a popular compact automobile model. Here, the response variable is gasoline mileage,
and the company will study a single factor - gasoline type. Because the oil company can con-
trol which gasoline type is used in the compact automobile, the data that the oil company will
collect are experimental. Furthermore, the treatments - the levels of the factor gasoline type -
are gasoline types A, B, and C.
In order to collect data in an experiment, the different treatments are assigned to objects
(people, cars, animals, or the like) that are called experimental units. For example, in the
gasoline mileage situation, gasoline types A, B, and C will be compared by conducting mileage
tests using a compact automobile. The automobiles used in the tests are experimental units.
In general, when a treatment is applied to more than one experimental unit, it is said to be
replicated. Furthermore, when the analyst controls the treatments employed and how they are
applied to the experimental units, a designed experiment is being carried out. A commonly
used, simple experiment design is called the completely randomized experimental design.

Definition 7.6.1. (Bowerman, Murphree, and O’Connell, 2014)

In a completely randomized experimental design, independent random samples of


experimental units are assigned to the treatments.

Suppose we wish to study the effects of p treatments (treatments 1, 2, ..., p) on a response


variable. For any particular treatment, say treatment i, we define µi and σi to be the mean
and standard deviation of the population of all possible values of the response variable that
could potentially be observed when using treatment i. Here we refer to µi as treatment
mean i. The goal of one-way analysis of variance (often called one-way ANOVA) is
to estimate and compare the effects of the different treatments on the response variable. We
do this by estimating and comparing the treatment means µ1 , µ2 , µ3 , ..., µp . Here, we
assume that a sample has been randomly selected for each of the p treatments by employing
a completely randomized experimental design. We let ni denote the size of the sample that
has been randomly selected for treatment i, and we let xij denote the j th value of the response
variable that is observed when using treatment i. It then follows that the point estimate of
µi is xi , the average of the sample of ni values of the response variable observed when using

treatment i. It further follows that the point estimate of σi is si , the standard deviation of the
sample of ni values of the response variable observed when using treatment i.
For example, consider the gasoline mileage situation. We let µA , µB , and µC denote the
means and σA , σB , and σC denote the standard deviations of the populations of all possible
gasoline mileages using gasoline types A, B, and C. If there are any statistically significant
differences among treatment means, we will estimate the magnitudes of these differences.
The one-way analysis of variance (ANOVA) formulas allows us to test for significant differ-
ences among treatment means and allow us to estimate differences among them. The validity
of these formulas requires that the following assumptions hold:

1. Constant Variance - the p populations of values of the response variable associated


with the treatments have equal variances

2. Normality - the p populations of values of the response variable associated with the
treatments all have normal distributions

3. Independence - the samples of experimental units associated with the treatments are
randomly selected, independent samples

As a preliminary step in one-way ANOVA, we wish to determine whether there are any
statistically significant differences between the treatment means µ1 , µ2 , ..., µp . To do this, we
test the null hypothesis

H0 : µ1 = µ2 = ... = µp (7.7)

This hypothesis says that all the treatments have the same effect on the mean response. We
test H0 versus the alternative hypothesis Ha

Ha : At least two of µ1 , µ2 , ..., µp differ

This alternative says that at least two treatments have different effects on the mean response.
To carry out such a test, we first consider the measure of variation among the sample
means. In hypothesis tests for two population means, we measure the variation between the
two sample means by calculating their difference, x1 − x2 . When more than two populations
are involved, we cannot measure the variation among the sample means simply by taking a
difference. However, we can measure that variation by computing the standard deviation or
variance of the sample means, or by computing any descriptive statistic that measures variation.
In one-way ANOVA, we measure the variation among the sample means by a weighted
average of their squared deviations about the mean, x, of all the sample data. That measure
of variation is called the treatment mean square, MSTR, and is defined as
MSTR = SSTR / (k − 1)    (7.8)

where k denotes the number of populations being sampled and

SSTR = n1 (x1 − x)² + n2 (x2 − x)² + ... + nk (xk − x)²    (7.9)

with µj , xj , sj , and nj as the mean, sample mean, sample standard deviation, and sample
size, respectively, for population j. SSTR is called the treatment sum of squares.
Next, we consider the measure of variation within the samples. This measure is the pooled
estimate of the common population variance, σ 2 . It is called the error mean square, MSE,
and is defined as

MSE = SSE / (n − k),    (7.10)

where n denotes the total number of observations and

SSE = Σ_{j=1}^{n1} (x1j − x1 )² + Σ_{j=1}^{n2} (x2j − x2 )² + ... + Σ_{j=1}^{nk} (xkj − xk )²    (7.11)

SSE is known as the error sum of squares. Note that the total sum of squares (SST)
equals the treatment sum of squares plus the error sum of squares. Mathematically,

SST = SST R + SSE (7.12)

The one-way ANOVA identity shows that the total variation among all the observations
can be partitioned into two components. The partitioning of the total variation among all the
observations into two or more components is fundamental not only in one-way ANOVA but
also in all types of ANOVA.
Finally, we consider how to compare the variation among the sample means, M ST R, to the
variation within the samples, M SE. To do so, we use the statistic
F = MSTR / MSE,    (7.13)
which we refer to as the F-statistic.
A variable is said to have an F-distribution if its distribution has the shape of a special
type of right-skewed curve, called an F-curve. An F-distribution, however, has two numbers
of degrees of freedom instead of one. The first number of degrees of freedom for an F -curve (as
shown in Figure 25) is called the degrees of freedom for the numerator, and the second
is called the degrees of freedom for the denominator.

Figure 25: The Shape of the F-Curve

Thus, for the F -curve in Figure 25 with df = (10, 2), we have 10 as the degrees of freedom
for the numerator and 2 as the degrees of freedom for the denominator.

To get the p-values necessary for hypothesis testing, Figures A5-A12 present tables of
values of Fα . Note that dfn is the degrees of freedom for the numerator; dfd, for the denomi-
nator.
Thus, we can say that the F-statistic in equation (7.13) follows an F-distribution with
df = (k − 1, n − k).
We can reject H0 in favor of Ha at level of significance α if either of the following equivalent
conditions holds:
1. F > Fα
2. p-value < α
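The sums of squares and the F statistic of equations (7.8)-(7.13) can be computed in a few lines; the function name is ours:

```python
def one_way_anova(samples):
    """Return (SSTR, SSE, MSTR, MSE, F) for a list of samples, one per treatment."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(x for s in samples for x in s) / n
    means = [sum(s) / len(s) for s in samples]
    # Between-treatment variation, equation (7.9)
    sstr = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-treatment variation, equation (7.11)
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    mstr, mse = sstr / (k - 1), sse / (n - k)
    return sstr, sse, mstr, mse, mstr / mse
```

Comparing the resulting F against Fα at df = (k − 1, n − k) from the tables completes the test.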

If the one-way ANOVA F test says that at least two treatment means differ, then we
investigate which treatment means differ, and we estimate how large the differences are. We do
this by making what we call pairwise comparisons (that is, we compare treatment means two
at a time). One way to make these comparisons is to compute point estimates and confidence
intervals for pairwise differences.
There are two approaches to calculating confidence intervals for pairwise differences. The
first involves computing the usual, or individual, confidence interval for each pairwise
difference. Here, if we are computing 100(1 − α)% confidence intervals, we are 100(1 − α)%
confident that each individual pairwise difference is contained in its respective interval. That is,
the confidence level associated with each (individual) comparison is 100(1 − α)%, and we refer
to α as the comparisonwise error rate. However, we are less than 100(1 − α)% confident
that all of the pairwise differences are simultaneously contained in their respective intervals.
A more conservative approach is to compute simultaneous confidence intervals. Such
intervals make us 100(1 − α)% confident that all of the pairwise differences are simultaneously
contained in their respective intervals. That is, when we compute simultaneous intervals, the
overall confidence level associated with all the comparisons being made in the experiment is
100(1 − α)%, and we refer to α as the experimentwise error rate.
Several kinds of simultaneous confidence intervals can be computed. In this book, we present
what is called the Tukey formula for simultaneous intervals. We do this because, if we are
interested in studying all pairwise differences between treatment means, the Tukey formula yields
the most precise (shortest) simultaneous confidence intervals.
Definition 7.6.2. (Bowerman, Murphree, and O’Connell, 2014)
1. Consider the pairwise difference µj − µk , which can be interpreted to be the change in
the mean value of the response variable associated with changing from using treatment k
to using treatment j. Then, a point estimate of the difference µj − µk is xj − xk ,
where xj and xk are the sample treatment means associated with treatments j and k.
2. A Tukey simultaneous 100(1 − α)% confidence interval for µj − µk is
[(xj − xk ) − (qα/√2) √(MSE(1/nj + 1/nk)), (xj − xk ) + (qα/√2) √(MSE(1/nj + 1/nk))]    (7.14)

Here, the value qα is obtained from Figures A13-A14, which is a table of percentage
points of the studentized range. In this table, qα is listed corresponding to values of
κ = k and ν = n − k.

3. A point estimate of the treatment mean µj is xj , and an individual 100(1 − α)%
   confidence interval for µj is

   [xj − tα/2 √(MSE/nj), xj + tα/2 √(MSE/nj)]    (7.15)

   Here, the tα/2 point is based on n − k degrees of freedom.

Example 7.6.1. (Bowerman, Murphree, and O’Connell, 2014)

A consumer preference study compares the effects of three different bottle designs (A, B,
and C) on sales of a popular fabric softener. A completely randomized design is employed.
Specifically, 15 supermarkets of equal sales potential are selected, and 5 of these supermarkets
are randomly assigned to each bottle design. The number of bottles sold in 24 hours at each
supermarket is recorded. The data obtained are displayed in Table 45. Let µA , µB , and µC
represent mean daily sales using the bottle designs A, B, and C, respectively.

Bottle Design
A B C
16 33 23
18 31 27
19 37 21
17 29 28
13 34 25
Table 45: Bottle Design Study Data

a. Test the null hypothesis that µA , µB , and µC are equal by setting α = 0.05. That is, test
for statistically significant differences between these treatment means at the 0.05 level of
significance. Based on this test, can we conclude that bottle designs A, B, and C have
different effects on mean daily sales?
b. Consider the pairwise differences µB −µA , µC −µA , and µC −µB . Find a point estimate of
and a Tukey simultaneous 95% confidence interval for each pairwise difference. Interpret
the results in practical terms. Which bottle design maximizes mean daily sales?
c. Find and interpret a 95% confidence interval for each of the treatment means µA , µB ,
and µC .

SOLUTION:

a. Note that xA = 16.6, xB = 32.8, xC = 24.8, and nA = nB = nC = 5. Moreover,
   x ≈ 24.73. Using equations (7.9) and (7.11),

   SSTR = 5(16.6 − 24.73)² + 5(32.8 − 24.73)² + 5(24.8 − 24.73)² ≈ 656.13

   SSE = (16 − 16.6)² + (18 − 16.6)² + ... + (13 − 16.6)² + (33 − 32.8)² + (31 − 32.8)² +
   ... + (34 − 32.8)² + (23 − 24.8)² + (27 − 24.8)² + ... + (25 − 24.8)² = 90.8

We now summarize the one-way ANOVA.

Source of Variation   df   SS       MS                     F-statistic
Treatment             2    656.13   656.13/2 ≈ 328.07      328.07/7.57 ≈ 43.36
Error                 12   90.8     90.8/12 ≈ 7.57
Total                 14   746.93

Hence, the F -statistic follows an F-distribution with df = (2, 12). At α = 0.05, from
Figure A7, Fα = 3.89 < F ≈ 43.36. Thus, we reject the null hypothesis that the
treatment means are equal.

b. Note that κ = 3 and ν = 12. Thus, at α = 0.05, qα = 3.77 according to Figure A14.
Using (7.14), we have:

Pairwise Difference Point Estimate Confidence Interval


µB − µA 32.8 − 16.6 = 16.2 [11.56, 20.84]
µC − µA 24.8 − 16.6 = 8.2 [3.56, 12.84]
µC − µB 24.8 − 32.8 = −8 [−12.64, −3.36]

The results show that we can be 95% confident that changing from bottle design A to B
increases the mean daily sales by between 11.56 and 20.84 while changing from bottle de-
sign A to C increases the mean daily sales by between 3.56 and 12.84. However, changing
from bottle design B to C decreases the mean daily sales by between 3.36 and 12.64. Thus,
bottle design B maximizes mean daily sales.
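As a numerical check, the ANOVA and Tukey computations above can be reproduced with a short Python sketch (illustrative only; the table values F_0.05(2, 12) = 3.89 and q_0.05 = 3.77 are looked up in Figures A7 and A14, not computed):

```python
from math import sqrt

# Bottle design data from Table 45
groups = {
    "A": [16, 18, 19, 17, 13],
    "B": [33, 31, 37, 29, 34],
    "C": [23, 27, 21, 28, 25],
}

k = len(groups)                                   # number of treatments
n = sum(len(g) for g in groups.values())          # total observations
grand_mean = sum(x for g in groups.values() for x in g) / n
means = {name: sum(g) / len(g) for name, g in groups.items()}

# Between-treatment and error sums of squares, as in equations (7.9) and (7.11)
sst = sum(len(g) * (means[name] - grand_mean) ** 2 for name, g in groups.items())
sse = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)

mst = sst / (k - 1)                               # ≈ 328.07
mse = sse / (n - k)                               # ≈ 7.57
f_stat = mst / mse                                # ≈ 43.36 > 3.89, so reject H0

# Tukey simultaneous 95% intervals, equation (7.14); q = 3.77 for k = 3, v = 12
q_05 = 3.77
margin = q_05 * sqrt(mse / len(groups["A"]))      # ≈ 4.64
ci_B_minus_A = (means["B"] - means["A"] - margin,
                means["B"] - means["A"] + margin)  # ≈ (11.56, 20.84)
```

The same `margin` applies to the other two pairwise differences, since all group sizes are equal.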

c. Note that n−k = 12. Thus, at α = 0.05, t α2 = 2.179. Using (7.15), we have the following:

Treatment Means   Point Estimate   Confidence Interval
µA                16.6             [13.92, 19.28]
µB                32.8             [30.12, 35.48]
µC                24.8             [22.12, 27.48]

Here, each interval is x̄j ± t_{α/2}√(MSE/nj ) = x̄j ± 2.179√(7.57/5) ≈ x̄j ± 2.68. The
results show that we can be 95% confident that the mean daily sales of a fabric softener
with design A is between 13.92 and 19.28; with design B, between 30.12 and 35.48; with
design C, between 22.12 and 27.48.

7.7 Chi-Square Tests for Independence


In this section of our lecture notes, we test for the existence of a relationship between
two categorical variables. The chi-square test for independence accomplishes this task as
an application of hypothesis testing.
Before we proceed with the test, let us define first what a chi-square distribution is.
A variable has a chi-square distribution if its distribution has the shape of a special type
of right-skewed curve, called a chi-square (χ2 ) curve. Actually, there are infinitely many

chi-square distributions, and we identify the chi-square distribution (and χ2 -curve) in question
by its number of degrees of freedom, just as we did for t-distributions. Figure 26 shows three
χ2 -curves with different degrees of freedom.

Figure 26: χ2 -curves for df = 5, 10, 19

To get the p-values necessary for hypothesis testing, Figures A15-A16 present tables of
values of χ2α .
As discussed in Chapter 2, contingency tables measure the "state" in terms of two variables,
the ones in the rows and the ones in the columns. In a test of independence, we test the null
hypothesis that in a contingency table, the row and column variables are independent. This
means that there is no dependency between the row variable and the column variable.
Let O represent the observed frequency in a cell of a contingency table; E, the expected
frequency in a cell, found by assuming that the row and column variables are independent;
r, the number of rows in a contingency table (not including labels or row totals); and c,
the number of columns in a contingency table (not including labels or column totals).
The null and alternative hypotheses are as follows:

H0 : The row and column variables are independent.


Ha : The row and column variables are dependent.

The χ² test statistic for a test of independence is given by

χ² = Σ (O − E)²/E   (7.16)

where O is the observed frequency in a cell, and E is the expected frequency in a cell that
is found by evaluating

E = (grand total) · (row total/grand total) · (column total/grand total) = (row total)(column total)/(grand total)   (7.17)

Define the p-value related to χ2 to be the area under the curve of the chi-square distribution
having (r − 1)(c − 1) degrees of freedom to the right of χ2 . Then, we can reject H0 in favor of
Ha at level of significance α if either of the following equivalent conditions holds:

1. χ2 > χ2α

2. p-value < α

Here, the χ2α point is based on (r − 1)(c − 1) degrees of freedom.

Example 7.7.1. (Weiss, 2017)

A worldwide poll on religion was conducted by WIN-Gallup International and published as


the document Global Index of Religiosity and Atheism. One question involved religious belief
and educational attainment. The following data is based on the answers to that question.

(Observed) Basic Secondary Advanced Total


Religious 77 149 78 304
Not Religious 23 56 36 115
Atheist 8 24 29 61
Don’t Know 6 15 8 29
Total 114 244 151 509
Table 46: The Contingency Table of the Religiosity and the Education Attainment of 509
Respondents (Observed)

At the 5% significance level, do the data provide sufficient evidence to conclude that an
association exists between religiosity and education?
SOLUTION: Table 46 presents the observed data O. Using equation (7.17), the expected
frequency of religious respondents with basic education, ER,B , is

ER,B = (304)(114)/509 ≈ 68.09

The expected frequency of religious respondents with secondary education, ER,S , is

ER,S = (304)(244)/509 ≈ 145.73

Table 47 presents the expected data E.

(Expected) Basic Secondary Advanced


Religious 68.09 145.73 90.18
Not Religious 25.76 55.13 34.12
Atheist 13.66 29.24 18.10
Don’t Know 6.50 13.90 8.60
Table 47: The Contingency Table of the Religiosity and the Education Attainment of 509
Respondents (Expected)

Thus, with the two tables, the χ² statistic is given by

χ² = (77 − 68.09)²/68.09 + (149 − 145.73)²/145.73 + ... + (15 − 13.90)²/13.90 + (8 − 8.60)²/8.60 ≈ 13.32
Note that χ²α is based on (r − 1)(c − 1) = (4 − 1)(3 − 1) = 6 degrees of freedom. Hence,
at α = 5%, using Figure A16, χ²α = 12.592 < 13.32 = χ². Thus, we reject the null
hypothesis that the religiosity and the educational attainment of respondents
are independent. Therefore, the data provide sufficient evidence to conclude that the two
variables are associated.
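The expected counts and the chi-square statistic in this example can be reproduced with a short Python sketch (illustrative only; the critical value still comes from Figure A16):

```python
# Observed counts from Table 46; rows are Religious, Not Religious,
# Atheist, Don't Know; columns are Basic, Secondary, Advanced.
observed = [
    [77, 149, 78],
    [23, 56, 36],
    [8, 24, 29],
    [6, 15, 8],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total)(column total)/(grand total), equation (7.17)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic, equation (7.16)
chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1) = 6
# chi_sq ≈ 13.32 > 12.592 = chi²_0.05 with 6 df, so H0 is rejected
```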

7.8 Simple Linear Regression
Suppose that an analyst plots data points that resemble Figure 27.

Figure 27: A Sample Scatter Plot

As discussed in Chapter 2, this is called a scatter plot. The points do not seem to follow
any familiar curve. What an analyst can do is find the relationship between the independent
variable x and the dependent variable y based on the given points. One way to do so is
through a modelling process called linear regression.
The simple linear regression model assumes that the relationship between the depen-
dent variable, which is denoted by y, and the independent variable, denoted by x, can be
approximated by a straight line. It can be expressed as follows:

y = β0 + β1 x +  (7.18)

1. The quantity β0 + β1 x is the mean value of the dependent variable y when the value of
the independent variable is x.

2. β0 is the y-intercept. β0 is the mean value of y when x = 0.

3. β1 is the slope. β1 is the change (amount of increase or decrease) in the mean value of y
associated with a one-unit increase in x. If β1 is positive, the mean value of y increases
as x increases. If β1 is negative, the mean value of y decreases as x increases.

4.  is an error term that describes the effects on y of all factors other than the value of
the independent variable x.

The y-intercept β0 and the slope β1 are called regression parameters. In addition, we
have interpreted the slope β1 to be the change in the mean value of y associated with a one-unit
increase in x. We sometimes refer to this change as the effect of the independent variable x on
the dependent variable y. However, we cannot prove that a change in an independent variable
causes a change in the dependent variable. Rather, regression can be used only to establish that
the two variables move together and that the independent variable contributes information for
predicting the dependent variable. For instance, regression analysis might be used to establish

that as liquor sales have increased over the years, college professors’ salaries have also increased.
However, this does not prove that increases in liquor sales cause increases in college professors’
salaries. Rather, both variables are influenced by a third variable: long-run growth in the
national economy.
Suppose that we have gathered n observations (x1 , y1 ), (x2 , y2 ), ..., (xn , yn ) where each
observation consists of a value of an independent variable x and a corresponding value of a
dependent variable y. Also, suppose that a scatter plot of the n observations indicates that the
simple linear regression model relates y to x. In order to estimate the y-intercept β0 and the
slope β1 of the line of means of this model, we could visually draw a line, called an estimated
regression line, through the scatter plot. Then, we could read the y-intercept and slope
off the estimated regression line and use these values as the point estimates of β0 and β1 .
Unfortunately, if different people visually drew lines through the scatter plot, their lines would
probably differ from each other. What we need is the "best line" that can be drawn through
the scatter plot. Although there are various definitions of what this best line is, one of the most
useful best lines is the least squares line.
To understand the least squares line, we let

ŷ = b0 + b1 x (7.19)
denote the general equation of an estimated regression line drawn through a scatter plot.
Here, because we will use this line to predict y on the basis of x, we call ŷ the predicted value
of y when the value of the independent variable is x. In addition, b0 is the y-intercept, and b1
is the slope of the estimated regression line. When we determine numerical values for b0 and
b1 , these values will be the point estimates of the y-intercept β0 and the slope β1 of the line of
means. The least squares line is the line that minimizes the sum of squared residuals. That is,
the least squares line is the line positioned on the scatter plot so as to minimize the sum of the
squared vertical distances between the observed and predicted values.
To define the least squares line in a general situation, consider an arbitrary observation
(xi , yi ) in a sample of n observations. For this observation, the predicted value of the
dependent variable y given by an estimated regression line is

ŷi = b0 + b1 xi (7.20)
Furthermore, the difference between the observed and predicted values of y, yi − ŷi , is the
residual for the observation, and the sum of squared residuals for all n observations is
n
X
SSE = (yi − ŷi )2 (7.21)
i=1

The least squares line is the line that minimizes SSE. To find this line, we find the values
of the y-intercept b0 and slope b1 that give values of ŷi = b0 + b1 xi that minimize SSE. These
values of b0 and b1 are called the least squares point estimates of β0 and β1 . Using calculus,
it can be shown that these estimates are calculated as follows:

Definition 7.8.1. (Bowerman, Murphree, and O’Connell, 2014)

For the simple linear regression model:

1. The least squares point estimate of the slope β1 is

b1 = SSxy/SSxx   (7.22)

where

SSxy = Σ(xi − x̄)(yi − ȳ) = Σxi yi − (Σxi )(Σyi )/n   (7.23)

and

SSxx = Σ(xi − x̄)² = Σxi ² − (Σxi )²/n   (7.24)

2. The least squares point estimate of the y-intercept β0 is

b0 = ȳ − b1 x̄   (7.25)

Here n is the number of observations (an observation is an observed value of x and its corre-
sponding value of y).

Note that when we are using a least squares regression line, we should not estimate a mean
value or predict an individual value unless the corresponding value of x is in the experimental
region - the range of the previously observed values of x. Often the value x = 0 is not in the
experimental region. In such a situation, it would not be appropriate to interpret the y-intercept
b0 as the estimate of the mean value of y when x = 0.
We now present a general procedure for estimating a mean value and predicting an individual
value:

Definition 7.8.2. (Bowerman, Murphree, and O’Connell, 2014)

Let b0 and b1 be the least squares point estimates of the y-intercept β0 and the slope β1 in
the simple linear regression model, and suppose that x0 , a specified value of the independent
variable x, is inside the experimental region. Then,

ŷ = b0 + b1 x0 (7.26)

1. is the point estimate of the mean value of the dependent variable when the value
of the independent variable is x0 .

2. is the point prediction of an individual value of the dependent variable when
the value of the independent variable is x0 . Here, we predict the error term to be 0.

Example 7.8.1. (Weiss, 2017)

Tax efficiency is a measure, ranging from 0 to 100, of how much tax due to capital gains
stock or mutual funds investors pay on their investments each year; the higher the tax efficiency,
the lower is the tax. In the article, ”At the Mercy of the Manager” (Financial Planning,

Vol. 30(5), pp. 54-56), C. Israelsen examined the relationship between investments in mutual
fund portfolios and their associated tax efficiencies. The following table shows percentage of
investments in energy securities (x) and tax efficiency (y) for 10 mutual fund portfolios.

x y
3.1 98.1
3.2 94.7
3.7 92.0
4.3 89.8
4.0 87.5
5.5 85.0
6.7 82.0
7.4 77.8
7.4 72.1
10.6 53.5

Summarized below are the values needed to compute the point estimates of the parameters
of the simple linear regression model.

x y x2 xy
3.1 98.1 9.61 304.11
3.2 94.7 10.24 303.04
3.7 92.0 13.69 340.4
4.3 89.8 18.49 386.14
4.0 87.5 16 350
5.5 85.0 30.25 467.5
6.7 82.0 44.89 549.4
7.4 77.8 54.76 575.72
7.4 72.1 54.76 533.54
10.6 53.5 112.36 567.1

Note that n = 10, x̄ = 5.59, and ȳ = 83.25. Moreover, Σxi = 55.9, Σyi = 832.5,
Σxi ² = 365.05, and Σxi yi = 4376.95. Therefore, using equations (7.23) and (7.24),
SSxy = 4376.95 − (55.9)(832.5)/10 ≈ −276.73 and SSxx = 365.05 − (55.9)²/10 ≈ 52.57.
Hence, b1 = −276.73/52.57 ≈ −5.26, and b0 = 83.25 − (−5.26)(5.59) ≈ 112.68.
−5.26, and b0 = 83.25 − (−5.26)(5.59) ≈ 112.68.
Therefore, the regression equation between investments in energy securities and tax efficiency
is given by

ŷ = 112.68 − 5.26x (7.27)

whose graph is presented in Figure 28.

Figure 28: The Regression Line between Investments in Energy Securities and Tax Efficiency

Given x values, we can find the predicted values ŷi and their squared errors from the observed
y values, summarized in the table below.

x y ŷ Squared Errors
3.1 98.1 96.36 3.04
3.2 94.7 95.83 1.28
3.7 92.0 93.20 1.44
4.3 89.8 90.04 0.06
4.0 87.5 91.62 16.97
5.5 85.0 83.72 1.63
6.7 82.0 77.41 21.10
7.4 77.8 73.72 16.63
7.4 72.1 73.72 2.63
10.6 53.5 56.88 11.41

Given this regression model, SSE is the sum of all squared errors, which is approximately
76.18.
Note that the experimental region ranges between 3.1 and 10.6. Hence, we can make
predictions on the tax efficiency due to investments in energy securities at x = 4.1 or x = 7.5.
However, predicting tax efficiency at x = 0 or at x = 15.1 is not appropriate, since these x
values are beyond the experimental region.
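The least squares computations above can be reproduced with a short Python sketch (illustrative only), built directly from equations (7.21) through (7.25):

```python
# Percentage of investments in energy securities (x) and tax efficiency (y)
x = [3.1, 3.2, 3.7, 4.3, 4.0, 5.5, 6.7, 7.4, 7.4, 10.6]
y = [98.1, 94.7, 92.0, 89.8, 87.5, 85.0, 82.0, 77.8, 72.1, 53.5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squares, equations (7.23) and (7.24)
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n   # ≈ -276.73
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n                   # ≈ 52.57

b1 = ss_xy / ss_xx        # slope estimate, equation (7.22), ≈ -5.26
b0 = y_bar - b1 * x_bar   # intercept estimate, equation (7.25), ≈ 112.68

# Sum of squared residuals, equation (7.21)
y_hat = [b0 + b1 * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))                # ≈ 76.18
```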
In order to perform hypothesis tests and set up various types of intervals when using the
simple linear regression model similar to that of equation (7.18), we need to make certain
assumptions about the error term . At any given value of x, there is a population of error
term values that could potentially occur. These error term values describe the different potential
effects on y of all factors other than the value of x. Therefore, these error term values explain
the variation in the y values that could be observed when the independent variable is x. Our
statement of the simple linear regression model assumes that µy , the mean of the population
of all y values that could be observed when the independent variable is x, is β0 + β1 x. This
model also implies that  = y − (β0 + β1 x), so this is equivalent to assuming that the mean

of the corresponding population of potential error term values is 0. In total, we make four
assumptions (called the regression assumptions) about the simple linear regression model.
These assumptions can be stated in terms of potential y values or, equivalently, in terms of
potential error term values. Following tradition, we begin by stating these assumptions in terms
of potential error term values:

Definition 7.8.3. (Bowerman, Murphree, and O’Connell, 2014)

1. At any given value of x, the population of potential error term values has a mean equal
to 0.

2. Constant Variance Assumption: At any given value of x, the population of potential


error term values has a variance that does not depend on the value of x. That is, the
different populations of potential error term values corresponding to different values of x
have equal variances. We denote the constant variance as σ 2 .

3. Normality Assumption: At any given value of x, the population of potential error


term values has a normal distribution.

4. Independence Assumption: Any one value of the error term  is statistically inde-
pendent of any other value of . That is, the value of the error term  corresponding to an
observed value of y is statistically independent of the value of the error term corresponding
to any other observed value of y.

Taken together, the first three assumptions say that, at any given value of x, the population
of potential error term values is normally distributed with mean zero and a variance σ 2
that does not depend on the value of x. Because the potential error term values cause
the variation in the potential y values, these assumptions imply that the population of all y
values that could be observed when the independent variable is x is normally distributed
with mean β0 + β1 x and a variance σ 2 that does not depend on x.
To make statistical inferences, we need to be able to compute point estimates of σ 2 and σ,
the constant variance and standard deviation of the error term populations. The point estimate
of σ 2 is called the mean square error and the point estimate of σ is called the standard
error.

Definition 7.8.4. (Bowerman, Murphree, and O’Connell, 2014)

If the regression assumptions are satisfied, and SSE is the sum of squared residuals:

1. The point estimate of σ 2 is the mean square error

SSE
s2 = (7.28)
n−2
2. The point estimate of σ is the standard error
r
SSE
s= (7.29)
n−2

In order to understand these point estimates, recall that σ 2 is the variance of the population
of y values (for a given value of x) around the mean value µy . Because ŷ is the point estimate of
this mean, it seems natural to use equation (7.21) to help construct a point estimate of σ 2 . We
divide SSE by n − 2 because it can be proven that doing so makes the resulting s2 an unbiased
point estimate of σ 2 . Here, we call n − 2 the number of degrees of freedom associated with
SSE.
Example 7.8.2. (Continuation of Example 7.8.1)

From Example 7.8.1, note that SSE = 76.18. Thus, with n = 10, using equations (7.28)
and (7.29), s² = 76.18/(10 − 2) ≈ 9.52 and s = √s² ≈ 3.09.

Suppose that the variables x and y satisfy the assumptions for regression inferences. Then,
for each value x of the independent variable, the conditional distribution of the response variable
is a normal distribution with mean β0 + β1 x and standard deviation σ.
Of particular interest is whether the slope, β1 , of the population regression line equals 0. If
β1 = 0, then, for each value x of the predictor variable, the conditional distribution is a normal
distribution having mean β0 (= β0 + 0 · x), and standard deviation σ. Because x does not
appear in either of those two parameters, it is useless as a predictor of y.
Hence, we can decide whether x is useful as a (linear) predictor of y —that is, whether the
regression equation has utility —by performing the hypothesis test

H0 : β1 = 0 (x is not useful for predicting y)


Ha : β1 6= 0 (x is useful for predicting y)

We base hypothesis tests for β1 (the slope of the population regression line) on the statistic
b1 (the slope of a sample regression line). Note that for different samples, the slope, b1 , of the
sample regression line varies and is therefore a variable. Its distribution is called the sampling
distribution of the slope of the regression line.

Definition 7.8.5. (Weiss, 2017)

Suppose that the variables x and y satisfy the four assumptions for regression inferences.
Then, for samples of size n, each with the same values x1 , x2 , ..., xn for the predictor variable,
the following properties hold for the slope, b1 , of the sample regression line:

1. The mean of b1 equals the slope of the population regression line; that is, we have µb1 = β1
(i.e., the slope of the sample regression line is an unbiased estimator of the slope of the
population regression line).

2. The standard deviation of b1 is σb1 = σ/√Sxx .

3. The variable b1 is normally distributed.

As a consequence, the standardized variable

z = (b1 − β1 )/(σ/√Sxx )   (7.30)

has the standard normal distribution, but this variable cannot be used as a basis for the
required test statistic because the common conditional standard deviation, σ, is unknown. We,
therefore, replace σ with its sample estimate s, the standard error of the estimate. As you
might suspect, the resulting variable has a t-distribution.

Definition 7.8.6. (Weiss, 2017)

Suppose that the variables x and y satisfy the four assumptions for regression inferences.
Then, for samples of size n, each with the same values of x1 , x2 , ..., xn for the predictor
variable, the variable
t = (b1 − β1 )/(s/√Sxx )   (7.31)

has the t-distribution with df = n − 2.

As a result, for a hypothesis test with the null hypothesis H0 : β1 = 0, we can use the
variable

t = b1 /(s/√Sxx )   (7.32)

as the test statistic and obtain the critical values or p-value from the t-table. We call this
hypothesis-testing procedure the regression t-test. We reject the null hypothesis if the test
statistic exceeds the critical value (t > tα for a right-tailed test; |t| > t_{α/2} for a two-tailed
test) or, equivalently, if the p-value < α.
Now, we will define the confidence interval for the slope.
Definition 7.8.7. (Bowerman, Murphree, and O’Connell, 2014)

If the regression assumptions hold, a 100(1 − α)% confidence interval for the true
slope β1 is

[b1 − t_{α/2} s/√Sxx , b1 + t_{α/2} s/√Sxx ]   (7.33)

Here, t_{α/2} is based on n − 2 degrees of freedom.

We can also test the significance of the y-intercept β0 . We do this by testing the null
hypothesis H0 : β0 = 0 versus the alternative hypothesis Ha : β0 6= 0. If we can reject H0
in favor of Ha by setting the probability of a Type I error equal to α, we conclude
that the intercept β0 is significant at the α level. To carry out the hypothesis test, we
use the test statistic

t = b0 /(s √(1/n + x̄²/SSxx ))   (7.34)

Example 7.8.3. (Continuation of Example 7.8.2)

Recall that x = 5.59, b0 ≈ 112.68, b1 ≈ −5.26, SSxx ≈ 52.57, n = 10, and s ≈ 3.09. Thus,
using equation (7.32),

t = −5.26/(3.09/√52.57) ≈ −12.37

At a significance level of 5%, with df = 10 − 2 = 8, t_{α/2} = 2.306 < |t| = 12.37. Hence,
we reject the null hypothesis that β1 = 0.
At the same significance level, the 95% confidence interval for the true slope β1 is

[−5.26 − 2.306(3.09/√52.57), −5.26 + 2.306(3.09/√52.57)] ≈ [−6.25, −4.28]

which excludes 0, consistent with rejecting the null hypothesis.


Using equation (7.34),

t = 112.68/(3.09 √(1/10 + 5.59²/52.57)) ≈ 43.82 > 2.306 = t_{α/2}

at α = 5%. Thus, we reject the null hypothesis that β0 = 0.
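The t statistics and the slope confidence interval in this example can be reproduced with a short Python sketch (illustrative only; because the inputs are rounded to two decimals, the results differ from the text's in the last digit):

```python
from math import sqrt

b0, b1 = 112.68, -5.26       # least squares estimates from Example 7.8.1
sse, ss_xx = 76.18, 52.57    # sums of squares from Examples 7.8.1 and 7.8.2
n, x_bar = 10, 5.59

s = sqrt(sse / (n - 2))      # standard error, equation (7.29), ≈ 3.09

# Regression t-test for the slope, equation (7.32)
se_b1 = s / sqrt(ss_xx)
t_slope = b1 / se_b1         # ≈ -12.36 here; -12.37 in the text (unrounded inputs)

# 95% confidence interval for the slope, equation (7.33); t_0.025 = 2.306, df = 8
t_crit = 2.306
ci_slope = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)       # ≈ (-6.24, -4.28)

# Test of the intercept, equation (7.34)
t_intercept = b0 / (s * sqrt(1 / n + x_bar ** 2 / ss_xx))   # ≈ 43.82
reject_slope = abs(t_slope) > t_crit                        # True: slope significant
```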

As an alternative to the t test, in simple linear regression, you can use an F test to determine
whether the slope is statistically significant.
Definition 7.8.8. (Berenson, Krehbiel, and Levine, 2012)

The F test statistic is equal to the regression mean square (MSR) divided by the mean square
error (MSE).
F = MSR/MSE   (7.35)
where

MSR = Σ(ŷi − ȳ)²/1 = Σ(ŷi − ȳ)²   (7.36)

MSE = SSE/(n − 2)   (7.37)
The F test statistic follows an F distribution with 1 and n − 2 degrees of freedom.

Using a level of significance α, the decision rules are reject H0 if F > Fα and p-value < α;
otherwise, do not reject H0 .
Table 48 presents the complete set of results into an analysis of variance (ANOVA) table.

Source       df      Sum of Squares   Mean Square (Variance)   F
Regression   1       SSR              MSR = SSR/1 = SSR        F = MSR/MSE
Error        n − 2   SSE              MSE = SSE/(n − 2)
Total        n − 1   SST

Table 48: The Analysis of Variance of Simple Linear Regression Models

Testing the significance of the regression relationship between y and x by using the overall
F statistic and its related p-value is equivalent to doing this test by using the t statistic and
its related p-value. Specifically, it can be shown that t² = F and that (t_{α/2})² based on n − 2
degrees of freedom equals Fα based on 1 numerator and n − 2 denominator degrees of freedom.
It follows that the critical value conditions

|t| > t_{α/2}   (7.38)


and

F > Fα (7.39)
are equivalent. Furthermore, the p-values related to t and F can be shown to be equal.

Example 7.8.4. (Continuation from Example 7.8.3)

Recall that ȳ = 83.25, n = 10, and SSE = 76.18. Using the predicted data ŷ listed in
Example 7.8.1 and equation (7.36), MSR ≈ 1456.69. Thus, F = 1456.69/(76.18/(10 − 2)) ≈ 152.98.

At α = 5%, F follows an F distribution with df = (1, 10−2) = (1, 8). Thus, Fα = 5.32 < F .
Hence, we reject the null hypothesis that β1 = 0.
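A short Python sketch (illustrative only) reproduces the F statistic and confirms the t² = F equivalence numerically, using the rounded values from the preceding examples:

```python
mse = 76.18 / (10 - 2)    # mean square error, equation (7.37), ≈ 9.52
msr = 1456.69             # regression mean square from Example 7.8.4
f_stat = msr / mse        # ≈ 152.98 > 5.32 = F_0.05(1, 8), so H0 is rejected

t_stat = -12.37           # regression t statistic from Example 7.8.3
# F should match t² up to rounding of the intermediate values
relative_gap = abs(f_stat - t_stat ** 2) / f_stat
```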

7.9 Problem Set 7 (Due the Next Session)


1. (Larsen and Marx, 2018) Suppose that H0 : µX = µY is being tested against Ha : µX ≠
µY , where σ²X and σ²Y are known to be 17.6 and 22.9, respectively. If n1 = 10, n2 = 20,
x̄ = 81.6, and ȳ = 79.9, what p-value would be associated with the observed Z ratio?

2. (Ang, 2012) The Gibbs Baby Food Company wishes to compare the weight gain of infants
using their brand versus their competitor's. A sample of 15 babies using the Gibbs
products revealed a mean weight gain of 7.6 pounds in the first 3 months after birth.
The standard deviation of the sample was 2.3 pounds. A sample of 10 babies using the
competitor’s brand revealed a mean increase in weight of 8.1 pounds, with a standard
deviation of 2.9 pounds. At the 0.05 level of significance, can we conclude that babies
using the Gibbs brand gained less weight?

3. (Bowerman, Murphree, and O’Connell, 2014) In order to compare the durability of four
different brands of golf balls (ALPHA, BEST, CENTURY, and DIVOT), the National
Golf Association randomly selects five balls of each brand and places each ball into a
machine that exerts the force produced by a 250-yard drive. The number of simulated
drives needed to crack or chip each ball is recorded. The results are given in the table
below.

Alpha Best Century Divot


281 270 218 364
220 334 244 302
274 307 225 325
242 290 273 337
251 331 249 355

a. Test for statistically significant differences between the treatment means. Set α =
0.05.
b. Perform pairwise comparisons of the treatment means by using Tukey simultaneous
95% confidence intervals. Which brands are most durable? Find and interpret a
95% confidence interval for each of the treatment means.

4. (Triola, 2018) Many students have had the unpleasant experience of panicking on a test
because the first question was exceptionally difficult. The arrangement of test items
was studied for its effect on anxiety. The following scores are measures of ”debilitating
test anxiety,” which most of us call panic or blanking out (based on data from ”Item
Arrangement, Cognitive Entry Characteristics, Sex and Test Anxiety as Predictors of
Achievement in Examination Performance,” by Klimko, Journal of Experimental Educa-
tion, Vol. 52, No. 4.) Using the unequal variances procedure, is there sufficient evidence
to support the claim that the two populations of scores have different means? Is there
sufficient evidence to support the claim that the arrangement of the test items has an
effect on the score? Is the conclusion affected by whether the significance level is 0.05 or
0.01?

Questions Arranged from Easy to Difficult


24.64 39.29 16.32 32.83 28.02
33.31 20.60 21.13 26.69 28.90
26.43 24.23 7.10 32.86 21.06
28.89 28.71 31.73 30.02 21.96
25.49 38.81 27.85 30.29 30.72

Questions Arranged from Difficult to Easy


33.62 34.02 26.63 30.26
35.91 26.68 29.49 35.32
27.24 32.34 29.34 33.53
27.62 42.91 30.20 32.54

5. (Weiss, 2017) The National Association of Colleges and Employers (NACE) conducts
surveys on salary offers to college graduates by field and degree. Results are published in
Salary Survey. The following table provides summary statistics for starting salaries, in
thousands of dollars, to samples of bachelor’s-degree graduates in six fields.

Field nj xj sj
Business 46 55.1 5.6
Communications 11 44.6 4.7
Computer Science 30 59.1 4.0
Education 11 40.6 5.0
Engineering 44 62.6 5.7
Math and Sciences 18 43.0 4.8

At the 1% significance level, do the data provide sufficient evidence to conclude that a
difference exists in mean starting salaries among bachelor's-degree graduates in the six
fields? Hint: Recall that

sj = √( Σ_{i=1}^{nj} (xji − x̄j )² / (nj − 1) )   (7.40)

6. (Berenson, Krehbiel, and Levine, 2012) The owner of a restaurant serving Continental-
style entrées has the business objective of learning more about the patterns of patron
demand during the Friday-to-Sunday weekend time period. Data were collected from 630
customers on the type of entrée ordered and the type of dessert ordered and organized
into the following table:

Type of Entrée
Type of Dessert Beef Poultry Fish Pasta Total
Ice Cream 13 8 12 14 47
Cake 98 12 29 6 145
Fruit 8 10 6 2 26
None 124 98 149 41 412
Total 243 128 196 63 630

At the 0.05 level of significance, is there evidence of a relationship between type of dessert
and type of entrée?

7. (Siegel, 2016) Your firm is considering expansion to a nearby city. A survey of employees
in that city, asked to respond to the question, ”Will business conditions in this area get
better, stay the same, or get worse?” produced the data set shown in the table below.

Managers Other Employees Total


Better 23 185
Same 37 336
Worse 11 161
Not Sure 15 87
Total

a. Fill in the ”Total” row and column.


b. Does the response appear to be independent of the employee’s classification? Why
or why not?

8. (Miah, 2016) While examining the relationship between water depth and dissolved oxygen
concentration in a pond, an aquaculturist recorded the following observations:

Depth (cm) 10 20 30 40 50 60 70 80 90 100


DO (mg/L) 15 14 13 12 10 8 5 4 2 2

a. Establish the sample regression model.

b. Estimate the DO concentration at 105 cm depth.
c. Test if the coefficient is significant (use 5% level of significance).
d. Find the 95% confidence interval of the constant term.
e. Find the 95% confidence interval of the coefficient.

9. (McClave and Sincich, 2018) Marketers are keenly interested in how social media (e.g.,
Facebook, Twitter) may influence consumers who buy their products. Researchers at HP
Labs (Palo Alto, CA) investigated whether the volume of chatter on Twitter.com could be
used to forecast the box office revenues of movies (IEEE International Conference on Web
Intelligence and Intelligent Agent Technology, 2010). Opening weekend box office revenue
data (in millions of dollars) were collected for a sample of 24 movies. In addition,
the researchers computed each movie's tweet rate, i.e., the average number of tweets (at
Twitter.com) referring to the movie per hour one week prior to the movie's release.
The data (simulated based on information provided in the study) are listed in the table.
Assuming that movie revenue and tweet rate are linearly related, how much do you
estimate a movie’s opening weekend revenue to change as the tweet rate for the movie
increases by an average of 100 tweets per hour?

Tweet Rate Revenue (millions)


1365.8 142
1212.8 77
581.5 61
310.1 32
455 31
290 30
250 21
680.5 18
150 18
164.5 17
113.9 16
144.5 15
418 14
98 14
100.8 12
115.4 11
74.4 10
87.5 9
127.6 9
52.2 9
144.1 8
41.3 2
2.75 0.3

10. (Ang, 2011) A study regarding the relationship between age and the amount of pressure
sales personnel feel in relation to their jobs revealed the following sample information. At
the 1% level of significance, is there a relationship between job pressure and age?

Degree of Job Pressure
Age (years) Low Medium High
Less than 20 5 7 8
20 - 40 34 49 57
40 - 60 38 58 54
Above 60 18 37 35

CONGRATULATIONS FOR
(ALMOST) COMPLETING QMT 11!
WOOOOOO!!!!

Appendix A

Tables

Figure A1: The Cumulative Areas under the Standard Normal Curve (Bowerman, Murphree,
and O’Connell, 2014)

Figure A2: The Cumulative Areas under the Standard Normal Curve (continued)(Bowerman,
Murphree, and O’Connell, 2014)

Figure A3: The Values under the t Curve (Bowerman, Murphree, and O’Connell, 2014)

Figure A4: The Values under the t Curve (continued) (Bowerman, Murphree, and O’Connell,
2014)

Figure A5: The Values under the F -curve (Weiss, 2017)

Figure A6: The Values under the F -curve (continued) (Weiss, 2017)

Figure A7: The Values under the F -curve (continued) (Weiss, 2017)

Figure A8: The Values under the F -curve (continued) (Weiss, 2017)

Figure A9: The Values under the F -curve (continued) (Weiss, 2017)

Figure A10: The Values under the F -curve (continued) (Weiss, 2017)

Figure A11: The Values under the F -curve (continued) (Weiss, 2017)

Figure A12: The Values under the F -curve (continued) (Weiss, 2017)

Figure A13: The Percentage Points of the Studentized Range (Weiss, 2017)

Figure A14: The Percentage Points of the Studentized Range (continued) (Weiss, 2017)

Figure A15: The Upper and Lower Percentiles of χ2 Distributions (Weiss, 2017)

Figure A16: The Upper and Lower Percentiles of χ2 Distributions (continued) (Weiss, 2017)

Bibliography

[1] Bruce Bowerman, Emily Murphree, and Richard O’Connell. Business Statistics in Practice,
Seventh Edition. The McGraw-Hill Companies, Inc., New York, 2014.

[2] James Stewart. Single Variable Calculus Early Transcendentals, Eighth Edition. Cengage
Learning, Boston, Massachusetts, 2016.

[3] A. Aczel and J. Sounderpandian. Business Statistics. The McGraw-Hill Companies, Inc.,
New York, 2008.

[4] Abdul Quader Miah. Applied Statistics for Social and Management Sciences. Springer
Science+Business Media, Singapore, 2016.

[5] Andrew Siegel. Practical Business Statistics, Seventh Edition. Elsevier Inc., Seattle, 2016.

[6] David Anderson, Dennis Sweeney, and Thomas Williams. Statistics for Business and Eco-
nomics, Second Edition. South-Western, Cengage Learning, Ohio, 2011.

[7] James McClave and Terry Sincich. Statistics, Thirteenth Edition, Global Edition. Pearson
Education Limited, United Kingdom, 2018.

[8] Gouri Bhattacharyya and Richard Johnson. Statistics, Principles & Methods, Sixth Edi-
tion. John Wiley & Sons, Inc., New Jersey, 2010.

[9] Mario Triola. Elementary Statistics, 13th Edition. Pearson Education, Inc., United States
of America, 2018.

[10] Mark Berenson, Timothy Krehbiel, and David Levine. Basic Business Statistics: Concepts
and Applications, 12th Edition. Pearson Education, Inc., New Jersey, 2012.

[11] Neil Weiss. Introductory Statistics, Tenth Edition, Global Edition. Pearson Education Lim-
ited, England, 2017.

[12] Richard Larsen and Morris Marx. An Introduction to Mathematical Statistics and Its Ap-
plications, Sixth Edition. Pearson Education, Inc., United States of America, 2018.

[13] Ronald Weiers. Introduction to Business Statistics, Seventh Edition. South-Western, Cen-
gage Learning, Ohio, 2011.
