
Chapter 2

2.1 VARIABLES AND DATA


Data Terminology
▪ An observation: a single member of a collection of items that we want to study,
such as a person, firm, or region.
Ex: an employee, or an invoice mailed last month
▪ A variable: a characteristic about the items that we want to study (e.g., student
name, Gender, DOB).
Ex: an employee’s income or an invoice amount.
▪ Data set: all the values of all of the variables for all of the observations we chose.
Data are usually entered into a spreadsheet or database as an n × m matrix

Categorical and Numerical Data


A data set may contain a mixture of data types. Two broad categories:

• Categorical (qualitative) data: values described by words rather than numbers
(nonnumerical values, or verbal labels). Values of a categorical variable may also be
represented by numbers - coded data
• Numerical (quantitative) data: arise from counting, measuring something, or some
kind of mathematical operation. Two types: Discrete (integers), Continuous
(physical measurements, financial variables)

Time Series Data and Cross-Sectional Data


• Time series data: each observation in the sample represents a different, equally spaced
point in time (years, months, days). The periodicity is the time between observations.
→ used to study trends and patterns over time
Ex: a firm’s sales, market share, debt/equity ratio, employee absenteeism, inventory
turnover, and product quality ratings
• Cross-sectional data: each observation represents a different individual unit (e.g., a
person, firm, geographic area) at the same point in time.
→ used to study variation among observations and relationships
Ex: daily closing prices of a group of 20 stocks recorded on December 1, 2015.
Combine the two data types to get pooled cross-sectional and time series data.
Ex: monthly unemployment rates for the 13 Canadian provinces or territories for the last 60
months

2.2 LEVEL OF MEASUREMENT


Four levels of measurement for data: nominal, ordinal, interval, and ratio.

Nominal Measurement
Nominal data, the weakest level of measurement and the easiest to recognize, merely
identify a category. “Nominal” data are the same as “qualitative”, “categorical” or “classification” data.
The only permissible mathematical operations are counting (e.g., frequencies).
➔ No ordering

Ordinal Measurement
Ordinal data codes connote (imply) a ranking of data values. There is no clear meaning to
the distance between ranks. Like nominal data, ordinal data lack the properties that are required
to compute many statistics, such as the average. Ordinal data can be treated as nominal,
but not vice versa.
➔ Ordering, but differences have no meaning.
Interval Measurement
Interval data not only convey a rank but also have meaningful intervals between scale points.
Intervals between numbers represent distances → mathematical operations such as taking an
average are permitted. Ratios are not meaningful for interval data.
➔ Differences have meaning, but ratios have no meaning.

Ratio Measurement
The data have all the properties of interval data and the ratio of two values is meaningful.
The measurements have a true zero value. We can recode ratio measurements downward
into ordinal or nominal measurements (but not conversely)
➔ Ratios have meaning.

2.3 SAMPLING CONCEPTS


• A population: the collection of all items of interest or under investigation, could be
finite or infinite.

• A sample: looking only at some items selected from the population - an observed
subset of the population. A sample is preferred when: infinite population, destructive testing,
timely results, accuracy, cost, or sensitive information

• A census: an examination of all items in a defined population. A census is preferred when:
small population, large sample size, a database exists, or a legal requirement.
Rule of Thumb:
A population may be treated as infinite when the population size N is at least 20 times the
sample size n (i.e., when N/n ≥ 20)

Parameters and Statistics


• A parameter is a specific characteristic of a population
• A statistic is a specific characteristic of a sample
From a sample of n items, chosen from a population, we compute statistics that can be
used as estimates of parameters found in the population.

Population mean = µ Sample mean = 𝐱̅

Population proportion = π Sample proportion = p

Target Population
• The target population contains all the individuals in which we are interested
• The sampling frame is the group from which we take the sample
2.4 SAMPLING METHODS
Two main categories: random sampling and non-random sampling

Random Sampling Methods

Simple Random Sample


We denote the population size by N and the sample size by n. In a simple random sample,
every item in the population of N items has the same chance of being chosen in the sample
of n items. Ex: select one student at random from a list of 15 students
Sampling without replacement: once an item has been selected for the sample, it cannot
be considered for the sample again. This becomes a problem when our sample size n is
close to our population size N → a tendency to overestimate or underestimate (bias)
A finite population is effectively infinite if the sample is less than 5 percent of the population
(if n/N < .05)
Sampling with replacement: the same random number could show up more than once.
Duplicates are unlikely when n is much smaller than N

Systematic Sample
Systematic sample: choose every kth item from a sequence or list, starting from a
randomly chosen entry among the first k items on the list.

• Decide on the sample size: n

• Divide the frame of N individuals into n groups of k individuals: k = N/n

• Randomly select one individual from the first group

• Select every kth individual thereafter


Stratified Sample
• Divide population into homogenous subgroups (called strata) according to some
common characteristic (e.g. age, gender, occupation)

• Select a simple random sample from each subgroup

• Combine samples from subgroups into one

Cluster Sampling
Divide population into several “clusters” (e.g. regions), each representative of the population

• One-stage cluster sampling: randomly selected k clusters


• Two-stage cluster sampling: randomly select k clusters and then choose a random
sample of elements within each cluster.

Non-Random Sampling Methods

Sources of Error
2.6 SURVEYS
SURVEY
• Step 1: State the goals of the research

• Step 2: Develop the budget (time, money, staff)

• Step 3: Create a research design (target population, frame, sample size).

• Step 4: Choose a survey type and method.

• Step 5: Design a data collection instrument (questionnaire).

• Step 6: Pretest the survey instrument and revise as needed.

• Step 7: Conduct the survey.

• Step 8: Code and analyze the data

Questionnaire Design
• Begin with short, clear instructions.
• State the survey purpose.
• Assure anonymity.
• Instruct on how to submit the completed survey.
• Break survey into naturally occurring sections
• Let respondents bypass sections that are not applicable (e.g., “if you answered no to
question 7, skip directly to Question 15”).
Chapter 3
Graphical Presentation of Data
• Data in raw form are usually not easy to use for decision making
• Some type of organization like graph or table is needed
• The type of graph to use depends on the variable being summarized

Tables and Charts for Categorical Data

Summary table

Column/Bar and Pie Chart


• Column/Bar charts and Pie charts are often used for qualitative/categorical data
(categories or nominal scale)
• Pies or Bars/Columns represent categories
• Height of bar or size of pie slice shows the frequency or percentage for each
category

Column and Bar chart


• A column chart is a vertical display of data
• A bar chart is a horizontal display of data
A column chart is usually easier to read, but a bar chart can be useful when the axis
labels are long or when there are many categories.
Pie chart
• Pie charts can only convey a general idea of the data values.
• The only correct use of a pie chart is to portray data that sum to a total (e.g.
percentage). A pie chart is never used to display time series data.
• Pie chart should have only a few slices

Pareto Diagram
• A Pareto chart displays categorical data, with categories displayed in descending
order of frequency
• A cumulative polygon is often shown in the same graph
• Commonly used in quality management to display the frequency of defects or errors
of different types.

Tables and Charts for Numerical Data

The Ordered Array


A sequence of data in rank order:

• Shows range (min to max)


• Provides some signals about variability within the range
• May help identify outliers (unusual observations)
• If the data set is large, the ordered array is less useful
❖ Data in raw form (as collected):
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
❖ Data in ordered array from smallest to largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Stem-and-Leaf Diagram
A simple way to see distribution details in a data set
METHOD: Separate the sorted data series into leading digits (the stem) and the trailing
digits (the leaves)
For two-digit or three-digit integer data, the stem is the tens digit of the data, and the leaf is
the ones digit

❖ Completed stem-and-leaf display for the data in ordered array
21, 24, 24, 26, 27, 27, 30, 32, 38, 41 (tens digit as the stem, ones digit as the leaf):

Stem | Leaf
  2  | 1 4 4 6 7 7
  3  | 0 2 8
  4  | 1

Dot Plots
A dot plot is another simple graphical display of n individual values of numerical data.
The basic steps in making a dot plot are to
1. Make a scale that covers the data range
2. Mark axis demarcations and label them
3. Plot each data value as a dot above the scale at its approximate location
If more than one data value lies at approximately the same X-axis location, the
dots are piled up vertically

• Easy to understand
• Show variability
• Show the center and where the midpoint lies
• Reveal some things about the shape of the distribution
Dot plots have limitations:
• Not good for large samples (e.g., > 5,000).

• Don’t reveal very much information about the data set’s shape when the
sample is small
• Become awkward when the sample is large (what if you have 100 dots at the
same point?)
• Become hard to read when the data have many decimal values.

Tabulating Numerical Data


Frequency and Cumulative Distributions
• A table
• Grouping n data values into k classes called bins (based on values of the data)
• The bin limits are cutoff points that define each bin.
• Bins have equal interval widths and their limits cannot overlap

❖ The basic steps for constructing a frequency distribution


1. Sort the data in ascending order
➔ Find Smallest and Largest Data Values
2. Choose the number of bins
➔ Sturges’ Rule: k = 1 + 3.3 log10(n)
3. Set the bin limits:
➔ Bin width ≈ (xmax − xmin)/k

4. Put the data values in the appropriate bin


➔ Count the Data Values in Each Bin
5. Create the table.
➔ Show only the absolute frequencies or include the relative frequencies and the
cumulative frequencies
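A minimal Python sketch of these five steps, using only the standard library; the data are the ten-value ordered-array example from earlier in this chapter, and the half-open bin convention used here is one common choice, not the only one:

    import math

    data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]   # sorted sample data

    n = len(data)
    k = round(1 + 3.3 * math.log10(n))      # Sturges' Rule for the number of bins
    width = (max(data) - min(data)) / k     # approximate equal bin width

    lo = min(data)
    for i in range(k):
        a, b = lo + i * width, lo + (i + 1) * width
        last = (i == k - 1)                 # close the last bin on the right
        freq = sum(1 for x in data if a <= x < b or (last and x == b))
        print(f"[{a:5.1f}, {b:5.1f}{']' if last else ')'}  frequency = {freq}")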
Histograms
A histogram is a graphical representation of a frequency distribution

• The class boundaries (or class midpoints) are shown on the X axis
• the Y axis is either frequency, relative frequency, or percentage
• Bars of the appropriate heights are used to represent the number of
observations within each class. No gap between bars

The Frequency Polygon and Ogive


The Polygon
A frequency polygon: a line graph that connects the midpoints of the histogram intervals,
plus extra intervals at the beginning and end so that the line will touch the X-axis

The Ogive (Cumulative % Polygon)


An ogive: a line graph of the cumulative frequencies.
Used for finding percentiles or for comparing the shape of the sample
Multivariate Categorical Data
Contingency table

Side-by-side bar charts

Scatter Plots
A scatter plot shows n pairs of observations (x1, y1), (x2, y2), . . ., (xn, yn) as dots (or some
other symbol) on an X-Y graph

• Investigate the relationship between two variables → association between two variables
• Convey patterns in data pairs that would not be apparent from a table.
Time Series Plot
• Used to study patterns in the values of a variable over time.
• The time period is measured on the X axis
• The variable of interest is measured on the Y axis.
• Can display several variables at once

Log Scales
Useful for time series data expected to grow at a compound annual percentage rate
(e.g., GDP, the national debt, or your future income).
A log scale reveals whether the quantity is growing at an

• increasing percent (concave upward),


• constant percent (straight line)
• declining percent (concave downward).

Deceptive Graphs
Error 1: Nonzero Origin: A nonzero origin will exaggerate the trend
Error 2: Elastic Graph Proportions: By shortening the X-axis in relation to the Y-axis,
vertical change is exaggerated.

Error 3: Dramatic Titles and Distracting Pictures: The title often is designed more to
grab the reader’s attention than to convey the chart’s content
Error 4: 3-D and Novelty Graphs: Depth may enhance the visual impact of a bar chart,
but it introduces ambiguity in bar height

Error 5: Rotated Graphs: By making a graph 3-dimensional and rotating, trends appear
to dwindle into the distance or loom alarmingly toward you
Error 8: Complex Graphs: Complicated visual displays make the reader work harder.

Error 11: Area Trick: Simultaneously enlarging the width of the bars as their height
increases → bar area misstates the true proportion
CHAPTER 4

4.1 NUMERICAL DESCRIPTION


Descriptive measures derived from:

• a sample (n items): statistics


• a population (N items or infinite): parameters
Three key characteristics: center, variability, and shape.

4.2 MEASURES OF CENTER


Mean
The sum of the data values divided by the number of data items.

Population: µ = (Σxᵢ)/N

Sample: x̄ = (Σxᵢ)/n

Characteristics of the Mean


• aka the “average”, is used generally unless extreme values (outliers) exist
• affected by every sample item, especially extreme values (outliers)
• the balancing point or fulcrum in a distribution

Median
The median (denoted M) is the 50th percentile or midpoint of the ordered (ordered array)
sample data set x1, x2, …, xn.

• Separating the sorted observations into lower 50%, upper 50%


• Not affected by extreme values (outliers) → often used when outliers exist

Finding the Median


❖ The location of the median:

Median position = (n + 1)/2 in the ordered data

• If n is odd, the median is the middle observation in the ordered array
• If n is even, the median is the average of the middle two observations
Note that (n + 1)/2 is not the value of the median, only the position of the median in the ranked data.

Mode
• Value that occurs most frequently
• Not affected by extreme values (outliers)
• Used for either numerical or categorical (nominal) data
• There may be no mode
• There may be several modes
• Most useful for discrete or categorical data

Geometric Mean
The geometric mean (denoted G) is a measure of central tendency, obtained by multiplying
the data values and then taking the nth root of the product.

G = ⁿ√(x₁x₂x₃…xₙ)

All the data values must be positive: x > 0
• Used to measure the rate of change of a variable over time

X_G = (x₁x₂x₃…xₙ)^(1/n)

• Geometric mean of rate of return
o Measures the status of an investment over time

R_G = [(1 + R₁)(1 + R₂)(1 + R₃)…(1 + Rₙ)]^(1/n) − 1

Where Rᵢ is the rate of return in time period i
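A minimal Python sketch of the geometric mean rate of return; the three period returns are hypothetical:

    import math

    returns = [0.10, -0.05, 0.20]            # hypothetical rates of return R1..R3

    # R_G = [(1+R1)(1+R2)...(1+Rn)]^(1/n) - 1
    growth = math.prod(1 + r for r in returns)
    r_g = growth ** (1 / len(returns)) - 1
    print(f"geometric mean return = {r_g:.4%}")   # about 7.84% per period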

SUMMARY TABLE

Quartiles
The quartiles (denoted Q1, Q2, Q3): scale points that divide the ordered data into four
groups of approximately equal size: the 25th, 50th, and 75th percentiles

• Q1 is the value for which 25% of the observations are smaller and 75% are larger
• Q2 is the same as the median (50% are smaller, 50% are larger)
• Only 25% of the observations are greater than Q3

The first quartile Q1 is the median of the data values below Q2, and the third quartile Q3 is
the median of the data values above Q2
Find a quartile by determining the value in the appropriate position in the ordered
data

First quartile position: Position of Q1 = (N+1)/4

Second quartile position: Position of Q2 = (N+1)/2 (the median position)

Third quartile position: Position of Q3 = 3(N+1)/4


where N is the number of observed values
If a quartile position falls halfway between two observations xₐ and x_b, the quartile
value is their average: (xₐ + x_b)/2

Box and Whisker Plot


Box-and-Whisker Plot: A Graphical display of data using 5-number summary

𝑥𝑚𝑖𝑛 , Q1, Q2, Q3, 𝑥𝑚𝑎𝑥

A box plot shows

• center (position of the median Q2)


• variability (width of the “box” defined by Q1 and Q3 and the range between
𝑥𝑚𝑖𝑛 and 𝑥𝑚𝑎𝑥 )
• shape (skewness if the whiskers are of unequal length and/or if the median is
not in the center of the box)
Measures of Variation
Each diagram has the same mean, but they differ in dispersion around the mean.
Variation is the “spread” of data points about the center

Range
The range is the difference between the largest and smallest observations:

Range = Xlargest – Xsmallest

Disadvantages of the Range


• Ignores the way in which data are distributed (only consider the two extreme
data values)
• Sensitive to outliers

Interquartile Range
The interquartile range Q3 – Q1 (denoted IQR) measures the degree of spread in the data
(the middle 50 percent).

• Eliminate some outlier problems


• Eliminate some high- and low-valued observations and calculate the range from
the remaining values

Interquartile range = 3rd quartile – 1st quartile

IQR = Q3 – Q1
Fences and Unusual Data Values
We can use the quartiles to identify unusual data points. Detect data values that are far
below Q1 or far above Q3.

                 Inner fences        Outer fences

Lower fence:     Q1 − 1.5 IQR        Q1 − 3 IQR
Upper fence:     Q3 + 1.5 IQR        Q3 + 3 IQR

• Values outside the inner fences are unusual


• Values outside the outer fences are outliers
In a box plot, the whiskers extend to the smallest and largest data values that lie within
the inner fences. Some analysts treat any value outside the inner fences as an outlier.
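A short Python sketch of quartiles, IQR, and fences. Quartile conventions differ slightly across textbooks and software, so statistics.quantiles with method="inclusive" is one reasonable choice; the data are the ordered-array example from Chapter 3:

    import statistics

    data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # values beyond these are unusual
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # values beyond these are outliers

    print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
    print("inner fences:", inner, "outer fences:", outer)
    print("unusual values:", [x for x in data if not inner[0] <= x <= inner[1]])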

Midhinge
An additional measure of center that has the advantage of not being influenced by
outliers. The midhinge is the average of the first and third quartiles:

Midhinge = (Q1 + Q3)/2
A new way to describe skewness:

• Median < Midhinge ⇒ Skewed right (longer right tail)


• Median ≅ Midhinge ⇒ Symmetric (tails roughly equal)
• Median > Midhinge ⇒ Skewed left (longer left tail)

Variance
The sum of squared deviations (x − x̄) from the mean, divided by n − 1
➔ Approximately the average of the squared deviations of values from the mean

s² = Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1)

where
x̄ = mean
n = sample size
xᵢ = ith value of the variable x
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
• A measure of the “average” scatter around the mean

S = √( Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) )

❖ The more spread out the data, the higher the standard deviation and vice versa

Advantages of Variance and Standard Deviation


• Each value in the data set is used in the calculation
• Values far from the mean are given extra weight

Notice
Standard deviations

• can be compared only for data sets measured in the same units.
• should not be compared if the means differ greatly.
Coefficient of Variation
To compare dispersion in data sets with

• dissimilar units of measurement (e.g., kilograms and ounces)


• dissimilar means (e.g., home prices in two different cities).
A unit-free measure of dispersion:
CV = (S / X̄) · 100%

• Measures relative dispersion


• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more sets of data measured in different units

Example

We can say that ATM deposits have much greater relative dispersion (120 percent) than
either defect rates (18 percent) or P/E ratios (62 percent).

SUMMARY TABLE

Shapes of Distribution
• Describes how data are distributed
• Measures of shape:
o Symmetry / asymmetry
o Peakedness
Skewness
Measure symmetry/asymmetry of a distribution

This unit-free statistic allows us to:

• compare two samples measured in different units (say, dollars and yen)
• compare one sample with a known reference distribution

Kurtosis
Kurtosis refers to the relative length of the tails and the degree of concentration in the
center.

Measures of Center and Shapes


Standardized Data
To locate the position of items within a data array.

Chebyshev’s Theorem
Regardless of how the data are distributed, at least (1 − 1/k²) × 100% of the values will
fall within k standard deviations of the mean (for k > 1)
Example

The Empirical Rule


If the data distribution is approximately bell-shaped, then:
• μ ± 1σ contains about 68% of the values
• μ ± 2σ contains about 95% of the values
• μ ± 3σ contains about 99.7% of the values
Data values outside μ ± 3σ are rare (less than 1%) in a normal distribution → outliers.

Z Scores
A measure of distance from the mean (for example, a Z-score of 2.0 means that a value is
2.0 standard deviations from the mean)
Formula for a population:  z = (xᵢ − μ)/σ        Formula for a sample:  z = (xᵢ − x̄)/s

Based on its standardized z-score, a data value is classified as:

Unusual if |zᵢ| > 2 (beyond μ ± 2σ)

Outlier if |zᵢ| > 3 (beyond μ ± 3σ)
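A minimal Python sketch that standardizes each value and flags it by the rules above; the data are the raw-form example from Chapter 3:

    import statistics

    data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
    xbar = statistics.mean(data)
    s = statistics.stdev(data)               # sample standard deviation (n - 1)

    for x in data:
        z = (x - xbar) / s
        flag = "outlier" if abs(z) > 3 else ("unusual" if abs(z) > 2 else "")
        print(f"x = {x:2d}   z = {z:+.2f}  {flag}")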


Grouped Data
Weighted Mean
The weighted mean is a sum that assigns each data value a weight wⱼ that represents a
fraction of the total (i.e., the k weights must sum to 1).

• Use when the data are already grouped into k classes, with weight wⱼ for the jth class

Approximations for Grouped Data


❖ Use the midpoint of a class interval to approximate the values in that class.
❖ Suppose a data set contains midpoint values m1, m2, …, mk, occurring with
frequencies f1, f2, …, fk:
o For a population of N observations, the mean is μ ≈ Σfⱼmⱼ/N and the variance is
σ² ≈ Σfⱼ(mⱼ − μ)²/N
o For a sample of n observations, the mean is x̄ ≈ Σfⱼmⱼ/n and the variance is
s² ≈ Σfⱼ(mⱼ − x̄)²/(n − 1)

Linear Relationship
The Covariance
The covariance of two random variables X and Y (denoted Cov(X,Y)) measures the degree
to which the values of X and Y change together
➔ the strength of the linear relationship between two variables
The population covariance: σXY = Σᵢ(xᵢ − μX)(yᵢ − μY)/N

The sample covariance: sXY = Σᵢ(xᵢ − x̄)(yᵢ − ȳ)/(n − 1)

❖ Covariance between two random variables:

cov(X,Y) > 0 → X and Y tend to move in the same direction

cov(X,Y) < 0 → X and Y tend to move in opposite directions

cov(X,Y) = 0 → X and Y have no linear relationship (zero covariance alone does not
guarantee independence)

Note:
o Only concerned with the strength of the relationship
o No causal effect is implied

Coefficient of Correlation
Correlation coefficient (denoted r) describes the degree of linearity between paired
observations on two quantitative variables X and Y.
➔ Measures the relative strength of the linear relationship between two variables
Sample coefficient of correlation:

r = sXY/(sX·sY), or equivalently r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]
Features of Correlation Coefficient, r

• Unit free
• Ranges between –1 and 1
• The closer to –1, the stronger the negative (go down) linear relationship
• The closer to 1, the stronger the positive (go up) linear relationship
• The closer to 0, the weaker the linear relationship
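A minimal Python sketch; statistics.covariance and statistics.correlation need Python 3.10 or later, and the paired data are hypothetical:

    import statistics

    x = [1, 2, 3, 4, 5]
    y = [2.0, 2.9, 4.2, 4.8, 6.1]

    cov_xy = statistics.covariance(x, y)     # sample covariance (divides by n - 1)
    r = statistics.correlation(x, y)         # Pearson correlation coefficient
    print(f"cov(X,Y) = {cov_xy:.3f}, r = {r:.3f}")   # r near +1: strong positive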
SUMMARY
• Described measures of central tendency
o Mean, median, mode, geometric mean
• Discussed quartiles
• Described measures of variation
o Range, interquartile range, variance and standard deviation, coefficient of
variation, Z-scores
• Illustrated shape of distribution
o Symmetric, skewed, box-and-whisker plots
• Discussed covariance and correlation coefficient
Chapter 5
5.1 RANDOM EXPERIMENTS
Sample Space
Random experiment is an observational process whose results cannot be known in
advance.
The set of all possible outcomes (denoted S) is the sample space for the experiment.
◼ A sample space with a countable number of outcomes is discrete.

• Flip a coin, the sample space consists of 2 outcomes S = {H, T}


• Roll a die, the sample space consists of 6 outcomes S = {1, 2, 3, 4, 5, 6}
If the outcome of the experiment is a continuous measurement, the sample space cannot
be listed, but it can be described by a rule. E.g. S = {all X such that X > 0}

Event
An event is any subset of outcomes in the sample space

• A simple event or elementary event is a single outcome.


o Flip a coin: S = {H, T}
• A discrete sample space S consists of all the simple events (Ei): S = {E1, E2, …, En}
• A compound event consisting of two or more simple events
o Flip a coin twice: S = {HH, HT, TH, TT}

5.2 PROBABILITY
The probability of an event is a number that measures the relative likelihood that the event
will occur.

• The probability of an event A, denoted P(A), must lie within the interval from 0 to 1:
0 ≤ 𝑃(𝐴) ≤ 1
o If P(A) = 0: the event cannot occur
o If P(A) = 1: the event is certain to occur

Assigning Probability
Three distinct ways of assigning probability:
Empirical Approach
Counting the frequency of observed outcomes (f) defined in our experimental sample
space and dividing by the number of observations (n). The estimated probability is f/n.

Classical Approach
A priori: the process of assigning probabilities before we actually observe the event or try an
experiment.
When flipping a coin, rolling a pair of dice, drawing cards, picking lottery numbers, or playing
roulette, the nature of the process allows us to envision the entire sample space.

Subjective Approach
A subjective probability reflects someone’s informed judgment about the likelihood of
an event - needed when there is no repeatable random experiment.

5.3 RULES OF PROBABILITY


Complement of an Event
The complement of an event A is denoted by A′ (or Ā) and consists of everything in the
sample space except event A.
❖ A and A′ together comprise the entire sample space:
◼ P(A) + P(A′ ) = 1
◼ P(A′ ) = 1 – P(A)

Union of Two Events


The union of two events consists of all outcomes in the sample space S that are contained
either in event A or in event B or in both.
The union of A and B is denoted:

• A∪B
• “A or B”

Intersection of Two Events


The intersection of two events A and B: the event consisting of all outcomes in the sample
space S that are contained in both event A and event B
The intersection of A and B is denoted:

• A∩B
• “A and B”

General Law of Addition


The probability of the union of two events A and B is the sum of their probabilities less the
probability of their intersection.

P(A ∪ B) = P(A) + P(B) – P(A ∩ B)


Mutually Exclusive Events
Events A and B are mutually exclusive (or disjoint) if their intersection is the empty set
(a set that contains no elements). Events that cannot occur simultaneously

If A ∩ B = ∅, then P(A ∩ B) = 0

• Example:
o Event A = a day in January. Event B = a day in February

Special Law of Addition


In the case of mutually exclusive events, since these events do not overlap, the addition law
reduces to:

P(A ∪ B) = P(A) + P(B)

Collectively Exhaustive Events


Events are collectively exhaustive if their union is the entire sample space S (i.e., all the
events that can possibly occur).

• Example: Randomly choose a day from 2020


o Event A = weekday
o Event B = weekend day
o Event C = a day in January
o Event D = a day in Spring
• Events A, B, C, and D are collectively exhaustive (but not mutually exclusive – a
weekday can be in January or in Spring)
• Events A and B are collectively exhaustive and also mutually exclusive (weekdays
and weekend days together constitute every day of the year)

Conditional Probability
The probability of event A given that event B has occurred is a conditional probability.
Denoted P(A | B). The vertical line “ | ” is read as “given”.

P(A | B) = P(A ∩ B)/P(B)        for P(B) > 0

Independent Events
Two events are independent if and only if:

P(A | B) = P(A)
Events A and B are independent when the probability of one event is not affected by the
fact that the other event has occurred

Multiplication Rules
Using algebra, we can rewrite the formula of conditional probability

P(A ∩ B) = P(A | B) · P(B)

Note: If A and B are independent, then P(A | B) = P(A) and the multiplication rule simplifies to

P(A ∩ B) = P(A) · P(B)


Odds of an Event
In sports and games of chance, we define the odds in favor of an event A as the ratio of the
probability that event A will occur to the probability that event A will not occur. Its
reciprocal is the odds against event A.

Relationship with Probability


❖ The odds in favor of event A occurring:

Odds = P(A)/P(A′) = P(A)/(1 − P(A))

❖ The odds against event A occurring:

Odds = P(A′)/P(A) = (1 − P(A))/P(A)

5.5 CONTINGENCY TABLES


A contingency table is a cross-tabulation of frequencies into rows and columns. A cell
shows a frequency. (used to report the results of a survey)
❖ Collect data of 100 cars:
o Each car either has AC or no AC
o Each car either has GPS or no GPS

Joint Probabilities
A joint probability represents the probability of the intersection of two events.
Found by dividing a cell (excluding the total row and column) by the total sample size:

P(GPS ∩ AC) = 35/100 = 0.35
P(noGPS ∩ AC) = 55/100 = 0.55
Marginal probability
The marginal probability of an event is found by dividing a row or column total by the total
sample size.
P(AC) = 90/100 = 0.90
P(GPS) = 40/100 = 0.40

Conditional probability
Found by restricting ourselves to a single row or column (the given condition), dividing
the cell by the total of the given condition:

P(GPS | AC) = P(GPS ∩ AC)/P(AC) = 35/90 ≈ 0.3889
P(NoAC | GPS) = P(NoAC ∩ GPS)/P(GPS) = 5/40 = 0.125
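A minimal Python sketch of these three probability types. The two cell counts not quoted above (GPS ∩ noAC = 5 and noGPS ∩ noAC = 5) are inferred from the row and column totals:

    counts = {("GPS", "AC"): 35, ("GPS", "noAC"): 5,
              ("noGPS", "AC"): 55, ("noGPS", "noAC"): 5}
    total = sum(counts.values())                       # 100 cars

    ac_total = counts[("GPS", "AC")] + counts[("noGPS", "AC")]   # 90

    p_joint = counts[("GPS", "AC")] / total            # joint: 35/100 = 0.35
    p_ac = ac_total / total                            # marginal: 90/100 = 0.90
    p_gps_given_ac = counts[("GPS", "AC")] / ac_total  # conditional: 35/90 = 0.3889
    print(p_joint, p_ac, round(p_gps_given_ac, 4))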

5.6 DECISION TREE

(Tree diagrams: one branching first on AC vs. no AC, the other first on GPS vs. no GPS.)


5.7 BAYES’ THEOREM
• Bayes’ Theorem is used to revise previously calculated probabilities based on new
information.

• Developed by Thomas Bayes in the 18th Century.

• It is an extension of conditional probability.


The prior (marginal) probability of an event B is revised after event A has been considered
to yield a posterior (conditional) probability.

P(B | A) = P(A | B) P(B) / P(A)

❖ In situations where P(A) is not given, use the general form of Bayes’ Theorem:
General Forms of Bayes’ Theorem


P(Bᵢ | A) = P(A | Bᵢ)P(Bᵢ) / [P(A | B1)P(B1) + P(A | B2)P(B2) + ⋯ + P(A | Bk)P(Bk)]
• where:
o Bi = ith event of k mutually exclusive and collectively exhaustive events
o A = new event that might impact P(Bi)
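A minimal Python sketch of the general form for two events; the priors and conditional probabilities are hypothetical:

    p_b = [0.40, 0.60]           # priors P(B1), P(B2): mutually exclusive, exhaustive
    p_a_given_b = [0.90, 0.20]   # P(A | B1), P(A | B2)

    # The denominator is the total probability P(A).
    p_a = sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))            # 0.48

    posteriors = [pa * pb / p_a for pa, pb in zip(p_a_given_b, p_b)]
    print(p_a, posteriors)       # posteriors: P(B1|A) = 0.75, P(B2|A) = 0.25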

Counting Rules
Fundamental Rule of Counting
Counting Rule 1:
If event A can occur in n1 ways and event B can occur in n2 ways, then events A and B
can occur in n1 × n2 ways. In general, the number of ways that m events can occur is:
n1 × n2 × … × nm
Example: You want to go to a park, eat at a restaurant, and see a movie. There are 3 parks,
4 restaurants, and 6 movie choices. How many different possible combinations are there?
Answer: (3)(4)(6) = 72 different possibilities

Counting Rule 2:
If any one of k different mutually exclusive and collectively exhaustive events can occur
on each of n trials, the number of possible outcomes is equal to

kⁿ
Example: If you roll a fair die 3 times, then there are 6³ = 216 possible outcomes
Factorials
The number of unique ways that n items can be arranged in a particular order: n factorial,
the product of all integers from 1 to n

n! = 1 × 2 × 3 × … × (n − 2)(n − 1)(n)
Example: You have five books to put on a bookshelf. How many different ways can
these books be placed on the shelf?
Answer: 5! = (5)(4)(3)(2)(1) = 120 different possibilities

Permutations
Choose X items at random without replacement from a group of n items. The number of
ways of arranging X objects selected from n objects in order is

nPx = n!/(n − X)!

Example: You have five books and are going to put three on a bookshelf. How
many different ways can the books be ordered on the bookshelf?

Answer: nPx = n!/(n − X)! = 5!/(5 − 3)! = 120/2 = 60 different possibilities

Combinations
A combination is a collection of X items chosen at random without replacement from n
items. The number of ways of selecting X objects from n objects, irrespective of order, is

nCx = n!/(X!(n − X)!)

Example: You have five books and are going to randomly select three to read. How many
different combinations of books might you select?

Answer: nCx = n!/(X!(n − X)!) = 5!/(3!(5 − 3)!) = 120/((6)(2)) = 10 different possibilities
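All four counting rules are available in Python's standard library; a short sketch reproducing the worked examples above:

    import math

    print(3 * 4 * 6)           # fundamental rule: 72 park/restaurant/movie choices
    print(6 ** 3)              # k^n: rolling a die 3 times gives 216 outcomes
    print(math.factorial(5))   # 5! = 120 orderings of five books
    print(math.perm(5, 3))     # 5P3 = 60 ordered arrangements of 3 of 5 books
    print(math.comb(5, 3))     # 5C3 = 10 unordered selections of 3 of 5 books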
CHAPTER 6
6.1 DISCRETE PROBABILITY DISTRIBUTIONS
Random Variables
A random variable is a function or rule that assigns a numerical value to each outcome
in the sample space of a random experiment.
A discrete random variable has a countable number (integer) of distinct values.

• Some have a clear upper limit (e.g., number of absences in a class of 40 students)
• Others do not (e.g., number of text messages you receive in a given hour).
A continuous random variable produces outcomes from a measurement

• e.g. your annual salary, or your weight/height

Probability Distributions
Discrete Probability Distribution
A discrete probability distribution assigns a probability to each value of a discrete
random variable X. The distribution must follow the rules of probability
▪ The probability for any given value of X: 0 ≤ P(xᵢ) ≤ 1
▪ The probabilities of all values of X sum to one: Σᵢ₌₁ⁿ P(xᵢ) = 1

More than one random variable value can be assigned to the same probability, but one
random variable value cannot have two different probabilities.

Cumulative Probability Function


The Cumulative Probability Function (CDF), denoted F(x0), shows the probability that X is
less than or equal to x0

F(x₀) = P(X ≤ x₀), or in other words:

F(x₀) = Σ P(x), summed over all x ≤ x₀

PDF = Probability Distribution Function; CDF = Cumulative Distribution Function


6.2 EXPECTED VALUE AND VARIANCE
Expected Value
The expected value E(X) (of a discrete random variable) is the sum of all X-values
weighted by their respective probabilities.
Because E(x) is an average (weighted mean) → E(X) is the mean and use the symbol μ.
E(X) = μ = Σᵢ₌₁ⁿ xᵢP(xᵢ)

Variance and Standard Deviation


The variance Var(X) (of a discrete random variable) is the sum of the squared deviations
about its expected value, weighted by the probability of each X-value.
Var(X) is a weighted average → measures variability about the mean
Var(X) = σ² = Σᵢ₌₁ⁿ (xᵢ − μ)²P(xᵢ)

The standard deviation is the square root of the variance and is denoted σ:

σ = √σ² = √Var(X)
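A minimal Python sketch of E(X), Var(X), and σ for a hypothetical discrete distribution:

    import math

    x = [0, 1, 2, 3]             # hypothetical values of X
    p = [0.1, 0.3, 0.4, 0.2]     # their probabilities (sum to 1)

    mu = sum(xi * pi for xi, pi in zip(x, p))                # E(X) = 1.7
    var = sum((xi - mu) ** 2 * pi for xi, pi in zip(x, p))   # Var(X) = 0.81
    sigma = math.sqrt(var)                                   # 0.9
    print(mu, var, sigma)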
6.3 UNIFORM DISTRIBUTION
Characteristics of the Uniform Distribution
The uniform distribution describes a random variable with a finite number of consecutive
integer values from a to b.
Each value of the random variable is equally likely to occur

Example: rolling a die, the probability of each face is P(x) = 1/6
6.4 BINOMIAL DISTRIBUTION
Bernoulli Experiments
A random experiment with only 2 outcomes is a Bernoulli experiment.

• One outcome is labeled a “success” (denoted X = 1) and the other a “failure”


(denoted X = 0).
• π is P(success); 1 − π is P(failure).
• “Success” is usually defined as the less likely outcome so that π < .50 for
convenience.

Possible Bernoulli Settings


• A manufacturing plant labels items as either defective or acceptable
• A firm bidding for contracts will either get a contract or not
• A marketing research firm receives survey responses of “yes I will buy” or “no I
will not”
• New job applicants either accept the offer or reject it

❖ We have: P(0) + P(1) = (1 − π) + π = 1, with 0 ≤ π ≤ 1

❖ The expected value (mean): E(X) = π
❖ The variance: V(X) = π(1 − π)

Binomial Distribution
The binomial distribution arises when a Bernoulli experiment is repeated n times
In a binomial experiment, X = the number of success in n trials
➔ binomial random variable X is the sum of n independent Bernoulli random
variables
The number of combinations of selecting X objects out of n objects is

nCx = n!/(X!(n − X)!)

P(X = x) is determined by the two parameters n and π. The binomial probability function:

P(X) = [n!/(X!(n − X)!)] πˣ(1 − π)ⁿ⁻ˣ

• P(X) = probability of X successes in n trials, with probability of success π on each trial

• X = number of ‘successes’ in the sample (X = 0, 1, 2, ..., n)
• n = sample size (number of trials or observations)
• π = probability of “success”
Ex: Flip a coin four times, let X = number of heads:
n = 4, π = 0.5, 1 − π = (1 − 0.5) = 0.5, X = 0, 1, 2, 3, 4
Binomial Shape
• π < 0.5 : skewed right
• π = 0.5 : symmetric
• π > 0.5 : skewed left
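A minimal Python sketch of the binomial probability function, applied to the four-coin-flip example above:

    import math

    def binom_pmf(x, n, pi):
        # P(X = x) successes in n trials with success probability pi
        return math.comb(n, x) * pi**x * (1 - pi)**(n - x)

    for x in range(5):                      # X = 0, 1, 2, 3, 4 heads
        print(x, binom_pmf(x, 4, 0.5))      # symmetric because pi = 0.5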

6.6 HYPERGEOMETRIC DISTRIBUTION


The hypergeometric distribution is similar to the binomial except that sampling is without
replacement from a finite population of N items
The trials are not independent and the probability of success is not constant from trial
to trial
Finding probability of “X=xi” items of interest in the sample (n) where there are “s”
items of interest in the population (N)
The hypergeometric distribution has three parameters:

• N (the number of items in the population),


• n (the number of items in the sample)
• s (the number of successes in the population).
Hypergeometric Distribution Formula

P(X = x) = [C(s, x) · C(N − s, n − x)] / C(N, n)

Where
Where

• N = population size
• s = number of items of interest in the population
• N – s = number of events not of interest in the population
• n = sample size
• x = number of items of interest in the sample
• n – x = number of events not of interest in the sample
Example: 3 different computers are selected from 10 in the department. 4 of the 10
computers have illegal software loaded. What is the probability that 2 of the 3 selected
computers have illegal software loaded?
o N = 10
o n = 3
o s = 4
o x = 2

P(X = 2 | 3, 10, 4) = (4C2)(6C1)/(10C3) = (6)(6)/120 = 0.3
➔ The probability that 2 of the 3 selected computers have illegal software loaded is
0.30, or 30%.
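A minimal Python sketch of the hypergeometric formula, reproducing this example:

    from math import comb

    def hypergeom_pmf(x, n, N, s):
        # x items of interest in a sample of n, drawn without replacement
        return comb(s, x) * comb(N - s, n - x) / comb(N, n)

    print(hypergeom_pmf(x=2, n=3, N=10, s=4))   # 0.3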
Geometric Distribution
The geometric distribution describes the number of Bernoulli trials until the first
success.

• X is the number of trials until the first success.


o X ranges from {1, 2, . . .}
o At least one trial to obtain the first success, but the number of trials is not fixed.

• π is the constant probability of a success on each trial.

6.5 POISSON DISTRIBUTION


The Poisson distribution describes the number of occurrences within a randomly chosen
unit of time (e.g., minute, hour) or space (e.g., square foot, linear mile).
The events must occur randomly and independently over a continuum of time or space
❖ Use POISSON DISTRIBUTION when
• To count the number of times an event occurs in a given area of opportunity (time,
space, volume…)
• The probability that an event occurs in one area of opportunity is the same for all
areas of opportunity
• The number of events that occur in one area of opportunity is independent of the
number of events that occur in the other areas of opportunity
• The average number of events per unit is λ (lambda)

❖ Examples of Poisson Distribution


• The number of phone calls received by a call center per hour.
• The number of taxis passing a particular street corner per day.
• The number of computer crashes in a day.
• The number of mosquito bites on a person.
Poisson Distribution Formula
P(X = x | λ) = e^(−λ)λˣ/x!
where:

• x = number of events in an area of opportunity


•  = expected number of events (average number of events per unit)
• e = base of the natural logarithm system (2.71828...)

Mean: μ = λ        Variance: σ² = λ        Standard Deviation: σ = √λ

Always right-skewed. The larger the λ, the less right-skewed the distribution.

❖ Poisson Distribution Example


Example: An average number of houses sold per day by a real estate company is 2.
What is the probability that 3 houses will be sold tomorrow?

X = 3, λ = 2

P(X = 3 | 2) = e^(−λ)λˣ/x! = e^(−2)(2)³/3! = (2.71828⁻²)(2)³/3! ≈ 0.18
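A minimal Python sketch of the Poisson formula, reproducing this example:

    import math

    def poisson_pmf(x, lam):
        # P(X = x) occurrences when the mean rate per unit is lam
        return math.exp(-lam) * lam**x / math.factorial(x)

    print(round(poisson_pmf(3, 2), 4))   # 0.1804: the houses-sold example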
Optional: Use the Poisson approximation (as an
alternative) to the binomial
• The Poisson distribution may be used to approximate a binomial by setting λ = nπ
• This approximation is helpful when n is large.
• A common rule of thumb says the approximation is adequate if

n ≥ 20 and π ≤ .05

Optional: Use the binomial approximation (as an


alternative) to the Hypergeometric
• Both the binomial and hypergeometric involve sample size of n and the number of
successes X.
• The binomial sample is with replacement while the hypergeometric sample is
without replacement

❖ Rule of Thumb: If n/N < 0.05, we can use the binomial approximation to the
hypergeometric, using sample size n and π = s/N.

Optional: Transformation rules


A linear transformation of a random variable X is performed by adding a constant,
multiplying by a constant, or both

Two useful rules about the mean and variance of a transformed random variable aX + b,
where a and b are any constants (a ≥ 0).
❖ Example: Professor Hardtack gave a tough exam whose scores had μ = 40 and σ = 10,
→ raise the mean 20 points. One way: by adding 20 points to every student’s score.
• Rule 1: adding a constant to all X-values will shift the mean but will leave the
standard deviation unchanged.
Alternatively, multiply every exam score by 1.5 (40 × 1.5 = 60):

• Rule 2 : the standard deviation would rise from 10 to 15 → increasing the


dispersion. In other words, this policy would “spread out” the students’ exam scores.
Some scores might even exceed 100
Sums of Random Variables
If we consider the sum of two independent random variables X and Y, given as X + Y,
then:

E(X + Y) = E(X) + E(Y) and V(X + Y) = V(X) + V(Y)
Covariance
When X and Y are dependent, the covariance of them, denoted by Cov(X,Y) or σxy,
describes how the variables vary in relation to each other.

• Cov(X,Y) > 0 : indicates that the two variables move in the same direction
• Cov(X,Y) < 0 : indicates that the two variables move in opposite direction.
We use both the covariance and the variances of X and Y to calculate the standard
deviation of the sum of X and Y
CHAPTER 7
7.1 CONTINUOUS PROBABILITY DISTRIBUTIONS
A Continuous Variable is a variable that can assume any value in an interval:
o thickness of an item
o time required to complete a task
o temperature in a room
These can potentially take on any value, depending only on the ability to measure
accurately.
❖ Discrete Variable: each value of X has its own probability P(X).
❖ Continuous Variable: events are intervals and probabilities are areas underneath
smooth curves. A single point has no probability.

PDF and CDF of Continuous Distributions


Probability Density Function (PDF):
• Denoted f(x); must be nonnegative.
• Total area under the curve = 1.
• Mean, variance, and shape depend on the PDF parameters.

Cumulative Distribution Function (CDF):
• Denoted F(x).
• Shows P(X ≤ x).
• Useful for finding probabilities.
Probabilities as Areas
Continuous probability functions are smooth curves.

• Unlike discrete distributions, the area at any single point = 0.


• The entire area under any PDF must be 1.
P(a < X < b) is the integral of the probability density function f(x) over the interval from a to b.
Because P(X = a) = 0, the expression P(a < X < b) is equal to P(a ≤ X ≤ b).

Expected Value and Variance


The mean and variance of a continuous random variable are analogous (similar) to
E(X) and Var(X) for a discrete random variable, except that the integral sign ∫ replaces the
summation sign Σ.

7.2 UNIFORM CONTINUOUS DISTRIBUTION


Characteristics of the Uniform Distribution
Uniform continuous distribution has equal probabilities for all possible outcomes of the
random variable. aka rectangular distribution - Denoted U(a, b) for short.
If X is a random variable that is uniformly distributed between a and b
❖ PDF has constant height: f(x) = 1/(b − a); area = base × height = (b − a) × 1/(b − a) = 1
❖ CDF increases linearly to 1: P(X ≤ x) = (x − a)/(b − a)
SUMMARY TABLE

❖ Example: Using the uniform probability distribution to find P(3 ≤ X ≤ 5):


P(3 ≤ X ≤ 5) = (Base)(Height)=(d-c)/(b-a) = (5-3)/(6-2) = 0.5

7.3 NORMAL DISTRIBUTION


Characteristics of the Normal Distribution
A normal probability distribution is defined by two parameters, μ and σ. denoted N(μ, σ).
▪ Bell Shaped
▪ Symmetrical
▪ Mean= Median = Mode

Center is determined by the mean, μ

Spread is determined by the standard deviation, σ

The random variable has an infinite theoretical range: −∞ to +∞


The Normal PDF
❖ The formula for the normal PDF is

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))

Where

• e = the mathematical constant approximated by 2.71828

• π = the mathematical constant approximated by 3.14159
• μ = the population mean
• σ = the population standard deviation
• x = any value of the continuous variable, −∞ < x < ∞

Changing μ shifts the distribution left or right.


Changing σ increases or decreases the spread.

The Normal CDF


For a normal random variable X with mean μ and variance σ², i.e., X ~ N(μ, σ²), the CDF is

F(x₀) = P(X ≤ x₀)

Finding Normal Probabilities
The probability for a range of values is measured by the area under the curve:

P(a ≤ X ≤ b) = F(b) − F(a)

where F(b) = P(X ≤ b) and F(a) = P(X ≤ a)
Summary of Normal Distributions
7.4 STANDARD NORMAL DISTRIBUTION
Characteristics of the Standard Normal
Any normal distribution can be transformed into the standardized normal distribution
(Z), with mean 0 and variance 1
By subtracting the mean and dividing by the standard deviation to produce a
standardized variable

The shape of the distribution is unaffected by the z transformation, only the scale
changed. We can express the problem in original units (X) or in standardized units (Z).

Finding Normal Probabilities


P(a ≤ X ≤ b) = P((a − μ)/σ ≤ Z ≤ (b − μ)/σ) = F((b − μ)/σ) − F((a − μ)/σ)
Standardized Normal Table
The Standardized Normal table shows values of the cumulative normal distribution
function

For a given Z-value a, the table shows F(a) (the area under the curve from −∞ to a)

Normal Areas from Appendix C-1


Appendix C-1 shows areas from 0 to z using increments of 0.01 from z = 0 to z = 3.69
(e.g., for z = 1.96, look down the left column for z = 1.9, then across the top row for 0.06)

Normal Areas from Appendix C-2


Appendix C-2 shows cumulative normal areas from the left to z.
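Instead of table lookups, the cumulative normal area can be computed directly; a minimal Python sketch using the standard library's error function (the N(70, 10) example is hypothetical):

    import math

    def norm_cdf(x, mu=0.0, sigma=1.0):
        # P(X <= x) for X ~ N(mu, sigma), via the error function
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

    print(round(norm_cdf(1.96), 4))      # 0.9750, as in the cumulative table
    # P(a <= X <= b) = F(b) - F(a), e.g. for X ~ N(70, 10):
    print(round(norm_cdf(80, 70, 10) - norm_cdf(60, 70, 10), 4))   # 0.6827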

7.5 NORMAL APPROXIMATIONS


Normal Approximation to the Binomial
The logic of this approximation is that as n becomes large, the discrete binomial bars
become more like a smooth, continuous, normal curve

Rule of thumb: when nπ ≥ 10 and n(1 − π) ≥ 10, it is appropriate to use the normal
approximation to the binomial.

The normal µ and σ are set equal to the binomial mean and standard deviation:

μ = nπ
σ = √(nπ(1 − π))
Normal Approximation to the Poisson
The normal approximation to the Poisson works best when λ is large

(when λ exceeds the values in Appendix B, which only goes up to λ = 20).

Set the normal µ and σ equal to the Poisson mean and standard deviation:

μ = λ
σ = √λ
7.6 EXPONENTIAL DISTRIBUTION
Characteristics of the Exponential Distribution
Often used to model the length of time between two occurrences of an event (the time
between arrivals)
Examples:

• Time between trucks arriving at an unloading dock


• Time between transactions at an ATM Machine
• Time between phone calls to the main operator

Defined by a single parameter, λ (lambda), the mean arrival rate per unit of time or space
(the same λ as in the Poisson distribution).

The probability that an arrival time is less than some specified time X is

P(arrival time ≤ X) = 1 − e^(−λX)

POISSON DISTRIBUTION RELATIONSHIP WITH EXPONENTIAL DISTRIBUTION


The count of customer arrivals is a discrete random variable: Poisson distribution.
When the count of customer arrivals has a Poisson distribution
➔ The distribution of the time between two customer arrivals will have an
exponential distribution
SUMMARY TABLE

Finding Probability

Probability of waiting more than x: P(X > x) = e^(−λx)        Probability of waiting less than x: P(X ≤ x) = 1 − e^(−λx)

Example: Customers arrive at the service counter at the rate of 20 per hour. What is the
probability that the arrival time between consecutive customers is less than 6 minutes?

• The mean number of arrivals per hour is 20 → λ = 20


• 6 minutes is 0.1 hour → we want P(X < 0.1)
• P(arrival time < 0.1) = 1 − e^(−λX) = 1 − e^(−(20)(0.1)) = 0.864665
• In Excel: EXPONDIST(0.1, 20,TRUE)
• So there is a 86.47% chance that the arrival time between successive customers is
less than 6 minutes
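The same calculation in a minimal Python sketch:

    import math

    lam = 20            # mean arrivals per hour
    x = 0.1             # 6 minutes expressed in hours

    p_less = 1 - math.exp(-lam * x)    # P(arrival time < 0.1 hour)
    print(round(p_less, 6))            # 0.864665, matching the example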
Chapter 8
8.1 SAMPLING AND ESTIMATION
A sample statistic: a random variable whose value depends on which items are included in the random sample.
Some samples may represent the population well, while other samples could differ greatly from the population
(particularly if the sample size is small).

To make inferences about a population that consider four factors:

• Sampling variation (uncontrollable).


• Population variation (uncontrollable).
• Sample size (controllable).
• Desired confidence in the estimate (controllable).

Estimators and estimates


An estimator: a statistic derived from a sample to infer the value of a population parameter
An estimate: the value of the estimator in a particular sample
Sample estimator of population parameters

Examples of estimators
Example: Consider eight random samples of size n = 5 from a large population of GMAT scores.
The sample mean is a statistic.
The sample mean used to estimate the population mean is an estimator.

X̄₁ = 504.0 is an estimate.

Sampling error
Sampling error: the difference between an estimate and the corresponding population parameter.
Example for population mean
Sampling error = X̄ − μ

Properties of Estimators
BIAS
The bias: the difference between the expected value of the estimator and the true parameter. Example
for the mean:

Bias = E(X̄) − μ

An unbiased estimator neither overstates nor understates the true parameter on average; for an
unbiased estimator, E(X̄) = μ.
The sample mean (X̄) and sample proportion (p) are unbiased estimators of μ and π.

Sampling error is random whereas bias is systematic

EFFICIENCY
Efficiency refers to the variance of the estimator’s sampling distribution
Smaller variance means more efficient estimator. We prefer the minimum variance estimator - MVUE

X̄ and s² are minimum variance estimators of μ and σ².

CONSISTENCY
A consistent estimator converges toward the parameter being estimated as the sample size increases.
The variances of the three estimators X̄, s, and p diminish as n increases, so all are consistent estimators.

8.2 CENTRAL LIMIT THEOREM


Sampling distribution of an estimator: the probability distribution of all possible values the statistic may
assume when a random sample of size n is taken.

The sample mean is an unbiased estimator for μ: E(X̄) = μ (the expected value of the mean).
The sampling error of the sample mean is described by its standard deviation, called the standard
error of the mean:

σX̄ = σ/√n

Central Limit Theorem for a Mean


If a random sample of size n is drawn from a population with mean μ and standard deviation σ,
the distribution of the sample mean X̄ approaches a normal distribution with mean μ and standard
deviation σX̄ = σ/√n as the sample size increases.

Three important facts about the sample mean:

1. If the population is normal, the sample mean has a normal distribution centered at μ, with a standard
error equal to σX̄ = σ/√n.
2. As sample size n increases, the distribution of sample means converges to the population mean μ
(i.e., the standard error of the mean σX̄ = σ/√n gets smaller).
3. Even if your population is not normal, by the Central Limit Theorem, if the sample size is large
enough, the sample means will have approximately a normal distribution.
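A minimal Python simulation of these facts: samples are drawn from a (non-normal) uniform population with hypothetical bounds, and the standard error of the sample means shrinks roughly like σ/√n:

    import random
    import statistics

    random.seed(1)
    for n in (5, 30, 100):
        means = [statistics.mean(random.uniform(0, 100) for _ in range(n))
                 for _ in range(2000)]
        print(f"n = {n:3d}  mean of means = {statistics.mean(means):6.2f}  "
              f"standard error = {statistics.stdev(means):5.2f}")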

Applying the Central Limit Theorem


UNIFORM POPULATION
A common RULE OF THUMB is that n ≥ 30 is required to ensure a normal distribution for the sample
mean, but actually a much smaller n will suffice if the population is symmetric.
The CLT predicts:

• The distribution of sample means drawn from the population will be normal
• The standard error of the sample mean 𝛔𝐗̅ will decrease as sample size increases

SKEWED POPULATION
The CLT predicts

• The distribution of sample means drawn from any population will approach normality
• The standard error of the sample mean 𝛔𝐗̅ will diminish as sample size increases.
In highly skewed populations, even n ≥ 30 will not ensure normality, though it is not a bad rule

In severely skewed populations, the mean is a poor measure of center to begin with due to outliers

Range of Sample Means


The CLT permits: to define an interval which the sample means are expected to fall in.
As long as the sample size n is large enough, we can use the normal distribution regardless of the population shape
(or any n if the population is normal to begin with)

We use the familiar z-values for the standard normal distribution. If we know μ and σ, the CLT allows
us to predict the range of sample means for samples of size n: μ ± z·σ/√n
8.3 SAMPLE SIZE AND STANDARD ERROR
The key is the standard error of the mean: σX̄ = σ/√n. The standard error decreases as n increases.
To halve (÷2) the standard error, you must quadruple (×4) the sample size.

You can make the interval σX̄ = σ/√n as small as you want by increasing n. The mean X̄ of sample
means converges to the true population mean μ as n increases.

8.4 CONFIDENCE INTERVAL FOR A MEAN (μ) WITH KNOWN σ


What Is a Confidence Interval?
A sample mean X̄ calculated from a random sample x1, x2, . . . , xn is a point estimate of the unknown
population mean μ.

Construct a confidence interval for the unknown mean μ by adding and subtracting a margin of error
from X̄, the mean of our random sample.

The confidence level for this interval is expressed as a percentage such as 90, 95, or 99 percent

Confidence level using z

The interval is x̄ ± zα/2 · σ/√n (formula 8.6), where zα/2 depends on the chosen confidence level.

Choosing a Confidence Level
In order to gain confidence, we must accept a wider range of possible values for μ. Greater confidence
implies loss of precision (i.e., a greater margin of error)
Common confidence level

Interpretation
If you took 100 random samples from the same population and used exactly this procedure to construct
100 confidence intervals using a 95 percent confidence level
➔ approximately 95 (95%) of the intervals would contain the true mean μ, while approximately 5 (5%)
intervals would not

When Can We Assume Normality?


• If σ is known and the population is normal, then we can safely use formula 8.6 to construct the
confidence interval for μ.
• If σ is known but we do not know whether the population is normal,
➔ Rule of thumb: n ≥ 30 is sufficient to assume a normal distribution for X̄ (by the CLT) as long as the
population is reasonably symmetric and has no outliers.
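A minimal Python sketch of this interval (formula 8.6); the sample mean, σ, and n below are hypothetical:

    import math

    def conf_int_mean(xbar, sigma, n, z=1.96):
        # xbar ± z·sigma/√n; z = 1.96 for 95% (1.645 for 90%, 2.576 for 99%)
        moe = z * sigma / math.sqrt(n)        # margin of error
        return xbar - moe, xbar + moe

    lo, hi = conf_int_mean(xbar=520, sigma=86, n=35)
    print(f"95% CI for mu: ({lo:.1f}, {hi:.1f})")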

8.5 CONFIDENCE INTERVAL FOR A MEAN (μ) WITH UNKNOWN σ

Student’s t Distribution
When σ is unknown → the formula for a confidence interval resembles the formula for known σ except
that t replaces z and s replaces σ.

The confidence intervals will be wider (other things being the same) - tα/2 is always greater than zα/2.
Degrees of Freedom
Knowing the sample size allows us to calculate a parameter called degrees of freedom - d.f. - used to
determine the value of the t statistic used in the confidence interval formula.

d.f. = n - 1 (degrees of freedom for a confidence interval for μ)


The degrees of freedom tell us how many observations we used to calculate s - the sample standard
deviation, less the number of intermediate estimates we used in our calculation
Number of observations that are free to vary after sample mean has been calculated

Comparison of z and t
As degrees of freedom increase, the t-values approach the familiar normal z-values.

Outliers and Messy Data


The t distribution assumes a normal population.
Confidence intervals using Student’s t are reliable as long as the population is not badly skewed and if
the sample size is not too small

Using Appendix D
Beyond d.f. = 50, Appendix D shows d.f. in steps of 5 or 10. If Appendix D does not show the exact degrees
of freedom that you want, use the t-value for the next lower d.f.

8.6 CONFIDENCE INTERVAL FOR A PROPORTION (π)


The distribution of a sample proportion p = x/n tends toward normality as n increases.
The standard error σp will decrease as n increases, just as the standard error for X̄ does. We say that
p = x/n is a consistent estimator of π.
Standard Error of the Proportion
The standard error of the proportion is denoted σp. It depends on π and on n, and is largest when the
population proportion is near π = .50, becoming smaller when π is near 0 or 1. The formula is symmetric
in π and 1 − π:

σp = √(π(1 − π)/n)

Confidence Interval for π

p ± zα/2 √(p(1 − p)/n)
Narrowing the Interval


The width of the confidence interval for π depends on

• Sample size n
• Confidence level
• Sample proportion p
A narrower interval (i.e., more precision) → increase the sample size or reduce the confidence level
(e.g., from 95 percent to 90 percent)

Polls and Margin of Error


In polls and survey research, the margin of error is typically based on a 95 percent confidence level and the
initial assumption that π = .50
Each reduction in the margin of error requires a disproportionately larger sample size

Rule of Three
If in n independent trials, no events occur, the upper 95% confidence bound is approximately 3/n

8.7 ESTIMATING FROM FINITE POPULATIONS


The finite population correction factor (FPCF) √((N − n)/(N − 1)) reduces the margin of error and provides a more precise interval estimate.
8.8 SAMPLE SIZE DETERMINATION FOR A MEAN
Sample Size to Estimate μ
n = (zσ/E)²
Estimate σ
Method 1: Take a Preliminary Sample
Take a small preliminary sample and use the sample estimate s in place of σ. This method is the most
common, though its logic is somewhat circular (i.e., take a sample to plan a sample).

Method 2: Assume Uniform Population


Estimate upper and lower limits a and b and set σ = [(b − a)²/12]^(1/2).

Method 3: Assume Normal Population


Estimate upper and lower bounds a and b and set σ = (b − a)/6. This assumes normality, with most of the data within μ ± 3σ → the range is 6σ

Method 4: Poisson Arrivals


In the special case when λ is a Poisson arrival rate, then 𝝈 = √𝛌.
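Putting the sample size formula together with Methods 2 and 3 for estimating σ, a sketch with hypothetical planning values (always round n up):

import math
from scipy import stats

z = stats.norm.ppf(0.975)             # 95% confidence
E = 2.0                               # desired margin of error (hypothetical)
a, b = 10.0, 70.0                     # assumed data range (hypothetical)

sigma_uniform = math.sqrt((b - a) ** 2 / 12)   # Method 2: uniform population
sigma_normal = (b - a) / 6                     # Method 3: normal population
for sigma in (sigma_uniform, sigma_normal):
    n = math.ceil((z * sigma / E) ** 2)        # always round up
    print(f"sigma = {sigma:.2f} -> n = {n}")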

8.9 SAMPLE SIZE DETERMINATION FOR A PROPORTION


The formula for the required sample size for a proportion
n = (z/E)² π(1 − π)

Estimate π
Method 1: Assume That π = .50
Method 2: Take a Preliminary Sample
Take a small preliminary sample and insert p into the sample size formula in place of π
Method 3: Use a Prior Sample or Historical Data
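A sketch of the conservative Method 1 calculation with a hypothetical margin of error:

import math
from scipy import stats

z = stats.norm.ppf(0.975)         # 95% confidence
E = 0.03                          # +/- 3 percentage points (hypothetical)
pi = 0.50                         # Method 1: most conservative guess
n = math.ceil((z / E) ** 2 * pi * (1 - pi))
print(n)                          # 1068 -- the familiar poll sample size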

8.10 CONFIDENCE INTERVAL FOR A POPULATION VARIANCE σ2


Chi-Square Distribution
If the population is normal → construct a confidence interval for the population variance σ2 using the chi-
square distribution with degrees of freedom equal to d.f. = n – 1
Lower-tail and upper-tail percentiles for the chi-square distribution (denoted χ²L and χ²U) can be found in Appendix E.

(n − 1)s²/χ²U ≤ σ² ≤ (n − 1)s²/χ²L
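A sketch of this interval with a hypothetical (assumed normal) sample, using scipy's chi-square percentiles in place of Appendix E:

import numpy as np
from scipy import stats

x = np.array([4.1, 3.8, 4.4, 4.0, 4.6, 3.9, 4.2, 4.3])  # hypothetical normal sample
n, s2 = len(x), x.var(ddof=1)
chi_L = stats.chi2.ppf(0.025, df=n - 1)   # lower-tail percentile
chi_U = stats.chi2.ppf(0.975, df=n - 1)   # upper-tail percentile
print(f"95% CI for sigma^2: ({(n - 1) * s2 / chi_U:.4f}, {(n - 1) * s2 / chi_L:.4f})")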
Chapter 9
9.1 LOGIC OF HYPOTHESIS TESTING
The analyst states the assumption, called a hypothesis, in a format that can be tested using well-known
statistical procedures.
Hypothesis testing is an ongoing, iterative process.

Who Uses Hypothesis Testing?

STEP 1: STATE THE HYPOTHESES


Formulate a pair of mutually exclusive, collectively exhaustive statements about the world. One
statement or the other must be true, but they cannot both be true.
o H0: Null Hypothesis
o H1: Alternative Hypothesis
Efforts will be made to reject the null hypothesis (also called the maintained hypothesis); H1 is sometimes called the research hypothesis.
If we reject H0 → we conclude that the alternative hypothesis H1 is the case.
H0 represents the status quo (e.g., the current state of affairs), while H1 is called the action alternative because action may be required if we reject H0 in favor of H1
STEP 2: SPECIFY THE DECISION RULE
Before collecting data to compare against the hypothesis, the researcher must specify how the evidence
will be used to reach a decision about the null hypothesis.

STEPS 3 AND 4: DATA COLLECTION AND DECISION MAKING


We compare the data with the hypothesis → using the decision rule → decide to reject or not reject
the null hypothesis

STEP 5: TAKE ACTION BASED ON DECISION


Appropriate action for the decision should relate back to the purpose of conducting the hypothesis test
in the first place.

9.2 TYPE I AND TYPE II ERROR


We have two possible choices concerning the null hypothesis. We either reject H0 or fail to reject H0

• Rejecting the null hypothesis when it is true is a Type I error (a false positive).
• Failure to reject the null hypothesis when it is false is a Type II error (a false negative).

Probability of Type I and Type II Errors


The probability of a Type I error (rejecting a true null hypothesis) is denoted α - level of significance

The probability of a Type II error (not rejecting a false null hypothesis) is denoted β

The power of a test is the probability that a false null hypothesis will be rejected. Reducing β correspondingly increases power (e.g., by increasing the sample size)
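To make α, β, and power concrete, a sketch for a right-tailed z test with hypothetical values:

import math
from scipy import stats

mu0, mu1, sigma, n, alpha = 100, 103, 10, 25, 0.05   # hypothetical values
se = sigma / math.sqrt(n)
xcrit = mu0 + stats.norm.ppf(1 - alpha) * se         # right-tailed rejection cutoff for xbar
beta = stats.norm.cdf(xcrit, loc=mu1, scale=se)      # P(fail to reject | mu = mu1)
print(f"beta = {beta:.3f}, power = {1 - beta:.3f}")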

Relationship between α and β


Given two equivalent tests, we will choose the more powerful test (smaller β)
The larger critical value needed to reduce α makes it harder to reject H0, thereby increasing β.

Both α and β can be reduced simultaneously only by increasing the sample size

9.3 DECISION RULES AND CRITICAL VALUES


A statistical hypothesis: a statement about the value of a population parameter that we are interested in
The hypothesized value of the parameter is the center of interest.
Relying on sampling distribution and the standard error of the estimate to decide
One-Tailed and Two-Tailed Tests

The direction of the test is indicated by which way the inequality symbol points in H1:

o < indicates a left-tailed test


o ≠ indicates a two-tailed test
o > indicates a right-tailed test

Decision Rule
Compare a sample statistic to the hypothesized value of the population parameter stated in the null
hypothesis

• Extreme outcomes occurring in the left tail → reject the null hypothesis in a left-tailed test
• Extreme outcomes occurring in the right tail → reject the null hypothesis in a right-tailed test
The area under the sampling distribution curve that defines an extreme outcome: the rejection region
Calculating a test statistic that measures the difference between the sample statistic and the
hypothesized parameter
➔ A test statistic that falls in the shaded region → rejection of H0

Critical Value
The critical value: the boundary between the two regions (reject H0, do not reject H0).
The decision rule states what the critical value of the test statistic would have to be in order to reject
H0 at the chosen level of significance (α).
The choice of α should precede the calculation of the test statistic, thereby minimizing the temptation to select an α that yields the desired conclusion.

9.4 TESTING A MEAN: KNOWN POPULATION VARIANCE


Test Statistic
A test statistic measures the difference between a given sample mean X̄ and a benchmark μ0 in terms of the standard error of the mean:

zcalc = (X̄ − μ0) / (σ/√n)
Critical Value
Reject H0 if zcalc > + zα/2 or if zcalc < - zα/2
Otherwise fail to reject H0

p-Value Method
The p-value is a direct measure of the likelihood of the observed sample under H0
Compare the p-value with the level of significance.

• If the p-value is smaller than α, the sample contradicts the null hypothesis → reject H0

Reject H0 if P(Z > zcalc) < α


Otherwise fail to reject H0
o A large p-value (near 1.00) tends to support H0 → fail to reject H0
o A small p-value (near 0.00) tends to contradict H0 → reject H0

Two-Tailed Test
Reject H0 if zcalc > + zα/2 or if zcalc < - zα/2
Otherwise do not reject H0
USING THE P-VALUE APPROACH
In a two-tailed test, the decision rule using the p-value is the same as in a one-tailed test

Reject H0 if 2 x P(Z > zcalc) < α


Otherwise fail to reject H0

Analogy to Confidence Intervals


A two-tailed hypothesis test at the 5% level of significance (α = .05) is equivalent to asking whether the 95% confidence interval for the mean includes the hypothesized mean:

Reject H0 if μ0 ∉ [X̄ − zα/2 σ/√n, X̄ + zα/2 σ/√n]
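A minimal two-tailed z-test sketch with hypothetical numbers, combining the test statistic and the p-value method:

import math
from scipy import stats

xbar, mu0, sigma, n = 216.0, 220.0, 12.0, 36   # hypothetical values
z_calc = (xbar - mu0) / (sigma / math.sqrt(n))
p_value = 2 * stats.norm.sf(abs(z_calc))       # two-tailed p-value
print(f"z = {z_calc:.2f}, p = {p_value:.4f}")  # reject H0 at alpha = .05 if p < .05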

9.5 TESTING A MEAN: UNKNOWN POPULATION VARIANCE



Using Student’s t
When the population standard deviation σ is unknown and the population may be assumed normal
(generally symmetric with no outliers)

➔ the test statistic follows the Student’s t distribution with d.f. = n - 1

SENSITIVITY TO α
The decision is affected by our choice of α.

Using the p-Value


After the p-value is calculated, different analysts can compare it to the level of significance (α)
From Appendix D we can only get a range for the p-value.

p-value < α then reject H0


p-value > α then fail to reject H0
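A sketch with hypothetical data; scipy's ttest_1samp returns the exact two-tailed p-value rather than the range given by Appendix D:

import numpy as np
from scipy import stats

x = np.array([14.2, 15.1, 13.8, 14.9, 15.4, 14.1, 13.6, 14.8])  # hypothetical sample
t_calc, p_value = stats.ttest_1samp(x, popmean=14.0)            # two-tailed by default
print(f"t = {t_calc:.3f}, p = {p_value:.4f}")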
Confidence Interval versus Hypothesis Test
A two-tailed test at the α level of significance is equivalent to a two-tailed 100(1 − α)% confidence interval.
If the confidence interval does not contain μ0 → reject H0

9.6 TESTING A PROPORTION


Testing a proportion involves a categorical variable with two possible outcomes: each observation either possesses or does not possess the characteristic of interest. The population proportion in the category of interest is denoted π.
The test statistic, calculated from sample data, is the difference between the sample proportion p and the hypothesized proportion π0 divided by the standard error of the proportion σp:

zcalc = (p − π0) / √(π0(1 − π0)/n)

π0 is a benchmark - it does not come from a sample

Calculating the p-Value

Reject H0 if P(Z > zcalc) < α


Otherwise fail to reject H0

Two-Tailed Test
CALCULATING A P-VALUE FOR A TWO-TAILED TEST
In two-tailed test, p-value = 2 x P(Z > zcalc)
Reject H0 if 2 x P(Z > zcalc) < α
Otherwise fail to reject H0
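A two-tailed sketch with hypothetical counts (note that the standard error uses the benchmark π0, not p):

import math
from scipy import stats

x, n, pi0 = 118, 200, 0.50                     # hypothetical counts and benchmark
p = x / n
se = math.sqrt(pi0 * (1 - pi0) / n)            # standard error under H0
z_calc = (p - pi0) / se
p_value = 2 * stats.norm.sf(abs(z_calc))       # two-tailed
print(f"z = {z_calc:.2f}, p = {p_value:.4f}")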

Effect of α
The test statistic zcalc is the same regardless of our choice of α, however, our choice of α does affect the
decision.

Which level of significance is the “right” one depends on how much Type I error we are willing to allow.
Smaller Type I error leads to increased Type II error
Chapter 10
10.1 TWO-SAMPLE TESTS
• A one-sample test compares a sample estimate against a non-sample benchmark
• A two-sample test compares two sample estimates with each other

Basis of Two-Sample Tests


Situations where two groups are to be compared:

• Before versus after


• Old versus new
• Experimental versus control
The null hypothesis H0: both samples were drawn from populations with the same parameter value
Two samples drawn from the same population → different estimates of a parameter due to chance.
If the two sample statistics differ by more than the amount attributable to chance → we conclude that the samples came from populations with different parameter values
Test Procedure
The testing procedure is like that of one-sample tests.

10.2 COMPARING TWO MEANS: INDEPENDENT SAMPLES


• Samples are randomly and independently drawn
• Populations are normally distributed or both sample sizes are at least 30

Format of Hypotheses

Test Statistic
The sample statistic used to test the parameter μ1 − μ2 is X̄1 − X̄2. The test statistic will follow the same general format as the z- and t-scores in Chapter 9.

CASE 1: KNOWN VARIANCES

If we know the values of the population variances σ1² and σ2², the test statistic is a z-score

➔ Use the standard normal distribution to find p-values or critical values of zα.

CASE 2: UNKNOWN VARIANCES ASSUMED EQUAL

➔ Use the Student’s t distribution

Relying on the sample estimates s1² and s2²

By assuming that the population variances are equal → pool the sample variances by taking a weighted average of s1² and s2² → calculate an estimate of the common population variance, called the pooled variance and denoted sp²:

sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
CASE 3: UNKNOWN VARIANCES ASSUMED UNEQUAL

Population variances are unknown and cannot be assumed to be equal

Replacing σ1² and σ2² with the sample variances s1² and s2²
Common situation of testing for a zero difference (D0 = 0)

Test Statistic for Zero Difference of Means
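A sketch contrasting Case 2 and Case 3 on hypothetical samples; scipy's equal_var flag switches between the pooled test and Welch's test:

import numpy as np
from scipy import stats

x1 = np.array([21, 24, 23, 26, 22, 25])   # hypothetical group 1
x2 = np.array([19, 22, 20, 21, 23, 18])   # hypothetical group 2
t_eq, p_eq = stats.ttest_ind(x1, x2)                  # Case 2: pooled variance
t_w, p_w = stats.ttest_ind(x1, x2, equal_var=False)   # Case 3: Welch's t
print(f"pooled: t = {t_eq:.3f}, p = {p_eq:.4f}")
print(f"Welch:  t = {t_w:.3f}, p = {p_w:.4f}")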

10.3 CONFIDENCE INTERVAL FOR THE DIFFERENCE OF TWO MEANS, μ1 − μ2
Using a confidence interval estimate to find a range within which the true difference might fall
If the confidence interval for the difference of two means includes zero
➔ there is no significant difference in means
UNKNOWN VARIANCES ASSUMED EQUAL
The difference of means follows a Student’s t distribution with d.f. = (n1 - 1) + (n2 - 1).

UNKNOWN VARIANCES ASSUMED UNEQUAL


Use the t distribution, adding the variances and using Welch’s formula for the degrees of freedom

10.4 COMPARING TWO MEANS: PAIRED SAMPLES


Paired Data
When sample data consist of n matched pairs, a different approach is required.
If the same individuals are observed twice but under different circumstances → paired comparison
If we treat the data as two independent samples, ignoring the dependence between the data pairs, the test is less
powerful

Paired t Test
In the paired t Test we define a new variable d = X1 - X2 as the difference between X1 and X2.
The two samples are reduced to one sample of n differences d1, d2, . . . , dn, which may be presented in either column or row form.
We calculate the mean 𝒅̅ and standard deviation sd of the sample of n differences d1, d2, . . . , dn with the
usual formulas for a mean and standard deviation.

The population variance of d is unknown → a paired t test using Student’s t with d.f. = n − 1 to compare the sample mean difference d̄ with a hypothesized difference μd (usually μd = 0):

tcalc = (d̄ − μd) / (sd/√n)

Analogy to Confidence Interval


A two-tailed test for a zero difference = asking whether the confidence interval for the true mean
difference μd includes zero.
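A sketch on hypothetical paired data; scipy's ttest_rel carries out exactly this reduction to one sample of differences:

import numpy as np
from scipy import stats

before = np.array([8.2, 7.9, 9.1, 8.5, 8.8, 7.6])   # hypothetical paired data
after = np.array([7.8, 7.4, 8.9, 8.0, 8.6, 7.5])
t_calc, p_value = stats.ttest_rel(before, after)     # tests mu_d = 0 on d = X1 - X2
d = before - after
print(f"dbar = {d.mean():.3f}, t = {t_calc:.3f}, p = {p_value:.4f}")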

10.5 COMPARING TWO PROPORTIONS


Testing for Zero Difference: π1 - π2 = 0
The three possible pairs of hypotheses:
Sample Proportions
A “success” is any event of interest (not necessarily something desirable)

Pooled Proportion
If H0 is true → there is no difference between π1 and π2

➔ the samples can be pooled into one “big” sample to estimate the combined population proportion pc:

pc = (x1 + x2) / (n1 + n2)

Test Statistic
Testing for zero difference
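A sketch of the zero-difference test with hypothetical counts, pooling the samples to estimate pc:

import math
from scipy import stats

x1, n1, x2, n2 = 48, 120, 36, 120          # hypothetical successes and sample sizes
p1, p2 = x1 / n1, x2 / n2
pc = (x1 + x2) / (n1 + n2)                 # pooled proportion under H0
se = math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
z_calc = (p1 - p2) / se
print(f"z = {z_calc:.3f}, p = {2 * stats.norm.sf(abs(z_calc)):.4f}")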

Testing for Nonzero Difference (Optional)

10.6 CONFIDENCE INTERVAL FOR THE DIFFERENCE OF TWO PROPORTIONS, π1 − π2
A confidence interval for the difference of two population proportions, π1 - π2

The rule of thumb for assuming normality is that np ≥ 10 and n(1 - p) ≥ 10 for each sample
10.7 COMPARING TWO VARIANCES
Format of Hypotheses

An equivalent way to state these hypotheses is to look at the ratio of the two variances

The F Test
The test statistic is the ratio of the sample variances. Assuming the populations are normal, the test statistic
follows the F distribution

If the null hypothesis of equal variances is true, this ratio should be near 1:

Fcalc ≅ 1 (if H0 is true) → do not reject H0
Fcalc > FR or Fcalc < FL → reject H0
F distribution:
• mean is always greater than 1
• mode (the “peak” of the distribution) is always less than 1

Two-Tailed F Test
Critical values for the F test are denoted FL (left tail) and FR (right tail)
A right-tail critical value FR is found from Appendix F using d.f.1 and d.f.2:
FR = Fd.f.1, d.f.2 (right-tail critical F)

To obtain a left-tail critical value FL, we reverse the numerator and denominator degrees of freedom:

FL = 1 / Fd.f.2, d.f.1 (left-tail critical F with reversed d.f.)
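A two-tailed sketch on hypothetical samples, with scipy's F percentiles in place of Appendix F:

import numpy as np
from scipy import stats

x1 = np.array([12, 15, 11, 14, 16, 13, 12, 15])   # hypothetical samples
x2 = np.array([14, 14, 15, 13, 14, 15, 14, 13])
F_calc = x1.var(ddof=1) / x2.var(ddof=1)
df1, df2 = len(x1) - 1, len(x2) - 1
FR = stats.f.ppf(0.975, df1, df2)          # right-tail critical value
FL = 1 / stats.f.ppf(0.975, df2, df1)      # left tail: reverse the d.f. and invert
print(f"F = {F_calc:.2f}, FL = {FL:.2f}, FR = {FR:.2f}")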
Chapter 11
11.1 OVERVIEW OF ANOVA
Analysis of Variance (ANOVA) allows one to compare more than two means simultaneously.
Variation in Y about its mean is explained by one or more categorical independent variables (the
factors) or is unexplained (random error).

ANOVA is a comparison of means.


Each factor has two or more levels
Treatment: possible value of a factor or combination of factors
Example: examine whether Gender (Male, Female) and Region (North, Central, South) affect Income.

• Two factors: Gender, Region


• Gender has two levels: Male, Female
• Region has three levels: North, Central, South
• Six treatments: (Male, North); (Male, Central); (Male, South); (Female, North); (Female, Central); (Female, South)

ONE FACTOR ANOVA

N-FACTOR ANOVA

ANOVA Assumptions
Analysis of variance assumes that

• Observations on Y are independent.


• Populations being sampled are normal.
• Populations being sampled have equal variances
Test if each factor has a significant effect on Y:

• H0: µ1 = µ2 = µ3 =…= µc
• H1: Not all the means are equal
If we cannot reject H0, we conclude that observations within each treatment have the same mean µ.

11.2 ONE-FACTOR ANOVA (COMPLETELY RANDOMIZED MODEL)


Data Format
Only interested in comparing the means of c groups (treatments or factor levels) → one-factor ANOVA
Sample sizes within each treatment do not need to be equal.
The total number of observations:

n = n1 + n2 + … + nc

Hypotheses to Be Tested
The question of interest is whether the mean of Y varies from treatment to treatment.
o H0: μ1 = μ2 = . . . = μc (all the treatment means are equal)
o H1: Not all the means are equal (at least one pair of treatment means differs)

One-Factor ANOVA as a Linear Model


Observations in treatment j came from a population with a common mean (μ) plus a treatment effect (Tj) plus random error (εij):

yij = μ + Tj + εij, where Tj = ȳj − ȳ

Random error is assumed to be normally distributed with zero mean and the same variance.
If we are interested only in what happens to the response for the particular levels of the factor, we have a fixed-effects model.
If the null hypothesis is true (Tj = 0 for all j), the ANOVA model collapses to yij = μ + εij:

◼ The same mean in all groups, or no factor effect.

If the null hypothesis is false, in that case the Tj that are negative (below μ) must be offset by the Tj that are
positive (above μ) when weighted by sample size.

Decomposition of Variation

Group Means
The mean of each group is calculated in the usual way by summing the observations in the treatment and dividing by the sample size.

The overall sample mean or grand mean ȳ can be calculated by

o summing all the observations and dividing by n
o taking a weighted average of the c sample means
Partitioned Sum of Squares
For a given observation yij, the following relationship holds:

(yij − ȳ) = (ȳj − ȳ) + (yij − ȳj)

This important relationship may be expressed simply as SST = SSB + SSE.
One-Factor ANOVA Table


Test Statistic
The F statistic is the ratio of the variance due to treatments to the variance due to error:

Fcalc = MSB/MSE

• MSB is the mean square between treatments
• MSE is the mean square within treatments

The F test for equal treatment means is always a right-tailed test.


If there is little difference among treatments, we expect MSB to be near zero because the treatment means ȳj will be near the overall mean ȳ.
➔ when F is near zero → we do not expect to reject H0 (the hypothesis of equal group means)

Decision Rule
Use Appendix F to obtain the right-tail critical value of F - denoted Fdf1,df2 or Fc-1,n-c
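A sketch on three hypothetical treatment groups; scipy's f_oneway computes Fcalc = MSB/MSE and its right-tail p-value:

from scipy import stats

g1 = [18, 21, 19, 22, 20]        # hypothetical treatment groups
g2 = [24, 23, 26, 25, 24]
g3 = [20, 19, 21, 22, 18]
F_calc, p_value = stats.f_oneway(g1, g2, g3)   # one-factor ANOVA, right-tailed
print(f"F = {F_calc:.2f}, p = {p_value:.4f}")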

11.3 MULTIPLE COMPARISONS


Tukey’s Test
• Do after rejection of equal means in ANOVA
• Tells which population means are significantly different

e.g.: μ1 = μ2 ≠ μ3
Tukey’s studentized range test is a multiple comparison test
Tukey’s is a two-tailed test for simultaneous comparison of equality of paired means from c groups
The hypotheses to compare group j with group k:

Tukey’s test statistic

Reject H0 if Tcalc > Tc,n−c

Tc,n−c: the critical value for the chosen level of significance
11.4 TESTS FOR HOMOGENEITY OF VARIANCES
ANOVA Assumption
• ANOVA assumes that observations on the response variable are from normally distributed
populations with the same variance.
• The one-factor ANOVA test is only slightly affected by inequality of variance when group sizes are
equal.
• We can easily test this assumption of homogeneous variances by using Hartley’s Fmax Test.

Hartley’s Test

Hartley’s test statistic is the ratio of the largest sample variance to the smallest sample variance:

Hcalc = s²max / s²min
o Do not reject H0 if Hcalc ≈ 1 (the variances are the same)
o Reject H0 if Hcalc > Hcritical
Critical values of Hcritical may be found in Hartley’s critical value table using degrees of freedom:

• Numerator: df1 = c
• Denominator: df2 = n/c − 1
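The ratio itself is easy to compute; a sketch on hypothetical equal-sized groups (the critical value still comes from Hartley's table):

import numpy as np

groups = [np.array([18, 21, 19, 22, 20]),     # hypothetical groups of equal size
          np.array([24, 23, 26, 25, 24]),
          np.array([20, 19, 21, 22, 18])]
variances = [g.var(ddof=1) for g in groups]
H_calc = max(variances) / min(variances)       # compare with Hartley's critical value
print(f"Hmax = {H_calc:.2f}")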

11.5 TWO-FACTOR ANOVA WITHOUT REPLICATION (RANDOMIZED BLOCK MODEL)
Data Format
Two factors A and B may affect Y

• Each row is a level of factor A


• Each column is a level of factor B
• Each factor combination is observed exactly once
• The mean of Y can be computed either across the rows or down the columns
• The grand mean 𝒚 ̅ is the sum of all data values divided by the sample size rc
Two-Factor ANOVA Model
Expressed in linear form, the two-factor ANOVA model is:

yjk = μ + Aj + Bk + εjk
where

• yjk = observed data value in row j and column k


• μ = common mean for all treatments
• Aj = effect of row factor A (j = 1, 2, . . . , r)
• Bk = effect of column factor B (k = 1, 2, . . . , c)
• εjk = random error normally distributed with zero mean and the same variance for all treatments
Hypotheses to Be Tested
If we are interested only in what happens to the response for the particular levels of the factors:

FACTOR A
• H0: A1 = A2 =. . . = Ar = 0 (row means are the same)
• H1: Not all the Aj are equal to zero (row means differ)

FACTOR B
• H0: B1 = B2 = . . . = Bc = 0 (column means are the same)
• H1: Not all the Bk are equal to zero (column means differ)
If we are unable to reject either null hypothesis
➔ all variation in Y: a random disturbance around the mean μ:

yjk = μ + εjk
Randomized Block Model
When only one factor is of research interest and the other factor is merely used to control for potential confounding influences
In the randomized block model
• the column effects: treatments (as in one-factor ANOVA → the effect of interest)
• the row effects: blocks
A randomized block model looks like a two-factor ANOVA and is computed exactly like a two-factor ANOVA
Interpretation may resemble a one-factor ANOVA since only the column effects (treatments) are of interest
Format of Calculation of Nonreplicated Two-Factor ANOVA

The total sum of squares:

SST = SSA + SSB + SSE


where
• SST = total sum of squared deviations about the mean
• SSA = between rows sum of squares (effect of factor A)
• SSB = between columns sum of squares (effect of factor B)
• SSE = error sum of squares (residual variation)

Limitations of Two-Factor ANOVA without Replication


When replication is impossible or extremely expensive, two-factor ANOVA without replication
must suffice

11.6 TWO-FACTOR ANOVA WITH REPLICATION (FULL FACTORIAL MODEL)
What Does Replication Accomplish?
• With multiple observations within each cell → more detailed statistical tests
• With an equal number of observations in each cell (balanced data)
➔ a two-factor ANOVA model with replication
• Replication allows to test
o the factors’ main effects
o an interaction effect
This model is called the full factorial model. In linear model format:

yijk = μ + Aj + Bk + ABjk + εijk


where

• yijk = observation i for row j and column k (i = 1, 2, . . . , m)


• μ = common mean for all treatments
• Aj = effect attributed to factor A in row j (j = 1, 2, . . . , r)
• Bk = effect attributed to factor B in column k (k = 1, 2, . . . , c)
• ABjk = effect attributed to interaction between factors A and B
• εijk = random error (normally distributed, zero mean, same variance for all treatments)
Format of Hypotheses
FACTOR A: ROW EFFECT
• H0: A1 = A2 = . . . = Ar = 0 (row means are the same)
• H1: Not all the Aj are equal to zero (row means differ)

FACTOR B: COLUMN EFFECT


• H0: B1 = B2 = . . . = Bc = 0 (column means are the same)
• H1: Not all the Bk are equal to zero (column means differ)

INTERACTION EFFECT
• H0: All the ABjk = 0 (there is no interaction effect)
• H1: Not all ABjk = 0 (there is an interaction effect)

Format of Data
Data Format of Replicated Two-Factor ANOVA

Sources of Variation
The total sum of squares is partitioned into four components:

SST = SSA + SSB + SSI + SSE


where

• SST = total sum of squared deviations about the mean


• SSA = between rows sum of squares (effect of factor A)
• SSB = between columns sum of squares (effect of factor B)
• SSI = interaction sum of squares (effect of AB)
• SSE = error sum of squares (residual variation)
Interaction Effect

• In the absence of an interaction, the lines will be roughly parallel or will tend to move in the same direction at the same time.
• With a strong interaction, the lines will have differing slopes and will tend to cross one another.
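A sketch of the full factorial model on a hypothetical balanced data set, using the statsmodels formula API (the columns y, A, B are assumptions of this example):

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({                       # hypothetical replicated data, m = 3 per cell
    "A": ["a1"] * 6 + ["a2"] * 6,
    "B": ["b1", "b1", "b1", "b2", "b2", "b2"] * 2,
    "y": [12, 14, 13, 18, 17, 19, 15, 16, 14, 16, 15, 17],
})
model = smf.ols("y ~ C(A) * C(B)", data=df).fit()   # main effects + interaction
print(sm.stats.anova_lm(model, typ=2))              # table with SSA, SSB, SSI, SSE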

Tukey Tests of Pairs of Means


Multiple comparison

• Significant differences at α = .05 between clinics C, D and between suppliers (1, 4) and (3, 5).
• At α = .01 there is also a significant difference in means between one pair of suppliers (4, 5).
Significance versus Importance
MegaStat’s table of means (Figure 11.23) allows us to explore these differences further and to assess the
question of importance as well as significance.

The largest differences in means between clinics or suppliers are about 2 days. Such a small difference
might be unimportant most of the time.
However, if their inventory is low, a 2-day difference could be important
Chapter 12
12.1 VISUAL DISPLAYS AND CORRELATION ANALYSIS
Visual Displays
Analysis of bivariate data (i.e., two variables) typically begins with a scatter plot that displays each
observed data pair (xi, yi) as a dot on an X-Y grid.
➔ initial idea of the relationship between two random variables.

Correlation Coefficient
Sample correlation coefficient (Pearson correlation coefficient) - denoted r - measures the degree of
linearity in the relationship between two random variables X and Y.

Its value will fall in the interval [-1, 1].

• Negative correlation: when xi is above its mean, yi tends to be below its mean (and vice versa)
• Positive correlation: xi and yi tend to be above (or below) their means at the same time
The formulas use three terms called sums of squares:

SSxx = Σ(xi − x̄)², SSyy = Σ(yi − ȳ)², SSxy = Σ(xi − x̄)(yi − ȳ)

The formula for the sample correlation coefficient:

r = SSxy / (√SSxx · √SSyy)
Correlation coefficient only measures the degree of linear relationship between X and Y
Tests for Significant Correlation Using Student’s t
The sample correlation coefficient r is an estimate of the population correlation coefficient ρ

To test the hypothesis H0: ρ = 0, the test statistic is:

tcalc = r √((n − 2)/(1 − r²))
Compare this t test statistic with a critical value of t for a one-tailed or two-tailed test from Appendix D
using d.f. = n - 2 and any desired α.

After calculating tcalc → Find p-value by using Excel’s function =T.DIST.2T(tcalc,deg_freedom)

Critical Value for Correlation Coefficient


Equivalent approach → Calculate a critical value for the correlation coefficient

First: look up the critical value of t from Appendix D with d.f. = n - 2 degrees of freedom and chosen α

Then, the critical value of the correlation coefficient is:

rcritical = t / √(t² + n − 2)
• a benchmark for the correlation coefficient
• no p-value
• inflexible when changing α
In very large samples, even very small correlations could be “significant”
A larger sample does not mean that the correlation is stronger nor does its increased significance imply
increased importance.
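A sketch on hypothetical data; scipy's pearsonr returns r together with the two-tailed p-value of the t test above:

import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 7, 8, 10, 11, 13])   # hypothetical data
y = np.array([5, 9, 8, 12, 14, 15, 19, 20])
r, p_value = stats.pearsonr(x, y)            # p-value matches the t test with d.f. = n - 2
t_calc = r * np.sqrt((len(x) - 2) / (1 - r ** 2))
print(f"r = {r:.3f}, t = {t_calc:.2f}, p = {p_value:.4f}")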
12.2 SIMPLE REGRESSION
What Is Simple Regression?
The simple linear model in slope-intercept form is Y = slope × X + y-intercept. In statistics this straight-line model is referred to as a simple regression equation.

• the Y variable as the response variable (the dependent variable)


• the X variable as the predictor variable (the independent variable)
Only the dependent variable (not the independent variable) is treated as a random variable

Interpreting an Estimated Regression Equation


Cause and effect is not proven by a simple regression
➔ cannot assume that the explanatory variable is “causing” the variation in the response variable

Prediction Using Regression


Predictions from our fitted regression model are stronger within the range of our sample x values.
The relationship seen in the scatter plot may not be true for values far outside our observed x range.
Extrapolation outside the observed range of x is always tempting but should be approached with
caution.

12.3 REGRESSION MODELS


Model and Parameters
The regression model’s unknown population parameters denoted by β0 (the intercept) and β1 (the
slope).

y = β0 + β1x + ε (population regression model)

Inclusion of a random error ε is necessary because other unspecified variables also may affect Y

The regression model without the error term represents the expected value of Y for a given x value
called simple regression equation

E(Y|x) = β0 + β1x (simple regression equation)

The regression assumptions.

• Assumption 1: The errors are normally distributed with mean 0 and standard deviation σ.
• Assumption 2: The errors have constant variance, σ2.
• Assumption 3: The errors are independent of each other.
The regression equation used to predict the expected value of Y for a given value of X:

ŷ = b0 + b1x (estimated regression equation)
• coefficients b0 (estimated intercept)
• b1 (estimated slope)

The difference between the observed value yi and its estimated value ŷi is a residual, ei.

The residual is the vertical distance between each yi and the estimated regression line on a scatter plot of (xi, yi) values.

12.4 ORDINARY LEAST SQUARES FORMULAS


Slope and Intercept
The ordinary least squares (OLS) method estimates the regression so as to ensure the best fit: the slope and intercept are selected so that the residuals are as small as possible.
Residuals may be either positive or negative, and they always sum to zero around the regression line.

The fitted coefficients b0 and b1 are chosen so that the fitted linear model ŷ = b0 + b1x has the smallest possible sum of squared residuals (SSE).

Differential calculus is used to obtain the coefficient estimators b0 and b1 that minimize SSE:

b1 = SSxy / SSxx and b0 = ȳ − b1x̄

OLS Regression Line Always Passes Through (x̄, ȳ)
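A sketch computing the OLS coefficients directly from the sums of squares on hypothetical data; note the residuals sum to (numerically) zero:

import numpy as np

x = np.array([2, 4, 5, 7, 8, 10])            # hypothetical data
y = np.array([5, 9, 8, 12, 14, 15])
SSxy = np.sum((x - x.mean()) * (y - y.mean()))
SSxx = np.sum((x - x.mean()) ** 2)
b1 = SSxy / SSxx                             # OLS slope
b0 = y.mean() - b1 * x.mean()                # line passes through (xbar, ybar)
residuals = y - (b0 + b1 * x)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, sum of residuals = {residuals.sum():.1e}")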

Sources of Variation in Y
The total variation, expressed as a sum of squares (SST), is split into two parts:

• SST = total sum of squares


o Measures the variation of the yi values around their mean, ȳ
• SSR = regression sum of squares
o Explained variation attributable to the linear relationship between x and y
• SSE = error sum of squares
o Variation attributable to factors other than the linear relationship between x and y

Coefficient of Determination
The coefficient of determination: the portion of the total variation in the dependent variable that is
explained by variation in the independent variable

The coefficient of determination called R-squared - denoted as R2

R² = SSR/SST = 1 − SSE/SST, noting that 0 ≤ R² ≤ 1

12.5 TESTS FOR SIGNIFICANCE


Standard Error of Regression
An estimator for the variance of the population model error is:

σ̂² = se² = (Σ ei²) / (n − 2) = SSE/(n − 2)
Division by n − 2 reflects the fact that the simple regression model uses two estimated parameters, b0 and b1.

se = √(se²) is the standard error of the estimate

COMPARING STANDARD ERRORS

The magnitude of se should always be judged relative to the size of the y values in the sample data
INFERENCES ABOUT THE REGRESSION MODEL
The variance of the regression slope coefficient (b1) is estimated by

s²b1 = se² / Σ(xi − x̄)² = se² / ((n − 1)s²x)

where:

sb1 = estimate of the standard error of the least squares slope

se = √(SSE/(n − 2)) = standard error of the estimate

Confidence Intervals for Slope and Intercept

These standard errors → construct confidence intervals for the true slope and intercept
using Student’s t with d.f. = n - 2 degrees of freedom and any desired confidence level.

Hypothesis Tests
if β1 = 0 ➔ X does not influence Y
→ the regression model collapses to a constant β0 + a random error term:

For either coefficient, we use a t test with d.f. = n − 2. For the slope, the hypotheses are H0: β1 = 0 versus H1: β1 ≠ 0, with test statistic tcalc = b1/sb1.
SLOPE VERSUS CORRELATION
The test for zero slope is the same as the test for zero correlation.

➔ The t test for zero slope will always yield exactly the same tcalc as the t test for zero correlation
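A sketch using scipy's linregress on hypothetical data, showing that the zero-slope and zero-correlation tests share the same p-value:

import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 7, 8, 10, 11, 13])    # hypothetical data
y = np.array([5, 9, 8, 12, 14, 15, 19, 20])
res = stats.linregress(x, y)                 # slope, intercept, r, p-value, stderr
t_calc = res.slope / res.stderr              # t test for zero slope, d.f. = n - 2
print(f"b1 = {res.slope:.3f}, t = {t_calc:.2f}, p = {res.pvalue:.4f}")
print(f"r = {res.rvalue:.3f}")               # same p-value as the zero-correlation test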

12.6 ANALYSIS OF VARIANCE: OVERALL FIT


Decomposition of Variance

F Statistic for Overall Fit


To test a regression for overall significance → an F test to compare the explained (SSR) and
unexplained (SSE) sums of squares
ANOVA Table for simple regression

The formula for the F test statistic is:

Fcalc = MSR/MSE = (SSR/1) / (SSE/(n − 2))
F Test p-Value and t Test p-Value


the F test always yields the same p-value as a two-tailed t test for zero slope → same p-value as a two-
tailed test for zero correlation

The relationship between the test statistics is Fcalc = t²calc


12.7 CONFIDENCE AND PREDICTION INTERVALS FOR Y
Construct an Interval Estimate for Y

Quick Rules for Confidence and Prediction Intervals


A really quick 95% interval → plug in t = 2 (since most 95 percent t-values are not far from 2)
