• Categorical (qualitative) data: values that are described by words rather than
numbers - nonnumerical values - verbal labels. Values of a categorical variable
may be represented using numbers - coded
• Numerical (quantitative) data: arise from counting, measuring something, or some
kind of mathematical operation. Two types: Discrete (integers), Continuous
(physical measurements, financial variables)
Nominal Measurement
Nominal data: the weakest level of measurement and the easiest to recognize; values merely
identify a category. “Nominal” data are the same as “qualitative”, “categorical” or “classification” data.
The only permissible mathematical operations are counting (e.g., frequencies).
➔ No ordering
Ordinal Measurement
Ordinal data codes connote - imply - a ranking of data values. There is no clear meaning to
the distance between values. Like nominal data, ordinal data lack the properties that are required
to compute many statistics, such as the average. Ordinal data can be treated as nominal,
but not vice versa.
➔ Ordering, but differences have no meaning.
Interval Measurement
Interval data not only have rank order but also meaningful intervals between scale points.
Intervals between numbers represent distances → mathematical operations such as taking an
average are allowed. Ratios are not meaningful for interval data.
➔ Differences have meaning, but ratios have no meaning.
Ratio Measurement
The data have all the properties of interval data and the ratio of two values is meaningful.
The measurements have a true zero value. We can recode ratio measurements downward
into ordinal or nominal measurements (but not conversely)
➔ Ratios have meaning.
• A sample: looking only at some items selected from the population - an observed
subset of the population. A sample is preferred when: Infinite Population, Destructive Testing,
Timely Results, Accuracy, Cost, Sensitive Information
Target Population
• The target population contains all the individuals in which we are interested
• The sampling frame is the group from which we take the sample
2.4 SAMPLING METHODS
Two main categories: random sampling and non-random sampling
Systematic Sample
Systematic sample: choose every kth item from a sequence or list, starting from a
randomly chosen entry among the first k items on the list.
Cluster Sampling
Divide population into several “clusters” (e.g. regions), each representative of the population
Sources of Error
2.6 SURVEYS
SURVEY
• Step 1: State the goals of the research
Questionnaire Design
• Begin with short, clear instructions.
• State the survey purpose.
• Assure anonymity.
• Instruct on how to submit the completed survey.
• Break survey into naturally occurring sections
• Let respondents bypass sections that are not applicable (e.g., “if you answered no to
question 7, skip directly to Question 15”).
Chapter 3
Graphical Presentation of Data
• Data in raw form are usually not easy to use for decision making
• Some type of organization like graph or table is needed
• The type of graph to use depends on the variable being summarized
Summary table
Pareto Diagram
• A Pareto chart displays categorical data, with categories displayed in descending
order of frequency
• A cumulative polygon is often shown in the same graph
• Commonly used in quality management to display the frequency of defects or errors
of different types.
Stem-and-Leaf Diagram
A simple way to see distribution details in a data set
METHOD: Separate the sorted data series into leading digits (the stem) and the trailing
digits (the leaves)
For two-digit or three-digit integer data, the stem is the tens digit of the data, and the leaf is
the ones digit
Dot Plots
A dot plot is another simple graphical display of n individual values of numerical data.
The basic steps in making a dot plot are to
1. Make a scale that covers the data range
2. Mark axis demarcations and label them
3. Plot each data value as a dot above the scale at its approximate location
If more than one data value lies at approximately the same X-axis location, the
dots are piled up vertically
• Easy to understand
• Show variability
• Show the center and where the midpoint lies
• Reveal some things about the shape of the distribution
• Not good for large samples (e.g., > 5,000).
Dot plots have limitations.
• Don’t reveal very much information about the data set’s shape when the
sample is small
• Become awkward when the sample is large (what if you have 100 dots at the
same point?)
• Can be awkward with decimal data.
Histograms
• The class boundaries (or class midpoints) are shown on the X axis
• the Y axis is either frequency, relative frequency, or percentage
• Bars of the appropriate heights are used to represent the number of
observations within each class. No gap between bars
Scatter Plots
A scatter plot shows n pairs of observations (x1, y1), (x2, y2), . . ., (xn, yn) as dots (or some
other symbol) on an X-Y graph
• Investigate the relationship between two variables → association between two variables
• Convey patterns in data pairs that would not be apparent from a table.
Time Series Plot
• Used to study patterns in the values of a variable over time.
• The time period is measured on the X axis
• The variable of interest is measured on the Y axis
• Can display several variables at once
Log Scales
Useful for time series data that are expected to grow at a compound annual percentage rate
(e.g., GDP, the national debt, or your future income).
Reveal whether the quantity is growing at an increasing, constant, or declining percentage
rate (constant percentage growth appears as a straight line on a log scale).
Deceptive Graphs
Error 1: Nonzero Origin: A nonzero origin will exaggerate the trend
Error 2: Elastic Graph Proportions: By shortening the X-axis in relation to the Y-axis,
vertical change is exaggerated.
Error 3: Dramatic Titles and Distracting Pictures: The title often is designed more to
grab the reader’s attention than to convey the chart’s content
Error 4: 3-D and Novelty Graphs: Depth may enhance the visual impact of a bar chart,
but it introduces ambiguity in bar height
Error 5: Rotated Graphs: By making a graph 3-dimensional and rotating, trends appear
to dwindle into the distance or loom alarmingly toward you
Error 8: Complex Graphs: Complicated visual displays make the reader work harder.
Error 11: Area Trick: Simultaneously enlarging the width of the bars as their height
increases → bar area misstates the true proportion
CHAPTER 4
Mean
• Population: µ
• Sample: x̄
Median
The median (denoted M) is the 50th percentile or midpoint of the ordered (sorted) sample
data set x1, x2, …, xn.
Mode
• Value that occurs most frequently
• Not affected by extreme values (outliers)
• Used for either numerical or categorical (nominal) data
• There may be no mode
• There may be several modes
• Most useful for discrete or categorical data
Geometric Mean
The geometric mean (denoted G) is a measure of central tendency, obtained by multiplying
the data values and then taking the 𝒏𝒕𝒉 root of the product.
G = (x1 · x2 · x3 ⋯ xn)^(1/n)
All the data values: x > 0
• Used to measure the rate of change of a variable over time
XG = (x1 · x2 · x3 ⋯ xn)^(1/n)
• Geometric mean of rate of return
o Measures the status of an investment over time
RG = [(1 + R1)(1 + R2)(1 + R3) ⋯ (1 + Rn)]^(1/n) − 1
Where Ri is the rate of return in time period i
SUMMARY TABLE
Quartiles
The quartiles (denoted Q1, Q2, Q3): scale points that divide the ordered data into four
groups of approximately equal size: the 25th, 50th, and 75th percentiles
• Q1 is the value for which 25% of the observations are smaller and 75% are larger
• Q2 is the same as the median (50% are smaller, 50% are larger)
• Only 25% of the observations are greater than Q3
The first quartile Q1 is the median of the data values below Q2, and the third quartile Q3 is
the median of the data values above Q2
Find a quartile by determining the value in the appropriate position in the ordered
data set
Range
The range is the difference between the largest and smallest observations:
Range = xmax − xmin
Interquartile Range
The interquartile range Q3 – Q1 (denoted IQR) measures the degree of spread in the data
(the middle 50 percent).
IQR = Q3 – Q1
Fences and Unusual Data Values
We can use the quartiles to identify unusual data points. Detect data values that are far
below Q1 or far above Q3.
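A minimal Python sketch of these calculations (the data values are invented for illustration); note that NumPy's default percentile interpolation may differ slightly from the textbook's position rule:

```python
import numpy as np

data = np.array([12, 14, 15, 17, 18, 19, 21, 22, 24, 48])

# Quartiles and interquartile range
q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Tukey's inner fences: values outside them are flagged as unusual
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
unusual = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
print(f"fences: [{lower_fence}, {upper_fence}], unusual values: {unusual}")
```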
Midhinge
An additional measure of center that has the advantage of not being influenced by
outliers. The midhinge is the average of the first and third quartiles:
Midhinge = (Q1 + Q3) / 2
A new way to describe skewness:
Variance
The sum of squared deviations (x − x̄) from the mean, divided by n − 1
➔ Average (approximately) of squared deviations of values from the mean
s² = Σ(xi − x̄)² / (n − 1), summed over i = 1, …, n
where
• x̄ = mean
• n = sample size
• xi = ith value of the variable x
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
• A measure of the “average” scatter around the mean
S = √[Σ(xi − x̄)² / (n − 1)]
❖ The more spread out the data, the higher the standard deviation and vice versa
Notice
Standard deviations
• can be compared only for data sets measured in the same units.
• should not be compared if the means differ greatly.
Coefficient of Variation
To compare dispersion in data sets with dissimilar units of measurement or dissimilar
means, the coefficient of variation (CV) expresses the standard deviation as a percentage
of the mean: CV = 100 × (s / x̄)
Example
We can say that ATM deposits have much greater relative dispersion (120 percent) than
either defect rates (18 percent) or P/E ratios (62 percent).
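A small sketch of the CV calculation in Python; the two samples below are invented stand-ins, not the textbook's actual data:

```python
import numpy as np

defect_rates = np.array([5.2, 4.8, 6.1, 5.5, 4.9])   # hypothetical, per 1,000 units
atm_deposits = np.array([50, 2000, 120, 8000, 300])  # hypothetical, in dollars

def cv(x):
    """Coefficient of variation: standard deviation as a percent of the mean."""
    return 100 * x.std(ddof=1) / x.mean()  # ddof=1 gives the sample std dev

print(f"CV of defect rates: {cv(defect_rates):.0f}%")
print(f"CV of ATM deposits: {cv(atm_deposits):.0f}%")
```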
SUMMARY TABLE
Shapes of Distribution
• Describes how data are distributed
• Measures of shape:
o Symmetry / asymmetry
o Peakedness
Skewness
Measure symmetry/asymmetry of a distribution
• compare two samples measured in different units (say, dollars and yen)
• compare one sample with a known reference distribution
Kurtosis
Kurtosis refers to the relative length of the tails and the degree of concentration in the
center.
Chebyshev’s Theorem
Regardless of how the data are distributed, at least (1 − 1/k²) × 100% of the values will
fall within k standard deviations of the mean (for k > 1)
Example: for k = 2, at least 1 − 1/2² = 75% of the values fall within μ ± 2σ; for k = 3, at
least 88.9% fall within μ ± 3σ.
The Empirical Rule
Data values outside μ ± 3σ are rare (less than 1%) in a normal distribution → outliers.
Z Scores
A measure of distance from the mean (for example, a Z-score of 2.0 means that a value is
2.0 standard deviations from the mean)
Formula for a population: Z = (xi − μ) / σ
Formula for a sample: Z = (xi − x̄) / s
Based on its standardized z-score, a data value is classified as unusual if |z| > 2 and as an
outlier if |z| > 3.
Weighted Mean and Grouped Data
• Use the weighted mean when data are already grouped into n classes, with weights wi for the ith class
❖ Suppose a data set contains midpoint values m1, m2, . . ., mk, occurring with
frequencies f1, f2, . . ., fk
o For a population of N observations the grouped mean and variance are
μ = (Σ fj mj) / N and σ² = [Σ fj (mj − μ)²] / N
Note:
o Only concerned with the strength of the relationship
o No causal effect is implied
Coefficient of Correlation
Correlation coefficient (denoted r) describes the degree of linearity between paired
observations on two quantitative variables X and Y.
➔ Measures the relative strength of the linear relationship between two variables
Sample coefficient of correlation:
r = SSxy / (√SSxx · √SSyy)
or equivalently
r = Σ(xi − x̄)(yi − ȳ) / √[Σ(xi − x̄)² · Σ(yi − ȳ)²]
Features of Correlation Coefficient, r
• Unit free
• Ranges between –1 and 1
• The closer to –1, the stronger the negative (go down) linear relationship
• The closer to 1, the stronger the positive (go up) linear relationship
• The closer to 0, the weaker the linear relationship
SUMMARY
• Described measures of central tendency
o Mean, median, mode, geometric mean
• Discussed quartiles
• Described measures of variation
o Range, interquartile range, variance and standard deviation, coefficient of
variation, Z-scores
• Illustrated shape of distribution
• Symmetric, skewed, box-and-whisker plots
• Discussed covariance and correlation coefficient
Chapter 5
5.1 RANDOM EXPERIMENTS
Sample Space
Random experiment is an observational process whose results cannot be known in
advance.
The set of all possible outcomes (denoted S) is the sample space for the experiment.
◼ A sample space with a countable number of outcomes is discrete.
Event
An event is any subset of outcomes in the sample space
5.2 PROBABILITY
The probability of an event is a number that measures the relative likelihood that the event
will occur.
• The probability of an event A, denoted P(A), must lie within the interval from 0 to 1:
0 ≤ 𝑃(𝐴) ≤ 1
o If P(A) = 0: The event cannot occur
o If P(A) = 1: The event is certain to occur
Assigning Probability
Three distinct ways of assigning probability:
Empirical Approach
Counting the frequency of observed outcomes (f) defined in our experimental sample
space and dividing by the number of observations (n). The estimated probability is f/n.
Classical Approach
A priori: the process of assigning probabilities before we actually observe the event or try an
experiment.
When flipping a coin, rolling a pair of dice, drawing cards, picking lottery numbers, or playing
roulette, the nature of the process allows us to envision the entire sample space.
Subjective Approach
A subjective probability reflects someone’s informed judgment about the likelihood of
an event - needed when there is no repeatable random experiment.
• A ∪ B: “A or B” (the union of A and B)
• A ∩ B: “A and B” (the intersection of A and B)
If A ∩ B = ∅ (A and B are mutually exclusive), then P(A ∩ B) = 0
• Example:
o Event A = a day in January. Event B = a day in February
Conditional Probability
The probability of event A given that event B has occurred is a conditional probability.
Denoted P(A | B). The vertical line “ | ” is read as “given”.
P(A | B) = P(A ∩ B) / P(B) for P(B) > 0
Independent Events
Two events are independent if and only if:
P(A | B) = P(A)
Events A and B are independent when the probability of one event is not affected by the
fact that the other event has occurred
Multiplication Rules
Using algebra, we can rewrite the formula of conditional probability as the multiplication
rule: P(A ∩ B) = P(A | B) P(B)
Joint Probabilities
A joint probability represents the probability of the intersection of two events.
Found by dividing the cell count (excluding the total row and column) by the total sample size:
P(GPS ∩ AC) = 35/100 = 0.35
P(noGPS ∩ AC) = 55/100 = 0.55
Marginal probability
The marginal probability of an event is found by dividing a row or column total by the total
sample size.
P(AC) = 90/100 = 0.90
P(GPS) = 40/100 = 0.40
Conditional probability
Found by restricting ourselves to a single row or column of the given condition.
Dividing the cell count by the total of the given condition:
P(GPS | AC) = P(GPS ∩ AC) / P(AC) = 35/90 = 0.3889
P(NoAC | GPS) = P(NoAC ∩ GPS) / P(GPS) = 5/40 = 0.125
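The joint, marginal, and conditional probabilities above can be reproduced directly from the cell counts; a minimal sketch using the same GPS/AC table:

```python
# 2x2 contingency table from the example: 100 vehicles in total
counts = {("GPS", "AC"): 35, ("GPS", "noAC"): 5,
          ("noGPS", "AC"): 55, ("noGPS", "noAC"): 5}
n = sum(counts.values())  # 100

# Joint probability: cell count / grand total
p_gps_and_ac = counts[("GPS", "AC")] / n                       # 0.35

# Marginal probabilities: row or column total / grand total
p_ac = (counts[("GPS", "AC")] + counts[("noGPS", "AC")]) / n   # 0.90
p_gps = (counts[("GPS", "AC")] + counts[("GPS", "noAC")]) / n  # 0.40

# Conditional probability: cell count / total of the given condition
p_gps_given_ac = counts[("GPS", "AC")] / (n * p_ac)            # 35/90 = 0.3889

print(p_gps_and_ac, p_ac, p_gps, round(p_gps_given_ac, 4))
```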
Bayes’ Theorem
P(B | A) = P(A | B) P(B) / P(A)
❖ In situations where P(A) is not given, the general form of Bayes’ Theorem is:
P(B | A) = P(A | B) P(B) / [P(A | B) P(B) + P(A | B′) P(B′)]
Counting Rules
Fundamental Rule of Counting
Counting Rule 1:
If event A can occur in 𝒏𝟏 ways and event B can occur in 𝒏𝟐 ways, then events A and B
can occur in 𝒏𝟏 × 𝒏𝟐 ways. In general, the number of ways that m events can occur:
𝒏𝟏 × 𝒏𝟐 × … × 𝒏𝒎
Example: You want to go to a park, eat at a restaurant, and see a movie. There are 3 parks,
4 restaurants, and 6 movie choices. How many different possible combinations are there?
Answer: (3)(4)(6) = 72 different possibilities
Counting Rule 2:
If any one of k different mutually exclusive and collectively exhaustive events can occur
on each of n trials, the number of possible outcomes is equal to
k^n
Example: If you roll a fair die 3 times then there are 6³ = 216 possible outcomes
Factorials
The number of unique ways that n items can be arranged in a particular order: n factorial,
the product of all integers from 1 to n
n! = 1 × 2 × 3 × ⋯ × (n − 2)(n − 1)(n)
Example: You have five books to put on a bookshelf. How many different ways can
these books be placed on the shelf?
Answer: 5! = (5)(4)(3)(2)(1) = 120 different possibilities
Permutations
Choose X items at random without replacement from a group of n items. The number of
ways of arranging X objects selected from n objects in order is
nPx = n! / (n − X)!
Example: You have five books and are going to put three on a bookshelf. How
many different ways can the books be ordered on the bookshelf?
Answer: nPx = n! / (n − X)! = 5! / (5 − 3)! = 120 / 2 = 60 different possibilities
Combinations
A combination is a collection of X items chosen at random without replacement from n
items. The number of ways of selecting X objects from n objects, irrespective of order, is
nCx = n! / [X! (n − X)!]
Example: You have five books and are going to randomly select three to read. How many
different combinations of books might you select?
Answer: nCx = n! / [X! (n − X)!] = 5! / [3! (5 − 3)!] = 120 / [(6)(2)] = 10 different possibilities
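Python's math module (3.8+) implements all three counting functions directly; a quick check of the bookshelf examples:

```python
import math

print(math.factorial(5))  # 5! = 120 arrangements of five books
print(math.perm(5, 3))    # 5P3 = 60 ordered selections of three books
print(math.comb(5, 3))    # 5C3 = 10 unordered selections of three books
```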
CHAPTER 6
6.1 DISCRETE PROBABILITY DISTRIBUTIONS
Random Variables
A random variable is a function or rule that assigns a numerical value to each outcome
in the sample space of a random experiment.
A discrete random variable has a countable number (integer) of distinct values.
• Some have a clear upper limit (e.g., number of absences in a class of 40 students)
• Others do not (e.g., number of text messages you receive in a given hour).
A continuous random variable produces outcomes from a measurement
Probability Distributions
Discrete Probability Distribution
A discrete probability distribution assigns a probability to each value of a discrete
random variable X. The distribution must follow the rules of probability
▪ The probability for any given value of X
0 ≤ 𝑃(𝑥𝑖 ) ≤ 1
▪ The sum over all values of X:
Σ P(xi) = 1, summed over i = 1, …, n
More than one random variable value can be assigned to the same probability, but one
random variable value cannot have two different probabilities.
The cumulative distribution function (CDF):
F(x0) = Σ P(x), summed over all x ≤ x0
Expected Value
The expected value (mean) of a discrete random variable:
E(X) = μ = Σ xi P(xi), summed over i = 1, …, n
The standard deviation is the square root of the variance and is denoted σ:
σ = √σ² = √Var(X)
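A short sketch computing E(X), Var(X), and σ for a discrete distribution; the distribution itself is invented for illustration:

```python
xs = [0, 1, 2, 3]          # values of the random variable X
ps = [0.1, 0.3, 0.4, 0.2]  # P(x) for each value; must sum to 1
assert abs(sum(ps) - 1) < 1e-12

mu = sum(x * p for x, p in zip(xs, ps))               # E(X)
var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))  # Var(X) = E[(X - mu)^2]
sigma = var ** 0.5                                    # standard deviation

print(f"mu={mu}, var={var:.4f}, sigma={sigma:.4f}")
```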
6.3 UNIFORM DISTRIBUTION
Characteristics of the Uniform Distribution
The uniform distribution describes a random variable with a finite number of consecutive
integer values from a to b.
Each value of the random variable is equally likely to occur
Binomial Distribution
The binomial distribution arises when a Bernoulli experiment is repeated n times
In a binomial experiment, X = the number of success in n trials
➔ binomial random variable X is the sum of n independent Bernoulli random
variables
The number of combinations of selecting X objects out of n objects is
nCx = n! / [X! (n − X)!]
P(X = x) is determined by the two parameters n and π. The binomial probability function:
P(X = x) = {n! / [x! (n − x)!]} π^x (1 − π)^(n−x)
Hypergeometric Distribution
The hypergeometric distribution describes sampling without replacement from a finite
population. The probability function is
P(X = x) = [(sCx) × ((N−s)C(n−x))] / (NCn)
Where
• N = population size
• s = number of items of interest in the population
• N – s = number of events not of interest in the population
• n = sample size
• x = number of items of interest in the sample
• n – x = number of events not of interest in the sample
Example: 3 different computers are selected from 10 in the department. 4 of the 10
computers have illegal software loaded. What is the probability that 2 of the 3 selected
computers have illegal software loaded?
o N = 10, n = 3, s = 4, x = 2
P(X = 2 | N = 10, n = 3, s = 4) = (4C2)(6C1) / (10C3) = (6)(6) / 120 = 0.30
➔ The probability that 2 of the 3 selected computers have illegal software loaded is
0.30, or 30%.
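A one-line check of this example with math.comb:

```python
import math

# P(X = 2) for N = 10, n = 3, s = 4, x = 2
p = math.comb(4, 2) * math.comb(6, 1) / math.comb(10, 3)
print(round(p, 2))  # 0.3
```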
Geometric Distribution
The geometric distribution describes the number of Bernoulli trials until the first
success.
Mean: μ = 1/π  Variance: σ² = (1 − π)/π²
Always right-skewed.
Poisson Distribution
The Poisson distribution describes the number of occurrences of an event per unit of time
or space, with mean λ.
Mean: μ = λ  Variance: σ² = λ  Standard Deviation: σ = √λ
Always right-skewed. The larger the λ, the less right-skewed the distribution
Example: X = 3, λ = 2
P(X = 3 | λ = 2) = e^(−λ) λ^x / x! = (2.71828^(−2))(2)³ / 3! ≈ 0.18
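A quick check of this Poisson example in Python:

```python
import math

lam, x = 2, 3
p = math.exp(-lam) * lam ** x / math.factorial(x)  # e^(-2) * 2^3 / 3!
print(round(p, 2))  # 0.18
```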
Optional: Use the Poisson approximation (as an
alternative) to the binomial
• The Poisson distribution may be used to approximate a binomial by setting λ = nπ
• This approximation is helpful when n is large.
• A common rule of thumb says the approximation is adequate if n ≥ 20 and π ≤ .05
❖ Rule of Thumb: If n/N < 0.05, we can use the binomial approximation to the
hypergeometric, using sample size n and π = s/N.
Two useful rules about the mean and standard deviation of a transformed random variable
aX + b, where a and b are any constants (a ≥ 0):
• Rule 1: μ(aX + b) = aμ + b
• Rule 2: σ(aX + b) = aσ
❖ Example: Professor Hardtack gave a tough exam whose scores had μ = 40 and σ = 10
→ raise the mean 20 points. One way: add 20 points to every student’s score;
adding a constant to all X-values shifts the mean but leaves the standard deviation
unchanged (Rule 1 with a = 1).
Alternatively, multiply every exam score by 1.5 (40 × 1.5 = 60); by Rule 2 this also
multiplies the standard deviation by 1.5.
Covariance
When X and Y are dependent, the covariance of them, denoted by Cov(X,Y) or σxy,
describes how the variables vary in relation to each other.
• Cov(X,Y) > 0 : indicates that the two variables move in the same direction
• Cov(X,Y) < 0 : indicates that the two variables move in opposite directions.
We use both the covariance and the variances of X and Y to calculate the standard
deviation of the sum of X and Y
CHAPTER 7
7.1 CONTINUOUS PROBABILITY DISTRIBUTIONS
A Continuous Variable is a variable that can assume any value in an interval:
o thickness of an item
o time required to complete a task
o temperature in a room
These can potentially take on any value, depending only on the ability to measure
accurately.
❖ Discrete Variable: each value of X has its own probability P(X).
❖ Continuous Variable: events are intervals and probabilities are areas underneath
smooth curves. A single point has no probability.
f(x) = [1 / (σ√(2π))] e^(−(x − μ)² / (2σ²))
where μ is the mean and σ is the standard deviation.
The cumulative distribution function:
F(x0) = P(X ≤ x0)
Finding Normal Probabilities
The probability for a range of values is measured by the area under the curve:
P(a < X ≤ b) = F(b) − F(a), where F(b) = P(X ≤ b) and F(a) = P(X ≤ a)
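A minimal sketch of this calculation with SciPy; μ, σ, and the interval below are invented for illustration:

```python
from scipy.stats import norm

mu, sigma = 75, 8  # hypothetical normal population
a, b = 70, 90

# P(a < X < b) = F(b) - F(a)
p = norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)
print(round(p, 4))
```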
Summary of Normal Distributions
7.4 STANDARD NORMAL DISTRIBUTION
Characteristics of the Standard Normal
Any normal distribution can be transformed into the standardized normal distribution
(Z), with mean 0 and variance 1
By subtracting the mean and dividing by the standard deviation to produce a
standardized variable
The shape of the distribution is unaffected by the z transformation, only the scale
changed. We can express the problem in original units (X) or in standardized units (Z).
For a given Z-value a, the table shows F(a) (the area under the curve from −∞ to a)
Rule of thumb: when nπ ≥ 10 and n(1 − π) ≥ 10, it is appropriate to use the normal
approximation to the binomial.
The binomial mean and standard deviation will be equal to the normal µ and σ:
μ = nπ
σ = √[nπ(1 − π)]
Normal Approximation to the Poisson
The normal approximation to the Poisson works best when λ is large.
Set the normal µ and σ equal to the Poisson mean and standard deviation:
μ = λ
σ = √λ
7.6 EXPONENTIAL DISTRIBUTION
Characteristics of the Exponential Distribution
Often used to model the length of time between two occurrences of an event (the time
between arrivals)
Examples:
P(arrival time ≤ x) = 1 − e^(−λx)
Where λ (lambda) = the mean number of arrivals per unit of time (the same λ as in the Poisson distribution)
Finding Probability
Example: Customers arrive at the service counter at the rate of 20 per hour. What is the
probability that the arrival time between consecutive customers is less than 6 minutes?
Answer: λ = 20 per hour and 6 minutes = 0.10 hour, so
P(X < 0.10) = 1 − e^(−(20)(0.10)) = 1 − e^(−2) ≈ .8647
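A quick check of this exponential example in Python:

```python
import math

lam = 20      # mean arrivals per hour
x = 6 / 60    # 6 minutes expressed in hours

p = 1 - math.exp(-lam * x)  # P(arrival time < x) = 1 - e^(-lambda*x)
print(round(p, 4))          # 0.8647
```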
CHAPTER 8
Examples of estimators
Example: Consider eight random samples of size n = 5 from a large population of GMAT scores.
Sample mean is a statistic
Sample mean used to estimate population mean is an estimator
Sampling error
Sampling error: the difference between an estimate and the corresponding population parameter.
Example for the population mean:
Sampling error = x̄ − μ
Properties of Estimators
BIAS
The bias: the difference between the expected value of the estimator and the true parameter. Example
for the average value:
Bias = E(x̄) − μ
An unbiased estimator neither overstates nor understates the true parameter on average. Example of an
unbiased estimator:
E(x̄) = μ
The sample mean (x̄) and sample proportion (p) are unbiased estimators of μ and π
EFFICIENCY
Efficiency refers to the variance of the estimator’s sampling distribution
Smaller variance means a more efficient estimator. We prefer the minimum variance unbiased estimator (MVUE)
CONSISTENCY
Consistent estimator converges toward the parameter being estimated as the sample size increases
The variances of the three estimators x̄, s, and p diminish as n increases, so all are consistent estimators.
1. If the population is normal, the sample mean has a normal distribution centered at μ, with a standard
error equal to σx̄ = σ/√n
2. As sample size n increases, the distribution of sample means converges to the population mean μ
(i.e., the standard error of the mean σx̄ = σ/√n gets smaller).
3. Even if your population is not normal, by the Central Limit Theorem, if the sample size is large
enough, the sample means will have approximately a normal distribution.
• The distribution of sample means drawn from the population will be normal
• The standard error of the sample mean 𝛔𝐗̅ will decrease as sample size increases
SKEWED POPULATION
The CLT predicts
• The distribution of sample means drawn from any population will approach normality
• The standard error of the sample mean 𝛔𝐗̅ will diminish as sample size increases.
In highly skewed populations, even n ≥ 30 will not ensure normality, though it is not a bad rule
In severely skewed populations, the mean is a poor measure of center to begin with due to outliers
We use the familiar z-values for the standard normal distribution. If we know μ and σ, the CLT allows
us to predict the range of sample means for samples of size n:
μ ± z(α/2) · σ/√n
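A simulation sketch of this claim: sample means drawn from a skewed (here, exponential) population are approximately normal, with standard error near σ/√n. The population, sample size, and seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(42)
pop_mean, n, reps = 1.0, 30, 10_000  # exponential population: mu = sigma = 1.0

# Draw 'reps' samples of size n and take the mean of each
means = rng.exponential(scale=pop_mean, size=(reps, n)).mean(axis=1)

print(f"mean of sample means: {means.mean():.4f} (should be near {pop_mean})")
print(f"observed std error: {means.std(ddof=1):.4f}, "
      f"predicted sigma/sqrt(n): {pop_mean / np.sqrt(n):.4f}")
```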
8.3 SAMPLE SIZE AND STANDARD ERROR
The key is the standard error of the mean: σx̄ = σ/√n. The standard error decreases as n increases.
To halve (÷2) the standard error, you must quadruple (×4) the sample size
You can make the standard error σx̄ = σ/√n as small as you want by increasing n. The mean of the
sample means x̄ converges to the true population mean μ as n increases.
Construct a confidence interval for the unknown mean μ by adding and subtracting a margin of error
from x̄, the mean of our random sample:
x̄ ± z(α/2) · σ/√n (for known σ)
The confidence level for this interval is expressed as a percentage such as 90, 95, or 99 percent
Interpretation
If you took 100 random samples from the same population and used exactly this procedure to construct
100 confidence intervals using a 95 percent confidence level
➔ approximately 95 (95%) of the intervals would contain the true mean μ, while approximately 5 (5%)
intervals would not
Student’s t Distribution
When σ is unknown → the formula for a confidence interval resembles the formula for known σ except
that t replaces z and s replaces σ.
The confidence intervals will be wider (other things being the same) - tα/2 is always greater than zα/2.
Degrees of Freedom
Knowing the sample size allows us to calculate a parameter called degrees of freedom (d.f. = n − 1 for a
one-sample mean), used to determine the value of the t statistic in the confidence interval formula.
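A minimal sketch of a t-based confidence interval with SciPy (the sample data are invented):

```python
import numpy as np
from scipy.stats import t

data = np.array([42, 47, 39, 51, 45, 48, 44, 50])
n, xbar, s = len(data), data.mean(), data.std(ddof=1)

t_crit = t.ppf(0.975, df=n - 1)   # t_(alpha/2) for 95% confidence, d.f. = n - 1
margin = t_crit * s / np.sqrt(n)  # margin of error

print(f"{xbar:.2f} +/- {margin:.2f} -> ({xbar - margin:.2f}, {xbar + margin:.2f})")
```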
Comparison of z and t
As degrees of freedom increase, the t-values approach the familiar normal z-values.
Using Appendix D
Beyond d.f. = 50, Appendix D shows d.f. in steps of 5 or 10. If Appendix D does not show the exact degrees
of freedom that you want, use the t-value for the next lower d.f.
Confidence Interval for a Proportion (π)
The width of the confidence interval for π depends on:
• Sample size n
• Confidence level
• Sample proportion p
A narrower interval (i.e., more precision) → increase the sample size or reduce the confidence level
(e.g., from 95 percent to 90 percent)
Rule of Three
If in n independent trials, no events occur, the upper 95% confidence bound is approximately 3/n
Sample Size to Estimate π
The sample size for a desired margin of error E is n = (z/E)² π(1 − π). Three ways to choose the
planning value π:
Method 1: Assume That π = .50
Method 2: Take a Preliminary Sample
Take a small preliminary sample and insert p into the sample size formula in place of π
Method 3: Use a Prior Sample or Historical Data
Confidence interval for the population variance σ²:
(n − 1)s² / χ²U ≤ σ² ≤ (n − 1)s² / χ²L
Chapter 9
9.1 LOGIC OF HYPOTHESIS TESTING
The analyst states the assumption, called a hypothesis, in a format that can be tested using well-known
statistical procedures.
Hypothesis testing is an ongoing, iterative process.
• Rejecting the null hypothesis when it is true is a Type I error (a false positive).
• Failure to reject the null hypothesis when it is false is a Type II error (a false negative).
The power of a test is the probability that a false null hypothesis will be rejected: power = 1 − β.
Reducing β correspondingly increases power (e.g., by increasing the sample size)
Both α and β can be reduced simultaneously only by increasing the sample size
The direction of the test is indicated by which way the inequality symbol points in H1:
Decision Rule
Compare a sample statistic to the hypothesized value of the population parameter stated in the null
hypothesis
• Extreme outcomes occurring in the left tail → reject the null hypothesis in a left-tailed test
• Extreme outcomes occurring in the right tail → reject the null hypothesis in a right-tailed test
The area under the sampling distribution curve that defines an extreme outcome: the rejection region
Calculating a test statistic that measures the difference between the sample statistic and the
hypothesized parameter
➔ A test statistic that falls in the shaded region → rejection of H0
Critical Value
The critical value: the boundary between the two regions (reject H0, do not reject H0).
The decision rule states what the critical value of the test statistic would have to be in order to reject
H0 at the chosen level of significance (α).
The choice of α should precede the calculation of the test statistic, thereby minimizing the temptation to
select an α that gives the desired result.
p-Value Method
The p-value is a direct measure of the likelihood of the observed sample under H0
Compare the p-value with the level of significance.
• If the p-value is smaller than α, the sample contradicts the null hypothesis → reject H0
Two-Tailed Test
Reject H0 if zcalc > + zα/2 or if zcalc < - zα/2
Otherwise do not reject H0
USING THE P-VALUE APPROACH
In a two-tailed test, the decision rule using the p-value is the same as in a one-tailed test.
Equivalently, reject H0 if the hypothesized value μ0 falls outside the confidence interval:
μ0 ∉ [x̄ − z(α/2) · σ/√n ; x̄ + z(α/2) · σ/√n]
Using Student’s t
When the population standard deviation σ is unknown and the population may be assumed normal
(generally symmetric with no outliers), use Student’s t with d.f. = n − 1:
tcalc = (x̄ − μ0) / (s/√n)
SENSITIVITY TO α
Decision is affected by our choice of α. Example:
Two-Tailed Test
CALCULATING A P-VALUE FOR A TWO-TAILED TEST
In a two-tailed test, p-value = 2 × P(Z > |zcalc|)
Reject H0 if 2 × P(Z > |zcalc|) < α
Otherwise fail to reject H0
Effect of α
The test statistic zcalc is the same regardless of our choice of α, however, our choice of α does affect the
decision.
Which level of significance is the “right” one depends on how much Type I error we are willing to allow.
Smaller Type I error leads to increased Type II error
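A sketch of the two-tailed z test and its p-value; all numbers below are invented for illustration:

```python
from math import sqrt
from scipy.stats import norm

# H0: mu = 50 vs. H1: mu != 50, with known sigma
mu0, sigma, n, xbar, alpha = 50, 6, 36, 52.3, 0.05

z_calc = (xbar - mu0) / (sigma / sqrt(n))
p_value = 2 * (1 - norm.cdf(abs(z_calc)))  # p-value = 2 * P(Z > |z_calc|)

print(f"z = {z_calc:.3f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```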
Chapter 10
10.1 TWO-SAMPLE TESTS
• A one-sample test compares a sample estimate against a non-sample benchmark
• A Two-sample test compares two sample estimates with each other
Format of Hypotheses
Test Statistic
The sample statistic used to test the parameter μ1 − μ2 is x̄1 − x̄2. The test statistic will follow the same
general format as the z- and t-scores in Chapter 9
Knowing the values of the population variances, σ12 and σ22, the test statistic: z-score
➔ Use the standard normal distribution to find p-values or critical values of zα.
By assuming that the population variances are equal → pool the sample variances by taking a weighted
average of s1² and s2² → calculate an estimate of the common population variance, aka the pooled
variance, denoted sp²:
sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
CASE 3: UNKNOWN VARIANCES ASSUMED UNEQUAL
Replacing σ12 and σ22 with the sample variances s12 and s22
Common situation of testing for a zero difference (D0 = 0)
Paired t Test
In the paired t Test we define a new variable d = X1 - X2 as the difference between X1 and X2.
The two samples are reduced to one sample of n differences d1, d2, . . . , dn. Presenting the n observed
differences in column form:
or row form:
We calculate the mean d̄ and standard deviation sd of the sample of n differences d1, d2, . . . , dn with the
usual formulas for a mean and standard deviation.
The population variance of d is unknown → a paired t test using Student’s t with d.f. = n − 1 to compare
the sample mean difference d̄ with a hypothesized difference μd (usually μd = 0):
tcalc = (d̄ − μd) / (sd/√n)
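A minimal paired t test sketch (the before/after data are invented); scipy.stats.ttest_rel gives the same result:

```python
import numpy as np
from scipy.stats import t

x1 = np.array([210, 195, 188, 220, 205])  # e.g., "before" measurements
x2 = np.array([202, 196, 180, 215, 199])  # e.g., "after" measurements
d = x1 - x2                               # one sample of n differences

n, dbar, sd = len(d), d.mean(), d.std(ddof=1)
t_calc = (dbar - 0) / (sd / np.sqrt(n))    # H0: mu_d = 0, d.f. = n - 1
p_value = 2 * t.sf(abs(t_calc), df=n - 1)  # two-tailed p-value

print(f"t = {t_calc:.3f}, p = {p_value:.4f}")
```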
Pooled Proportion
If H0 is true → no difference between π1 and π2
➔ the samples can be pooled into one “big” sample → estimate the combined population proportion:
pc = (x1 + x2) / (n1 + n2)
Test Statistic
Testing for zero difference:
zcalc = (p1 − p2) / √[pc(1 − pc)(1/n1 + 1/n2)]
The rule of thumb for assuming normality is that np ≥ 10 and n(1 − p) ≥ 10 for each sample
10.7 COMPARING TWO VARIANCES
Format of Hypotheses
An equivalent way to state these hypotheses is to look at the ratio of the two variances
The F Test
The test statistic is the ratio of the sample variances, Fcalc = s1²/s2², with d.f.1 = n1 − 1 and
d.f.2 = n2 − 1. Assuming the populations are normal, the test statistic follows the F distribution.
If the null hypothesis of equal variances is true, this ratio should be near 1.
Two-Tailed F Test
Critical values for the F test are denoted FL (left tail) and FR (right tail)
A right-tail critical value FR: found from Appendix F using d.f1. and d.f2.
FR = Fdf1, df2 (right-tail critical F)
To obtain a left-tail critical value FL we reverse the numerator and denominator degrees of freedom:
FL = 1 / F(d.f.2, d.f.1) (left-tail critical F with reversed d.f.1 and d.f.2)
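A sketch of the two-tailed F test with SciPy, using invented sample variances:

```python
from scipy.stats import f

s1_sq, n1 = 24.5, 13  # sample variance and size, group 1
s2_sq, n2 = 9.8, 10   # sample variance and size, group 2

F_calc = s1_sq / s2_sq
df1, df2 = n1 - 1, n2 - 1
FR = f.ppf(0.975, df1, df2)      # right-tail critical value at alpha = .05
FL = 1 / f.ppf(0.975, df2, df1)  # left-tail value via reversed d.f.

print(f"F = {F_calc:.2f}; reject H0 if F < {FL:.3f} or F > {FR:.3f}")
```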
CHAPTER 11
11.1 OVERVIEW OF ANOVA
Analysis of Variance (ANOVA) allows one to compare more than two means simultaneously.
Variation in Y about its mean is explained by one or more categorical independent variables (the
factors) or is unexplained (random error).
N-FACTOR ANOVA
ANOVA Assumptions
Analysis of variance assumes that
• H0: µ1 = µ2 = µ3 =…= µc
• H1: Not all the means are equal
If we cannot reject H0, we conclude that observations within each treatment have the same mean µ.
n = n1 + n2 + … + nc
Hypotheses to Be Tested
The question of interest is whether the mean of Y varies from treatment to treatment.
o H0: μ1 = μ2 = . . . = μc (all the treatment means are equal)
o H1: Not all the means are equal (at least one pair of treatment means differs)
Random error is assumed to be normally distributed with zero mean and the same variance.
If we are interested only in what happens to the response for the particular levels of the factor (a
fixed-effects model)
If the null hypothesis is true (Tj = 0 for all j), the ANOVA model collapses to yij = μ + εij
If the null hypothesis is false, the Tj that are negative (below μ) must be offset by the Tj that are
positive (above μ) when weighted by sample size.
Decomposition of Variation
Group Means
The mean of each group is calculated in the usual way by summing the observations in the treatment and
dividing by the sample size
The overall sample mean or grand mean ȳ can be calculated by summing all n observations and
dividing by n
Decision Rule
Use Appendix F to obtain the right-tail critical value of F - denoted Fdf1,df2 or Fc-1,n-c
e.g.: μ1 = μ2 ≠ μ3
Tukey’s studentized range test is a multiple comparison test
Tukey’s is a two-tailed test for simultaneous comparison of equality of paired means from c groups
The hypotheses to compare group j with group k:
Hartley’s Test
Hartley’s test statistic is the ratio of the largest sample variance to the smallest sample variance:
Hcalc = s²max / s²min
o Do not reject H0 if Hcalc ≈ 1 (the variances are the same)
o Reject H0 if Hcalc > Hcritical
Critical values of Hcritical may be found in Hartley’s critical value table using degrees of freedom:
• Numerator: df1 = c
• Denominator: df2 = n/c − 1
where c is the number of groups and n is the total sample size
FACTOR A
• H0: A1 = A2 =. . . = Ar = 0 (row means are the same)
• H1: Not all the Aj are equal to zero (row means differ)
FACTOR B
• H0: B1 = B2 =. . . = BC = 0 (column means are the same)
• H1: Not all the BK are equal to zero (column means differ)
If we are unable to reject either null hypothesis
➔ all variation in Y: a random disturbance around the mean μ:
yjk = μ + εjk
Randomized Block Model
When only one factor is of research interest and the other factor is merely used to control for potential
confounding influences
In the randomized block model
• the column effects: treatments (as in one-factor ANOVA → the effect of interest)
• the row effects: blocks
A randomized block model looks like a two-factor ANOVA and is computed exactly like a two-factor ANOVA
Interpretation may resemble a one-factor ANOVA since only the column effects (treatments) are of interest
Format of Calculation of Nonreplicated Two-Factor ANOVA
INTERACTION EFFECT
• H0: All the ABjk = 0 (there is no interaction effect)
• H1: Not all ABjk = 0 (there is an interaction effect)
Format of Data
Data Format of Replicated Two-Factor ANOVA
Sources of Variation
The total sum of squares is partitioned into four components: SST = SSA + SSB + SSI + SSE
(factor A, factor B, the interaction, and random error)
• In the absence of an interaction, the lines will be roughly parallel or will tend to move in the same
direction at the same time.
• With a strong interaction, the lines will have differing slopes and will tend to cross one another
• Significant differences at α = .05 between clinics C, D and between suppliers (1, 4) and (3, 5).
• At α = .01 there is also a significant difference in means between one pair of suppliers (4, 5).
Significance versus Importance
MegaStat’s table of means (Figure 11.23) allows us to explore these differences further and to assess the
question of importance as well as significance.
The largest differences in means between clinics or suppliers are about 2 days. Such a small difference
might be unimportant most of the time.
However, if their inventory is low, a 2-day difference could be important
Chapter 12
12.1 VISUAL DISPLAYS AND CORRELATION ANALYSIS
Visual Displays
Analysis of bivariate data (i.e., two variables) typically begins with a scatter plot that displays each
observed data pair (xi, yi) as a dot on an X-Y grid.
➔ initial idea of the relationship between two random variables.
Correlation Coefficient
Sample correlation coefficient (Pearson correlation coefficient) - denoted r - measures the degree of
linearity in the relationship between two random variables X and Y.
• Negative correlation:
o xi is above its mean
o yi is below its mean
• Positive correlation: xi and yi are above/below their means at the same time
Three terms called sums of squares:
SSxx = Σ(xi − x̄)², SSyy = Σ(yi − ȳ)², SSxy = Σ(xi − x̄)(yi − ȳ)
r = SSxy / (√SSxx · √SSyy)
Correlation coefficient only measures the degree of linear relationship between X and Y
Tests for Significant Correlation Using Student’s t
The sample correlation coefficient r is an estimate of the population correlation coefficient ρ. To test
H0: ρ = 0, the test statistic is tcalc = r √[(n − 2) / (1 − r²)].
Compare this t test statistic with a critical value of t for a one-tailed or two-tailed test from Appendix D
using d.f. = n − 2 and any desired α.
First: look up the critical value of t from Appendix D with d.f. = n − 2 degrees of freedom and chosen α,
then:
rcritical = t / √(t² + n − 2)
• a benchmark for the correlation coefficient
• no p-value
• inflexible when changing α
In very large samples, even very small correlations could be “significant”
A larger sample does not mean that the correlation is stronger nor does its increased significance imply
increased importance.
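A sketch of this test on invented data pairs:

```python
import numpy as np
from scipy.stats import t

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 2.9, 3.3, 4.8, 5.1, 5.9, 7.2, 7.8])
n = len(x)

r = np.corrcoef(x, y)[0, 1]                   # sample correlation coefficient
t_calc = r * np.sqrt((n - 2) / (1 - r ** 2))  # t test statistic, d.f. = n - 2
p_value = 2 * t.sf(abs(t_calc), df=n - 2)     # two-tailed p-value

print(f"r = {r:.4f}, t = {t_calc:.3f}, p = {p_value:.5f}")
```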
12.2 SIMPLE REGRESSION
What Is Simple Regression?
The simple linear model in slope-intercept form: Y = slope × X + y-intercept. In statistics this straight-line
model is referred to as a simple regression equation.
Inclusion of a random error ε is necessary because other unspecified variables also may affect Y
The regression model without the error term represents the expected value of Y for a given x value;
this is called the simple regression equation
• Assumption 1: The errors are normally distributed with mean 0 and standard deviation σ.
• Assumption 2: The errors have constant variance, σ2.
• Assumption 3: The errors are independent of each other.
The regression equation used to predict the expected value of Y for a given value of X:
The difference between the observed value yi and its estimated value ŷi is called a residual, denoted ei.
The residual is the vertical distance between each yi and the estimated regression line on a scatter
plot of (xi,yi) values.
The fitted coefficients b0 and b1 are chosen so that the fitted linear model ŷ = b0 + b1x has the
smallest possible sum of squared residuals (SSE):
SSE = Σ ei² = Σ (yi − ŷi)²
Differential calculus is used to obtain the coefficient estimators b0 and b1 that minimize SSE.
The OLS formula for the slope can also be written as:
b1 = SSxy / SSxx, with intercept b0 = ȳ − b1x̄
Sources of Variation in Y
The total variation as a sum of squares (SST), split the SST into two parts:
Coefficient of Determination
The coefficient of determination: the portion of the total variation in the dependent variable that is
explained by variation in the independent variable
R² = SSR/SST = 1 − SSE/SST, noting that 0 ≤ R² ≤ 1
Examples of Approximate R2 Values: The range of the coefficient of determination is 0 ≤ R2 ≤ 1.
σ̂² = se² = SSE / (n − 2) = (Σ ei²) / (n − 2)
Division by n – 2 → the simple regression model uses two estimated parameters, b0 and b1
se = √(se²) is the standard error of the estimate
The magnitude of se should always be judged relative to the size of the y values in the sample data
INFERENCES ABOUT THE REGRESSION MODEL
The variance of the regression slope coefficient (b1) is estimated by
s²b1 = se² / Σ(xi − x̄)² = se² / [(n − 1)sx²]
where:
se = √[SSE / (n − 2)] = standard error of the estimate
These standard errors → construct confidence intervals for the true slope and intercept
using Student’s t with d.f. = n - 2 degrees of freedom and any desired confidence level.
Hypothesis Tests
If β1 = 0 ➔ X does not influence Y
→ the regression model collapses to a constant β0 plus a random error term
For either coefficient, we use a t test with d.f. = n − 2. For the slope, the hypotheses are
H0: β1 = 0 versus H1: β1 ≠ 0, with test statistic tcalc = b1 / sb1
SLOPE VERSUS CORRELATION
The test for zero slope is the same as the test for zero correlation.
➔ The t test for zero slope will always yield exactly the same tcalc as the t test for zero correlation
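A closing sketch tying the regression formulas together: OLS slope and intercept, R², the standard error of the estimate, and the t statistic for zero slope, all computed from the sums of squares (the data pairs are invented):

```python
import numpy as np

x = np.array([2, 4, 5, 7, 8, 10])
y = np.array([5.1, 8.2, 9.7, 13.5, 14.1, 18.0])
n = len(x)

ss_xx = np.sum((x - x.mean()) ** 2)
ss_xy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = ss_xy / ss_xx             # OLS slope
b0 = y.mean() - b1 * x.mean()  # OLS intercept

y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)     # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_sq = 1 - sse / sst               # coefficient of determination
se = np.sqrt(sse / (n - 2))        # standard error of the estimate
s_b1 = se / np.sqrt(ss_xx)         # standard error of the slope
t_calc = b1 / s_b1                 # t statistic for H0: beta1 = 0, d.f. = n - 2

print(f"y-hat = {b0:.3f} + {b1:.3f}x, R^2 = {r_sq:.4f}, t = {t_calc:.2f}")
```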