STA248
Jenci Wei
Winter 2022
Contents
1 Statistics and Sampling Distributions
2 Point Estimation
3 Statistical Intervals Based On a Single Sample
4 Tests of Hypotheses Based on a Single Sample
5 Inferences Based on Two Samples
6 Regression and Correlation
7 Analysis of Variance
8 Logistic Regression
9 Chi-Squared Tests (Extra)
10 Bayesian Estimation (Extra)
1 Statistics and Sampling Distributions
Statistic: any quantity whose value can be calculated from sample data
• E.g. ȳ, s²
Population parameter: an unknown numerical characteristic of the population
• We want to conduct statistical inference about the population parameter
• E.g. µ, σ²
Population information
• Size of population: N
• Population mean: µ
• Population variance: σ²
• Population distribution: Y
Sample information
• Sample size: n
• Samples: y1 , y2 , . . . , yn
• Sample mean: ȳ
• Sample variance: s²
• Mean of the sampling distribution of ȳ: µ_ȳ = E(ȳ) = µ
• Standard deviation of the sampling distribution of ȳ: σ_ȳ = σ/√n
– Called the standard error of the mean
Central Limit Theorem
• Refinement of the law of large numbers
• For a large number (n ≥ 30) of iid RVs y1, . . . , yn with finite variance, the average ȳ is approximately normally distributed, no matter what the distribution of the yi is
• Let y1, . . . , yn be iid RVs with E(yi) = µ and V(yi) = σ² < ∞. Define
Zn = (ȳ − µ)/(σ/√n)
Then Zn approximately follows the standard normal distribution for a large sample size n ≥ 30, i.e. Zn ∼ N(0, 1) for n ≥ 30
– If σ is unknown, then
Zn = (ȳ − µ)/(s/√n) ∼ N(0, 1),
where s is the sample SD
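To make the CLT concrete, here is a minimal simulation sketch in Python (NumPy assumed; the Exponential(1) population, n = 30, and 10,000 replications are illustrative choices, not from the notes): standardized sample means from a skewed population should behave like N(0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 10_000            # sample size and number of replications (illustrative)
mu, sigma = 1.0, 1.0            # mean and SD of the Exponential(1) population

# Draw `reps` samples of size n and compute Z_n = (ybar - mu) / (sigma / sqrt(n))
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))

# The empirical mean and SD of Z_n should be close to 0 and 1,
# and a histogram of z should look like the standard normal density
print(z.mean(), z.std())
```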
The Sampling Distribution of the Sample Proportion
• Consider an event A in the sample space of some experiment with p = P (A). Let y be the number of
times A occurs when the experiment is repeated n independent times, and define the sample proportion
p̂ = y/n. Then
1. E(p̂) = p
2. V(p̂) = p(1 − p)/n and σp̂ = √(p(1 − p)/n)
3. As n increases, the distribution of p̂ approaches a normal distribution
– p̂ is approximately normal, provided that np ≥ 10 and n(1 − p) ≥ 10
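A short simulation sketch (NumPy assumed; p = 0.3 and n = 200 are illustrative) checking the properties of p̂ listed above:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 200, 10_000      # true proportion, sample size, replications (illustrative)

# Each replication counts successes in n independent trials, then forms p-hat = y / n
p_hat = rng.binomial(n, p, size=reps) / n

print(p_hat.mean(), p)                          # E(p-hat) = p
print(p_hat.std(), np.sqrt(p * (1 - p) / n))    # sigma_p-hat = sqrt(p(1-p)/n)
```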
Gosset’s Theorem
• Let Z ∼ N(0, 1) and let W be a χ²-distributed RV with ν df, with Z and W independent. Then
T = Z/√(W/ν)
has a t distribution with ν degrees of freedom
• Let W1 and W2 be independent χ²-distributed RVs with ν1 and ν2 df, respectively. Then
F = (W1/ν1)/(W2/ν2)
has an F distribution with ν1 numerator degrees of freedom and ν2 denominator degrees of freedom
2 Point Estimation
An estimator is a rule, often expressed as a formula, that tells how to calculate the value of an estimate
based on the measurements contained in a sample
• E.g. the sample mean ȳ = (1/n) Σ_{i=1}^n yi is one possible point estimator of the population mean µ
• A point estimator θ̂ is unbiased if E(θ̂) = θ
– Otherwise θ̂ is biased
• The bias of a point estimator θ̂ is B(θ̂) = E(θ̂) − θ
• The mean squared error of θ̂ is MSE(θ̂) = E[(θ̂ − θ)²] = V(θ̂) + B(θ̂)²
Confidence Intervals
• An interval estimator is a rule specifying the method for using the sample measurements to calculate two numbers that form the endpoints of the interval
1. We want the interval to contain the target parameter θ
2. We want the interval to be narrow
Relative Efficiency
• Given two unbiased estimators θ̂1 and θ̂2 of a parameter θ, the efficiency of θ̂1 relative to θ̂2, denoted eff(θ̂1, θ̂2), is the ratio
eff(θ̂1, θ̂2) = V(θ̂2)/V(θ̂1)
Consistency
• An unbiased estimator θ̂n for θ is a consistent estimator of θ if
lim_{n→∞} V(θ̂n) = 0
Likelihood Function
• Let y1 , . . . , yn be sample observations taken on corresponding RVs Y1 , . . . , Yn whose distributions de-
pend on a parameter θ. If Y1 , . . . , Yn are discrete RVs, the likelihood of the sample, L(y1 , . . . , yn |θ),
is defined to be the joint probability of y1 , . . . , yn .
– If Y1, . . . , Yn are continuous RVs, the likelihood L(y1, . . . , yn |θ) is the joint density evaluated at y1, . . . , yn
The Method of Moments
• The kth population moment is µ′k = E(Y^k); the corresponding kth sample moment is mk = (1/n) Σ_{i=1}^n yi^k
• The method of moments is based on the idea that sample moments should provide good estimates of the corresponding population moments: equate mk to µ′k for as many moments as there are parameters and solve for the parameters
The Method of Maximum Likelihood
• Suppose that the likelihood function depends on k parameters θ1 , . . . , θk . Choose the estimates of those
parameters that maximize the likelihood L(y1 , . . . , yn |θ1 , . . . , θk )
• The likelihood function is a function of the parameters θ1 , . . . , θk
– We sometimes write the likelihood function as L(θ1, . . . , θk)
• Maximum likelihood estimators are referred to as MLEs
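As an illustration of maximum likelihood, here is a minimal numerical sketch (SciPy assumed; the Exponential(rate θ) model and the data are invented for the example). It maximizes the likelihood by minimizing the negative log-likelihood, then compares against the closed-form MLE 1/ȳ for this model.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative sample assumed to come from an Exponential distribution with rate theta
y = np.array([0.8, 1.3, 0.2, 2.1, 0.9, 1.7, 0.4, 1.1])

def neg_log_likelihood(theta):
    # L(theta) = prod_i theta * exp(-theta * y_i), so -log L = -n log(theta) + theta * sum(y_i)
    return -(len(y) * np.log(theta) - theta * y.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50), method="bounded")
print(res.x)          # numerical MLE of theta
print(1 / y.mean())   # closed-form MLE for this model, for comparison
```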
3 Statistical Intervals Based On a Single Sample
Confidence Interval for Proportion
• Whenever we estimate the SD of a sampling distribution, we call it a standard error
• For a sample proportion p̂, the standard error is
SE(p̂) = √(p̂q̂/n), where q̂ = 1 − p̂
• The 100(1 − α)% confidence interval for the population proportion p is
p̂ ± Zα/2 SE(p̂)
– 100(1−α)% of samples this size will produce confidence intervals that capture the true proportion
– We are 100(1 − α)% confident that the true proportion lies in our interval
• The extent of the interval on either side of p̂ is called the margin of error (ME):
ME = Zα/2 SE(p̂)
– Zα/2 is called the critical value and α is called the level of significance
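A small sketch (SciPy assumed; the counts y = 130, n = 400, and α = 0.05 are illustrative) computing this interval directly from the formulas above:

```python
import numpy as np
from scipy.stats import norm

y, n, alpha = 130, 400, 0.05               # successes, trials, significance level (illustrative)

p_hat = y / n
se = np.sqrt(p_hat * (1 - p_hat) / n)      # SE(p-hat) = sqrt(p-hat * q-hat / n)
z = norm.ppf(1 - alpha / 2)                # critical value Z_{alpha/2}
me = z * se                                # margin of error

print(p_hat - me, p_hat + me)              # 100(1 - alpha)% CI for p
```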
A Confidence Interval for the Mean
• 100(1 − α)% confidence interval for the population mean µ:
ȳ ± tn−1, α/2 SE(ȳ),
where the standard error of the mean SE(ȳ) = s/√n
• If n ≥ 30, then 100(1 − α)% confidence interval for the population mean µ:
ȳ ± Zα/2 SE(ȳ),
where the standard error of the mean SE(ȳ) = s/√n
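A matching sketch for the t-based interval (SciPy assumed; the sample values are illustrative):

```python
import numpy as np
from scipy.stats import t

y = np.array([23.1, 25.4, 22.8, 26.0, 24.3, 23.9, 25.1, 24.7])   # illustrative sample
alpha = 0.05

n, ybar, s = len(y), y.mean(), y.std(ddof=1)
se = s / np.sqrt(n)                          # SE(ybar) = s / sqrt(n)
crit = t.ppf(1 - alpha / 2, df=n - 1)        # critical value t_{n-1, alpha/2}

print(ybar - crit * se, ybar + crit * se)    # 100(1 - alpha)% CI for mu
```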
Confidence Interval for σ²
• Assuming a normal population, the 100(1 − α)% confidence interval for σ² is
((n − 1)s²/χ²_{α/2, n−1}, (n − 1)s²/χ²_{1−α/2, n−1})
4 Tests of Hypotheses Based on a Single Sample
Test of Hypothesis
• Statistical hypothesis: a statement about the numerical value of a population parameter
– E.g. population mean, population SD
• Null hypothesis (H0 ): some claim about the population parameter that the researcher wants to test
– Either reject or not reject
• Alternative hypothesis (Ha ): the values of a population parameter for which the researcher wants
to gather evidence to support
– E.g.
H0 : µ ≤ 24
Ha : µ > 24
• Test statistic: a sample statistic, computed from information provided in the sample
– Used to decide between the null and alternative hypotheses
• Type I error: the researcher rejects the null hypothesis when H0 is true
• Observed significance level (p-value): the probability, assuming that H0 is true, of observing a
value of the test statistic that is at least as contradictory to the null hypothesis, and supportive of the
alternative hypothesis, as the actual one computed from the sample data
Large-Sample α-Level Hypothesis Tests
• H0 : θ = θ0
• Ha: θ > θ0 (upper-tail alternative), θ < θ0 (lower-tail alternative), or θ ≠ θ0 (two-tailed alternative)
• H0: µ = µ0
• Ha: µ > µ0 (upper-tail alternative), µ < µ0 (lower-tail alternative), or µ ≠ µ0 (two-tailed alternative)
• Test statistic: t = (ȳ − µ0)/(S/√n),
where ȳ is the sample mean and S is the sample SD
• Rejection region: {t > tα,n−1} (upper-tail RR), {t < −tα,n−1} (lower-tail RR), or {|t| > tα/2,n−1} (two-tailed RR)
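A minimal sketch of this t test (SciPy assumed; the data and µ0 = 24 are illustrative), computed both from the formula and with SciPy's built-in test as a cross-check:

```python
import numpy as np
from scipy import stats

y = np.array([25.2, 26.1, 24.8, 27.3, 25.9, 26.5, 24.4, 26.8])   # illustrative sample
mu0 = 24                                                          # hypothesized mean under H0

n = len(y)
t_stat = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(n))   # t = (ybar - mu0) / (s / sqrt(n))
p_value = 1 - stats.t.cdf(t_stat, df=n - 1)                 # upper-tail alternative Ha: mu > mu0

print(t_stat, p_value)
print(stats.ttest_1samp(y, mu0, alternative="greater"))     # built-in test for comparison
```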
5 Inferences Based on Two Samples
Comparing two population means: independent sampling – large-sample case
• Properties of the sampling distribution of y 1 − y 2
1. The mean of the sampling distribution of y 1 − y 2 is µ1 − µ2
– µ1 and µ2 are the means of the two populations
2. If the two samples are independent, then the SD of the sampling distribution is
σ_{ȳ1−ȳ2} = √(σ1²/n1 + σ2²/n2)
– σ1² and σ2² are the variances of the two populations being sampled
– n1 and n2 are the respective sample sizes
– σ_{ȳ1−ȳ2} is also referred to as the standard error of the statistic ȳ1 − ȳ2
3. By the CLT, the sampling distribution of y 1 − y 2 is approximately normal for large samples
• When σ1² and σ2² are known, the 100(1 − α)% CI for µ1 − µ2 is
(ȳ1 − ȳ2) ± Zα/2 √(σ1²/n1 + σ2²/n2)
• When σ1² and σ2² are unknown, the 100(1 − α)% CI for µ1 − µ2 is
(ȳ1 − ȳ2) ± Zα/2 √(s1²/n1 + s2²/n2)
• Tests of hypotheses for µ1 − µ2
– H0: µ1 − µ2 = D0
– Ha: µ1 − µ2 > D0 (upper-tail alternative), µ1 − µ2 < D0 (lower-tail alternative), or µ1 − µ2 ≠ D0 (two-tailed alternative)
• Small-sample case
– Assumptions
1. Independent samples
2. Samples are from normal distribution
3. σ12 = σ22
– Test statistic
T = (ȳ1 − ȳ2 − D0) / (Sp √(1/n1 + 1/n2)) ∼ t_{n1+n2−2},
where Sp² = [(n1 − 1)S1² + (n2 − 1)S2²]/(n1 + n2 − 2) is the pooled sample variance
• Large-sample case
– Test statistic when σ1² and σ2² are known:
Zc = (ȳ1 − ȳ2 − D0) / √(σ1²/n1 + σ2²/n2) ∼ N(0, 1)
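A sketch of the small-sample pooled t test with D0 = 0 (SciPy assumed; both samples are illustrative):

```python
import numpy as np
from scipy import stats

y1 = np.array([12.1, 13.4, 11.8, 14.0, 12.7, 13.2])   # illustrative sample from population 1
y2 = np.array([10.9, 12.0, 11.5, 10.4, 12.3, 11.1])   # illustrative sample from population 2

n1, n2 = len(y1), len(y2)
sp2 = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)  # pooled variance
t_stat = (y1.mean() - y2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n1 + n2 - 2))   # two-tailed alternative

print(t_stat, p_value)
print(stats.ttest_ind(y1, y2, equal_var=True))   # built-in pooled t test for comparison
```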
• Properties of the sampling distribution of p̂1 − p̂2
1. E(p̂1 − p̂2) = p1 − p2
2. If the two samples are independent, then V(p̂1 − p̂2) = p1(1 − p1)/n1 + p2(1 − p2)/n2
3. If the sample sizes n1 and n2 are large, the sampling distribution of p̂1 − p̂2 is approximately normal
• Assumptions and conditions when comparing proportions
1. Randomization condition: the data in each group is drawn independently and at random from
the target population
2. The (least important) 10% condition: the sample is less than 10% of the population
3. Independent group assumption: the two groups we are comparing are independent of each
other
4. Success/failure conditions: both groups are big enough so that at least 10 successes and at
least 10 failures have been observed in each group
• In the large-sample case, the 100(1 − α)% CI for p1 − p2 is
(p̂1 − p̂2) ± Zα/2 √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
– Test statistic:
Zc = (p̂1 − p̂2) / √(p̂(1 − p̂)(1/n1 + 1/n2)),
where the pooled sample proportion is
p̂ = (n1p̂1 + n2p̂2)/(n1 + n2)
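A sketch of the two-proportion z test and interval (SciPy assumed; the counts are illustrative): the test uses the pooled p̂, while the confidence interval uses the unpooled standard error from above.

```python
import numpy as np
from scipy.stats import norm

y1, n1 = 72, 200      # successes and trials in group 1 (illustrative)
y2, n2 = 54, 210      # successes and trials in group 2 (illustrative)
alpha = 0.05

p1, p2 = y1 / n1, y2 / n2
p_pool = (y1 + y2) / (n1 + n2)                              # pooled p-hat under H0: p1 = p2

z_c = (p1 - p2) / np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 2 * (1 - norm.cdf(abs(z_c)))                      # two-tailed alternative

se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)       # unpooled SE for the CI
me = norm.ppf(1 - alpha / 2) * se
print(z_c, p_value)
print((p1 - p2) - me, (p1 - p2) + me)                       # 100(1 - alpha)% CI for p1 - p2
```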
Paired Samples and Blocks: Paired t-Test
• Paired data
– The two results in each pair are dependent on each other
– Since we care about the difference, we can work with the differences alone and ignore the original columns
– Use simple one-sample t-test
– Sample size is the number of pairs
• Hypotheses: H0: µd = d0 versus Ha: µd > d0, µd < d0, or µd ≠ d0, where µd is the population mean difference
• Test statistic
t = (x̄d − d0)/(sd/√nd) ∼ t_{nd−1}
– x̄d is the sample mean difference
– sd is the sample SD of differences
– nd is the number of differences (i.e. number of pairs)
– Assumptions: the population of differences in test scores is approximately normally distributed.
The sample differences are randomly selected from the population differences
• Confidence interval: large sample
x̄d ± Zα/2 · sd/√nd
– Conditions required: a random sample of differences is selected from the target population of
differences, and that the sample size nd is large (i.e. nd ≥ 30)
• Confidence interval: small sample
x̄d ± tα/2 · sd/√nd
– tα/2 is based on nd − 1 degrees of freedom
– Conditions required: a random sample of differences is selected from the target population of
differences, and that the population of differences has a distribution that is approximately normal
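A sketch of the paired t test (SciPy assumed; the before/after values are illustrative), reducing the pairs to a single column of differences and reusing the one-sample machinery:

```python
import numpy as np
from scipy import stats

before = np.array([68, 72, 75, 70, 66, 74, 71, 69])    # illustrative paired measurements
after = np.array([71, 75, 74, 76, 70, 78, 72, 73])

d = after - before                                      # work with the differences only
n_d = len(d)
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n_d))      # H0: mu_d = 0
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n_d - 1))

print(t_stat, p_value)
print(stats.ttest_rel(after, before))                   # built-in paired t test for comparison
```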
6 Regression and Correlation
Deterministic Model
• Hypothesizes an exact relationship between variables
• E.g. y = f (x)
• Implies that y can always be determined exactly when the value of x is known
• The sum of squared vertical deviations from the points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) to the line is
g(b0, b1) = Σ_{i=1}^n [yi − (b0 + b1xi)]²
• The point estimates of β0 and β1 , denoted by β̂0 and β̂1 , respectively, are called the least squares
estimates whose values minimize g(b0 , b1 )
• The estimated regression line or least squares regression line (LSRL) is the line whose equation
is
ŷ = β̂0 + β̂1 x
• The least squares estimate of the slope coefficient β1 of the true regression line is
b1 = β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
• The least squares estimate of the intercept β0 of the true regression line is
b0 = β̂0 = ȳ − β̂1 x̄
• Under the normality assumption of the simple linear regression model, β̂0 and β̂1 are the maximum
likelihood estimates
• Notation for sums
Sxy = Σ_{i=1}^n (xi − x̄)(yi − ȳ)
Sxx = Σ_{i=1}^n (xi − x̄)²
Syy = Σ_{i=1}^n (yi − ȳ)²
• The residuals (estimated errors) e1, e2, . . . , en are the vertical deviations from the LSRL, i.e. the ith residual is
ei = yi − ŷi = yi − (β̂0 + β̂1xi) = (yi − ȳ) − β̂1(xi − x̄)
• The error sum of squares (or residual sum of squares), denoted by SSE, is
SSE = Σ(ei − ē)² = Σ ei² = Σ(yi − ŷi)²
• R2 is interpreted as the proportion of observed y variation that can be explained by the simple linear
regression model
• The closer R2 is to 1, the more successful the simple linear regression model is in explaining y variation
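A compact sketch (NumPy assumed; the six (x, y) pairs are illustrative) computing the least squares estimates, SSE, and R² directly from the formulas above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # illustrative data
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])

sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
syy = np.sum((y - y.mean()) ** 2)

b1 = sxy / sxx                       # slope estimate beta1-hat
b0 = y.mean() - b1 * x.mean()        # intercept estimate beta0-hat

y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)       # error (residual) sum of squares
r2 = 1 - sse / syy                   # coefficient of determination

print(b0, b1, sse, r2)
```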
Decomposition of Total Sum of Squares
• The total sum of squares SST = Syy = Σ(yi − ȳ)² splits into the regression sum of squares SSR = Σ(ŷi − ȳ)² and the error sum of squares SSE; therefore
SST = SSR + SSE
• Properties of the estimator β̂1 under the model assumptions:
1. β̂1 is an unbiased estimator of β1, i.e. E(β̂1) = β1
2. The variance of β̂1 is
V(β̂1) = σ²_{β̂1} = σ²/Sxx, so σ_{β̂1} = σ/√Sxx
– σ can be replaced by its estimate σ̂
3. The estimator β̂1 has a normal distribution
– Because it is a linear function of independent normal RVs
• As a result, the assumptions of the simple linear regression model imply that
T = (β̂1 − β1)/S_{β̂1} ∼ tn−2
• A 100(1 − α)% confidence interval for the slope β1 of the true regression line is
β̂1 ± tn−2,α/2 · σ̂/√Sxx
– Test statistic:
T = (β̂1 − β1)/S_{β̂1} ∼ tn−2
• To estimate the mean response at x = x∗, use
ŷ = β̂0 + β̂1 x∗
1. ŷ is an unbiased estimator of E(y) = β0 + β1x∗
2. The variance of ŷ is
V(ŷ) = σ²_ŷ = σ²[1/n + (x∗ − x̄)²/Sxx]
The estimated variance of ŷ is
S²_ŷ = σ̂²[1/n + (x∗ − x̄)²/Sxx]
3. ŷ has a normal distribution, because it is a linear function of the yi s, which are normally distributed and independent
• Consequently, the variable
t = (ŷ − E[y])/Sŷ ∼ tn−2
• For predicting a new observation y at x = x∗:
– The variance of ŷ − y is
V[ŷ − y] = σ²[1 + 1/n + (x∗ − x̄)²/Sxx]
– The estimated variance of ŷ − y is
S²_{ŷ−y} = σ̂²[1 + 1/n + (x∗ − x̄)²/Sxx]
• The sample correlation coefficient for the n pairs (x1, y1), (x2, y2), . . . , (xn, yn) is
r = (1/(n − 1)) Σ_{i=1}^n ((xi − x̄)/sx)((yi − ȳ)/sy) = Sxy/√(Sxx·Syy)
• Properties of r
1. The value of r does not depend on which of the two variables is labelled x and which is labelled y
2. The value of r is independent of the units in which x and y are measured, i.e. r is unitless
3. The square of the sample correlation gives the value of the coefficient of determination that would
result from fitting the simple linear regression model, i.e. r2 = R2
4. −1 ≤ r ≤ 1
5. r = ±1 iff all (xi , yi ) pairs lie on a straight line
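Continuing the same illustrative data, a short sketch (NumPy/SciPy assumed) computing r and the t statistic for testing H0: β1 = 0 with S_{β̂1} = σ̂/√Sxx:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # illustrative data (as above)
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
syy = np.sum((y - y.mean()) ** 2)

r = sxy / np.sqrt(sxx * syy)                     # sample correlation coefficient
b1 = sxy / sxx
sse = syy - b1 * sxy                             # identity: SSE = Syy - beta1-hat * Sxy
sigma_hat = np.sqrt(sse / (n - 2))               # estimate of sigma
t_stat = b1 / (sigma_hat / np.sqrt(sxx))         # T = beta1-hat / S_{beta1-hat}
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - 2))

print(r, t_stat, p_value)
```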
7 Analysis of Variance
The analysis of variance (ANOVA) is a collection of statistical procedures for the analysis of quantitative
responses
• The simplest ANOVA problem is referred to variously as a single-factor, single-classification, or one-way ANOVA and involves the analysis of data sampled from two or more numerical populations (i.e. distributions)
Factors are those variables whose effect on the response is of interest to the experimenter
• Also called independent variables
• Quantitative factors are measured on a numerical scale
Terminology
• The mathematical model for the data from a completely randomized design (CRD) with an
unequal number of replicates for each factor level is
yij = µ + τi + εij
where
– yij is the response for the jth experimental unit subject to the ith level of the treatment factor, i ∈ [1, t], j ∈ [1, ni]
– ni is the number of experimental units or replications in ith level of the treatment factor
– The experimental errors εij are mutually independent due to the randomization and are assumed to be normally distributed
– τi represents the treatment effect
– µ is the overall mean
• We could write the null hypothesis in terms of the treatment effects, where H0: τ1 = τ2 = · · · = τt
• Assumptions
– The t population or treatment distributions are all normal with the same variance σ 2 , i.e. the
yij s are independent and normally distributed with
E(yij ) = µi = µ + τi
V (yij ) = σ 2
• A measure of within-samples variation is the error sum of squares (SSE), given by
SSE = Σ_{i=1}^t Σ_{j=1}^{ni} (yij − ȳi.)² = SSTotal − SSTr
• The treatment and error mean squares are
MSTr = SSTr/(t − 1), MSE = SSE/(n − t)
• E(MSE) = σ²
• The ANOVA test statistic is F = MSTr/MSE, which has an F distribution with t − 1 and n − t df when H0 is true
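A sketch of the one-way ANOVA computations (SciPy assumed; the three treatment groups are illustrative), with SciPy's built-in test as a cross-check:

```python
import numpy as np
from scipy import stats

# Illustrative responses for t = 3 treatment levels with unequal replication
groups = [np.array([22.0, 24.1, 23.5, 25.0]),
          np.array([27.2, 26.8, 28.1, 27.5, 26.3]),
          np.array([21.0, 20.4, 22.2, 21.7])]

t_levels = len(groups)
n = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

sstr = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # between-treatment SS
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)              # within-treatment SS

mstr, mse = sstr / (t_levels - 1), sse / (n - t_levels)
f_stat = mstr / mse
p_value = 1 - stats.f.cdf(f_stat, t_levels - 1, n - t_levels)

print(f_stat, p_value)
print(stats.f_oneway(*groups))        # built-in one-way ANOVA for comparison
```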
• We consider the equal number of replications n0 = n1 = · · · = nt. For each i < j, form the interval
ȳi. − ȳj. ± Qα,t,n−t √(MSE/n0)
• Assumption: the t sample sizes n1 , n2 , . . . , nt are reasonably close to each other (i.e. mild imbalance)
• Let
dij = Qα,t,n−t √((MSE/2)(1/ni + 1/nj))
Then
ȳi. − ȳj. − dij ≤ µi − µj ≤ ȳi. − ȳj. + dij
8 Logistic Regression
Logit Function
p(x) = e^{β0+β1x} / (1 + e^{β0+β1x}) = [1 + exp(−β0 − β1x)]⁻¹
Odds
• Logistic regression means assuming that p(x) is related to x by the logit function
p(x)/(1 − p(x)) = exp(β0 + β1x)
Likelihood Function
• There are no analytical solutions for the MLEs β̂0 and β̂1
• The maximization process must be carried out using iterative numerical methods
• For large n, the MLE β̂1 has approximately a normal distribution, and the standardized variable (β̂1 − β1)/S_{β̂1} has approximately a standard normal distribution
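Since the MLEs have no closed form, here is a sketch of the iterative fit (NumPy/SciPy assumed; the binary data are illustrative), minimizing the negative log-likelihood numerically:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative binary responses y observed at predictor values x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

def neg_log_likelihood(beta):
    b0, b1 = beta
    eta = b0 + b1 * x
    # -log L = sum[ log(1 + exp(eta)) - y * eta ], a numerically stable form
    return np.sum(np.logaddexp(0.0, eta) - y * eta)

res = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
b0_hat, b1_hat = res.x
print(b0_hat, b1_hat)     # numerical MLEs of beta0 and beta1
```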
9 Chi-Squared Tests (Extra)
A multinomial experiment satisfies the following conditions:
1. The experiment consists of a sequence of n trials for some fixed n
2. Each trial can result in one of the same k possible outcomes (aka categories)
3. The trials are independent
4. The probability that a trial results in category i is pi, which remains constant across trials
The parameters p1, . . . , pk must satisfy pi ≥ 0 and Σ pi = 1
• A generalization of a binomial experiment: each trial is allowed to result in more than two possible outcomes
Null hypothesis: the pi s are assigned some fixed values; alternative hypothesis: at least one of the pi s has a value different from that asserted by H0
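A sketch of the goodness-of-fit test for a fully specified H0 (SciPy assumed; the observed counts and null probabilities are illustrative):

```python
import numpy as np
from scipy import stats

observed = np.array([48, 35, 17])    # observed category counts N_i (illustrative)
p0 = np.array([0.5, 0.3, 0.2])       # category probabilities specified by H0
n = observed.sum()

expected = n * p0
chi2 = np.sum((observed - expected) ** 2 / expected)     # sum of (N_i - n p_i)^2 / (n p_i)
p_value = 1 - stats.chi2.cdf(chi2, df=len(p0) - 1)       # df = k - 1 when no parameters are estimated

print(chi2, p_value)
print(stats.chisquare(observed, f_exp=expected))         # built-in test for comparison
```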
Fisher’s Chi-Squared Theorem
• Under general regularity conditions on θ1, . . . , θm and the πi(θ)s, if θ1, . . . , θm are estimated by maximizing the multinomial expression, then the RV
χ² = Σ_{i=1}^k (Ni − np̂i)²/(np̂i) = Σ_{i=1}^k [Ni − nπi(θ̂)]²/(nπi(θ̂))
has approximately a chi-squared distribution with k − 1 − m df when H0 is true
10 Bayesian Estimation (Extra)
Prior Distribution
• A prior distribution for a parameter θ, denoted π(θ), is a probability distribution on the set of
possible values for θ
• If the possible values of θ form an interval I, then π(θ) is a pdf that must satisfy
∫_I π(θ) dθ = 1
• If θ is potentially any value in a discrete set D, then π(θ) is a pmf that must satisfy
Σ_{θ∈D} π(θ) = 1
Posterior Distribution
• Suppose X1, . . . , Xn have joint pdf f(x1, . . . , xn; θ) and the unknown parameter θ has been assigned a continuous prior distribution π(θ). Then the posterior distribution of θ, given the observations X1 = x1, . . . , Xn = xn, is
π(θ|x1, . . . , xn) = π(θ)f(x1, . . . , xn; θ) / ∫_{−∞}^{∞} π(θ)f(x1, . . . , xn; θ) dθ
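A minimal sketch of the posterior formula on a grid (NumPy/SciPy assumed; the Beta(2, 2) prior and the data of 7 successes in 10 Bernoulli trials are illustrative). For this conjugate setup the exact posterior is Beta(9, 5), which the grid approximation should match closely.

```python
import numpy as np
from scipy import stats

theta = np.linspace(0.001, 0.999, 999)             # grid over the interval I = (0, 1)
prior = stats.beta.pdf(theta, 2, 2)                # pi(theta)
likelihood = stats.binom.pmf(7, 10, theta)         # f(x1, ..., xn; theta) for 7 successes in 10 trials

unnormalized = prior * likelihood
# Divide by a Riemann-sum approximation of the integral over I
posterior = unnormalized / (unnormalized.sum() * (theta[1] - theta[0]))

print(theta[np.argmax(posterior)])                 # posterior mode, close to the Beta(9, 5) mode 8/12
```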