CH 1
Introduction
Outline
• It satisfies E(Y) = Var(Y) = μ. It is unimodal, with mode equal to the integer part of μ.
• Note: A key feature of the Poisson distribution is that its variance equals its mean.
Statistical Inference For A Proportion
• In practice, the probability distribution assumed for the response variable has
unknown parameter values. Using sample data, we estimate the parameters. This
section introduces the maximum likelihood method and uses it to make inferences
about binomial and multinomial parameters.
Likelihood Function And Maximum Likelihood Estimation
• The parametric approach to statistical modeling assumes a family of probability
distributions for the response variable. For a particular family, we can substitute the
observed data into the formula for the probability function and then view how that
probability depends on the unknown parameter value. For example, in n = 10 trials,
suppose a binomial count equals y = 0. From the binomial formula with parameter
π, the probability of this outcome equals P(Y = 0) = (1 − π)^10.
Cont’d
• This probability is defined for all potential values of π between 0 and 1. The probability of
the observed data, expressed as a function of the parameter, is called the likelihood function.
With y = 0 successes in n = 10 trials, the binomial likelihood function is l(π) = (1 − π)^10,
defined for π between 0 and 1. For instance, if π = 0.40 then l(0.40) = (0.60)^10 = 0.006;
if π = 0.20 then l(0.20) = (0.80)^10 = 0.107; and if π = 0.0 then l(0.0) = (1.0)^10 = 1.0.
• The maximum likelihood estimate of a parameter is the parameter value for which the
probability of the observed data takes its greatest value; that is, the parameter value at which
the likelihood function is maximized. Thus, when n = 10 trials have y = 0 successes, the
maximum likelihood estimate of π equals 0.0. This means that the result y = 0 in n = 10 trials is
more likely to occur when π = 0.0 than when π equals any other value.
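The likelihood values quoted above can be reproduced directly from the binomial formula; a minimal Python sketch:

```python
from math import comb

def likelihood(pi, y=0, n=10):
    # P(Y = y) for a binomial(n, pi), viewed as a function of the parameter pi
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

# With y = 0 in n = 10 trials, l(pi) = (1 - pi)**10:
print(round(likelihood(0.40), 4))  # 0.006
print(round(likelihood(0.20), 4))  # 0.1074
print(likelihood(0.0))             # 1.0: y = 0 is certain when pi = 0
```

Scanning a grid of π values confirms that the likelihood is largest at π = 0.0, the ML estimate for this sample.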
Cont’d
• In general, for the binomial outcome of y successes in n trials, the maximum likelihood
estimate of π equals p = y/n. This is the sample proportion of successes for the n trials. If we
observe y = 6 successes in n = 10 trials, then the maximum likelihood estimate of π equals p =
6/10 = 0.60.
• Denote each success by a 1 and each failure by a 0. Then the sample proportion equals the
sample mean of the results of the individual trials. For instance, for four failures followed by six
successes in 10 trials, the data are 0,0,0,0,1,1,1,1,1,1, and the sample mean is
p = (0 + 0 + 0 + 0 + 1 + 1 + 1 + 1 + 1 + 1)/10 = 0.60.
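A one-line check of this computation, encoding the ten trials above as 0/1 outcomes:

```python
# Four failures followed by six successes, as in the example above
trials = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# The sample proportion (ML estimate of pi) is just the sample mean of the 0/1 data
p = sum(trials) / len(trials)
print(p)  # 0.6
```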
• Thus, results that apply to sample means with random sampling, such as the Central Limit
Theorem (large-sample normality of its sampling distribution) and the Law of Large Numbers
(convergence to the population mean as n increases) apply also to sample proportions. The
abbreviation ML symbolizes the term maximum likelihood. The ML estimate is often denoted
by the parameter symbol with a "hat" over it; for example, π̂ denotes the ML estimate of π.
Cont’d
• Before we observe the data, the value of the ML estimate is unknown. The estimate is
then a random variable having some sampling distribution. We refer to this random
variable as an estimator and its value for observed data as an estimate. Estimators based
on the method of maximum likelihood are popular because they have good large-sample
behavior. Most importantly, it is not possible to find good estimators that are more
precise, in terms of having smaller large-sample standard errors.
Significance Test About A Binomial Proportion
For the binomial distribution, we now use the ML estimator in statistical inference for
the parameter π. The ML estimator is the sample proportion, p. The sampling
distribution of the sample proportion p has mean and standard error
• E(p) = π and σ(p) = √(π(1 − π)/n)
• As the number of trials n increases, the standard error of p decreases toward zero; that is, the
sample proportion tends to be closer to the parameter value π. The sampling distribution of p
is approximately normal for large n. This suggests large-sample inferential methods for π.
• Consider the null hypothesis H0: π = π0 that the parameter equals some fixed value π0. The
test statistic
z = (p − π0) / √(π0(1 − π0)/n)
divides the difference between the sample proportion p and the null hypothesis value π0 by the
null standard error of p. The null standard error is the one that holds under the assumption
that the null hypothesis is true. For large samples, the null sampling distribution of the z test
statistic is the standard normal – the normal distribution having a mean of 0 and standard
deviation of 1. The z test statistic measures the number of standard errors that the sample
proportion falls from the null hypothesized proportion.
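A minimal sketch of this z statistic in Python, applied to the abortion-attitudes figures used later in these notes (p = 0.448, n = 893); taking π0 = 0.50 as the null value is an assumption for illustration:

```python
from math import sqrt

def z_stat(p, pi0, n):
    # z = (p - pi0) / null SE, with the SE evaluated under H0
    se0 = sqrt(pi0 * (1 - pi0) / n)
    return (p - pi0) / se0

# Illustration (assumed null value pi0 = 0.50): p = 0.448, n = 893
z = z_stat(0.448, 0.50, 893)
print(round(z, 2))  # about -3.11: p falls about 3 null SEs below 0.50
```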
Confidence Intervals For A Binomial Proportion
• A significance test merely indicates whether a particular value for a parameter (such as 0.50) is
plausible. We learn more by constructing a confidence interval to determine the range of
plausible values. Let SE = √(p(1 − p)/n) denote the estimated standard error of p. A large-sample
100(1 − α)% confidence interval for π has the formula
p ± z_{α/2} SE,    (*)
• where z_{α/2} denotes the standard normal percentile having right-tail probability equal to α/2; for
example, for 95% confidence, α = 0.05 and z_{0.025} = 1.96.
• This formula substitutes the sample proportion p for the unknown parameter π in σ(p) = √(π(1 − π)/n).
Cont’d
• Example: For the attitudes-about-abortion example just discussed, p = 0.448 for n = 893
observations. The 95% confidence interval equals
0.448 ± 1.96 √((0.448 × 0.552)/893) = 0.448 ± 0.033,
that is, (0.415, 0.481).
• We can be 95% confident that the population proportion of Americans in 2002 who favored
legalized abortion for married pregnant women who do not want more children is between
0.415 and 0.481.
• Formula (*) is simple. Unless π is close to 0.50, however, it does not work well unless n is very
large. Consider its actual coverage probability, that is, the probability that the method produces
an interval that captures the true parameter value. This may be quite a bit less than the
nominal value (such as 95%). It is especially poor when π is near 0 or 1.
Cont’d
• Here is a simple alternative interval that approximates this one, having a similar
midpoint in the 95% case but being a bit wider: add 2 to the number of successes
and 2 to the number of failures (and thus 4 to n), and then use the ordinary formula
(*) with the estimated standard error. For example, with nine successes in 10 trials,
you find p = 11/14 = 0.786 with n = 14 and obtain the confidence interval
0.786 ± 1.96 √((0.786 × 0.214)/14) = 0.786 ± 0.215, or (0.57, 1.00). This simple
method, sometimes called the Agresti–Coull confidence interval, works well even for
small samples.
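The add-2-successes-and-2-failures adjustment is easy to sketch in Python:

```python
from math import sqrt

def agresti_coull_ci(y, n, z=1.96):
    # Add 2 successes and 2 failures, then apply the ordinary Wald formula
    y2, n2 = y + 2, n + 4
    p2 = y2 / n2
    se = sqrt(p2 * (1 - p2) / n2)
    return p2 - z * se, p2 + z * se

# Nine successes in 10 trials, as in the example above
lo, hi = agresti_coull_ci(9, 10)
print(round(lo, 2), round(hi, 2))  # 0.57 1.0
```

Note that the upper endpoint slightly exceeds 1 before rounding; in practice it is truncated to 1.00, as quoted above.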
More On Statistical Inference For Discrete Data
• We have just seen how to construct a confidence interval for a proportion using an
estimated standard error or by inverting results of a significance test using the null
standard error. In fact, there are three ways of using the likelihood function to
conduct inference (confidence intervals and significance tests) about parameters.
They apply to any parameter in a statistical model, but we will illustrate using the
binomial parameter.
Wald, Likelihood-Ratio, And Score Inference
• Let β denote an arbitrary parameter. Consider a significance test of H0: β = β0 (such
• as H0: β = 0, for which β0 = 0).
• The simplest test statistic uses the large-sample normality of the ML estimator β̂.
• Let SE denote the standard error of β̂, evaluated by substituting the ML estimate for the
unknown parameter in the expression for the true standard error. The first large-sample
inference method has test statistic
z = (β̂ − β0)/SE,
which uses the estimated standard error and has approximately a standard normal distribution.
Equivalently, z² has approximately a chi-squared distribution with df = 1. This type of statistic,
which uses the standard error evaluated at the ML estimate, is called a Wald statistic. The z or
chi-squared test using this test statistic is called a Wald test. You can refer z to the standard
normal table to get one-sided or two-sided P-values. Equivalently, for the two-sided alternative
Ha: β ≠ β0, z² has a chi-squared distribution with df = 1.
• The P-value is then the right-tail chi-squared probability above the observed value. The two-tail
probability beyond ±z for the standard normal distribution equals the right-tail probability above z²
for the chi-squared distribution with df = 1. For example, the two-tail standard normal probability of
0.05 that falls below −1.96 and above 1.96 equals the right-tail chi-squared probability above
(1.96)² = 3.84 when df = 1.
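The z–chi-squared correspondence can be verified with the standard normal CDF (written here via the error function, so only the Python standard library is needed):

```python
from math import erf, sqrt

def normal_cdf(x):
    # Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1 + erf(x / sqrt(2)))

z = 1.96
two_tail_normal = 2 * (1 - normal_cdf(z))

# Chi-squared(df=1) right tail above c equals 2*(1 - Phi(sqrt(c))),
# so the tail above z**2 matches the two-tail normal probability exactly
chi2_right_tail = 2 * (1 - normal_cdf(sqrt(z ** 2)))

print(round(z ** 2, 2), round(two_tail_normal, 3))  # 3.84 0.05
```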
• The second general purpose method uses the likelihood function through the ratio of two
maximizations of it:
• (i) The maximum over the possible parameter values that assume the null hypothesis,
• (ii) The maximum over the larger set of possible parameter values, permitting the null or the
alternative hypothesis to be true.
• Let ℓ0 denote the maximized value of the likelihood function under the null hypothesis, and let ℓ1
denote the maximized value more generally. For instance, when there is a single parameter β, ℓ0 is the
likelihood function calculated at β0, and ℓ1 is the likelihood function calculated at the ML estimate β̂.
Then ℓ1 is always at least as large as ℓ0, because ℓ1 refers to maximizing over a larger set of possible
parameter values.
• The likelihood-ratio test statistic equals
−2 log(ℓ0/ℓ1),
which for large samples has approximately a chi-squared distribution with df = 1 when a single
parameter is constrained under the null hypothesis.
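A minimal sketch of the likelihood-ratio statistic −2 log(ℓ0/ℓ1) for a binomial parameter, using the y = 6, n = 10 sample from earlier in these notes; taking H0: π = 0.50 is an assumption for illustration:

```python
from math import comb, log

def lr_stat(y, n, pi0):
    p = y / n  # ML estimate, maximizing the likelihood over all pi
    # Maximized likelihood under H0 (at pi0) and in general (at the ML estimate)
    l0 = comb(n, y) * pi0**y * (1 - pi0) ** (n - y)
    l1 = comb(n, y) * p**y * (1 - p) ** (n - y)
    return -2 * log(l0 / l1)

# y = 6 successes in n = 10 trials, assumed null value pi0 = 0.50
print(round(lr_stat(6, 10, 0.50), 3))  # 0.403
```

The value 0.403 is well below the chi-squared(df = 1) critical value 3.84, so this small sample gives no evidence against π = 0.50.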
• The Wald, likelihood-ratio, and score tests are the three major ways of constructing
significance tests for parameters in statistical models. For ordinary regression
models assuming a normal distribution for Y, the three tests provide identical
results. In other cases, for large samples they have similar behavior when H0 is true.
Small-Sample Binomial Inference
• For inference about a proportion, the large-sample two-sided z score test and the confidence
interval based on that test (using the null hypothesis standard error) perform reasonably well
when nπ0 ≥ 5 and n(1 − π0) ≥ 5 (a common guideline). When π0 is not near 0.50, the normal
P-value approximation is better for the test with a two-sided alternative than for a one-sided
alternative; a probability that is "too small" in one tail tends to be approximately
counter-balanced by a probability that is "too large" in the other tail.
• For small sample sizes, it is safer to use the binomial distribution directly (rather than a
normal approximation) to calculate P-values. To illustrate, consider testing H0: π = 0.50 against
Ha: π > 0.50 when the number of successes is y = 9 in n = 10 trials. The exact P-value, based on
the right tail of the null binomial distribution with π = 0.50, is
P(Y ≥ 9) = P(Y = 9) + P(Y = 10) = 10(0.50)^10 + (0.50)^10 = 0.011.
• For the two-sided alternative Ha: π ≠ 0.50, the P-value is
• P(Y ≥ 9 or Y ≤ 1) = 2 × P(Y ≥ 9) = 0.021
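Both exact P-values can be computed by summing binomial tail probabilities directly:

```python
from math import comb

def binom_right_tail(y, n, pi):
    # P(Y >= y) under the binomial(n, pi) null distribution
    return sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(y, n + 1))

# y = 9 successes in n = 10 trials, H0: pi = 0.50
p_one_sided = binom_right_tail(9, 10, 0.50)  # 11/1024
p_two_sided = 2 * p_one_sided

print(round(p_one_sided, 3), round(p_two_sided, 3))  # 0.011 0.021
```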