ES 209 Lecture Notes Week 14

This document discusses statistical hypothesis testing and linear regression. It covers: 1. The concepts of the null and alternative hypotheses, type I and type II errors, and significance levels. Hypothesis tests involve choosing between the null and alternative hypotheses based on sample data. 2. An example that tests if a new cold vaccine is more effective than the standard vaccine using a binomial test. 3. The simple linear regression model, which relates a response variable Y to a predictor variable x using a linear equation with estimated intercept and slope parameters. Multiple regression generalizes this to multiple predictors.

Uploaded by

Janna Ann Jurial

Tests of Hypotheses

Linear Regression
Analysis of Variance (ANOVA)
Statistical Hypotheses: General
Concepts

• Problem: the formation of a data-based decision procedure that can
produce a conclusion about some scientific system.
• Definition 10.1: A statistical hypothesis is an assertion or
conjecture concerning one or more populations.

The Role of Probability in Hypothesis Testing


• The decision procedure must include an awareness of the probability
of a wrong conclusion.
• Rejection of a hypothesis implies that the sample evidence refutes it.
Put another way, rejection means that there is a small probability of
obtaining the sample information observed when, in fact, the hypothesis
is true.
The Null and Alternative Hypotheses
• Null hypothesis: any hypothesis we wish to test, denoted by H0.
• The rejection of H0 leads to the acceptance of an alternative
hypothesis, denoted by H1.
• The alternative hypothesis H1 usually represents the question to be
answered or the theory to be tested.
• The null hypothesis H0 nullifies or opposes H1 and is often the
logical complement to H1.
• The analyst arrives at one of the two following conclusions:
reject H0 in favor of H1 because of sufficient evidence in the data,
or fail to reject H0 because of insufficient evidence in the data.
Testing a Statistical Hypothesis

• Example: A certain type of cold vaccine is known to be only 25% effective after a
period of 2 years. To determine if a new and more expensive vaccine is superior
in providing protection against the same virus for a longer period of time,
suppose that 20 people are chosen at random and inoculated. If more than 8 of
those receiving the new vaccine surpass the 2-year period without contracting
the virus, the new vaccine will be considered superior to the one presently in use.
• We are testing the null hypothesis that the new vaccine is equally effective after a
period of 2 years as the one now commonly used. The alternative hypothesis is
that the new vaccine is superior. This is equivalent to testing the hypothesis that
the binomial parameter for the probability of a success on a given trial is p = 1/4
against the alternative that p > 1/4. This is usually written as follows:
H0: p = 1/4,
H1: p > 1/4.

• The Test Statistic


The test statistic on which we base our decision is X, the number of individuals in
our test group who receive protection from the new vaccine for a period of at least 2
years. The possible values of X, from 0 to 20, are divided into two groups: those
numbers less than or equal to 8 and those greater than 8. All possible scores greater
than 8 constitute the critical region. The last number that we observe in passing into
the critical region is called the critical value.
• The critical value is the number 8. Therefore, if x > 8, we reject H0 in favor of the
alternative hypothesis H1. If x ≤ 8, we fail to reject H0.
The Probability of a Type I Error
• Rejection of the null hypothesis when it is true is called a type I error.
• Nonrejection of the null hypothesis when it is false is called a type II error.
• The probability of committing a type I error, also called the level
of significance, is denoted by the Greek letter α.
• The probability of committing a type II error is denoted by β.
• Suppose a type II error is of concern only when the true value of p
is at least 0.7. If p = 0.7, this test procedure gives
β = P(type II error) = P(X ≤ 8 when p = 0.7) ≈ 0.0051.
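The error probabilities of the vaccine test can be checked numerically. The following is a minimal sketch, assuming X is binomial with n = 20 and the rule "reject H0 when X > 8"; the helper name binom_cdf is illustrative and does not come from the notes.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


N = 20              # people inoculated
CRITICAL_VALUE = 8  # reject H0 when X > 8

# Type I error: H0 is rejected (X > 8) although p = 1/4 is true.
alpha = 1 - binom_cdf(CRITICAL_VALUE, N, 0.25)

# Type II error: H0 is not rejected (X <= 8) although p = 0.7 is true.
beta = binom_cdf(CRITICAL_VALUE, N, 0.7)

print(f"alpha = {alpha:.4f}")
print(f"beta at p = 0.7 = {beta:.4f}")
```

Changing CRITICAL_VALUE shows how shrinking the critical region lowers α while raising β for a fixed n.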
Important Properties of a Test of Hypothesis:
1. The type I error and type II error are related. A decrease in the
probability of one generally results in an increase in the probability of
the other.
2. The size of the critical region, and therefore the probability of
committing a type I error, can always be reduced by adjusting the
critical value(s).
3. An increase in the sample size n will reduce α and β simultaneously.
4. If the null hypothesis is false, β is a maximum when the true value of
a parameter approaches the hypothesized value. The greater the
distance between the true value and the hypothesized value, the
smaller β will be.

• The power of a test is the probability of rejecting H0 given that a
specific alternative is true.
• The power of a test can be computed as 1 − β.
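As an illustration of power = 1 − β, the sketch below evaluates the rejection probability of the vaccine test (n = 20, reject H0 when X > 8) at several values of p. The grid of p values is an arbitrary choice made for this example, and the function name power is illustrative.

```python
from math import comb


def power(p, n=20, critical_value=8):
    """Return P(X > critical_value) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, x) * p**x * (1 - p) ** (n - x)
        for x in range(critical_value + 1, n + 1)
    )


# At p = 0.25 (H0 true) the rejection probability is alpha, not power.
# For p > 0.25 it is the power against that specific alternative.
p_grid = (0.25, 0.3, 0.4, 0.5, 0.6, 0.7)
for p in p_grid:
    print(f"p = {p:.2f}  P(reject H0) = {power(p):.4f}  beta = {1 - power(p):.4f}")
```

The printed values grow with p, which is property 4 above: β shrinks as the true value moves away from the hypothesized value.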
One- and Two-Tailed Tests


• A test of any statistical hypothesis where the alternative is one
sided, such as
H0: θ = θ0,
H1: θ > θ0,
or perhaps
H0: θ = θ0,
H1: θ < θ0,
is called a one-tailed test.
• A test of any statistical hypothesis where the alternative is two
sided, such as
H0: θ = θ0,
H1: θ ≠ θ0,
is called a two-tailed test, since the critical region is split into
two parts, often having equal probabilities, in each tail of the
distribution of the test statistic.
• The alternative hypothesis θ ≠ θ0 states that either θ < θ0 or θ > θ0.
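The difference between one- and two-tailed critical regions can be sketched in code. This example assumes the test statistic follows the standard normal distribution under H0; the function name critical_region and the labels "greater", "less", and "two-sided" are illustrative choices.

```python
from statistics import NormalDist


def critical_region(alpha, alternative):
    """Return (lower, upper) rejection bounds for a z statistic.

    H0 is rejected when z < lower or z > upper.
    None means there is no bound on that side.
    """
    std_normal = NormalDist()
    if alternative == "greater":    # H1: theta > theta0 (upper tail)
        return None, std_normal.inv_cdf(1 - alpha)
    if alternative == "less":       # H1: theta < theta0 (lower tail)
        return std_normal.inv_cdf(alpha), None
    if alternative == "two-sided":  # H1: theta != theta0 (alpha/2 per tail)
        z = std_normal.inv_cdf(1 - alpha / 2)
        return -z, z
    raise ValueError(f"unknown alternative: {alternative!r}")


for alt in ("greater", "less", "two-sided"):
    print(alt, critical_region(0.05, alt))
```

For the same α, the two-tailed bounds sit farther from zero than the one-tailed bound because each tail receives only α/2.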


The Use of P-Values for Decision Making
in Testing Hypotheses
• A P-value is the lowest level (of significance) at which the
observed value of the test statistic is significant.
• In the fixed-level approach, choose an α of 0.05 or 0.01 and select
the critical region accordingly.
• If the test is two tailed and α is set at the 0.05 level of
significance and the test statistic involves, say, the standard
normal distribution, then a z-value is observed from the data
and the critical region is z > 1.96 or z < −1.96, where the value
1.96 is found as z0.025 in Table A.3.
• A value of z in the critical region prompts the statement “The
value of the test statistic is significant,” which we can then
translate into the user's language. For example, if the
hypothesis is given by
H0: μ = 10,
H1: μ ≠ 10,
one might say, “The mean differs significantly from the value
10.”
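A two-tailed P-value for a z statistic can be computed as the probability of a value at least as extreme as the one observed. The sketch below assumes a standard normal test statistic; the observed value 2.3 is hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Return P(|Z| >= |z|) for a standard normal Z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


z_observed = 2.3  # hypothetical observed test statistic
p_value = two_sided_p_value(z_observed)

# The P-value is the lowest alpha at which z_observed is significant,
# so comparing it with 0.05 matches the check |z| > 1.96.
print(f"z = {z_observed}, P-value = {p_value:.4f}")
print("significant at 0.05:", p_value < 0.05)
```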
Approach to Hypothesis Testing with Fixed Probability of
Type I Error:
1. State the null and alternative hypotheses.
2. Choose a fixed significance level α.
3. Choose an appropriate test statistic and establish the
critical region based on α.
4. Reject H0 if the computed test statistic is in the critical
region. Otherwise, do not reject.
5. Draw scientific or engineering conclusions.
Significance Testing (P-Value Approach)
1. State null and alternative hypotheses.
2. Choose an appropriate test statistic.
3. Compute the P-value based on the computed value of the
test statistic.
4. Use judgment based on the P-value and knowledge of
the scientific system.
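The P-value approach can be sketched for the vaccine example (H0: p = 1/4 against H1: p > 1/4, n = 20). The observed count of 10 protected individuals is hypothetical; the notes do not report an observed value.

```python
from math import comb


def upper_tail_p_value(x_observed, n, p0):
    """Return P(X >= x_observed) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, x) * p0**x * (1 - p0) ** (n - x)
        for x in range(x_observed, n + 1)
    )


# Step 1: H0: p = 1/4, H1: p > 1/4.
# Step 2: test statistic X = number protected for at least 2 years.
x_observed = 10  # hypothetical observation

# Step 3: P-value = probability, under H0, of a result at least this extreme.
p_value = upper_tail_p_value(x_observed, n=20, p0=0.25)

# Step 4: the number is reported; the judgment is left to the analyst.
print(f"x = {x_observed}, P-value = {p_value:.4f}")
```

Unlike the fixed-α procedure, no critical region is set in advance; the P-value itself is the output.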
Introduction to Linear Regression

Example: Consider estimating the tar content for various levels of the
inlet temperature from experimental information. It is likely that for many
example runs in which the inlet temperature is the same, say 130°C, the outlet
tar content will not be the same.
• Tar content, gas mileage (mpg), and the price of houses (in thousands of
dollars) are natural dependent variables, or responses.
• Inlet temperature, engine volume (cubic feet), and square feet of living
space are, respectively, natural independent variables, or regressors.
• A reasonable form of a relationship between the response Y and the
regressor x is the linear relationship
Y = β0 + β1x,
where β0 is the intercept and β1 is the slope.
• If the relationship is exact, then it is a deterministic relationship between
two scientific variables and there is no random or probabilistic component
to it.
• The concept of regression analysis deals with finding the best relationship between Y
and x, quantifying the strength of that relationship, and using methods that allow for
prediction of the response values given values of the regressor x.
• In many applications, there will be more than one regressor
(i.e., more than one independent variable that helps to explain
Y).
• For example, in the case where the response is the price of a
house, one would expect the age of the house to contribute to
the explanation of the price, so in this case the multiple
regression structure might be written
Y = β0 + β1x1 + β2x2,
where Y is price, x1 is square footage, and x2 is age in years.
• The resulting analysis is termed multiple regression, while
the analysis of the single regressor case is called simple
regression.
The Simple Linear Regression (SLR)
Model

• The simple linear regression model is
Y = β0 + β1x + ε.
In the above, β0 and β1 are unknown intercept and slope
parameters, respectively, and ε is a random variable that is
assumed to be distributed with E(ε) = 0 and Var(ε) = σ². The
quantity σ² is often called the error variance or residual variance.
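The role of the random error can be illustrated by simulating repeated runs at a single regressor value under the model Y = β0 + β1x + ε. This is a sketch under stated assumptions: the error is drawn from a normal distribution (the model itself only requires mean 0 and variance σ²), and all parameter values are hypothetical.

```python
import random


def simulate_responses(x, beta0, beta1, sigma, runs, rng):
    """Draw `runs` responses Y = beta0 + beta1*x + eps at one fixed x.

    eps is drawn from Normal(0, sigma); normality is an assumption of
    this sketch, not a requirement of the model.
    """
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(runs)]


rng = random.Random(0)  # seeded so that the run is repeatable

# Hypothetical parameters; x = 130 echoes the inlet-temperature example.
ys = simulate_responses(x=130.0, beta0=1.0, beta1=0.05, sigma=0.5, runs=5, rng=rng)
print(ys)  # five different responses at the same x
```

Setting sigma to 0 removes the random component and recovers the deterministic relationship Y = β0 + β1x.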
The Fitted Regression Line
• An important aspect of regression analysis is to
estimate the parameters β0 and β1 (i.e., estimate the
so-called regression coefficients).
• Suppose we denote the estimates b0 for β0 and b1 for
β1. Then the estimated or fitted regression line is
given by
ŷ = b0 + b1x,
where ŷ is the predicted or fitted value.
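One common way to obtain b0 and b1 is the method of least squares, which the notes do not name, so treat it here as an assumption. The sketch below uses the closed-form formulas b1 = Sxy/Sxx and b0 = ȳ − b1·x̄ on a small hypothetical data set.

```python
def fit_line(xs, ys):
    """Return least-squares estimates (b0, b1) for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxx: sum of squared deviations of x; Sxy: sum of cross-deviations.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
y_hat = b0 + b1 * 6.0  # fitted value at x = 6
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, y_hat(6) = {y_hat:.3f}")
```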
Analysis-of-Variance Technique
• One very common procedure used to deal with testing
population means is called the analysis of variance, or
ANOVA.
Example:
Suppose in an industrial experiment that an engineer is
interested in how the mean absorption of moisture in
concrete varies among 5 different concrete aggregates. The
samples are exposed to moisture for 48 hours. It is decided
that 6 samples are to be tested for each aggregate, requiring
a total of 30 samples to be tested.
• The model for this situation may be set up
as follows. There are 6 observations taken
from each of 5 populations with means μ1,
μ2, . . . , μ5, respectively. We may wish to
test
H0: μ1 = μ2 = · · · = μ5,
H1: At least two of the means are not equal.
Example 13.1: Test the hypothesis μ1 = μ2 = · · · = μ5 at the 0.05
level of significance for the data of Table 13.1 on absorption of
moisture by various types of cement aggregates.
Decision: Reject H0 and conclude that the aggregates do not
have the same mean absorption. The P-value for f = 4.30 is
0.0088, which is smaller than 0.05.
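The f statistic in a one-way analysis of variance is the ratio of the between-group mean square to the within-group mean square. The sketch below shows that computation on a small hypothetical data set; it does not reproduce Table 13.1, whose values are not included in these notes.

```python
def one_way_anova_f(groups):
    """Return (f, df_treatment, df_error) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group (treatment) and within-group (error) sums of squares.
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    df_treatment = k - 1
    df_error = n_total - k
    f = (ss_treatment / df_treatment) / (ss_error / df_error)
    return f, df_treatment, df_error


# Hypothetical data: 3 groups of 3 observations each.
data = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f, df1, df2 = one_way_anova_f(data)
print(f"f = {f:.2f} with {df1} and {df2} degrees of freedom")
```

For the concrete experiment (5 aggregates, 6 samples each) the same function would return 4 and 25 degrees of freedom. Turning f into a P-value additionally requires the F distribution, which the Python standard library does not provide.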