Multiple Regression PDF
Regression Equation
Rating = −0.7560 + 0.15453 Conc + 0.21705 Ratio + 0.010806 Temp + 0.09464 Time
Perform the analysis
Enter your data for Multiple Regression (Data tab)
On the Data tab of the Multiple Regression dialog box, specify the data for your analysis and determine whether
you want to include the intercept in the regression equation.
In This Topic
• Enter your data
• Fit intercept
Enter your data
Complete the following steps to specify the columns of data that you want to analyze.
1. In Response, enter the columns of numeric data that you want to explain or predict. The response is
also called the Y variable.
2. In Continuous predictors, enter the columns of numeric data that may explain or predict changes in
the response. The predictors are also called X variables.
3. (Optional) In Categorical predictor, enter the categorical classifications or group assignments, such
as a type of raw material, that may explain or predict changes in the response. You can enter one
categorical predictor. The predictors are also called X variables.
In this worksheet, Strength is the response and contains the strength measurements of a sample of synthetic
fibers. Temperature is a continuous predictor and Machine is a categorical predictor. The predictors may
explain differences in fiber strength. The first row of the worksheet shows that the first sample of fiber has a
strength measurement of 40, has a temperature of 136, and was produced on Machine A.
C1 C2 C3
Strength Temperature Machine
40 136 A
53 142 A
32 119 B
36 127 A
42 151 B
45 121 B
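If the worksheet above were assembled by hand, the categorical predictor Machine could be dummy-coded as an indicator column before fitting. A minimal sketch in Python; this is not Minitab's internal coding scheme, and the choice of "A" as the baseline level is an assumption:

```python
# Worksheet columns from the example above.
strength    = [40, 53, 32, 36, 42, 45]        # response (Y)
temperature = [136, 142, 119, 127, 151, 121]  # continuous predictor
machine     = ["A", "A", "B", "A", "B", "B"]  # categorical predictor

# Each design-matrix row: intercept, Temperature, indicator for Machine == "B"
# (level "A" is treated as the baseline, an assumption for illustration).
X = [[1, t, 1 if m == "B" else 0] for t, m in zip(temperature, machine)]
y = strength

print(X[0])  # first sample: [1, 136, 0]
```

The indicator column lets a single categorical predictor enter the model as a numeric term.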
Fit intercept
Select Fit intercept to include the intercept (also called the constant) in the regression model. In most cases,
you should include the constant in the model.
A possible reason to remove the constant is when you can assume that the response is 0 when the predictor
values equal 0. For example, consider a model that predicts calories based on the fat, protein, and carbohydrate
contents of a food. When the fat, protein, and carbohydrates are 0, the number of calories will also be 0 (or very
close to 0).
When you compare models that do not include the constant, use S instead of the R² statistics to evaluate the fit of models.
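The effect of the Fit intercept setting can be seen by comparing the closed-form simple-regression fits with and without the constant. A minimal sketch with hypothetical data (the numbers are illustrative, not from any Minitab example):

```python
# Hypothetical data: removing the constant forces the fitted line through
# the origin, which is reasonable only when a zero predictor implies a
# zero response (e.g., calories when fat, protein, and carbohydrates are 0).
x = [2.0, 4.0, 6.0, 8.0]
y = [19.0, 37.0, 55.0, 71.0]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# With the constant: b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

# Without the constant: slope through the origin, b = sum(xi*yi) / sum(xi^2)
b_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

print(round(b0, 3), round(b1, 3), round(b_origin, 3))
```

The two slopes differ because the no-constant fit must pass through (0, 0).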
Specify confidence intervals for Multiple Regression (Options tab)
On the Options tab of the Multiple Regression dialog box, specify confidence intervals for the regression
coefficients.
In This Topic
• Confidence level
• Type of interval
Confidence level
From Confidence level, select the level of confidence for the confidence intervals for the regression
coefficients. The confidence intervals (CI) are ranges of values that are likely to contain the true value of the
coefficient for each term in the model.
Usually, a confidence level of 95% works well. A 95% confidence level indicates that if you took 100 random
samples from the population, the confidence intervals for approximately 95 of the samples would contain the
true value of the coefficient. For a given set of data, a lower confidence level produces a narrower interval, and
a higher confidence level produces a wider interval.
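The relationship between confidence level and interval width can be sketched with a large-sample normal approximation. Minitab itself uses the t distribution, and the coefficient and standard error below are hypothetical:

```python
from statistics import NormalDist

# Hypothetical point estimate and standard error for a coefficient.
coef, se = 13.2, 0.2

widths = {}
for level in (0.90, 0.95, 0.99):
    # Two-sided critical value from the standard normal distribution
    # (a large-sample approximation to the t critical value).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower, upper = coef - z * se, coef + z * se
    widths[level] = upper - lower
    print(f"{level:.0%} CI: ({lower:.3f}, {upper:.3f})")

# A lower confidence level produces a narrower interval.
print(widths[0.90] < widths[0.95] < widths[0.99])  # True
```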
Type of interval
From Type of interval, select a two-sided interval or a one-sided bound. For the same confidence level, a bound
is closer to the point estimate than the interval. The upper bound does not give a likely lower value. The lower
bound does not give a likely upper value.
For example, suppose the coefficient for a predictor is 13.2 mg/L. The 95% confidence interval for the coefficient is 12.8 mg/L to 13.6 mg/L. The 95% upper bound for the coefficient is 13.5 mg/L, which is more precise because the bound is closer to the point estimate.
Two-sided
Use a two-sided confidence interval to estimate both likely upper and lower values for the coefficient.
Lower bound
Use a lower confidence bound to estimate a likely lower value for the coefficient.
Upper bound
Use an upper confidence bound to estimate a likely higher value for the coefficient.
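Why a one-sided bound sits closer to the point estimate than the matching two-sided limit can be sketched with a normal approximation (Minitab uses the t distribution; the numbers below are hypothetical):

```python
from statistics import NormalDist

# Hypothetical coefficient estimate and standard error.
coef, se = 13.2, 0.2

# At the same 95% confidence level, a one-sided bound uses the 0.95
# quantile while a two-sided interval uses the 0.975 quantile.
z_two = NormalDist().inv_cdf(0.975)  # two-sided 95%
z_one = NormalDist().inv_cdf(0.95)   # one-sided 95%

upper_two_sided = coef + z_two * se
upper_bound = coef + z_one * se
print(upper_bound < upper_two_sided)  # True: the bound is tighter
```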
A point that is far away from the other points in the x-direction is an influential point.
In this residuals versus fits plot, the data do not appear to be randomly distributed about zero. There appear to
be clusters of points that may represent different groups in the data. Investigate the groups to determine their
cause.
If you see a pattern, investigate the cause. The following types of patterns may indicate that the residuals are dependent:
• Trend
• Shift
• Cycle
In this residuals versus order plot, the residuals do not appear to be randomly distributed about zero. The
residuals appear to systematically decrease as the observation order increases. You should investigate the trend
to determine the cause.
For more information on how to handle patterns in the residual plots, go to Interpret all statistics and graphs for
Multiple Regression and click the name of the residual plot in the list at the top of the page.
Normality plot of the residuals
Use the normal probability plot of residuals to verify the assumption that the residuals are normally distributed.
The normal probability plot of the residuals should approximately follow a straight line.
The patterns in the following table may indicate that the model does not meet the model assumptions.
Notation
Term Description
n number of observations
p number of coefficients in the model, not counting the constant
ȳ mean response
SS Total total sum of squares, Σ(yi − ȳ)²
Coefficient (Coef)
The formula for the coefficient or slope in simple linear regression is:
b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
The formula for the intercept (b0) is:
b0 = ȳ − b1x̄
In matrix terms, the formula that calculates the vector of coefficients in multiple regression is:
b = (X'X)-1X'y
Notation
Term Description
yi ith observed response value
ȳ mean response
xi ith predictor value
x̄ mean predictor
X design matrix
y response matrix
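The matrix formula b = (X'X)-1X'y can be sketched in plain Python by solving the normal equations (X'X)b = X'y. The helper functions and data below are illustrative, not Minitab's implementation:

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def solve(A, b):
    # Gaussian elimination with partial pivoting on the square system A x = b.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[pivot] = M[pivot], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

# Design matrix with an intercept column and one predictor (hypothetical data
# chosen so that y = 1 + 2x exactly).
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [3.0, 5.0, 7.0, 9.0]

Xt = transpose(X)
XtX = matmul(Xt, X)                                     # X'X
Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]  # X'y
b = solve(XtX, Xty)                                     # solves (X'X) b = X'y
print([round(v, 6) for v in b])  # [1.0, 2.0]
```

Solving the normal equations rather than inverting X'X directly is the usual numerically safer route for small hand-rolled examples.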
Mallows' Cp
The formula for Mallows' Cp is:
Cp = SSEp / MSEm − (n − 2p)
Notation
Term Description
SSEp sum of squared errors for the model under consideration
MSEm mean square error for the model with all predictors
n number of observations
p number of terms in the model, including the constant
Degrees of freedom (DF)
The degrees of freedom for each component of the model are:
Sources of variation DF
Regression p
Error n–p–1
Total n–1
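As a worked example of the table above, with hypothetical n = 30 observations and p = 3 coefficients (not counting the constant):

```python
# Hypothetical model size: 30 observations, 3 coefficients besides the constant.
n, p = 30, 3

df_regression = p          # Regression: p
df_error = n - p - 1       # Error: n - p - 1
df_total = n - 1           # Total: n - 1
print(df_regression, df_error, df_total)  # 3 26 29
```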
If your data meet certain criteria and the model includes at least one continuous predictor or more than one
categorical predictor, then Minitab uses some degrees of freedom for the lack-of-fit test. The criteria are as
follows:
• The data contain multiple observations with the same predictor values.
• The data contain the correct points to estimate additional terms that are not in the model.
Notation
Term Description
n number of observations
p number of coefficients in the model, not counting the constant
Fit
The formula for the fitted value is:
ŷ = b0 + b1x1 + … + bkxk
Notation
Term Description
ŷ fitted value
xk kth term. Each term can be a single predictor, a polynomial term, or an interaction term.
bk estimate of the kth regression coefficient
F-value
The formulas for the F-statistics are as follows:
F(Regression) = MS Regression / MS Error
F(Term) = MS Term / MS Error
F(Lack-of-fit) = MS Lack-of-fit / MS Pure error
Notation
Term Description
MS Regression A measure of the variation in the response that the
current model explains.
MS Error A measure of the variation that the model does not
explain.
MS Term A measure of the amount of variation that a term
explains after accounting for the other terms in the
model.
MS Lack-of-fit A measure of variation in the response that could be
modeled by adding more terms to the model.
MS Pure error A measure of the variation in replicated response data.
P-value
The two-sided p-value for the jth coefficient is:
2 × P(T ≥ |tj|)
The degrees of freedom are the degrees of freedom for error, as follows:
n − p − 1
Notation
Term Description
T A random variable that follows the t distribution with degrees of freedom equal to the degrees of freedom for error.
tj The t statistic for the jth coefficient.
n The number of observations in the data set.
p The sum of the degrees of freedom for the terms. The terms do not include the constant.
Regression equation
For a model with multiple predictors, the equation is:
y = β0 + β1x1 + … + βkxk + ε
The fitted equation is:
ŷ = b0 + b1x1 + … + bkxk
In simple linear regression, which includes only one predictor, the model is:
y = β0 + β1x1 + ε
Using regression estimates b0 for β0 and b1 for β1, the fitted equation is:
ŷ = b0 + b1x1
Notation
Term Description
y response
xk kth term. Each term can be a single predictor, a polynomial term, or an interaction term.
βk kth population regression coefficient
ε error term that follows a normal distribution with a mean of 0
bk estimate of the kth population regression coefficient
ŷ fitted response
Residual (Resid)
The formula for the residual is:
ei = yi − ŷi
Notation
Term Description
ei ith residual
yi ith observed response value
ŷi ith fitted response
R-sq
R² is also known as the coefficient of determination.
Formula
R² = 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)²
Notation
Term Description
yi ith observed response value
ȳ mean response
ŷi ith fitted response
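The R² computation can be sketched from hypothetical observed and fitted values:

```python
# Hypothetical observed responses and fitted values.
y    = [3.0, 5.0, 7.0, 9.0]
yhat = [3.5, 4.5, 7.5, 8.5]

ybar = sum(y) / len(y)
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))  # sum((yi - yhat_i)^2)
ss_total = sum((yi - ybar) ** 2 for yi in y)               # sum((yi - ybar)^2)

r_sq = 1 - ss_error / ss_total
print(round(r_sq, 4))
```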
R-sq (adj)
Formula
R²(adj) = 1 − [Σ(yi − ŷi)² / (n − p)] / [Σ(yi − ȳ)² / (n − 1)]
While the calculations for adjusted R² can produce negative values, Minitab displays zero for these cases.
Notation
Term Description
yi ith observed response value
ŷi ith fitted response
ȳ mean response
n number of observations
p number of terms in the model
R-sq (pred)
Formula
R²(pred) = 1 − Σ[ei / (1 − hi)]² / Σ(yi − ȳ)²
While the calculations for R²(pred) can produce negative values, Minitab displays zero for these cases.
Notation
Term Description
yi ith observed response value
ȳ mean response
n number of observations
ei ith residual
hi ith diagonal element of X(X'X)-1X'
X design matrix
S
The formula for S is:
S = √MSE
Notation
Term Description
MSE mean square error
SE Coef
The standard errors of the coefficients for multiple regression are the square roots of the diagonal elements of this matrix:
s²(X'X)-1
Notation
Term Description
xi ith predictor value
x̄ mean of the predictor
X design matrix
X' transpose of the design matrix
s² mean square error
Standardized residual (Std Resid)
Standardized residuals are also called "internally Studentized residuals."
Formula
Std Resid = ei / √[s²(1 − hi)]
Notation
Term Description
ei ith residual
hi ith diagonal element of X(X'X)-1X'
s² mean square error
X design matrix
X' transpose of the design matrix
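The standardized-residual computation can be sketched with hypothetical values for the residual, mean square error, and leverage:

```python
import math

# Hypothetical values (for illustration only).
e_i = 1.2   # i-th residual
s2 = 0.64   # mean square error
h_i = 0.25  # i-th diagonal element of X(X'X)^-1 X' (the leverage)

# Std Resid = e_i / sqrt(s^2 * (1 - h_i))
std_resid = e_i / math.sqrt(s2 * (1 - h_i))
print(round(std_resid, 4))  # ≈ 1.7321
```

Dividing by √(1 − hi) inflates the residual for high-leverage points, which is what distinguishes a standardized residual from a raw one.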
T-value
The formula for the t statistic is:
tj = bj / SE(bj)
Notation
Term Description
tj test statistic for the jth coefficient
bj jth estimated coefficient
SE(bj) standard error of the jth estimated coefficient
Variance inflation factor (VIF)
The formula for the VIF is:
VIF = 1 / (1 − R²(xj))
Notation
Term Description
R²(xj) coefficient of determination with xj as the response variable and the other terms in the model as the predictors
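The VIF computation can be sketched for a hypothetical R²(xj):

```python
# Hypothetical R^2 from regressing predictor x_j on the other predictors.
r_sq_xj = 0.8

# VIF = 1 / (1 - R^2(x_j)); larger values indicate stronger multicollinearity.
vif = 1 / (1 - r_sq_xj)
print(round(vif, 6))  # 5.0
```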