
Lecture 5

Chapter 3
[Minitab output]
[Descriptive statistics: mean, median, standard deviation; |0.34 − 0.520| = 0.18]
The Box-Cox transformation is a statistical technique used to
stabilize variance and make data more closely resemble a normal
distribution, which can improve the accuracy of various statistical
analyses. It’s especially useful when data is positively skewed or
when the spread of the data increases with the mean.
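As an illustration, here is a minimal Python sketch using scipy.stats.boxcox on
invented, positively skewed data; the function estimates the transformation
parameter lambda by maximum likelihood (lambda = 0 corresponds to a log transform).

    # Box-Cox transformation with SciPy; the data below are made up for
    # illustration, and boxcox requires strictly positive observations.
    import numpy as np
    from scipy import stats

    y = np.array([0.5, 1.2, 2.0, 3.5, 5.1, 8.7, 14.2, 23.0])  # positively skewed

    # Returns the transformed data and the lambda maximizing the log-likelihood.
    y_trans, lam = stats.boxcox(y)
    print(f"estimated lambda: {lam:.3f}")
    print("transformed data:", np.round(y_trans, 3))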
Comparing Pairs of Treatment Means
Practical Interpretation of Results
• After conducting the experiment, performing the statistical analysis, and
investigating the underlying assumptions, the experimenter is ready to draw
practical conclusions about the problem he or she is studying. For this, more
formal techniques need to be applied.
• A Regression Model
• The factors involved in an experiment can be either quantitative or qualitative.
• A quantitative factor is one whose levels can be associated with points on a
numerical scale, such as temperature, pressure, or time.
• Qualitative factors, on the other hand, are factors for which the levels cannot
be arranged in order of magnitude. Operators, batches of raw material, and
shifts are typical qualitative factors because there is no reason to rank them in
any particular numerical order.
• With a quantitative factor such as time, the experimenter is usually
interested in the entire range of values used, particularly the response from a
subsequent run at an intermediate factor level. That is, if the levels 1.0, 2.0,
and 3.0 hours are used in the experiment, we may wish to predict the
response at 2.5 hours. Thus, the experimenter is frequently interested in
developing an interpolation equation for the response variable in the
experiment. This equation is an empirical model of the process that has
been studied.
• The general approach to fitting empirical models is called regression
analysis.
• As a first approximation, we could try fitting a linear model to
the data, say y = β0 + β1x + ε, where β0 and β1 are unknown parameters
to be estimated and ε is a random error term.
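To make the interpolation idea concrete, here is a hedged Python sketch: fit a
straight line to hypothetical (time, response) data at the 1.0-, 2.0-, and
3.0-hour levels and predict the response at 2.5 hours. The response values are
invented.

    import numpy as np

    time = np.array([1.0, 2.0, 3.0])         # factor levels used in the experiment
    response = np.array([10.2, 14.9, 20.1])  # hypothetical observed responses

    # Least-squares fit of the linear model y = b0 + b1*x
    b1, b0 = np.polyfit(time, response, deg=1)
    y_hat = b0 + b1 * 2.5                    # interpolate at an intermediate level
    print(f"predicted response at 2.5 hours: {y_hat:.2f}")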
• R-squared, or the coefficient of determination, measures the
proportion of the variance in the dependent variable that is predictable
from the independent variables.
• It provides an indication of how well the model explains the data. The
value ranges from 0 to 1, with values closer to 1 indicating a better fit.
• Adjusted R-squared modifies R² to account for the number of
predictors in the model.
• It penalizes the addition of irrelevant variables that don’t improve the
model.
• This makes it a more reliable metric for comparing models with
different numbers of predictors.
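The standard adjustment is Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where
n is the number of observations and p the number of predictors. A minimal
sketch, assuming you already have observed values y and fitted values y_hat:

    import numpy as np

    def r_squared(y, y_hat):
        ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
        ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
        return 1 - ss_res / ss_tot

    def adjusted_r_squared(y, y_hat, p):
        n = len(y)
        # The factor (n - 1) / (n - p - 1) penalizes extra predictors.
        return 1 - (1 - r_squared(y, y_hat)) * (n - 1) / (n - p - 1)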
• Predicted R-squared measures how well the model predicts new,
unseen data, as opposed to how well it fits the current data.
• This metric is often calculated using cross-validation techniques,
where the model is trained on a portion of the data and then tested on
the remaining data.
• There isn’t a single formula for predicted R², as it depends on the
cross-validation method, but for least squares it is commonly expressed as:

Predicted R² = 1 − PRESS / TSS

where PRESS is the Prediction Error Sum of Squares and TSS is the Total
Sum of Squares.
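For ordinary least squares, PRESS can be computed without refitting the model,
because each leave-one-out residual equals e_i / (1 − h_ii), where h_ii is the
i-th diagonal element of the hat matrix. A sketch with invented data:

    import numpy as np

    X = np.column_stack([np.ones(6), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]])
    y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.9])

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
    h = np.diag(H)

    press = np.sum((residuals / (1 - h)) ** 2)  # prediction error sum of squares
    tss = np.sum((y - y.mean()) ** 2)           # total sum of squares
    pred_r2 = 1 - press / tss
    print(f"PRESS = {press:.3f}, predicted R^2 = {pred_r2:.3f}")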
• Pure error is the variability in the response variable that arises from
the natural randomness in repeated measurements under identical
conditions (i.e., at the same values of the predictors). This error is
based on the replicates in the data — multiple observations with the
same predictor values.
• Lack of fit is the error that occurs when the chosen model does not
adequately capture the underlying relationship between the predictor(s) and
the response. This error reflects how well the model's form (such as linear,
quadratic, etc.) matches the actual data trend.
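A minimal sketch of the decomposition SS_E = SS_PE + SS_LOF, assuming replicate
runs at each factor level (the data below are invented):

    import numpy as np

    x = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])  # two replicates per level
    y = np.array([2.1, 2.3, 3.9, 4.2, 9.1, 8.8])

    # Fit a straight line; its adequacy is what lack of fit assesses.
    b1, b0 = np.polyfit(x, y, deg=1)
    y_hat = b0 + b1 * x

    ss_e = np.sum((y - y_hat) ** 2)               # total residual SS

    # Pure error: variation of replicates around their own level mean.
    ss_pe = sum(np.sum((y[x == lvl] - y[x == lvl].mean()) ** 2)
                for lvl in np.unique(x))

    ss_lof = ss_e - ss_pe                         # lack-of-fit SS
    print(f"SS_E = {ss_e:.3f}, SS_PE = {ss_pe:.3f}, SS_LOF = {ss_lof:.3f}")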
• The Coefficient of Variation (C.V.) is a standardized measure of the
dispersion of data points around the mean, expressed as a percentage.
• In regression, C.V. is used to assess the relative spread of the residuals
(errors) around the mean response value, which helps evaluate the
consistency or precision of the model's predictions.
• Lower C.V. values indicate lower relative variability, meaning the
model's predictions are more consistent with observed values.
• Higher C.V. values suggest higher variability, indicating that the model
may be less precise.
• Generally, a C.V. of less than 10-20% is desirable, but this threshold
can vary depending on the field and the type of data.
• If the C.V. is 5%, this means that the spread of residuals (errors)
around the mean response is 5% of the mean response, which indicates
relatively low variability in predictions.
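In regression output the C.V. is typically computed as 100 · √MSE / ȳ. A short
sketch with hypothetical values:

    import numpy as np

    mse = 2.5      # hypothetical mean squared error from the ANOVA table
    y_mean = 31.6  # hypothetical mean response

    cv = 100 * np.sqrt(mse) / y_mean  # C.V. as a percentage of the mean response
    print(f"C.V. = {cv:.1f}%")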
• Adequate Precision is a measure of the signal-to-noise ratio in a
model.
• It indicates how well the model is able to distinguish meaningful
signal (trends in the data) from background noise (random variation).
• Adequate Precision helps determine whether the model can be used to
make predictions.
• Adequate Precision is calculated as the ratio of the range of predicted
values (the signal) to the average prediction error (the noise). In
formula terms, it’s often represented as:

Adequate Precision = (max(ŷ) − min(ŷ)) / √(p · MSE / n)

where p is the number of model parameters, MSE is the mean squared error,
and n is the number of runs.
• Adequate Precision > 4: This is generally considered a good threshold.
• A value greater than 4 suggests the model has a sufficient signal-to-
noise ratio, indicating it’s likely adequate for prediction purposes.
• Low Adequate Precision (< 4): If the ratio is below 4, the model may
not have a strong enough signal relative to noise, suggesting the model
might need refinement or may not be suitable for prediction.
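A hedged sketch of the ratio defined above, with hypothetical predictions and
error estimates:

    import numpy as np

    y_hat = np.array([10.1, 12.4, 15.0, 18.2, 21.7])  # hypothetical predictions
    mse = 1.8                                         # hypothetical mean squared error
    p, n = 2, 5                                       # model parameters, runs

    signal = y_hat.max() - y_hat.min()                # range of predictions
    noise = np.sqrt(p * mse / n)                      # average prediction std. error
    adeq_precision = signal / noise
    print(f"adequate precision = {adeq_precision:.2f}")  # want > 4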
• The Variance Inflation Factor (VIF) is a metric used to detect
multicollinearity in a regression model.
• Multicollinearity occurs when predictor variables are highly correlated
with each other, which can inflate the variance of the coefficient
estimates, making them less reliable and harder to interpret.
• VIF quantifies how much the variance of a regression coefficient is
inflated due to multicollinearity.
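Since VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on
the remaining predictors, it can be computed by hand. A sketch with an invented
predictor matrix whose last two columns are deliberately near-collinear:

    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=50)
    x2 = rng.normal(size=50)
    x3 = x2 + 0.1 * rng.normal(size=50)  # nearly collinear with x2
    X = np.column_stack([x1, x2, x3])

    def vif(X, j):
        # Regress predictor j on the other predictors (with an intercept).
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ beta
        r2 = 1 - resid.var() / X[:, j].var()
        return 1 / (1 - r2)

    for j in range(X.shape[1]):
        print(f"VIF for x{j + 1}: {vif(X, j):.2f}")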
