Week 5 Lecture Note-1

Uploaded by

Jay VN
Copyright
© © All Rights Reserved

17/03/2023

228.371 Statistical Modelling for Engineers and Technologists

Design of Experiments (DOE)

Week 5

Module 1 – An Introduction to DOE

Also, welcome 228797 (Research Methods) students

General Introduction - Available Resources

• Stream - Lecture notes, lecture recordings, tutorials, lab notes, lab recordings, the study guide, past exam questions and model answers. Comprehensive!

• Textbook (Optional) – Montgomery, D. C., Runger, G. C., & Hubele, N. F. (2011). Engineering statistics (5th ed.). Hoboken, NJ: Wiley.

• The above book is not available in e-book form.

General Introduction – The Nature of Statistically Designed Experiments
Design of Experiments (DOE) – WK 5 through to 10

Engineers and food technologists conduct experiments to improve an existing process or design a new process altogether to serve their customers through products that meet and exceed customer expectations. E.g.:
• To develop a new robotic sensor to identify out-of-spec items in a production line
• To develop a new food product for a specific target market
• To develop a better weather-resistant paint


Minimum Statistics Knowledge to Get Going: Linear Multiple Regression (LMR)

Sweetener Qty   Pectin Qty   Calcium Qty   Consumer Acceptability
X1              X2           X3            Y
0.09            1.10         0.04          65
0.21            1.10         0.04          57
0.09            1.40         0.04          57
0.21            1.40         0.04          67
0.09            1.10         0.06          26
0.21            1.10         0.06          42
0.09            1.40         0.06          27
0.21            1.40         0.06          57
0.05            1.25         0.05          34
0.25            1.25         0.05          65
0.15            1.00         0.05          56
0.15            1.50         0.05          77
0.15            1.25         0.03          72
0.15            1.25         0.07          16
0.15            1.25         0.05          72
0.15            1.25         0.05          69
0.15            1.25         0.05          63
0.15            1.25         0.05          67

The LMR model characterizes what is going on out there!

We conduct a planned experiment to collect data to fit a LMR model, which may take the form:

Y = β0 + β1X1 + β2X2 + β3X3
    + β12X1*X2 + β13X1*X3 + β23X2*X3
    + β11X1² + β22X2² + β33X3²

10 unknown parameters!
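
To make the "10 unknown parameters" concrete, here is a minimal sketch that builds the ten model terms for each of the 18 runs and estimates the coefficients by ordinary least squares, using only the Python standard library. The centring and scaling of the factors (CENTRE, HALF_RANGE) and the helper names model_row and solve are assumptions added for illustration, not part of the lecture; because the factors are coded, the printed coefficients are on the coded scale rather than in the raw units of X1, X2 and X3.

```python
from itertools import combinations

# Data from the slide: (X1 sweetener, X2 pectin, X3 calcium, Y acceptability)
RUNS = [
    (0.09, 1.10, 0.04, 65), (0.21, 1.10, 0.04, 57), (0.09, 1.40, 0.04, 57),
    (0.21, 1.40, 0.04, 67), (0.09, 1.10, 0.06, 26), (0.21, 1.10, 0.06, 42),
    (0.09, 1.40, 0.06, 27), (0.21, 1.40, 0.06, 57), (0.05, 1.25, 0.05, 34),
    (0.25, 1.25, 0.05, 65), (0.15, 1.00, 0.05, 56), (0.15, 1.50, 0.05, 77),
    (0.15, 1.25, 0.03, 72), (0.15, 1.25, 0.07, 16), (0.15, 1.25, 0.05, 72),
    (0.15, 1.25, 0.05, 69), (0.15, 1.25, 0.05, 63), (0.15, 1.25, 0.05, 67),
]

# Assumption (not from the slide): code each factor around its centre value
# so that the normal equations are numerically well behaved.
CENTRE = (0.15, 1.25, 0.05)
HALF_RANGE = (0.06, 0.15, 0.01)


def model_row(x):
    """Return the 10 regressors of the full second-order model for one run."""
    z = [(xi - c) / h for xi, c, h in zip(x, CENTRE, HALF_RANGE)]
    interactions = [a * b for a, b in combinations(z, 2)]  # z1z2, z1z3, z2z3
    squares = [a * a for a in z]
    return [1.0] + z + interactions + squares


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


X = [model_row(run[:3]) for run in RUNS]
y = [run[3] for run in RUNS]
p = len(X[0])

# Normal equations: (X'X) beta = X'y
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
beta = solve(xtx, xty)

fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
residual_df = len(y) - p

print("parameters estimated:", p)
print("residual degrees of freedom:", residual_df)
print("coefficients (coded scale):", [round(b, 3) for b in beta])
```

With 18 runs and 10 parameters, only 8 degrees of freedom remain for estimating error, which is one way to see why the number of model terms drives the number of trials an experiment needs.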

“Maximum Information with Minimum Effort” is the name of the game in DOE!
Industrial experiments are very costly. We need a tried and tested method of gaining maximum information about a process with minimum effort/resources. This generally means the minimum number of experimental trials! This is why we study DOE!

Sometimes we know how a process behaves reasonably well, and only fine-tuning the process may be what is needed. However, more often than not, we know very little about a new process (e.g. in New PD).

We cannot learn how a process behaves without disturbing it (i.e. manipulating process parameters).
5

Example: An Experiment to Study the Effects of sugar content, temperature and brewing time on the fermentation of Kombucha Beverage

The responses pH, Brix, Colour, Ethanol content, and CO2 reflect the fermentation process in Kombucha.

We need to conduct a planned experiment to predict Y1, Y2, Y3, Y4, and Y5 from X1, X2, and X3.

We deviate somewhat from what we did up to now with R on two counts:
• We collect data via a planned experiment (so controlled conditions).
• We have a limited number of observations.


Kombucha Blending Continued…

We have five response variables (so five regression models).

We need to set each factor (i.e. X1, X2, and X3) at two levels (at least) to observe an effect of that factor.

We need to design the experiment in a cost-effective manner (minimum trials), without compromising the goals of the experiment. Each trial is money!

How many predictors do we include in our model initially? This dictates the complexity of the experiment and data requirements.

The scientific method!


1. Present the problem statement.

2. Formulate a hypothesis/hypotheses and set out the materials and methods required to test the hypothesis.

3. Conduct the experiment (or the observational study, if an experiment is not feasible) to collect data.

4. Present the results on and around the hypothesis/hypotheses.

5. Discuss the results (e.g. against 2).

6. Make actionable conclusions.

[Diagram: Plan-Do-Check-Act cycle. Plan = steps 1, 2; Do = steps 3, 4; Check = step 5; Act = step 6.]

The Scientific/Engineering Method and Statistical Thinking
• Engineers and technologists solve problems of interest to society by applying the scientific or engineering method.

[Diagram labels: Problem, Factors, Data Collection, Regression Model, Predict, Confirm, Conclude; annotated with the questions Why? How? What?]

Example of a problem:
“What settings of X1, X2, and X3 get us the desired sensory experience to the consumers?” - See slide 6


Observational Study vs Experiment

One can fit a regression model to data by collecting data in two ways:

• Via an Observational Study

Or

• Via a Planned Experiment

A conclusion that we make via an observational study is less powerful, but sometimes we have no choice!

10

Observational Study: An Example

One of the by-products of potato processing is “filter cake” (the stuff that builds up on the filters), which can be sold as cattle feed. However, the % of solids included in the feed is important to the purchasing customers. The specification is 12% ± 0.5% solids content.

Something has gone wrong in a potato processing plant in Dipping Springs, TX, USA! The solid content contained in the filter cakes has dropped considerably, and the process engineers want to study what is happening out there, without disturbing the process! The boss is not allowing them to shut the plant.
11

Observational Study Continued

Engineers know potential variables that could affect the “% solids”. So, they went through recent records on all variables of interest:
Only 10 rows of 66 rows are shown

Best-fitting model :

12


More Examples/Research Questions

Area: Food Technology
The Research Question/Problem: What should be the proportions of the key ingredients, Snapper (x1), Squid (x2) and Oysters (x3), that should go into a seafood patty to get the right texture (y1) and optimum taste (y2)?

Area: Electrical Engineering/PD
The Research Question/Problem: What should be the number of turns (x1), thickness of the winding wire (x2), and the composition of the magnetic core material (x3) of the starter motor of a motor car that gives a consistent starting performance, irrespective of the ambient condition (z1)?

13

The Process Model

[Diagram: Input → PROCESS → Output Product]
• Input: Labour, Material/components, Machinery, Knowledge
• Controllable input factors (x1, x2, ..., xp): controlled in the experiment; fixed after the experiment
• Uncontrollable input factors (z1, z2, ..., zq): cannot be controlled in the experiment (sometimes can, with difficulty) or in the actual operation
• Output Product (y): typically, a quality characteristic(s) of a product

14

Four Different Types of Experimental Objectives
• Determining the most influential controllable input factors (x’s) on the response y (usually important at early stages of learning)
and/or
• Setting the controllable input factors (x’s) at the right settings to get the output (y) nearer to the target value (max, min, or T)
and/or
• Setting the controllable input factors (x’s) at the right settings to minimise the variability of the output (y)
and/or
• Setting the controllable input factors (x’s) at the right settings to minimise the influence of the uncontrollable variables (z)
A clear statement of objectives is extremely important!
15


What is Design of Experiments (DoE)?

“A collection of methods and a strategy to make a change to a product or process and observe the effect of that change on one or more quality characteristics, with the purpose of helping experimenters gain the most information with the resources available”.
(Moen, Nolan, & Provost, 1999, p. 405)

Moen, R. D., Nolan, T. W., & Provost, L. P. (1999). Quality improvement through planned experimentation (2nd ed.). New York: McGraw-Hill.

Key goal: maximum information using minimum resources

16

The Iterative Nature of Experimentation

Strategy: Keep it sequential, and simple (KISS)

Larger experiments are risky!

An eleven-factor experiment!

Possible two-way interactions:

55 two-way interactions to estimate, in addition to the 11 main effects and the intercept! So at least 67 trials. A lot of money for little gain, because in reality only a few terms in the model would be significant.
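
The count of 67 follows from simple combinatorics: one intercept, k main effects, and "k choose 2" two-way interactions. The short sketch below reproduces that arithmetic; the function name n_model_terms is a hypothetical helper introduced here for illustration.

```python
from math import comb


def n_model_terms(k):
    """Intercept + k main effects + all two-way interactions among k factors."""
    return 1 + k + comb(k, 2)


for k in (3, 5, 11):
    print(f"{k:2d} factors: {comb(k, 2):2d} two-way interactions, "
          f"{n_model_terms(k):2d} parameters, so at least that many trials")
```

Because each parameter needs at least one trial to be estimable, the minimum run count grows roughly with the square of the number of factors, which is the argument for keeping experiments sequential and small.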

17

For starters, let us think about the most basic experiment we can think of:
Just one factor to manipulate! We want to see whether changing the level of the factor from one level to another causes any effect on the response. We can conduct a one factor, two level study and analyse the data (in Minitab or R – the two sample t test).

This week we study:

• One factor multi-level studies (in Minitab or R - the F test in one-way ANOVA).

• One factor multi-level studies with a confounding factor (in Minitab - the General Linear Model or Balanced ANOVA).

18


228.371 Statistical Modelling for Engineers and Technologists
Week 5
Module 2 - One factor experimental designs

• Topic 1: The three pillars (golden rules) of DOE
• Topic 2: A Completely Randomised Design (CRD)
• Topic 3: A Randomised Complete Block Design (RCBD)
(Study guide part 1, section 3)

19

Here are the two types of problems that we solve this week

One factor multilevel design (Topic 2): The experimenter is manipulating just one experimental factor at two or more levels (typically the latter) and recording the response.

The objective is to study the effect of that factor on the response variable. The experimenter may want to find out which level gets them the best (optimum) results. The trials of such experiments must be conducted in a completely randomised fashion. As such, such designs are known as completely randomised designs (CRDs).

One factor multilevel design, but with a confounding factor that will be controlled at certain levels known as blocks (Topic 3): These designs are known as randomised complete block designs. Here, the randomisation takes place within each block.

20

Example 1 - a one factor two-level Completely Randomised Design (CRD)

A chemical engineer is interested in determining which catalyst produces a greater yield from a chemical reaction: catalyst A versus catalyst B. Her a priori understanding (literature review) says it is B!

The label given to our factor is ‘Catalyst’:
Catalyst A (level 1)
Catalyst B (level 2)

The engineer calculated the yield (%) produced from catalyst A and catalyst B separately, for 8 replicated runs. The experiment was conducted in a completely randomised fashion (how?). The results obtained for the 16 trials are shown below:

Standard Order   Catalyst A   Standard Order   Catalyst B
1                91.50        2                89.19
3                94.18        4                90.95
5                92.18        6                90.46
7                95.39        8                93.21
9                91.79        10               97.19
11               89.07        12               97.04
13               94.72        14               91.07
15               89.21        16               92.75

Questions:
(a) Why do values within each column differ, even though we are not changing anything?
(b) How do we test whether or not both catalysts produce the same mean yield?
(c) What would be your H0 and H1? Think about the objective.

21


Statistical Analysis – Minitab 21 software will be your new friend (for DOE)

22

Minitab 21 Results

[Figure: Individual Value Plot of Catalyst A, Catalyst B (y-axis: Data, about 89 to 98), with the sample means 92.26 and 92.73 marked.]

Graphical plots are also very informative in DOE.

μ^A = 92.26; μ^B = 92.73
Estimate for difference (i.e. μ^A – μ^B) = -0.48. Makes sense?

H0: μA - μB = 0   (Null hypothesis)
H1: μA - μB ≠ 0   (Alternative hypothesis; can also be μA - μB < 0)

95% CI of mean difference: (-3.37, 2.42); t0 = -0.35; p = 0.729; df = 14
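
The reported figures can be reproduced by hand from the 16 yields. The sketch below assumes the pooled-variance (equal variances) two-sample t test, which is consistent with the reported df = 14; it is an illustration in Python rather than the Minitab procedure itself. The critical value 2.145 is the two-sided 5% point of the t distribution with 14 df, taken from standard t-tables, and the p-value is not computed because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

catalyst_a = [91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21]
catalyst_b = [89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75]

n_a, n_b = len(catalyst_a), len(catalyst_b)
diff = mean(catalyst_a) - mean(catalyst_b)

# Pooled variance: weighted average of the two sample variances
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * variance(catalyst_a)
              + (n_b - 1) * variance(catalyst_b)) / df
std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))

t0 = diff / std_err

T_CRIT = 2.145  # t(0.025, 14) from standard t-tables (assumed constant)
ci = (diff - T_CRIT * std_err, diff + T_CRIT * std_err)

print(f"difference = {diff:.2f}, t0 = {t0:.2f}, df = {df}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Since the interval contains zero and |t0| is far below 2.145, the data give no evidence that the two catalysts differ in mean yield.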


23

Topic 1: The Three Pillars (Golden Rules) of DOE

Rule 1: Replication of Experimental Runs. Why?
• The greater the number of replications, the more precise we become in estimating the unknowns (previous example: μA and μB). The confidence intervals become narrower!

• Replication is also required to calculate error variation, such as the mean square error in our analysis of variance (ANOVA), which is an integral part of establishing statistical significance.

• Can we replicate our experimental trials in any order we like, such as finishing the trials with catalyst A first and then going to catalyst B? Ans: Ideally no. This leads us to the second golden rule in planned experiments.

24


Second Golden Rule – Randomizing; and Third Golden Rule – Blocking. Why these two rules?
• The results of an experiment can always be vulnerable to known and unknown sources of discrepancies (confounding variables).

• A good experimenter must do their best to guard against these sources of discrepancies.

• Randomisation – the strategy to guard against unknown sources of discrepancies.

• Blocking – a well-known strategy to guard against a known source/s of discrepancy. If the discrepancy is controllable in an experiment (e.g., the chemical reactor), we can allow the discrepancy to occur the way we want (i.e., control) so that this does not, in theory, affect our results (effects estimates).
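
As an illustration of the two rules, the sketch below generates a run order in two ways: shuffling all trials together (complete randomisation) and shuffling the treatments separately inside each block (blocking with randomisation within blocks). The function names crd_run_order and rcbd_run_order and the seed argument are hypothetical, introduced only for this example; any sound random-number source would do.

```python
import random


def crd_run_order(levels, replicates, seed=None):
    """Completely randomised design: shuffle every trial in one pool."""
    rng = random.Random(seed)
    trials = [level for level in levels for _ in range(replicates)]
    rng.shuffle(trials)
    return trials


def rcbd_run_order(treatments, blocks, seed=None):
    """Randomised complete block design: each block contains every treatment
    once, and the order is randomised separately within each block."""
    rng = random.Random(seed)
    return {block: rng.sample(treatments, len(treatments)) for block in blocks}


# 2 catalysts x 8 replicates = 16 trials in random order
print(crd_run_order(["A", "B"], replicates=8, seed=1))

# 4 treatments randomised within each of 4 blocks
for block, order in rcbd_run_order(["A", "B", "C", "D"], [1, 2, 3, 4], seed=1).items():
    print(f"block {block}: {order}")
```

Fixing the seed only makes the plan reproducible for record keeping; the protection against unknown sources of discrepancy comes from the shuffle itself.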

25

Now let us focus our attention on single factor multi-level (> 2 levels) experiments

Reasons for using more than 2 levels in experiments:

– If our factor is text type (qualitative): we might want to test more than two options (e.g. Blend A, Blend B, and Blend C)

– If our factor is numeric (e.g. temperature): we might suspect nonlinearity in the response versus factor relationship (e.g. Yield vs Temperature)

26

Topic 2: Completely Randomised Design (CRD) – specifically, a one factor multi-level CRD

A single factor multi-level (4-level) experiment

A manufacturer of paper used for making grocery bags is interested in improving the tensile strength of the product.

Product engineering thinks that tensile strength is a function of the hardwood concentration in the pulp that goes into manufacturing the paper. The experimenters selected 4 hardwood concentrations to test: 5%, 10%, 15%, and 20%.

The response is tensile strength.

Source: Montgomery, D. C., Runger, G. C., & Hubele, N. F. (2011). Engineering statistics (5th ed.). Hoboken, NJ: Wiley.

27


Results

For your information:
• Hardwood concentration is the factor
• The four levels at which the factor was manipulated are known as treatments
• Each treatment has six repetitions or replicates (1-6)
• Each repetition is called a trial
• The trials are run in randomised order (the order shown in parentheses)

Each treatment having six repetitions means six specimens being treated at the same treatment level (e.g. 5%). Consequently, the 24 observations shown here come from 24 specimens. There is a difference between repeated measurements and repeated experimental trials (i.e. replicates).

Tensile Strength Observations, with run order in parentheses

Hardwood             Replicate no.
concentration (%)    1         2         3         4         5         6
5                    7 (21)    8 (14)    15 (3)    11 (16)   9 (8)     10 (19)
10                   12 (1)    17 (15)   13 (9)    18 (24)   19 (4)    15 (12)
15                   14 (6)    18 (2)    19 (20)   17 (10)   16 (17)   18 (7)
20                   19 (22)   25 (11)   22 (13)   23 (5)    18 (18)   20 (23)

28

For statistical analysis, we can present data in 2 ways in Minitab

Preferred Method (Concentration is the Factor)

Run Order   Concentration (Factor)   Tensile Strength Y (Response)
1           10                       12
2           15                       18
3           5                        15
4           10                       19
5           20                       23
6           15                       14
7           15                       18
8           5                        9
9           10                       13
10          15                       17
11          20                       25
12          10                       15
13          20                       22
14          5                        8
15          10                       17
16          5                        11
17          15                       16
18          20                       18
19          5                        10
20          15                       19
21          5                        7
22          20                       19
23          20                       20
24          10                       18

Alternative Method (showing response values in separate columns)

Conc 5%   Conc 10%   Conc 15%   Conc 20%
7         12         14         19
8         17         18         25
15        13         19         22
11        18         17         23
9         19         16         18
10        15         18         20

H0: µ5% = µ10% = µ15% = µ20%
H1: At least one mean is different

We need to analyse the variation of the observations to test whether or not H0 can be rejected in favour of H1.

29

Individual data plot on the 24 observations belonging to four concentration levels

The mean of the six observations @ 5% conc. = 10.0000
The mean of the six observations @ 10% conc. = 15.6667
The mean of the six observations @ 15% conc. = 17.0000
The mean of the six observations @ 20% conc. = 21.1667
The mean of all the 24 observations (aka the grand mean) = 15.9583

[Figure: Individual Value Plot of Strength vs Concentration (5%, 10%, 15%, 20%), with the four group means and the grand mean of 15.9583 marked.]

So what do you think? Your expectation (H1), “at least one mean is different”, might be supported by the data?

Keep in mind that H0 and H1 refer to population means and not anything else.

30


Analysis of Variance Calculation (slide 1 of 4)

Overall Mean of Y = Average(a2:d7) = 15.9583

Mean of Y @ 5% Conc = 10.0000
Mean of Y @ 10% Conc = 15.6667
Mean of Y @ 15% Conc = 17.0000
Mean of Y @ 20% Conc = 21.1667

• There is variation in the 24 observations around the overall mean value of 15.9583. This is known as Total Variation (SST).

• There is variation in the 6 observations within each group (factor level) around the group averages. This is known as within-group variation or Error Variation (SSE).

• There is variation in the observations between groups (treatments) on account of the fact that group averages are not the same. This is known as between-group variation or Treatment Variation (SSTR).

31

Analysis of Variance Calculation (slide 2 of 4)

Let us calculate the total variation (SST) to understand what it means. Using the 24 tensile strength observations in run order (run 1 = 12, run 2 = 18, run 3 = 15, ..., run 23 = 20, run 24 = 18), SST is the sum of the following 24 squared quantities:

SST (Total Sum of Squares) = (12-15.9583)² + (18-15.9583)² + (15-15.9583)² + … + (20-15.9583)² + (18-15.9583)² = 512.9583

Likewise, if you hand-calculate the quantities SSTR and SSE using the formulae given in your study guide, on a good day, you should get the following answers:

SSTR = 382.7917 and SSE = 130.1667

(Check: SSTR + SSE = 382.7917 + 130.1667 = 512.9583 = SST, and these values agree with the Minitab ANOVA output of 382.8 and 130.2.)

32

Analysis of Variance Calculation (slide 3 of 4)

From statistical first principles we can prove that SST = SSTR + SSE
SST corresponds to (24 - 1) degrees of freedom (df).
SSTR corresponds to (4 - 1) df
SSE corresponds to (24 - 4) df

More generally,

SST corresponds to “total number of observations - 1” df

SSTR corresponds to “number of levels of the factor - 1” df

SSE corresponds to “total number of observations - number of levels of the factor” df

Once we know the df of SSTR and SSE, we can calculate the mean square quantities to calculate our F statistic!

33


Analysis of Variance Calculation (slide 4 of 4)

Partitioning the Total Variation:
Total Variation (SST) = Treatment Variation (SSTR) + Error Variation (SSE)

Source           SS   DF   MS      F   p
Treatment (TR)   √    √    SS/DF
Error (E)        √    √    SS/DF
Total (T)        √    √

SS = Sum of Squares
DF = Degrees of Freedom
MS = Mean Square
F = F statistic (test statistic)
p = p-value (significance of the test statistic)
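
The partition of the total variation can be checked numerically for the paper tensile strength data. The following is a minimal sketch in Python (standard library only, not the Minitab or R workflow used in the course); the variable names are illustrative, and F is formed as the treatment mean square divided by the error mean square. The p-value is left out because it needs the F distribution, which the standard library does not provide.

```python
from statistics import mean

# Tensile strength by hardwood concentration (%), six replicates each
strength = {
    5:  [7, 8, 15, 11, 9, 10],
    10: [12, 17, 13, 18, 19, 15],
    15: [14, 18, 19, 17, 16, 18],
    20: [19, 25, 22, 23, 18, 20],
}

all_obs = [y for ys in strength.values() for y in ys]
grand_mean = mean(all_obs)

# Total, treatment (between-group) and error (within-group) sums of squares
sst = sum((y - grand_mean) ** 2 for y in all_obs)
sstr = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in strength.values())
sse = sum((y - mean(ys)) ** 2 for ys in strength.values() for y in ys)

df_tr = len(strength) - 1            # number of levels - 1
df_e = len(all_obs) - len(strength)  # observations - number of levels

ms_tr = sstr / df_tr
ms_e = sse / df_e
f_stat = ms_tr / ms_e

print(f"SST  = {sst:.4f} (df {len(all_obs) - 1})")
print(f"SSTR = {sstr:.4f} (df {df_tr}), MS = {ms_tr:.3f}")
print(f"SSE  = {sse:.4f} (df {df_e}), MS = {ms_e:.3f}")
print(f"F    = {f_stat:.2f}")
```

These values match the Minitab one-way ANOVA output for this data set (Concentration SS 382.8, Error SS 130.2, F 19.61).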

34

Minitab ANOVA output

One-way ANOVA: Strength versus Concentration

Null hypothesis (H0)          All means are equal
Alternative hypothesis (H1)   At least one mean is different
Significance level            α = 0.05

Equal variances were assumed for the analysis.

Factor Information
Factor          Levels   Values
Concentration   4        5%, 10%, 15%, 20%

Analysis of Variance
Source          DF   Adj SS   Adj MS    F-Value   P-Value
Concentration   3    382.8    127.597   19.61     0.000
Error           20   130.2    6.508
Total           23   513.0

We can reject H0 in favour of H1.

You will derive these sorts of results in the lab.

35

We rejected our null hypothesis in favour of our alternative hypothesis. But how do we know where the inequality exists?

[Figure: Interval Plot of Conc 5%, Conc 10%, ... (95% CI for the Mean; y-axis: Data; x-axis: Conc 5%, Conc 10%, Conc 15%, Conc 20%). The pooled standard deviation is used to calculate the intervals.]

Comparing two means at a time is OK for sizing the difference between a pair of mean values, but you should not use the t-test 6 times!

Some Acceptable Approaches:
(1) Tukey’s HSD approach
(2) Tukey simultaneous 95% CI approach
Tukey stuff – In the lab!
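
A rough calculation shows why six separate t-tests are discouraged. With 4 treatment levels there are 6 pairs, and if each test is run at α = 0.05 the chance of at least one false alarm across the family is much larger than 5%. The sketch below treats the six tests as independent, which is a simplifying assumption (the pairwise tests share data, so the true figure differs somewhat), but it conveys the size of the problem.

```python
from math import comb

levels = 4
alpha = 0.05

n_pairs = comb(levels, 2)  # number of pairwise comparisons

# Family-wise error rate, ASSUMING the tests are independent (approximation)
family_wise_error = 1 - (1 - alpha) ** n_pairs

print(f"{n_pairs} pairwise comparisons")
print(f"approximate chance of at least one false rejection: {family_wise_error:.3f}")
```

Procedures such as Tukey's are designed to hold the family-wise rate at the nominal level across all pairs.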

36


Underlying linear model for a completely randomised design

yij = µ + τi + εij
Where:
yij = jth value of the ith treatment level
µ = overall (grand) mean
τi = effect of ith treatment level
εij = random error of the jth value of the ith treatment level

i = 1,2,…,a
j = 1,2,…,n
In our example on paper testing, i = 1,2,3,4 and j = 1,2,3,4,5,6
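
For the paper data, the terms of this model can be estimated directly: the grand mean estimates µ, each treatment mean minus the grand mean estimates τi, and what is left over for each observation is its residual. The sketch below uses those usual least-squares estimates for a balanced design; the names mu_hat, tau_hat and residuals are illustrative.

```python
from statistics import mean

strength = {
    5:  [7, 8, 15, 11, 9, 10],
    10: [12, 17, 13, 18, 19, 15],
    15: [14, 18, 19, 17, 16, 18],
    20: [19, 25, 22, 23, 18, 20],
}

# Estimate of the grand mean (mu)
mu_hat = mean(y for ys in strength.values() for y in ys)

# Estimated treatment effects (tau_i): treatment mean minus grand mean
tau_hat = {level: mean(ys) - mu_hat for level, ys in strength.items()}

# Residuals (estimates of epsilon_ij): observation minus its fitted value
residuals = {
    level: [y - (mu_hat + tau_hat[level]) for y in ys]
    for level, ys in strength.items()
}

print(f"mu_hat = {mu_hat:.4f}")
for level, effect in tau_hat.items():
    print(f"tau_hat at {level}% = {effect:+.4f}")
```

The estimated effects sum to zero in a balanced design, and every observation is recovered exactly as mu_hat + tau_hat + residual.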

37

The components of the model shown diagrammatically, for our data

yij = µ + τi + εij, with i = 1, 2, 3, 4 and j = 1, 2, 3, 4, 5, 6

[Figure: Individual Value Plot of Strength vs Concentration (5%, 10%, 15%, 20%), annotated with the grand mean µ, the predicted value for each treatment (µ^1, µ^2, µ^3, µ^4), the treatment effects τ1 to τ4, an observation yij and its error εij.]

38

Minitab’s graphical plot to test the assumptions on statistical inferencing (Can say ANOVA assumptions)

Residual Plots for Strength (four panels):
• Normal Probability Plot (Percent vs Residual): a normal probability plot of residuals εij to assess normality.
• Versus Fits (Residual vs Fitted Value): a plot of residuals εij against the fitted (predicted) values of the model to assess equal variance (homoscedasticity) across all levels of the factors.
• Histogram (Frequency vs Residual): a histogram of residuals εij to assess normality.
• Versus Order (Residual vs Observation Order): a plot of residuals εij against the order in which the observations were obtained to test independence of observations.

39


Topic 3: Randomised Complete Block Design – An Example

A chemical engineer is interested in testing four formulations of new fertilizer (Blend A, Blend B, Blend C, and Blend D) that she blended for growing grape trees. Her response variable of interest is the yield of grapes in MT per hectare.

The engineer suspects that the four plots of land that she selected for her experiment could potentially influence the results. The engineer needs to control the effect of plot-of-land (background variable) statistically, within her ANOVA.

She has four plots of land: Land 1, Land 2, Land 3 and Land 4

Factor Information
Factor          Levels   Values
Land (Blocks)   4        1, 2, 3, 4
Formulation     4        A, B, C, D

40

The data set (from the engineer’s logbook)

• The fertilizer formulation is the 'factor' of focal interest in this investigation
• The four plots of land being used represent the 4 levels of the nuisance factor (blocks)
• There are no replications; there is only one yield value, given the particular blend of fertilizer being used in a particular plot of land
• The trials are run in a random order within each block (run order shown in parentheses)

The Yield in MT/Ha

Blocks           Treatment (Formulation)
(Plot of Land)   A             B             C             D
1                6.527 (12)    5.709 (10)    12.287 (11)   12.074 (9)
2                9.099 (4)     3.827 (1)     8.191 (3)     15.314 (2)
3                8.626 (16)    5.479 (15)    10.956 (13)   13.174 (14)
4                11.490 (6)    16.111 (8)    14.501 (5)    23.040 (7)

41

Minitab ANOVA for Yield vs Land, Formulation

H0 for the Factor Formulation: All treatment means are the same.
H1 for the Factor Formulation: At least 1 mean is different.
Likewise, we can have an H0 & H1 for the background factor. But as scientists, we are not interested in finding out which plot of land is more fertile.

Analysis of Variance for Yield
Source         DF   SS       MS       F      P
Land(Blocks)   3    148.07   49.356   8.34   0.006
Formulation    3    155.47   51.823   8.76   0.005
Error          9    53.26    5.918
Total          15   356.80

Model Summary
S         R-sq     R-sq(adj)
2.43261   85.07%   75.12%

Results suggest that both factors are significant!
Land is a nuisance factor lurking in the background!
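
The sums of squares in this table can be reproduced from the 16 logbook values. In a randomised complete block design with one observation per cell, the total variation splits into a block part, a treatment part and an error part. The sketch below is an illustrative Python calculation (standard library only), not the Minitab procedure; p-values are omitted because they require the F distribution.

```python
from statistics import mean

# Yield in MT/Ha: rows are blocks (plots of land 1-4),
# columns are treatments (formulations A, B, C, D)
yields = [
    [6.527, 5.709, 12.287, 12.074],
    [9.099, 3.827, 8.191, 15.314],
    [8.626, 5.479, 10.956, 13.174],
    [11.490, 16.111, 14.501, 23.040],
]

n_blocks = len(yields)
n_treatments = len(yields[0])
grand_mean = mean(y for row in yields for y in row)

block_means = [mean(row) for row in yields]
treatment_means = [mean(col) for col in zip(*yields)]

sst = sum((y - grand_mean) ** 2 for row in yields for y in row)
ss_block = n_treatments * sum((m - grand_mean) ** 2 for m in block_means)
ss_treatment = n_blocks * sum((m - grand_mean) ** 2 for m in treatment_means)
ss_error = sst - ss_block - ss_treatment

df_block = n_blocks - 1
df_treatment = n_treatments - 1
df_error = df_block * df_treatment

ms_error = ss_error / df_error
f_block = (ss_block / df_block) / ms_error
f_treatment = (ss_treatment / df_treatment) / ms_error

print(f"Land (blocks): SS = {ss_block:.2f}, F = {f_block:.2f}")
print(f"Formulation:   SS = {ss_treatment:.2f}, F = {f_treatment:.2f}")
print(f"Error:         SS = {ss_error:.2f}, MS = {ms_error:.3f}")
print(f"Total:         SS = {sst:.2f}")
```

Removing the block sum of squares from the error term is what blocking buys: without it, the variation between plots of land would inflate the error mean square and weaken the test on Formulation.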

42


Underlying linear model for a randomised block design

yij = µ + τi + βj + εij
Where:
yij = the value in the jth block for the ith treatment level
µ = overall (grand) mean
τi = effect of treatment i
βj = effect of block j
εij = random error component
i = 1,2,…,a
j = 1,2,…,b

43
