Statistics 2 For Chemical Engineering: Department of Mathematics and Computer Science

This document provides information about the Statistics 2 for Chemical Engineering course, including the lecturers, important details to remember, goals of the course, and an overview of the topics to be covered in the first week including an introduction to analysis of variance (ANOVA) and examples of its applications. Software and websites relevant to the course are also listed.

Uploaded by Khuram Maqsood

2DS01

Statistics 2 for Chemical Engineering
http://www.win.tue.nl/~sandro/2DS01

/ department of mathematics and computer science


Lecturers
Marko Boon ([email protected])
Dr. A. Di Bucchianico ([email protected])
Ir. G.D. Mooiweer ([email protected])
Drs. C.M.J. Rusch Groot ([email protected])

Important to remember
Web site for this course:
http://www.win.tue.nl/~sandro/2DS01/
No textbook, but handouts and PowerPoint slides are available through the web site
Bring your laptop (notebook) to the fourth lecture (12th of April) and to the self-study session
Software:
Statgraphics (version 5.1). If it is not installed, install it through
http://w3.tue.nl/nl/diensten/dienst_ict/organisatie/groepen/wins/campus_software/
Java (at least version 1.4). Install it through http://java.com.
Java is needed to run Statlab (http://www.win.tue.nl/statlab).
Important: in order to run Statlab during the exams, security settings …

Goals of this course
teach students the need for a statistical basis of experimentation
teach students statistical tools for experimentation:
design of experiments (factorial designs, optimal designs)
analysis of experiments (ANOVA)
use of statistical software
give students a short introduction to recent developments

Week schedule
Week 1: Introduction to Analysis of Variance (ANOVA)
Week 2: Factorial designs: screening
Week 3: Factorial designs: optimisation
Week 4: Optimal experimental design and mixture designs (by A. Di Bucchianico; bring your laptop!)

Detailed contents of week 1
statistics and experimentation
short recapitulation of regression analysis
one-way ANOVA
one-way ANOVA with blocks
multiple comparisons

Statistics and experimentation
Chemical experiments often depend on several factors (pressure, catalyst, temperature, reaction time, ...)

Two important questions:
which factors are really important?
what are the optimal settings for the important factors?

Use of statistical experimentation in chemical engineering
Chemical synthesis (synthetic steps; work-up and separation; reagents, solvents, catalysts; structure, reactivity and properties, ...)
Biotech industry (drug design, analytical biochemistry, process optimization of fermentation, purification, ...)
Process industry (process optimization and control: yield, purity, throughput time, pollution, energy consumption; product quality and performance: material strength, warp, color, taste, odour; ...)
...
Short history of statistics and experimentation
1920s onwards: introduction of statistical methods in agriculture by Fisher and co-workers
1950s onwards: introduction in chemical engineering (Box, ...)
1980s onwards: introduction of the Japanese approach (Taguchi, robust design) in Western industry
1990s onwards: combinatorial chemistry, high-throughput processing

Link to Statistics 1 for Chemical Engineering
introduction to measurements
data analysis
error propagation
regression analysis
use of statistical software (Statgraphics)

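Types of regression analysis

Linear means linear in the coefficients, not necessarily a linear function of x! (e.g. $Y = \beta_0 + \beta_1 x + \beta_2 x^2$ is still linear regression)

Simple linear regression: $Y = \beta_0 + \beta_1 x$

Multiple linear regression: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$

Non-linear regression: Y depends non-linearly on at least one of the coefficients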

Linear regression

Model: $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \varepsilon_i$

Assumptions:
the model is linear (and contains enough terms)
the $\varepsilon_i$'s are normally distributed with mean $\mu = 0$ and variance $\sigma^2$
the $\varepsilon_i$'s are independent.

Specific heat
specific heat of a vapour at constant pressure as a function of temperature
data set from Perry's Chemical Engineers' Handbook
thermodynamic theory says that a quadratic relation between temperature and specific heat usually suffices:

$C_p = \beta_0 + \beta_1 T + \beta_2 T^2$
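The coefficients of this quadratic model are estimated by least squares. A minimal sketch in plain Python, assuming a small hand-rolled solver is acceptable for illustration (it is not how Statgraphics computes the fit): it solves the normal equations after centring and scaling T, because raw powers of T around 300 make those equations badly conditioned, and then converts the coefficients back to powers of T. The temperatures and "true" coefficients in the usage lines are hypothetical, chosen only to resemble the regression output; the Perry's data set itself is not reproduced.

```python
from math import comb


def solve(a, rhs):
    """Solve the square system a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial(x, y, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] of y = sum_j b_j * x**j."""
    # Centre and scale x to [-1, 1] so the normal equations stay well conditioned.
    mean = sum(x) / len(x)
    scale = max(abs(xi - mean) for xi in x)
    u = [(xi - mean) / scale for xi in x]
    p = degree + 1
    utu = [[sum(ui ** (r + s) for ui in u) for s in range(p)] for r in range(p)]
    uty = [sum(yi * ui ** r for ui, yi in zip(u, y)) for r in range(p)]
    c = solve(utu, uty)  # coefficients in the centred/scaled variable u
    # Expand c_k * ((x - mean) / scale)**k into powers of x (binomial theorem).
    b = [0.0] * p
    for k, ck in enumerate(c):
        for j in range(k + 1):
            b[j] += ck * comb(k, j) * (-mean) ** (k - j) / scale ** k
    return b


# Hypothetical, noise-free data generated from an assumed quadratic.
T = [250 + 6 * i for i in range(24)]
true_b = [3590.36, -12.1386, 0.0213415]
Cp = [true_b[0] + true_b[1] * t + true_b[2] * t ** 2 for t in T]
print(fit_polynomial(T, Cp, 2))  # approximately recovers true_b
```

Passing `degree=3` gives a third-order fit of the same kind. For real work a library least-squares routine (for instance `numpy.linalg.lstsq`) is the more robust choice.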

Scatter plot of specific heat data
[Figure: Plot of Cp vs T]

Regression output for the specific heat data
Polynomial Regression Analysis
-----------------------------------------------------------------------------
Dependent variable: Cp
-----------------------------------------------------------------------------
                                  Standard              T
Parameter          Estimate          Error      Statistic       P-Value
-----------------------------------------------------------------------------
CONSTANT            3590.36        76.3041        47.0533        0.0000
T                  -12.1386       0.454369       -26.7153        0.0000
T^2               0.0213415    0.000670762        31.8169        0.0000
-----------------------------------------------------------------------------

Analysis of Variance
-----------------------------------------------------------------------------
Source          Sum of Squares     Df   Mean Square    F-Ratio      P-Value
-----------------------------------------------------------------------------
Model                 169252.0      2       84626.2    6227.13       0.0000
Residual               285.388     21       13.5899
-----------------------------------------------------------------------------
Total (Corr.)         169538.0     23

R-squared = 99.8317 percent
R-squared (adjusted for d.f.) = 99.8156 percent
Standard Error of Est. = 3.68645
Mean absolute error = 2.94042
Durbin-Watson statistic = 0.310971 (P=0.0000)
Lag 1 residual autocorrelation = 0.640511
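Several summary lines follow directly from the Analysis of Variance table. A small sketch recomputing them with the standard textbook formulas; the inputs are copied from the table above, and the last digits can differ slightly from the printed values because the table shows rounded sums of squares.

```python
from math import sqrt

# Values copied from the Analysis of Variance table in the output above.
ss_model, df_model = 169252.0, 2
ss_resid, df_resid = 285.388, 21
ss_total, df_total = 169538.0, 23

ms_model = ss_model / df_model              # mean square of the model
ms_resid = ss_resid / df_resid              # residual mean square (estimate of sigma^2)
f_ratio = ms_model / ms_resid               # F-ratio testing the significance of the model
r_squared = 1 - ss_resid / ss_total         # fraction of the variation explained
r_squared_adj = 1 - ms_resid / (ss_total / df_total)  # penalised for the number of terms
std_error_est = sqrt(ms_resid)              # "Standard Error of Est."

print(f"MS model = {ms_model:.1f}, MS residual = {ms_resid:.4f}, F = {f_ratio:.1f}")
print(f"R^2 = {100 * r_squared:.4f}%, adjusted R^2 = {100 * r_squared_adj:.4f}%")
print(f"standard error of estimate = {std_error_est:.5f}")
```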

Issues in regression output
significance of the model
significance of the individual regression parameters
residual plots:
normality (density trace, normal probability plot)
constant variance (plot against predicted values and against each independent variable)
model adequacy (plot against predicted values)
outliers
independence
influential points
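For the independence check, the output above also reports the Durbin-Watson statistic and the lag-1 residual autocorrelation. A sketch using the usual textbook definitions, assuming the residuals are listed in time (or temperature) order; Statgraphics' exact conventions may differ slightly, and the residual series in the usage lines are hypothetical.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals.

    Values near 2 suggest no lag-1 autocorrelation; values well below 2 point to
    positively correlated residuals, values well above 2 to negatively correlated ones.
    """
    num = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of the (mean-centred) residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    d = [e - mean for e in residuals]
    return sum(a * b for a, b in zip(d, d[1:])) / sum(a * a for a in d)


# Hypothetical residual series, for illustration only.
smooth = [-3, -2, -1, 0, 1, 2, 3, 2, 1, 0, -1, -2]  # slow wave: positively correlated
alternating = [1, -1] * 6                           # sign flips every step: negatively correlated
print(durbin_watson(smooth), lag1_autocorrelation(smooth))
print(durbin_watson(alternating), lag1_autocorrelation(alternating))
```

A smooth, wave-like residual pattern (as in the residual plot of a model with too few terms) gives a small Durbin-Watson value, which is consistent with the low value of 0.31 in the output.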

Residual plot for the specific heat data
This behaviour is visible in the plot of the fitted model only after rescaling!
[Figure: Residual Plot, studentized residual versus predicted Cp]

Plot of the fitted quadratic model for the specific heat data
[Figure: Plot of Fitted Model, Cp versus T]

Conclusion on regression models for the specific heat data
we need a third-order model (polynomial of degree 3)
be careful with extrapolation
the original data set contains influential points
the original data set contains potential outliers

Analysis of variance
the name refers to the mathematical technique, not to the goal
comparison of means (!!) using variances (extension of the t-test to more than 2 samples)
the samples usually are groups of measurements with constant factor settings
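The remark that ANOVA extends the t-test can be checked numerically: with exactly two groups, the one-way ANOVA F-ratio equals the square of the pooled (equal-variance) two-sample t statistic. A sketch with hypothetical breaking tensions (illustrative numbers, not measurements).

```python
from math import sqrt


def one_way_f(groups):
    """F-ratio of a one-way ANOVA: MS_between / MS_within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance estimate."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    sp2 = (sum((y - ma) ** 2 for y in a) + sum((y - mb) ** 2 for y in b)) / (na + nb - 2)
    return (ma - mb) / sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical breaking tensions for two fibre compositions.
a = [12.0, 15.0, 11.0, 14.0, 13.0]
b = [16.0, 18.0, 15.0, 19.0, 17.0]
t = pooled_t(a, b)
print(t ** 2, one_way_f([a, b]))  # the two values agree up to rounding
```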

Example: ANOVA
production of yarns: influence of fibre composition on breaking tension

simplification:
one factor: % cotton
fixed factor levels: 15%, 20%, 25%, 30%, 35%
experimental design: produce 5 threads of each fibre composition on the same machine, in random order

Statistical setting

Basic model: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where
$\mu$: overall mean
$\tau_i$: influence of factor level $i$, $i = 1, 2, \dots, k$
$j = 1, 2, \dots, n$: replications
$\varepsilon_{ij}$: error term, independent and normally distributed with mean $0$ and variance $\sigma^2$

Basic hypotheses:
$H_0$: $\tau_i = 0$ for all $i$
$H_1$: $\tau_i \neq 0$ for at least one $i$

Expectation under H0 (= no effect of factor level)
spread of observations with respect to group means: chance
spread of group means with respect to overall mean: chance

Expectation under H1
spread of observations with respect to group means: chance
spread of group means with respect to overall mean: systematic
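Both expectations can be made concrete by simulating the basic model $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$. This sketch assumes k = 5 levels, n = 5 replications, $\mu = 10$, $\sigma = 1$ and hypothetical effects $\tau_i$. It compares the spread of the group means (scaled to a mean square) with the spread around the group means: under H0 both reflect chance only and are of similar size, under H1 the first one is clearly larger.

```python
import random


def simulate(tau, n, mu=10.0, sigma=1.0, rng=None):
    """Draw y_ij = mu + tau_i + eps_ij, with eps_ij independent N(0, sigma^2)."""
    rng = rng or random.Random()
    return [[mu + t + rng.gauss(0.0, sigma) for _ in range(n)] for t in tau]


def mean_squares(groups):
    """Return (MS_between, MS_within) for k groups of equal size n."""
    k, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    # Spread of the group means around the overall mean.
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    # Spread of the observations around their own group mean.
    ms_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    return ms_between, ms_within


rng = random.Random(1)  # arbitrary fixed seed, for reproducibility
h0 = mean_squares(simulate([0, 0, 0, 0, 0], 5, rng=rng))    # all tau_i = 0
h1 = mean_squares(simulate([-3, -1, 0, 1, 3], 5, rng=rng))  # hypothetical non-zero effects
print("H0: MS_between, MS_within =", h0)
print("H1: MS_between, MS_within =", h1)
```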

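Illustration of group means
[Figure: group means $\bar y_{1\cdot}$, $\bar y_{2\cdot}$, $\bar y_{3\cdot}$ relative to the overall mean $\bar y_{\cdot\cdot}$]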

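Group means versus overall mean
[Figure: for an observation $y_{3j}$ in group 3, the deviation from the overall mean, $y_{3j} - \bar y_{\cdot\cdot}$, splits into the deviation of the group mean from the overall mean, $\bar y_{3\cdot} - \bar y_{\cdot\cdot}$, and the deviation of the observation from its group mean, $y_{3j} - \bar y_{3\cdot}$]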

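Conclusion

Comparison of both spreads yields an indication for H0 versus H1. The spreads are converted into sums of squares:

$$\sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij}-\bar y_{\cdot\cdot}\right)^2 = n\sum_{i=1}^{k}\left(\bar y_{i\cdot}-\bar y_{\cdot\cdot}\right)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij}-\bar y_{i\cdot}\right)^2$$

total = treatment (between groups) + rest (within groups)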
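The decomposition can be verified numerically. A sketch for a balanced layout (n observations per level) with hypothetical data; the total sum of squares equals the treatment (between-groups) part plus the within-groups part, up to floating-point rounding.

```python
def sums_of_squares(groups):
    """Return (SS_total, SS_treatment, SS_within) for a balanced one-way layout."""
    k, n = len(groups), len(groups[0])
    assert all(len(g) == n for g in groups), "this sketch assumes n observations per level"
    grand = sum(sum(g) for g in groups) / (k * n)          # overall mean y..
    means = [sum(g) / n for g in groups]                   # group means yi.
    ss_total = sum((y - grand) ** 2 for g in groups for y in g)
    ss_treat = n * sum((m - grand) ** 2 for m in means)    # between groups
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return ss_total, ss_treat, ss_within


# Hypothetical data: k = 3 levels, n = 4 replications each.
data = [[9.0, 11.0, 10.0, 12.0], [14.0, 13.0, 15.0, 14.0], [10.0, 9.0, 8.0, 9.0]]
total, treat, within = sums_of_squares(data)
print(total, treat + within)  # equal up to floating-point rounding
```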

Mean Squares
the sums of squares consist of different numbers of contributions
for a fair comparison: divide by the degrees of freedom ($k - 1$ between groups, $k(n-1)$ within groups)
under H0 we expect: $MS_{between} \approx MS_{within}$
under H1 we expect: $MS_{between} \gg MS_{within}$

summary in an ANOVA table
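A sketch of the resulting one-way ANOVA table (sums of squares, degrees of freedom, mean squares and F-ratio) for hypothetical data. It stops at the F-ratio: the p-value requires the F distribution with $(k-1, k(n-1))$ degrees of freedom, available for instance as `scipy.stats.f.sf`, or a table of critical values.

```python
def anova_table(groups):
    """One-way ANOVA for k groups of n observations each.

    Returns (SS, df, MS) for 'between' and 'within', plus the F-ratio.
    """
    k, n = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    ss_b = n * sum((m - grand) ** 2 for m in means)
    ss_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_b, df_w = k - 1, k * (n - 1)  # degrees of freedom between / within groups
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"between": (ss_b, df_b, ms_b), "within": (ss_w, df_w, ms_w), "F": ms_b / ms_w}


def print_table(t):
    """Print the table in a layout similar to statistical software output."""
    print(f"{'Source':<10}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
    ss, df, ms = t["between"]
    print(f"{'between':<10}{ss:>10.3f}{df:>5}{ms:>10.3f}{t['F']:>8.2f}")
    ss, df, ms = t["within"]
    print(f"{'within':<10}{ss:>10.3f}{df:>5}{ms:>10.3f}")


# Hypothetical data: k = 3 levels, n = 4 replications each.
data = [[9.0, 11.0, 10.0, 12.0], [14.0, 13.0, 15.0, 14.0], [10.0, 9.0, 8.0, 9.0]]
print_table(anova_table(data))
```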

Completely Randomized One-factor Design

Experiment in which one factor is varied over k levels.

At each level, n measurements are taken.

The order of all measurements is random.
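A completely randomized run order can be produced by shuffling the full list of $k \cdot n$ runs. The sketch uses the levels of the yarn example (5 cotton percentages, 5 threads each); the seed is arbitrary and only makes the order reproducible.

```python
import random


def completely_randomized_order(levels, n, seed=None):
    """Return all k*n runs as (factor level, replicate number) in random order."""
    runs = [(level, rep) for level in levels for rep in range(1, n + 1)]
    random.Random(seed).shuffle(runs)  # one shuffle over all runs: complete randomization
    return runs


# Yarn example: 5 cotton percentages, 5 threads each, 25 runs on the same machine.
order = completely_randomized_order([15, 20, 25, 30, 35], 5, seed=2024)
for run, (cotton, rep) in enumerate(order, start=1):
    print(f"run {run:2d}: {cotton}% cotton (thread {rep})")
```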

Multiple comparisons
ANOVA only indicates whether there are significantly different group means
ANOVA does not indicate which groups have different means (although we may construct confidence intervals for the differences)
various methods exist for correctly performing pairwise comparisons:
LSD (Least Significant Difference) method
HSD (Honestly Significant Difference) method
Duncan
Newman-Keuls
...
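As an illustration of the first method, a sketch of Fisher's LSD for a balanced design: two means are declared different when their difference exceeds $t_{\alpha/2,\,k(n-1)}\sqrt{2\,MS_{within}/n}$. The data are hypothetical, and the critical t-value (2.262 for $\alpha = 0.05$ and 9 degrees of freedom) is taken from a standard t-table rather than computed. Plain LSD does not control the error rate over all pairs jointly; it is usually applied only after a significant F-test, and methods such as HSD are more conservative.

```python
from itertools import combinations
from math import sqrt


def lsd_comparisons(groups, t_crit):
    """Fisher's LSD for k groups of n observations each.

    t_crit is the upper alpha/2 quantile of Student's t with k*(n-1) degrees of
    freedom, looked up elsewhere. Returns the LSD and a list of
    (level i, level j, significantly different?) for all pairs.
    """
    k, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    ms_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    lsd = t_crit * sqrt(2 * ms_within / n)
    pairs = [(i + 1, j + 1, abs(means[i] - means[j]) > lsd)
             for i, j in combinations(range(k), 2)]
    return lsd, pairs


# Hypothetical data: k = 3 levels, n = 4 replications, so 9 error degrees of freedom.
data = [[9.0, 11.0, 10.0, 12.0], [14.0, 13.0, 15.0, 14.0], [10.0, 9.0, 8.0, 9.0]]
lsd, pairs = lsd_comparisons(data, t_crit=2.262)  # t(0.025; 9 df) from a t-table
print(f"LSD = {lsd:.3f}")
for i, j, different in pairs:
    print(f"level {i} vs level {j}: {'different' if different else 'no significant difference'}")
```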

Randomized one-factor block design

Experiment with one factor and observations in blocks

In each block, all treatments occur equally often; randomization takes place within blocks

Blocks are the levels of a noise factor.

Example
testing method for material hardness: a pressure pin (tip) is pressed with a given force into a strip of testing material

practical problem: 4 types of pressure pins
do these yield the same results?

Experimental design 1

pin 1: testing strips 1, 2, 3, 4
pin 2: testing strips 5, 6, 7, 8
pin 3: testing strips 9, 10, 11, 12
pin 4: testing strips 13, 14, 15, 16

Problem: if the measurements on strips 5 through 8 differ, is this caused by the strips or by pin 2?

Experimental design 2

Take 4 strips; on each strip, measure with each pressure pin once (in random order):

strip 1: pressure pins 1, 3, 2, 4
strip 2: pressure pins 1, 4, 3, 2
strip 3: pressure pins 4, 3, 2, 1
strip 4: pressure pins 2, 3, 1, 4
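Design 2 is a randomized block design: each strip is a block that receives every pin exactly once, and only the order within a strip is randomized. A sketch that generates such a plan; the seed is arbitrary, so the orders it prints will generally differ from the table above.

```python
import random


def randomized_block_plan(treatments, blocks, seed=None):
    """Give every block each treatment once, in an order randomized separately per block."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # randomization within this block only
        plan[block] = order
    return plan


# Hardness example: 4 pressure pins (treatments) measured on each of 4 strips (blocks).
strips = ["strip 1", "strip 2", "strip 3", "strip 4"]
for strip, pins in randomized_block_plan([1, 2, 3, 4], strips, seed=7).items():
    print(strip, "-> pressure pins in measuring order:", pins)
```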

Blocking
Advantage of blocked experimental design 2: differences between strips are filtered out

Model: $Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$, where
$\tau_i$: factor effect (pressure pin)
$\beta_j$: block effect (strip)
$\varepsilon_{ij}$: error term

Primary goal: reduction of the error term
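A sketch of the corresponding ANOVA with blocks, using hypothetical hardness readings (4 pins × 4 strips) with large strip-to-strip differences. Removing the block sum of squares shrinks the error term, which is the primary goal stated above: for these made-up numbers the F-ratio for the pins is about 11.6 with blocks, against about 0.43 when the strips are ignored and their variation ends up in the error term.

```python
def rcbd_anova(y):
    """Randomized block ANOVA for y[i][j] = response of treatment i in block j.

    Model y_ij = mu + tau_i + beta_j + eps_ij. Returns the sums of squares for
    treatments, blocks, error and total, plus the F-ratio for treatments.
    """
    a, b = len(y), len(y[0])  # a treatments (pins), b blocks (strips)
    grand = sum(map(sum, y)) / (a * b)
    treat_means = [sum(row) / b for row in y]
    block_means = [sum(y[i][j] for i in range(a)) / a for j in range(b)]
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = a * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block  # what is left after removing strip differences
    df_treat, df_error = a - 1, (a - 1) * (b - 1)
    f_treat = (ss_treat / df_treat) / (ss_error / df_error)
    return {"treatment": ss_treat, "block": ss_block, "error": ss_error,
            "total": ss_total, "F": f_treat}


# Hypothetical hardness readings: rows = pressure pins 1-4, columns = strips 1-4.
y = [[10, 12, 13, 15],
     [11, 13, 15, 17],
     [10, 11, 14, 15],
     [11, 13, 14, 16]]
r = rcbd_anova(y)
# For comparison: one-way ANOVA ignoring the strips (block SS is absorbed into the error).
f_no_blocks = (r["treatment"] / 3) / ((r["total"] - r["treatment"]) / 12)
print(f"F with blocks: {r['F']:.2f}, F ignoring blocks: {f_no_blocks:.2f}")
```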


Summary
completely randomized design
randomized block design
multiple comparisons

Reading material:
Statgraphics lecture notes, sections 4.1 through 4.3

http://www.acc.umu.se/~tnkjtg/chemometrics/editorial/aug2002.html
