Gauss-Newton Method for Algebraic Models

As seen in Chapter 2, a suitable measure of the discrepancy between a model and a set of data is the objective function, S(k); hence, the parameter values are obtained by minimizing this function. Therefore, the estimation of the parameters can be viewed as an optimization problem whereby any of the available general-purpose optimization methods can be utilized. In particular, it was found that the Gauss-Newton method is the most efficient method for estimating parameters in nonlinear models (Bard, 1970). As we strongly believe that this is indeed the best method to use for nonlinear regression problems, the Gauss-Newton method is presented in detail in this chapter. It is assumed that the parameters are free to take any values.
In this chapter we focus on a particular technique, the Gauss-Newton method, for the estimation of the unknown parameters that appear in a model described by a set of algebraic equations. Namely, it is assumed that both the structure of the mathematical model and the objective function to be minimized are known. In mathematical terms, we are given the model
$$\mathbf{y} = \mathbf{f}(\mathbf{x},\mathbf{k}) \qquad (4.1)$$
Chapter 4
where $\mathbf{k} = [k_1, k_2, \ldots, k_p]^T$ is a p-dimensional vector of parameters whose numerical values are unknown, $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$ is an n-dimensional vector of independent variables (which are often set by the experimentalist and whose numerical values are either known precisely or have been measured), f is an m-dimensional vector function of known form (the algebraic equations) and $\mathbf{y} = [y_1, y_2, \ldots, y_m]^T$ is the m-dimensional vector of dependent variables which are measured experimentally (output vector). Furthermore, we are given a set of experimental data, $[\hat{\mathbf{y}}_i, \mathbf{x}_i]$, i = 1, ..., N, that we need to match to the values calculated by the model in some optimal fashion. Based on the statistical properties of the experimental error involved in the measurement of the output vector y (and possibly in the measurement of some of the independent variables x), we generate the objective function to be minimized as described in detail in Chapter 2. In most cases the objective function can be written as
$$S(\mathbf{k}) = \sum_{i=1}^{N} \mathbf{e}_i^T \mathbf{Q}_i \mathbf{e}_i \qquad (4.2a)$$
where $\mathbf{e}_i = [\hat{\mathbf{y}}_i - \mathbf{f}(\mathbf{x}_i,\mathbf{k})]$ are the residuals and the weighting matrices $\mathbf{Q}_i$, i = 1, ..., N, are chosen as described in Chapter 2. Equation 4.2a can also be written as
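$$S(\mathbf{k}) = \sum_{i=1}^{N} \left[\hat{\mathbf{y}}_i - \mathbf{f}(\mathbf{x}_i,\mathbf{k})\right]^T \mathbf{Q}_i \left[\hat{\mathbf{y}}_i - \mathbf{f}(\mathbf{x}_i,\mathbf{k})\right] \qquad (4.2b)$$

or, in terms of the individual elements $Q_{i,jl}$ of $\mathbf{Q}_i$,

$$S(\mathbf{k}) = \sum_{i=1}^{N}\sum_{j=1}^{m}\sum_{l=1}^{m} Q_{i,jl}\left[\hat{y}_{ij} - f_j(\mathbf{x}_i,\mathbf{k})\right]\left[\hat{y}_{il} - f_l(\mathbf{x}_i,\mathbf{k})\right] \qquad (4.2c)$$

Minimization of S(k) can be accomplished by using almost any technique available from optimization theory. Next we present the Gauss-Newton method, as we have found it to be overall the best one (Bard, 1970).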
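As a minimal sketch (assuming NumPy; the function name `objective` is hypothetical), the weighted LS objective of Equation 4.2a can be evaluated for all N data points at once. With every $\mathbf{Q}_i$ set to the identity it reduces to the ordinary sum of squared residuals.

```python
import numpy as np

def objective(y_hat, y_model, Q):
    """Weighted LS objective S(k) of Eq. (4.2a).

    y_hat   : (N, m) measured outputs y_hat_i
    y_model : (N, m) model outputs f(x_i, k) at the current k
    Q       : (N, m, m) weighting matrices Q_i
    """
    e = np.asarray(y_hat, dtype=float) - np.asarray(y_model, dtype=float)  # residuals e_i
    # sum over i of e_i^T Q_i e_i, i.e. the triple sum of Eq. (4.2c)
    return float(np.einsum("ij,ijl,il->", e, np.asarray(Q, dtype=float), e))


# Two data points with two outputs each; Q_1 weights the first output twice as much
Q = np.stack([np.diag([2.0, 1.0]), np.eye(2)])
print(objective([[1, 2], [3, 4]], [[0, 2], [3, 2]], Q))   # 2*1**2 + 2**2 = 6.0
```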
4.2 THE GAUSS-NEWTON METHOD
Let us assume that an estimate $\mathbf{k}^{(j)}$ is available at the jth iteration. We shall try to obtain a better estimate, $\mathbf{k}^{(j+1)}$. Linearization of the model equations around $\mathbf{k}^{(j)}$ yields
Copyright 2001 by Taylor & Francis Group, LLC
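$$\mathbf{f}(\mathbf{x}_i,\mathbf{k}^{(j+1)}) = \mathbf{f}(\mathbf{x}_i,\mathbf{k}^{(j)}) + \left(\frac{\partial \mathbf{f}^T}{\partial \mathbf{k}}\right)^T_i \Delta\mathbf{k}^{(j+1)} + \text{H.O.T.}; \quad i=1,\ldots,N \qquad (4.3)$$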
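Neglecting all higher order terms (H.O.T.), the model output at $\mathbf{k}^{(j+1)}$ can be approximated by

$$\mathbf{y}(\mathbf{x}_i,\mathbf{k}^{(j+1)}) = \mathbf{y}(\mathbf{x}_i,\mathbf{k}^{(j)}) + \mathbf{G}_i\,\Delta\mathbf{k}^{(j+1)}; \quad i=1,\ldots,N \qquad (4.4)$$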
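where $\mathbf{G}_i$ is the (m×p) sensitivity matrix $(\partial \mathbf{f}^T/\partial \mathbf{k})^T = (\nabla \mathbf{f}^T)^T$ evaluated at $\mathbf{x}_i$ and $\mathbf{k}^{(j)}$. Note that $\mathbf{G}_i$ is also the Jacobian matrix of the vector function f(x,k) with respect to k.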
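Because errors in hand-derived sensitivity coefficients are a common reason for a Gauss-Newton iteration to fail, it can help to check $\mathbf{G}_i$ numerically. The sketch below (NumPy assumed; `sensitivity_fd` and the two-output test model `f_demo` are hypothetical) approximates each column of the (m×p) matrix by central differences.

```python
import numpy as np

def sensitivity_fd(f, x_i, k, rel_step=1e-6):
    """Central-difference approximation of G_i = (df^T/dk)^T at (x_i, k).

    f(x_i, k) returns the m model outputs; the result has shape (m, p).
    """
    k = np.asarray(k, dtype=float)
    m = np.atleast_1d(f(x_i, k)).size
    G = np.empty((m, k.size))
    for j in range(k.size):
        h = rel_step * max(abs(k[j]), 1.0)        # step scaled to the parameter size
        k_plus, k_minus = k.copy(), k.copy()
        k_plus[j] += h
        k_minus[j] -= h
        G[:, j] = (np.atleast_1d(f(x_i, k_plus)) - np.atleast_1d(f(x_i, k_minus))) / (2 * h)
    return G


# Hypothetical two-output model, used only to exercise the function
def f_demo(x, k):
    return np.array([k[0] * np.exp(-k[1] * x), k[0] + k[1] * x])

G_num = sensitivity_fd(f_demo, 2.0, [3.0, 0.4])
```

In practice `G_num` would be compared against the analytical expressions with `np.allclose` before starting the regression.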
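Substitution of $\mathbf{y}(\mathbf{x}_i,\mathbf{k}^{(j+1)})$, as approximated by Equation 4.4, into the LS objective function and use of the critical point criterion

$$\frac{\partial S(\mathbf{k}^{(j+1)})}{\partial \Delta\mathbf{k}^{(j+1)}} = \mathbf{0} \qquad (4.5)$$

yields a linear equation of the form

$$\mathbf{A}\,\Delta\mathbf{k}^{(j+1)} = \mathbf{b} \qquad (4.6)$$

where

$$\mathbf{A} = \sum_{i=1}^{N} \mathbf{G}_i^T \mathbf{Q}_i \mathbf{G}_i \qquad (4.7)$$

and

$$\mathbf{b} = \sum_{i=1}^{N} \mathbf{G}_i^T \mathbf{Q}_i \left[\hat{\mathbf{y}}_i - \mathbf{f}(\mathbf{x}_i,\mathbf{k}^{(j)})\right] \qquad (4.8)$$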
Solution of the above equation using any standard linear equation solver yields $\Delta\mathbf{k}^{(j+1)}$. The next estimate of the parameter vector, $\mathbf{k}^{(j+1)}$, is obtained as

$$\mathbf{k}^{(j+1)} = \mathbf{k}^{(j)} + \mu\,\Delta\mathbf{k}^{(j+1)} \qquad (4.9)$$

where a stepping parameter, μ (0 < μ ≤ 1), has been introduced to avoid the problem of overstepping. There are several techniques to arrive at an optimal value for μ; however, the simplest and most widely used is the bisection rule described below.
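A minimal NumPy sketch of Equations 4.6 to 4.9 (the function name is hypothetical): the sums defining A and b are formed with `np.einsum` and the linear system is solved with `np.linalg.solve`. For a model that is linear in the parameters, one full step (μ = 1) lands exactly on the ordinary least squares solution, which the usage lines illustrate for y = k1 + k2 x.

```python
import numpy as np

def gauss_newton_increment(G, Q, e):
    """Solve A dk = b (Eqs. 4.6-4.8) for the Gauss-Newton increment.

    G : (N, m, p) sensitivity matrices G_i evaluated at k^(j)
    Q : (N, m, m) weighting matrices Q_i
    e : (N, m) residuals y_hat_i - f(x_i, k^(j))
    """
    A = np.einsum("imp,iml,ilq->pq", G, Q, G)   # sum_i G_i^T Q_i G_i, Eq. (4.7)
    b = np.einsum("imp,iml,il->p", G, Q, e)     # sum_i G_i^T Q_i e_i,  Eq. (4.8)
    return np.linalg.solve(A, b)


# Linear model y = k1 + k2*x, one output (m = 1), Q_i = 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
k0 = np.array([0.5, 0.5])
G = np.stack([np.ones_like(x), x], axis=1)[:, None, :]   # (N, 1, 2)
Q = np.ones((x.size, 1, 1))
e = (y - (k0[0] + k0[1] * x))[:, None]                    # (N, 1)
mu = 1.0
k_next = k0 + mu * gauss_newton_increment(G, Q, e)       # Eq. (4.9)
```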
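The bisection rule starts with μ = 1 and keeps halving μ until the objective function becomes less than that obtained in the previous iteration (Hartley, 1961). Namely, we "accept" the first value of μ that satisfies the inequality

$$S(\mathbf{k}^{(j)} + \mu\,\Delta\mathbf{k}^{(j+1)}) < S(\mathbf{k}^{(j)}) \qquad (4.10)$$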
More elaborate techniques have been published in the literature to obtain optimal or near-optimal stepping parameter values. Essentially, one performs a univariate search along the direction $\Delta\mathbf{k}^{(j+1)}$ chosen by the Gauss-Newton method to find the value of μ that minimizes the objective function.
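The bisection rule of Equation 4.10 needs only a few lines. In this sketch the cap on the number of halvings (`max_halvings`) is our own safeguard and not part of the rule as stated; if no reduction is found, the current estimate is kept.

```python
def bisection_rule(S, k, dk, max_halvings=30):
    """Return (mu, k_new), where mu is the first of 1, 1/2, 1/4, ... with
    S(k + mu*dk) < S(k), Eq. (4.10). Works for floats or NumPy arrays."""
    s_old = S(k)
    mu = 1.0
    for _ in range(max_halvings):
        k_new = k + mu * dk
        if S(k_new) < s_old:
            return mu, k_new
        mu *= 0.5
    return 0.0, k   # no decrease found: keep k^(j)


# A full step overshoots the minimum of S(k) = (k - 1)^2; mu = 1/4 is accepted
mu, k_new = bisection_rule(lambda k: (k - 1.0) ** 2, 0.0, 4.0)
print(mu, k_new)   # 0.25 1.0
```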
A typical test for convergence is $\|\Delta\mathbf{k}^{(j+1)}\| < TOL$, where TOL is a user-specified tolerance. This test is suitable only when the unknown parameters are of the same order of magnitude. A more general convergence criterion is

$$\frac{1}{p}\sum_{i=1}^{p}\left|\frac{\Delta k_i^{(j+1)}}{k_i^{(j)}}\right| \le 10^{-NSIG} \qquad (4.11)$$

where p is the number of parameters and NSIG is the number of significant digits desired in the parameter estimates. Although this is not guaranteed, the above convergence criterion yields consistent results, assuming of course that no parameter converges to zero!
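Both convergence tests are easy to code. The sketch below (NumPy assumed; the function names are hypothetical) also shows why the absolute test is unsuitable when the parameters differ widely in magnitude: a change that is negligible for a large parameter can still be significant for a small one.

```python
import numpy as np

def converged_abs(dk, tol):
    """||dk|| < TOL: meaningful only if all parameters have similar magnitude."""
    return float(np.linalg.norm(dk)) < tol

def converged_rel(dk, k, nsig):
    """Eq. (4.11): mean relative change at most 10**(-NSIG).
    Assumes no parameter is (close to) zero, since we divide by k."""
    rel = np.abs(np.asarray(dk, dtype=float) / np.asarray(k, dtype=float))
    return float(np.mean(rel)) <= 10.0 ** (-nsig)


k = np.array([1e-4, 1e4])      # parameters of very different magnitude
dk = np.array([1e-5, 1e-5])    # a 10% change in k[0], negligible in k[1]
print(converged_abs(dk, 1e-4), converged_rel(dk, k, 4))   # True False
```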
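4. Solve the linear equation $\mathbf{A}\,\Delta\mathbf{k}^{(j+1)} = \mathbf{b}$ and obtain $\Delta\mathbf{k}^{(j+1)}$.
5. Determine μ using the bisection rule and obtain $\mathbf{k}^{(j+1)} = \mathbf{k}^{(j)} + \mu\,\Delta\mathbf{k}^{(j+1)}$.
6. Continue until convergence is achieved (i.e., $\frac{1}{p}\sum_{i=1}^{p}\left|\Delta k_i^{(j+1)}/k_i^{(j)}\right| \le 10^{-NSIG}$).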
7. Compute statistical properties of the parameter estimates (see Chapter 11).

In summary, at each iteration of the estimation method we compute the model output, $\mathbf{y}(\mathbf{x}_i,\mathbf{k}^{(j)})$, and the sensitivity coefficients, $\mathbf{G}_i$, for each data point i = 1, ..., N, which are used to set up matrix A and vector b. Subsequent solution of the linear equation yields $\Delta\mathbf{k}^{(j+1)}$ and hence $\mathbf{k}^{(j+1)}$ is obtained. The converged parameter values represent the Least Squares (LS), Weighted LS or Generalized LS estimates depending on the choice of the weighting matrices $\mathbf{Q}_i$. Furthermore, if certain assumptions regarding the statistical distribution of the residuals hold, these parameter values could also be the Maximum Likelihood (ML) estimates.
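Putting the pieces together, a compact sketch of the iteration (NumPy assumed; `gauss_newton` and its arguments are hypothetical names, and the 1e-9 floor on μ is our own safeguard). The per-point functions `f` and `jac` supply the model output and the sensitivity matrix $\mathbf{G}_i$; with `Q=None` all $\mathbf{Q}_i$ are identity matrices, i.e., ordinary LS.

```python
import numpy as np

def gauss_newton(f, jac, X, Y, k0, Q=None, nsig=6, max_iter=100):
    """Gauss-Newton iterations with the bisection rule.

    f(x, k)   -> model output(s) for one data point, shape (m,) or scalar
    jac(x, k) -> sensitivity matrix G_i, shape (m, p) or (p,) when m == 1
    X         -> sequence of the N independent-variable values x_i
    Y         -> (N, m) array of measured outputs y_hat_i
    Q         -> (N, m, m) weighting matrices; identity (plain LS) if None
    Returns (k, S(k), number of iterations performed).
    """
    Y = np.asarray(Y, dtype=float)
    N, m = Y.shape
    Q = np.broadcast_to(np.eye(m), (N, m, m)) if Q is None else np.asarray(Q, dtype=float)

    def residuals(k):
        return Y - np.array([np.atleast_1d(f(x, k)) for x in X])

    def S(k):
        e = residuals(k)
        return float(np.einsum("ij,ijl,il->", e, Q, e))         # Eq. (4.2a)

    k = np.asarray(k0, dtype=float)
    s = S(k)
    for it in range(1, max_iter + 1):
        # Model output and sensitivities at k^(j); set up A and b (Eqs. 4.7, 4.8)
        e = residuals(k)
        G = np.array([np.atleast_2d(jac(x, k)) for x in X])       # (N, m, p)
        A = np.einsum("imp,iml,ilq->pq", G, Q, G)
        b = np.einsum("imp,iml,il->p", G, Q, e)
        dk = np.linalg.solve(A, b)                                 # step 4
        rel_change = float(np.mean(np.abs(dk / k)))                # Eq. (4.11)
        mu = 1.0                                                   # step 5: bisection
        while S(k + mu * dk) >= s and mu > 1e-9:
            mu *= 0.5
        if S(k + mu * dk) < s:               # accept only a genuine decrease
            k = k + mu * dk
            s = S(k)
        if rel_change <= 10.0 ** (-nsig):                          # step 6
            break
    return k, s, it


# Usage: noise-free synthetic data from y = k1*(1 - exp(-k2*t)) with k = [2.0, 0.5]
def f_demo(t, k):
    return k[0] * (1.0 - np.exp(-k[1] * t))

def jac_demo(t, k):
    return np.array([1.0 - np.exp(-k[1] * t), k[0] * t * np.exp(-k[1] * t)])

t = np.arange(1.0, 9.0)
Y = f_demo(t, [2.0, 0.5]).reshape(-1, 1)
k_hat, S_min, n_iter = gauss_newton(f_demo, jac_demo, t, Y, [1.5, 0.4])
```

Because the synthetic data are noise-free, the iteration should recover the generating parameters; with real data `S_min` is the minimized weighted sum of squares.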
4.2.3 Formulation of the Solution Steps for the Gauss-Newton Method: Two Consecutive Chemical Reactions
Let us consider a batch reactor where the following consecutive reactions take place (Smith, 1981):

$$A \xrightarrow{k_1} B \xrightarrow{k_2} D \qquad (4.12)$$
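Taking into account the concentration invariant $C_A + C_B + C_D = C_{A0}$, i.e., that there is no change in the total number of moles, the integrated forms of the isothermal rate equations are

$$C_A = C_{A0}\exp(-k_1 t) \qquad (4.13a)$$

$$C_B = \frac{C_{A0}\,k_1}{k_2 - k_1}\left[\exp(-k_1 t) - \exp(-k_2 t)\right] \qquad (4.13b)$$

$$C_D = C_{A0} - C_A - C_B \qquad (4.13c)$$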
where $C_A$, $C_B$ and $C_D$ are the concentrations of A, B and D, respectively, t is the reaction time, and $k_1$, $k_2$ are the unknown rate constants. During a typical experiment, only the concentrations of A and B are measured as a function of time. Namely, a typical dataset is of the form $[t_i, \hat{C}_{Ai}, \hat{C}_{Bi}]$, i = 1, ..., N. The variables, the parameters and the governing equations for this problem can be rewritten in our standard notation as follows:
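Parameter vector: $\mathbf{k} = [k_1, k_2]^T$
Vector of independent variables: $\mathbf{x} = [x_1]$, where $x_1 = t$
Output vector (dependent variables): $\mathbf{y} = [y_1, y_2]^T$, where $y_1 = C_A$ and $y_2 = C_B$
Model equations: $\mathbf{f} = [f_1, f_2]^T$, where

$$f_1(x_1,k_1,k_2) = C_{A0}\exp(-k_1 x_1) \qquad (4.14a)$$

$$f_2(x_1,k_1,k_2) = \frac{C_{A0}\,k_1}{k_2 - k_1}\left[\exp(-k_1 x_1) - \exp(-k_2 x_1)\right] \qquad (4.14b)$$

The elements of the (2×2)-dimensional sensitivity coefficient matrix G are obtained by evaluating the partial derivatives:

$$G_{11} = \frac{\partial f_1}{\partial k_1} = -C_{A0}\,x_1\exp(-k_1 x_1) \qquad (4.15a)$$

$$G_{12} = \frac{\partial f_1}{\partial k_2} = 0 \qquad (4.15b)$$

$$G_{21} = \frac{\partial f_2}{\partial k_1} = \frac{C_{A0}\,k_2}{(k_2-k_1)^2}\left[\exp(-k_1 x_1) - \exp(-k_2 x_1)\right] - \frac{C_{A0}\,k_1 x_1}{k_2-k_1}\exp(-k_1 x_1) \qquad (4.15c)$$

$$G_{22} = \frac{\partial f_2}{\partial k_2} = -\frac{C_{A0}\,k_1}{(k_2-k_1)^2}\left[\exp(-k_1 x_1) - \exp(-k_2 x_1)\right] + \frac{C_{A0}\,k_1 x_1}{k_2-k_1}\exp(-k_2 x_1) \qquad (4.15d)$$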
Equations 4.14 and 4.15 are used to evaluate the model response and the sensitivity coefficients that are required for setting up matrix A and vector b at each iteration of the Gauss-Newton method.
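Equations 4.14 and 4.15 translate directly into code. In this sketch (NumPy assumed; function names and the numbers in the usage line are hypothetical) the model assumes, consistent with the invariant $C_A + C_B + C_D = C_{A0}$, that only A is present initially, and it requires $k_1 \ne k_2$, where Equation 4.14b is undefined.

```python
import numpy as np

def model_consecutive(t, k, CA0):
    """Eqs. (4.14a,b): [C_A, C_B] for A -> B -> D (requires k1 != k2)."""
    k1, k2 = k
    e1, e2 = np.exp(-k1 * t), np.exp(-k2 * t)
    return np.array([CA0 * e1, CA0 * k1 / (k2 - k1) * (e1 - e2)])

def sens_consecutive(t, k, CA0):
    """Eqs. (4.15a-d): the 2x2 sensitivity matrix G_i at x1 = t."""
    k1, k2 = k
    e1, e2 = np.exp(-k1 * t), np.exp(-k2 * t)
    d = k2 - k1
    G11 = -CA0 * t * e1
    G12 = 0.0
    G21 = CA0 * k2 / d**2 * (e1 - e2) - CA0 * k1 * t * e1 / d
    G22 = -CA0 * k1 / d**2 * (e1 - e2) + CA0 * k1 * t * e2 / d
    return np.array([[G11, G12], [G21, G22]])


G_demo = sens_consecutive(2.0, [0.3, 0.8], 1.5)   # illustrative values only
```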
This is the well-known Gauss-Newton method, which exhibits quadratic convergence to the optimum parameter values when the initial guess is sufficiently close. The Gauss-Newton method can also be viewed as a procedure that converts the nonlinear regression problem into a series of linear regressions by linearizing the nonlinear algebraic equations. It is worth noting that when the model equations are linear with respect to the parameters, there are no higher order terms (H.O.T.) and the linearization is exact. As expected, the optimum solution is then obtained in a single iteration, since the sensitivity coefficients do not depend on k. In order to enlarge the region of convergence of the Gauss-Newton method and at the same time make it much more robust, a stepping parameter is used to avoid the problem of overstepping, particularly when the parameter estimates are far from the optimum. This modification makes the convergence to the optimum monotonic (i.e., the objective function is always reduced from one iteration to the next) and the overall estimation method becomes essentially globally convergent. When the parameters are close to the optimum, the bisection rule can be omitted without any problem. Finally, an important advantage of the Gauss-Newton method is that at the end of the estimation, besides the best parameter estimates, their covariance matrix is also readily available without any additional computations. Details will be given in Chapter 11.
4.3 EXAMPLES
$$y = k_1\left[1 - \exp(-k_2 t)\right] \qquad (4.16)$$

In this case, the unknown parameter vector k is the 2-dimensional vector $[k_1, k_2]^T$. There is only one independent variable ($x_1 = t$) and only one output variable. Therefore, the model in our standard notation is

$$y_1 = f_1(x_1,k_1,k_2) = k_1\left[1 - \exp(-k_2 x_1)\right] \qquad (4.17)$$
Table 4.1
Modified Reaction Time, t (h kg/kmol): 3, 6, 13, 18, 26, 28
Source: Gallot et al. (1998).
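The elements of the (1×2)-dimensional sensitivity coefficient matrix G are obtained by evaluating the partial derivatives:

$$G_{11} = \frac{\partial f_1}{\partial k_1} = 1 - \exp(-k_2 x_1) \qquad (4.18a)$$

$$G_{12} = \frac{\partial f_1}{\partial k_2} = k_1 x_1 \exp(-k_2 x_1) \qquad (4.18b)$$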
Equations 4.17 and 4.18 are used to evaluate the model response and the sensitivity coefficients that are required for setting up matrix A and vector b at each iteration of the Gauss-Newton method.
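A minimal sketch of Equations 4.17 and 4.18 (NumPy assumed; function names and the usage values are hypothetical). Because the BOD model of the next example has the same mathematical form, the same two functions serve there with $x_1$ equal to time in days.

```python
import numpy as np

def f_exp(x1, k):
    """Eq. (4.17): y1 = k1 * (1 - exp(-k2 * x1))."""
    return k[0] * (1.0 - np.exp(-k[1] * x1))

def G_exp(x1, k):
    """Eqs. (4.18a,b): the 1x2 sensitivity matrix [G11, G12]."""
    e = np.exp(-k[1] * x1)
    return np.array([[1.0 - e, k[0] * x1 * e]])


print(f_exp(4.0, [2.0, 0.3]), G_exp(4.0, [2.0, 0.3]))   # illustrative values only
```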
4.3.2
Data on biological oxygen demand versus time are usually modeled by the following equation
$$y = k_1\left[1 - \exp(-k_2 x)\right] \qquad (4.19)$$
where $k_1$ is the ultimate carbonaceous oxygen demand (mg/L) and $k_2$ is the BOD reaction rate constant ($\mathrm{d}^{-1}$). A set of BOD data was obtained by 3rd-year Environmental Engineering students at the Technical University of Crete; the data are given in Table 4.2.
Table 4.2
Time (days): 1, 2, 3, 4, 5, 6, 7, 8
As seen, the model for the BOD is identical in mathematical form to the model given by Equation 4.17.
4.3.3 Numerical Example 1
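Let us consider the following nonlinear model (Bard, 1970). Data for the model are given in Table 4.3.

$$y = k_1 + \frac{x_1}{k_2 x_2 + k_3 x_3} \qquad (4.20)$$

This model is assumed to be able to fit the data given in Table 4.3. Using our standard notation [y = f(x,k)] we have
Parameter vector: $\mathbf{k} = [k_1, k_2, k_3]^T$
Vector of independent variables: $\mathbf{x} = [x_1, x_2, x_3]^T$
Output vector: $\mathbf{y} = [y_1]$
Model equation: $\mathbf{f} = [f_1]$, where

$$f_1(x_1,x_2,x_3,k_1,k_2,k_3) = k_1 + \frac{x_1}{k_2 x_2 + k_3 x_3} \qquad (4.21)$$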
The elements of the (1×3)-dimensional sensitivity coefficient matrix G are obtained by evaluating the partial derivatives:

$$G_{11} = \frac{\partial f_1}{\partial k_1} = 1 \qquad (4.22a)$$

$$G_{12} = \frac{\partial f_1}{\partial k_2} = -\frac{x_1 x_2}{(k_2 x_2 + k_3 x_3)^2} \qquad (4.22b)$$

$$G_{13} = \frac{\partial f_1}{\partial k_3} = -\frac{x_1 x_3}{(k_2 x_2 + k_3 x_3)^2} \qquad (4.22c)$$

Table 4.3
Run   x1   x2   x3   y      y_calc
1     1    15   1    0.14   0.1341
2     2    14   2    0.18   0.1797
3     3    13   3    0.22   0.2203
4     4    12   4    0.25   0.2565
5     5    11   5    0.29   0.2892
6     6    10   6    0.32   0.3187
7     7    9    7    0.35   0.3455
8     8    8    8    0.39   0.3700
9     9    7    7    0.37   0.4522
10    10   6    6    0.58   0.5618
11    11   5    5    0.73   0.7152
12    12   4    4    0.96   0.9453
13    13   3    3    1.34   1.3288
14    14   2    2    2.10   2.0958
15    15   1    1    4.39   4.3968
Source: Bard (1970).
Equations 4.21 and 4.22 are used to evaluate the model response and the sensitivity coefficients that are required for setting up matrix A and vector b at
each iteration of the Gauss-Newton method.
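As a check on Equations 4.21 and 4.22, the sketch below (NumPy assumed; function names hypothetical) evaluates the model for the runs of Table 4.3 at the converged estimates reported by Bard (1970), [0.08241, 1.1330, 2.3437]; the results should agree with the y_calc column to within rounding.

```python
import numpy as np

# Independent variables of Table 4.3: x1 = run number, x2 = 16 - x1, x3 = min(x1, x2)
x1 = np.arange(1.0, 16.0)
x2 = 16.0 - x1
x3 = np.minimum(x1, x2)

def f_bard(k):
    """Eq. (4.21) evaluated for all 15 runs at once."""
    return k[0] + x1 / (k[1] * x2 + k[2] * x3)

def G_bard(k):
    """Eqs. (4.22a-c): one row [G11, G12, G13] per run, shape (15, 3)."""
    d2 = (k[1] * x2 + k[2] * x3) ** 2
    return np.column_stack([np.ones_like(x1), -x1 * x2 / d2, -x1 * x3 / d2])

# Model response at the estimates reported by Bard (1970)
y_calc = f_bard([0.08241, 1.1330, 2.3437])
print(np.round(y_calc, 4))
```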
Data on the thermal isomerization of bicyclo[2.1.1]hexane were measured by Srinivasan and Levi (1963). The data are given in Table 4.4. The following nonlinear model was proposed to describe the fraction of original material remaining (y) as a function of time ($x_1$) and temperature ($x_2$). The model was reproduced from Draper and Smith (1998):
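$$y = \exp\left[-k_1 x_1 \exp\left(-k_2\left(\frac{1}{x_2} - \frac{1}{620}\right)\right)\right] \qquad (4.23)$$

Using our standard notation [y = f(x,k)] we have
Parameter vector: $\mathbf{k} = [k_1, k_2]^T$
Vector of independent variables: $\mathbf{x} = [x_1, x_2]^T$
Output vector: $\mathbf{y} = [y_1]$
Model equation: $\mathbf{f} = [f_1]$, where

$$f_1(x_1,x_2,k_1,k_2) = \exp\left[-k_1 x_1 \exp\left(-k_2\left(\frac{1}{x_2} - \frac{1}{620}\right)\right)\right] \qquad (4.24)$$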
Table 4.4
Run   x1      x2    y
1     120.0   600   0.900
2     60.0    600   0.949
3     60.0    612   0.886
4     120.0   612   0.785
5     120.0   612   0.791
6     60.0    612   0.890
7     60.0    620   0.787
8     30.0    620   0.877
9     15.0    620   0.938
10    60.0    620   0.782
11    45.1    620   0.827
12    90.0    620   0.696
13    150.0   620   0.582
14    60.0    620   0.795
15    60.0    620   0.800
16    60.0    620   0.790
17    30.0    620   0.883
18    90.0    620   0.712
19    150.0   620   0.576
20    90.4    620   0.715
21    120.0   620   0.673
22    60.0    620   0.802
23    60.0    620   0.802
24    60.0    620   0.804
25    60.0    620   0.794
26    60.0    620   0.804
27    60.0    620   0.799
28    30.0    631   0.764
29    45.1    631   0.688
30    30.0    631   0.717
31    30.0    631   0.802
32    45.0    631
33    15.0    639
34    30.0    639
35    90.0    639
36    25.0    639
37    60.1    639
38    60.0    639
39    30.0    639
40    30.0    639   0.659
41    60.0    639   0.449
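The elements of the (1×2)-dimensional sensitivity coefficient matrix G are obtained by evaluating the partial derivatives:

$$G_{11} = \frac{\partial f_1}{\partial k_1} = -x_1 \exp\left(-k_2\left(\frac{1}{x_2} - \frac{1}{620}\right)\right) f_1 \qquad (4.25a)$$

$$G_{12} = \frac{\partial f_1}{\partial k_2} = k_1 x_1 \left(\frac{1}{x_2} - \frac{1}{620}\right) \exp\left(-k_2\left(\frac{1}{x_2} - \frac{1}{620}\right)\right) f_1 \qquad (4.25b)$$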
Equations 4.24 and 4.25 are used to evaluate the model response and the sensitivity coefficients that are required for setting up matrix A and vector b at each iteration of the Gauss-Newton method.
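A minimal sketch of Equations 4.24 and 4.25 (NumPy assumed; function names hypothetical). The parameter values in the usage line are arbitrary illustrative numbers, not estimates obtained from Table 4.4.

```python
import numpy as np

T_REF = 620.0   # reference temperature appearing in Eq. (4.23)

def f_isom(x, k):
    """Eq. (4.24): fraction of the original material remaining."""
    x1, x2 = x
    w = 1.0 / x2 - 1.0 / T_REF
    return np.exp(-k[0] * x1 * np.exp(-k[1] * w))

def G_isom(x, k):
    """Eqs. (4.25a,b): the 1x2 sensitivity matrix [G11, G12]."""
    x1, x2 = x
    w = 1.0 / x2 - 1.0 / T_REF
    a = np.exp(-k[1] * w)
    f1 = np.exp(-k[0] * x1 * a)
    return np.array([[-x1 * a * f1, k[0] * x1 * w * a * f1]])


print(f_isom((60.0, 631.0), [0.004, 25000.0]))   # arbitrary illustrative values
```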
$$r = \frac{r_{max}\,S}{K_m + S} \qquad (4.26)$$
The parameters are usually obtained from a series of initial rate experiments performed at various substrate concentrations. Data for the hydrolysis of benzoyl-L-tyrosine ethyl ester (BTEE) by trypsin at 30 °C and pH 7.5 are given below:
S      r
10     260
5.0    220
2.5    110
In this case, the unknown parameter vector k is the 2-dimensional vector $[r_{max}, K_m]^T$, there is only one independent variable, x = [S], and similarly only one output variable, y = [r]. Therefore, the model in our standard notation is
$$y_1 = f_1(x_1,k_1,k_2) = \frac{k_1 x_1}{k_2 + x_1} \qquad (4.27)$$
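The elements of the (1×2)-dimensional sensitivity coefficient matrix G are obtained by evaluating the partial derivatives:

$$G_{11} = \frac{\partial f_1}{\partial k_1} = \frac{x_1}{k_2 + x_1} \qquad (4.28a)$$

$$G_{12} = \frac{\partial f_1}{\partial k_2} = -\frac{k_1 x_1}{(k_2 + x_1)^2} \qquad (4.28b)$$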
Equations 4.27 and 4.28 are used to evaluate the model response and the sensitivity coefficients that are required for setting up matrix A and vector b at each iteration of the Gauss-Newton method.
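A minimal sketch of Equations 4.27 and 4.28 (NumPy assumed; function names and the parameter values in the usage line are hypothetical). A quick sanity check of the model is that at S = K_m the rate equals half of r_max.

```python
import numpy as np

def f_mm(S, k):
    """Eq. (4.27): r = k1*S/(k2 + S) with k = [r_max, K_m]."""
    return k[0] * S / (k[1] + S)

def G_mm(S, k):
    """Eqs. (4.28a,b): the 1x2 sensitivity matrix [G11, G12]."""
    return np.array([[S / (k[1] + S), -k[0] * S / (k[1] + S) ** 2]])


print(f_mm(5.0, [300.0, 5.0]))   # S = K_m gives r = r_max/2 = 150.0
```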
As another example from chemical kinetics, we consider the catalytic reduction of nitric oxide (NO) by hydrogen, which was studied using a flow reactor operated differentially at atmospheric pressure (Ayen and Peters, 1962). The following reaction was considered to be important:
$$\mathrm{NO} + \mathrm{H_2} \rightleftharpoons \mathrm{H_2O} + \tfrac{1}{2}\mathrm{N_2} \qquad (4.29)$$
Data were taken at 375 °C, 400 °C and 425 °C using nitrogen as the diluent. The reaction rate in gmol/(min·g-catalyst) and the total NO conversion were measured at different partial pressures of H2 and NO. A Langmuir-Hinshelwood reaction rate model for the reaction between an adsorbed nitric oxide molecule and one adjacently adsorbed hydrogen molecule is described by:
$$r = \frac{k\,K_{NO}\,K_{H_2}\,p_{NO}\,p_{H_2}}{\left(1 + K_{NO}\,p_{NO} + K_{H_2}\,p_{H_2}\right)^2} \qquad (4.30)$$
where r is the reaction rate in gmol/(min·g-catalyst), $p_{H_2}$ is the partial pressure of hydrogen (atm), $p_{NO}$ is the partial pressure of NO (atm), $K_{NO} = A_2\exp(-E_2/RT)$ atm$^{-1}$ is the adsorption equilibrium constant for NO, $K_{H_2} = A_3\exp(-E_3/RT)$ atm$^{-1}$ is the adsorption equilibrium constant for H2 and $k = A_1\exp(-E_1/RT)$ gmol/(min·g-catalyst) is the forward reaction rate constant for the surface reaction. The data for the above problem are given in Table 4.5.
The objective of the estimation procedure is to determine the parameters k, $K_{H_2}$ and $K_{NO}$ (if data from only one isotherm are considered) or the parameters $A_1$, $A_2$, $A_3$, $E_1$, $E_2$, $E_3$ (when all data are regressed together). The units of $E_1$, $E_2$, $E_3$ are cal/mol and R is the universal gas constant (1.987 cal/(mol K)). For the isothermal regression of the data, using our standard notation [y = f(x,k)] we have
Parameter vector: $\mathbf{k} = [k_1, k_2, k_3]^T = [k, K_{H_2}, K_{NO}]^T$
Independent variables: $\mathbf{x} = [x_1, x_2]^T = [p_{H_2}, p_{NO}]^T$
Output vector: $\mathbf{y} = [y_1] = [r]$
Model equation: $\mathbf{f} = [f_1]$, where
$$f_1(x_1,x_2,k_1,k_2,k_3) = \frac{k_1 k_2 k_3 x_1 x_2}{(1 + k_3 x_2 + k_2 x_1)^2} \qquad (4.31)$$
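The elements of the (1×3)-dimensional sensitivity coefficient matrix G are obtained by evaluating the partial derivatives:

$$G_{11} = \frac{\partial f_1}{\partial k_1} = \frac{k_2 k_3 x_1 x_2}{(1 + k_3 x_2 + k_2 x_1)^2} \qquad (4.32a)$$

$$G_{12} = \frac{\partial f_1}{\partial k_2} = \frac{k_1 k_3 x_1 x_2}{(1 + k_3 x_2 + k_2 x_1)^2} - \frac{2 k_1 k_2 k_3 x_1^2 x_2}{(1 + k_3 x_2 + k_2 x_1)^3} \qquad (4.32b)$$

$$G_{13} = \frac{\partial f_1}{\partial k_3} = \frac{k_1 k_2 x_1 x_2}{(1 + k_3 x_2 + k_2 x_1)^2} - \frac{2 k_1 k_2 k_3 x_1 x_2^2}{(1 + k_3 x_2 + k_2 x_1)^3} \qquad (4.32c)$$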
Equations 4.31 and 4.32 are used to evaluate the model response and the sensitivity coefficients that are required for setting up matrix A and vector b at each iteration of the Gauss-Newton method.
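A minimal sketch of Equations 4.31 and 4.32 for the isothermal case (NumPy assumed; function names and the values in the usage line are hypothetical). The parameter order follows the text: k = [k, K_H2, K_NO] and x = [p_H2, p_NO].

```python
import numpy as np

def f_lh(x, k):
    """Eq. (4.31): Langmuir-Hinshelwood rate, k = [k, K_H2, K_NO], x = [p_H2, p_NO]."""
    x1, x2 = x
    D = 1.0 + k[2] * x2 + k[1] * x1
    return k[0] * k[1] * k[2] * x1 * x2 / D**2

def G_lh(x, k):
    """Eqs. (4.32a-c): the 1x3 sensitivity matrix [G11, G12, G13]."""
    x1, x2 = x
    k1, k2, k3 = k
    D = 1.0 + k3 * x2 + k2 * x1
    G11 = k2 * k3 * x1 * x2 / D**2
    G12 = k1 * k3 * x1 * x2 / D**2 - 2.0 * k1 * k2 * k3 * x1**2 * x2 / D**3
    G13 = k1 * k2 * x1 * x2 / D**2 - 2.0 * k1 * k2 * k3 * x1 * x2**2 / D**3
    return np.array([[G11, G12, G13]])


print(G_lh((0.03, 0.05), [2.0, 30.0, 40.0]))   # illustrative values only
```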
4.3.7 Numerical Example 2
$$y = k_1 + k_2\exp(k_3 x) \qquad (4.33)$$
Table 4.5
p_NO (atm)   p_H2 (atm)   Reaction rate   Total NO Conversion (%)
T = 375 °C, Weight of catalyst = 2.39 g
0.0500       0.00922      1.60            1.96
0.0500       0.0136       2.56            2.36
0.0500       0.0197       3.27            2.99
0.0500       0.0280       3.64            3.54
0.0500       0.0291       3.48            3.41
0.0500       0.0389       4.46            4.23
0.0500       0.0485       4.75            4.78
0.00918      0.0500       1.47            14.0
0.0184       0.0500       2.48            9.15
0.0298       0.0500       3.45            6.24
0.0378       0.0500       4.06            5.40
0.0491       0.0500       4.75            4.30
T = 400 °C, Weight of catalyst = 1.066 g
0.0500       0.00659      2.52            0.59
0.0500       0.0113       4.21            1.05
0.0500       0.0228       5.41            1.44
0.0500       0.0311       6.61            1.76
0.0500       0.0402       6.86            1.91
0.0500       0.0500       8.79            2.57
0.0100       0.0500       3.64            8.83
0.0153       0.0500       4.77            6.05
0.0270       0.0500       6.61            4.06
0.0361       0.0500       7.94            3.20
0.0432       0.0500       7.82            2.70
T = 425 °C, Weight of catalyst = 1.066 g
0.0500       0.00474      5.02            2.62
0.0500       0.0136       7.23            4.17
0.0500       0.0290       11.35           6.84
0.0500       0.0400       13.00           8.19
0.0500       0.0500       13.91           8.53
0.0269       0.0500       9.29            13.3
0.0302       0.0500       9.75            12.3
0.0387       0.0500       11.89           10.4
Data for the model are given below in Table 4.6. The variable y represents
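$$G_{12} = \frac{\partial f_1}{\partial k_2} = \exp(k_3 x_1) \qquad (4.35b)$$

$$G_{13} = \frac{\partial f_1}{\partial k_3} = k_2 x_1 \exp(k_3 x_1) \qquad (4.35c)$$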
Table 4.6
x     y
-5    127
-3    151
-1    379
1     421
3     460
5     426
Source: Hartley (1961).
4.4 SOLUTIONS
The solutions to the Numerical Examples 1 and 2 will be given here. The rest of the solutions will be presented in Chapter 16 where applications in chemical reaction engineering are illustrated.
4.4.1 Numerical Example 1

Starting with the initial guess $\mathbf{k}^{(0)} = [1, 1, 1]^T$, the Gauss-Newton method easily converged to the parameter estimates within 4 iterations, as shown in Table 4.7. The same table also shows the standard error (%) in the estimation of each parameter. Bard (1970) also reported the same parameter estimates [0.08241, 1.1330, 2.3437] starting from the same initial guess. The structure of the model characterizes the shape of the region of convergence. For example, if we change the initial guess for $k_1$ substantially, the algorithm still converges very quickly, since $k_1$ enters the model in a linear fashion. This is clearly shown in Table 4.8, where we have used $\mathbf{k}^{(0)} = [100000, 1, 1]^T$. On the other hand, if we use for $k_2$ a value that is only about one order of magnitude away from the optimum, the Gauss-Newton method fails to converge. For example, with $\mathbf{k}^{(0)} = [1, 2, 1]^T$ the method converges within 3 iterations, whereas with $\mathbf{k}^{(0)} = [1, 8, 1]^T$ or $\mathbf{k}^{(0)} = [1, 10, 1]^T$ it fails to converge. The actual shape of the region of convergence can be fairly irregular. For example, with $\mathbf{k}^{(0)} = [1, 14, 1]^T$ or $\mathbf{k}^{(0)} = [1, 15, 1]^T$ the Gauss-Newton method converges within 8 iterations in both cases, but again, with $\mathbf{k}^{(0)} = [1, 16, 1]^T$ it fails to converge.
Table 4.7 Parameter Estimates at Each Iteration of the Gauss-Newton Method for Numerical Example 1 with Initial Guess [1, 1, 1]
Iteration            LS Objective Function   k1        k2      k3
0                    41.6817                 1         1       1
1                    1.26470                 0.08265   1.183   1.666
2                    0.03751                 0.08249   1.165   2.198
3                    0.00824387              0.08243   1.135   2.338
4                    0.00824387              0.08241   1.133   2.344
Standard Error (%)                           15.02     27.17   12.64
Table 4.8 Parameter Estimates at Each Iteration of the Gauss-Newton Method for Numerical Example 1 with Initial Guess [100000, 1, 1]
Iteration   LS Objective Function   k1        k2   k3
0           1.50×10^9               100000    1    1
1           1.26470                 0.08265        1.666
2           0.03751                 0.08249
3           0.00824387              0.08243
4           0.00824387              0.08241
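To reproduce Numerical Example 1, the sketch below (NumPy assumed; all function names hypothetical) runs an unweighted Gauss-Newton iteration with the bisection rule on the data of Table 4.3, starting from [1, 1, 1]. The printed iteration history can be compared with Table 4.7; details such as the exact number of iterations depend on the convergence test used, here Equation 4.11 with NSIG = 5.

```python
import numpy as np

# Table 4.3 (Bard, 1970): x1 = run number, x2 = 16 - x1, x3 = min(x1, x2)
x1 = np.arange(1.0, 16.0)
x2 = 16.0 - x1
x3 = np.minimum(x1, x2)
y = np.array([0.14, 0.18, 0.22, 0.25, 0.29, 0.32, 0.35, 0.39,
              0.37, 0.58, 0.73, 0.96, 1.34, 2.10, 4.39])

def f(k):
    return k[0] + x1 / (k[1] * x2 + k[2] * x3)                       # Eq. (4.21)

def G(k):
    d2 = (k[1] * x2 + k[2] * x3) ** 2
    return np.column_stack([np.ones_like(x1), -x1 * x2 / d2, -x1 * x3 / d2])  # Eq. (4.22)

def S(k):
    r = y - f(k)
    return float(r @ r)

def gauss_newton(k0, nsig=5, max_iter=50):
    """Unweighted Gauss-Newton (Q_i = 1) with the bisection rule."""
    k = np.asarray(k0, dtype=float)
    history = [(0, S(k), k.copy())]
    for j in range(1, max_iter + 1):
        Gk = G(k)
        dk = np.linalg.solve(Gk.T @ Gk, Gk.T @ (y - f(k)))            # Eqs. (4.6)-(4.8)
        rel_change = float(np.mean(np.abs(dk / k)))                   # Eq. (4.11)
        mu = 1.0
        while S(k + mu * dk) >= S(k) and mu > 1e-9:                   # Eq. (4.10)
            mu *= 0.5
        if S(k + mu * dk) < S(k):
            k = k + mu * dk
        history.append((j, S(k), k.copy()))
        if rel_change <= 10.0 ** (-nsig):
            break
    return k, history

k_hat, history = gauss_newton([1.0, 1.0, 1.0])
for j, s, kk in history:
    print(j, f"{s:.6g}", np.round(kk, 5))
```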
4.4.2 Numerical Example 2
Starting with the initial guess $\mathbf{k}^{(0)} = [100, -200, -1]^T$, the Gauss-Newton method converged to the optimal parameter estimates given in Table 4.9 in 12 iterations. The number of iterations depends heavily on the chosen initial guess. If, for example, we use $\mathbf{k}^{(0)} = [1000, -200, -0.2]^T$ as the initial guess, the Gauss-Newton method converges to the optimum within 3 iterations, as shown in Table 4.10. At the bottom of Table 4.10 we also report the standard error (%) in the parameter estimates. As expected, the uncertainty is quite high, since we are estimating 3 parameters from only 6 data points and the structure of the model naturally leads to a high correlation between $k_2$ and $k_3$.
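The same kind of sketch applies to Numerical Example 2 (NumPy assumed; function names hypothetical). It fits Equation 4.33 to Hartley's data of Table 4.6 from the initial guess [1000, -200, -0.2], using Equation 4.11 with NSIG = 5 as the stopping test, so the iteration count need not match Table 4.10 exactly.

```python
import numpy as np

# Table 4.6 (Hartley, 1961)
x = np.array([-5.0, -3.0, -1.0, 1.0, 3.0, 5.0])
y = np.array([127.0, 151.0, 379.0, 421.0, 460.0, 426.0])

def f(k):
    return k[0] + k[1] * np.exp(k[2] * x)                           # Eq. (4.33)

def G(k):
    e = np.exp(k[2] * x)
    # G11 = 1; G12 and G13 as in Eqs. (4.35b) and (4.35c)
    return np.column_stack([np.ones_like(x), e, k[1] * x * e])

def S(k):
    r = y - f(k)
    return float(r @ r)

def gauss_newton(k0, nsig=5, max_iter=100):
    """Unweighted Gauss-Newton (Q_i = 1) with the bisection rule."""
    k = np.asarray(k0, dtype=float)
    for it in range(1, max_iter + 1):
        Gk = G(k)
        dk = np.linalg.solve(Gk.T @ Gk, Gk.T @ (y - f(k)))             # A dk = b
        rel_change = float(np.mean(np.abs(dk / k)))                    # Eq. (4.11)
        mu = 1.0
        while S(k + mu * dk) >= S(k) and mu > 1e-9:                    # Eq. (4.10)
            mu *= 0.5
        if S(k + mu * dk) < S(k):
            k = k + mu * dk
        if rel_change <= 10.0 ** (-nsig):
            break
    return k, it

k_hat, n_iter = gauss_newton([1000.0, -200.0, -0.2])
print(np.round(k_hat, 4), round(S(k_hat), 1), n_iter)
```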
Table 4.9 Parameter Estimates at Each Iteration of the Gauss-Newton Method for Numerical Example 2. Initial Guess [100, -200, -1]
Iteration   LS Objective Function   k1      k2       k3
0           9.003×10^8              100     -200     -1
1           1.514×10^7              443.6   -32.98   -0.9692
2           1.501×10^7              445.1   -36.00   -0.7660
3           8.722×10^4              457.9   -62.79   -0.4572
4           2.471×10^4              494.1   -127.0   -0.1751
5           1.392×10^4              508.2   -140.0   -0.2253
6           1.346×10^4              528.9   -164.1   -0.1897
7           1.340×10^4              518.7   -151.7   -0.2041
8           1.339×10^4              524.8   -158.8   -0.1977
9           1.339×10^4              522.5   -156.1   -0.2005
10          1.339×10^4              523.6   -157.3   -0.1993
11          1.339×10^4              523.2   -156.8   -0.1998
12          1.339×10^4              523.3   -156.9   -0.1997
Table 4.10 Parameter Estimates at Each Iteration of the Gauss-Newton Method for Numerical Example 2. Initial Guess [1000, -200, -0.2]
Iteration            LS Objective Function   k1      k2       k3
0                    1.826×10^7              1000    -200     -0.2
1                    1.339×10^4              523.4   -157.1
2                    1.339×10^4              523.1
3                    1.339×10^4              523.3
Standard Error (%)                           30.4