
The Symmetrical Fitting Method For Model Identification

This report details the Symmetrical Fitting method for model optimization and parameter estimation, implemented in R and designed to minimize overfitting. The method allows for a wide range of model types, including linear and nonlinear functions, and includes various examples to demonstrate its application. The report also discusses the derivation and formalization of the method, emphasizing its advantages over classical least squares minimization.

Uploaded by Hugo Hernandez
Copyright: © Attribution Non-Commercial (BY-NC)

Vol. 10, 2025-06

The Symmetrical Fitting Method for Model Identification

Hugo Hernandez
ForsChem Research, 050030 Medellin, Colombia
[email protected]

doi: 10.13140/RG.2.2.31220.26244

Abstract
In this report, the Symmetrical Fitting method for model
optimization and parameter estimation is described in detail.
The algorithm has been implemented in R language and is
freely available (doi: 10.13140/RG.2.2.30381.40160). The
algorithm comprises various functions for estimating model
parameters while minimizing the risk of over-fitting. The main
R function for executing the symmetrical fitting method
(sm.fit) works for a wide range of single-output models,
including single-input and multiple-input variables, as well as
linear and nonlinear functions of the input variables and
model parameters. In the case of models with nonlinear
functions of the parameters, the CheMO multi-algorithm
numerical optimization method (doi: 10.13140/RG.2.2.29472.90887) is employed. Different step-
by-step examples are included to illustrate the usage of the R functions included in the toolbox.

Keywords
Correlation, Error, Model Identification, Optimization, Over-fitting, Parameter Estimation,
Parsimony, Probability Distribution, R, Randomistics, Relevance, Symmetrical Model

1. Introduction
The idea of symmetrical fitting of model parameters was introduced in a previous report [1]
with the purpose of avoiding overfitting a model when the focus is placed on minimizing the
error on a single variable (the response variable). It was shown that classical least squares
minimization (or regression in general) is not symmetric and may yield inconsistent sets of
model parameters when the roles of response and input variable are interchanged.

Cite as: Hernandez, H. (2025). The Symmetrical Fitting Method for Model Identification. ForsChem
Research Reports, 10, 2025-06, 1 - 55. doi: 10.13140/RG.2.2.31220.26244. Publication Date: 08/04/2025.

Symmetrical fitting was also used to determine the relevance of a difference or effect in simple linear models, serving as an effective, practical substitute for statistical significance [2]. In addition, symmetrical fitting was found to be an efficient strategy for optimizing the structure of mathematical models, even when they involve heteroscedastic model residuals [3].
During the development of this method, various improvements have been made to the originally published algorithm. For that reason, the purpose of this report is to present the rigorous derivation, formalization and robust implementation of those improvements in the symmetrical fitting method for model identification.
Several examples are also included to illustrate the use and performance of the algorithm.

2. Derivation of the Method


The Symmetrical Fitting method is based on a general standardized representation of
mathematical models. This representation transforms all variables into dimensionless variables,
guarantees that the model is always unbiased, and allows the direct comparison between
different model coefficients.

2.1. General Standardized Model


A mathematical model structure suitable for parameter identification using symmetrical fitting
must satisfy the following conditions:
• The model is expressed as a linear combination of (type I) standard transformations [4] of variables or groups of variables (denoted by $z$).
• The model must be expressed explicitly in terms of the overall residual error ($\varepsilon$).
• The overall residual error is unbiased ($E(\varepsilon) = 0$)§.

The overall residual error can have any distribution of values (normal or non-normal) and can
be either homoscedastic or heteroscedastic [2,3].
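The most general representation of a symmetrical model involving $m+1$ different functions (terms) of a set of variables $\mathbf{x}$ and a set of model parameters $\boldsymbol{\theta}$ is:

$$\varepsilon = \sigma_\varepsilon Z_\varepsilon = \sum_{i=0}^{m} \beta_i \, z_{f_i}(\mathbf{x},\boldsymbol{\theta}) \tag{2.1}$$

where $z_{f_i}(\mathbf{x},\boldsymbol{\theta})$ is a standard transformation of $f_i(\mathbf{x},\boldsymbol{\theta})$: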

§ Where $E(\cdot)$ represents the expected value operator.

08/04/2025 ForsChem Research Reports Vol. 10, 2025-06


10.13140/RG.2.2.31220.26244 www.forschem.org / t.me/forschem (2 / 55)

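$$z_{f_i}(\mathbf{x},\boldsymbol{\theta}) = \frac{f_i(\mathbf{x},\boldsymbol{\theta}) - \mu(f_i)}{\sigma(f_i)} \tag{2.2}$$

$f_i(\mathbf{x},\boldsymbol{\theta})$ is any arbitrary nonlinear function of the set of observed variables $\mathbf{x}$, which also depends on the optional set of model parameters $\boldsymbol{\theta}$, $\mu(f_i)$ is the mean of $f_i(\mathbf{x},\boldsymbol{\theta})$, and $\sigma(f_i)$ is the standard deviation of $f_i(\mathbf{x},\boldsymbol{\theta})$.

$\sigma_\varepsilon$ represents the standard deviation of the model error, and $Z_\varepsilon$ is the standard random distribution of the model error.
The standard transformation $z_{f_i}(\mathbf{x},\boldsymbol{\theta})$ has the following properties (writing $f_i$ for $f_i(\mathbf{x},\boldsymbol{\theta})$ and $z_{f_i}$ for $z_{f_i}(\mathbf{x},\boldsymbol{\theta})$):

$$E\left(z_{f_i}\right) = E\left(\frac{f_i - \mu(f_i)}{\sigma(f_i)}\right) = \frac{E\left(f_i\right) - \mu(f_i)}{\sigma(f_i)} = 0 \tag{2.3}$$

$$\mathrm{Var}\left(z_{f_i}\right) = \mathrm{Var}\left(\frac{f_i - \mu(f_i)}{\sigma(f_i)}\right) = \frac{\mathrm{Var}\left(f_i\right)}{\sigma^2(f_i)} = 1 \tag{2.4}$$

where $\mathrm{Var}(\cdot)$ is the variance operator, so that $\mathrm{Var}(f_i) = \sigma^2(f_i)$ is the variance operator applied to $f_i(\mathbf{x},\boldsymbol{\theta})$.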
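As a minimal sketch (in Python, not the author's R toolbox), the standard transformation of Eq. (2.2) can be applied to a data sample by replacing $\mu(f_i)$ and $\sigma(f_i)$ with the sample mean and the $n-1$ sample standard deviation; the transformed values then have zero mean and unit standard deviation, as stated in Eqs. (2.3) and (2.4). The function name `standardize` and the data are hypothetical.

```python
from statistics import fmean, stdev


def standardize(values):
    """Standard (type I) transformation z = (f - mean) / sd of a sample.

    Sample estimates (mean and n-1 standard deviation) stand in for the
    population mean mu(f_i) and standard deviation sigma(f_i) of Eq. (2.2).
    """
    mu = fmean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    f = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # arbitrary sample of f_i(x, theta)
    z = standardize(f)
    print(f"mean(z) = {fmean(z):.3e}, sd(z) = {stdev(z):.6f}")
```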

The terms $\beta_i$ represent model coefficients. In symmetrical fitting, these terms must satisfy certain conditions that will be derived in the following sections. For the moment, they will be considered as arbitrary real numbers ($\beta_i \in \mathbb{R}$).
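Note that Eq. (2.1) describes the behavior of the model residual error ($\varepsilon$), which is a dimensionless random variable with the following properties:

$$E(\varepsilon) = E\left(\sum_{i=0}^{m} \beta_i z_{f_i}\right) = \sum_{i=0}^{m} \beta_i E\left(z_{f_i}\right) = 0 \tag{2.5}$$

(consistent with $E(\varepsilon) = 0$), and

$$\mathrm{Var}(\varepsilon) = \mathrm{Var}\left(\sum_{i=0}^{m} \beta_i z_{f_i}\right) = \sum_{i=0}^{m}\sum_{j=0}^{m} \beta_i \beta_j\,\mathrm{Cov}\left(z_{f_i}, z_{f_j}\right) \tag{2.6}$$

where $\mathrm{Cov}(\cdot,\cdot)$ represents the covariance operator: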


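$$\mathrm{Cov}\left(z_{f_i}, z_{f_j}\right) = E\left[\left(z_{f_i} - E(z_{f_i})\right)\left(z_{f_j} - E(z_{f_j})\right)\right] = \frac{E\left[\left(f_i - \mu(f_i)\right)\left(f_j - \mu(f_j)\right)\right]}{\sigma(f_i)\,\sigma(f_j)} = \frac{\mathrm{Cov}(f_i, f_j)}{\sigma(f_i)\,\sigma(f_j)} = \rho(f_i, f_j) \tag{2.7}$$

$\rho(f_i, f_j)$ is the linear correlation coefficient between functions $f_i(\mathbf{x},\boldsymbol{\theta})$ and $f_j(\mathbf{x},\boldsymbol{\theta})$.

Now, since $\mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$, we may conclude that:

$$\sigma_\varepsilon^2 = \sum_{i=0}^{m}\sum_{j=0}^{m} \beta_i \beta_j\,\rho(f_i, f_j) \tag{2.8}$$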
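Eq. (2.8) can be checked numerically. The Python sketch below (data and coefficient values are arbitrary illustrations) standardizes three sample columns, builds $\varepsilon_k = \sum_i \beta_i z_{f_i,k}$ point by point, and compares the sample variance of $\varepsilon$ with $\sum_i \sum_j \beta_i \beta_j \hat{\rho}(f_i, f_j)$. With sample statistics ($n-1$ denominators throughout) the two numbers agree up to rounding error. All function names are hypothetical.

```python
from statistics import fmean, stdev, variance


def standardize(values):
    """Sample version of the standard transformation of Eq. (2.2)."""
    mu, sd = fmean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def corr(a, b):
    """Sample linear correlation coefficient rho(a, b)."""
    za, zb = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(za, zb)) / (len(a) - 1)


def error_variance_from_correlations(beta, funcs):
    """Right-hand side of Eq. (2.8): sum_i sum_j beta_i beta_j rho(f_i, f_j)."""
    return sum(bi * bj * corr(fi, fj)
               for bi, fi in zip(beta, funcs)
               for bj, fj in zip(beta, funcs))


def error_variance_direct(beta, funcs):
    """Sample variance of eps_k = sum_i beta_i z_i,k (Eq. 2.1 evaluated point by point)."""
    zs = [standardize(f) for f in funcs]
    n = len(funcs[0])
    eps = [sum(b * z[k] for b, z in zip(beta, zs)) for k in range(n)]
    return variance(eps)


if __name__ == "__main__":
    f0 = [1.0, 2.1, 2.9, 4.2, 5.1]   # arbitrary illustrative data
    f1 = [1.0, 2.0, 3.0, 4.0, 5.0]
    f2 = [2.0, 1.0, 4.0, 3.0, 6.0]
    beta = [1.0, -0.7, 0.2]          # arbitrary coefficients
    print(error_variance_from_correlations(beta, [f0, f1, f2]))
    print(error_variance_direct(beta, [f0, f1, f2]))
```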
The goal of model identification is minimizing the residual error, so we would like to minimize $\sigma_\varepsilon$. However, this would lead to the trivial result in which all $\beta_i = 0$. In such a scenario, the resulting model does not involve any variable and thus, is completely useless for modeling purposes. For this reason, the first constraint on $\beta_i$ is that at least one of the coefficients should be different from zero.
The function term whose coefficient is necessarily different from zero will be denoted as the function of interest (or output or response variable) and will be represented by the variable $y$. Furthermore, we will arbitrarily assign to the corresponding coefficient a value of $1$. Thus, if we assume that function $f_0(\mathbf{x},\boldsymbol{\theta})$ is the function of interest, then we set:

$$f_0(\mathbf{x},\boldsymbol{\theta}) = y \tag{2.9}$$

$$\beta_0 = 1 \tag{2.10}$$

and the general standardized model becomes (from Eq. 2.1, 2.9 and 2.10):

$$\sigma_\varepsilon(y)\, Z_\varepsilon = z_y + \sum_{i=1}^{m} \beta_i\, z_{f_i}(\mathbf{x},\boldsymbol{\theta}) \tag{2.11}$$

with

$$z_y = \frac{y - \mu(y)}{\sigma(y)} \tag{2.12}$$

where the remaining $\beta_i$ values are obtained by solving an optimization problem. $\sigma_\varepsilon(y)$ represents the standard error determined using $y$ as function of interest.


Let us now consider some specific situations, in ascending order of complexity.

2.2. Constant Models


The simplest model is obtained when $m = 0$, corresponding to a constant, unbiased model involving only the function of interest. The general standardized model (Eq. 2.11) becomes:

$$\sigma_\varepsilon(y)\, Z_\varepsilon = z_y \tag{2.13}$$

Now, since $\mathrm{Var}(Z_\varepsilon) = \mathrm{Var}(z_y) = 1$, we may conclude that $\sigma_\varepsilon(y) = 1$, and therefore, $Z_\varepsilon = z_y$.

So, the constant unbiased model becomes:

$$z_y = Z_\varepsilon \tag{2.14}$$

or equivalently,

$$y = \mu(y) + \sigma(y)\, Z_\varepsilon \tag{2.15}$$

Eq. (2.15) is the randomistic [5] representation of the constant unbiased model.
In this case, the standard distribution of model residuals belongs to the same family as the probability distribution of the function of interest.

In addition, since $\sigma_\varepsilon^2(y) = 1$ is already the minimum variance of the model error, no further optimization is needed. In fact, there are no decision variables ($\beta_i$) available to perform the optimization.

2.3. Simple Linear Models


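The following case is obtained when $m = 1$ (simple model), with no additional model parameters (linear model of parameters). The simple linear model obtained can be represented in general as follows (from Eq. 2.1):

$$\sigma_\varepsilon Z_\varepsilon = \beta_0\, z_{f_0}(\mathbf{x}) + \beta_1\, z_{f_1}(\mathbf{x}) \tag{2.16}$$

Here, we have two potential functions of interest. If we assume that $f_0(\mathbf{x})$ is the function of interest, then $\beta_0 = 1$ and:

$$\sigma_\varepsilon(f_0)\, Z_\varepsilon = z_{f_0}(\mathbf{x}) + \beta_1\, z_{f_1}(\mathbf{x}) \tag{2.17}$$

In this case,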


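$$\sigma_\varepsilon^2(f_0) = 1 + \beta_1^2 + 2\beta_1\,\rho(f_0, f_1) \tag{2.18}$$

The minimization of $\sigma_\varepsilon^2(f_0)$ yields the following optimal coefficient:

$$\beta_1^{LS}(f_0) = -\rho(f_0, f_1) \tag{2.19}$$

This minimization corresponds to least squares minimization. The superscript $LS$ indicates the least-squares minimization optimum.
However, if we divide Eq. (2.17) by $\beta_1$, the following expression is obtained:

$$\frac{\sigma_\varepsilon(f_0)}{\beta_1}\, Z_\varepsilon = z_{f_1}(\mathbf{x}) + \frac{1}{\beta_1}\, z_{f_0}(\mathbf{x}) \tag{2.20}$$

corresponding to the model obtained when $f_1(\mathbf{x})$ is the function of interest.

The error variance is:

$$\sigma_\varepsilon^2(f_1) = \frac{\sigma_\varepsilon^2(f_0)}{\beta_1^2} = 1 + \frac{1}{\beta_1^2} + \frac{2}{\beta_1}\,\rho(f_0, f_1) \tag{2.21}$$

with an optimal coefficient:

$$\beta_1^{LS}(f_1) = -\frac{1}{\rho(f_0, f_1)} \tag{2.22}$$

The results obtained in Eq. (2.19) and (2.22) are clearly different, indicating the lack of symmetry of least squares minimization.

A symmetrical solution is obtained when $\sigma_\varepsilon^2(f_0) = \sigma_\varepsilon^2(f_1)$, that is, when:

$$1 + \beta_1^2 + 2\beta_1\,\rho(f_0, f_1) = \frac{1 + \beta_1^2 + 2\beta_1\,\rho(f_0, f_1)}{\beta_1^2} \tag{2.23}$$

with the following possible real results [1]:

$$\beta_1 = \pm 1 \tag{2.24}$$

Only one of those solutions represents a minimum error, corresponding to:

$$\beta_1^{S} = -\mathrm{sgn}\left(\rho(f_0, f_1)\right) = \frac{\beta_1^{LS}(f_0)}{\left|\beta_1^{LS}(f_0)\right|} \tag{2.25}$$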


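The resulting error variance is:

$$\sigma_\varepsilon^2(f_0) = \sigma_\varepsilon^2(f_1) = 2\left(1 - \left|\beta_1^{LS}(f_0)\right|\right) = 2\left(1 - \left|\rho(f_0, f_1)\right|\right) \tag{2.26}$$

And finally, the symmetrical simple linear model becomes (considering $f_0(\mathbf{x})$ as the variable of interest):

$$z_{f_0}(\mathbf{x}) = \mathrm{sgn}\left(\rho(f_0, f_1)\right) z_{f_1}(\mathbf{x}) + \sqrt{2\left(1 - \left|\rho(f_0, f_1)\right|\right)}\, Z_\varepsilon \tag{2.27}$$

Or equivalently,

$$f_0(\mathbf{x}) = \mu(f_0) - \frac{\beta_1^{LS}(f_0)}{\left|\beta_1^{LS}(f_0)\right|}\,\frac{\sigma(f_0)}{\sigma(f_1)}\left(f_1(\mathbf{x}) - \mu(f_1)\right) + \sigma(f_0)\sqrt{2\left(1 - \left|\beta_1^{LS}(f_0)\right|\right)}\, Z_\varepsilon \tag{2.28}$$

Eq. (2.28) shows the simple linear model obtained by symmetrical fitting, expressed in terms of the optimal slope coefficient ($\beta_1^{LS}(f_0)$) found by least-squares minimization for the standardized model.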
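The following Python sketch (an illustration, not the author's sm.fit implementation) applies Eqs. (2.25) to (2.28) with sample statistics in place of population values: the symmetrical slope is $\mathrm{sgn}(\rho)\,\sigma(f_0)/\sigma(f_1)$ and the standardized error is $\sqrt{2(1-|\rho|)}$. Swapping the roles of the two variables returns exactly the inverse line, which is the symmetry property that least squares lacks; for a single input this slope is the same one used in reduced major axis (geometric mean) regression. Function names and data are hypothetical.

```python
import math
from statistics import fmean, stdev


def corr(a, b):
    """Sample linear correlation coefficient."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return sab / ((len(a) - 1) * stdev(a) * stdev(b))


def symmetrical_simple_fit(f0, f1):
    """Fit f0 = b0 + b1*f1 + sigma_e*Z symmetrically (sample version of Eq. 2.28).

    Returns (b0, b1, sigma_e)."""
    rho = corr(f0, f1)
    b1 = math.copysign(1.0, rho) * stdev(f0) / stdev(f1)  # sgn(rho)*sigma(f0)/sigma(f1)
    b0 = fmean(f0) - b1 * fmean(f1)
    sigma_eps = math.sqrt(2.0 * (1.0 - abs(rho)))          # Eq. (2.26), standardized
    return b0, b1, stdev(f0) * sigma_eps                   # error in units of f0


def least_squares_slope(f0, f1):
    """Ordinary least-squares slope of f0 on f1: rho*sigma(f0)/sigma(f1)."""
    return corr(f0, f1) * stdev(f0) / stdev(f1)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.3, 3.9, 6.4, 7.7, 10.5, 11.6]
    b0, b1, se = symmetrical_simple_fit(y, x)
    c0, c1, _ = symmetrical_simple_fit(x, y)   # roles interchanged
    print("symmetrical: y-on-x slope", b1, "| inverse of x-on-y slope", 1.0 / c1)
    print("least squares: y-on-x slope", least_squares_slope(y, x),
          "| inverse of x-on-y slope", 1.0 / least_squares_slope(x, y))
```

With the symmetrical fit both printed slopes coincide, while the two least-squares slopes differ by the factor $\rho^2$.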

2.4. Simple Nonlinear Model


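A simple nonlinear model also considers $m = 1$, but now a set of additional unknown model parameters ($\boldsymbol{\theta}$) is considered by the functions. The simple nonlinear model can be represented in general by:

$$\sigma_\varepsilon(f_0)\, Z_\varepsilon = z_{f_0}(\mathbf{x},\boldsymbol{\theta}) + \beta_1\, z_{f_1}(\mathbf{x},\boldsymbol{\theta}) \tag{2.29}$$

where $f_0(\mathbf{x},\boldsymbol{\theta})$ is the function of interest for the model.

Coefficient $\beta_1$ is obtained similarly as in the previous case, that is, according to Eq. (2.25). The difference here is that the optimization problem for identifying the additional parameters ($\boldsymbol{\theta}$) becomes:

$$\min_{\boldsymbol{\theta}}\ \sigma_\varepsilon(f_0) \tag{2.30}$$

which is equivalent to (from Eq. 2.30 and 2.26):

$$\max_{\boldsymbol{\theta}}\ \left|\rho\left(f_0(\mathbf{x},\boldsymbol{\theta}), f_1(\mathbf{x},\boldsymbol{\theta})\right)\right| \tag{2.31}$$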


Depending on the nonlinear nature of functions $f_0(\mathbf{x},\boldsymbol{\theta})$ and $f_1(\mathbf{x},\boldsymbol{\theta})$, the optimization problem may require a numerical solution. Note that nonlinearity here refers to the model parameters ($\boldsymbol{\theta}$) and not to the original observed variables ($\mathbf{x}$).
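For a single nonlinear parameter, problem (2.31) can be illustrated with a plain grid search in Python. This is only a stand-in for the CheMO numerical optimizer used by the toolbox; the power-law term $f_1(x,\theta) = x^\theta$, the grid, and the synthetic data are assumptions made for illustration.

```python
from statistics import fmean, stdev


def corr(a, b):
    """Sample linear correlation coefficient."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return sab / ((len(a) - 1) * stdev(a) * stdev(b))


def power_term(xi, theta):
    """Hypothetical term function f1(x, theta) = x**theta."""
    return xi ** theta


def fit_theta(y, x, term, grid):
    """Return (theta, |rho|) maximizing |rho(y, term(x, theta))| over the grid (Eq. 2.31)."""
    best_theta, best_abs_rho = None, -1.0
    for theta in grid:
        f1 = [term(xi, theta) for xi in x]
        r = abs(corr(y, f1))
        if r > best_abs_rho:
            best_theta, best_abs_rho = theta, r
    return best_theta, best_abs_rho


if __name__ == "__main__":
    x = [float(k) for k in range(1, 11)]
    y = [3.0 + 2.0 * xi ** 1.5 for xi in x]      # synthetic, noise-free data
    grid = [0.5 + 0.01 * k for k in range(251)]  # theta from 0.5 to 3.0
    theta, abs_rho = fit_theta(y, x, power_term, grid)
    print(f"theta = {theta:.2f}, |rho| = {abs_rho:.8f}")
```

A grid search is only practical for one or two parameters; for larger $\boldsymbol{\theta}$ a numerical optimizer is the natural replacement.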

2.5. Multiple Linear Models


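In multiple linear models we have $m > 1$, and no additional model parameters. The general standardized expression is:

$$\sigma_\varepsilon(y)\, Z_\varepsilon = z_y + \sum_{i=1}^{m} \beta_i\, z_{f_i}(\mathbf{x}) \tag{2.32}$$

Eq. (2.32) can be alternatively expressed as a single nonlinear** model as follows (considering $f_0(\mathbf{x}) = y$):

$$\sigma_\varepsilon(y)\, Z_\varepsilon = z_y + \beta_g\, z_g(\mathbf{x},\boldsymbol{\varphi}) \tag{2.33}$$

where

$$g(\mathbf{x},\boldsymbol{\varphi}) = \sum_{i=1}^{m} \varphi_i\, z_{f_i}(\mathbf{x}) \tag{2.34}$$

$$z_g(\mathbf{x},\boldsymbol{\varphi}) = \frac{\sum_{i=1}^{m} \varphi_i\, z_{f_i}(\mathbf{x})}{\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{m} \varphi_i \varphi_j\,\rho(f_i, f_j)}} \tag{2.35}$$

and

$$\beta_g = -\mathrm{sgn}\left(\rho\left(y, g(\mathbf{x},\boldsymbol{\varphi})\right)\right) \tag{2.36}$$

$g(\mathbf{x},\boldsymbol{\varphi})$ is a new function containing all terms except the function of interest for the model, $\boldsymbol{\varphi}$ is a set of additional model parameters emerging from this transformation, and $\beta_g$ is the symmetrical coefficient of the resulting single nonlinear model.
Now, the original coefficients of the multiple linear model are related to those of the single nonlinear model as follows:

** Strictly, the model obtained is linear, but the parameters $\boldsymbol{\varphi}$ are treated as in the case of simple nonlinear models, that is, by solving an optimization problem with the additional parameters as decision variables.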


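$$\beta_i = \frac{\beta_g\,\varphi_i}{\sqrt{\sum_{j=1}^{m}\sum_{l=1}^{m} \varphi_j \varphi_l\,\rho(f_j, f_l)}} \tag{2.37}$$

And the model parameters $\boldsymbol{\varphi}$ are obtained by solving the following minimization problem:

$$\min_{\boldsymbol{\varphi}}\ \sigma_\varepsilon(y) \tag{2.38}$$

which is equivalent to:

$$\max_{\boldsymbol{\varphi}}\ \left|\rho\left(y, g(\mathbf{x},\boldsymbol{\varphi})\right)\right| \tag{2.39}$$

Since the solution to this optimization problem is the result of least squares minimization (and $|\rho(y, g)|$ does not depend on the scale of $\boldsymbol{\varphi}$, which cancels in Eq. 2.37), we may conclude that:

$$\boldsymbol{\varphi} = \boldsymbol{\beta}^{LS} \tag{2.40}$$

and therefore (from Eq. 2.36, 2.37 and 2.40):

$$\beta_i^{S} = -\mathrm{sgn}\left(\rho\left(y, g(\mathbf{x},\boldsymbol{\beta}^{LS})\right)\right)\frac{\beta_i^{LS}}{\sqrt{\sum_{j=1}^{m}\sum_{l=1}^{m} \beta_j^{LS}\beta_l^{LS}\,\rho(f_j, f_l)}} \tag{2.41}$$

The least-squares optimal coefficient values can be determined analytically using the following expression [3]:

$$\boldsymbol{\beta}^{LS} = -\mathbf{R}^{-1}\,\mathbf{r} \tag{2.42}$$

where

$$\mathbf{R} = \begin{bmatrix} \rho(f_1, f_1) & \cdots & \rho(f_1, f_m) \\ \vdots & \ddots & \vdots \\ \rho(f_m, f_1) & \cdots & \rho(f_m, f_m) \end{bmatrix} \tag{2.43}$$

$$\mathbf{r} = \begin{bmatrix} \rho(y, f_1) \\ \vdots \\ \rho(y, f_m) \end{bmatrix} \tag{2.44}$$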
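The multiple linear case reduces to linear algebra on sample correlations. The Python sketch below (hypothetical helper names; a small Gaussian elimination avoids external dependencies) computes $\boldsymbol{\beta}^{LS} = -\mathbf{R}^{-1}\mathbf{r}$, rescales it into symmetrical coefficients as in Eq. (2.41), and returns $\sigma_\varepsilon(y) = \sqrt{2(1-|\rho(y,g)|)}$. The sketch assumes the function of interest carries the coefficient $\beta_0 = +1$, so each $\beta_i$ has the opposite sign of the corresponding regression slope.

```python
import math
from statistics import fmean, stdev


def corr(a, b):
    """Sample linear correlation coefficient."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return sab / ((len(a) - 1) * stdev(a) * stdev(b))


def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][k] * v[k] for k in range(r + 1, n))) / M[r][r]
    return v


def symmetrical_coefficients(y, terms):
    """Return (beta_LS, beta_S, sigma_eps) for sigma_eps*Z = z_y + sum beta_i z_i."""
    R = [[corr(fi, fj) for fj in terms] for fi in terms]   # correlation matrix of terms
    r = [corr(y, fi) for fi in terms]                        # correlations with y
    beta_ls = [-v for v in solve(R, r)]                      # beta_LS = -R^{-1} r
    m = len(terms)
    sd_g = math.sqrt(sum(beta_ls[i] * beta_ls[j] * R[i][j]
                         for i in range(m) for j in range(m)))
    rho_yg = sum(b * ri for b, ri in zip(beta_ls, r)) / sd_g  # rho(y, g)
    beta_s = [-math.copysign(1.0, rho_yg) * b / sd_g for b in beta_ls]
    sigma_eps = math.sqrt(2.0 * (1.0 - abs(rho_yg)))
    return beta_ls, beta_s, sigma_eps


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
    y = [3.1, 3.9, 7.2, 7.8, 11.1, 12.0]
    print(symmetrical_coefficients(y, [x1, x2]))
```

With a single term the result reduces to $\beta_1^S = -\mathrm{sgn}(\rho)$, and the symmetrical coefficients satisfy Eq. (2.8) with $\sigma_\varepsilon^2 = 2(1-|\rho(y,g)|)$.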


2.6. Multiple Nonlinear Models


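The most general situation is obtained with multiple nonlinear models, where $m > 1$ and the function terms depend on a set of additional unknown model parameters ($\boldsymbol{\theta}$):

$$\sigma_\varepsilon(y)\, Z_\varepsilon = z_y + \sum_{i=1}^{m} \beta_i\, z_{f_i}(\mathbf{x},\boldsymbol{\theta}) \tag{2.45}$$

Proceeding similarly as in the previous case, the symmetrical coefficients obtained are:

$$\beta_i^{S}(\boldsymbol{\theta}) = -\mathrm{sgn}\left(\rho\left(y, g(\mathbf{x},\boldsymbol{\theta})\right)\right)\frac{\beta_i^{LS}(\boldsymbol{\theta})}{\sqrt{\sum_{j=1}^{m}\sum_{l=1}^{m} \beta_j^{LS}(\boldsymbol{\theta})\,\beta_l^{LS}(\boldsymbol{\theta})\,\rho\left(f_j(\mathbf{x},\boldsymbol{\theta}), f_l(\mathbf{x},\boldsymbol{\theta})\right)}} \tag{2.46}$$

where

$$g(\mathbf{x},\boldsymbol{\theta}) = \sum_{i=1}^{m} \beta_i^{LS}(\boldsymbol{\theta})\, z_{f_i}(\mathbf{x},\boldsymbol{\theta}) \tag{2.47}$$

$$\boldsymbol{\beta}^{LS}(\boldsymbol{\theta}) = -\mathbf{R}(\boldsymbol{\theta})^{-1}\,\mathbf{r}(\boldsymbol{\theta}) \tag{2.48}$$

$$\mathbf{R}(\boldsymbol{\theta}) = \begin{bmatrix} \rho\left(f_1(\mathbf{x},\boldsymbol{\theta}), f_1(\mathbf{x},\boldsymbol{\theta})\right) & \cdots & \rho\left(f_1(\mathbf{x},\boldsymbol{\theta}), f_m(\mathbf{x},\boldsymbol{\theta})\right) \\ \vdots & \ddots & \vdots \\ \rho\left(f_m(\mathbf{x},\boldsymbol{\theta}), f_1(\mathbf{x},\boldsymbol{\theta})\right) & \cdots & \rho\left(f_m(\mathbf{x},\boldsymbol{\theta}), f_m(\mathbf{x},\boldsymbol{\theta})\right) \end{bmatrix} \tag{2.49}$$

$$\mathbf{r}(\boldsymbol{\theta}) = \begin{bmatrix} \rho\left(y, f_1(\mathbf{x},\boldsymbol{\theta})\right) \\ \vdots \\ \rho\left(y, f_m(\mathbf{x},\boldsymbol{\theta})\right) \end{bmatrix} \tag{2.50}$$

and the optimal set of additional model parameters is obtained by solving the following optimization problem:

$$\min_{\boldsymbol{\theta}}\ \sigma_\varepsilon(y) \tag{2.51}$$

or equivalently: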


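$$\max_{\boldsymbol{\theta}}\ \left|\rho\left(y, g(\mathbf{x},\boldsymbol{\theta})\right)\right| \tag{2.52}$$

where both problems are constrained by Eq. (2.48).
Once the optimal coefficients and parameters of the general standardized model have been obtained, the function of interest can be transformed back into its de-standardized form as follows:

$$y = b_0 + \sum_{i=1}^{m} b_i\, f_i(\mathbf{x},\boldsymbol{\theta}) + \sigma_e\, Z_\varepsilon \tag{2.53}$$

where

$$b_0 = \mu(y) - \sum_{i=1}^{m} b_i\,\mu(f_i) \tag{2.54}$$

$$b_i = -\beta_i^{S}\,\frac{\sigma(y)}{\sigma(f_i)} \tag{2.55}$$

$$\sigma_e = \sigma(y)\,\sigma_\varepsilon(y) \tag{2.56}$$

$$\sigma_\varepsilon(y) = \sqrt{2\left(1 - \left|\rho\left(y, g(\mathbf{x},\boldsymbol{\theta})\right)\right|\right)} \tag{2.57}$$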
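Converting the standardized result back to the original units (Eqs. 2.53 to 2.56) is simple bookkeeping. The Python sketch below takes the symmetrical coefficients $\beta_i$, $\sigma_\varepsilon(y)$, and the means and standard deviations of $y$ and of each term, and returns $b_0$, the $b_i$, and $\sigma_e$. The function name and argument layout are hypothetical, and the sketch assumes $\beta_0 = +1$ for the function of interest.

```python
def destandardize(beta_s, sigma_eps, mu_y, sd_y, mu_f, sd_f):
    """Map standardized symmetrical coefficients to y = b0 + sum b_i f_i + sigma_e Z.

    beta_s:     symmetrical coefficients beta_i (function of interest has beta_0 = +1)
    sigma_eps:  standardized residual error sigma_eps(y)
    mu_f, sd_f: means and standard deviations of the term functions f_i
    """
    b = [-beta * sd_y / sd for beta, sd in zip(beta_s, sd_f)]   # b_i = -beta_i sd(y)/sd(f_i)
    b0 = mu_y - sum(bi * mu for bi, mu in zip(b, mu_f))           # intercept
    sigma_e = sd_y * sigma_eps                                     # error in units of y
    return b0, b, sigma_e


if __name__ == "__main__":
    # One term with beta_1 = -1 (positive correlation), mu_y = 10, sd_y = 2, mu_f = 3, sd_f = 4
    print(destandardize([-1.0], 0.3, 10.0, 2.0, [3.0], [4.0]))  # (8.5, [0.5], 0.6)
```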

2.7. Parameter Estimation and Estimation Error


The general model obtained by symmetrical fitting (Eq. 2.53 to 2.57) is expressed in terms of certain population parameters ($\mu$, $\sigma$, $\rho$) of the functions of $\mathbf{x}$. Unfortunately, this information is not available; instead, only a sample of observations ($\mathbf{x}_k$, $y_k$) can be used. Estimates of those population parameters obtained from a data sample are called statistics, and will be denoted by the circumflex or hat symbol ( ̂ ). Let us then consider the following estimates of the population parameters obtained from a data sample of size $n$:

$$\hat{\mu}(f_i) = \frac{1}{n}\sum_{k=1}^{n} f_i(\mathbf{x}_k,\boldsymbol{\theta}) \tag{2.58}$$


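$$\hat{\sigma}(f_i) = \sqrt{\frac{1}{n-1}\sum_{k=1}^{n}\left(f_i(\mathbf{x}_k,\boldsymbol{\theta}) - \hat{\mu}(f_i)\right)^2} \tag{2.59}$$

$$\hat{\rho}(f_i, f_j) = \frac{\sum_{k=1}^{n}\left(f_i(\mathbf{x}_k,\boldsymbol{\theta}) - \hat{\mu}(f_i)\right)\left(f_j(\mathbf{x}_k,\boldsymbol{\theta}) - \hat{\mu}(f_j)\right)}{(n-1)\,\hat{\sigma}(f_i)\,\hat{\sigma}(f_j)} \tag{2.60}$$

Then, the general model obtained by symmetrical fitting using a sample of observations becomes:

$$y = \hat{b}_0 + \sum_{i=1}^{m} \hat{b}_i\, f_i(\mathbf{x},\hat{\boldsymbol{\theta}}) + \hat{\sigma}_e\, Z_\varepsilon \tag{2.61}$$

where

$$\hat{b}_0 = \hat{\mu}(y) - \sum_{i=1}^{m} \hat{b}_i\,\hat{\mu}(f_i) \tag{2.62}$$

$$\hat{b}_i = -\hat{\beta}_i\,\frac{\hat{\sigma}(y)}{\hat{\sigma}(f_i)} \tag{2.63}$$

$$\hat{\beta}_i = -\mathrm{sgn}\left(\hat{\rho}\left(y, \hat{g}(\mathbf{x},\hat{\boldsymbol{\theta}})\right)\right)\frac{\hat{\beta}_i^{LS}}{\sqrt{\sum_{j=1}^{m}\sum_{l=1}^{m} \hat{\beta}_j^{LS}\hat{\beta}_l^{LS}\,\hat{\rho}(f_j, f_l)}} \tag{2.64}$$

$$\hat{\boldsymbol{\beta}}^{LS} = -\hat{\mathbf{R}}^{-1}\,\hat{\mathbf{r}} \tag{2.65}$$

$$\hat{\mathbf{R}} = \begin{bmatrix} \hat{\rho}(f_1, f_1) & \cdots & \hat{\rho}(f_1, f_m) \\ \vdots & \ddots & \vdots \\ \hat{\rho}(f_m, f_1) & \cdots & \hat{\rho}(f_m, f_m) \end{bmatrix} \tag{2.66}$$

$$\hat{\mathbf{r}} = \begin{bmatrix} \hat{\rho}(y, f_1) \\ \vdots \\ \hat{\rho}(y, f_m) \end{bmatrix} \tag{2.67}$$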


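$$\hat{g}(\mathbf{x},\hat{\boldsymbol{\theta}}) = \sum_{i=1}^{m} \hat{\beta}_i^{LS}\,\hat{z}_{f_i}(\mathbf{x},\hat{\boldsymbol{\theta}}) \tag{2.68}$$

$$\hat{z}_{f_i}(\mathbf{x},\hat{\boldsymbol{\theta}}) = \frac{f_i(\mathbf{x},\hat{\boldsymbol{\theta}}) - \hat{\mu}(f_i)}{\hat{\sigma}(f_i)} \tag{2.69}$$

$$\hat{\sigma}_e = \hat{\sigma}(y)\,\hat{\sigma}_\varepsilon(y) = \hat{\sigma}(y)\sqrt{2\left(1 - \left|\hat{\rho}\left(y, \hat{g}(\mathbf{x},\hat{\boldsymbol{\theta}})\right)\right|\right)} \tag{2.70}$$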
Unfortunately, using sample statistics to estimate population parameters always introduces error, including uncertainty and possibly also bias. Such error may also depend on the probability distribution of the observed variables ($\mathbf{x}$) and on the nonlinear functions ($f_i$). Moreover, since this method involves nonlinear operations on the sample statistics, evaluating bias and uncertainty analytically is a complex task. Monte Carlo simulation methods [6] assuming a specific probability distribution of the observed variables can be used to estimate the bias and uncertainty of the model parameters obtained by symmetrical fitting.
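A minimal Monte Carlo sketch of this idea in Python: assuming, for illustration only, bivariate normal data with $x \sim N(0,1)$ and $y = 2x + N(0,1)$, the population symmetrical slope is $\mathrm{sgn}(\rho)\,\sigma(y)/\sigma(x) = \sqrt{5}$, and repeated samples of size $n$ show the bias and spread of its sample estimate. The distribution, sample size, number of replicates, seed and function names are all arbitrary choices.

```python
import math
import random
from statistics import fmean, stdev


def symmetrical_slope(y, x):
    """Sample symmetrical slope sgn(rho) * sd(y) / sd(x)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return math.copysign(1.0, sxy) * stdev(y) / stdev(x)


def monte_carlo_slope(n=20, reps=2000, seed=12345):
    """Return (mean, sd) of the symmetrical slope over simulated samples of size n."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
        slopes.append(symmetrical_slope(y, x))
    return fmean(slopes), stdev(slopes)


if __name__ == "__main__":
    true_slope = math.sqrt(5.0)   # sd(y) = sqrt(2**2 + 1), sd(x) = 1, rho > 0
    mean, sd = monte_carlo_slope()
    print(f"true {true_slope:.4f}  mean {mean:.4f}  "
          f"bias {mean - true_slope:+.4f}  sd {sd:.4f}")
```

The difference between the simulated mean and the true value estimates the bias, and the simulated standard deviation estimates the sampling uncertainty, for this assumed distribution and sample size only.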
The overall effect of bias and uncertainty in parameter estimation, along with the error introduced during the measurement of the observed variables, is propagated through the model, resulting in an increased model error. Thus, the estimated overall residual error becomes:

$$\hat{\sigma}_\varepsilon = \sqrt{\sigma_{LoF}^2 + \hat{\sigma}_S^2 + \hat{\sigma}_E^2} \tag{2.71}$$

where $\sigma_{LoF}$ represents the error due to lack-of-fit of the model, given by:

$$\sigma_{LoF} = \sqrt{2\left(1 - \left|\rho\left(y, g(\mathbf{x},\boldsymbol{\theta})\right)\right|\right)} \tag{2.72}$$

$\hat{\sigma}_S$ represents the estimation uncertainty propagated through the standardized model due to sampling, and $\hat{\sigma}_E$ is the estimation uncertainty propagated through the standardized model due to experimental errors (including measurement errors).
Unfortunately, $\sigma_{LoF}$ is unknown, since it requires the true value of $\rho\left(y, g(\mathbf{x},\boldsymbol{\theta})\right)$ (the true coefficients must be accurately known, and the data must be free of experimental error).

The sampling uncertainty $\hat{\sigma}_S$ can be considered inversely proportional to the degrees of freedom of the model ($\nu$), which is the number of observations minus the total number of parameters estimated in the model. As the degrees of freedom of the model decrease, sampling error eventually becomes the dominant error term. Conversely, as the degrees of freedom increase, the sampling error term becomes less important. The exact analytical expression will depend on the nature of the function terms ($f_i$) and on the distribution of experimental values ($\mathbf{x}$).

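Finally, $\hat{\sigma}_E$ can be determined as follows:

$$\hat{\sigma}_E = \frac{1}{\hat{\sigma}(y)}\sqrt{\hat{\sigma}_{E(y)}^2 + \sum_{i=1}^{m} \hat{b}_i^2\,\hat{\sigma}_{E(f_i)}^2} \tag{2.73}$$

where $\hat{\sigma}_{E(y)}$ and $\hat{\sigma}_{E(f_i)}$ represent the experimental errors in the determination of $y$ and $f_i(\mathbf{x})$, respectively. Also note that $\hat{\sigma}_{E(y)} \le \hat{\sigma}(y)$, since experimental error is also included in the variability of the data.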
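As a small Python sketch of Eq. (2.73), under the assumption of independent experimental errors combined to first order through the de-standardized model and expressed in units of $\hat{\sigma}(y)$: the experimental errors play the role of the Uexp argument described for sm.fit, while the function name and the numbers in the example are hypothetical.

```python
import math


def propagated_experimental_error(sd_y, u_y, b, u_f):
    """Standardized experimental error sigma_E (first-order, independent errors).

    sd_y: sample standard deviation of the response y
    u_y:  experimental standard error of y
    b:    de-standardized coefficients b_i of the fitted model
    u_f:  experimental standard errors of the term functions f_i
    """
    total = u_y ** 2 + sum((bi * ui) ** 2 for bi, ui in zip(b, u_f))
    return math.sqrt(total) / sd_y


if __name__ == "__main__":
    # y measured with standard error 0.1; one term with b_1 = 0.5 and error 0.2
    print(propagated_experimental_error(2.0, 0.1, [0.5], [0.2]))
```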

2.8. Over-fitting, Relevance and Parsimony


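Since the goal of parameter identification is minimizing the error of the model, we need a suitable way to estimate $\hat{\sigma}_\varepsilon$ from the available data. We may then assume that all error terms (lack of fit, sampling and experimental error) are included by considering:

$$\hat{\hat{\sigma}}_\varepsilon = \sqrt{\frac{\sum_{k=1}^{n} \hat{\varepsilon}_k^2}{\nu}} = \frac{1}{\hat{\sigma}(y)}\sqrt{\frac{\sum_{k=1}^{n}\left(y_k - \hat{b}_0 - \sum_{i=1}^{m} \hat{b}_i\, f_i(\mathbf{x}_k,\hat{\boldsymbol{\theta}})\right)^2}{\nu}} = \sqrt{\frac{n-1}{\nu}\sum_{i=0}^{m}\sum_{j=0}^{m} \hat{\beta}_i \hat{\beta}_j\,\hat{\rho}(f_i, f_j)} \tag{2.74}$$

where $\hat{\varepsilon}_k$ is the standardized residual of the $k$-th observation (with $f_0 = y$ and $\hat{\beta}_0 = 1$ in the last expression),

$$\nu = n - m - p - 1 \tag{2.75}$$

is the number of degrees of freedom, and $p$ represents the number of additional parameters ($\boldsymbol{\theta}$) included in the model.

Of the three sources of error, only experimental error ($\hat{\sigma}_E$) can be readily determined from the data sample. Then, we would expect:

$$\hat{\hat{\sigma}}_\varepsilon \ge \hat{\sigma}_E \tag{2.76}$$

where $\hat{\hat{\sigma}}_\varepsilon$ is determined from Eq. (2.74) and $\hat{\sigma}_E$ from Eq. (2.73).

If Eq. (2.76) is not valid, that is, when $\hat{\hat{\sigma}}_\varepsilon < \hat{\sigma}_E$, Eq. (2.71) becomes inconsistent (since it would require imaginary uncertainties). This situation occurs when the model has been over-fitted. Over-fitting can be avoided by introducing Eq. (2.76) as a constraint in the model error minimization problem.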
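The over-fitting check of Eqs. (2.74) to (2.76) can be sketched in Python as follows, assuming the fitted values of the de-standardized model are already available; $\nu = n - m - p - 1$ counts the intercept, the $m$ term coefficients and the $p$ additional parameters. Names and data are hypothetical.

```python
import math
from statistics import stdev


def overfit_check(y, fitted, m, p, sigma_E):
    """Return (sigma_hat, nu, ok) following Eqs. (2.74)-(2.76).

    sigma_hat: residual standard error from the data, in units of sd(y)
    nu:        degrees of freedom n - m - p - 1
    ok:        True when sigma_hat >= sigma_E (no sign of over-fitting)
    """
    n = len(y)
    nu = n - m - p - 1
    if nu <= 0:
        raise ValueError("no degrees of freedom left")
    ssr = sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))
    sigma_hat = math.sqrt(ssr / nu) / stdev(y)
    return sigma_hat, nu, sigma_hat >= sigma_E


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0]
    fitted = [1.1, 1.9, 3.2, 3.8, 5.0]   # hypothetical fitted values, m = 1, p = 0
    print(overfit_check(y, fitted, m=1, p=0, sigma_E=0.05))
    print(overfit_check(y, fitted, m=1, p=0, sigma_E=0.20))
```

In the second call the stated experimental error exceeds the residual error estimated from the data, which is the inconsistent (over-fitted) situation described above.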


As long as constraint Eq. (2.76) is satisfied, the model error can be minimized by maximizing the model fit, but also by maximizing the degrees of freedom of the model. Ideally, we might maximize the degrees of freedom by increasing the number of observations until sampling error becomes negligible. However, when the number of observations is fixed, the degrees of freedom can only be increased by removing parameters from the model.
Two situations are possible: 1) the overall error decreases due to the increase in degrees of freedom, or 2) the overall error increases due to the lack of fit of the simplified model. In the first case, the parameter removed from the model was irrelevant, while in the second case the parameter was relevant.
Thus, we may define a relevant parameter as any parameter whose presence in the model decreases the overall error. Then, by removing irrelevant parameters from the model, the model performance will improve. This effect can be considered a practical result of the principle of parsimony.
One important advantage of symmetrical fitting is that the model term coefficients ($\beta_i$) are dimensionless and directly comparable. Thus, we can first evaluate the relevance of the term with the smallest contribution to the model, that is, the term with the minimum absolute value of $\beta_i$. In this procedure, two models are obtained: one with the selected term and a second one without it. If the second model achieves a lower value of $\hat{\hat{\sigma}}_\varepsilon$ (according to Eq. 2.74), then the term can be considered irrelevant and can be safely removed from the model (the corresponding $\beta_i$ is set to zero). When all remaining terms are relevant, we can proceed to check the relevance of the additional model parameters.
The order for evaluating the additional model parameters is rather arbitrary. For each additional model parameter, a reference value is selected (not necessarily zero). The reference value can be the closest integer, or it can be expressed in terms of special mathematical constants, obtained theoretically, or simply defined by aesthetic considerations. Then, the residual error ($\hat{\hat{\sigma}}_\varepsilon$) is compared between the original model and the model obtained using the reference value. Notice that by arbitrarily setting a value for the additional model parameter, it is no longer estimated from the data and thus, it is not counted in $p$. That is, the degrees of freedom increase by one when the parameter is arbitrarily assigned instead of fitted from the data. If the model error decreases, then the reference value can be incorporated in the model; otherwise, the parameter must be fitted from the data. The procedure is then repeated for all other additional model parameters.
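The term-elimination step described above can be sketched in Python for linear terms only (no additional parameters, so $p = 0$). This is a simplified, hypothetical illustration rather than the sm.fit algorithm: it repeatedly tests the term with the smallest $|\beta_i|$, stops as soon as that term turns out to be relevant, and uses the residual-based estimate of Eq. (2.74) with $\nu = n - m - 1$; it omits the over-fitting constraint (2.76) and the reference-value check for additional parameters.

```python
import math
from statistics import fmean, stdev


def zscores(v):
    """Sample standard transformation (Eq. 2.2 with sample statistics)."""
    mu, sd = fmean(v), stdev(v)
    return [(a - mu) / sd for a in v]


def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][k] * v[k] for k in range(r + 1, n))) / M[r][r]
    return v


def sym_error(y, terms):
    """Return (sigma_hat, beta_S): sqrt(sum eps_k^2 / nu) for a symmetrical linear fit."""
    n, m = len(y), len(terms)
    zy = zscores(y)
    if m == 0:                                   # constant model (Section 2.2)
        return math.sqrt(sum(v * v for v in zy) / (n - 1)), []
    zs = [zscores(t) for t in terms]

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b)) / (n - 1)

    R = [[dot(a, b) for b in zs] for a in zs]
    r = [dot(zy, a) for a in zs]
    beta_ls = [-v for v in solve(R, r)]
    sd_g = math.sqrt(sum(beta_ls[i] * beta_ls[j] * R[i][j]
                         for i in range(m) for j in range(m)))
    beta_s = [b / sd_g for b in beta_ls]         # -sgn(rho(y, g)) = +1 for LS coefficients
    eps = [zy[k] + sum(b * z[k] for b, z in zip(beta_s, zs)) for k in range(n)]
    return math.sqrt(sum(e * e for e in eps) / (n - m - 1)), beta_s


def prune_terms(y, terms, names):
    """Drop the term with the smallest |beta_S| while doing so lowers sigma_hat."""
    terms, names = list(terms), list(names)
    sigma, beta = sym_error(y, terms)
    while terms:
        i = min(range(len(terms)), key=lambda j: abs(beta[j]))
        reduced = terms[:i] + terms[i + 1:]
        sigma_red, beta_red = sym_error(y, reduced)
        if sigma_red >= sigma:                   # term i is relevant: stop
            break
        terms, beta, sigma = reduced, beta_red, sigma_red
        del names[i]
    return names, sigma


if __name__ == "__main__":
    x1 = [float(k) for k in range(1, 13)]
    d = [0.3, -0.2, 0.1, -0.4, 0.2, 0.05, -0.05, -0.2, 0.4, -0.1, 0.2, -0.3]
    y = [2.0 * a + e for a, e in zip(x1, d)]
    x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 9.0, 5.0, 1.0, 4.0, 1.0, 3.0]  # unrelated to y
    print(prune_terms(y, [x1, x2], ["x1", "x2"]))
```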
As a simple but illustrative example, let us consider the estimation of the simple linear model ($m = 1$):

$$y = \hat{b}_0 + \hat{b}_1\, f_1(\mathbf{x}) + \hat{\sigma}(y)\,\hat{\sigma}_\varepsilon(m=1)\, Z_\varepsilon \tag{2.77}$$


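Assuming large samples with negligible experimental error, the overall residual error $\hat{\sigma}_\varepsilon$ can be estimated as (from Eq. 2.71):

$$\hat{\sigma}_\varepsilon(m=1) \approx \sqrt{2\left(1 - \left|\hat{\rho}\left(y, f_1\right)\right|\right)} \tag{2.78}$$

On the other hand, if the term $f_1(\mathbf{x})$ is removed from the model, we obtain the constant model:

$$y = \hat{b}_0 + \hat{\sigma}(y)\,\hat{\sigma}_\varepsilon(m=0)\, Z_\varepsilon \tag{2.79}$$

with (see Section 2.2)

$$\hat{\sigma}_\varepsilon(m=0) = 1 \tag{2.80}$$

So, we can conclude that the term $f_1(\mathbf{x})$ is relevant when $\hat{\sigma}_\varepsilon(m=1) < \hat{\sigma}_\varepsilon(m=0)$, that is, when $2\left(1 - |\hat{\rho}(y, f_1)|\right) < 1$, or equivalently, when:

$$\left|\hat{\rho}\left(y, f_1\right)\right| > \frac{1}{2} \tag{2.81}$$

This limiting correlation coefficient value can be used as an alternative, heuristic rule of thumb for approximately determining the relevance of each model term.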

2.9. Modeling the Residual Error


A final step of the modeling procedure is the identification of a suitable random model for describing the probability distribution of the residuals. The term random model should not be confused with random effect models or random coefficient models. In particular, we will choose the model that describes the residuals with the best goodness-of-fit among a selection of general models: normal, uniform, exponential, and log-normal. Additional distribution models might be considered, but these are probably the most representative. Note that the best of these four distribution models is not necessarily a suitable distribution model to describe the residuals.
The goodness-of-fit of a random model must be evaluated by comparing the model with experimental results. Different performance metrics have been previously proposed for evaluating residual model fitness [7]. However, we will consider only the sum of squared differences in cumulative probability [8]:

$$SS_F = \sum_{k=1}^{n} \delta_k^2 \tag{2.82}$$

where

$$\delta_k = \begin{cases} F(\hat{\varepsilon}_k) - \dfrac{r_k - 1}{n}, & \text{if } F(\hat{\varepsilon}_k) < \dfrac{r_k - 1}{n} \\ 0, & \text{if } \dfrac{r_k - 1}{n} \le F(\hat{\varepsilon}_k) \le \dfrac{r_k}{n} \\ F(\hat{\varepsilon}_k) - \dfrac{r_k}{n}, & \text{if } F(\hat{\varepsilon}_k) > \dfrac{r_k}{n} \end{cases} \tag{2.83}$$

$F$ represents the cumulative probability function of the residuals, $\hat{\varepsilon}_k$ are the individual residual values, $r_k$ is the ascending rank of the residual value $\hat{\varepsilon}_k$, and $n$ is the total number of residuals.
This metric can also be expressed in terms of the random coefficient of determination:

(2.84)
where

∑( )

(2.85)
Thus, the coefficient of determination of the random model becomes:

( )∑

(2.86)

Both models (deterministic and random) are important components of the randomistic model describing the experimental observations. For this reason, the randomistic goodness-of-fit can be expressed as follows [8,9]:

( )
(2.87)
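The cumulative-probability comparison of Eq. (2.82) can be sketched in Python with the standard library's `statistics.NormalDist`. The piecewise discrepancy used here (zero whenever the model CDF value $F(\hat{\varepsilon}_k)$ falls inside the empirical step $[(r_k-1)/n,\ r_k/n]$, otherwise the distance to the nearest edge of that step) is an assumption made for illustration, as is the choice of a standard normal candidate; the other candidate distributions (uniform, exponential, log-normal) would be handled the same way with their own CDFs. The function name is hypothetical.

```python
from statistics import NormalDist


def cdf_discrepancy(residuals, cdf):
    """Sum of squared distances between F(e_k) and the empirical step [(r_k-1)/n, r_k/n]."""
    n = len(residuals)
    ss = 0.0
    for rank, e in enumerate(sorted(residuals), start=1):   # rank = ascending rank r_k
        p = cdf(e)
        lo, hi = (rank - 1) / n, rank / n
        if p < lo:
            delta = p - lo
        elif p > hi:
            delta = p - hi
        else:
            delta = 0.0
        ss += delta * delta
    return ss


if __name__ == "__main__":
    std_normal = NormalDist()          # candidate random model for standardized residuals
    n = 10
    quantiles = [std_normal.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]
    shifted = [q + 1.0 for q in quantiles]
    print(cdf_discrepancy(quantiles, std_normal.cdf))   # residuals at normal quantiles
    print(cdf_discrepancy(shifted, std_normal.cdf))     # residuals shifted by +1
```

Residuals located exactly at the candidate's quantiles give a discrepancy of zero, while a systematic shift produces a positive value; comparing this value across candidate CDFs ranks the random models.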

3. Algorithm Implementation
The symmetrical fitting method for model identification has been implemented in R language (v.4.2.1). R (https://www.r-project.org/) is free software for statistical computing and graphics. The symmetrical fitting algorithm employs different R functions, all of which have been assembled in a single R file (smtools.R), which is freely available (doi: 10.13140/RG.2.2.30381.40160).
A simplified flow diagram providing an overview of the algorithm is presented in Figure 1.


Figure 1. Simplified flow diagram of the symmetrical fitting algorithm.

A brief description of each function included in smtools is presented next.

3.1. sm.fit (Symmetrical model fitting function)


The symmetrical fitting procedure is implemented in R language as the function sm.fit.
Usage:
smout=sm.fit(y,x,terms,param0,lower,upper,config,display,maxit,Uexp,heur,cr,ptol,plots)

Input arguments:


• y: Vector containing the observed values of the response variable to be used in the identification procedure.
• x: Vector or matrix of observed values of the input variables to be used in the identification procedure. For matrices, each input variable must be represented by a single column. The number of rows of x must be identical to the number of elements in the response variable y.
• terms: Optional vector of function names (using quotation marks "") representing the different terms considered in the model. If omitted, each input variable in x is considered as a different term.
Term functions must have the following structure:
termfn<-function(x,param) {...}
All term functions considered must share the same arguments x and param.
• param0: Optional vector of initial values for the additional parameters used by term functions. The vector length must correspond to the length and order of the param argument used by the different term functions.
These initial values will be used as reference values for evaluating the relevance of the parameters. This input can be omitted only when no additional parameters are used.
• lower: Optional vector of lower bounds for the additional parameters to be used in the optimization. If omitted, they are set by default to -Inf.
• upper: Optional vector of upper bounds for the additional parameters to be used in the optimization. If omitted, they are set by default to Inf.
• config: Optional list describing the configuration of the CheMO optimization method [10]. By default, only one Queen (maoptim [11]) is used.
• display: Optional text indicating the type of display to be used by the CheMO method: no display ('none'), display after each iteration ('iter'), or display of final results ('final'). The 'iter' display option allows following the results of the optimization procedure in real time.
• maxit: Optional value of the maximum number of iterations to be performed by the CheMO method. By default it is set to . An early stop of the optimization procedure, with unsatisfactory results, may occur when maxit is low. Depending on the complexity of the optimization problem, larger maxit values might be needed. Use display='iter' to observe the evolution of the optimization procedure and decide whether maxit should be increased.
• Uexp: Optional vector of experimental standard error values for the response variable (first element) and each model term (given in the same order as terms). If only a single uncertainty value is given, it is assumed to correspond to the uncertainty in the determination of the response variable, whereas the uncertainties of model terms are set to zero. If no value is given, all uncertainties are assumed to be zero. This also implies that potential over-fitting cannot be evaluated by the algorithm. The uncertainty values are propagated through the fitted model, and this result is employed as a constraint in the optimization procedure. The uncertainty propagated through the model is also employed to determine the fitness coefficient of the model [12]. The experimental error values should include at least the measurement error due to truncation or instrument resolution.
• heur: Optional logical value indicating whether a heuristic over-fit control is used. The heuristic over-fit control removes terms having | | values less than . By default, it is set to FALSE (no heuristic over-fit control).
• cr: Optional value or vector of the desired resolution (by truncation/rounding) for the bias and each model term coefficient. By default it is set to zero (no rounding). If a single value is given, the resolution is the same for all coefficients (including the bias). Otherwise, the first value represents the bias resolution, followed by the model term coefficients (in the same order given by terms or by x).
• ptol: Optional value or vector of the desired resolution (by truncation/rounding) for the additional parameters. By default it is set to zero (no rounding). If a single value is given, the resolution is the same for all additional parameters. Otherwise, the values are assigned to each parameter in the same order given by param0.
• plots: Optional logical value indicating whether the results are plotted. By default it is set to FALSE (no plots).
The output (smout) is a list containing the following information:
• bias: Estimated value of the bias correction coefficient for the optimized model, usually known as the intercept or independent term.
• coeff: Data frame showing the names of the model terms, the estimated optimal coefficient values (column coeff), the estimated symmetrical model coefficients (column alphaS), and a logical variable indicating whether the term was included in the optimal model structure.
• par: Vector with the estimated optimal values of the additional parameters. If no additional parameters are considered, the output is NULL.
• partype: Vector with the categorical types of the additional parameters: 'param' for a calculated parameter, or 'const' for a reference constant value. If no additional parameters are considered, the output is NULL.
• dof: Degrees of freedom of the final model obtained (total number of observations of the response variable minus the total number of parameters and coefficients estimated from the data).
• model_performance: Vector of model performance results, including: Standard error (s), R2 coefficient (R2), Experimental uncertainty (UE), and Fitness coefficient (CF).
• residual_model: Data frame indicating the best distribution model for the residuals, with the corresponding random and randomistic R2 coefficients, Normality value (Nvalue) [13], and Scedasticity value (Hvalue) [14] ††.
• ypred: Vector containing the values of the response variable predicted by the optimal model for the set of input variables x.
• res: Vector containing the response variable residuals for the optimal model obtained.

If the problem does not involve additional parameters (other than the coefficients of each term), the optimization problem is solved analytically. When additional parameters are involved, a numerical optimization is performed using param0 as the initial estimates. By default, the optimization is performed using the CheMO optimization method with a single “Queen” (multi-algorithm optimization) and a maximum of iterations, without displaying the optimization results. These options (config, display, and maxit) can be modified by the user as input arguments. If, for any reason, the CheMO function is not available, the optim function with the Nelder-Mead (NM) method is used instead‡‡. Numerical optimization increases the execution time of the algorithm compared with the analytical solution. Use display='iter' to monitor the evolution of the optimization.
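The fallback described above can be sketched as follows. The wrapper name run_optimizer and the toy objective are hypothetical; the CheMO argument names follow its Usage line in Section 3.6, and when CheMO is not loaded the sketch calls base R optim with its default Nelder-Mead method.

```r
# Hypothetical wrapper (not an smtools function): prefer CheMO when it is
# loaded, otherwise fall back to stats::optim with Nelder-Mead.
run_optimizer <- function(param0, objective) {
  if (exists("CheMO", mode = "function")) {
    # Single Queen configuration and no display, as in the sm.fit defaults
    CheMO(par = param0, fn = objective, config = list(Q = 1), display = "none")
  } else {
    optim(par = param0, fn = objective, method = "Nelder-Mead")
  }
}

# Toy objective with its minimum at (1, 2)
toy <- function(p) (p[1] - 1)^2 + (p[2] - 2)^2
fit <- run_optimizer(c(0, 0), toy)
fit$par
```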
In addition to CheMO, sm.fit requires the following functions (included in smtools): sm.gof, N.norm.test, and H.sked.test.
If the plots option is set to TRUE, a graphical representation of the results obtained with the optimized model is presented, including the following plots:
• Scatterplot of predicted vs. observed response variable.
• Scatterplot of predicted and observed response variable vs. each input variable and/or model term.
• Scatterplot of model residuals vs. observed response variable.
• Histogram of residual errors compared to the best distribution model.
• Scatterplot of cumulative relative frequency and cumulative probability vs. model residuals.
• Q-Q plot of residuals considering the best distribution model.
• P-P plot of residuals considering the best distribution model.

††
The residual model performance and the N-values and H-values are only illustrative, as they have no effect on the validity of the symmetrical model obtained.
‡‡
Nelder-Mead is the default method used by the optim function. It is also equivalent to a CheMO
optimization considering a single “Knight”.

3.2. sm.fn (Symmetrical model interpolation function)


The sm.fn function is a complementary function included in smtools that uses the output of
sm.fit to obtain model predictions or interpolations§§.
Usage:
ypred=sm.fn(x,smout)
Input arguments:
• x: Vector or matrix of values of the input variables to be used for predicting the response variable. For matrices, each input variable must be represented as a column. The structure of x must be consistent with the structure of the matrix of input variables used in sm.fit (also denoted by x); otherwise, errors or erroneous results may be obtained.
• smout: Output obtained from the sm.fit function.

Output:
• ypred: Vector containing the values of the response variable predicted by the optimal model for the set of input values x.
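For a model made only of linear terms, the prediction reduces to the bias plus the sum of each coefficient times its input variable. The sketch below is a hypothetical re-implementation for that special case only (predict_linear_sm and the mock object are illustrative); the real sm.fn also handles user-defined terms and additional parameters. The mock uses the bias and coefficients printed for the first example in Section 4.1.

```r
# Hypothetical, linear-terms-only analogue of sm.fn (not the smtools code)
predict_linear_sm <- function(x, smout) {
  x <- as.matrix(x)
  stopifnot(ncol(x) == nrow(smout$coeff))   # one column per model term
  as.vector(smout$bias + x %*% smout$coeff$coeff)
}

# Mock sm.fit-like output built from the first example of Section 4.1
smout_mock <- list(
  bias  = 166.7157,
  coeff = data.frame(terms = c("D", "H", "TS", "YS", "EM"),
                     coeff = c(0, -0.3415063, 0.1772275, -0.1173737, -2.0987926))
)
# First alloy of Table 1 (1100 Annealed)
pred <- predict_linear_sm(data.frame(D = 2.71, H = 23, TS = 90, YS = 35, EM = 69),
                          smout_mock)
pred
```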

3.3. sm.plot (Symmetrical model plot function)


sm.plot is another complementary function included in smtools that uses the output of sm.fit
to plot graphical results. This function can be used when the plots option in sm.fit was not set
to TRUE, or to plot interpolation curves when the model uses only a single input variable***.
Usage:
ypred=sm.plot(smout,x,xlim,xname,yname,steps)
Input arguments:
• smout: Output obtained from the sm.fit function.
• x: Optional vector of experimental observations of the input variable (only when a single input variable is considered). It must be consistent with the set of experimental observations used by sm.fit.
• xlim: Optional vector containing the minimum and maximum values of the input variable (only when a single input variable is considered). If xlim is not specified, the limits can be obtained from the x vector.
• xname: Optional name of the input variable to be used in the plots (only when a single input variable is considered).

§§
While the function also allows extrapolation, it is highly advisable to avoid extrapolating results from a
fitted model.
***
Warning: Trying to interpolate models with more than one input variable using sm.plot may lead to
unexpected errors or erroneous results.

• yname: Optional name of the response variable to be used in the plots.
• steps: Optional number of steps considered in the range of values of the input variable (only when a single input variable is considered). By default, steps are considered.
Graphical output:
• Scatterplot of model residuals vs. observed response variable.
• Scatterplot of predicted response variable values vs. observed response variable values.
• Interpolation curve of predicted response variable values vs. input variable in the range xlim (plotted only when a single input variable is considered). A scatterplot of the experimental observations is overlaid when x is specified.
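The interpolation curve amounts to evaluating the fitted model on an evenly spaced grid over xlim. In this hypothetical sketch (interp_curve is not an smtools function), steps is assumed to be the number of intervals, giving steps + 1 grid points, and the default of 100 is an arbitrary choice. The model uses the coefficients printed for the nonlinear yield-strength fit in Section 4.1.

```r
# Hypothetical sketch of an interpolation curve over xlim (not sm.plot itself)
interp_curve <- function(model_fn, xlim, steps = 100) {
  xs <- seq(xlim[1], xlim[2], length.out = steps + 1)  # assumed: steps intervals
  data.frame(x = xs, ypred = model_fn(xs))
}

# Yield strength vs. hardness, using bias, coefficient and exponent from Section 4.1
ys_model <- function(h) 59.30159 + 0.01308405 * h^2.077278
curve_df <- interp_curve(ys_model, xlim = c(23, 150), steps = 50)
plot(curve_df$x, curve_df$ypred, type = "l",
     xlab = "Hardness (Brinell)", ylab = "Yield Strength (MPa)")
```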

3.4. sm.gof (Randomistic goodness-of-fit evaluation of symmetrical models)


This function (sm.gof) evaluates the goodness-of-fit of a randomistic model. sm.gof is a
function employed by sm.fit to identify the best distribution model for the residuals. It can also
be used with the output of sm.fit to test other distribution models with the option to plot the
results.
Usage:
out=sm.gof(smout,rdist,plots)
Input arguments:
• smout: Output obtained from the sm.fit function.
• rdist: Name of the random probability distribution function to be tested against the model residuals (included in smout). The distributions included in this code are "Normal" (default), "Uniform", "Exponential", and "Lognormal".
• plots: Optional logical value indicating whether the random model performance results are plotted. By default, it is set to FALSE (no plots).
Output:
• distribution: Name of the random probability distribution function tested.
• deterministicR2: Value of the determination coefficient for the deterministic model (obtained by symmetrical fitting).
• randomR2: Value of the determination coefficient for the random distribution model tested.
• randomisticR2: Value of the determination coefficient obtained for the overall randomistic model (including both the deterministic and random models).
• bias: Bias of the model residuals used to test the distribution function.
• serror: Standard error of the model residuals used to test the distribution function.
Graphical output:
• Histogram of residual errors compared to the selected distribution function.
• Scatterplot of cumulative relative frequency and cumulative probability vs. model residuals for the selected distribution function.
• Q-Q plot of residuals considering the selected distribution function.
• P-P plot of residuals considering the selected distribution function.
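A Q-Q comparison against a Normal model can be reproduced in base R, as a rough illustration of what the graphical output shows. The residuals are those printed for the first example in Section 4.1. The squared Q-Q correlation computed here is one common agreement measure; it is an assumption for illustration and is not claimed to be the randomR2 statistic calculated by sm.gof.

```r
# Residuals of the elongation-at-break model (Section 4.1)
res <- c(9.113222591, -7.742386132, 0.785686042, 0.369951687, 1.386289205,
         -3.096572028, -2.174493349, -3.393754703, -0.001310065,
         4.536348008, -7.430645245, 4.288143440, 3.359520550)

# Theoretical Normal quantiles matched to the residuals' mean and sd
theo <- qnorm(ppoints(length(res)), mean = mean(res), sd = sd(res))
qq_r2 <- cor(sort(res), theo)^2   # illustrative agreement measure only

qqplot(theo, sort(res), xlab = "Normal quantiles", ylab = "Residuals")
abline(0, 1, lty = 2)             # reference line of perfect agreement
qq_r2
```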

3.5. r.test (Relevance test for the difference between samples)


The r.test function is a complementary function included in smtools that uses symmetrical modeling to evaluate the relevance of the difference between two samples. To evaluate the statistical significance of the difference between samples, optimal significance tests can be employed [15].
Usage:
r.test(x1,x2,plot)
Input arguments:
• x1: Vector containing the data of the sample (or first sample) to be tested.
• x2: Vector containing the data of the second sample, or a constant reference value.
• plot: Boolean argument indicating whether the graphical output is plotted.

Output:
• r: Relevance value (r-value), corresponding to the absolute linear correlation coefficient between the data set and a binary variable representing each group.
• relevant: Boolean variable indicating whether the sample difference is relevant. The difference is considered relevant when r> .
• sample.diff: Average difference observed between the samples.
• model.diff: Estimated difference in mean values obtained using a symmetrical model.

Graphical output:
• Scatterplot of observations grouped according to each sample, compared to the linear model obtained by symmetrical fitting.
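The r-value definition above (absolute correlation between the pooled data and a binary group indicator, i.e. a point-biserial correlation) can be sketched in base R. relevance_r is a hypothetical name; it covers only the two-sample case, does not fit the symmetrical model behind model.diff, and the sign convention mean(x2) - mean(x1) for sample.diff is an assumption.

```r
# Hypothetical two-sample sketch of the r-value (not r.test itself)
relevance_r <- function(x1, x2) {
  y <- c(x1, x2)                                       # pooled observations
  group <- c(rep(0, length(x1)), rep(1, length(x2)))   # binary group indicator
  list(r = abs(cor(y, group)),
       sample.diff = mean(x2) - mean(x1))              # sign convention assumed
}

out <- relevance_r(c(1, 2, 3), c(4, 5, 6))
out$r
out$sample.diff
```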

3.6. CheMO (Chess-inspired Multi-algorithm Optimization Method)


The CheMO method is a numerical global optimization method that employs different search
algorithms to find the optimum, including maoptim and OAToptim. The CheMO method was
introduced and described in detail in a previous report [10].

Usage:
CheMO(par,fn,gr,config,lower,upper,control,hessian,adaptboard,display)
Input arguments:
• par: Initial values for the parameters to be optimized.
• fn: An R function to be minimized (or maximized), whose first argument is the vector of parameters over which minimization takes place. It should return a scalar result.
• gr: Optional function returning the gradient for the "BFGS" and "L-BFGS-B" methods. If it is NULL, a finite-difference approximation is used.
• config: A list containing the number of chess pieces considered in the optimization: P (Pawns), B (Bishops), N (Knights), R (Rooks) and Q (Queens). The default value for each type is zero. If all pieces are set to zero, the default chess configuration is used (P=8,B=2,N=2,R=2,Q=1).
• lower, upper: Bounds on the variables for the "L-BFGS-B" or "OAT" methods.
• control: Optional list of control parameters, including:
  o maxit: Maximum number of iterations for each optimizer and for the optimization cycle. By default, maxit=10.
  o fnscale: Scale constant for the objective function. Negative values are used for maximization, positive values for minimization. By default, fnscale=1.
  o step0: Vector representing the initial search step-size for each decision variable. Used by OAT.
  o stepmin: Vector representing the minimum search step-size for each decision variable. For integer decision variables, the minimum search step-size must be 1. Used by OAT.
• hessian: Optional logical value indicating whether a numerically differentiated Hessian matrix should be returned.
• adaptboard: Optional logical argument indicating whether the Adaptive Board strategy (adapting the search region) is used.
• display: Type of display used. Default: 'none' (nothing is displayed). Options: 'iter' (results displayed at each iteration) and 'final' (only the final iteration is displayed).
Output:
• par: Optimal values found for the decision variables.
• value: Best objective function value found.
• counts: Number of function evaluations performed.
• time: Elapsed computation time in seconds.
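The documented config rule (missing pieces count as zero, and an all-zero configuration falls back to the full chess set) can be expressed as a small helper. normalize_config is hypothetical and only mirrors the rule stated above; it is not taken from the CheMO source.

```r
# Hypothetical helper mirroring the documented CheMO config rule
normalize_config <- function(config = list()) {
  pieces <- c(P = 0, B = 0, N = 0, R = 0, Q = 0)   # default: zero of each type
  if (length(config) > 0) pieces[names(config)] <- unlist(config)
  if (all(pieces == 0)) pieces <- c(P = 8, B = 2, N = 2, R = 2, Q = 1)
  as.list(pieces)
}

normalize_config()              # full default chess configuration
normalize_config(list(Q = 1))   # single Queen, the sm.fit default
```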

3.7. maoptim (Multi-algorithm Optimization Method)


The maoptim function performs the Multi-Algorithm Optimization [11] of a function of one or
more variables. It requires the optim function (included in the stats library in R), and OAToptim
(included in smtools).
Usage:
maoptim(par,fn,gr,method,lower,upper,control,hessian)
Input arguments:
• par: Initial values for the parameters to be optimized.
• fn: An R function to be minimized (or maximized), whose first argument is the vector of parameters over which minimization takes place. It should return a scalar result.
• gr: Optional function returning the gradient for the "BFGS" and "L-BFGS-B" methods. If it is NULL, a finite-difference approximation is used.
• method: A character vector containing the sequence of algorithms to be used in the optimization procedure. The algorithms available for the sequence are all methods accepted by the optim function ("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", and "Brent"), and "OAT".
• lower, upper: Bounds on the variables for the "L-BFGS-B" or "OAT" methods.
• control: Optional list of control parameters, including (in addition to the optional arguments of the control argument accepted by optim):
  o nsp: Non-negative integer indicating the number of starting points. By default, nsp=14.
  o fnscale: Scale constant for the objective function. Negative values are used for maximization, positive values for minimization. By default, fnscale=1.
  o step0: Vector representing the initial search step-size for each decision variable. Only used by OAT.
  o stepmin: Vector representing the minimum search step-size for each decision variable. For integer decision variables, the minimum search step-size must be 1. Only used by OAT.
  o ncycles: Non-negative integer indicating the maximum number of full cycles to be performed by the optimization algorithm. Only used by OAT.
  o tol: Tolerance (or resolution) for the objective function.
  o MCcheck: Non-negative integer indicating the number of Monte Carlo trials used as a test of local optima. Only used by OAT.
  o display: Logical argument used to show the progress of the optimization. By default, it is set to FALSE. Only used by OAT.
  o optmode: Character argument indicating the type of optimization to be performed: "min" for minimization or "max" for maximization. A minimization problem is considered by default. It can be used as an alternative to fnscale; the optmode argument overrides the fnscale argument.
• hessian: Optional logical value indicating whether a numerically differentiated Hessian matrix should be returned.
Output:
• par: Optimal values found for the decision variables.
• value: Best objective function value found.
• counts: Number of function evaluations performed.
• time: Elapsed computation time in seconds.
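The idea of running a sequence of algorithms, each restarting from the best point found so far, can be illustrated with base R optim alone. chain_optim is a hypothetical, simplified illustration and not maoptim: it uses a single starting point (no nsp), omits OAT, and ignores the extra control options.

```r
# Simplified illustration of chaining optim() methods (not maoptim itself)
chain_optim <- function(par, fn, methods = c("Nelder-Mead", "BFGS", "CG")) {
  best <- list(par = par, value = fn(par), method = "start")
  for (m in methods) {
    res <- optim(best$par, fn, method = m)   # restart from the current best point
    if (res$value < best$value) best <- list(par = res$par, value = res$value, method = m)
  }
  best
}

# Rosenbrock function, minimum value 0 at (1, 1)
rosenbrock <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
best <- chain_optim(c(-1.2, 1), rosenbrock)
best$par
best$value
```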

3.8. OAToptim (One-at-a-time Adaptive Step-size Optimization Method)


This function performs the OAT optimization of a function of one or more variables [16].
Usage:
OAToptim(fun,x0,lower,upper,step0,stepmin,ncycles,tol,MCcheck,display,optmode)

Input arguments:
• fun: R function representing the objective function. It must be a function of a single argument x, which holds the values of the decision variables (a scalar or a vector).
• x0: Optional starting point for the optimization. By default, it is randomly chosen within the decision variable bounds.
• lower: Optional vector with lower limits for the decision variables. By default, all lower bounds are -Inf.
• upper: Optional vector with upper limits for the decision variables. By default, all upper bounds are Inf.
• step0: Optional vector representing the initial search step-size for each decision variable.
• stepmin: Optional vector representing the minimum search step-size for each decision variable. For integer decision variables, the minimum search step-size must be 1.
• ncycles: Optional maximum number of full cycles to be performed by the optimization algorithm.
• tol: Optional tolerance (or resolution) for the objective function.
• MCcheck: Optional number of Monte Carlo trials used as a test of local optima.
• display: Optional Boolean argument used to show the progress of the optimization. By default, it is TRUE.
• optmode: Optional argument indicating the type of optimization to be performed: 'min' for minimization or 'max' for maximization. A minimization problem is considered by default.
The output is a list containing the following information:
• xopt: Optimal values found for the decision variables.
• Fobj: Best objective function value found.
• nfeval: Number of function evaluations performed.
• ctime: Elapsed computation time in seconds.
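A minimal one-at-a-time search with adaptive step size can be sketched as follows, assuming a simple rule: try a forward and then a backward move on each variable in turn, and halve all steps (down to stepmin) after a cycle with no improvement. oat_sketch is hypothetical; the actual OAToptim method [16] also uses bounds, tol, MCcheck and optmode, which are omitted here, and only minimization is shown.

```r
# Hypothetical one-at-a-time search with step halving (not OAToptim itself)
oat_sketch <- function(fun, x0, step0, stepmin, ncycles = 1000) {
  x <- x0; f <- fun(x); nfeval <- 1; step <- step0
  for (cycle in seq_len(ncycles)) {
    improved <- FALSE
    for (i in seq_along(x)) {            # one decision variable at a time
      for (dir in c(1, -1)) {            # forward move first, then backward
        xt <- x
        xt[i] <- xt[i] + dir * step[i]
        ft <- fun(xt); nfeval <- nfeval + 1
        if (ft < f) { x <- xt; f <- ft; improved <- TRUE; break }
      }
    }
    if (!improved) {                     # no move helped: refine the steps
      if (all(step <= stepmin)) break    # minimum step-size reached
      step <- pmax(step / 2, stepmin)
    }
  }
  list(xopt = x, Fobj = f, nfeval = nfeval)
}

quad <- function(x) (x[1] - 3)^2 + (x[2] + 1)^2
r <- oat_sketch(quad, x0 = c(0, 0), step0 = c(1, 1), stepmin = c(1e-6, 1e-6))
r$xopt
```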

3.9. N.norm.test (Optimal Normality Test)


This function evaluates the normality of a variable x using an approximate Shapiro-Wilk test of normality with optimal significance levels [13]. The normality of the data is represented by the N-value, which is positive for normal distributions and negative for non-normal distributions.
Usage:
N.norm.test(x,type,maxerror,display)
Input arguments:
• x: Vector containing the dataset to be tested.
• type: Optional argument describing the type of calculation: 'mean' Mean order statistic (default), 'median' Median order statistic.
• maxerror: Optional argument indicating the maximum test error (as a fraction) tolerated by the user.
• display: Optional Boolean variable indicating whether the test results are displayed.
Output:
• W: Approximate Shapiro-Wilk statistic.
• P: P-value of the approximate Shapiro-Wilk Normality Test.
• N: Normality value (N-value) calculated from optimal significance levels.
• eT: Total test error.
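For comparison, base R provides the exact Shapiro-Wilk test in stats::shapiro.test, which returns a W statistic and a p-value comparable in meaning to the W and P outputs above. It does not compute the N-value or the total test error eT, which rely on the optimal significance levels of [13]. The residuals are those printed for the first example in Section 4.1.

```r
# Exact Shapiro-Wilk test from base R (stats), for reference only
res <- c(9.113222591, -7.742386132, 0.785686042, 0.369951687, 1.386289205,
         -3.096572028, -2.174493349, -3.393754703, -0.001310065,
         4.536348008, -7.430645245, 4.288143440, 3.359520550)
sw <- shapiro.test(res)
sw$statistic   # W
sw$p.value     # P
```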

3.10. H.sked.test (Scedasticity Test)


This function evaluates the scedasticity of a variable (y) with respect to a set of reference variables (x) [14]. An H-value is calculated, which is positive for homoscedastic data and negative for heteroscedastic data.
Usage:
H.sked.test(mainlm,ttype,display)

Input arguments:
• mainlm: Either an object of class "lm" (e.g., generated by lm), a data frame, or a list of two objects: a response vector (y) and a matrix of reference values (x), given in that order.
• ttype: Optional argument representing the type of test performed. Types available: general scedasticity test ("scedastic"), test for homoscedasticity ("homoscedastic"), and test for heteroscedasticity ("heteroscedastic").
• display: Optional Boolean variable indicating whether the test results are displayed.
Output:
• ref.var: Name of the variable used as reference in the evaluation of scedasticity.
• statistic: R2 test statistic.
• crit.value: Critical value of the test statistic.
• p.value: Probability value of the scedasticity test.
• H.value: Homoscedasticity value (H-value) calculated from optimal significance levels.
• decision: Test decision (homoscedastic, heteroscedastic, or inconclusive).
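As a loose illustration of an R2-type scedasticity statistic, the sketch below regresses squared residuals on the reference variable (a Breusch-Pagan-style auxiliary regression) using simulated data. This is a swapped-in, well-known technique chosen for illustration; it is not H.sked.test, whose critical values and H-value come from the optimal significance levels of [14]. aux_r2 is a hypothetical name.

```r
# Breusch-Pagan-style auxiliary R2 (illustration only, not H.sked.test)
set.seed(1)
x <- seq(1, 10, length.out = 50)
y_homo   <- 2 + 3 * x + rnorm(50, sd = 1)         # constant spread
y_hetero <- 2 + 3 * x + rnorm(50, sd = 0.3 * x)   # spread grows with x

# R2 of squared residuals regressed on the reference variable
aux_r2 <- function(fit, ref) summary(lm(residuals(fit)^2 ~ ref))$r.squared
r2_homo   <- aux_r2(lm(y_homo ~ x), x)
r2_hetero <- aux_r2(lm(y_hetero ~ x), x)
c(homoscedastic = r2_homo, heteroscedastic = r2_hetero)
```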

4. Illustrative Examples
This section includes a collection of models obtained using the symmetrical fitting algorithm
programmed in R, for different case studies. Some of these case studies have already been
considered in previous reports.

4.1. Mechanical Properties of Aluminum Alloys


Let us consider the data set presented in Table 1, containing mechanical information for a
sample of different aluminum alloys [1].
Table 1. Representative mechanical properties for a sample of aluminum alloys
Aluminum Alloy Type | Density (g/cm3) | Hardness (Brinell Scale) | Tensile Strength (MPa) | Yield Strength (MPa) | Elastic Modulus (GPa) | Elongation at Break (%)
1100 Annealed | 2.71 | 23 | 90 | 35 | 69 | 35
1100 H12 Temper | 2.71 | 28 | 110 | 103 | 69 | 12
1100 H14 Temper | 2.71 | 32 | 124 | 117 | 69 | 20
2024 T3 Temper | 2.78 | 120 | 483 | 345 | 73 | 18
2024 T4 Temper | 2.78 | 120 | 469 | 324 | 73 | 19
2024 T6 Temper | 2.78 | 125 | 427 | 345 | 72 | 5
6061 Annealed | 2.70 | 30 | 124 | 55 | 69 | 25
6061 T4 Temper | 2.70 | 65 | 241 | 145 | 69 | 22
6061 T6 Temper | 2.70 | 95 | 310 | 276 | 69 | 12
6061 T8 Temper | 2.70 | 120 | 310 | 276 | 69 | 8
7075 Annealed | 2.81 | 60 | 228 | 103 | 72 | 16
7075 T6 Temper | 2.81 | 150 | 572 | 503 | 72 | 11
8090 T3 Temper | 2.54 | 91 | 340 | 210 | 77 | 13

A wide selection of models can be evaluated in this example, considering different response
variables and different transformations of the experimental variables.
First, let us obtain the best multiple-linear model describing the elongation at break in terms of
all other mechanical properties.
Remember to first load the smtools functions (saved in the current working directory) using:
source("smtools.R")

Then, the following R code can be used to solve this example.


D=c(2.71,2.71,2.71,2.78,2.78,2.78,2.70,2.70,2.70,2.70,2.81,2.81,2.54)
H=c(23,28,32,120,120,125,30,65,95,120,60,150,91)
TS=c(90,110,124,483,469,427,124,241,310,310,228,572,340)
YS=c(35,103,117,345,324,345,55,145,276,276,103,503,210)
EM=c(69,69,69,73,73,72,69,69,69,69,72,72,77)
EB=c(35,12,20,18,19,5,25,22,12,8,16,11,13)
out=sm.fit(EB,data.frame(D,H,TS,YS,EM))
sm.plot(out,yname="Elongation at break (%)")
out

Warning message:
H.sked.test: The sample size is too small (n<15). Conclusions may be unreliable

$bias
[1] 166.7157

$coeff
terms coeff alphaS include
1 D 0.0000000 0.0000000 FALSE
2 H -0.3415063 1.9046756 TRUE
3 TS 0.1772275 -3.5630364 TRUE
4 YS -0.1173737 2.0669945 TRUE
5 EM -2.0987926 0.6619995 TRUE

$par
NULL

$partype
NULL

$dof
[1] 8

$model_performance
s R2 R2adj
5.8840205 0.6322108 0.4483162

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.9913432 0.9968161 1.713448 2.970658

$ypred
[1] 25.886777 19.742386 19.214314 17.630048 17.613711 8.096572 27.174493 25.393755 12.001310 3.463652 23.430645
[12] 6.711857 9.640479

$res
[1] 9.113222591 -7.742386132 0.785686042 0.369951687 1.386289205 -3.096572028 -2.174493349 -3.393754703 -0.001310065
[10] 4.536348008 -7.430645245 4.288143440 3.359520550
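The printed model_performance values can be recomputed by hand from $res and $dof. The formulas used here (s = sqrt(SSR/dof), R2 = 1 - SSR/SST, and the usual adjusted R2) are a reconstruction rather than a quote of the sm.fit source, but with the data above they reproduce s = 5.884, R2 = 0.632 and R2adj = 0.448.

```r
# Elongation at break (Table 1) and the residuals printed above
EB  <- c(35, 12, 20, 18, 19, 5, 25, 22, 12, 8, 16, 11, 13)
res <- c(9.113222591, -7.742386132, 0.785686042, 0.369951687, 1.386289205,
         -3.096572028, -2.174493349, -3.393754703, -0.001310065,
         4.536348008, -7.430645245, 4.288143440, 3.359520550)
dof <- 8                      # 13 observations - bias - 4 included terms
n   <- length(EB)

SSR <- sum(res^2)             # residual sum of squares
SST <- sum((EB - mean(EB))^2) # total sum of squares
perf <- c(s     = sqrt(SSR / dof),
          R2    = 1 - SSR / SST,
          R2adj = 1 - (SSR / dof) / (SST / (n - 1)))
perf
```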

The warning message refers to the evaluation of the scedasticity of the residuals by the function H.sked.test. Since this value is only informative, the warning can be safely ignored. Figure 2 shows the graphical output obtained for this model, which achieves a coefficient of determination of R2 = 0.632.

Figure 2. Multiple linear model of elongation at break obtained by symmetrical fitting

The relative relevance of each model term is determined by the absolute value of its symmetrical coefficient (alphaS). Thus, the most important contribution to elongation at break appears to come from tensile strength, followed by yield strength and hardness. Density had an irrelevant effect on elongation at break and was excluded from the model.
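The ranking by absolute symmetrical coefficient can be read directly from the coeff data frame. Below, a copy of the printed table is rebuilt by hand (in practice it would be out$coeff) and the included terms are sorted by |alphaS|.

```r
# Copy of the printed $coeff table (terms, alphaS, include)
coeff <- data.frame(
  terms   = c("D", "H", "TS", "YS", "EM"),
  alphaS  = c(0, 1.9046756, -3.5630364, 2.0669945, 0.6619995),
  include = c(FALSE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)
ranked <- coeff[coeff$include, ]                                   # drop excluded terms
ranked <- ranked[order(abs(ranked$alphaS), decreasing = TRUE), ]   # most relevant first
ranked$terms
```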
Using the same data set, let us obtain the best model for describing the natural logarithm of
yield strength in terms of the natural logarithms of all other variables †††. The following code in R
can be used (the data set was already loaded in the previous example)‡‡‡:
sm.fit(log(YS),data.frame(log(D),log(H),log(TS),log(EM),log(EB)))

$bias
[1] 16.55898

†††
Rigorously speaking, special functions (such as exponential, logarithm, etc.) should not be directly
applied to variables with dimensions, but only to dimensionless variables. Unfortunately, the type I
standard transformation cannot be used with logarithms as it will result in the logarithm of negative
values (undefined in the realm of real numbers). In those cases, a type II standard transformation can be
used [4], which yields only positive (or only negative) values. Other functions having arguments with
upper and lower bounds may require type III standard transformations [4]. Now, in the case of type II
transformations for the logarithm function we have:
( )
( )
Then, the standard transformation of the logarithm becomes:
( ) ( ( ) ( )) ( )

√ ( √ ( )
( ))

Thus, the logarithm of the variables is used in the model, but formally the dimensionless type II standard
transformation has been considered.
‡‡‡
For simplicity, warnings, response variable predictions and model residuals will no longer be included
in the outputs.

$coeff
terms coeff alphaS include
1 log.D. 0.0000000 0.0000000 FALSE
2 log.H. -0.4664228 0.3937767 TRUE
3 log.TS. 1.6728987 -1.3200833 TRUE
4 log.EM. -4.1167859 0.1797774 TRUE
5 log.EB. -0.4242647 0.2727489 TRUE

$par
NULL

$partype
NULL

$dof
[1] 8

$model_performance
s R2 R2adj
0.2670242 0.9244147 0.8866220

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.9799599 0.9984853 1.378534 2.208051

In this case, a model with R2 = 0.924 was obtained with normal residuals, in which the logarithm of tensile strength was the most relevant variable, followed by the logarithm of hardness. The logarithm of density was discarded as irrelevant.
In the absence of tensile strength data, the following model for describing the logarithm of
yield strength is obtained:
sm.fit(log(YS),data.frame(log(D),log(H),log(EM),log(EB)))

$bias
[1] 1.36776

$coeff
terms coeff alphaS include
1 log.D. 0.0000000 0.000000 FALSE
2 log.H. 1.0558788 -0.891424 TRUE
3 log.EM. 0.0000000 0.000000 FALSE
4 log.EB. -0.2527525 0.162488 TRUE

$par
NULL

$partype
NULL

$dof
[1] 10

$model_performance
s R2 R2adj
0.2807916 0.8955245 0.8746294

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.9929549 0.999264 1.403032 3.130867

In addition to the logarithm of density, the logarithm of elastic modulus was also irrelevant.

Let us now consider a nonlinear model of the yield strength in terms of individual powers of
hardness and elongation at break. Two functions must be defined before running the fitting
procedure:
fnH<-function(x,par) as.matrix(x)[,1]^par[1]
fnEB<-function(x,par) as.matrix(x)[,2]^par[2]
sm.fit(YS,data.frame(H,EB),param0=c(1,1),terms=c("fnH","fnEB"))

$bias
[1] 59.30159

$coeff
terms coeff alphaS include
1 fnH 0.01308405 -1 TRUE
2 fnEB 0.00000000 0 FALSE

$par
[1] 2.077278 1.000000

$partype
[1] "param" "const"

$dof
[1] 10

$model_performance
s R2 R2adj
32.4679575 0.9548629 0.9458355

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.9979307 0.9999066 1.789461 3.714821

The resulting model considers only the effect of hardness on yield strength. Notice that the second additional parameter (the exponent of elongation at break) was kept at its initial (nominal) value; it is therefore treated as a constant rather than as an unknown parameter, freeing one degree of freedom.
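The degrees of freedom reported in this section are consistent with counting the bias, the included term coefficients, and only those additional parameters of type 'param' ('const' parameters are not counted). sm_dof is a hypothetical helper that encodes this reading of the dof definition given for sm.fit.

```r
# Hypothetical helper: observations minus bias, included coefficients and
# estimated ('param') additional parameters
sm_dof <- function(n, include, partype = NULL) {
  n - 1 - sum(include) - sum(partype == "param")
}

sm_dof(13, include = c(TRUE, FALSE), partype = c("param", "const"))  # nonlinear fit
sm_dof(13, include = c(FALSE, TRUE, TRUE, TRUE, TRUE))               # first linear fit
```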
The model plots (removing elongation at break) can be obtained as follows§§§:
sm.plot(sm.fit(YS,H,param0=1,terms="fnH"), yname="Yield Strength (MPa)", xname=
"Hardness (Brinell)",x=H)

The graphical output is presented in Figure 3.

4.2. Extreme Multiple Linear Model


Let us now consider an extreme multiple linear model example [17], in which each individual input variable has a near-zero correlation with the response variable, but their combined effect is highly correlated with it. The data are shown in Table 2.

§§§
Here, the same initial parameter value is used and not the optimal value found in the previous
optimization. If the optimal value is used, the parameter will be considered a constant, and an additional
degree of freedom will be gained, resulting in an artificially lower model error.

Figure 3. Nonlinear model of yield strength with respect to hardness, obtained by symmetrical fitting

Table 2. Extreme Multiple Linear Regression Data [17]
x1 | x2 | y
0.11 | 16.55 | 12.37
0.69 | 15.08 | 12.66
5.50 | 0.00 | 12.00
2.89 | 7.77 | 11.93
4.47 | 2.16 | 11.06
1.81 | 12.09 | 13.03
3.15 | 8.18 | 13.13
0.00 | 15.94 | 11.44
3.15 | 7.91 | 12.86
3.02 | 6.29 | 10.84
4.67 | 1.69 | 11.20
0.16 | 15.58 | 11.56
0.68 | 13.28 | 10.83
5.71 | 0.00 | 12.63
3.87 | 5.36 | 12.46

The following R code can be used to fit a model of y as a linear function of x1 and x2:
x1=c(0.11,0.69,5.5,2.89,4.47,1.81,3.15,0,3.15,3.02,4.67,0.16,0.68,5.71,3.87)
x2=c(16.55,15.08,0,7.77,2.16,12.09,8.18,15.94,7.91,6.29,1.69,15.58,13.28,0,5.36)
x=data.frame(x1,x2)
y=c(12.37,12.66,12,11.93,11.06,13.03,13.13,11.44,12.86,10.84,11.2,11.56,10.83,12.63,12.46)
sm.fit(y,x,plots=TRUE)

$bias
[1] -4.534961

$coeff
terms coeff alphaS include
1 x1 3.005533 -7.437323 TRUE
2 x2 1.002219 -7.437391 TRUE

$dof
[1] 12

$model_performance
s R2 R2adj
0.0074640 0.9999258 0.9999134

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.9953441 0.9999997 1.833684 3.267741

The full graphical output is presented in Figure 4.
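As a quick cross-check outside the symmetrical fitting framework, the near-zero marginal correlations and the near-perfect joint linear fit of Table 2 can be reproduced in base R with cor() and ordinary least squares (lm()). This is a sketch for comparison only: lm() minimizes the classical least-squares criterion, so its coefficients need not coincide exactly with those returned by sm.fit.

```r
# Data of Table 2
x1 <- c(0.11,0.69,5.5,2.89,4.47,1.81,3.15,0,3.15,3.02,4.67,0.16,0.68,5.71,3.87)
x2 <- c(16.55,15.08,0,7.77,2.16,12.09,8.18,15.94,7.91,6.29,1.69,15.58,13.28,0,5.36)
y  <- c(12.37,12.66,12,11.93,11.06,13.03,13.13,11.44,12.86,10.84,11.2,11.56,
        10.83,12.63,12.46)

# Marginal (one-variable-at-a-time) correlations with the response
marginal_r <- c(x1 = cor(x1, y), x2 = cor(x2, y))

# Joint linear fit by ordinary least squares (not symmetrical fitting)
ols <- lm(y ~ x1 + x2)
joint_R2 <- summary(ols)$r.squared

print(round(marginal_r, 3))
print(round(coef(ols), 3))
print(joint_R2)
```

Each input alone explains almost nothing, yet together they reproduce y almost exactly, because x2 roughly compensates the variation of 3*x1 in this data set.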


An interesting result is obtained when the response variable is modified by a nonlinear function of one of the input variables.

As an example, let us consider a new response variable given by $y' = y + 0.5x_1^2$.


The corresponding model is obtained by symmetrical fitting as follows:
sm.fit(y+0.5*x1^2,x)

$bias
[1] 9.877415

$coeff
terms coeff alphaS include
1 x1 2.81931 -1 TRUE
2 x2 0.00000 0 FALSE

$dof
[1] 13

$model_performance
s R2 R2adj
1.7525571 0.9089344 0.9019293

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Uniform 0.9784171 0.9980345 1.346825 2.206777

In this case, the effect of $x_2$ is found to be irrelevant, and only the effect of $x_1$ remains. In addition, the model residuals are best described by a uniform model.

08/04/2025 ForsChem Research Reports Vol. 10, 2025-06


10.13140/RG.2.2.31220.26244 www.forschem.org / t.me/forschem (35 / 55)
The Symmetrical Fitting Method
for Model Identification
Hugo Hernandez
ForsChem Research
[email protected]

Figure 4. Extreme multiple linear model (Table 2), obtained by symmetrical fitting

The model performance can be observed graphically in Figure 5, obtained by using the
following code:


sm.plot(sm.fit(y+0.5*x1^2,x1),x=x1)

Figure 5. Modified version of the extreme multiple linear model, obtained by symmetrical fitting

This data set can also be used to evaluate model over-fitting. Let us first consider a modified response variable given by $y'' = y + 20x_1$. The model is obtained using the following code:
sm.fit(y+20*x1,x)

$bias
[1] -4.534348

$coeff
terms coeff alphaS include
1 x1 23.005422 -1.1484796 TRUE
2 x2 1.002182 -0.1500384 TRUE

$dof
[1] 12

$model_performance
s R2 R2adj
0.007463931 0.999999970 0.999999965

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.9951947 1 1.827104 3.266348

First, the extremely high coefficient of determination of the model raises doubts about over-fitting. Second, $|\hat{\alpha}_S|$ values below the heuristic threshold are obtained (a heuristic condition for over-fitting). However, these observations do not necessarily confirm over-fitting. So, let us compare the model error with the minimum experimental error propagated through the model. For this, we need the experimental error of the observed variables. Since no further information is available, let us consider only the truncation error, estimated as $0.01/\sqrt{12} \approx 0.0029$ for all variables (assuming a uniform error model with a resolution of 0.01). The evaluation of model over-fitting is performed simply by including the Uexp values in the sm.fit function, as follows:
sm.fit(y+20*x1,x,Uexp=rep(0.01/sqrt(12),3))

$bias
[1] 11.91705

$coeff
terms coeff alphaS include
1 x1 20.0312 -1 TRUE
2 x2 0.0000 0 FALSE
$dof
[1] 13

$model_performance
s R2 R2adj UE CF
0.83061043 0.99959479 0.99956362 0.06428043 0.01190692

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Uniform 0.995956 0.9999984 0.6650795 2.786777

The minimum experimental error propagated through the model was UE = 0.0643, which is much larger than the residual error of the previous model (s = 0.00746), thus confirming model over-fitting when $x_2$ is also included in the model. Despite the increase in residual error for the new model without over-fitting (s = 0.831), the model performance remains highly satisfactory (R2 = 0.9996).
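The value 0.01/sqrt(12) passed to Uexp is the standard deviation of a uniform error spanning one resolution step of 0.01. A short base-R sketch, assuming that all three variables (y, x1, x2) were recorded with a resolution of 0.01, checks this value by Monte Carlo simulation:

```r
# Standard uncertainty of a value recorded with resolution delta, assuming a
# uniform error over one resolution step: u = delta / sqrt(12)
resolution <- 0.01
u_trunc <- resolution / sqrt(12)

# Monte Carlo check of the same quantity (seed chosen arbitrarily)
set.seed(1)
err <- runif(1e5, min = -resolution / 2, max = resolution / 2)
u_mc <- sd(err)

# One value per observed variable (y, x1, x2), as passed to Uexp
Uexp <- rep(u_trunc, 3)
print(c(analytic = u_trunc, monte_carlo = u_mc))
```

The same standard deviation is obtained whether the error is modeled as rounding (uniform on [-0.005, 0.005]) or truncation (uniform on [0, 0.01)), since shifting a distribution does not change its spread.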
In the absence of experimental error information, the heuristic over-fit control strategy can be
used (yielding the same results), as follows:
sm.fit(y+20*x1,x,heur=TRUE)

$bias
[1] 11.91705

$coeff
terms coeff alphaS include
1 x1 20.0312 -1 TRUE
2 x2 0.0000 0 FALSE

$dof
[1] 13

$model_performance
s R2 R2adj
0.8306104 0.9995948 0.9995636

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Uniform 0.995956 0.9999984 0.6650795 2.786777


4.3. Harmonic Spectra in Titanium Nitride Films


For this example, a data set reported by Wen et al. [18] was considered, where integrated
values of second harmonic spectra are presented as a function of the excitation wavelength in
titanium nitride (TiN) films. The data set is summarized in Table 3.
Table 3. Integrated second harmonic spectra values for different excitation wavelengths in TiN films
(extracted from [18])
Excitation Wavelength [μm]   Integrated Spectra [a.u.]   Excitation Wavelength [μm]   Integrated Spectra [a.u.]
0.430 1.10 0.490 3.38
0.435 1.17 0.495 3.39
0.440 0.95 0.500 2.71
0.450 1.00 0.505 3.32
0.455 1.66 0.510 4.05
0.460 1.12 0.515 4.76
0.465 1.59 0.520 4.93
0.470 2.02 0.525 4.90
0.475 2.12 0.530 4.85
0.480 1.46 0.535 4.59
0.485 2.34 0.540 3.93

The goal here is to model the effect of excitation wavelength on the integrated second harmonic spectrum using a polynomial model of up to the 10th power. The following R code can be used to fit the model:
y=c(1.10,1.17,0.95,1,1.66,1.12,1.59,2.02,2.12,1.46,2.34,3.38,3.39,2.71,3.32,4.05,
  4.76,4.93,4.90,4.85,4.59,3.93)
x=c(0.43,0.435,0.44,0.45,0.455,0.46,0.465,0.47,0.475,0.48,0.485,0.49,0.495,0.5,
  0.505,0.51,0.515,0.52,0.525,0.53,0.535,0.54)
X=data.frame(x,x^2,x^3,x^4,x^5,x^6,x^7,x^8,x^9,x^10)
out=sm.fit(y,X)
out
$bias
[1] -130.4159

$coeff
terms coeff alphaS include
1 x 0.000 0.00000 FALSE
2 x.2 0.000 0.00000 FALSE
3 x.3 5437.589 -88.63674 TRUE
4 x.4 0.000 0.00000 FALSE
5 x.5 0.000 0.00000 FALSE
6 x.6 0.000 0.00000 FALSE
7 x.7 -722251.526 1609.43753 TRUE
8 x.8 2143333.521 -2708.72122 TRUE
9 x.9 -1677144.524 1187.12513 TRUE
10 x.10 0.000 0.00000 FALSE

$dof
[1] 17


$model_performance
s R2 R2adj
0.3753151 0.9466290 0.9340712

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.9894352 0.9994361 1.984046 0.480491

A plot of the symmetrically fitted model, shown in Figure 6, can be obtained as follows:
# Dense grid of 1001 wavelengths spanning the observed range
xlim=c(min(x),max(x))
xmodel=xlim[1]+(xlim[2]-xlim[1])*(0:1000)/1000
# Same polynomial columns as in X, evaluated on the grid
Xmodel=data.frame(xmodel,xmodel^2,xmodel^3,xmodel^4,xmodel^5,xmodel^6,xmodel^7,
  xmodel^8,xmodel^9,xmodel^10)
# Model predictions and residual standard error (s)
ymodel=sm.fn(Xmodel,out)
s=out$model_performance[1]
plot(xmodel,ymodel,xlab="Excitation Wavelength (um)",ylab="Integrated Spectra (a.u.)",
  col="green",type="l",ylim=c(min(ymodel)-s,max(ymodel)+s))
lines(xmodel,ymodel-s,lty=2,col="red")
lines(xmodel,ymodel+s,lty=2,col="red")
points(x,y,pch=16,col="blue")
legend("top",inset=c(0,-0.22),legend=c("Experimental observations","Model predictions",
  "Standard error"),lty=c(0,1,2),pch=c(16,NA,NA),col=c("blue","green","red"),
  bty="n",xpd=TRUE)

Figure 6. Polynomial model describing the second harmonic spectra data of TiN films, obtained by
symmetrical fitting.
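A plausible reason for the very large coefficients in the output above is that raw powers of x are almost collinear over the narrow range 0.43 to 0.54 μm. The following base-R sketch (an illustration only, not part of the sm.fit procedure) compares the condition number of the raw power basis with that of an orthogonal polynomial basis of the same degree obtained with poly():

```r
# Excitation wavelengths of Table 3 (um)
x <- c(0.43,0.435,0.44,0.45,0.455,0.46,0.465,0.47,0.475,0.48,0.485,
       0.49,0.495,0.5,0.505,0.51,0.515,0.52,0.525,0.53,0.535,0.54)

# Raw power basis x, x^2, ..., x^10 plus an intercept column
X_raw <- cbind(1, outer(x, 1:10, `^`))

# Orthogonal polynomial basis of the same degree (columns orthonormal and
# orthogonal to the intercept)
X_orth <- cbind(1, poly(x, degree = 10))

# Ratio of largest to smallest singular value
kappa_raw  <- kappa(X_raw, exact = TRUE)
kappa_orth <- kappa(X_orth, exact = TRUE)
print(c(raw = kappa_raw, orthogonal = kappa_orth))
```

Both bases span the same space of polynomials, so they can describe the same curve; the difference lies only in how sensitive the individual coefficients are to small changes in the data.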

4.4. Oscillatory Damping Model


Maali et al. [19] reported the behavior of the interaction stiffness versus the gap between a
cantilever tip of an atomic force microscope (AFM) and the surface of liquid
octamethylcyclotetrasiloxane (OMCTS). A sample of data extracted from [19] is summarized in
Table 4.
Let us now consider the following empirical oscillatory damping model for the stiffness ($k$) as a function of the gap ($d$):

$$\hat{k}(d) = \beta_0 + \beta_1 e^{-\theta_1 d} + \beta_2 e^{-\theta_2 d}\cos\left(\theta_3 d - \theta_4\right) \qquad (4.1)$$
Table 4. Stiffness vs. cantilever gap in AFM [19]
Gap (Å)   Stiffness (N/m)   Gap (Å)   Stiffness (N/m)   Gap (Å)   Stiffness (N/m)   Gap (Å)   Stiffness (N/m)
10.28 0.338 13.92 0.099 21.83 -0.004 35.96 -0.022
10.35 0.279 14.13 0.115 22.05 0.015 37.25 -0.014
10.42 0.236 14.34 0.131 22.20 0.029 37.89 -0.009
10.49 0.206 14.56 0.150 22.48 0.042 38.32 -0.004
10.56 0.198 14.77 0.163 22.69 0.053 39.17 0.004
10.63 0.069 14.98 0.181 22.91 0.061 39.39 0.010
10.70 0.139 15.63 0.187 23.33 0.069 40.24 0.004
10.77 0.109 15.78 0.228 23.76 0.074 41.10 -0.004
10.84 0.061 16.06 0.187 24.19 0.058 41.53 -0.004
10.91 0.050 16.27 0.204 24.62 0.048 41.96 -0.004
10.98 0.039 16.48 0.166 24.83 0.034 42.60 -0.017
11.06 0.010 16.70 0.150 25.26 0.010 43.46 -0.020
11.14 -0.004 16.91 0.139 25.61 -0.001 44.10 -0.020
11.22 -0.009 17.13 0.115 25.90 -0.017 44.95 -0.014
11.30 -0.020 17.28 0.128 26.33 -0.036 45.38 -0.012
11.38 -0.039 17.34 0.096 27.19 -0.044 45.60 -0.014
11.46 -0.030 17.49 0.080 27.40 -0.044 46.67 -0.006
11.54 -0.047 17.55 0.058 28.04 -0.036 47.74 0.002
11.62 -0.052 17.77 0.039 28.47 -0.028 48.59 -0.004
11.70 -0.060 17.98 0.031 28.69 -0.017 49.24 -0.009
11.78 -0.047 18.41 0.010 28.90 -0.012 49.88 -0.009
11.99 -0.041 18.56 -0.004 29.54 -0.001 50.73 -0.012
12.33 -0.030 18.62 -0.025 30.61 0.004 51.38 -0.017
12.63 -0.022 19.05 -0.044 31.04 0.013 52.23 -0.009
12.75 -0.009 19.27 -0.057 31.47 0.021 53.09 -0.009
12.87 0.002 19.69 -0.065 31.68 0.023 53.94 -0.012
12.99 0.018 20.12 -0.074 32.32 0.015 55.23 -0.012
13.11 0.029 20.55 -0.068 32.97 0.010 55.44 -0.006
13.23 0.045 20.76 -0.052 33.39 -0.001 56.30 -0.012
13.35 0.058 20.91 -0.044 34.46 -0.014 56.94 -0.009
13.47 0.069 21.19 -0.033 34.68 -0.020 58.01 -0.009
13.62 0.074 21.34 -0.025 35.11 -0.022 59.30 -0.012
13.77 0.091 21.62 -0.001 35.32 -0.036 59.72 -0.009

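The starting values considered for the additional parameters, also used in previous reports [3,20], are the following: $\theta_1 = 0.35$, $\theta_2 = 0.15$, $\theta_3 = 0.8$, and $\theta_4 = 0$ (param0=c(0.35,0.15,0.8,0) in the code below).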
The following R code is used to fit the model by symmetrical fitting. The data might alternatively be imported from a .csv file. Also note that the optimization results may differ between identical runs, since stochastic optimization algorithms are used; however, if enough iterations of the optimization method are considered, the optimal values obtained should be similar.
x=c(10.28,10.35,10.42,10.49,10.56,10.63,10.7,10.77,10.84,10.91,10.98,11.06,11.14,
  11.22,11.3,11.38,11.46,11.54,11.62,11.7,11.78,11.99,12.33,12.63,12.75,12.87,12.99,
  13.11,13.23,13.35,13.47,13.62,13.77,13.92,14.13,14.34,14.56,14.77,14.98,15.63,15.78,
  16.06,16.27,16.48,16.7,16.91,17.13,17.28,17.34,17.49,17.55,17.77,17.98,18.41,
  18.56,18.62,19.05,19.27,19.69,20.12,20.55,20.76,20.91,21.19,21.34,21.62,21.83,22.05,
  22.2,22.48,22.69,22.91,23.33,23.76,24.19,24.62,24.83,25.26,25.61,25.9,26.33,27.19,
  27.4,28.04,28.47,28.69,28.9,29.54,30.61,31.04,31.47,31.68,32.32,32.97,33.39,34.46,
  34.68,35.11,35.32,35.96,37.25,37.89,38.32,39.17,39.39,40.24,41.1,41.53,41.96,
  42.6,43.46,44.1,44.95,45.38,45.6,46.67,47.74,48.59,49.24,49.88,50.73,51.38,52.23,
  53.09,53.94,55.23,55.44,56.3,56.94,58.01,59.3,59.72)
y=c(0.338,0.279,0.236,0.206,0.198,0.069,0.139,0.109,0.061,0.05,0.039,0.01,-0.004,
  -0.009,-0.02,-0.039,-0.03,-0.047,-0.052,-0.06,-0.047,-0.041,-0.03,-0.022,-0.009,
  0.002,0.018,0.029,0.045,0.058,0.069,0.074,0.091,0.099,0.115,0.131,0.15,0.163,0.181,
  0.187,0.228,0.187,0.204,0.166,0.15,0.139,0.115,0.128,0.096,0.08,0.058,0.039,0.031,
  0.01,-0.004,-0.025,-0.044,-0.057,-0.065,-0.074,-0.068,-0.052,-0.044,-0.033,
  -0.025,-0.001,-0.004,0.015,0.029,0.042,0.053,0.061,0.069,0.074,0.058,0.048,0.034,
  0.01,-0.001,-0.017,-0.036,-0.044,-0.044,-0.036,-0.028,-0.017,-0.012,-0.001,0.004,
  0.013,0.021,0.023,0.015,0.01,-0.001,-0.014,-0.02,-0.022,-0.036,-0.022,-0.014,
  -0.009,-0.004,0.004,0.01,0.004,-0.004,-0.004,-0.004,-0.017,-0.02,-0.02,-0.014,
  -0.012,-0.014,-0.006,0.002,-0.004,-0.009,-0.009,-0.012,-0.017,-0.009,-0.009,
  -0.012,-0.012,-0.006,-0.012,-0.009,-0.009,-0.012,-0.009)
# Basis functions of Eq. 4.1; param holds the four additional parameters
nonoscillatoryexp<-function(x,param) exp(-param[1]*x)
oscillatoryexp<-function(x,param) exp(-param[2]*x)*cos(param[3]*x-param[4])
sm.fit(y,x,terms=c("nonoscillatoryexp","oscillatoryexp"),param0=c(0.35,0.15,0.8,0),
  display='iter',plots=TRUE)
[1] "Chess-Inspired Multi-Algorithm Optimization (CheMO)"
iter# piece# piece_type counts fn par_1 par_2 par_3 par_4
1 0 1 Q 1 0.0215828756327954 0.35 0.15 0.8 0
iter# piece# piece_type counts fn par_1 par_2 par_3 par_4
1 1 1 Q 862 0.019196 0.374332 0.146445 0.79136 -0.069
iter# piece# piece_type counts fn par_1 par_2 par_3 par_4
1 2 1 Q 1716 0.018083 0.336898 0.146445 0.78661184 -0.0897
iter# piece# piece_type counts fn par_1 par_2 par_3 par_4
1 3 1 Q 2581 0.017816 0.321331 0.1491542325 0.785746566976 -0.06279
iter# piece# piece_type counts fn par_1 par_2 par_3 par_4
1 4 1 Q 3424 0.017809 0.315753 0.1491542325 0.785746566976 -0.056504721
iter# piece# piece_type counts fn par_1 par_2 par_3 par_4
1 5 1 Q 4330 0.017809 0.315753 0.1491542325 0.785746566976 -0.056504721

$bias
[1] -0.006105749

$coeff
terms coeff alphaS include
1 nonoscillatoryexp 9.738633 -1.460180 TRUE
2 oscillatoryexp 1.691305 -1.391505 TRUE

$par
[1] 0.31575300 0.15000000 0.78574657 -0.05650472

$partype
[1] "param" "const" "param" "param"

$dof
[1] 126


$model_performance
s R2 R2adj
0.01783783 0.94855812 0.94651677

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.9824529 0.9990973 -6.924064 -12.10756

The graphical output for this example is shown in Figure 7 and Figure 8.

Figure 7. Nonlinear model (Eq. 4.1) describing the interaction stiffness as a function of cantilever gap in
AFM of liquid OMCTS, obtained by symmetrical fitting.

The optimization procedure stopped after the 5th iteration, and the model error decreased from 0.0216 to 0.0178. Both terms were found relevant for the model, but one of the additional parameters (par_2, reported with partype "const") was considered a constant. That is, its initial value (0.15) can be satisfactorily used instead of the corresponding value obtained by optimization. The model fitted the data with a coefficient of determination of 0.9486. The model residuals were fitted using a normal distribution model with a coefficient of determination of 0.9825.
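Once the additional parameters are fixed, the oscillatory damping model is linear in the bias and in the two term coefficients, which appears consistent with the CheMO output above, where only the four additional parameters are listed. The sketch below illustrates this separable structure on synthetic data generated from the reported fit (the data, noise level and seed are hypothetical), estimating the linear part by ordinary least squares via lm() rather than by the symmetrical fitting criterion:

```r
# Basis functions with the same form as nonoscillatoryexp and oscillatoryexp
f_nonosc <- function(d, th) exp(-th[1] * d)
f_osc    <- function(d, th) exp(-th[2] * d) * cos(th[3] * d - th[4])

# Additional parameters fixed at the values reported in $par above
theta <- c(0.31575300, 0.15, 0.78574657, -0.05650472)

# Hypothetical synthetic data: reported bias and coefficients plus small noise
set.seed(42)
d <- seq(10, 60, length.out = 132)
beta_true <- c(-0.006105749, 9.738633, 1.691305)
k_obs <- beta_true[1] + beta_true[2] * f_nonosc(d, theta) +
  beta_true[3] * f_osc(d, theta) + rnorm(length(d), sd = 0.001)

# With theta fixed, the model is linear in the bias and the term coefficients
B1 <- f_nonosc(d, theta)
B2 <- f_osc(d, theta)
fit <- lm(k_obs ~ B1 + B2)
beta_hat <- unname(coef(fit))
print(round(rbind(true = beta_true, estimated = beta_hat), 4))
```

Splitting the problem this way lets a nonlinear optimizer search only over the few additional parameters, while the bias and term coefficients follow from a linear step for each candidate parameter set.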


Figure 8. Residuals plots for the nonlinear model (Eq. 4.1) describing the interaction stiffness as a
function of cantilever gap in AFM of liquid OMCTS, obtained by symmetrical fitting.

4.5. Vapor Pressure Models


The behavior of the vapor pressure of liquids ($P_v$) as a function of temperature ($T$) is typically modeled using the empirical Antoine equation [21]:

$$\hat{P}_v(T) = \exp\left(A - \frac{B}{T + C}\right) \qquad (4.2)$$

where $A$, $B$, and $C$ are model parameters.
This expression can be transformed using natural logarithms, resulting in:

$$\ln \hat{P}_v(T) = \beta_0 + \beta_1 \left(\frac{1}{T + C}\right) \qquad (4.3)$$

where $\beta_0 = A$ represents the model bias, $\beta_1 = -B$ represents the term coefficient, and $C$ is an additional model parameter.
Model (4.3) will be fitted to the vapor pressure of pure water at different temperatures between 255.85 K and 647.10 K [22-24]. The available experimental data are summarized in Table 5.
Table 5. Vapor pressure data for water between 255.85 K and 647.10 K [22-24].
T (K)   Pv (atm)   T (K)   Pv (atm)   T (K)   Pv (atm)
255.85 0.0013 333.15 0.197 473.15 15.35
273.16 0.0060 339.65 0.263 486.25 20.00
274.35 0.0066 343.15 0.308 493.15 22.89
275.15 0.0070 353.15 0.468 494.15 23.34
277.15 0.0080 356.15 0.526 498.15 25.17
283.15 0.0121 363.15 0.693 507.75 30.00
284.45 0.0132 369.15 0.866 513.15 33.03
287.15 0.0158 373.15 1.000 523.15 39.25
291.15 0.0204 379.15 1.230 524.25 40.00
293.15 0.0231 383.15 1.420 533.15 46.31
295.35 0.0263 393.15 1.960 537.85 50.00
298.15 0.0313 393.25 2.000 548.15 58.70
303.15 0.0419 398.15 2.290 549.65 60.00
307.15 0.0526 409.48 3.210 553.15 63.33
307.25 0.0526 413.15 3.570 573.15 84.76
313.15 0.0729 423.15 4.700 593.15 111.4
314.75 0.0789 425.55 5.000 613.15 144.1
317.15 0.0899 433.15 6.100 633.15 184.2
323.15 0.122 448.15 8.810 647.10 217.8
324.75 0.132 453.15 9.900
327.15 0.148 453.65 10.00

The following R code is used to fit the model, considering an initial value of zero for the additional parameter (param0=0) and optimizing using CheMO:
T=c(255.85,273.16,274.35,275.15,277.15,283.15,284.45,287.15,291.15,293.15,295.35,
  298.15,303.15,307.15,307.25,313.15,314.75,317.15,323.15,324.75,327.15,333.15,339.65,
  343.15,353.15,356.15,363.15,369.15,373.15,379.15,383.15,393.15,393.25,398.15,409.48,
  413.15,423.15,425.55,433.15,448.15,453.15,453.65,473.15,486.25,493.15,494.15,
  498.15,507.75,513.15,523.15,524.25,533.15,537.85,548.15,549.65,553.15,573.15,593.15,
  613.15,633.15,647.096)
Pv=c(0.0013,0.006,0.0066,0.007,0.008,0.0121,0.0132,0.0158,0.0204,0.0231,0.0263,
  0.0313,0.0419,0.0526,0.0526,0.0729,0.0789,0.0899,0.122,0.132,0.148,0.197,0.263,0.308,
  0.468,0.526,0.693,0.866,1,1.23,1.42,1.96,2,2.29,3.21,3.57,4.7,5,6.1,8.81,9.9,10,
  15.35,20,22.89,23.34,25.17,30,33.03,39.25,40,46.31,50,58.7,60,63.33,84.76,111.4,
  144.1,184.2,217.8)
Antfn<-function(T,par) (1/(T+par))
Antout=sm.fit(y=log(Pv),x=T,terms="Antfn",param0=0,display='iter')
Antout


[1] "Chess-Inspired Multi-Algorithm Optimization (CheMO)"


iter# piece# piece_type counts fn par_1
1 0 1 Q 1 0.100719606235385 0
iter# piece# piece_type counts fn par_1
1 1 1 Q 676 0.020789 -45.250075
iter# piece# piece_type counts fn par_1
1 2 1 Q 1409 0.020789 -45.250075

$bias
[1] 11.71458

$coeff
terms coeff alphaS include
1 Antfn -3839.601 1 TRUE

$par
[1] -45.32275

$partype
[1] "param"

$dof
[1] 58

$model_performance
s R2 R2adj
0.02043926 0.99996256 0.99996127

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.837445 0.9999939 -21.16899 -49.53544

Figure 9 shows the fitted Antoine model, plotted using:


sm.plot(Antout,x=T,xname="Temperature (K)",yname="log(Vapor Pressure (atm))")

Figure 9. Antoine model (Eq. 4.3) for the vapor pressure of water between 255.85 K and 647.10 K, obtained by symmetrical fitting.


The best residual model was the normal distribution. However, the N-value obtained (-21.17) indicates that the residuals do not follow a normal distribution. Nevertheless, since the residual error is so small, the lack of fit of the residual distribution model has minimal impact on the overall model performance (randomistic R2 = 0.999994).
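With the fitted values above, the Antoine model can be evaluated directly in its original (non-logarithmic) scale. A minimal sketch using the values printed in the output (ln Pv = bias + coeff/(T + par)); the helper name antoine_Pv is hypothetical:

```r
# Values printed above: ln Pv = bias + coeff / (T + par), Pv in atm, T in K
ant_bias  <- 11.71458
ant_coeff <- -3839.601
ant_par   <- -45.32275

# Antoine model back-transformed to the original scale (hypothetical helper)
antoine_Pv <- function(T) exp(ant_bias + ant_coeff / (T + ant_par))

# Temperature at which the fitted model predicts Pv = 1 atm, i.e. ln Pv = 0
T_boil <- -ant_coeff / ant_bias - ant_par

print(antoine_Pv(c(298.15, 373.15, 647.10)))
print(T_boil)
```

The predicted normal boiling point (about 373.09 K) is close to the tabulated 373.15 K, which is a simple sanity check of the fitted parameters.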
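An alternative empirical model (albeit inspired by molecular mechanics) for describing vapor pressure has been proposed [25]:

$$\hat{P}_v(T) = P_{ref}\left(\frac{T\,\mathrm{erfc}(b/T)}{T_{ref}\,\mathrm{erfc}(b/T_{ref})}\right)^{a} \qquad (4.4)$$

where $T_{ref}$ and $P_{ref}$ represent a reference observation, and $a$ and $b$ are the model parameters.
An unbiased logarithm transformation of this model is the following:

$$\ln\left(\frac{\hat{P}_v(T)}{P_{ref}}\right) = \beta_0 + \beta_1 \ln\left(\frac{T\,\mathrm{erfc}(b/T)}{T_{ref}\,\mathrm{erfc}(b/T_{ref})}\right) \qquad (4.5)$$

where $\beta_0$ is the bias correction term, $\beta_1$ is the coefficient of the only term in the model (corresponding to $a$), and $b$ is an additional model parameter.
Using $(T_{ref}, P_{ref}) = (373.15\ \mathrm{K}, 1\ \mathrm{atm})$, and considering an initial parameter value $b = 0$, the following symmetrical model is obtained: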
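# Complementary error function expressed through the standard normal CDF
erfc<-function(x) 2*pnorm(x*sqrt(2),lower.tail=FALSE)
# Model term with reference observation (373.15 K, 1 atm)
mmfn<-function(T,par) log(T*erfc(par/T)/(373.15*erfc(par/373.15)))
mmout=sm.fit(y=log(Pv),x=T,terms="mmfn",param0=0,display='iter')
mmout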

[1] "Chess-Inspired Multi-Algorithm Optimization (CheMO)"


iter# piece# piece_type counts fn par_1
1 0 1 Q 1 0.470107206218084 0
iter# piece# piece_type counts fn par_1
1 1 1 Q 1462 0.016339 431.218414
iter# piece# piece_type counts fn par_1
1 2 1 Q 2994 0.016339 431.218414

$bias
[1] 0.004348245

$coeff
terms coeff alphaS include
1 mmfn 3.036279 -1 TRUE


$par
[1] 431.2184

$partype
[1] "param"

$dof
[1] 58

$model_performance
s R2 R2adj
0.01606394 0.99997687 0.99997608

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.911854 0.999998 -15.28745 -29.00121

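The fitted value obtained for the bias is 0.004348, quite close to its actual value of zero (at the reference temperature of 373.15 K the model term vanishes, and ln(1 atm) = 0). In fact, we may round the model coefficients using a resolution of 0.01: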
mmout=sm.fit(y=log(Pv),x=T,terms="mmfn",param0=0,cr=0.01,ptol=0.01)
mmout

$bias
[1] 0

$coeff
terms coeff alphaS include
1 mmfn 3.04 -1 TRUE

$par
[1] 430.8

$partype
[1] "param"

$dof
[1] 59

$model_performance
s R2 R2adj
0.01592956 0.99997687 0.99997647

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.9052866 0.9999978 -15.51949 -28.14989

This allowed us to gain an additional degree of freedom for the estimation of model residuals.
Figure 10 shows the fitted model inspired by molecular mechanics, plotted using:
sm.plot(mmout,x=T,xname="Temperature (K)",yname="log(Vapor Pressure (atm))")

While both models have comparable performance, the model inspired by molecular mechanics has a slightly lower model error (s = 0.0159 after rounding, versus s = 0.0204 for the Antoine model).
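The rounded model can likewise be evaluated outside sm.fit. A minimal sketch, assuming the model reads Pv = P_ref * (T erfc(b/T) / (T_ref erfc(b/T_ref)))^a with the rounded coefficient a = 3.04, the rounded parameter b = 430.8 K, zero bias, and the reference observation (373.15 K, 1 atm); the function name mm_Pv is hypothetical:

```r
# Complementary error function via the standard normal CDF
erfc <- function(x) 2 * pnorm(x * sqrt(2), lower.tail = FALSE)

# Assumed model form with the rounded fit a = 3.04, b = 430.8 K and
# reference observation (T_ref, P_ref) = (373.15 K, 1 atm)
mm_Pv <- function(T, a = 3.04, b = 430.8, T_ref = 373.15, P_ref = 1) {
  ratio <- (T * erfc(b / T)) / (T_ref * erfc(b / T_ref))
  P_ref * ratio^a
}

print(erfc(1))                           # erfc(1) = 0.1572992...
print(mm_Pv(373.15))                     # equals P_ref by construction
print(mm_Pv(c(255.85, 473.15, 647.10)))  # Table 5: 0.0013, 15.35, 217.8 atm
```

Because the reference observation enters the model explicitly, the prediction at T_ref is exactly P_ref, which is why the bias can be rounded to zero without loss of performance.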


Figure 10. Empirical model (Eq. 4.5) for the vapor pressure of water between 255.85 K and 647.10 K, obtained by symmetrical fitting.

4.6. General Arrhenius Equation


Let us now consider the following general Arrhenius equation (expressed as a logarithm) for describing reaction rate coefficients ($k$) as a function of absolute temperature ($T$) [26]††††:

$$\ln \hat{k}(T) = \beta_0 + \beta_1\left(\theta \ln T + \frac{1}{T}\right) \qquad (4.6)$$

where $\beta_0$, $\beta_1$, and $\theta$ are model parameters.
Symmetrical fitting is applied to the experimental data reported by Baulch et al. [27] for the rate coefficient of the reaction between hydrogen molecules and monatomic oxygen radicals, determined at different temperatures. The data are loaded into the R workspace using the following code‡‡‡‡:
x=c(3246.5,3178.1,2987.6,2987.1,2986.4,2818.4,2765.1,2764.7,2764,2665.3,2575.9,
  2493.6,2447.3,2226.9,2225.8,2101.5,2101.3,1841.1,1775.3,1754.6,1694.6,1603.8,1602,
  1569.7,1537.5,1491.5,1490.5,1490.3,1447.9,1433.7,1419.3,1393.6,1393.4,1393.3,
  1330.8,1330.6,1319.7,1319.1,1284.7,1273.4,1251.6,1242.5,1241.7,1241.4,1201.9,1201.5,
  1191.5,1173,1103,1041.3,1034.3,992.8,918.8,913.4,902.2,880.8,865.3,831.8,827,813.2,
  804.7,755.5,747.6,740.5,740.3,736.8,733,715.5,676.4,673.3,622.4,612,607.1,590.3,
  590.3,572.1,572,550.7,544.7,521.8,519.8,516.3,514.5,509.2,504,499,497.2,493.9,
  492.3,484.2,479.6,479.6,468.9,460.2,458.7,447.7,447.7,445,441,437.2,425.9,425.9,425.8,
  422.2,415.1,412.9,412.9,408.3,405,396.3,396.3,391.1,384,378.1,377.2,375.2,372.4,
  371.5,369.6,368.7,364.2,363.3,362.4,356.3,353.8,349.6,348,345.6,340,336.1,328,
  322.3,321.6,318.8,317.4,315.4,300.7,297.7,297.7,295.9)
y=c(-23.1658,-23.3932,-23.7342,-23.6489,-23.5067,-24.0467,-23.9045,-23.8192,
  -23.6487,-23.8476,-24.7003,-25.8941,-24.4727,-24.643,-24.2735,-25.0124,-24.9555,
  -25.4667,-25.7224,-25.8929,-26.0349,-26.6885,-25.523,-26.6316,-26.9157,-27.2852,
  -26.5176,-26.4039,-27.4271,-27.3134,-26.7163,-27.6259,-27.4838,-27.3416,-27.4551,
  -27.3414,-28.2795,-27.6541,-27.5686,-27.3127,-27.0568,-28.5634,-27.7674,-27.3978,
  -28.3642,-27.8241,-27.4829,-27.9945,-27.8804,-28.3633,-28.9318,-29.1305,-29.0731,
  -29.5279,-29.3856,-29.6981,-29.4421,-30.6642,-30.4083,-29.9248,-30.7491,-31.1466,
  -30.4642,-31.4022,-30.8337,-31.5727,-30.9757,-31.6293,-31.9983,-32.1119,-32.1963,
  -32.1109,-32.5372,-33.3044,-33.0201,-33.6735,-32.8207,-33.0761,-33.4455,-33.9282,
  -33.0753,-34.1555,-33.729,-34.07,-33.7856,-34.354,-33.7854,-34.1548,-33.9558,
  -34.0124,-34.9219,-34.4102,-34.4099,-34.5802,-34.3243,-35.0062,-34.8356,-35.1198,
  -34.8638,-35.3185,-35.6307,-35.3465,-35.2043,-35.5453,-35.4598,-36.1988,-35.9713,
  -35.7721,-36.0563,-36.3117,-36.0843,-36.1978,-36.5954,-36.3677,-37.0784,-36.6234,
  -37.1634,-36.7369,-36.6516,-37.1632,-37.4757,-37.1345,-37.2482,-37.4184,-37.6741,
  -37.418,-37.7306,-38.0716,-38.1849,-37.872,-38.44,-38.5533,-38.3828,-38.6668,
  -38.4677,-38.7235,-39.0635,-39.2622,-39.12,-39.0346)

††††
Only one additional parameter is considered in order to correctly account for the degrees of freedom of the model, since the term coefficient represents the degree of freedom consumed by the second function ($1/T$).
‡‡‡‡
x represents temperature in kelvin, and y represents the natural logarithm of the reaction rate coefficient in cm³/s.

The model can be fitted as follows:


Tfn<-function(x,par) par*log(x)+1/x
out=sm.fit(y,x,terms="Tfn",param0=0)
out

$bias
[1] -49.94089

$coeff
terms coeff alphaS include
1 Tfn -2643.468 1 TRUE

$par
[1] -0.001289

$partype
[1] "param"

$dof
[1] 137

$model_performance
s R2 R2adj
0.3432904 0.9948727 0.9947978

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.9936177 0.9999673 -2.496254 0.9542165

The results obtained with this model are graphically summarized in Figure 11, obtained with the
following code:
sm.plot(out,x,xname="Temperature [K]",yname="log(k [cm3/s])")


Figure 11. Graphical summary of model (4.6) identified using symmetrical fitting: bias = -49.94, term coefficient = -2643.47, additional parameter = -0.001289.
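Expanding the term of Eq. 4.6 gives ln k = bias + (coeff * par) ln T + coeff/T, which can be compared with the familiar modified Arrhenius form k = A T^n exp(-Ea/(RT)). The following sketch assumes that reading (it is an interpretation added here, not part of the original analysis) and uses the values printed in the output above:

```r
# Values printed above: ln k = b0 + b1 * (theta * ln T + 1/T)
b0    <- -49.94089
b1    <- -2643.468
theta <- -0.001289

# Expanded: ln k = b0 + (b1 * theta) * ln T + b1 / T. Read (as an assumption) as
# a modified Arrhenius law k = A * T^n * exp(-Ea / (R * T)):
R_gas <- 8.314462618          # molar gas constant, J/(mol K)
A     <- exp(b0)              # pre-exponential factor
n     <- b1 * theta           # temperature exponent
Ea    <- -b1 * R_gas / 1000   # activation energy, kJ/mol

k_model <- function(T) A * T^n * exp(-Ea * 1000 / (R_gas * T))

print(c(A = A, n = n, Ea_kJ_per_mol = Ea))
print(log(k_model(c(300, 1000, 3000))))
```

Under this reading, the fitted model corresponds to a temperature exponent of about 3.41 and an apparent activation energy of about 22.0 kJ/mol.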

4.7. Effect of Black Tea on Cardiac Physiology


Davis and Mukamal [28] reported experimental observations of the change in high-density lipoprotein cholesterol (HDL-C) levels of adult individuals after 6 months of black tea consumption. The observations are summarized in Table 6. The data input in R is the following:
scat=c("Female","Female","Female","Male","Female","Male","Male","Female","Female",
  "Female","Female","Male","Female","Female","Female","Male","Male","Male","Female",
  "Male","Male","Female","Male","Female","Female","Female","Female","Female")
diff=c(10,10,6,2,-2,5,-3,25,-11,-1,13,4,1,11,-13,-13,-4,4,-18,-1,-7,2,-5,3,-5,8,
  -25,-1)

The purpose of this example is to model the effect of sex on the change in HDL-C levels after 6 months of black tea consumption. Since sex is a categorical variable, it must first be transformed into a suitable numerical variable (i.e., a binary variable). This can be done as follows:
snum=scat=="Female"

snum is a Boolean variable, but it is interpreted by R as 0 for FALSE and 1 for TRUE.


Table 6. Difference in HDL-C levels of 28 adult individuals before and after 6 months of black tea consumption [28]
ID   Sex   HDL-C Difference (mg/dL)
1 Female 10
2 Female 10
3 Female 6
4 Male 2
5 Female -2
6 Male 5
7 Male -3
8 Female 25
9 Female -11
10 Female -1
11 Female 13
12 Male 4
13 Female 1
14 Female 11
15 Female -13
16 Male -13
17 Male -4
18 Male 4
19 Female -18
20 Male -1
21 Male -7
22 Female 2
23 Male -5
24 Female 3
25 Female -5
26 Female 8
27 Female -25
28 Female -1

The symmetrical model can then be obtained as follows:


sm.fit(diff,snum)

$bias
[1] -0.1785714

$coeff
terms coeff alphaS include
1 x 0 0 FALSE

$dof
[1] 26

$model_performance
s R2 R2adj
10.42944637 0.00000000 -0.03846154

$residual_model
model randomR2 randomisticR2 Nvalue Hvalue
1 Normal 0.9920027 0.9920027 2.16558 4.075575


That is, the sex term is excluded from the model (include = FALSE, R2 = 0), so the effect of sex on the difference in HDL-C levels after 6 months of black tea consumption is irrelevant.
This evaluation can be alternatively performed using the test of relevance (r.test), as follows:
r.test(diff[which(scat=="Female")],diff[which(scat=="Male")],plot=TRUE)

          r relevant sample.diff model.diff
1 0.1202521    FALSE    2.522222          0

The resulting plot is shown in Figure 12. A higher dispersion of the results can be observed for females, but there is no evident difference between the group means.

Figure 12. Relevance test for the effect of sex on the difference in HDL-C levels after 6 months of black
tea consumption.
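The values reported by sm.fit and r.test can be cross-checked with elementary base-R operations. The sketch below (the response is renamed hdl_diff here to avoid masking base::diff) shows that the logical indicator is coerced to 0/1, that the bias of the model without the sex term equals the overall mean, and that the ordinary least-squares slope on a 0/1 regressor equals the difference between the group means (the sample.diff value):

```r
# Data of Table 6
scat <- c("Female","Female","Female","Male","Female","Male","Male","Female",
          "Female","Female","Female","Male","Female","Female","Female","Male",
          "Male","Male","Female","Male","Male","Female","Male","Female",
          "Female","Female","Female","Female")
hdl_diff <- c(10,10,6,2,-2,5,-3,25,-11,-1,13,4,1,11,-13,-13,-4,4,-18,-1,-7,
              2,-5,3,-5,8,-25,-1)

# Logical indicator coerced to 0 (Male) / 1 (Female)
snum <- as.numeric(scat == "Female")

# Group summaries as named numeric vectors
group_mean <- sapply(split(hdl_diff, scat), mean)
group_sd   <- sapply(split(hdl_diff, scat), sd)

# With a single 0/1 regressor, the least-squares slope equals the difference
# between group means, and the intercept equals the mean of the 0 group
ols <- lm(hdl_diff ~ snum)
print(rbind(mean = group_mean, sd = group_sd))
print(coef(ols))
print(mean(hdl_diff))  # overall mean, compare with the $bias reported above
```

The larger standard deviation of the female group reflects the higher dispersion seen in Figure 12, while the difference of about 2.5 mg/dL between means is small relative to that spread.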

Acknowledgment and Disclaimer

This report provides data, information and conclusions obtained by the author(s) from original scientific
research, based on the best knowledge available to the author(s). The main purpose of this publication is
to openly share scientific knowledge. Any mistake, omission, error or inaccuracy published, if any, is
completely unintentional.

This research did not receive any specific grant from funding agencies in the public, commercial, or non-profit sectors.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC
4.0). Anyone is free to share (copy and redistribute the material in any medium or format) or adapt
(remix, transform, and build upon the material) this work under the following terms:


• Attribution: Appropriate credit must be given, providing a link to the license, and indicating if changes are made. This can be done in any reasonable manner, but not in any way that suggests endorsement by the licensor.
• Non-Commercial: This material may not be used for commercial purposes.

References

[1] Hernandez, H. (2024). On the Lack of Symmetry in Least-Squares Minimization. ForsChem Research Reports, 9, 2024-15, 1 - 55. doi: 10.13140/RG.2.2.34120.64006.
[2] Hernandez, H. (2025). Relevant vs. Significant Differences in Hypothesis Testing. ForsChem
Research Reports, 10, 2025-01, 1 - 55. doi: 10.13140/RG.2.2.30012.35200.
[3] Hernandez, H. (2025). Optimal Model Structure Identification. 3. Heteroscedastic Models.
ForsChem Research Reports, 10, 2025-05, 1 - 55. doi: 10.13140/RG.2.2.14395.53286.
[4] Hernandez, H. (2018). Multidimensional Randomness, Standard Random Variables and Variance
Algebra. ForsChem Research Reports, 3, 2018-02, 1-35. doi: 10.13140/RG.2.2.11902.48966.
[5] Hernandez, H. (2022). Standard Deterministic, Standard Random, and Randomistic Variables.
ForsChem Research Reports, 7, 2022-06, 1 - 18. doi: 10.13140/RG.2.2.36316.87688.
[6] Zhang, J. (2021). Modern Monte Carlo methods for efficient uncertainty quantification and
propagation: A survey. Wiley Interdisciplinary Reviews: Computational Statistics, 13 (5), e1539. doi:
10.1002/wics.1539.
[7] Hernandez, H. (2018). Parameter Identification using Standard Transformations: An Alternative
Hypothesis Testing Method. ForsChem Research Reports 2018-04. doi:
10.13140/RG.2.2.14895.02728.
[8] Hernandez, H. (2019). Goodness-of-fit of Randomistic Models. ForsChem Research Reports, 4,
2019-10, 1-27. doi: 10.13140/RG.2.2.35386.34248.
[9] Hernandez, H. (2019). Modeling and Identification of Noisy Dynamic Systems. ForsChem Research
Reports 2019-04. doi: 10.13140/RG.2.2.12571.72489.
[10] Hernandez, H. (2025). Chess-Inspired Multi-Algorithm Optimization (CheMO). ForsChem Research
Reports, 10, 2025-04, 1 - 55. doi: 10.13140/RG.2.2.16051.13603.
[11] Hernandez, H. (2023). Multi-Algorithm Optimization. ForsChem Research Reports, 8, 2023-12, 1 -
55. doi: 10.13140/RG.2.2.21772.49284.
[12] Hernandez, H. (2023). Replacing the R² Coefficient in Model Analysis. ForsChem Research Reports,
8, 2023-10, 1 - 55. doi: 10.13140/RG.2.2.26570.13769.
[13] Hernandez, H. (2021). Testing for Normality: What is the Best Method? ForsChem Research
Reports, 6, 2021-05, 1-38. doi: 10.13140/RG.2.2.13926.14406.
[14] Hernandez, H. (2023). Evaluating Scedasticity using H-values. ForsChem Research Reports, 8, 2023-
16, 1 - 55. doi: 10.13140/RG.2.2.19965.95200.
[15] Hernandez, H. (2021). Optimal Significance Level and Sample Size in Hypothesis Testing. 7.
Implementation Remarks. ForsChem Research Reports, 6, 2021-12, 1-27. doi:
10.13140/RG.2.2.23632.64000.
[16] Hernandez, H. and Ochoa, S. (2022). Adaptive Step-size One-at-a-time (OAT) Optimization.
ForsChem Research Reports, 7, 2022-12, 1 - 55. doi: 10.13140/RG.2.2.15208.14087.

[17] Hernandez, H. (2023). Optimal Model Structure Identification. 1. Multiple Linear Regression. ForsChem Research Reports, 8, 2023-13, 1-53. doi: 10.13140/RG.2.2.31051.57121.
[18] Wen, X., Li, G., Gu, C., Zhao, J., Wang, S., Jiang, C., ... & Xiong, Q. (2018). Doubly enhanced second harmonic generation through structural and epsilon-near-zero resonances in TiN nanostructures. ACS Photonics, 5 (6), 2087-2093. doi: 10.1021/acsphotonics.8b00419.
[19] Maali, A., Cohen-Bouhacina, T., Couturier, G., & Aimé, J. P. (2006). Oscillatory dissipation of a simple confined liquid. Physical Review Letters, 96 (8), 086105. doi: 10.1103/PhysRevLett.96.086105.
[20] Hernandez, H. (2023). Optimal Model Structure Identification. 2. Nonlinear Regression. ForsChem Research Reports, 8, 2023-17, 1-55. doi: 10.13140/RG.2.2.25901.87527.
[21] Thomson, G. W. (1946). The Antoine equation for vapor-pressure data. Chemical Reviews, 38 (1), 1-39. doi: 10.1021/cr60119a001.
[22] Stull, D. R. (1947). Vapor Pressure of Pure Substances. Inorganic Compounds. Industrial & Engineering Chemistry, 39 (4), 540-550. doi: 10.1021/ie50448a023.
[23] Liu, C. T., & Lindsay Jr, W. T. (1970). Vapor pressure of deuterated water from 106 to 300 °C. Journal of Chemical and Engineering Data, 15 (4), 510-513. doi: 10.1021/je60047a015.
[24] The Engineering ToolBox (2010). Water - Heat of Vaporization vs. Temperature. Available at: https://www.engineeringtoolbox.com/water-properties-d_1573.html. Last accessed: March 18, 2025.
[25] Hernandez, H. (2022). Molecular Modeling of Macroscopic Phase Changes 2: Vapor Pressure Parameters. ForsChem Research Reports, 7, 2022-16, 1-43. doi: 10.13140/RG.2.2.10226.38086.
[26] Hernandez, H. (2019). Collision Energy between Maxwell-Boltzmann Molecules: An Alternative Derivation of Arrhenius Equation. ForsChem Research Reports, 4, 2019-13, 1-27. doi: 10.13140/RG.2.2.21596.33926.
[27] Baulch, D. L., Bowman, C. T., Cobos, C. J., Cox, R. A., Just, T., Kerr, J. A., ... & Walker, R. W. (2005). Evaluated kinetic data for combustion modeling: supplement II. Journal of Physical and Chemical Reference Data, 34 (3), 757-1397. doi: 10.1063/1.1748524.
[28] Davis, R. B., & Mukamal, K. J. (2006). Hypothesis Testing: Means. Circulation, 114 (10), 1078-1082. doi: 10.1161/circulationaha.105.586461.
