Interpretability of Hinging Hyperplanes
The hinging hyperplane model was proposed by Breiman [20]. This type of nonlinear
model is often referenced in the literature since it suffers from convergence and range
problems [19, 33–35]. Methods such as a penalty on the hinging angle were proposed
to improve Breiman’s algorithm [18]; alternatively, the Gauss-Newton algorithm can
be used to obtain the final nonlinear model [34]. Several application examples have
also been published in the literature; e.g., it can be used in the identification of
piecewise affine systems via mixed-integer programming [36], and this model also
lends itself to forming hierarchical models [19].
In this chapter a much more applicable algorithm is proposed for hinging hyper-
plane identification. The key idea is that in a special case (c = 2), the fuzzy c-
regression method (FCRM) [37] can be used for identifying hinging hyperplane
models. To ensure that two local linear models used by the fuzzy c-regression algo-
rithm form a hinging hyperplane function, it has to be guaranteed that local models
intersect each other in the operating regime of the model. The proposed constrained
FCRM algorithm is able to identify one hinging hyperplane model; therefore, to
generate more complex regression trees, the described method should be recursively
applied. Hinging hyperplane models containing two linear submodels divide the
operating region of the model into two parts, since hinging hyperplane functions
define a linear separating function in the input space of the hinging hyperplane func-
tion. These separations result in a regression tree where branches correspond to
linear divisions of the operating regime based on the hinge of the hyperplanes at
a given node. This type of partitioning can be considered as the crisp version of a
fuzzy regression-based tree described in [38]. Fortunately, in the case of a hinging
hyperplane-based regression tree there is no need to select the best splitting variable
at a given node, but, on the other hand, it is not as interpretable as regression trees
utilizing univariate decisions at nodes.
To support the analysis and building of this special model structure, novel model
performance and complexity measures are presented in this work. Special attention
is given to modeling and controlling nonlinear dynamical systems. Therefore, an
application example related to the Box-Jenkins gas furnace benchmark identification
problem is added. It will also be shown that, thanks to the piecewise linear model
structure, the resulting regression tree can be easily utilized in model predictive
control. A detailed application example related to the model predictive control of a
water heater will demonstrate the benefits of the proposed framework.
A critical step in the application of model-based control is the development of a
suitable model for the process dynamics. This difficulty stems from a lack of knowledge
or understanding of the process to be controlled. Fuzzy modeling has been proven
to be effective for the approximation of uncertain nonlinear processes. Recently,
nonlinear black-box techniques using fuzzy and neuro-fuzzy modeling have received
a great deal of attention [39]. Readers interested in industrial applications can find
an excellent overview in [40]. Details of relevant model-based control applications
are well presented in [41, 42].
Most nonlinear identification methods are based on the NARX (Nonlinear AutoRegressive with eXogenous input) model [8]. The use of NARX black-box models for high-order dynamic processes is in some cases impractical. Data-driven identification techniques alone may yield unrealistic NARX models in terms of steady-state characteristics and local behavior, with unreliable parameter values. Moreover, the identified model can exhibit regimes which are not found in the original system [42].
This is typically due to insufficient information content of the identification data and
the overparametrization of the model. This problem can be remedied by incorporat-
ing prior knowledge into the identification method by constraining the parameters
of the model [43]. Another way to reduce the effects of overparametrization is to
restrict the structure of the NARX model, using, for instance, the Nonlinear Additive
AutoRegressive with eXogenous input (NAARX) model [44]. In this book a different approach is proposed: a hierarchical set of local linear models is identified to handle complex system dynamics.
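As a point of reference for the dynamic models discussed below, a NARX regressor matrix can be assembled from measured input/output data as in the following sketch. This is a generic illustration with assumed variable names, not code from the book.

```python
import numpy as np

def narx_regressors(y, u, na, nb):
    """Build regressors [1, y(k-1..k-na), u(k-1..k-nb)] and targets y(k)."""
    start = max(na, nb)
    X = np.array([[1.0]
                  + [y[k - i] for i in range(1, na + 1)]
                  + [u[k - j] for j in range(1, nb + 1)]
                  for k in range(start, len(y))])
    return X, np.asarray(y[start:])

# a linear ARX baseline is then a single least squares fit:
# X, t = narx_regressors(y, u, na=2, nb=2)
# theta, *_ = np.linalg.lstsq(X, t, rcond=None)
```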
Operating regime-based modeling is a widely applied technique for the identification of such nonlinear systems. There are two approaches to building operating regime-based models. An additive model uses the sum of certain basis functions to represent a nonlinear system, while the partitioning approach partitions the input space recursively to increase modeling accuracy locally [18]. Models generated by this approach are
often represented by trees [45]. Piecewise linear systems [46] can be easily repre-
sented in a regression tree structure [47]. A special type of regression tree is called
the locally linear model tree, which combines a heuristic strategy for input space
decomposition with a local linear least squares optimization (like LOLIMOT [1]).
These models are hierarchical models consisting of nodes and branches. Internal
nodes represent tests on input variables of the model, and branches correspond to
outcomes of the tests. In the case of regression trees, leaf (terminal) nodes contain regression models.
Thanks to the structured representation of the local linear models, hinging hyper-
planes lend themselves to a straightforward incorporation into model-based control
schemes. In this chapter this beneficial property is demonstrated in the design of an
instantaneous linearization-based model predictive control algorithm [32].
This chapter is organized as follows: the next section discusses how hinging hyperplane function approximation is done with the FCRM identification approach. The
description of the tree growing algorithm and the measures proposed to support model
building are given in Sect. 2.2. In Sect. 2.3, application examples are presented, while
Sect. 2.4 concludes the chapter.
The following section gives a brief description of the hinging hyperplane approach
on the basis of [18, 34, 48], followed by a description of how the constraints can be
incorporated into FCRM clustering.
For a sufficiently smooth function f (xk ), which can be linear or nonlinear,
assume that regression data {xk , yk } is available for k = 1, . . . , N . Function
f (xk ) can be represented as the sum of a series of hinging hyperplane functions
h i (xk ) i = 1, 2, . . . , K . Breiman [20] proved that we can use hinging hyperplanes to
approximate continuous functions on compact sets, guaranteeing a bounded approx-
imation error
\[
e_n = \Big\| f - \sum_{i=1}^{K} h_i(\mathbf{x}) \Big\| \le (2R)^4\, c^2 / K, \tag{2.1}
\]
where K is the number of hinging hyperplane functions, R is the radius of the sphere
in which the compact set is contained, and c is such that
\[
\int \|\mathbf{w}\|^{2}\, | f(\mathbf{w}) |\, d\mathbf{w} = c < \infty. \tag{2.2}
\]
The approximation with hinging hyperplane functions can get arbitrarily close if
a sufficiently large number of hinging hyperplane functions are used. The sum of
the hinging hyperplane functions, $\sum_{i=1}^{K} h_i(\mathbf{x}_k)$, constitutes a continuous piecewise
linear function. The number of input variables n in each hinging hyperplane function
and the number of hinging hyperplane functions K are two variables to be determined.
The explicit form for representing a function f (xk ) with hinging hyperplane functions
becomes (see Fig. 2.1)
\[
f(\mathbf{x}_k) = \sum_{i=1}^{K} h_i(\mathbf{x}_k) = \sum_{i=1}^{K} \max|\min \left\{ \mathbf{x}_k^T \theta_{1,i},\; \mathbf{x}_k^T \theta_{2,i} \right\}, \tag{2.3}
\]
where $\mathbf{x}_k = [x_{k,0}, x_{k,1}, x_{k,2}, \ldots, x_{k,n}]^T$, $x_{k,0} \equiv 1$, is the kth regressor vector and $y_k$ is the kth output variable. The two hyperplanes are continuously joined together at $\{\mathbf{x} : \mathbf{x}^T(\theta_1 - \theta_2) = 0\}$, as can be seen in Fig. 2.1; as a result, they are called hinging hyperplanes. The difference $\theta_1 - \theta_2$ is defined as the hinge of the two hyperplanes $y_k = \mathbf{x}_k^T\theta_1$ and $y_k = \mathbf{x}_k^T\theta_2$. The solid/shaded parts of the two hyperplanes are given explicitly by Eqs. (2.4) and (2.5).
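To make the max|min notation in (2.3) concrete, the sketch below (illustrative only; the names are not from the book) evaluates a single hinging hyperplane function for both choices. During identification, the variant that fits the training data better is the one retained.

```python
import numpy as np

def hinge(X, theta1, theta2, use_max=True):
    """Evaluate one hinging hyperplane function h(x) = max|min(x^T theta1, x^T theta2).

    X      : (N, n+1) regressor matrix whose first column is all ones (x_{k,0} = 1)
    theta1 : (n+1,) parameter vector of the first hyperplane
    theta2 : (n+1,) parameter vector of the second hyperplane
    use_max: True selects the max of the two planes, False the min
    """
    p1, p2 = X @ theta1, X @ theta2
    return np.maximum(p1, p2) if use_max else np.minimum(p1, p2)
```

The hinge itself is the set where the two planes meet, i.e., where X @ (theta1 - theta2) changes sign.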
The hinging hyperplane method has some interesting advantages for nonlinear func-
tion approximation and identification:
1. Hinging hyperplane functions can be located by a simple, computationally efficient method. In fact, hinging hyperplane models are piecewise linear models;
the linear models are usually obtained by repeated use of the linear least squares
method, which is very efficient. The aim is to improve the whole identification
method with more sophisticated ideas.
2. For nonlinear functions that resemble hinging hyperplane functions, the hinging
hyperplane method has very good and fast convergence properties.
The hinging hyperplane method practically combines some advantages of neural
networks (in particular, the ability to handle very large dimensional inputs) and
constructive wavelet-based estimators (availability of very fast training algorithms).
The essential hinging hyperplane search problem can be viewed as an extension
of the linear least squares regression problem. Linear least squares regression aims
to find the best parameter vector θ by minimizing a quadratic cost function with the regression model that gives the best linear approximation to y. For a nonsingular data matrix $\mathbf{X}$, the linear least squares estimate $\hat{y} = \mathbf{x}^T\theta$ is always uniquely available.
The hinging hyperplane search problem, on the other hand, aims to find the two
parameter vectors θ1 and θ2 , defined by
\[
[\theta_1, \theta_2] = \arg\min_{\theta_1, \theta_2} \sum_{k=1}^{N} \left( \max|\min \left\{ y_k - \mathbf{x}_k^T\theta_1,\; y_k - \mathbf{x}_k^T\theta_2 \right\} \right)^2. \tag{2.6}
\]
A brute force application of the Gauss-Newton method can solve the above-described optimization problem; however, two problems exist [18].
In this work, the fuzzy c-regression method is used instead: the parameters of the c regression models are obtained by minimizing the weighted prediction error
\[
E_m(U, \{\theta_i\}) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{i,k})^m E_{i,k}(\theta_i), \tag{2.7}
\]
where the error measure is usually chosen as
\[
E_{i,k}(\theta_i) = \left( y_k - f_i(\mathbf{x}_k; \theta_i) \right)^2, \tag{2.8}
\]
but other measures can be applied as well, provided they fulfill the minimizer property
stated by Hathaway and Bezdek [37].
One possible approach to the minimization of the objective function (2.7) is the
group coordinate minimization method that results in the following algorithm:
• Initialization Given a set of data {(x1 , y1 ), . . . , (x N , y N )}, specify c, the structure
of the regression models (2.8) and the error measure (2.7). Choose a weighting
exponent m > 1 and a termination tolerance ε > 0. Initialize the partition matrix
randomly.
• Repeat For l = 1, 2, . . .
• Step 1 Calculate values for the model parameters θi that minimize the cost function
E m (U, {θi }).
• Step 2 Update the partition matrix
\[
\mu_{i,k}^{(l)} = \frac{1}{\sum_{j=1}^{c} \left( E_{i,k}/E_{j,k} \right)^{2/(m-1)}}, \quad 1 \le i \le c, \; 1 \le k \le N, \tag{2.9}
\]
until $\|U^{(l)} - U^{(l-1)}\| < \varepsilon$.
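The alternating optimization above can be sketched as follows. This is a minimal, unconstrained illustration with linear local models (the function and variable names are assumptions, not the book's code); the hinge constraints derived below would replace the plain weighted least squares in Step 1.

```python
import numpy as np

def fcrm(X, y, c=2, m=2.0, tol=1e-6, max_iter=100, seed=None):
    """Fuzzy c-regression clustering with linear local models y ~ X @ theta_i."""
    rng = np.random.default_rng(seed)
    N = len(y)
    U = rng.random((c, N))
    U /= U.sum(axis=0)                       # random initial fuzzy partition
    thetas = np.zeros((c, X.shape[1]))
    for _ in range(max_iter):
        U_old = U.copy()
        # Step 1: weighted least squares for each local model, minimizing (2.7)
        for i in range(c):
            W = np.diag(U[i] ** m)
            thetas[i] = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        # Step 2: membership update from the prediction errors, Eq. (2.9)
        E = np.array([(y - X @ th) ** 2 for th in thetas]) + 1e-12
        U = 1.0 / ((E[:, None, :] / E[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
        if np.abs(U - U_old).max() < tol:    # termination tolerance epsilon
            break
    return U, thetas
```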
(Figure 2.2: the two hinging hyperplanes, the hinge, and the cluster centers v1 and v2)
To guarantee that the two local linear models form a hinging hyperplane function, constraints have to be taken into consideration as follows. Cluster centers $v_i$ can also be computed from the result of FCRM as the weighted average of the known input data points:
\[
v_i = \frac{\sum_{k=1}^{N} \mathbf{x}_k\, \mu_{i,k}}{\sum_{k=1}^{N} \mu_{i,k}}, \tag{2.12}
\]
where the membership degree μi,k is interpreted as a weight representing the extent
to which the value predicted by the model matches yk . These cluster centers are
located in the ‘middle’ of the operating regime of the two linear submodels. Because
the two hyperplanes must cross each other, the following criterion can be specified (see Fig. 2.2): the hinge $\mathbf{x}^T(\theta_1 - \theta_2)$ has to change sign between the two cluster centers $v_1$ and $v_2$.
The following relative constraint can be used to take this criterion into account:
\[
\lambda_{rel,1,2} \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix} \le 0, \quad \text{where} \quad \lambda_{rel,1,2} = \begin{bmatrix} v_1^T & -v_1^T \\ -v_2^T & v_2^T \end{bmatrix}. \tag{2.14}
\]
When linear equality and inequality constraints are defined on these prototypes,
quadratic programming (QP) has to be used instead of the least squares method. This
optimization problem can still be solved efficiently compared with other constrained nonlinear optimization algorithms.
Local linear constraints applied to fuzzy models can be grouped into the following
categories according to their validity region:
• Local constraints are valid only for the parameters of a single regression model, $\lambda_i \theta_i \le \omega_i$.
• Global constraints are related to all of the regression models, $\lambda_{gl} \theta_i \le \omega_{gl}$, $i = 1, \ldots, c$.
• Relative constraints define relationships between the parameters of different regression models, such as the hinge constraint (2.14), $\lambda_{rel}\,\theta \le \omega_{rel}$.
Fig. 2.3 Hinging hyperplane model with four local constraints and two parameters (local, global, and relative constraints shown in the $\theta_{i,1}$–$\theta_{i,2}$ parameter plane)
The constrained parameter estimation can then be formulated as the quadratic programming problem $\min_{\theta}\, \tfrac{1}{2}\theta^T \mathbf{H} \theta + \mathbf{c}^T \theta$ subject to the linear constraints, with $\mathbf{H} = 2\mathbf{X}^T \Phi \mathbf{X}$ and $\mathbf{c} = -2\mathbf{X}^T \Phi \mathbf{y}$, where
\[
\mathbf{y} = \begin{bmatrix} \mathbf{y} \\ \mathbf{y} \\ \vdots \\ \mathbf{y} \end{bmatrix}, \qquad
\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_c \end{bmatrix}, \tag{2.17}
\]
\[
\mathbf{X} = \begin{bmatrix}
\mathbf{X}_1 & 0 & \cdots & 0 \\
0 & \mathbf{X}_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf{X}_c
\end{bmatrix}, \qquad
\Phi = \begin{bmatrix}
\Phi_1 & 0 & \cdots & 0 \\
0 & \Phi_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \Phi_c
\end{bmatrix}, \tag{2.18}
\]
The local, global, and relative constraints can be collected into a single system of linear inequalities
\[
\lambda\, \theta \le \omega, \tag{2.19}
\]
with
\[
\lambda = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_c \\
\lambda_{gl} & 0 & \cdots & 0 \\
0 & \lambda_{gl} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_{gl} \\
 & \{\lambda_{rel}\} & &
\end{bmatrix}, \qquad
\omega = \begin{bmatrix}
\omega_1 \\ \omega_2 \\ \vdots \\ \omega_c \\ \omega_{gl} \\ \omega_{gl} \\ \vdots \\ \omega_{gl} \\ \{\omega_{rel}\}
\end{bmatrix}. \tag{2.20}
\]
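A minimal sketch of how Step 1 of the clustering can be replaced by this constrained quadratic program for c = 2 is given below. It is illustrative only; the use of scipy's SLSQP solver and all names are assumptions, not the book's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def constrained_step1(X, y, U, v, m=2.0):
    """Weighted least squares for two local models subject to the relative
    (hinge) constraint lambda_rel [theta1; theta2] <= 0, cf. (2.14)-(2.20)."""
    n = X.shape[1]
    Xb = np.block([[X, np.zeros_like(X)],
                   [np.zeros_like(X), X]])            # block-diagonal X, Eq. (2.18)
    Phi = np.diag(np.concatenate([U[0] ** m, U[1] ** m]))
    yb = np.concatenate([y, y])
    H = 2.0 * Xb.T @ Phi @ Xb
    c = -2.0 * Xb.T @ Phi @ yb
    lam = np.vstack([np.concatenate([ v[0], -v[0]]),  # relative constraints, Eq. (2.14)
                     np.concatenate([-v[1],  v[1]])])
    cons = {"type": "ineq", "fun": lambda th: -lam @ th}   # i.e. lam @ th <= 0
    res = minimize(lambda th: 0.5 * th @ H @ th + c @ th,
                   np.zeros(2 * n),
                   jac=lambda th: H @ th + c,
                   constraints=[cons], method="SLSQP")
    return res.x[:n], res.x[n:]        # theta1, theta2
```

Here v is the pair of cluster centers computed from (2.12).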
Referring back to Fig. 2.1, it can be concluded that with this method both parts of the intersecting hyperplanes are described, and the part (max or min) that describes the training data most accurately is selected.
So far, the hinging hyperplane function identification method has been presented. The proposed technique can be used to determine the parameters of one hinging hyperplane function. The classical hinging hyperplane approach can be interpreted as identifying K hinging hyperplane models consisting of global model pairs, since their operating regimes cover the whole data set of N samples. This representation leads to several problems during model identification and also makes the model harder to interpret. To overcome this problem, a tree structure is proposed in which the data is recursively partitioned into subsets, and each subset is used to form the models of the lower levels of the tree. The concept is illustrated in Fig. 2.4, where the membership functions and the identified hinging hyperplane models are also shown.
During the identification the following phenomena can be taken into consideration
(and can be considered as benefits too):
• When using the hinging hyperplane function there is no need to find splitting
variables at the nonterminal nodes, since this procedure is based on the hinge.
• The populated tree is always a binary tree, either balanced or unbalanced, depending on the algorithm (greedy or non-greedy). Since the hinge splits the data, the samples on the left side of the hinge (together with θ1) always go to the left child, and the right side behaves similarly. For example, given a simple symmetrical binary tree structure, the first level contains one hinging hyperplane function, the second level contains two hinging hyperplane functions, the third level contains four, and in general the kth level contains 2^{k−1} hinging hyperplane functions.

Fig. 2.4 Hinging hyperplane-based regression tree for the basic data sample in the case of the greedy algorithm (expected and calculated outputs shown together with the left- and right-sided membership functions)
Summarizing the above, the parameters θ obtained during the recursive identification minimize the following cost function:
\[
E(\{\theta_i\}, \pi) = \sum_{i=1}^{K} \pi_i E_{m_i}(\theta_i), \tag{2.21}
\]
where K is the number of hinge functions (nodes) and π is the binary ($\pi_i \in \{0, 1\}$) terminal set, indicating that a given node is a final linear model ($\pi_i = 1$) and can be incorporated as a terminal node of the identified piecewise model.
A growing algorithm can be either balanced or greedy. In the balanced case the identification algorithm builds the tree until the desired stopping criterion is met, while the greedy one continues the tree building by choosing for splitting the node that performs worst during the building procedure; this operating regime needs more local models for better model performance. For a greedy algorithm, the crucial item is the selection of a good stopping criterion. Any of the following can be used to determine whether to continue the tree growing process or stop the procedure (a sketch of the greedy growing loop is given after the list):
1. The loss function becomes zero. This corresponds to the situation where the size of the data set is less than or equal to the dimension of the hinge. Since the hinging hyperplanes are located by linear least squares, when the number of data points does not exceed the number of parameters the data can be fitted exactly.
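Putting the growing strategy and the stopping criteria together, the greedy loop can be sketched as follows (illustrative Python only; fit_hinge is an assumed helper that stands for the constrained FCRM identification of one hinge from Sect. 2.1 and returns the two parameter vectors together with the node RMSE):

```python
import numpy as np

def grow_tree(X, y, fit_hinge, max_nodes=15, rmse_tol=0.05):
    """Greedy hinging hyperplane tree: always split the worst terminal node."""
    root = {"idx": np.arange(len(y))}
    terminals = [root]
    for _ in range(max_nodes):
        for node in terminals:
            if "rmse" not in node:                     # identify newly created nodes
                node["theta1"], node["theta2"], node["rmse"] = \
                    fit_hinge(X[node["idx"]], y[node["idx"]])
        worst_i = max(range(len(terminals)), key=lambda i: terminals[i]["rmse"])
        worst = terminals[worst_i]
        if worst["rmse"] < rmse_tol:                   # stopping criterion reached
            break
        # the hinge x^T(theta1 - theta2) = 0 splits the node's data in two
        side = X[worst["idx"]] @ (worst["theta1"] - worst["theta2"]) <= 0
        if side.all() or not side.any():               # degenerate split: stop here
            break
        left, right = {"idx": worst["idx"][side]}, {"idx": worst["idx"][~side]}
        worst["children"] = (left, right)
        terminals[worst_i:worst_i + 1] = [left, right]  # replace parent by children
    return root, terminals
```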
The well-known regression performance estimators can be used for node perfor-
mance measurement; in this work, root mean squared prediction error (RMSE)
was used:
\[
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left( y_k - \hat{y}_k \right)^2}. \tag{2.23}
\]
Fig. 2.6 Node-by-node ρ (condition of the nodes) and RMSE results for non-greedy tree building

Fig. 2.7 Node-by-node ρ and RMSE results for greedy tree building
Accuracy and transparency of the proposed algorithm are shown based on multiple
datasets, two real life and two synthetic ones, followed by examples from the area
of dynamic system identification.
All datasets have been used before; most of them originate from well-known data repositories. Performance of the models is measured by the root mean squared
prediction error (RMSE; see Eq. 2.23).
Real life datasets:
• Abalone Dataset from the UCI machine learning repository, used to predict the age of abalone from physical measurements. Contains 4,177 cases with eight attributes (one nominal and seven continuous).
• Kin8nm Data containing information on the forward kinematics of an eight-link robot arm, from the DELVE repository. Contains 8,192 cases with eight continuous attributes.
Synthetic datasets:
• Fried Artificial dataset used by Friedman [50] containing ten continuous attributes with independent values uniformly distributed in the interval [0, 1]. The value of the output variable is obtained with Friedman's benchmark equation.
Table 2.2 10-fold cross-validation report for the hinging hyperplane-based tree
Data Sample MIN MEAN MAX Standard dev.
Fried Train 0.5822 0.8677 1.2107 0.227
Test 0.6226 0.9208 1.2673 0.2337
3Dsin Train 0.0906 0.1741 0.3162 0.0714
Test 0.0838 0.178 0.342 0.0801
Abalone Train 2.3496 2.6241 2.9256 0.1532
Test 2.3242 2.8803 3.451 0.3445
Kin8nm Train 0.1433 0.1515 0.1595 0.0054
Test 0.1464 0.1579 0.1729 0.0092
The well-known Box-Jenkins furnace data benchmark is used to illustrate the pro-
posed modeling approach and to compare its effectiveness with other methods. The
data set consists of 296 pairs of input-output observations taken from a laboratory
furnace with a sampling time of nine seconds. The process input is the methane flow
rate and the output is the percentage of CO2 in the off-gas. A number of researchers have concluded that a proper structure of a dynamic model for this system uses delayed values of the input and the output. The approximation power of the model can be seen in Fig. 2.8 and Table 2.3. Comparing its results with those of other techniques in [51], one can conclude that the modeling performance is in line with that of the other techniques, with a moderate number of identified hinging hyperplanes.
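For orientation, the identification data can be arranged as in the following sketch. The regressors y(k−1) and u(k−4), frequently used for this benchmark in the literature, are an assumption here; the snippet is not taken from the book.

```python
import numpy as np

def box_jenkins_regressors(y, u, ny=1, nu=4):
    """Regressors [1, y(k-ny), u(k-nu)] and targets y(k) (assumed structure)."""
    start = max(ny, nu)
    X = np.array([[1.0, y[k - ny], u[k - nu]] for k in range(start, len(y))])
    return X, np.asarray(y[start:])

# y: CO2 content in the off-gas, u: methane flow rate (296 samples each)
# X, t = box_jenkins_regressors(y, u)   # feed X, t to the tree identification
```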
So far, a general nonlinear modeling technique was presented and a new iden-
tification approach was given for hinging hyperplane-based nonlinear models:
y = f (x(k), θ ), where f (.) represents the hinging hyperplane-based tree structured
model and x(k) represents the input vector of the model. To identify a discrete-time
input-output model for a dynamical system, the dynamic model structure has to be
\[
y(k) = \sum_{i=1}^{n_a} a_i\, y(k-i) + \sum_{j=1}^{n_b} b_j\, f\left(u(k-j)\right), \tag{2.28}
\]
where $y(\cdot)$ and $u(\cdot)$ are the output and input of the system, respectively, and $n_a$ and $n_b$ are the output and input orders of the model. The parameters of the blocks of
the Hammerstein model (static nonlinearity and linear dynamics) can be identified
by the proposed method simultaneously if the same linear dynamic behavior can be
guaranteed by all of the local hinging hyperplane-based models. It can be done in an
elegant way utilizing the flexibility of the proposed identification approach: global
constraints can be formulated for the ai and b j parameters of the local models (for
a detailed discussion on what constraints have to be formulated, see [32]). In the
following, the hinging hyperplane modeling technique is applied on a Hammerstein
type system. It will be shown why it is an effective tool for the above-mentioned
purpose.
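To illustrate the structure of (2.28), a minimal Hammerstein simulation sketch is given below; static_nl stands for the identified static nonlinearity (e.g., the hinging hyperplane tree) and is an assumed placeholder, not the book's code.

```python
import numpy as np

def simulate_hammerstein(u, a, b, static_nl):
    """Simulate y(k) = sum_i a_i y(k-i) + sum_j b_j f(u(k-j)), cf. Eq. (2.28)."""
    na, nb = len(a), len(b)
    f_u = np.array([static_nl(uk) for uk in u])     # static nonlinearity first
    y = np.zeros(len(u))
    for k in range(max(na, nb), len(u)):
        y[k] = (sum(a[i] * y[k - i - 1] for i in range(na))
                + sum(b[j] * f_u[k - j - 1] for j in range(nb)))
    return y
```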
The modeling of a simulated water heater (Fig. 2.9) is used to illustrate the advantages
of the proposed hinging hyperplane-based models. The water flows through a pair
of metal pipes containing a cartridge heater.
The outlet temperature, Tout , of the water can be varied by adjusting the heating
signal, u, of the cartridge heater (see [32] or Appendix D for details). The performance
of the cartridge heater is given by:
\[
Q(u) = Q_M \left( u - \frac{\sin(2\pi u)}{2\pi} \right), \tag{2.29}
\]
where $Q_M$ is the maximal power and u is the heating signal (voltage). As the equation above shows, the heating performance is a static nonlinear function of the heating signal; hence, the Hammerstein model is a good match for this process. The aim is to construct a dynamic prediction model from data for the output temperature (the dependent variable, $y = T_{out}$) as a function of the control input, the heating signal u.
Fig. 2.10 Free-run simulation of the water heater: test data compared with the proposed hinging hyperplane model, a neural network, and a linear model (temperature [°C] versus simulation time [s])
(Figure: model-based control scheme with setpoint w, controller output u, and process output y)
The model predictive controller computes the control moves by minimizing the following cost function over the prediction horizon $H_p$ and control horizon $H_c$:
\[
J(H_p, H_c, \lambda) = \sum_{j=1}^{H_p} \left( w(k+j) - \hat{y}(k+j) \right)^2 + \lambda \sum_{j=1}^{H_c} \Delta u^2(k+j-1), \tag{2.30}
\]
The predicted outputs are obtained from the model in vector form as
\[
\hat{\mathbf{y}} = \mathbf{S}\,\Delta\bar{\mathbf{u}} + \mathbf{p}, \tag{2.31}
\]
where $\Delta\bar{\mathbf{u}} = [\Delta u(k), \ldots, \Delta u(k+H_c)]^T$, $\mathbf{p} = [p_1, p_2, \ldots, p_{H_p}]^T$, $\hat{\mathbf{y}} = [\hat{y}(k+1), \ldots, \hat{y}(k+H_p)]^T$, and $\mathbf{S}$, containing the parameters of a step-response model, is an $H_p \times H_c$ matrix with zero entries $s_{i,j}$ for $j > i$:
\[
\mathbf{S} = \begin{bmatrix}
s_1 & 0 & \cdots & 0 \\
s_2 & s_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
s_{H_p} & s_{H_p-1} & \cdots & s_{H_p-H_c}
\end{bmatrix}. \tag{2.32}
\]
When constraints are considered, the minimum of the cost function can be found
by quadratic optimization with linear constraints:
\[
\min_{\Delta\bar{\mathbf{u}}} \left\{ (\mathbf{S}\Delta\bar{\mathbf{u}} + \mathbf{p} - \mathbf{w})^T (\mathbf{S}\Delta\bar{\mathbf{u}} + \mathbf{p} - \mathbf{w}) + \lambda\, \Delta\bar{\mathbf{u}}^T \Delta\bar{\mathbf{u}} \right\}, \tag{2.33}
\]
which is equivalent to
\[
\min_{\Delta\bar{\mathbf{u}}} \left\{ \tfrac{1}{2}\, \Delta\bar{\mathbf{u}}^T \mathbf{H}\, \Delta\bar{\mathbf{u}} + \mathbf{d}^T \Delta\bar{\mathbf{u}} \right\},
\]
with $\mathbf{H} = 2\left(\mathbf{S}^T\mathbf{S} + \lambda \mathbf{I}\right)$, $\mathbf{d} = -2\,\mathbf{S}^T(\mathbf{w} - \mathbf{p})$, where $\mathbf{I}$ is an $(H_c \times H_c)$ identity matrix.
The constraints defined on u and Δu can be formulated with the following inequality:
\[
\begin{bmatrix}
\mathbf{I}_{\Delta\bar{u}} \\ -\mathbf{I}_{\Delta\bar{u}} \\ \mathbf{I}_{H_c} \\ -\mathbf{I}_{H_c}
\end{bmatrix} \Delta\bar{\mathbf{u}} \le
\begin{bmatrix}
\mathbf{u}_{max} - \mathbf{I}_{\bar{u}}\, u(k-1) \\
-\mathbf{u}_{min} + \mathbf{I}_{\bar{u}}\, u(k-1) \\
\Delta\mathbf{u}_{max} \\
-\Delta\mathbf{u}_{min}
\end{bmatrix}, \tag{2.34}
\]
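A receding-horizon step that minimizes (2.33) subject to (2.34) can be sketched as follows. This is an illustrative implementation; the solver choice and all names are assumptions, with S and p obtained from the instantaneous linearization of the hinging hyperplane model.

```python
import numpy as np
from scipy.optimize import minimize

def mpc_step(S, p, w, lam, u_prev, u_min, u_max, du_min, du_max):
    """One constrained GPC step; returns the first control increment."""
    Hc = S.shape[1]
    H = 2.0 * (S.T @ S + lam * np.eye(Hc))
    d = -2.0 * S.T @ (w - p)
    L = np.tril(np.ones((Hc, Hc)))        # cumulative sum: u = u_prev + L @ du
    A = np.vstack([L, -L, np.eye(Hc), -np.eye(Hc)])
    b = np.concatenate([np.full(Hc, u_max - u_prev),
                        np.full(Hc, u_prev - u_min),
                        np.full(Hc, du_max),
                        np.full(Hc, -du_min)])
    cons = {"type": "ineq", "fun": lambda du: b - A @ du}   # A @ du <= b, Eq. (2.34)
    res = minimize(lambda du: 0.5 * du @ H @ du + d @ du,
                   np.zeros(Hc),
                   jac=lambda du: H @ du + d,
                   constraints=[cons], method="SLSQP")
    return res.x[0]    # receding horizon: only the first move is applied
```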
To handle the modeling error, the MPC is applied in the well-known internal model control (IMC) scheme, where the setpoint of the controller is shifted by the filtered modeling error. For this purpose, a first-order linear filter is used.
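The filter equation itself is not reproduced here; a generic first-order error filter of the kind typically used in IMC schemes could look like the following sketch (alpha is an assumed tuning parameter, not a value from the book):

```python
def filter_modeling_error(errors, alpha=0.3):
    """First-order filtering e_f(k) = (1 - alpha) e_f(k-1) + alpha e(k);
    the filtered modeling error e_f shifts the MPC setpoint in the IMC scheme."""
    e_f, filtered = 0.0, []
    for e in errors:                     # e(k) = y(k) - y_hat(k)
        e_f = (1.0 - alpha) * e_f + alpha * e
        filtered.append(e_f)
    return filtered
```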
Table 2.5 Simulation results (SSE, sum squared tracking error; CE, sum square of the control
actions)
The applied model in GPC SSE CE
Linear model 1085 1.61
Neural network model 956 1.39
Hinging hyperplane model 966 0.58
Notice also that the oscillatory behavior of the neural network model-based MPC
is due to the bad prediction of the steady-state gain of the system around the middle
region. However, as can be seen from Table 2.5, both nonlinear models achieved
approximately the same summed squared tracking error (SSE), although a smaller
control effort (CE) was needed for the hinging hyperplane-based MPC.
2.4 Conclusions
In this chapter, an algorithm was proposed that builds hinging hyperplane-based regression trees by constrained fuzzy c-regression clustering and recursive partitioning of the data. The complexity of the model is controlled by the proposed model performance measure. The resulting piecewise linear model can be effectively used to represent nonlinear dynamical systems. The resulting linear parameter varying (LPV) model can be easily utilized in model-based control.
To illustrate the advantages of the proposed approach, benchmark datasets were
modeled and a simulation example presented for the identification and model pre-
dictive control of a laboratory water heater.
The results show that, with the use of the proposed modeling framework, accurate
and transparent nonlinear models can be identified since the complexity and the
accuracy of the model can be easily controlled. The local linear models can be
easily interpreted and utilized to represent operating regimes of nonlinear dynamical
systems. Based on this interpretation, effective model-based control applications can
be designed.