
Machine Learning Based System Identification with Binary Output Data Using Kernel Methods

Rachid Fateh¹, Hicham Oualla², Es-said Azougaghe¹, Anouar Darif¹, Ahmed Boumezzough¹, Said Safi¹, Mathieu Pouliquen³, and Miloud Frikel³

¹ Sultan Moulay Slimane University, Beni Mellal, Morocco
² Akkodis, Paris, France
³ Normandie University, Caen, France

https://doi.org/10.26636/jtit.2024.1.1430

Abstract: Within the realm of machine learning, kernel methods stand out as a prominent class of algorithms with widespread applications, including but not limited to classification, regression, and identification tasks. Our paper addresses the challenging problem of identifying the finite impulse response (FIR) of single-input single-output nonlinear systems under the influence of perturbations and binary-valued measurements. To overcome this challenge, we exploit two algorithms that leverage the framework of reproducing kernel Hilbert spaces (RKHS) to accurately identify the impulse response of the Proakis C channel. Additionally, we introduce the application of these kernel methods for estimating binary output data of nonlinear systems. We showcase the effectiveness of kernel adaptive filters in identifying nonlinear systems with binary output measurements, as demonstrated through the experimental results presented in this study.

Keywords: finite impulse response, kernel adaptive filtering, nonlinear systems identification, Proakis C channel

1. Introduction

Linear adaptive filters represent a distinct category of digital filters known for their capacity to adapt their parameters based on input data. These filters are widely employed in signal processing applications, including tasks such as noise reduction, echo cancellation, and equalization [1]–[4]. The fundamental concept behind linear adaptive filters revolves around employing an algorithm that continually updates the filter coefficients in response to variations in the input signal [5], [6].
The widely adopted algorithm for this purpose is the least mean squares (LMS) algorithm [7], which iteratively refines the filter coefficients to minimize the mean squared error between the filter's output and the desired output. The performance of linear adaptive filters hinges on various factors, such as the selection of the filter structure, the choice of the algorithm for coefficient updates, and the design of the input signal. In general, these filters demonstrate optimal performance when the input signal remains relatively stationary or changes gradually over time. Additionally, selecting a filter structure that aligns with the statistical properties of the signal further enhances its effectiveness.
Linear adaptive filters are widely used in various fields, such as communications, control systems, and biomedical signal processing, due to their versatility and effectiveness in handling complex signals. They are often used in conjunction with other signal processing techniques, such as Fourier analysis and wavelet transforms, to enable more sophisticated signal processing approaches [1]. For instance, in the field of biomedical signal processing, adaptive filtering has been utilized to remove noise and artifacts from electroencephalogram (EEG) signals [8]. In communication systems, adaptive filters have been applied to mitigate channel impairments and improve signal quality. In control systems, adaptive filtering has been used to identify and estimate system parameters, and to compensate for time-varying disturbances [9].
Machine learning (ML) algorithms, falling within the category of kernel methods, find extensive application across various tasks, such as classification, regression, channel identification, and more [10], [11]. These methods function by mapping data into a higher-dimensional space, allowing for more effective separation and analysis, without the need to explicitly calculate the coordinates in that space [12]–[15]. They are based on the concept that a decision boundary in the reproducing kernel Hilbert space (RKHS) [12] can be represented as a linear boundary in a lower-dimensional space, making it possible to capture complex, non-linear relationships between input features and the output variable [16]. Support vector machines (SVMs) are one of the most widely used kernel methods, especially for classification tasks [17]. Kernel ridge regression, on the other hand, is often used for regression tasks due to its ability to capture non-linear relationships between variables. Additionally, kernel principal component analysis (PCA) is used for data analysis, allowing for non-linear feature extraction and dimensionality reduction [18].
In addition to their performance, kernel methods are also highly interpretable, allowing users to understand how the algorithm is making predictions and adjust the model accordingly.

For example, SVMs can be visualized by plotting the decision boundary in the input feature space, which can provide insight into the characteristics of the data and the model [19].
Recently, kernel methods have been successfully applied in channel identification tasks, particularly in the context of blind channel identification, where the channel parameters are estimated without any prior knowledge of the channel. Kernel-based blind channel identification methods typically use a kernel function to map the received signal into a high-dimensional space, where the channel parameters can be estimated using linear regression techniques. The estimated parameters can then be used to equalize the received signal and improve the accuracy of the communication system [20]–[24].
At present, there are many adaptive kernel filtering algorithms that have been exploited for channel identification in wireless communication systems. Some of them are described below.
Kernel least mean squares (KLMS) is a kernel-based adaptive filter that can be used for channel identification in wireless communication systems [25]. In KLMS, a kernel function is used to map the input data into a higher-dimensional space, where the linear regression problem is easier to solve. The KLMS algorithm updates the filter coefficients based on the difference between the predicted output and the actual output. The Gaussian kernel is a popular choice for this purpose.
Kernel normalized least mean squares (KNLMS) is a variant of KLMS that includes a normalization factor in the update rule. This helps prevent the filter coefficients from becoming too large and unstable [26]. KNLMS can be used for channel identification in wireless communication systems, and it has been proven to be effective in reducing the computational complexity of KLMS.
Kernel extended improved proportionate NLMS (KE-IPNLMS) [27] is an algorithm that employs a radial basis function (RBF) kernel to perform an implicit mapping of the data using the kernel trick to estimate the impulse response parameters for single-input single-output (SISO) nonlinear system identification.
In this paper, we investigate a non-linear system identification problem in the presence of noise. Section 2 provides a detailed description of the problem. In Section 3, we introduce fundamental notations of kernel methods, followed by a discussion of four algorithms: LMS, NLMS, KLMS, and KNLMS. We then evaluate the effectiveness of kernel methods using binary-valued output by analyzing simulation results in Section 4. Our findings are summarized in Section 5, which concludes the paper.

2. System Descriptions

In this section, we introduce some notations and assumptions that will be used throughout the paper.
The Hammerstein system, a distinctive nonlinear model, is frequently employed in the realm of system identification. System identification aims to create precise mathematical models that mirror the behavior of real-world systems using observed input-output data. In this context, we focus on the Hammerstein system depicted in Fig. 1. This system comprises a nonlinear static function followed by a FIR filter with a known order. This structure is chosen for its ability to effectively represent both nonlinear and linear dynamics in a system, offering sufficient flexibility and interpretability during the identification process.

Fig. 1. Block diagram of the Hammerstein system (the input x(n) passes through the nonlinearity g(x(n)), then through the FIR filter {h(i)} to give v(n); adding the noise b(n) yields d(n)).

As shown in Fig. 1, the desired system output can be obtained using the following expression:

$$\begin{cases} v(n) = \displaystyle\sum_{i=0}^{L-1} h(i)\, g\big(x(n-i)\big) \\ d(n) = v(n) + b(n), \quad n = 0, 1, 2, \ldots, N \end{cases} \tag{1}$$

where x(n) is the input signal, h(i), i = 0, 1, ..., L − 1 represents the channel impulse response, L refers to the FIR system order, g(·) denotes the nonlinearity, and b(n) is the measurement noise.
The Hammerstein system was adopted under the following fundamental assumptions:
• the input sequence x(n) is an independent and identically distributed (i.i.d.) bounded random process characterized by a zero mean,
• the additive noise, represented as b(n), is assumed to be Gaussian and independent of both x(n) and d(n) (both are bounded),
• the function g(·) is both invertible and continuous for any finite value of x.
The hypotheses listed above are formulated to simplify the system analysis process and to achieve the best results in terms of mean square error. The primary objective of this paper is to present a comparison of the kernel methods that have been proposed in the literature for identifying the output d generated by Eq. (1).
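To make the data-generating model concrete, the following minimal NumPy sketch produces d(n) according to Eq. (1). The choice g = tanh and the Proakis C coefficients anticipate the simulation settings of Section 4; the noise level here is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(0)

def hammerstein_output(x, h, g=np.tanh, noise_std=0.1):
    """d(n) = sum_{i=0}^{L-1} h(i) g(x(n-i)) + b(n), as in Eq. (1)."""
    v = np.convolve(g(x), h, mode="full")[: len(x)]  # FIR filtering of g(x(n))
    b = noise_std * rng.standard_normal(len(x))      # Gaussian measurement noise b(n)
    return v + b

# Proakis C impulse response (L = 5), used later in Section 4.
h = np.array([0.227, 0.460, 0.688, 0.460, 0.227])
x = rng.standard_normal(1000)   # i.i.d. zero-mean input sequence
d = hammerstein_output(x, h)
```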
3. Kernel Methods

Here, we introduce kernel methods, a category of techniques that empower us to extend traditional linear algorithms to handle non-linear data. The fundamental concept underlying kernel methods is the application of linear algorithms to a transformed representation of the data within a higher-dimensional space. This transformation facilitates the separation of data points into classes that were not linearly separable in the original space.
Kernel methods constitute a category of ML algorithms that leverage kernel functions to map input data into a higher-dimensional space. They find primary application in tackling classification and regression tasks.


Definition 1. A function κ : X × X → R is a similarity measure if the following conditions are satisfied:
• ∀x, y ∈ X : κ(x, y) ≥ 0,
• ∀x, y ∈ X : κ(x, y) = κ(y, x),
• ∀y ∈ X, y ≠ x : κ(x, y) < κ(x, x),
• κ(x, y) = κ(x, x) ⇔ x = y.

3.1. Positive Definite Kernel

Theorem 1. Let X be a compact subset of R (compact = closed and bounded) and K : X × X → R a symmetric function. We also assume that ∀f ∈ L²(X):

$$\int_{\mathcal{X}} K(x, y)\, f(x) f(y)\, \mathrm{d}x\, \mathrm{d}y \geq 0 \quad \text{(Mercer condition)}. \tag{2}$$

Then there exists a Hilbert space H and a mapping Φ : X → H such that ∀(x, y) ∈ X²:

$$K(x, y) = \big\langle \Phi(x), \Phi(y) \big\rangle. \tag{3}$$

The function K(x, y) is called a positive definite kernel.
An equivalent condition for the function K : X × X → R to be a positive definite kernel is the following:
• ∀n ∈ N and {x_i}, i = 1, ..., n ⊂ X, the Gram matrix

$$\mathbf{K} = [K_{i,j}]_{i,j=1,\ldots,n} = \big[K(x_i, x_j)\big]_{i,j=1,\ldots,n} \tag{4}$$

is positive definite, that is:

$$\forall c \in \mathbb{R}^n,\ c \neq 0: \quad c^\top \mathbf{K} c > 0. \tag{5}$$

Therefore, a valid kernel ensures the existence of H and can be expressed as a scalar product in the Hilbert space H. A good kernel also guarantees the convexity of the quadratic optimization problem under inequality constraints encountered for SVM.
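As a quick numerical illustration of this equivalent condition (a sanity check, not part of the original derivation), the sketch below builds the Gram matrix of Eq. (4) for a Gaussian kernel and checks that its eigenvalues are non-negative up to round-off, which is the practical counterpart of Eq. (5):

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), cf. Eq. (4)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

X = np.random.default_rng(1).standard_normal((50, 3))  # 50 points in R^3
K = gaussian_gram(X)
lam = np.linalg.eigvalsh(K)   # real eigenvalues, since K is symmetric
print(lam.min())              # non-negative up to floating-point round-off
```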
3.2. Conditionally Positive Definite Kernel

A kernel is conditionally positive definite (CPD) if ∀n ∈ N and {x_i}, i = 1, ..., n ⊂ X the Gram matrix

$$\mathbf{K} = [K_{i,j}]_{i,j=1,\ldots,n} = \big[K(x_i, x_j)\big]_{i,j=1,\ldots,n} \tag{6}$$

is conditionally positive definite, i.e.

$$\forall c \in \mathbb{R}^n,\ c \neq 0 \ \text{such that} \ \sum_{i=1}^{n} c_i = 0: \quad c^\top \mathbf{K} c > 0. \tag{7}$$

This definition extends the class of kernel functions for which the SVM optimization problem is guaranteed to be convex. Given a conditionally positive definite symmetric kernel, there exists:
• a vector space V,
• a transformation Φ : X → V,
• a bilinear form Q : V × V → R
such that

$$K(x, y) = Q\big(\Phi(x), \Phi(y)\big). \tag{8}$$

If K is not positive definite, then Q is not a scalar product. Note that such kernels cannot be interpreted as similarities.

3.3. Construction of Positive Definite Kernels

There are several approaches to obtain kernel functions.
• Direct construction (using the Φ projection): direct definition of H and Φ : X → H, then construction of the kernel K : X × X → R by

$$K(x, y) = \big\langle \Phi(x), \Phi(y) \big\rangle. \tag{9}$$

Example 1. Let X be a compact subset of R. We consider Φ : X → R; then K : X × X → R defined by

$$K(x, y) = \Phi(x) \cdot \Phi(y) \tag{10}$$

is a positive definite kernel.
Particular cases:
– Φ(x) = x : K(x, y) = x · y,
– Φ(x) = eˣ : K(x, y) = e^{x+y}.
• Transformation of existing kernels:
1) If K : X × X → R is positive definite, then exp(K) is also positive definite (Fig. 2).
2) If K : X × X → [−1, 1] is positive definite, then cosh(K) is positive definite too (Fig. 3).

Fig. 2. The exponential of a kernel is a kernel (Gaussian kernel, top, and its exponential, bottom).

Fig. 3. The cosh of a kernel is a kernel (Gaussian kernel, top, and its cosh, bottom).
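These two transformation rules can be spot-checked numerically (a sanity check under random data, not a proof): starting from a Gaussian kernel Gram matrix, the element-wise exp and cosh should again yield matrices with no negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 2))
sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)   # Gaussian kernel Gram matrix

for name, M in [("exp(K)", np.exp(K)), ("cosh(K)", np.cosh(K))]:
    # Both transformed matrices stay positive (semi-)definite.
    print(name, "min eigenvalue:", np.linalg.eigvalsh(M).min())
```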

• Combination of existing kernels: if K₁, K₂ : X × X → R are positive definite and α₁, α₂ > 0, then the following kernels are also positive definite:
– linear combination: K(x, y) = α₁ K₁(x, y) + α₂ K₂(x, y),
– simple product: K(x, y) = α₁ K₁(x, y) · α₂ K₂(x, y).
Obviously, K is defined on X × X with values in R. If K₁ : X × X → R and K₂ : X × X → R are positive definite, then the following are also positive definite:
– direct sum: K₁ ⊕ K₂ = K₁ + K₂,
– tensor product: K₁ ⊗ K₂ = K₁ · K₂.

3.4. Examples of Kernels

In this subsection, we aim to illustrate several examples of kernels that are widely used across diverse applications. Kernels play a crucial role in various machine learning and statistical techniques, contributing to their flexibility and effectiveness. The four kernels listed here are implemented in the sketch that follows the list.
1) Linear kernel:
– Definition: K(x_i, x_j) = x_i⊤ x_j = ⟨x_i, x_j⟩.
– Explanation: a linear kernel computes the inner product between input vectors, providing a measure of similarity based on their alignment. It serves as a foundational choice, particularly in scenarios where the underlying relationships are expected to be linear.
2) Polynomial kernel:
– Definition: K(x_i, x_j) = (⟨x_i, x_j⟩ + c)ᵖ.
– Explanation: a polynomial kernel introduces non-linearity by raising the dot product to a certain power p, with an optional constant term c. This kernel is effective in capturing higher-order relationships in the data.
3) Gaussian radial basis function (RBF) kernel:
– Definition: K(x_i, x_j) = exp(−γ ∥x_i − x_j∥²).
– Explanation: an RBF kernel measures similarity based on the Euclidean distance between vectors. It is widely employed for its ability to capture complex, non-linear relationships and is a key component in support vector machines (SVMs).
4) Sigmoid kernel:
– Definition: K(x_i, x_j) = tanh(a ⟨x_i, x_j⟩ + c).
– Explanation: a sigmoid kernel is useful for capturing relationships characterized by sigmoidal shapes. It is employed in neural networks and logistic regression, providing flexibility in modeling.
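The four definitions above translate directly into code; a minimal sketch, where the parameter defaults are illustrative choices, not values taken from this paper:

```python
import numpy as np

def linear_kernel(xi, xj):
    return np.dot(xi, xj)                           # <xi, xj>

def polynomial_kernel(xi, xj, c=1.0, p=3):
    return (np.dot(xi, xj) + c) ** p                # (<xi, xj> + c)^p

def rbf_kernel(xi, xj, gamma=0.5):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))  # exp(-gamma ||xi - xj||^2)

def sigmoid_kernel(xi, xj, a=0.1, c=0.0):
    # Positive definite only for certain choices of a and c.
    return np.tanh(a * np.dot(xi, xj) + c)
```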
3.5. LMS Algorithm

The LMS algorithm is a specific type of adaptive filter used in digital signal processing and machine learning applications. It is widely used to tackle tasks such as system estimation, radio channel equalization, noise cancellation, and adaptive beamforming.
The basic idea behind the LMS algorithm is to adjust the weights of a linear filter iteratively, based on the difference between the predicted output of the filter and the actual output. The algorithm uses a measure of the error between the predicted and actual outputs, called the "error signal", to update the filter weights in the direction that minimizes the error. The LMS weight update recursion is [7]:

$$\theta(n+1) = \theta(n) + \mu\, e(n)\, x(n), \tag{11}$$

where µ is the step size or learning rate, and e(n) is the error at time n, given by:

$$e(n) = d(n) - \theta(n)^\top x(n), \tag{12}$$

where d(n) is the desired output at time n.
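A direct implementation of Eqs. (11)-(12) as a sketch; the regressor layout and the default step size are illustrative assumptions:

```python
import numpy as np

def lms(x, d, L=5, mu=0.01):
    """LMS: theta(n+1) = theta(n) + mu e(n) x(n), Eqs. (11)-(12)."""
    theta = np.zeros(L)
    e = np.zeros(len(x))
    for n in range(L - 1, len(x)):
        xn = x[n - L + 1 : n + 1][::-1]  # regressor [x(n), ..., x(n-L+1)]
        e[n] = d[n] - theta @ xn         # error signal, Eq. (12)
        theta += mu * e[n] * xn          # weight update, Eq. (11)
    return theta, e
```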

3.6. NLMS Algorithm

The NLMS algorithm is a variation of the LMS algorithm that improves its performance by normalizing the weight update step based on the power of the input signal. This makes the algorithm more robust to changes in the input signal power, and it can converge faster and more accurately than the standard LMS algorithm.
The basic idea behind the NLMS algorithm is similar to that of the LMS algorithm. It adjusts the weights of a linear filter iteratively based on the difference between the predicted output of the filter and the actual output. However, the weight update step in the NLMS algorithm is normalized based on the power of the input signal, which helps prevent the weight update from becoming too large or too small.
The formula for the weight update in the NLMS algorithm is [28]:

$$\theta(n+1) = \theta(n) + \frac{\mu}{\|x(n)\|^2}\, e(n)\, x(n), \tag{13}$$

where θ(n) is the vector of filter weights at iteration n, µ is the step-size parameter, e(n) is the error signal at iteration n, x(n) is the input signal at iteration n, and ∥x(n)∥² is the power of the input signal.
The NLMS algorithm has several advantages over the LMS algorithm, including its faster convergence rate, better tracking of time-varying signals, and improved robustness to changes in the input signal power. However, it can be sensitive to noise and can still suffer from slow convergence or local minima if the step-size parameter is set too high.
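The corresponding sketch for Eq. (13); the small eps in the denominator is a common safeguard against division by a near-zero input power and is not part of Eq. (13) itself:

```python
import numpy as np

def nlms(x, d, L=5, mu=0.5, eps=1e-8):
    """NLMS: theta(n+1) = theta(n) + mu / ||x(n)||^2 e(n) x(n), Eq. (13)."""
    theta = np.zeros(L)
    e = np.zeros(len(x))
    for n in range(L - 1, len(x)):
        xn = x[n - L + 1 : n + 1][::-1]
        e[n] = d[n] - theta @ xn
        theta += (mu / (eps + xn @ xn)) * e[n] * xn  # normalized update
    return theta, e
```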


3.7. KLMS Algorithm

Kernel least mean square (KLMS) is an online ML algorithm that is designed for non-linear regression problems. KLMS uses a kernel function to transform the input data into a high-dimensional feature space, where a linear relationship between inputs and outputs can be learned using a simple linear model. A description of the KLMS algorithm is given below [25]:

$$\theta(0) = 0, \quad e(n) = d(n) - \theta(n-1)^\top \Phi\big(x(n)\big), \quad \theta(n) = \theta(n-1) + \mu\, e(n)\, \Phi\big(x(n)\big). \tag{14}$$

The KLMS algorithm works by transforming the input data into a high-dimensional feature space using a kernel function. The kernel function measures the similarity between two data points and maps them into a high-dimensional space, where they can be linearly separable. The output θ(n − 1)⊤Φ(x(n)) is then predicted using the inner product of the kernel function and the weight vector θ. The error e(n) is computed as the difference between the predicted output θ(n − 1)⊤Φ(x(n)) and the actual output d(n).
The KLMS algorithm updates the weight vector θ using the error e(n) and the kernel function κ to compute the update. The update rule follows the same principle as the LMS algorithm, where the weight vector is updated in the direction that reduces the error. The update rule includes the kernel function to account for the nonlinear relationship between inputs and outputs.
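Because Φ is never formed explicitly, implementations keep the weight vector as a kernel expansion: from Eq. (14), θ(n) = Σ_j µ e(j) Φ(x(j)), so the prediction becomes µ Σ_j e(j) κ(x(j), x(n)). A minimal sketch of this standard expansion form (growing memory, no sparsification), with an illustrative Gaussian kernel width:

```python
import numpy as np

def klms(X, d, mu=0.5, sigma=1.0):
    """KLMS, Eq. (14), in kernel-expansion form: each new sample becomes a
    center X[n] with coefficient a[n] = mu * e(n)."""
    kern = lambda u, v: np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))
    N = len(d)
    a = np.zeros(N)   # expansion coefficients
    e = np.zeros(N)
    for n in range(N):
        y = sum(a[j] * kern(X[j], X[n]) for j in range(n))  # predicted output
        e[n] = d[n] - y            # error e(n)
        a[n] = mu * e[n]           # weight update in feature space
    return a, e
```

Here each row of X would be a regressor vector, e.g. a sliding window of the input signal.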

3.8. KNLMS Algorithm

The KNLMS algorithm is a variation of the NLMS algorithm that uses a kernel function to map the input signal to a higher-dimensional space, where it is easier to separate linearly. This makes the algorithm more powerful and versatile, with applications in nonlinear filtering and prediction.
The basic idea behind the KNLMS algorithm is to apply a nonlinear mapping to the input signal using a kernel function, such as a Gaussian or polynomial function. The mapped signal is then used as the input to the standard NLMS algorithm, which updates the filter weights based on the difference between the predicted output and the actual output. The weight update step is normalized based on the power of the mapped signal, similar to the NLMS algorithm.
The formula for the weight update in the KNLMS algorithm is [26]:

$$\theta(n+1) = \theta(n) + \frac{\mu}{\varepsilon + \big\|\Phi\big(x(n)\big)\big\|^2}\, e(n)\, \Phi\big(x(n)\big), \tag{15}$$

where θ(n) is the vector of filter weights at iteration n, µ is the step-size parameter, e(n) is the error signal at iteration n, Φ(x(n)) is the mapped input signal at iteration n, ε refers to a small constant used to avoid numerical problems, and ∥Φ(x(n))∥² is the power of the mapped input signal.
The KNLMS algorithm has several advantages over the NLMS algorithm, including its ability to handle nonlinear signals and its improved performance in high-dimensional spaces. However, it can be computationally expensive and may require careful selection of the kernel function and its parameters to achieve good performance.
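A sketch following Eq. (15) literally, reusing the expansion form of the KLMS sketch above: the only change is the normalization by ε + ∥Φ(x(n))∥² = ε + κ(x(n), x(n)). Note that dictionary-based variants of KNLMS [26] normalize over a whole kernelized regressor instead, so this single-sample form is a simplification.

```python
import numpy as np

def knlms(X, d, mu=0.5, sigma=1.0, eps=1e-6):
    """KNLMS, Eq. (15): the KLMS update divided by eps + ||Phi(x(n))||^2."""
    kern = lambda u, v: np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))
    N = len(d)
    a = np.zeros(N)
    e = np.zeros(N)
    for n in range(N):
        y = sum(a[j] * kern(X[j], X[n]) for j in range(n))
        e[n] = d[n] - y
        a[n] = mu * e[n] / (eps + kern(X[n], X[n]))  # normalized update
    return a, e
```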
4. Simulation Results and Discussion

To validate the efficacy of the presented algorithms in the presence of Gaussian additive noise, simulations were performed, focusing on nonlinear system identification with binary-valued output, utilizing the Gaussian kernel:

$$\kappa(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right), \quad \forall (x, y) \in \mathcal{X}^2, \tag{16}$$

where σ > 0 represents the smoothing parameter.
The simulations involve passing a signal x(n) drawn from a normal distribution with mean 0 and variance 1 through a Hammerstein system. This system consists of a tanh(x) nonlinearity followed by a linear finite impulse response (FIR) channel. The linear channel uses an impulse response h of length L = 5 known as the Proakis C channel, with coefficients [0.227, 0.460, 0.688, 0.460, 0.227]. Additionally, Gaussian white noise with a power of 20 dB is added to the channel output during each of the 1024 iterations. Finally, with the aid of a binary detector I[·] that employs a predefined threshold C ∈ R, the system's output d(n) becomes measurable. The quantized output data s(n) can be represented mathematically as follows:

$$s(n) = I_{[d(n) \geq C]} = \begin{cases} 1 & \text{if } d(n) \geq C \\ -1 & \text{otherwise} \end{cases} \tag{17}$$
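The data-generating side of this experiment can be sketched as follows. The threshold C = 0 is an illustrative assumption (the text only states C ∈ R), and the noise level is set so that the output SNR is 20 dB:

```python
import numpy as np

rng = np.random.default_rng(3)

N, snr_db, C = 1024, 20.0, 0.0
h = np.array([0.227, 0.460, 0.688, 0.460, 0.227])  # Proakis C channel

x = rng.standard_normal(N)                         # x(n) ~ N(0, 1)
v = np.convolve(np.tanh(x), h, mode="full")[:N]    # Hammerstein output, Eq. (1)
noise_std = np.sqrt(np.var(v) / 10 ** (snr_db / 10))
d = v + noise_std * rng.standard_normal(N)         # noisy output d(n)

s = np.where(d >= C, 1, -1)                        # binary detector, Eq. (17)
```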

4.1. Proakis C Channel Identification

Figure 4 shows the impulse response parameters of the Proakis C channel estimated using both algorithms (KLMS and KNLMS). The estimates were obtained for SNR = 15 dB and N = 1024 input signal samples, using 50 Monte Carlo iterations. The results indicate that the kernel normalized least mean squares (KNLMS) algorithm accurately estimates the response parameters, while the KLMS algorithm produces estimated values that differ significantly from the measured values.

Fig. 4. Estimation of the Proakis C channel impulse response, for a data length of N = 1000 and SNR = 20 dB (amplitude vs. time in samples; measured Proakis C vs. estimates using KLMS and KNLMS).

To assess the frequency domain performance of both algorithms, we visualize the estimated amplitude and phase response of the Proakis C channel's impulse response for a sample size of N = 1000 and SNR = 20 dB. Figure 5 highlights the estimates of the amplitude and phase of the Proakis C channel, using the KLMS and KNLMS algorithms. Based on these results, the KNLMS algorithm proves to be more effective than the KLMS algorithm, as it allows us to obtain the same shapes of the estimated amplitude and phase values as those measured.

Fig. 5. Estimation of the Proakis C amplitude and phase for a data length of N = 3000 and SNR = 20 dB (magnitude in dB and phase in degrees vs. normalized frequency; measured Proakis C vs. estimates using KLMS and KNLMS).

4.2. Output Data Estimation

In Figs. 6 and 7, the estimation of the output d(n) for nonlinear system identification without binary-valued output observations is demonstrated using KLMS and KNLMS. The lower graphs depict the complete signal form for a data length of N = 1000, while the upper graphs focus on data lengths between 590 and 640 to provide a more detailed view of the processed signals.
It should be noted that with the KNLMS algorithm, the estimated output d(n) follows the true model in perfect agreement with the measured data (Fig. 7). In comparison, with the KLMS algorithm, we observe that the estimated output d(n) follows the variations of the real output with some fluctuations. The performance of the KLMS algorithm degrades when estimating the output for small sample sizes N < 100, and there is a significant difference between the shape of the estimated and measured output (Fig. 6).

Fig. 6. Output d(n) estimation using the KLMS algorithm: a) zoomed-in between 590–640 samples, b) full 1000 samples.

Fig. 7. Output d(n) estimation using the KNLMS algorithm: a) zoomed-in between 590–640 samples, b) full 1000 samples.

To illustrate the performance of adaptive kernel filter algorithms (KLMS and KNLMS) based on binary output data, the identification is applied to Hammerstein models with different complexities and binary output.
The estimation of the binary output s(n) as a function of the number of samples is presented in Figs. 8 and 9, using KNLMS and KLMS for an SNR of 20 dB. It appears that the estimated binary output takes the same form as the measured output data. The estimation of the binary output is done with high accuracy using the KNLMS algorithm. In the case of the KLMS algorithm, we observe a difference in some samples (Fig. 8).

Fig. 8. Binary output s(n) estimation using the KLMS algorithm: a) zoomed-in between 410–590 samples, b) full 1000 samples.

The KNLMS algorithm is often considered more effective than the KLMS algorithm for system identification because it can produce more accurate estimates of the output data.


Fig. 9. Binary output s(n) estimation using the KNLMS algorithm: a) zoomed-in between 410–590 samples, b) full 1000 samples.

The main reason for this is that the KNLMS algorithm includes a normalization step that improves the stability and convergence of the algorithm. This is particularly important in noisy environments, where the impact of noise on the estimated impulse response can be reduced, leading to improved accuracy in the channel identification process.
Based on the results obtained, it can be concluded that kernel adaptive filters are effective in identifying nonlinear systems with binary output measurements. Nonlinear systems with binary output can offer various advantages. Firstly, binary output is easier to process and interpret compared to continuous output due to its simplicity, which results in lower computational and storage requirements. Secondly, binary output is more robust to noise and interference than continuous output. Binary signals are less affected by small variations or fluctuations in the input signal, thus enhancing the accuracy and reliability of the system.

5. Conclusion

In our research paper, we delved into the intricate task of identifying nonlinear systems characterized by binary output data, and we addressed this challenge through the implementation of adaptive kernel filtering algorithms. Our specific application honed in on the estimation of parameters associated with the Proakis channel, a scenario with inherent complexities. The Proakis channel is known for its practical relevance in communication systems, and accurate estimation of its parameters is crucial for optimizing system performance.
In the course of our investigations, we focused on the Hammerstein system identification problem, aiming to discern the most effective algorithm for binary output data estimation and channel impulse response parameter estimation. Notably, simulations unveiled compelling results, showcasing that, in this context, the KNLMS algorithm exhibited superior performance when compared to the KLMS algorithm.
The KNLMS algorithm's proficiency in handling the unique challenges posed by binary output data and accurately estimating the channel impulse response parameters in the Hammerstein system underscores its efficacy in scenarios where nonlinearities and binary responses are prevalent. Our findings contribute valuable insights to the field of system identification, especially in applications where binary output data and the Hammerstein system model are prominent, offering researchers and practitioners a promising tool for enhancing the accuracy and reliability of their models. These results pave the way for further exploration and refinement of adaptive kernel filtering algorithms in the realm of nonlinear system identification.
Moving forward, our primary emphasis will be on enhancing the kernel algorithm to enable the identification of measurable frequency-selective fading radio channels.

References

[1] S. Haykin, Adaptive Filter Theory, 4th ed., Hoboken: Prentice Hall, 920 p., 2002 (ISBN: 9780130901262).
[2] M.M. Sondhi, "The History of Echo Cancellation", IEEE Signal Processing Magazine, vol. 23, no. 5, pp. 95–102, 2006 (https://doi.org/10.1109/MSP.2006.1708416).
[3] A.H. Sayed, Fundamentals of Adaptive Filtering, Hoboken: John Wiley & Sons, 1168 p., 2003 (ISBN: 9780471461265).
[4] P.S.R. Diniz, Adaptive Filtering: Algorithms and Practical Implementations, 3rd ed., New York: Springer, 652 p., 2008 (https://doi.org/10.1007/978-1-4614-4106-9).
[5] R. Fateh, A. Darif, and S. Safi, "Kernel and Linear Adaptive Methods for the BRAN Channels Identification", in: International Conference on Advanced Intelligent Systems for Sustainable Development, Tangier, Morocco, pp. 579–591, 2020 (https://doi.org/10.1007/978-3-030-90639-9_47).
[6] R. Fateh, A. Darif, and S. Safi, "Identification of the Linear Dynamic Parts of Wiener Model Using Kernel and Linear Adaptive", in: International Conference on Advanced Intelligent Systems for Sustainable Development, Tangier, Morocco, pp. 387–400, 2020 (https://doi.org/10.1007/978-3-030-90639-9_31).
[7] E. Ferrara, "Fast Implementations of LMS Adaptive Filters", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 474–475, 1980 (https://doi.org/10.1109/tassp.1980.1163432).
[8] S. Sanei and J.A. Chambers, EEG Signal Processing, Hoboken: John Wiley & Sons, 289 p., 2013 (https://doi.org/10.1002/9780470511923).
[9] M. Krstic, I. Kanellakopoulos, and P.V. Kokotovic, Nonlinear and Adaptive Control Design, Hoboken: John Wiley & Sons, 592 p., 1995 (ISBN: 9780471127321).
[10] R. Fateh, A. Darif, and S. Safi, "Channel Identification of Non-linear Systems with Binary-Valued Output Observations Based on Positive Definite Kernels", E3S Web of Conferences, vol. 297, art. no. 01020, 2021 (https://doi.org/10.1051/e3sconf/202129701020).
[11] R. Fateh, A. Darif, and S. Safi, "Hyperbolic Functions Impact Evaluation on Channel Identification Based on Recursive Kernel Algorithm", 2022 8th International Conference on Optimization and Applications (ICOA), Genoa, Italy, 2022 (https://doi.org/10.1109/ICOA55659.2022.9934118).
[12] J.W. Xu, A.R. Paiva, I. Park, and J.C. Principe, "A Reproducing Kernel Hilbert Space Framework for Information-theoretic Learning", IEEE Transactions on Signal Processing, vol. 56, no. 12, pp. 5891–5902, 2008 (https://doi.org/10.1109/TSP.2008.2005085).


[13] W. Liu, J.C. Principe, and S. Haykin, Kernel Adaptive Filtering: A Comprehensive Introduction, Hoboken: John Wiley & Sons, 240 p., 2010 (ISBN: 9780470447536).
[14] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 462 p., 2004 (https://doi.org/10.1017/CBO9780511809682).
[15] N. Aronszajn, "Theory of Reproducing Kernels", Transactions of the American Mathematical Society, vol. 68, no. 3, pp. 337–404, 1950 (https://doi.org/10.2307/1990404).
[16] B. Scholkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, Cambridge: MIT Press, 2002 (https://doi.org/10.7551/mitpress/4175.001.0001).
[17] C. Cortes and V. Vapnik, "Support-vector Networks", Machine Learning, vol. 20, no. 3, pp. 273–297, 1995 (https://doi.org/10.1007/bf00994018).
[18] B. Scholkopf, A. Smola, and K.R. Muller, "Kernel Principal Component Analysis", in: International Conference on Artificial Neural Networks (ICANN), Lausanne, Switzerland, pp. 583–588, 1997 (https://doi.org/10.1007/bfb0020217).
[19] M.A. Hearst, S.T. Dumais, E. Osuna, J. Platt, and B. Scholkopf, "Support Vector Machines", IEEE Intelligent Systems and Their Applications, vol. 13, no. 4, pp. 18–28, 1998 (https://doi.org/10.1109/5254.708428).
[20] M. Zidane and R. Dinis, "A New Combination of Adaptive Channel Estimation Methods and TORC Equalizer in MC-CDMA Systems", International Journal of Communication Systems, vol. 33, no. 11, art. no. 4429, 2020 (https://doi.org/10.1002/dac.4429).
[21] M. Zidane, S. Safi, and M. Sabri, "Measured and Estimated Data of Non-linear BRAN Channels Using HOS in 4G Wireless Communications", Data in Brief, vol. 17, pp. 1136–1148, 2018 (https://doi.org/10.1016/j.dib.2018.02.005).
[22] R. Fateh, A. Darif, and S. Safi, "Performance Evaluation of MC-CDMA Systems with Single User Detection Technique using Kernel and Linear Adaptive Method", Journal of Telecommunications and Information Technology, no. 4, pp. 1–11, 2021 (https://doi.org/10.26636/jtit.2021.151621).
[23] M. Zidane, S. Safi, and M. Sabri, "Compensation of Fading Channels Using Partial Combining Equalizer in MC-CDMA Systems", Journal of Telecommunications and Information Technology, no. 1, pp. 5–11, 2017 (http://dlibra.itl.waw.pl/dlibra-webapp/Content/1962/ISSN_1509-4553_1_2017_5.pdf).
[24] S. Safi, M. Frikel, A. Zeroual, and M. M'Saad, "Higher Order Cumulants for Identification and Equalization of Multicarrier Spreading Spectrum Systems", Journal of Telecommunications and Information Technology, no. 1, pp. 74–84, 2011 (https://doi.org/10.26636/jtit.2022.161122).
[25] W. Liu, P.P. Pokharel, and J.C. Principe, "The Kernel Least-mean-square Algorithm", IEEE Transactions on Signal Processing, vol. 56, no. 2, pp. 543–554, 2008 (https://doi.org/10.1109/tsp.2007.907881).
[26] C. Richard, J. Bermudez, and P. Honeine, "Online Prediction of Time Series Data with Kernels", IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 1058–1067, 2009 (https://doi.org/10.1109/TSP.2008.2009895).
[27] R. Fateh, A. Darif, and S. Safi, "An Extended Version of the Proportional Adaptive Algorithm Based on Kernel Methods for Channel Identification with Binary Measurements", Journal of Telecommunications and Information Technology, no. 3, pp. 47–58, 2022 (https://doi.org/10.26636/jtit.2022.161122).
[28] S. Ciochina, C. Paleologu, and J. Benesty, "An Optimized NLMS Algorithm for System Identification", Signal Processing, vol. 118, pp. 115–121, 2016 (https://doi.org/10.1016/j.sigpro.2015.06.016).
[29] L. Ljung, System Identification: Theory for the User, 2nd ed., Hoboken: Prentice Hall, 640 p., 1999 (ISBN: 9780136566953).
[30] F. Ding, System Identification – Performances Analysis for Identification Methods, Beijing: Science Press, 2014.
[31] X. Zhang, F. Ding, F.E. Alsaadi, and T. Hayat, "Recursive Parameter Identification of the Dynamical Models for Bilinear State Space Systems", Nonlinear Dynamics, vol. 89, pp. 2415–2429, 2017 (https://doi.org/10.1007/s11071-017-3594-y).
[32] L. Xu, "Application of the Newton Iteration Algorithm to the Parameter Estimation for Dynamical Systems", Journal of Computational and Applied Mathematics, vol. 288, pp. 33–43, 2015 (https://doi.org/10.1016/j.cam.2015.03.057).
[33] Q. Song, "Recursive Identification of Systems with Binary-valued Outputs and with ARMA Noises", Automatica, vol. 93, pp. 106–113, 2018 (https://doi.org/10.1016/j.automatica.2018.03.059).
[34] J. Guo, X. Wang, W. Xue, and Y. Zhao, "System Identification with Binary-valued Observations under Data Tampering Attacks", IEEE Transactions on Automatic Control, vol. 66, no. 8, pp. 3825–3832, 2020 (https://doi.org/10.1109/TAC.2020.3029325).
[35] L. Li, F. Wang, H. Zhang, and X. Ren, "A Novel Recursive Learning Estimation Algorithm of Wiener Systems with Quantized Observations", ISA Transactions, vol. 112, pp. 23–34, 2021 (https://doi.org/10.1016/j.isatra.2020.11.032).
[36] L. Zhang, Y. Zhao, and L. Guo, "Identification and Adaptation with Binary-valued Observations under Non-persistent Excitation Condition", Automatica, vol. 138, art. no. 110158, 2022 (https://doi.org/10.1016/j.automatica.2022.110158).
[37] R. Fateh and A. Darif, "Mean Square Convergence of Reproducing Kernel for Channel Identification: Application to Bran D Channel Impulse Response", in: International Conference on Business Intelligence, Beni-Mellal, Morocco, 2021 (https://doi.org/10.1007/978-3-030-76508-8_20).
[38] W. Liu and J.C. Principe, "Kernel Affine Projection Algorithms", EURASIP Journal on Advances in Signal Processing, vol. 2008, art. no. 784292, 2008 (https://doi.org/10.1155/2008/784292).

Rachid Fateh, Ph.D.
Laboratory of Innovation in Mathematics, Applications, and Information Technologies, Polydisciplinary Faculty
https://orcid.org/0000-0002-0574-2105
E-mail: [email protected]
Sultan Moulay Slimane University, Beni Mellal, Morocco
https://www.usms.ac.ma

Hicham Oualla, Ph.D.
E-mail: [email protected]
Akkodis, Paris, France
http://www.akkodis.com

Es-said Azougaghe, Ph.D.
Laboratory of Information Processing and Decision Support
https://orcid.org/0000-0002-0233-1132
E-mail: [email protected]
Sultan Moulay Slimane University, Beni Mellal, Morocco
https://www.usms.ac.ma

Anouar Darif, Ph.D.
Laboratory of Innovation in Mathematics, Applications, and Information Technologies, Polydisciplinary Faculty
https://orcid.org/0000-0001-8026-9189
E-mail: [email protected]
Sultan Moulay Slimane University, Beni Mellal, Morocco
https://www.usms.ac.ma


Ahmed Boumezzough, Ph.D.
Laboratory of Research in Physics and Engineering Sciences, Polydisciplinary Faculty
E-mail: [email protected]
Sultan Moulay Slimane University, Beni Mellal, Morocco
https://www.usms.ac.ma

Said Safi, Ph.D.
Laboratory of Innovation in Mathematics, Applications, and Information Technologies, Polydisciplinary Faculty
https://orcid.org/0000-0003-3390-9037
E-mail: [email protected]
Sultan Moulay Slimane University, Beni Mellal, Morocco
https://www.usms.ac.ma

Mathieu Pouliquen, Ph.D.
Laboratory of Engineering Systems, UNICAEN, ENSICAEN
E-mail: [email protected]
Normandie University, Caen, France
https://www.unicaen.fr

Miloud Frikel, Ph.D.
Laboratory of Engineering Systems, UNICAEN, ENSICAEN
https://orcid.org/0000-0003-2178-1814
E-mail: [email protected]
Normandie University, Caen, France
https://www.unicaen.fr
