Sci 04 00040
1 Department of Mechanical Engineering, Pennsylvania State University, University Park, PA 16802, USA
2 Department of Mechanical Engineering, University of Kerbala, Kerbala 56001, Iraq
3 Department of Mathematics, Pennsylvania State University, University Park, PA 16802, USA
4 Department of Electrical Engineering, Pennsylvania State University, University Park, PA 16802, USA
* Correspondence: [email protected] (N.F.G.); [email protected] (A.R.); [email protected] (W.K.J.)
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

The concept of functional analysis is built upon normed vector spaces and particularly inner product spaces, which are merged with diverse notions of topology and geometry, linear algebra, probability theory, and real and complex analysis (see, e.g., [1–5]). Topics in functional analysis include various concepts such as Banach spaces and Hilbert spaces, and linear operators and their spectral theory, as well as group and semigroup theory. Knowledge of these mathematical structures is often essential for understanding and solving a variety of analytical problems in signal processing and related fields, as well as in mathematics itself [5]. For example, in functional analysis, objects like functions are considered as elements or points in a space of functions [6], and hence the name functional analysis.
Results, generated from functional analysis, form key concepts in the frameworks of
advanced scientific and engineering disciplines that include the fields of statistical signal
processing and adaptive signal processing. Although adaptive signal processing can be
viewed as a branch of statistical signal processing [7], the special properties of this field and
their roles in engineering applications have led many specialists to consider them as two
separate fields. Therefore, in many universities and research institutions around the world,
statistical signal processing and adaptive signal processing are taught as independent
graduate courses in engineering and applied sciences, and many textbooks have been
devoted to studying these important fields individually (e.g., see [8,9] and references therein).
Nevertheless, both statistical signal processing and adaptive signal processing form the
backbone of so-called modern signal processing, in which signals are generally considered
as random processes. Modern signal processing covers many topics of current interest, such
as signal modeling and estimation, signal prediction, signal compression, adaptive lattice
filtering, adaptive joint process estimation, recursive least squares lattice filtering, and
spectrum estimation. The issues, related to processing of both deterministic and random
signals, are further discussed below.
While an estimation error may typically converge to zero for deterministic signals,
this is generally not the case for random signals [8]. Therefore, in statistical and adaptive
signal processing, it is common practice to make the estimates of random signals unbiased
(i.e., the expectation of the estimation error converges to zero). As explained later, this type of
convergence is of a special kind, which is known in functional analysis as weak convergence
(see, for example, [10,11]). Therefore, many important results in functional analysis are
obtained in terms of weak convergence and weak topology, which potentially have significant
implications to the subfields of statistical signal processing and adaptive signal processing.
Moreover, it is usually desirable in estimation theory to identify optimal filters, which
bridges the discipline of signal processing to that of optimization theory. To this end,
researchers in modern signal processing often deal with random processes for which
optimization problems become more challenging, and the usage of advanced mathematical
tools is justified.
From a historical perspective, the names of some of the spaces used in functional
analysis are those of the early mathematicians who originally developed the theories
of these spaces. Indeed, much of the theoretical work has been associated with the names
of eminent mathematicians (e.g., Gauss, Lagrange, Euler, and Kolmogorov). In fact, the
Hilbert space, which is a central topic in functional analysis, is one of the most commonly
used mathematical frameworks of signal processing and the associated optimization [12].
The unique features of Hilbert spaces are explained in the paper from these perspectives.
However, the names of other well-known spaces (e.g., metric spaces and normed spaces)
were given based on the technical properties of these spaces; many of the spaces frequently
used in functional analysis have been named based on quite different historical backgrounds.
We have presented a concise and focused review of key concepts of functional analysis
in this paper, which have strong relevance to modern signal processing. The most
important spaces from the perspectives of functional analysis, considered in this paper, are
metric/topological spaces, Banach spaces, and Hilbert spaces. The relations among these
and other spaces are illustrated in Figure 1. Other relevant vector spaces, such as the summable
(ℓ^p), Lebesgue-integrable (L^p), and Hardy (H^p) spaces, are also introduced in the paper.
Sci 2022, 4, 40 3 of 28
The paper is organized in four sections, including the current section, and an Appendix A.
Section 2 introduces Banach spaces and their relevant theorems, where special emphases
are laid on the ` p /L p spaces, H p spaces, spectral factorization, and weak topology in the
setting of Banach spaces. Section 3 presents Hilbert spaces and their relevant features (e.g.,
Fourier series expansion and the orthogonality principle) along with some applications to
signal processing and detection theory, such as wavelets, Karhunen-Loève (KL) expansion,
and reproducing-kernel Hilbert spaces (RKHS). Section 4 summarizes and concludes the
paper. The Appendix A in this paper introduces elementary concepts and definitions
in real analysis, probability theory, and topological spaces, which should be helpful for
understanding the fundamental principles of functional analysis as applied to various
concepts of signal processing; however, readers who are already familiar with these concepts
may refer to Appendix A only selectively.
Definition 1 (Banach Spaces). Let a vector space X be defined over a field K, where examples of
K are the field of real numbers (ℝ, +, ·) and the field of complex numbers (ℂ, +, ·). Let a function
‖•‖ : X → ℝ, called a norm and denoted as x ↦ ‖x‖, have the following properties:
• (positivity) ∀x ∈ X, ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0;
• (homogeneity) ∀x ∈ X and ∀c ∈ K, ‖cx‖ = |c| ‖x‖;
• (triangle inequality) ∀x, y ∈ X, ‖x + y‖ ≤ ‖x‖ + ‖y‖.
Then, (X, ‖•‖) is called a normed vector space, where the norm ‖•‖ induces a metric.
A real (resp. complex) normed linear space that is complete (i.e., where every Cauchy sequence
converges in the space) is called a real (resp. complex) Banach space.
$$\|x\|_{\ell^p} \triangleq \left( \sum_{n=-\infty}^{\infty} |x_n|^p \right)^{1/p} \quad \text{if } 1 \le p < \infty$$
$$\|x\|_{\ell^\infty} \triangleq \sup_{n \in \mathbb{Z}} |x_n| \quad \text{if } p = \infty$$
Some of the theorems on ` p spaces [2], which are extensively used in the analyses of
discrete-time signals, are presented below.
Proof. Suppose, to the contrary, that |x_n| does not converge to 0. Then, there exists a subsequence {|x_{n_j}|}
bounded below by a real number ε > 0, which implies that {|x_{n_j}|^p} is bounded below by
ε^p, so that the series ∑_n |x_n|^p diverges. This contradicts the assumption {x_n} ∈ ℓ^p.
Let a linear discrete-time dynamical system with an impulse response h[n, k ] be excited
by an input signal u to yield an output signal y.
For a linear shift-invariant (LSI) system, the impulse response h[n, k] takes the form
h[n − k], where the output is given by the convolution y = h ? u as [14]:
$$y[n] = \sum_{k=-\infty}^{\infty} h[n-k]\, u[k] = \sum_{k=-\infty}^{\infty} u[n-k]\, h[k] \tag{1}$$
Using Theorem 3, if h ∈ ℓ¹ and u ∈ ℓ^p for some p ∈ [1, ∞], then it follows that [13]:
$$\|y\|_{\ell^p} \le \|h\|_{\ell^1}\, \|u\|_{\ell^p}$$
It is noted that h ∈ ℓ¹ is a sufficient condition for the system to be ℓ^p-stable. Furthermore,
using Lemma 1, it follows that if y ∈ ℓ^p for some p ∈ [1, ∞), then y[n] → 0 as n → ∞.
This information is useful, for example, in the design of a linear shift-invariant estimation
system, where the output signal represents the estimation error. If the system impulse
response is h ∈ ℓ¹, then the estimation error is bounded and converges asymptotically to
zero if the input signal u ∈ ℓ^p for some p ∈ [1, ∞).
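The bound above can be checked numerically; the sketch below is a minimal illustration, where the geometric impulse response and the random input are arbitrary choices, not taken from the text.

```python
# Numerical check of the bound ||y||_p <= ||h||_1 ||u||_p (Young's
# inequality for convolution, with h in l^1). The impulse response and
# input below are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
h = 0.5 ** np.arange(20)            # absolutely summable impulse response
u = rng.standard_normal(200)        # a finite-length input sequence
y = np.convolve(h, u)               # y = h * u

def norm(v, p):
    return (np.abs(v) ** p).sum() ** (1.0 / p)

p = 2.0
assert norm(y, p) <= norm(h, 1.0) * norm(u, p)
```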
Example 2 (Adaptive Filtering). In a general setting, let us consider the adaptive filtering problem
in Figure 2, where a measurement vector x[n] ≜ [x₁[n], x₂[n], …, x_N[n]]^T is used to construct an
estimate, d̂[n] ≜ (h ⋆ x)[n], of the desired signal d[n] by a linear shift-variant filter h[n] [7]. Then,
the task is to synthesize an adaptive algorithm to update the filter h[n] such that the estimation error
e[n] ≜ d[n] − d̂[n] → 0 as n → ∞. Using Lemma 1, this could be achieved if e ∈ ℓ^p for some
p ∈ [1, ∞) in the adaptive algorithm.
Figure 2. An adaptive filter consisting of a shift-variant filter h with an adaptive algorithm for
updating the filter coefficients.
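The text leaves the choice of adaptive algorithm open; a minimal sketch using LMS (one standard choice) is given below. The "unknown" system w_true, the step size mu, and the white measurement sequence are illustrative assumptions, not from the text.

```python
# A minimal LMS sketch of the adaptive update called for in Example 2:
# the filter w is adjusted so that the error e[n] = d[n] - d_hat[n]
# decays. All numerical values here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([0.8, -0.4, 0.2])     # system generating d[n] (assumed)
N, mu = len(w_true), 0.05               # filter length and step size
w = np.zeros(N)                         # adaptive filter coefficients
x = rng.standard_normal(5000)           # measurement sequence
errs = []
for n in range(N, len(x)):
    xv = x[n - N:n][::-1]               # regressor [x[n-1], ..., x[n-N]]
    d = w_true @ xv                     # desired signal d[n]
    e = d - w @ xv                      # estimation error e[n]
    w += mu * e * xv                    # LMS coefficient update
    errs.append(e * e)

assert np.mean(errs[-100:]) < np.mean(errs[:100])   # error has decayed
```

In this noiseless setup the coefficients converge to w_true, so the squared error drives toward zero, consistent with the requirement e[n] → 0.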
If a dynamical system at any time n does not depend on the future (i.e., the system
is only dependent on the past and the present) input(s), then the system is said to be
causal [15] and the convolution in Equation (1) reduces to
$$y[n] = \sum_{k=-\infty}^{n} h[n-k]\, u[k] \tag{2}$$
which is known as the system transfer function. (The one-dimensional z-transform of the
discrete-time impulse response h[k] is the ratio of two polynomials, H(z) ≜ N(z)/D(z), where
the degree of N(z) is less than or equal to that of D(z) for physically realizable systems.
However, for the multi (i.e., n)-dimensional z-transform, where n ∈ ℕ∖{1}, the resulting
transfer function is given as the ratio of the numerator and denominator multinomials:
$$H(\mathbf{z}) \triangleq \frac{N(z_1, \cdots, z_n)}{D(z_1, \cdots, z_n)}$$)
Given p ∈ [1, ∞], the Hardy space H^p is a set of analytic functions f(re^{iθ}) with
bounded H^p-norm, defined as:
$$\|f\|_{H^p} \triangleq \sup_{r \in (0,1)} \left( \frac{1}{2\pi} \int_0^{2\pi} |f(re^{i\theta})|^p \, d\theta \right)^{1/p} \quad \text{for } p \in [1, \infty)$$
The following theorem, due to Paley and Wiener [14], presents a fundamental result
in the H1 -space, which is important for spectral factorization in signal processing and for
innovation representation of random processes.
where the superscript "her" indicates the Hermitian, i.e., the complex conjugate of the transpose of a
vector/matrix, and z̄ is the complex conjugate of z. If, in addition, S(z) is a rational polynomial,
the above factors H_ca(z) and H_ca^her(1/z̄) are minimum-phase and maximum-phase components,
Proof. The proof of the Paley-Wiener Theorem is given in detail by Therrien [14].
It follows from Equation (1) that, for a linear shift-invariant stable system with a
deterministic LSI impulse response h[n] and a wide sense stationary (WSS) input signal
u[n], the expected value of the output y[n] is:
$$E[y[n]] = \sum_{k=-\infty}^{\infty} h[n-k]\, E[u[k]] \tag{6}$$
Since the input u is WSS, the expected values m_y and m_u of the output y and input u,
respectively, are related as:
$$m_y = \left( \sum_{k=-\infty}^{\infty} h[k] \right) m_u \tag{7}$$
Autocorrelation of a random vector x[k] is denoted as r_xx[k] ≜ E[x[k] x^her[k]].
The above equation leads to the following important relations between correlation
functions [14]:
$$r_{yu}[\ell] = h[\ell] * r_{uu}[\ell] \quad \text{and} \quad r_{yy}[\ell] = h[\ell] * r_{uu}[\ell] * h^{her}[-\ell] \tag{9}$$
where the superscript her indicates the Hermitian, i.e., the complex conjugate of transpose
of a vector/matrix.
The Fourier transform of r xx [k] for a WSS random sequence x[k ] is called the power
spectral density function [7], defined as:
$$S_{xx}(e^{i\omega}) \triangleq \sum_{k=-\infty}^{\infty} e^{-i\omega k}\, r_{xx}[k] \tag{10}$$
and its inverse Fourier transform, which is equal to the autocorrelation function, is
obtained as:
$$r_{xx}[k] = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i\omega k}\, S_{xx}(e^{i\omega})\, d\omega \tag{11}$$
The z-transform of the autocorrelation function for a WSS random sequence x[n] is
called the complex spectral density function and is defined as:
$$S_{xx}(z) \triangleq \sum_{k=-\infty}^{\infty} r_{xx}[k]\, z^{-k} \tag{12}$$
and the inverse transform is given by the contour integral:
$$r_{xx}[k] = \frac{1}{2\pi i} \oint_C z^{k-1}\, S_{xx}(z)\, dz \tag{13}$$
Since the autocorrelation function of a zero-mean white noise with variance σ_w² is given
by r_ww[k] ≜ σ_w² δ[k], the power spectral density is a constant σ_w² for a stationary white noise.
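This flat-spectrum property can be verified on sample data; the sketch below uses an arbitrary variance and sample size to estimate the autocorrelation at lags 0 and 3.

```python
# Sample check that zero-mean white noise with variance sigma^2 has
# autocorrelation (approximately) sigma^2 * delta[k], hence a flat
# spectrum. The variance and sample size are arbitrary choices.
import numpy as np

rng = np.random.default_rng(5)
sigma = 1.5
w = sigma * rng.standard_normal(500_000)

r0 = np.mean(w * w)                 # sample r_ww[0]
r3 = np.mean(w[3:] * w[:-3])        # sample r_ww[3]
assert abs(r0 - sigma**2) < 0.05    # close to sigma^2 = 2.25
assert abs(r3) < 0.02               # close to zero at a nonzero lag
```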
Using the property that the convolution in the time domain is a product in the Fourier
transform domain and using Equation (9), it follows that
where H (eiω ) is the system transfer function (i.e., the Fourier transform of h[k ]). A few
algebraic computations yield the following relation [14]:
In a similar manner, the following relations are obtained for the complex spectral
density
Let us consider a WSS random sequence { x [k]} whose complex spectral density
satisfies the Paley-Wiener condition:
$$\ln(S_{xx}) \in H^1, \quad \text{i.e.,} \quad \int_{-\pi}^{\pi} |\ln S_{xx}(e^{i\omega})|\, d\omega < \infty \tag{17}$$
Remark 1. A process, whose (complex) spectral density satisfies Equation (17), is called a regular
process (see [7,14]). The spectral density factorization given by Equation (18) has important
applications in signal processing. This includes what is called innovations representation of the
random process [14], in view of which, any regular process can be realized as the output of a causal
linear filter Hca (z) driven by a white noise with variance K0 as shown in Figure 3.
It is worth mentioning that this type of process covers a wide range of random processes. In
particular, any process whose complex spectral density is a rational function of z is a regular process.
Figure 3. Innovations representation of a random process. (a) Signal model. (b) Inverse filter.
Example 3 ([14]). Consider a random sequence x[n] with a complex spectral density function:
$$S_{xx}(z) = \frac{-(1/a)}{z - (a + 1/a) + z^{-1}}$$
which could be re-written as:
$$S_{xx}(z) = \frac{1}{-az + (1 + a^2) - az^{-1}} = \frac{1}{(1 - az^{-1})} \cdot \frac{1}{(1 - az)}$$
Using the Paley-Wiener Theorem, x[n] can be realized as the output of a causal, stable system,
given by:
$$H_{ca}(z) = \frac{1}{1 - az^{-1}} \tag{19}$$
excited by a zero-mean white noise with unit variance σ² = 1. It is important to note that since
S_xx(z) is a rational polynomial, H_ca(z) should be minimum-phase. This is the case for the one given
by Equation (19).
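The realization in Example 3 can be simulated directly: driving H_ca(z) = 1/(1 − a z⁻¹), i.e., the recursion x[n] = a x[n−1] + w[n], with unit-variance white noise should produce the autocorrelation r_xx[k] = a^|k|/(1 − a²) implied by S_xx(z). The value of a and the sample size below are illustrative choices.

```python
# Simulating the innovations representation of Example 3 and checking
# the sample autocorrelation against r_xx[k] = a^|k| / (1 - a^2).
import numpy as np

rng = np.random.default_rng(2)
a, M = 0.6, 200_000
w = rng.standard_normal(M)          # zero-mean, unit-variance white noise
x = np.empty(M)
x[0] = w[0]
for n in range(1, M):
    x[n] = a * x[n - 1] + w[n]      # output of the causal filter H_ca(z)

r0 = np.mean(x * x)                 # sample r_xx[0]
r1 = np.mean(x[1:] * x[:-1])        # sample r_xx[1]
assert abs(r0 - 1 / (1 - a * a)) < 0.05   # theory: 1/(1 - a^2)
assert abs(r1 / r0 - a) < 0.02            # theory: r_xx[1]/r_xx[0] = a
```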
Since the function can be factored as:
$$S_{xx}(z) = \frac{1}{-az + (1 + a^2) - az^{-1}} = \frac{1}{(z - a)} \cdot \frac{1}{(z^{-1} - a)}$$
a possible pitfall here is to choose
$$H_{ca}(z) = \frac{1}{z - a} = \frac{z^{-1}}{1 - az^{-1}} \tag{20}$$
The term in Equation (20) is not minimum-phase because it has a zero at |z| = ∞. Moreover,
the inverse H_ca^{−1}(z) = z − a is not causal. Therefore, the spectral factorization with H_ca(z) given
by Equation (20) is not physically realizable for the given random sequence {x[k]}.
As mentioned before, any random process whose complex spectral density is a rational
polynomial is a regular process, and therefore it satisfies the Paley-Wiener condition.
However, this is not a necessary condition for being a regular process as seen in the
following example.
Example 4 ([14]). Let a random sequence x[n] have a complex spectral density S_x(z) = e^{z + z^{−1}}.
Then, the corresponding power spectral density S_x(e^{iω}) = e^{2 cos ω} satisfies the Paley-Wiener
condition, which is given as:
$$\int_{-\pi}^{\pi} |\ln S_x(e^{i\omega})|\, d\omega = \int_{-\pi}^{\pi} |2\cos\omega|\, d\omega < \infty$$
Therefore, the given random sequence is regular and has an innovations representation. The
spectral factorization can be done as follows:
$$S_x(z) = 1 \cdot e^{z} \cdot e^{z^{-1}}$$
So, the given random sequence can be realized as the output of a system, with a transfer
function given by Equation (21), which is driven by a zero-mean white noise with a unit variance
(i.e., σ2 = 1).
In fact, a regular process is related to a corresponding predictable process that can
be predicted with zero error. The relation between these two processes is given by the
following fundamental theorem [7].
Theorem 5 (Wold Decomposition Theorem). A general random sequence x[n] can be written
as the sum of two processes as:
$$x[n] = x_r[n] + x_p[n] \tag{22}$$
where x_r[n] is a regular process and x_p[n] is a predictable process, with x_r[n] being orthogonal to
x_p[n], i.e., E{x_r[m] x_p^{her}[n]} = 0 ∀ m, n.
where the center g is a vector/function in X and the radius r is a positive real number.
In this topology, convergence of a sequence {f_n} of functions in X to a limit g in X
is referred to as strong convergence, which implies that ‖g − f_n‖ → 0 and is denoted by
$f_n \xrightarrow{s} g$. Besides strong convergence, other notions of convergence (e.g., weak convergence
and uniform convergence) have been introduced in the literature, which play significant roles
in the theory of Banach algebras [1].
We now introduce the notions of weak convergence and weak topology. Given a
Banach space X over a field K, let F ≜ {F₁, F₂, ⋯} be a set of bounded linear functionals
on X. (A functional is a mapping of a vector space X into its field K; the set of all linear
bounded (equivalently, linear continuous) functionals on X is called the dual space X*.)
That is, each F_i is an element of the dual space X* and hence F ⊂ X*. Given an ε > 0 and a
vector/function f₀ ∈ X, let us define the set:
Definition 6 (Weak convergence). Let T_k ∈ BL(V, V) be a bounded linear operator from V into
V. Then, the sequence {T_k} converges weakly to some T ∈ BL(V, V) if
$$\forall F \in V^*\ \forall x \in V, \quad \lim_{k \to \infty} |F(Tx) - F(T_k x)| = 0,$$
which is denoted as $T_k \xrightarrow{w} T$.
We demonstrate the falsity of the converse by two counterexamples, one for each case.
(Strong convergence) ⇏ (Convergence in operator norm): Let us define x ≜ {ξ_n : n ∈ ℕ}
and a sequence of bounded linear operators T_k : ℓ² → ℓ² ∀k ∈ ℕ as:
$$T_k x = \{\underbrace{0, 0, \cdots, 0}_{\text{first } k \text{ terms}}, \xi_{k+1}, \xi_{k+2}, \cdots\}$$
For each fixed x ∈ ℓ², ‖T_k x‖_{ℓ²} → 0 as k → ∞, because it is the tail of a convergent series;
hence {T_k} converges strongly to the zero operator. However, the limit may not converge in the
induced norm,
$$\lim_{k \to \infty} \sup_{\|x\|_{\ell^2} = 1} \|T_k x\|_{\ell^2} \ne 0,$$
as seen by choosing x = {0, 0, ⋯, 0 (first k terms), ξ_{k+1}, ξ_{k+2}, ⋯} with ‖x‖_{ℓ²} = 1,
for which ‖T_k x‖_{ℓ²} = 1. Therefore, (Strong convergence) ⇏ (Convergence in operator norm).
(Weak convergence) ⇏ (Strong convergence): Let a sequence of operators T_k : ℓ² → ℓ²
be defined as:
$$T_k x = \{\underbrace{0, 0, \cdots, 0}_{\text{first } k \text{ terms}}, \xi_1, \xi_2, \cdots\}$$
where x ≜ {ξ_n : n ∈ ℕ}. It is given that {T_k} is a sequence of bounded linear operators, i.e., each
T_k ∈ BL(ℓ², ℓ²). Furthermore, in this Hilbert space setting, it follows from the Riesz Representation
Theorem that every f ∈ (ℓ²)* can be represented as:
$$f(x) = \langle x, y \rangle_{\ell^2} = \sum_{n=1}^{\infty} \xi_n \eta_n, \quad \text{where } y = \{\eta_k : k \in \mathbb{N}\}$$
Hence f(T_k x) = ∑_{n=1}^{∞} ξ_n η_{n+k} → 0 as k → ∞ for every such f, so {T_k} converges
weakly to the zero operator. However,
$$\|T_k x\|_{\ell^2} = \|x\|_{\ell^2}\ \forall k \in \mathbb{N} \;\Rightarrow\; \exists\, x \ne 0_{\ell^2} \text{ such that } \lim_{k \to \infty} \|T_k x\|_{\ell^2} \ne 0 \;\Rightarrow\; T_k \overset{s}{\nrightarrow} 0_{BL(\ell^2, \ell^2)}$$
Therefore, (Weak convergence) ⇏ (Strong convergence).
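The second counterexample (the right-shift operator) can be illustrated numerically; the sketch below truncates the sequences to finite length N, and the particular x and y are arbitrary ℓ² choices, not from the text.

```python
# Numerical illustration of the right-shift counterexample: for the
# k-step right shift T_k on l^2, the functional values f(T_k x) =
# <T_k x, y> decay with k, while ||T_k x|| never does. Sequences are
# truncated to length N; x and y are arbitrary l^2 choices.
import numpy as np

N = 10_000
x = 1.0 / np.arange(1, N + 1)       # an l^2 sequence
y = 1.0 / np.arange(1, N + 1)       # Riesz representer of a functional f

def shift(v, k):                    # T_k: right shift by k (truncated)
    return np.concatenate([np.zeros(k), v])[:N]

f_vals = [abs(np.dot(shift(x, k), y)) for k in (0, 10, 100, 1000)]
assert f_vals[0] > f_vals[1] > f_vals[2] > f_vals[3]   # f(T_k x) -> 0
# but the norm is preserved (exactly, up to the truncation):
assert np.isclose(np.linalg.norm(shift(x, 100)), np.linalg.norm(x[:N - 100]))
```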
Remark 3. It is noted that, for finite-dimensional vector spaces, the notions of strong convergence
and weak convergence are indistinguishable. Equivalently, we make the following statement:
In a finite-dimensional Banach space V, the weak topology generated by V ? is the same
as the strong topology generated by V.
However, in the analysis of stochastic processes, we deal with infinite-dimensional spaces of
signal functions, which may not have the same criteria for weak convergence and strong convergence.
This is especially applicable to statistical signal processing, where the expectation of the estimation
error is required to weakly converge to zero without having the strong convergence of the error signal
itself to zero.
Definition 7 (Convergence in weak topology). Given a Banach space X, let there be a class of
bounded linear functionals F ⊆ X*, and let ℑ(F) be the topology in X generated by F. Then,
for a given vector/function g ∈ X, a sequence {f_n} ⊂ X is said to converge to g in the weak
topology ℑ(F), denoted as $f_n \xrightarrow{w} g$ in ℑ(F), provided that F_α(f_n) converges strongly to F_α(g),
denoted as $F_\alpha(f_n) \xrightarrow{s} F_\alpha(g)$, ∀F_α ∈ F.
Remark 4. The concepts of topological spaces and weak topology are important for learning using
statistical invariants (LUSI). In a machine learning paradigm, learning machines often compute
statistical invariants for specific problems with the objective of reducing the expected values of
errors in such a way that these invariants are preserved. In contrast to classical machine learning,
which employs the mechanism of strong convergence for approximations to the desired function, LUSI can
significantly increase the rate of convergence by combining the mechanisms of strong convergence
and weak convergence [17]. Furthermore, the notion of weak topology is also important when dealing
with shift spaces for signal analysis that uses symbolic dynamics, as explained in [18,19].
Definition 8 (Hilbert Spaces). Let a vector space X be defined over a field K, which is either ℝ or
ℂ. A function ⟨•, •⟩ : X × X → K is called an inner product if, for ∀x, y, z ∈ X and ∀α ∈ K, the
following conditions hold:
1. (positive definiteness) ⟨x, x⟩ > 0 when x ≠ 0;
2. (additivity) ⟨(x + y), z⟩ = ⟨x, z⟩ + ⟨y, z⟩;
3. (homogeneity) ⟨αx, y⟩ = α⟨x, y⟩;
4. (conjugate symmetry) ⟨x, y⟩ = $\overline{\langle y, x \rangle}$.
Then, (X, ⟨•, •⟩) is called an inner product space or a pre-Hilbert space, and a complete
inner product space (i.e., where every Cauchy sequence converges in the space) is called a real (resp.
complex) Hilbert space, depending on whether the vector space is defined over ℝ (resp. ℂ).
The following two properties are immediate consequences of the four properties in
Definition 8:
• ⟨x, (y + z)⟩ = ⟨x, y⟩ + ⟨x, z⟩;
• ⟨x, αy⟩ = $\bar{\alpha}$⟨x, y⟩.
It is also noted that every inner product space is a normed space with the norm
‖x‖ ≜ √⟨x, x⟩ ∀x ∈ X [2].
Hilbert spaces have many interesting properties in common that make them important
in optimization theory [12]. As we will see in the sequel, these properties form the
core of many fundamental results in adaptive and statistical signal processing, and they are
established through the following theorem.
Theorem 6 (Riesz Representation Theorem [2,5]). Let H be a Hilbert space. Then, for every
bounded linear functional f : H → ℂ, there exists a unique y ∈ H such that f(x) = ⟨y, x⟩_H
∀x ∈ H.
Remark 5. For the Hilbert spaces ℓ² (resp. L²), this result can be obtained by using a theorem [5],
which states that, given p ∈ [1, ∞), ℓ^q (resp. L^q) is isometrically isomorphic to the dual space of ℓ^p
(resp. L^p) provided that 1/p + 1/q = 1, where q is called the conjugate of p. Since the conjugate of
p = 2 is q = 2, it follows that ℓ² is isometrically isomorphic to (ℓ²)*, and similar relations hold
for L² and (L²)* (for example, see [3]); hence, ℓ² and L² are reflexive. A generalization of this fact is
stated as the following theorem.
Theorem 7. Every Hilbert space is reflexive, i.e., H is isometrically isomorphic to its dual space
H∗.
Remark 6. It follows from Theorem 8 that the decomposition in Equation (22) in Section 2 is indeed
unique. Based on this fact, any random process generally consists of two unique orthogonal components:
a predictable component and an unpredictable component. That is, if one wants to predict x[n]
by using N past observations {x[n−N], x[n−N+1], …, x[n−1]}, then let x̂[n] = ∑_{k=1}^{N} a_k x[n−k] denote
an optimal linear prediction of x[n]. Such a prediction can be obtained by applying the orthogonality
principle, where the prediction error is given by e[n] ≜ x[n] − x̂[n] = x[n] − ∑_{k=1}^{N} a_k x[n−k],
and the process x[n] can be expressed as:
$$x[n] = \hat{x}[n] + e[n] = \hat{x}[n] + \left( x[n] - \sum_{k=1}^{N} a_k x[n-k] \right)$$
Hence, the part x̂[n] represents the predictable part of x[n], which corresponds to x_p[n] in the
Wold Decomposition Theorem in Equation (22), while the error e[n] represents the unpredictable part
of x[n], which corresponds to x_r[n]. That is, the regular process
represents the difference between the random process and its optimal prediction. Therefore, the
output of H_ca^{−1}(z) represents only the new part of the information brought by x[n], which cannot be
extracted from the past observations; accordingly, the output of H_ca^{−1}(z) is called the innovations process,
as depicted in Figure 3b.
Another interesting result on Hilbert spaces is stated in the following theorem [2,10].
$$\varphi_n(t) \triangleq \frac{e^{i\omega_n t}}{\sqrt{T}}; \quad \omega_n \triangleq 2\pi n / T \quad \forall n \in \mathbb{Z} \tag{28}$$
$$\langle \varphi_m, \varphi_n \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{imt}\, e^{-int}\, dt = \begin{cases} 1, & \text{if } m = n \\ 0, & \text{if } m \ne n \end{cases}$$
$$\lim_{N \to \infty} \left\| f - \sum_{k=-N}^{N} c_k \varphi_k \right\|_{L^2} = 0$$
That is,
$$f(t) \overset{ms}{=} \sum_{k=-\infty}^{\infty} c_k \varphi_k(t) \tag{30}$$
fact that, for each component ck , the value |ck |2 represents a part of the signal’s energy
contributed by the component ck . This fact plays a central role in signal compression, where a
signal f ∈ L2 ([t0 , t0 + T ]) is approximated by using as few Fourier coefficients as possible;
this is accomplished with a minimum approximation error by considering those values of
{ck } with large magnitudes and by discarding those coefficients with small magnitudes.
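The keep-the-largest-coefficients idea can be sketched with the DFT; the two-tone test signal below is an arbitrary choice (with only four nonzero DFT bins, so keeping K = 8 coefficients recovers it essentially exactly).

```python
# Compression sketch: keep only the K largest-magnitude DFT coefficients
# of a signal and measure the relative L^2 reconstruction error. The
# two-tone test signal is an arbitrary illustrative choice.
import numpy as np

N = 512
t = np.linspace(0, 1, N, endpoint=False)
f = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)
c = np.fft.fft(f)

K = 8
keep = np.argsort(np.abs(c))[-K:]      # indices of the K largest |c_k|
c_small = np.zeros_like(c)
c_small[keep] = c[keep]                # discard small-magnitude coefficients
f_rec = np.fft.ifft(c_small).real

rel_err = np.linalg.norm(f - f_rec) / np.linalg.norm(f)
assert rel_err < 1e-10
```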
Now we summarize the main results of Fourier series expansion of periodic functions
as a theorem.
Now we consider the Fourier transform of a signal f ∈ L²(ℝ). Since L²(ℝ) is the completion of
L¹(ℝ) ∩ L²(ℝ), we impose a mild restriction: f(t) is required to be both absolutely integrable and
square-integrable. Nevertheless, this restriction is satisfied if f is an analytic function [21].
To obtain the inverse Fourier transform, we substitute Equations (28) and (35) into
Equation (30), which yields
$$f(t) \overset{ms}{=} \frac{1}{T} \sum_{n=-\infty}^{\infty} e^{i\omega_n t}\, \hat{f}_T(\omega_n) \tag{37}$$
By defining ω_n ≜ n/T, we have Δω_n ≜ ω_{n+1} − ω_n = 1/T. Then, substitution of 1/T into
Equation (37) for Δω_n yields
$$f(t) \overset{ms}{=} \sum_{n=-\infty}^{\infty} \Delta\omega_n\, \hat{f}_T(\omega_n)\, e^{i\omega_n t} \tag{38}$$
In the limits T → ∞ and n → ∞, Equation (38) becomes the inverse Fourier transform,
with the sum interpreted as a Riemann sum [4]:
$$f(t) \overset{ms}{=} \int_{-\infty}^{\infty} e^{i\omega t}\, \hat{f}(\omega)\, d\omega \tag{39}$$
This formula shows that a signal f (t) ∈ L2 (R) has, at any given time t, (possibly)
uncountably many harmonic components distributed over the frequency range −∞ <
ω < ∞, and the magnitude of the harmonic component at a frequency ω is given by the
signal’s Fourier transform fˆ(ω ). By taking the limits T → ∞ and n → ∞, it follows from
Equations (33) and (34) that
$$\|f\|_{L^2}^2 = \int_{-\infty}^{\infty} |f(t)|^2\, dt = \int_{-\infty}^{\infty} |\hat{f}(\omega)|^2\, d\omega = \|\hat{f}\|_{L^2}^2 \tag{41}$$
The above relation is known as Plancherel's theorem [3], which implies that the total
energy of the signal, obtained in the time domain t ∈ ℝ, is re-distributed over the frequency
domain ω ∈ ℝ such that the energy density at each frequency ω is |f̂(ω)|². It is worth
mentioning that the inner products of two functions f and g in the time domain and the
frequency domain are related by:
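The energy identity in Equation (41) has a discrete analogue that can be checked directly; with the unitary ("ortho") normalization of the DFT, signal energy equals transform energy. The test vector below is an arbitrary choice.

```python
# Discrete analogue of the Plancherel relation in Equation (41): for the
# unitary-normalized DFT, sum |f|^2 equals sum |f_hat|^2.
import numpy as np

rng = np.random.default_rng(3)
f = rng.standard_normal(256) + 1j * rng.standard_normal(256)
f_hat = np.fft.fft(f, norm="ortho")    # unitary DFT
assert np.isclose(np.sum(np.abs(f) ** 2), np.sum(np.abs(f_hat) ** 2))
```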
where ḡ(·) is the complex conjugate of g(·), and supp(g) ⊂ [−T, 0] for a positive real
number T. Hence, f_t is a localized version of f and supp(f_t) ⊆ [t − T, t] [21]. Then, the
windowed Fourier transform (WFT) of f is the Fourier transform of f_t, which is given as:
$$\tilde{f}(\omega, t) \triangleq \hat{f}_t(\omega) = \int_{-\infty}^{\infty} e^{-i\omega u}\, f_t(u)\, du \tag{48}$$
$$g(u) = \begin{cases} 1 + \cos(\pi u), & -1 \le u \le 1 \\ 0, & \text{otherwise} \end{cases}$$
$$\tilde{f}[k] = \sum_{n=0}^{N-1} f[n]\, e^{-i2\pi kn/N} \tag{52}$$
$$f[n] = \frac{1}{N} \sum_{k=0}^{N-1} \tilde{f}[k]\, e^{i2\pi kn/N} \tag{53}$$
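The DFT pair can be verified by direct summation: applying the inverse formula of Equation (53) to the coefficients of Equation (52) recovers f[n]. The transform length and test vector below are arbitrary choices.

```python
# Direct-summation check of the DFT pair in Equations (52) and (53).
import numpy as np

rng = np.random.default_rng(4)
N = 16
f = rng.standard_normal(N)
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # W[k, n] = e^{-i 2 pi k n / N}
f_tilde = W @ f                                # Equation (52)
f_back = (W.conj().T @ f_tilde) / N            # Equation (53)
assert np.allclose(f_back.real, f)
```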
$$\psi_{s,t}(u) = |s|^{-p}\, \psi\!\left(\frac{u - t}{s}\right) \quad \text{where } s \ne 0 \tag{55}$$
of what is called a mother (or basic) wavelet ψ , ψ1,0 ; and ψs,t is the complex conjugate of
ψs,t .
It is noted from Equation (55) that when |s| > 1, ψ_{s,t} is a stretched version of ψ, and
when |s| < 1, ψ_{s,t} is a compressed version of ψ. Moreover, if s < 0, then ψ_{s,t} is a reflected
version of ψ. These stretching, compression, and reflection operations are performed
along the time axis. The exponent p in Equation (55) is a real number
that stretches or compresses ψ along the vertical axis. The idea of using p in Equation (55) is
to keep a desired norm unchanged when scaling the wavelet ψs,t . For example, if p = 1,
then both ψ and ψs,t have the same L1 norm; and if p = 1/2, then ψ and ψs,t have the same
L2 norm [21].
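The norm-preservation claim for p = 1/2 can be checked numerically; the Mexican-hat-type profile and the values of s and t below are arbitrary test choices, not from the text.

```python
# Checking that, with p = 1/2, the scaled wavelet |s|^{-1/2} psi((u-t)/s)
# has the same L^2 norm as psi (Riemann-sum approximation on a fine grid).
import numpy as np

def psi(u):                               # a smooth, decaying test profile
    return (1 - u**2) * np.exp(-u**2 / 2)

u = np.linspace(-50, 50, 200_001)
du = u[1] - u[0]
s, t, p = 3.0, 1.5, 0.5
psi_st = abs(s) ** -p * psi((u - t) / s)  # scaled and shifted wavelet

def l2sq(v):                              # Riemann approximation of ||.||_2^2
    return np.sum(np.abs(v) ** 2) * du

assert abs(l2sq(psi(u)) - l2sq(psi_st)) < 1e-6
```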
Using Parseval’s identity, Equation (54) can be written as
where ψ̂_{s,t} is the Fourier transform of ψ_{s,t}. This equality shows that the wavelet transform
localizes signals in both the time and frequency domains, where the sharpness of these localizations
is controlled by the scaling factor s and the choice of the mother wavelet ψ.
Example 7. Morlet wavelet is a (frequency-modulated) mother wavelet which is given in the time
domain as:
$$\psi(u) = e^{-i2\pi\xi_0 u}\, e^{-u^2/2} \tag{57}$$
whose Fourier transform is
$$\hat{\psi}(\xi) = e^{-(\xi - \xi_0)^2/2} \tag{58}$$
where ξ 0 is the center frequency around which the signal is localized in the frequency domain.
Sci 2022, 4, 40 19 of 28
Various forms of the mother wavelet ψ have been reported in the wavelet literature [21,22].
All of these wavelet forms should satisfy the admissibility condition:
$$C_\psi \triangleq \int_{-\infty}^{\infty} \frac{|\hat{\psi}(\xi)|^2}{|\xi|}\, d\xi < \infty \tag{59}$$
where Cψ is a constant depending on the wavelet ψ, and Equation (60) shows that any
signal f ∈ L2 can be represented as a superposition of shifted and dilated wavelets [24].
For a discrete signal f : Z → C, the discrete wavelet transform (DWT) is used with a
discrete wavelet as:
$$\psi_{s,t}[u] = |s|^{-p}\, \psi\!\left(\frac{u - t}{s}\right) \tag{61}$$
where s is the scaling parameter and t is the shifting parameter. The most commonly used
discrete wavelets have the following values of the parameters:
s = 2^j, t = k · 2^j, and p = 1/2
where j is an integer that controls the scaling parameter and specifies the level of wavelet
decomposition of the signal, and k is another integer which controls the shifting parameter.
Substitution of these values into Equation (61) yields the most common form of the discrete
wavelet
$$\psi_{j,k}[n] = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{n - k2^j}{2^j}\right) \tag{62}$$
Notice that large values of j result in large scaling parameters which stretch the wavelet
function and let the DWT capture low-frequency features in the signal. On the other hand,
small values of j would make the DWT more capable of capturing high-frequency features
by decreasing the scaling parameter [21,22].
Given a wavelet level j, the DWT of a sequence { f [n]} consists of the following two
parts:
The average coefficients {A_j[k2^j]} are given by:
$$A_j[k2^j] \triangleq \sum_{n=-\infty}^{\infty} f[n]\, \phi_{j,k}[n] = \sum_{n=-\infty}^{\infty} f[n]\, \frac{1}{\sqrt{2^j}}\, \phi\!\left(\frac{n - k2^j}{2^j}\right) \tag{63}$$
The detail coefficients {D_j[k2^j]} are given by:
$$D_j[k2^j] \triangleq \sum_{n=-\infty}^{\infty} f[n]\, \psi_{j,k}[n] = \sum_{n=-\infty}^{\infty} f[n]\, \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{n - k2^j}{2^j}\right) \tag{64}$$
where the scaling function φj,k [n] is associated with the wavelet function ψj,k [n]; full details
are given in [21].
Let us now consider a special case of the DWT, where the analyses (i.e., computation of
f̃(ω, t) (see Equation (50)) or f̃(s, t) (see Equation (54)) or their discrete samples) are made
directly by evaluating the relevant integrals at the necessary values of the time-frequency or
time-scale parameters. Around 1980, a new method for performing the DWT was created, which is known
as Multiresolution Analysis (MRA). This method is completely recursive and is therefore
ideal for computation, as succinctly described below.
In MRA, we may think of the level-1 DWT of f[n] as the output of two filters connected
in parallel: a low-pass filter with impulse response g and a high-pass filter with impulse
response h, as seen in Figure 4. This is known as the filter bank implementation of the DWT,
consisting of different levels j. The cutoff frequency of each filter in the filter bank equals
half of the bandwidth of the respective input signal. Hence, the output of each filter has half
of the bandwidth of the original sequence f[n], so it can be subsampled by 2. That is,
$$\text{The average:}\quad A_1[n] \triangleq \sum_{k=-\infty}^{\infty} f[k]\, g[2n - k] \tag{65}$$
$$\text{The detail:}\quad D_1[n] \triangleq \sum_{k=-\infty}^{\infty} f[k]\, h[2n - k] \tag{66}$$
Therefore, given a level-j DWT of a discrete-time signal f [n], if A j [k2 j ] in the sequence
of average coefficients is passed through a parallel combination of identically structured
filters g and h, then the output is a sequence of level-( j + 1) DWT of f [n] as seen in Figure 4.
The features associated with different frequency components of the signal f [n] can be
captured by using a multilevel wavelet decomposition of f [n] via iterative implementation
of filter banks in the setting of time and frequency localization (see, for example, [21,23,24]).
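One level of the filter bank in Equations (65) and (66) can be sketched with the Haar pair as a concrete, minimal choice of g (low-pass) and h (high-pass); the subsampling below pairs consecutive samples, an index-shift convention relative to Equation (65), and the input sequence is an arbitrary example.

```python
# Level-1 filter-bank DWT (Equations (65)-(66)) with Haar filters:
# filter with g and h, then subsample the outputs by 2.
import numpy as np

s = np.sqrt(2.0)
g = np.array([1.0, 1.0]) / s          # low-pass impulse response
h = np.array([1.0, -1.0]) / s         # high-pass impulse response
f = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])

A1 = np.convolve(f, g)[1::2]          # average coefficients
D1 = np.convolve(f, h)[1::2]          # detail coefficients

# Haar averages/details: pairwise sums and differences over sqrt(2)
assert np.allclose(A1, (f[0::2] + f[1::2]) / s)
assert np.allclose(D1, (f[1::2] - f[0::2]) / s)
```

Feeding A1 through the same pair of filters yields the level-2 coefficients, which is exactly the recursive structure described above.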
Example 8. Let us consider a function f(t) having a wavelet transform f̃(s, t), which can be interpreted as the "details" contained at fixed scales s ≠ 0. This interpretation is especially useful in the discrete case for understanding the principles of MRA, as seen below.
Let φ(u) be a zero-mean unit-variance probability density function, which has the following
properties:
• φ(u) ≥ 0 ∀u ∈ ℝ;
• ∫_{−∞}^{∞} du φ(u) = 1;
• ∫_{−∞}^{∞} du φ(u) u = 0;
• ∫_{−∞}^{∞} du φ(u) u² = 1.
Assuming that φ ∈ Cⁿ, i.e., φ is at least n times differentiable, where n ∈ ℕ, it follows that lim_{u→±∞} φ^{(n−1)}(u) = 0. Now letting ψ_n(u) ≜ (−1)ⁿ φ^{(n)}(u), we have

∫_{−∞}^{∞} du ψ_n(u) = (−1)ⁿ [φ^{(n−1)}(∞) − φ^{(n−1)}(−∞)] = 0
Thus, ψn satisfies the admissibility condition in Equation (59) and hence can be used to define
a CWT.
For s ≠ 0 and t ∈ ℝ, let φ_{s,t}(u) = |s|⁻¹ φ((u − t)/s) and ψⁿ_{s,t}(u) = |s|⁻¹ ψ_n((u − t)/s). Then, φ_{s,t} is a probability density with mean t and standard deviation |s|; and ψⁿ_{s,t} is qualified to be a wavelet family {ψⁿ_{s,t}} by setting p = 1 in Equation (61).
As a numerically explicit example, let φ represent the zero-mean unit-variance Gaussian density, i.e., φ(u) = exp(−u²/2)/√(2π). Since φ ∈ C^∞, n can be taken to be any positive integer. For instance, ψ₁(u) = −φ^{(1)}(u) = u exp(−u²/2)/√(2π) and ψ₂(u) = φ^{(2)}(u) = (u² − 1) exp(−u²/2)/√(2π), and so on. Because of the shape of its graph, −ψ₂ is popularly known as the Mexican hat mother wavelet, which is often used in engineering applications.
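The Gaussian-derivative wavelets above can be checked numerically; the grid, step size, and Riemann-sum quadrature below are illustrative choices for verifying the zero-integral (admissibility) property of ψ₁ and ψ₂:

```python
import numpy as np

u = np.linspace(-10.0, 10.0, 20001)
du = u[1] - u[0]

# Zero-mean, unit-variance Gaussian density and its derivative wavelets
phi = np.exp(-u**2 / 2.0) / np.sqrt(2.0 * np.pi)
psi1 = u * np.exp(-u**2 / 2.0) / np.sqrt(2.0 * np.pi)             # psi1 = -phi'
psi2 = (u**2 - 1.0) * np.exp(-u**2 / 2.0) / np.sqrt(2.0 * np.pi)  # psi2 = phi''
mexican_hat = -psi2   # (1 - u^2) exp(-u^2/2)/sqrt(2*pi), peaked at u = 0

# phi integrates to 1, while each psi_n integrates to 0 (admissibility)
print(phi.sum() * du, psi1.sum() * du, psi2.sum() * du)
```

The integrals are approximated by Riemann sums; on this fine, symmetric grid the odd function ψ₁ sums to zero exactly, and the even function ψ₂ sums to zero up to quadrature error.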
where the (countable) sequence of (deterministic) functions {φn (t)} is a complete orthonormal set
of solutions to the following integral equation:
∫_{−T/2}^{T/2} dτ K_XX(t, τ) φ_n(τ) = λ_n φ_n(t)   ∀t ∈ [−T/2, T/2]    (68)

and the random coefficients X_n ≜ ∫_{−T/2}^{T/2} dt X(t) φ_n^her(t) are mutually statistically orthogonal, i.e., E[X_n X_m^her] = λ_n δ_mn with the Kronecker delta δ_mn.
Remark 7. The deterministic functions φn (t) are orthonormal in the following sense:
∫_{−T/2}^{T/2} dt φ_n(t) φ_m^her(t) = δ_mn
Example 9 (K-L expansion of white noise). Let the covariance function of zero-mean stationary
white noise w(t) be Kww (t, τ ) = σ2 δ(t − τ ). Then, the orthonormal functions φn (t) satisfy the
K-L integral equation, for all n ∈ N, as:
∫_{−T/2}^{T/2} dτ K_ww(t, τ) φ_n(τ) = σ² ∫_{−T/2}^{T/2} dτ δ(t − τ) φ_n(τ)

It is also true that ∫_{−T/2}^{T/2} dτ K_ww(t, τ) φ_n(τ) = λ_n φ_n(t), which implies that λ_n φ_n(t) = σ² φ_n(t)
∀n ∈ N. Thus, the choice of these orthonormal functions is arbitrary and all λn ’s are identically
equal to σ2 . It is concluded that, for any zero-mean white noise, the K-L expansion functions
{φn (t)} can be any set of orthonormal functions with all eigenvalues λn = σ2 .
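The conclusion of Example 9 can be illustrated with a discrete stand-in for white noise (covariance σ²I in place of K_ww(t, τ) = σ²δ(t − τ)); the random orthogonal basis constructed via a QR decomposition is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma, trials = 64, 2.0, 20000

# Discrete white noise: covariance sigma^2 * I
W = sigma * rng.standard_normal((trials, N))

# Two different orthonormal bases for R^N: the standard basis and a
# random orthogonal basis obtained from a QR decomposition
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))

coeff_std = W          # coefficients in the standard basis
coeff_q = W @ Q        # coefficients in the random orthonormal basis

# In both bases, every coefficient has variance ~ sigma^2: for white
# noise, any orthonormal set serves as a K-L basis with lambda_n = sigma^2
print(coeff_std.var(axis=0).mean(), coeff_q.var(axis=0).mean())
```

The empirical coefficient variances agree with σ² (up to sampling error) regardless of which orthonormal basis is used, mirroring the arbitrariness of the K-L expansion functions for white noise.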
Example 10 (K-L expansion as an application to detection theory [25]). Let us assume that a
waveform X (t) is observed over a finite time interval [− T/2, T/2] to decide whether it contains a
recoverable signal buried in noise, or the signal is completely noise-corrupted (i.e., the signal cannot
be recovered). In this regard, we formulate a binary hypothesis testing problem with the hypothesis
H1 of having a recoverable signal and the hypothesis H0 of complete noise capture, i.e.,
X(t) = s(t) + w(t),  if H1 is true
X(t) = w(t),  if H0 is true
where the signal s(t) is a deterministic function of time, and the noise w(t) is modeled as zero-mean,
unit-variance, white Gaussian. Using the K-L expansion, we simplify the above decision problem
by replacing the waveform X (t) with a sequence { Xn }, which reduces to a sequence of simpler
problems as:
Xn = sn + ωn,  if H1 is true
Xn = ωn,  if H0 is true
where sn and ωn are the respective (at most countably many) K-L coefficients of the signal s(t) and
noise w(t).
Now we take the K-L transform (instead of Fourier transform) of the received signal X (t),
where the transform space is the space of sequences of K-L coefficients that are mutually statistically
orthogonal random variables. By taking advantage of the facts that the noise is zero-mean Gaussian and that the K-L coefficients are mutually statistically orthogonal, the random variables ωn become jointly independent, i.e., {ωn} is a sequence of independent and identically distributed (iid) random variables. By selecting the first orthonormal function as:
φ₁(t) = s(t) / √(∫_{−T/2}^{T/2} dθ s²(θ))
we can complete the rest of the orthonormal set {φn(t)} in a valid way. We also notice that all of the random coefficients sn, with the exception of s1, will be zero, i.e., only X1 is affected by the presence or absence of the recoverable signal. Thus, the distributed detection problem is reduced to the following scalar detection problem:
X₁ = √(∫_{−T/2}^{T/2} dθ s²(θ)) + ω₁,  if H1 is true
X₁ = ω₁,  if H0 is true
which is commonly referred to as a matching operation. In fact, this operation can be performed by
sampling the output of a filter whose impulse response is:
h(t) = s(T − t) / √(∫_{−T/2}^{T/2} dθ s²(θ))
where the parameter T should be chosen sufficiently large to make the impulse response causal. The output of the physically realizable filter at time T is then X₁. This filter is called a matched filter and is widely used in the disciplines of communications and pattern recognition.
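A discrete sketch of the matching operation above, with an assumed sinusoid standing in for the deterministic signal s(t); the sample count and random seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200                                   # samples over the observation window
t = np.arange(N)
s = np.sin(2.0 * np.pi * 5.0 * t / N)     # assumed deterministic signal s(t)
s_energy = np.sqrt(np.sum(s**2))          # sqrt of the signal energy

def matched_filter_statistic(x, s):
    """Correlate the observation with the unit-energy template; this is
    the discrete analogue of sampling the matched-filter output at T."""
    return np.dot(x, s) / np.sqrt(np.sum(s**2))

w = rng.standard_normal(N)                # zero-mean unit-variance white noise
X1_h1 = matched_filter_statistic(s + w, s)   # H1: signal present
X1_h0 = matched_filter_statistic(w, s)       # H0: noise only

# Under H1, X1 = sqrt(sum s^2) + omega1; under H0, X1 = omega1
print(X1_h1, X1_h0, s_energy)
```

With the same noise realization in both hypotheses, the two statistics differ exactly by √(∑s²), which is the separation that makes thresholding X₁ an effective detector.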
functional. The continuity (or boundedness) of these linear functionals implies that if two functions f and g in the space are close to each other in norm (i.e., ‖f − g‖ is small in the function space), then f and g are also close to each other pointwise, i.e., |f(t) − g(t)| is also small for all t.
The RKHS has many engineering and scientific applications, including those in har-
monic analysis, wavelet analysis, and quantum mechanics. In particular, functions from
RKHS have special properties that make them useful for function estimation problems in
high-dimensional spaces, which is critically important in the fields of statistical learning
theory and machine learning [17]. In fact, every function in an RKHS that minimizes an empirical risk functional can be expressed as a linear combination of the kernel functions evaluated at the training points. This result, known as the representer theorem, reduces the problem from an infinite-dimensional one to a finite-dimensional one.
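The finite-dimensional reduction described above can be sketched with kernel ridge regression; the Gaussian (RBF) kernel, the training data, and the regularization weight lam are all illustrative assumptions, not quantities from the text:

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf_kernel(A, B, gamma=2.0):
    """Gaussian (RBF) kernel, a standard reproducing kernel on R."""
    return np.exp(-gamma * (A[:, None] - B[None, :])**2)

# Hypothetical training data for a 1-D regression problem
x_train = np.linspace(-1.0, 1.0, 20)
y_train = np.sin(3.0 * x_train) + 0.05 * rng.standard_normal(20)

# Minimizing the regularized empirical risk over the whole (infinite-
# dimensional) RKHS reduces to solving for n coefficients alpha:
#   f(x) = sum_i alpha_i K(x, x_i),  alpha = (K + lam*I)^{-1} y
lam = 1e-3
K = rbf_kernel(x_train, x_train)
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

def f_hat(x):
    """Minimizer evaluated via the kernel expansion at the training points."""
    return rbf_kernel(np.atleast_1d(x), x_train) @ alpha

print(float(f_hat(0.5)))   # close to the noise-free target sin(1.5)
```

Although the hypothesis space is infinite-dimensional, only the 20 coefficients alpha need to be computed, which is precisely the simplification the text refers to.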
We now present a formal definition of reproducing kernel Hilbert spaces (RKHS). The
presented theory is often applied to real-valued Hilbert spaces and can be extended to
complex-valued Hilbert spaces; examples of complex-valued RKHS are spaces of analytic
functions.
Definition 9 (Reproducing Kernel Hilbert Spaces). Let T be an arbitrary non-empty set (e.g.,
the time domain or the spatial domain of a function) and let H be a Hilbert space of real-valued
(resp. complex-valued) functions on T, equipped with pointwise vector addition and pointwise scalar multiplication. Then, H is defined to be a reproducing kernel Hilbert space (RKHS) if, for every t ∈ T, there exist a positive real Mt and a continuous linear functional Lt on H such that |Lt(f)| = |f(t)| ≤ Mt ‖f‖_H ∀f ∈ H.
Note: Although Mt is constrained to be a positive real, it is possible that supt∈T Mt = ∞.
Remark 8. Definition 9 is rather a weak condition to ensure the existence of an inner product
and the evaluation of every functional on H at every point in the domain T. From the application
perspectives, a more useful definition would be to construct an inner product of a given function
f ∈ H with another function Kt ∈ H, which is the so-called reproducing kernel function for the
Hilbert space H; the RKHS has taken its name from here.
To make Definition 9 more useful for many applications, we make use of the Riesz representation theorem (Theorem 6), which states that there exists a unique Kt ∈ H with the following reproducing property for each f ∈ H, which takes values at any given t ∈ T as:

f(t) = Lt(f) = ⟨Kt, f⟩_H
Since, for a given t ∈ T, the function Kt ∈ H takes values in ℝ (resp. ℂ), and since there is another Kτ ∈ H associated with the parameter τ ∈ T and a corresponding functional Lτ on H, it follows that

Kt(τ) = Lτ(Kt) = ⟨Kτ, Kt⟩_H
The above situation can be interpreted as follows: Kτ is a time translation of Kt from t
to τ if the set T is the time domain of the functions in the Hilbert space. This allows us to
redefine the reproducing kernel of the Hilbert space H as a function K : T × T → R (resp. C)
as: K (t, τ ) , hKτ , Kt i H .
Example 11 (Bandlimited approximation of Dirac delta function in the RKHS setting). Let
us consider the space of continuous signals that are also band-limited, with frequency support contained in the compact range [−2πΩ, 2πΩ], where the cutoff frequency Ω ∈ (0, ∞). It is
noted that Kt (•) is a bandlimited version of the Dirac delta function, because Kt (τ ) converges to
the delta distribution, expressed as δ(τ − t) in the weak sense, as the cutoff frequency Ω tends to
infinity.
Let us define T = ℝ and H = {f ∈ C⁰(T) : supp(f̂) ⊆ [−Ω, Ω]}, where C⁰(T) is the space of continuous functions whose domain is T, the Fourier transform of f is f̂(ξ) ≜ ∫_ℝ dt exp(−i2πξt) f(t), and the inverse Fourier transform of f̂(ξ) is f(t) ≜ ∫_ℝ dξ exp(i2πξt) f̂(ξ). Then, it follows by the Cauchy-Schwarz inequality and the Plancherel theorem that:
|f(t)|² ≤ (∫_{−Ω}^{Ω} dξ |e^{i2πξt}|²) (∫_{−Ω}^{Ω} dξ |f̂(ξ)|²) = 2Ω ‖f‖²_H

i.e., |f(t)| ≤ √(2Ω) ‖f‖_H.
It follows from the relation: f (t) = Lt ( f ) = hKt , f i H , established earlier, that the functional
Lt and the RKHS kernel function Kt are bounded. Therefore, H is indeed an RKHS.
By choosing the kernel function in this case as Kt(τ) = sinc(2πΩ(τ − t)) ≜ sin(2πΩ(τ − t))/(2πΩ(τ − t)), and noting that lim_{Ω→∞} Kt(τ) = δ(τ − t), it follows that, as Ω → ∞, the Fourier transform of the kernel Kt(τ) becomes

K̂t(ξ) = ∫_{−∞}^{∞} dτ exp(−i2πξτ) Kt(τ) = ∫_{−∞}^{∞} dτ exp(−i2πξτ) δ(τ − t) = exp(−i2πξt)

Thus, the reproducing property of the kernel is established as the cutoff frequency Ω → ∞.
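The reproducing property can also be checked numerically at finite Ω. The sketch below assumes the normalization Kt(τ) = 2Ω·sinc(2Ω(τ − t)), with sinc(x) ≜ sin(πx)/(πx), which makes ⟨Kt, f⟩ = f(t) exact under the standard L² inner product; the cutoff frequencies, test function, and integration grid are all illustrative choices:

```python
import numpy as np

Omega = 4.0                              # assumed cutoff frequency
tau = np.linspace(-50.0, 50.0, 400001)   # integration grid
dtau = tau[1] - tau[0]

def K(t, tau):
    # Reproducing kernel of the band-limited space, normalized so that
    # <K_t, f> = f(t) under the standard L2 inner product
    return 2.0 * Omega * np.sinc(2.0 * Omega * (tau - t))

# A band-limited test function with bandwidth Omega0 <= Omega
Omega0 = 1.5
f = 2.0 * Omega0 * np.sinc(2.0 * Omega0 * tau)

t0 = 0.3
inner = np.sum(K(t0, tau) * f) * dtau            # <K_{t0}, f>_H (Riemann sum)
f_t0 = 2.0 * Omega0 * np.sinc(2.0 * Omega0 * t0) # exact f(t0)
print(inner, f_t0)
```

The inner product recovers f(t₀) up to truncation and quadrature error, demonstrating the reproducing property without taking Ω → ∞.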
Author Contributions: Conceptualization, N.F.G., A.R. and W.K.J.; methodology, N.F.G., A.R. and
W.K.J.; software, N.F.G. and A.R.; formal analysis, N.F.G. and A.R.; model preparation and validation,
N.F.G. and A.R.; data curation, N.F.G. and A.R.; writing—original draft preparation, N.F.G., A.R. and
W.K.J.; writing—review and editing, N.F.G., A.R. and W.K.J.; funding acquisition, A.R. All authors
have read and agreed to the published version of the manuscript.
Funding: The reported work has been supported in part by the U.S. Air Force Office of Scientific
Research under Grant No. FA9550-15-1-0400, by the U.S. Army Research Office under Grant No.
W911NF-20-1-0226, and by the U.S. National Science Foundation under Grant no. CNS-1932130.
Findings and conclusions or recommendations, expressed in this publication, are those of the authors
and do not necessarily reflect the views of the sponsoring agencies.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design
of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or
in the decision to publish the results.
Example A1. A well-known example of a metric space is the n-dimensional Euclidean space (ℝⁿ, ρ), where n is a positive integer and ρ(x, y) ≜ √(∑_{i=1}^{n} |x_i − y_i|²) for every pair of vectors x = [x₁, . . . , x_n]ᵀ and y = [y₁, . . . , y_n]ᵀ in ℝⁿ.
Remark A1. The set X, upon which the metric ρ can operate, is an arbitrary nonempty set. The
conditions (i)–(iii) in Definition A1 are obvious if ρ operates on Rn . However, in general, there
can be other types of metric operators and X may not be Rn ; an example is the Hamming distance
defined on sets of symbol sequences, which is widely used in error correction theory to measure the
distance between two code words.
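A minimal sketch of the Hamming distance as a metric on fixed-length codewords; the codewords below are illustrative:

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length codewords differ;
    this is a valid metric on the set of codewords of a fixed length."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))

d = hamming_distance("1011101", "1001001")
print(d)  # positions 2 and 4 differ (0-indexed), so d = 2
```

The conditions (i)-(iii) of a metric are easy to verify here even though the underlying set is not ℝⁿ, which is the point of Remark A1.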
Definition A2 (Open and Closed Sets). A set E ⊆ X in a metric space (X, ρ) is called open if, for every y ∈ E, there exists ε > 0 such that the open ball Bε(y) ≜ {x ∈ X : ρ(x, y) < ε}, of radius ε centered at y, is contained in E. A set F ⊆ X is called closed if the complement X \ F is open in (X, ρ).
Then, the pair (X, 𝒯) is called a topological space, and the members of 𝒯 are called open sets of (X, 𝒯); if B is an open set in (X, 𝒯), the complement of B (i.e., X \ B) is called a closed set in (X, 𝒯). If there is no confusion regarding 𝒯, then 𝒯 is often omitted from (X, 𝒯) and only X is referred to as a topological space.
Remark A2. The largest σ-algebra of a nonempty set Ω is the collection of all subsets of Ω, which is the power set 2^Ω. On the other hand, the smallest σ-algebra consists of the two sets ∅ and Ω, i.e., the indiscrete σ-algebra {∅, Ω}.
A measurable space is a pair (Ω, F ), and a measure space is a triple (Ω, F , µ), where Ω is a
non-empty set, F is a σ-algebra of subsets of Ω, and µ is a measure on F . The sets in F are called
measurable sets.
Example A2. Let Ω = ℝⁿ, where n ∈ ℕ, and let the Borel σ-algebra B(ℝⁿ) be the associated σ-algebra. Then, µ : B(ℝⁿ) → [0, ∞] is called the n-dimensional Lebesgue measure, and (ℝⁿ, B(ℝⁿ), µ) is called the n-dimensional Lebesgue measure space. For n = 1, i.e., in the 1-dimensional real space ℝ, given an interval S ∈ B(ℝ), the measure µ(S) is the length of the interval S. Similarly, for the two-dimensional (i.e., n = 2) and three-dimensional (i.e., n = 3) Lebesgue measures, µ(S) denotes the area and volume measures, respectively.
Definition A12 (Measurable Functions). Let (Ω1 , F1 ) and (Ω2 , F2 ) be two measurable spaces.
A function f : (Ω1 , F1 ) → (Ω2 , F2 ) is called (F1 − F2 ) measurable if the inverse image
f −1 ( A) ∈ F1 ∀ A ∈ F2 . If Ω2 = R and F2 = B(R) then f is said to be Borel measurable.
where the superscript her, called Hermitian, indicates the complex conjugation of a complex
variable, or the complex conjugation of transpose of a complex vector/matrix.
A random process x(t) is called stationary (in the strict sense) if its statistics are not
affected by a time translation [25], i.e., x(t) and x(t + ε) have the same statistics for any real
number ε. A random process x(t) is said to be wide-sense stationary [7,25] if
1. The expected value E[ x (t)] is a constant for all t;
2. The autocorrelation r_x(t, τ) depends only on the difference (t − τ), not explicitly on both t and τ.
References
1. Bachman, G.; Narici, L. Functional Analysis; Academic Press: New York, NY, USA, 1966.
2. Naylor, A.; Sell, G. Linear Operator Theory in Engineering and Science, 2nd ed.; Springer-Verlag: New York, NY, USA, 1982.
3. Rudin, W. Real and Complex Analysis; McGraw-Hill: Boston, MA, USA, 1987.
4. Royden, H. Real Analysis, 3rd ed.; Macmillan: New York, NY, USA, 1989.
5. Kreyszig, E. Introductory Functional Analysis with Applications; John Wiley & Sons: Hoboken, NJ, USA, 1978.
6. Bobrowski, A. Functional Analysis for Probability and Stochastic Processes; Cambridge University Press: Cambridge, UK, 2005.
7. Hayes, M. Statistical Digital Signal Processing and Modeling, 1st ed.; Wiley: Hoboken, NJ, USA, 1996.
8. Haykin, S. Adaptive Filter Theory, 4th ed.; Prentice Hall: Englewood Cliffs, NJ, USA, 2002.
9. Farhang-Boroujeny, B. Adaptive Filters Theory and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2003.
10. Bressan, A. Lecture Notes on Functional Analysis with Applications to Linear Partial Differential Equations; American Mathematical
Society: Providence, RI, USA, 2013.
11. Reed, M.; Simon, B. Methods of Modern Mathematical Physics Part 1: Functional Analysis; Academic Press: Cambridge, MA,
USA, 1980.
12. Luenberger, D. Optimization by Vector Space Methods; John Wiley & Sons: Hoboken, NJ, USA, 1969.
13. Desoer, C.; Vidyasagar, M. Feedback Systems: Input-Output Properties; Academic Press: Cambridge, MA, USA, 1975.
14. Therrien, C. Discrete Random Signals and Statistical Signal Processing; Prentice Hall: Englewood Cliffs, NJ, USA, 1992.
15. Proakis, J.; Manolakis, D. Digital Signal Processing: Principles, Algorithms, and Applications, 3rd ed.; Macmillan Publishing Company:
New York, NY, USA, 1998.
16. Oppenheim, A.; Schafer, R. Discrete-Time Signal Processing; Prentice Hall: Englewood Cliffs, NJ, USA, 1989.
17. Vapnik, V.; Izmailov, R. Rethinking statistical learning theory: Learning using statistical invariants. Mach. Learn. 2019,
108, 381–423. [CrossRef]
18. Ghalyan, N.F.; Ray, A. Symbolic Time Series Analysis for Anomaly Detection in Measure-invariant Ergodic Systems. J. Dyn. Syst.
Meas. Control. 2020, 142, 061003. [CrossRef]
19. Ghalyan, N.F.; Ray, A. Measure invariance of symbolic systems for low-delay detection of anomalous events. Mech. Syst. Signal
Process. 2021, 159, 107746. [CrossRef]
20. Lorch, E. Spectral Analysis; Oxford University Press: New York, NY, USA, 1962.
21. Kaiser, G. A Friendly Guide to Wavelets; Birkhauser: Boston, MA, USA, 1994.
22. Mallat, S. A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed.; Academic Press: Amsterdam, The Netherlands, 2009.
23. Ray, A. On State-space Modeling and Signal Localization in Dynamical Systems. ASME Lett. Dyn. Syst. Control. 2022, 2, 011006.
[CrossRef]
24. Vetterli, M.; Kovacevic, J. Wavelets and Subband Coding; Prentice-Hall, Inc.: Hoboken, NJ, USA, 1995.
25. Stark, H.; Woods, J. Probability and Random Processes with Applications to Signal Processing; Prentice-Hall: Upper Saddle River, NJ,
USA, 2002.
26. Helstrom, C. Elements of Signal Detection and Estimation; Prentice Hall: Englewood Cliffs, NJ, USA, 1995.
27. Ash, R. Real Analysis and Probability; Academic Press: Boston, MA, USA, 1972.
28. Munkres, J. Topology, 2nd ed.; Prentice-Hall: Upper Saddle River, NJ, USA, 2000.
29. Shilov, G. Elementary Real and Complex Analysis; Dover Publication Inc.: Mineola, NY, USA, 1996.
30. Papoulis, A. Probability, Random Variables, and Stochastic Processes, 2nd ed.; McGraw-Hill, Inc.: Boston, MA, USA, 1984.