
Article

A Concise Tutorial on Functional Analysis for Applications to Signal Processing
Najah F. Ghalyan 1,2, *, Asok Ray 1,3, * and William Kenneth Jenkins 4, *

1 Department of Mechanical Engineering, Pennsylvania State University, University Park, PA 16802, USA
2 Department of Mechanical Engineering, University of Kerbala, Kerbala 56001, Iraq
3 Department of Mathematics, Pennsylvania State University, University Park, PA 16802, USA
4 Department of Electrical Engineering, Pennsylvania State University, University Park, PA 16802, USA
* Correspondence: [email protected] (N.F.G.); [email protected] (A.R.); [email protected] (W.K.J.)

Abstract: Functional analysis is a well-developed field in the discipline of Mathematics, which provides unifying frameworks for solving many problems in applied sciences and engineering. In particular, several important topics (e.g., spectrum estimation, linear prediction, and wavelet analysis) in signal processing had been initiated and developed through collaborative efforts of engineers and mathematicians who used results from Hilbert spaces, Hardy spaces, weak topology, and other topics of functional analysis to establish essential analytical structures for many subfields in signal processing. This paper presents a concise tutorial for understanding the theoretical concepts of the essential elements in functional analysis, which form a mathematical framework and backbone for central topics in signal processing, specifically statistical and adaptive signal processing. The applications of these concepts for formulating and analyzing signal processing problems may often be difficult for researchers in applied sciences and engineering, who are not adequately familiar with the terminology and concepts of functional analysis. Moreover, these concepts are not often explained in sufficient detail in the signal processing literature; on the other hand, they are well-studied in textbooks on functional analysis, yet without emphasizing the perspectives of signal processing applications. Therefore, the process of assimilating the ensemble of pertinent information on functional analysis and explaining its relevance to signal processing applications should have significant importance and utility to the professional communities of applied sciences and engineering. The information presented in this paper is intended to provide an adequate mathematical background with a unifying concept for apparently diverse topics in signal processing. The main objectives of this paper from the above perspectives are summarized below: (1) Assimilation of the essential information from different sources of functional analysis literature, which are relevant to developing the theory and applications of signal processing. (2) Description of the underlying concepts in a way that is accessible to non-specialists in functional analysis (e.g., those with bachelor-level or first-year graduate-level training in signal processing and mathematics). (3) Signal-processing-based interpretation of functional-analytic concepts and their concise presentation in a tutorial format.

Keywords: statistical signal processing; adaptive signal processing; functional analysis

Citation: Ghalyan, N.F.; Ray, A.; Jenkins, W.K. A Concise Tutorial on Functional Analysis for Applications to Signal Processing. Sci 2022, 4, 40. https://doi.org/10.3390/sci4040040

Academic Editor: Ercan Kuruoglu. Received: 1 September 2022; Accepted: 13 October 2022; Published: 21 October 2022.

1. Introduction
The concept of functional analysis is built upon normed vector spaces and particularly inner product spaces, which are merged with diverse notions of topology and geometry, linear algebra, probability theory, and real and complex analysis (see, e.g., [1–5]). Topics in functional analysis include various concepts such as Banach spaces and Hilbert spaces, and linear operators and their spectral theory, as well as group and semigroup theory. Knowledge of these mathematical structures is often essential for understanding and solving a variety of analytical problems in signal processing and related fields, as well as in mathematics itself [5]. For example, in functional analysis, objects like functions are considered as elements or points in a space of functions [6], and hence the name functional analysis.
Results, generated from functional analysis, form key concepts in the frameworks of
advanced scientific and engineering disciplines that include the fields of statistical signal
processing and adaptive signal processing. Although adaptive signal processing can be
viewed as a branch of statistical signal processing [7], the special properties of this field and
their roles in engineering applications have led many specialists to consider them as two
separate fields. Therefore, in many universities and research institutions around the world,
statistical signal processing and adaptive signal processing are taught as independent
graduate courses in engineering and applied sciences, and many textbooks have been
devoted to studying these important fields individually (e.g., see [8,9] and references therein).
Nevertheless, both statistical signal processing and adaptive signal processing form the
backbone of the so-called modern signal processing, in which signals are generally considered
as random processes. Modern signal processing covers many topics of current interest, such
as signal modeling and estimation, signal prediction, signal compression, adaptive lattice
filtering, adaptive joint process estimation, recursive least squares lattice filtering, and
spectrum estimation. The issues, related to processing of both deterministic and random
signals, are further discussed below.
While an estimation error may typically converge to zero for deterministic signals,
this is generally not the case for random signals [8]. Therefore, in statistical and adaptive
signal processing, it is common practice to make estimators of random signals unbiased (i.e.,
the expectation of the estimation error converges to zero). As explained later, this type of
convergence is of a special kind, which is known in functional analysis as weak convergence
(see, for example, [10,11]). Therefore, many important results in functional analysis are
obtained in terms of weak convergence and weak topology, which potentially have significant
implications to the subfields of statistical signal processing and adaptive signal processing.
Moreover, it is usually desirable in estimation theory to identify optimal filters, which
bridges the discipline of signal processing to that of optimization theory. To this end,
researchers in modern signal processing often deal with random processes for which
optimization problems become more challenging, and the usage of advanced mathematical
tools is justified.
From a historical perspective, the names of some of the spaces used in functional
analysis are those of early-time mathematicians who had originally developed the theories
of these spaces. Indeed much of the theoretical work has been associated with the names
of eminent mathematicians (e.g., Gauss, Lagrange, Euler, and Kolmogorov). In fact, the
Hilbert space, which is a central topic in functional analysis, is one of the most commonly
used mathematical frameworks of signal processing and the associated optimization [12].
The unique features of Hilbert spaces are explained in the paper from these perspectives.
However, the names of other well-known spaces (e.g., metric spaces and normed spaces)
were given based on the technical properties of these spaces; many of the spaces, fre-
quently used in functional analysis, have been named based on quite different historical
backgrounds.
We have presented a concise and focused review of key concepts of functional anal-
ysis in this paper, which have strong relevance to modern signal processing. The most
important spaces from the perspectives of functional analysis, considered in this paper, are
metric/topological spaces, Banach spaces, and Hilbert spaces. The relations among these
and other spaces are illustrated in Figure 1. Other relevant vector spaces like summable
($\ell^p$), Lebesgue-integrable ($L^p$), and Hardy ($H^p$) spaces are also introduced in the paper.
Figure 1. Relationship among different spaces in functional analysis.

The paper is organized in four sections, including the current section, and an Appendix A.
Section 2 introduces Banach spaces and their relevant theorems, where special emphases
are laid on the ` p /L p spaces, H p spaces, spectral factorization, and weak topology in the
setting of Banach spaces. Section 3 presents Hilbert spaces and their relevant features (e.g.,
Fourier series expansion and the orthogonality principle) along with some applications to
signal processing and detection theory, such as wavelets, Karhunen-Loève (KL) expansion,
and reproducing-kernel Hilbert spaces (RKHS). Section 4 summarizes and concludes the
paper. The Appendix A in this paper introduces elementary concepts and definitions
in real analysis, probability theory, and topological spaces, which should be helpful for
understanding the fundamental principles of functional analysis as applied to various
concepts of signal processing; however, readers who are familiar with these concepts
may only selectively refer to the Appendix A.

2. Banach Spaces for Signal Analysis


This section deals with Banach spaces for general applications to signal processing;
it also introduces the concepts of Hardy spaces especially for digital signal processing.
Further details on Banach spaces are provided in standard books on functional analysis
such as Bachman and Narici [1] and Naylor and Sell [2].

2.1. Introduction to Banach Spaces


We start this subsection with the definition of a Banach space, i.e., a complete normed space, as given below.

Definition 1 (Banach Spaces). Let a vector space $X$ be defined over a field $K$, where examples of $K$ are the field of real numbers $(\mathbb{R}, +, \cdot)$ and the field of complex numbers $(\mathbb{C}, +, \cdot)$. Let a function $\|\cdot\| : X \to \mathbb{R}$, called norm and denoted as $x \mapsto \|x\|$, have the following properties:
• (positivity) $\|x\| \geq 0 \ \forall x \in X$, and $\|x\| = 0$ if and only if $x = 0$.
• (homogeneity) $\|cx\| = |c| \, \|x\| \ \forall x \in X$ and $\forall c \in K$.
• (triangular inequality) $\|x + y\| \leq \|x\| + \|y\| \ \forall x, y \in X$.

Then, $(X, \|\cdot\|)$ is called a normed vector space, where the norm $\|\cdot\|$ serves as a metric.
A real (resp. complex) normed linear space that is complete (i.e., where every Cauchy sequence converges in the space) is called a real (resp. complex) Banach space.
Example 1. The spaces of $\ell^p$ sequences, $1 \leq p \leq \infty$, form an important class of Banach spaces, which are extensively used in digital signal processing. These are linear vector spaces of all real (resp. complex) sequences $x \triangleq \{x_n\}$ such that $\sum_{n=-\infty}^{\infty} |x_n|^p < \infty$, where the $\ell^p$-norm is defined as:

$$\|x\|_{\ell^p} \triangleq \left( \sum_{n=-\infty}^{\infty} |x_n|^p \right)^{1/p} \quad \text{if } 1 \leq p < \infty$$

$$\|x\|_{\ell^\infty} \triangleq \sup_{n \in \mathbb{Z}} |x_n| \quad \text{if } p = \infty$$

Some of the theorems on $\ell^p$ spaces [2], which are extensively used in the analyses of discrete-time signals, are presented below.

Theorem 1 (Hölder Inequality [2]). Let $1 < p < \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$. If $f \in \ell^p$ and $g \in \ell^q$, then $f \cdot g \in \ell^1$ and $\|f \cdot g\|_{\ell^1} \leq \|f\|_{\ell^p} \, \|g\|_{\ell^q}$.

Proof. See pp. 550–551 in Naylor and Sell [2].

It is noted that the Hölder inequality also holds for the conjugate pairs $p = 1$, $q = \infty$ and $p = \infty$, $q = 1$.

Theorem 2 (Minkowski Inequality [2]). If $1 \leq p \leq \infty$ and $f, g \in \ell^p$, then $\|f + g\|_{\ell^p} \leq \|f\|_{\ell^p} + \|g\|_{\ell^p}$.

Proof. See pp. 550–551 in Naylor and Sell [2].

It is noted that Lebesgue-integrable versions of $\ell^p$ spaces, for applications to continuous signal processing, are called $L^p$ spaces [4]. In $L^p$ spaces, the Hölder inequality and the Minkowski inequality are similar to their respective $\ell^p$-versions in Theorems 1 and 2.
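The following short Python sketch numerically illustrates Theorems 1 and 2 on finitely supported sequences; the truncation length, random test data, and the particular conjugate pair $p = 3$, $q = 3/2$ are our assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(1000)   # finitely supported sequences lie in every ell^p
g = rng.standard_normal(1000)

def lp_norm(x, p):
    """ell^p norm of a finitely supported sequence (p = np.inf gives the sup-norm)."""
    return np.max(np.abs(x)) if np.isinf(p) else np.sum(np.abs(x)**p)**(1.0/p)

p, q = 3.0, 1.5                  # conjugate exponents: 1/p + 1/q = 1
# Hoelder (Theorem 1): ||f.g||_1 <= ||f||_p ||g||_q
assert lp_norm(f*g, 1.0) <= lp_norm(f, p) * lp_norm(g, q)
# Minkowski (Theorem 2): ||f + g||_p <= ||f||_p + ||g||_p
assert lp_norm(f + g, p) <= lp_norm(f, p) + lp_norm(g, p)
print("both inequalities hold on this sample")
```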

Next we focus on a few systems-theoretic applications of Banach spaces, which would


require the operation of convolution.
Theorem 3 (Convolution inequality [13]). For the sequences $u \in \ell^p$ for $p \in [1, \infty]$ and $h \in \ell^1$, the convolution product $h * u \in \ell^p$ and $\|h * u\|_{\ell^p} \leq \|h\|_{\ell^1} \, \|u\|_{\ell^p}$.

Proof. See p. 241 in Desoer and Vidyasagar [13].

Lemma 1 (Barbalat Lemma). If $\{x_n\} \in \ell^p$ for some $p \in [1, \infty)$, then $\lim_{n \to \infty} |x_n| = 0$.

Proof. Let us assume that $\lim_{n \to \infty} |x_n| \neq 0$. Then, there exists a subsequence $\{|x_{n_j}|\}$ bounded below by a real number $\varepsilon > 0$, which implies that $\{|x_{n_j}|^p\}$ is bounded below by $\varepsilon^p$ so that $\sum_n |x_n|^p \to \infty$ as $n \to \infty$. This contradicts the assertion $\{x_n\} \in \ell^p$.

Let a linear discrete-time dynamical system with an impulse response h[n, k ] be excited
by an input signal u to yield an output signal y.

Definition 2 (BIBO-stability). A system is said to be bounded-input-bounded-output (BIBO)-stable if every $u \in \ell^\infty \Rightarrow y \in \ell^\infty$. More generally, the system is called $\ell^p$-stable if $u \in \ell^p \Rightarrow y \in \ell^p$, where $p \in [1, \infty]$.

For a linear shift-invariant (LSI) system, the impulse response $h[n, k]$ takes the form $h[n - k]$, where the output is given by the convolution $y = h * u$ as [14]:

$$y[n] = \sum_{k=-\infty}^{\infty} h[n-k] \, u[k] = \sum_{k=-\infty}^{\infty} u[n-k] \, h[k] \qquad (1)$$

Using Theorem 3, if $h \in \ell^1$ and $u \in \ell^p$ for some $p \in [1, \infty]$, then it follows that [13]:

$$\|y\|_{\ell^p} \leq \|h\|_{\ell^1} \, \|u\|_{\ell^p}$$

It is noted that $h \in \ell^1$ is a sufficient condition for the system to be $\ell^p$-stable. Furthermore, using Lemma 1, it follows that if $y \in \ell^p$ for some $p \in [1, \infty)$, then $y[n] \to 0$ as $n \to \infty$. This information is useful, for example, in the design of a linear shift-invariant estimation system, where the output signal represents the estimation error. If the system impulse response is $h \in \ell^1$, then the estimation error is bounded and converges asymptotically to zero if the input signal $u \in \ell^p$ for some $p \in [1, \infty)$.
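As a numerical illustration of Theorem 3 and the $\ell^p$-stability bound above, the following sketch convolves a hypothetical absolutely summable impulse response with a finitely supported input and checks the inequality for $p = 2$; the particular sequences are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
h = 0.5**np.arange(20)           # h in ell^1 (geometric decay): the LSI system is ell^p-stable
u = rng.standard_normal(500)     # finitely supported input, hence u in every ell^p

y = np.convolve(h, u)            # y = h * u, a truncated version of Equation (1)

p = 2.0
lhs = np.sum(np.abs(y)**p)**(1/p)
rhs = np.sum(np.abs(h)) * np.sum(np.abs(u)**p)**(1/p)
print(lhs <= rhs)                # Theorem 3: ||h*u||_p <= ||h||_1 ||u||_p  -> True
```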

Example 2 (Adaptive Filtering). In a general setting, let us consider an adaptive filtering problem in Figure 2, where a measurement vector $\mathbf{x}[n] \triangleq \big[x_1[n], x_2[n], \ldots, x_N[n]\big]^T$ is used to construct an estimate, $\hat{d}[n] \triangleq (h * \mathbf{x})[n]$, of the desired signal $d[n]$ by a linear shift-variant filter $h[n]$ [7]. Then, the task is to synthesize an adaptive algorithm to update the filter $h[n]$ such that the estimation error $e[n] \triangleq d[n] - \hat{d}[n] \to 0$ as $n \to \infty$. Using Lemma 1, this could be achieved if $e \in \ell^p$ for some $p \in [1, \infty)$ in the adaptive algorithm.
Figure 2. An adaptive filter consisting of a shift-variant filter h with an adaptive algorithm for
updating the filter coefficients.
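One common way to realize the adaptive algorithm block in Figure 2 is the least-mean-squares (LMS) update. The sketch below is a minimal illustration under assumed parameters; the filter order, step size, and the unknown FIR system to be identified are hypothetical choices, not taken from this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n_steps, N, mu = 5000, 4, 0.01            # filter order N and step size mu (assumed)
w_true = np.array([0.8, -0.4, 0.2, 0.1])  # hypothetical unknown system generating d[n]
h = np.zeros(N)                           # adaptive filter coefficients

x_buf = np.zeros(N)
for n in range(n_steps):
    x_buf = np.r_[rng.standard_normal(), x_buf[:-1]]  # measurement vector x[n]
    d = w_true @ x_buf                    # desired signal d[n]
    e = d - h @ x_buf                     # estimation error e[n] = d[n] - d_hat[n]
    h += mu * e * x_buf                   # LMS update of the filter coefficients

print(np.round(h, 3))                     # h has adapted toward w_true
```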

If a dynamical system at any time n does not depend on the future (i.e., the system
is only dependent on the past and the present) input(s), then the system is said to be
causal [15] and the convolution in Equation (1) reduces to
$$y[n] = \sum_{k=-\infty}^{n} h[n-k] \, u[k] \qquad (2)$$

If, in addition, $u[k] = 0 \ \forall k < 0$, then it follows that

$$y[n] = \sum_{k=0}^{n} h[n-k] \, u[k] \qquad (3)$$

2.2. Hardy Spaces and Spectral Factorization for Signal Processing


This subsection introduces the concept of Hardy spaces $H^p$, $1 \leq p \leq \infty$, which constitute a class of Banach spaces with a special structure; this structure is very useful for digital signal processing [3]. In particular, $H^2$ and $H^\infty$ spaces are of importance in robust control theory, and it will be seen later in this section that the $H^1$ space also plays an important role for power spectrum factorization in digital signal processing.
Recall that, for a linear shift-invariant system with an impulse response $h[n]$ and input $u[n]$, the output $y[n]$ is obtained by convolution [14] as: $y = h * u$. Then, by setting $z \triangleq e^{i\omega}$, where $\omega$ is the frequency in radians, the z-transform of the impulse response $h$ is defined as:

$$H(z) = \sum_{n=-\infty}^{\infty} h[n] \, z^{-n} \qquad (4)$$

which is known as the system transfer function in the z-domain. (The one-dimensional z-transform of the discrete-time impulse response $h[k]$ is the ratio of two polynomials: $H(z) \triangleq \frac{N(z)}{D(z)}$, where the degree of $N(z)$ is less than or equal to that of $D(z)$ for physically realizable systems. However, for the multi (i.e., $n$)-dimensional z-transform, where $n \in \mathbb{N} \setminus \{1\}$, the resulting transfer function is given as the ratio of the numerator and denominator multinomials:

$$H(\mathbf{z}) \triangleq \frac{N(z_1, \cdots, z_n)}{D(z_1, \cdots, z_n)}$$

The analysis of the multi-dimensional z-transform (e.g., in signal processing of spatio-temporal processes) is significantly more complicated than that of the one-dimensional z-transform, because the fundamental theorem of algebra may not be applicable to multinomials while it is always applicable to polynomials.)
The system $H(z)$ is stable if the sum in Equation (4) converges, and the region of convergence (ROC) is called the stability region, where all poles of $H(z)$ are located inside the unit circle with its center at zero in the complex z-plane. The system is said to be minimum-phase if all zeros of $H(z)$ are located inside the unit circle. If all zeros of $H(z)$ are located outside the unit circle, then the system is called maximum-phase [16], and the system is called non-minimum-phase if at least one zero of $H(z)$ is located outside the unit circle.
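For a rational transfer function, the stability and minimum-phase conditions above can be checked directly from the pole and zero locations; a minimal sketch, with hypothetical filter coefficients, follows.

```python
import numpy as np

# H(z) = N(z)/D(z) with coefficients in powers of z^{-1} (hypothetical example)
num = [1.0, -0.5]   # N(z) = 1 - 0.5 z^{-1}: zero at z = 0.5
den = [1.0, -0.9]   # D(z) = 1 - 0.9 z^{-1}: pole at z = 0.9

zeros, poles = np.roots(num), np.roots(den)
stable        = np.all(np.abs(poles) < 1)  # all poles inside the unit circle
minimum_phase = np.all(np.abs(zeros) < 1)  # all zeros inside the unit circle
print(stable, minimum_phase)               # True True
```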

Definition 3 (Analytic Functions). Let $D_r(z_0) \triangleq \{z \in \mathbb{C} : |z - z_0| < r\}$ be the open disc of radius $r > 0$ with center at $z_0 \in \mathbb{C}$. A complex-valued function $f(re^{i\theta})$, where $\theta \in [0, 2\pi)$, is said to be analytic in $D_r(z_0)$ if the derivative of $f(re^{i\theta})$ exists at each point of $D_r(z_0)$.

Given $p \in [1, \infty]$, the Hardy space $H^p$ is a set of analytic functions $f(re^{i\theta})$ with bounded $H^p$-norm defined as:

$$\|f\|_{H^p} \triangleq \sup_{r \in (0,1)} \left( \frac{1}{2\pi} \int_0^{2\pi} |f(re^{i\theta})|^p \, d\theta \right)^{1/p} \quad \text{for } p \in [1, \infty)$$

$$\|f\|_{H^\infty} \triangleq \sup_{|z| < 1} |f(z)| \quad \text{for } p = \infty$$

The following theorem, due to Paley and Wiener [14], presents a fundamental result in the $H^1$-space, which is important for spectral factorization in signal processing and for innovation representation of random processes.

Theorem 4 (Paley-Wiener). Let $S(z)$ be a complex-valued function of the complex variable $z$. If $\ln(S) \in H^1$, then there exists a real positive constant $K_0$ and a complex-valued function $H_{ca}(z)$ corresponding to a causal stable system with a causal stable inverse such that

$$S(z) = K_0 \, H_{ca}(z) \, H_{ca}^{her}(1/\bar{z}) \qquad (5)$$

where the superscript "her" indicates the Hermitian, i.e., complex conjugate of transpose of a vector/matrix, and $\bar{z}$ is the complex conjugate of $z$. If, in addition, $S(z)$ is a rational polynomial, the above factors $H_{ca}(z)$ and $H_{ca}^{her}(1/\bar{z})$ are minimum-phase and maximum-phase components, respectively. This is called the Paley-Wiener condition.

Proof. The proof of the Paley-Wiener Theorem is given in detail by Therrien [14].

It follows from Equation (1) that, for a linear shift-invariant stable system with a deterministic LSI impulse response $h[n]$ and a wide sense stationary (WSS) input signal $u[n]$, the expected value of the output $y[n]$ is:

$$E[y[n]] = \sum_{k=-\infty}^{\infty} h[n-k] \, E[u[k]] \qquad (6)$$

Since the input $u$ is WSS, the expected values, $m_y$ and $m_u$, of the output $y$ and input $u$, respectively, are related as:

$$m_y = \left( \sum_{k=-\infty}^{\infty} h[k] \right) m_u \qquad (7)$$

The autocorrelation of a random vector $\mathbf{x}[k]$ is denoted as $r_{xx}[k] \triangleq E\big[\mathbf{x}[k] \, \mathbf{x}^{her}[k]\big]$, and the cross-correlation between the output $y$ and the input $u$ is given by

$$r_{yu}[n_1, n_0] = \sum_{k=-\infty}^{\infty} h[n_1 - k] \, r_{uu}[k - n_0] \qquad (8)$$

The above equation leads to the following important relations between correlation functions [14]:

$$r_{yu}[\ell] = h[\ell] * r_{uu}[\ell] \quad \text{and} \quad r_{yy}[\ell] = h[\ell] * r_{uu}[\ell] * h^{her}[-\ell] \qquad (9)$$

where the superscript her indicates the Hermitian, i.e., the complex conjugate of transpose
of a vector/matrix.
The Fourier transform of $r_{xx}[k]$ for a WSS random sequence $x[k]$ is called the power spectral density function [7], defined as:

$$S_{xx}(e^{i\omega}) \triangleq \sum_{k=-\infty}^{\infty} e^{-i\omega k} \, r_{xx}[k] \qquad (10)$$

and its inverse Fourier transform, which is equal to the autocorrelation function, is obtained as:

$$r_{xx}[k] = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i\omega k} \, S_{xx}(e^{i\omega}) \, d\omega \qquad (11)$$

The z-transform of the autocorrelation function for a WSS random sequence $x[n]$ is called the complex spectral density function and is defined as:

$$S_{xx}(z) \triangleq \sum_{k=-\infty}^{\infty} r_{xx}[k] \, z^{-k} \qquad (12)$$

and its inverse is given by the contour integral

$$r_{xx}[k] = \frac{1}{2\pi i} \oint_C z^{k-1} \, S_{xx}(z) \, dz \qquad (13)$$

Since the autocorrelation function of a zero-mean white noise with variance $\sigma_w^2$ is given by $r_w[k] \triangleq \sigma_w^2 \, \delta[k]$, the power spectral density is a constant $\sigma_w^2$ for a stationary white noise.
Using the property that the convolution in the time domain is a product in the Fourier transform domain and using Equation (9), it follows that

$$S_{yx}(e^{i\omega}) = H(e^{i\omega}) \, S_{xx}(e^{i\omega}) \qquad (14)$$

where $H(e^{i\omega})$ is the system transfer function (i.e., the Fourier transform of $h[k]$). A few algebraic computations yield the following relation [14]:

$$S_{yy}(e^{i\omega}) = H(e^{i\omega}) \, S_{xx}(e^{i\omega}) \, H^{her}(e^{i\omega}) \qquad (15)$$

In a similar manner, the following relations are obtained for the complex spectral density:

$$S_{yx}(z) = H(z) \, S_{xx}(z) \quad \text{and} \quad S_{yy}(z) = H(z) \, S_{xx}(z) \, H^{her}(1/\bar{z}) \qquad (16)$$
Let us consider a WSS random sequence $\{x[k]\}$ whose complex spectral density satisfies the Paley-Wiener condition:

$$\ln(S_{xx}) \in H^1, \quad \text{i.e.,} \quad \int_{-\pi}^{\pi} |\ln S_{xx}(e^{i\omega})| \, d\omega < \infty \qquad (17)$$

Then, by Theorem 4, there exists a real positive constant $K_0$ and a complex-valued transfer function $H_{ca}(z)$ of a causal stable system with a causal stable inverse such that

$$S_{xx}(z) = K_0 \, H_{ca}(z) \, H_{ca}^{her}(1/\bar{z}) \qquad (18)$$

Remark 1. A process, whose (complex) spectral density satisfies Equation (17), is called a regular process (see [7,14]). The spectral density factorization given by Equation (18) has important applications in signal processing. This includes what is called the innovations representation of the random process [14], in view of which any regular process can be realized as the output of a causal linear filter $H_{ca}(z)$ driven by a white noise with variance $K_0$, as shown in Figure 3.
It is worth mentioning that this type of process covers a wide range of random processes. In particular, any process whose complex spectral density is a rational function of $z$ is a regular process.

Figure 3. Innovations representation of a random process. (a) Signal model. (b) Inverse filter.

Example 3 ([14]). Consider a random sequence $x[n]$ with a complex spectral density function:

$$S_{xx}(z) = \frac{-(1/a)}{z - (a + 1/a) + z^{-1}}$$

which could be re-written as:

$$S_{xx}(z) = \frac{1}{-az + (1 + a^2) - az^{-1}} = \frac{1}{(1 - az^{-1})} \cdot \frac{1}{(1 - az)}$$

Using the Paley-Wiener Theorem, $x[n]$ can be realized as the output of a causal stable system, given by:

$$H_{ca}(z) = \frac{1}{1 - az^{-1}} \qquad (19)$$

excited by a zero-mean white noise with unit variance $\sigma^2 = 1$. It is important to note that since $S_{xx}(z)$ is a rational polynomial, $H_{ca}(z)$ should be minimum-phase. This is the case for the one given by Equation (19).

Since the function can be factored as:

$$S_{xx}(z) = \frac{1}{-az + (1 + a^2) - az^{-1}} = \frac{1}{(z^{-1} - a)} \cdot \frac{1}{(z - a)}$$

a possible pitfall here is to choose

$$H_{ca}(z) = \frac{1}{z - a} = \frac{z^{-1}}{1 - az^{-1}} \qquad (20)$$

The term in Equation (20) is not minimum-phase because it has a zero at $|z| = \infty$. Moreover, the inverse $H_{ca}^{-1}(z) = z - a$ is not causal. Therefore, the spectral factorization with $H_{ca}(z)$ given by Equation (20) is not physically realizable for the given random sequence $\{x[k]\}$.
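The innovations representation of Example 3 and Figure 3 can be verified numerically: filtering unit-variance white noise through $H_{ca}(z)$ of Equation (19) produces the regular process, and the causal inverse recovers the white input. A sketch, with an assumed value $a = 0.7$, is given below.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(3)
a = 0.7
w = rng.standard_normal(200_000)      # zero-mean, unit-variance white noise

# Signal model (Figure 3a): x[n] is the output of H_ca(z) = 1/(1 - a z^{-1}), Equation (19)
x = signal.lfilter([1.0], [1.0, -a], w)
# Inverse filter (Figure 3b): H_ca^{-1}(z) = 1 - a z^{-1} recovers the innovations
w_rec = signal.lfilter([1.0, -a], [1.0], x)

print(np.allclose(w, w_rec))          # True: causal filter with a causal inverse
print(np.var(x), 1/(1 - a*a))         # sample variance agrees with r_xx[0] = 1/(1 - a^2)
```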

As mentioned before, any random process whose complex spectral density is a rational
polynomial is a regular process, and therefore it satisfies the Paley-Wiener condition.
However, this is not a necessary condition for being a regular process as seen in the
following example.

Example 4 ([14]). Let a random sequence $x[n]$ have a complex spectral density $S_x(z) = e^{z + z^{-1}}$. Then, the corresponding power spectral density $S_x(e^{i\omega}) = e^{2\cos\omega}$ satisfies the Paley-Wiener condition that is given as:

$$\int_{-\pi}^{\pi} |\ln S_x(e^{i\omega})| \, d\omega = \int_{-\pi}^{\pi} |2\cos\omega| \, d\omega < \infty$$

Therefore, the given random sequence is regular and has an innovations representation. The spectral factorization can be done as follows:

$$S_x(z) = 1 \cdot e^{z} \cdot e^{z^{-1}}$$

Then, the causal factor is given by

$$H_{ca}(z) = e^{z^{-1}} \qquad (21)$$

which converges everywhere except at $z = 0$. The impulse response of the filter is $h_{ca}[n] = \frac{1}{n!} U[n]$, where $U[n]$ is the unit (discrete) step function, because

$$H_{ca}(z) = e^{z^{-1}} = \sum_{k=0}^{\infty} \frac{1}{k!} \, z^{-k}$$

So, the given random sequence can be realized as the output of a system, with a transfer function given by Equation (21), which is driven by a zero-mean white noise with a unit variance (i.e., $\sigma^2 = 1$).

In fact, a regular process is related to the corresponding predictable process that can be predicted with zero error. The relation between these two processes is given by the following fundamental theorem [7].

Theorem 5 (Wold Decomposition Theorem). A general random sequence $x[n]$ can be written as the sum of two processes as:

$$x[n] = x_r[n] + x_p[n] \qquad (22)$$

where $x_r[n]$ is a regular process and $x_p[n]$ is a predictable process, with $x_r[n]$ being orthogonal to $x_p[n]$, i.e., $E\{x_r[m] \, x_p^{her}[n]\} = 0 \ \forall m, n$.

Proof. The proof is given in [7].

2.3. Weak Topology in a Banach Space


It follows from the Appendix A that an appropriate collection of open sets in a metric
space defines its topology, and such a topology is called a metric topology or strong topology.
In fact, a base for the strong topology on a Banach space $X$ is the collection of all open balls, i.e., sets of the form:

$$\{f \in X : \|f - g\| < r\} \qquad (23)$$

where the center $g$ is a vector/function in $X$ and the radius $r$ is a positive real number.
In this topology, convergence of a sequence, $\{f_n\}$, of functions in $X$ to a limit $g$ in $X$ is referred to as strong convergence, which implies that $\|g - f_n\| \to 0$ and is denoted by $f_n \xrightarrow{s} g$. Besides strong convergence, other notions of convergence (e.g., weak convergence and uniform convergence) have been introduced in the literature, which play significant roles in the theory of Banach algebras [1].
We now introduce the notions of weak convergence and weak topology. Given a Banach space $X$ over a field $K$, let $\mathcal{F} \triangleq \{F_1, F_2, \cdots\}$ be a set of bounded linear functionals on $X$ (a functional is a mapping of a vector space $X$ into its field $K$; the set of all linear bounded (equivalently, linear continuous) functionals on $X$ is called the dual space $X^*$), i.e., each $F_i$ is an element of the dual space $X^*$ and hence $\mathcal{F} \subset X^*$. Given an $\varepsilon > 0$ and a vector/function $f_0 \in X$, let us define the set:

$$\Omega(\mathcal{F}; f_0, \varepsilon) \triangleq \{f \in X : |F_i(f) - F_i(f_0)| < \varepsilon, \ \forall F_i \in \mathcal{F}\} \qquad (24)$$

A class of such sets is obtained by varying $\varepsilon \to 0^+$ in Equation (24) to establish the notions of weak convergence and weak topology. Some of these convergence concepts in the space of linear bounded operators are briefly explained in the following definitions, which are introduced for different notions of convergence of sequences $\{T_k\}$ of bounded linear operators in Banach spaces.

Definition 4 (Convergence in operator norm or uniform convergence). Let $T_k \in BL(V, V)$ be a bounded linear operator from $V$ into $V$. Then, the sequence $\{T_k\}$ converges to some $T \in BL(V, V)$ in the operator norm (also called uniform convergence) if the induced norm $\|T - T_k\|_{ind} \triangleq \sup_{\|x\|_V = 1} \|(T - T_k)x\|_V$ satisfies $\lim_{k \to \infty} \|T - T_k\|_{ind} = 0$, which is denoted as $T_k \xrightarrow{u} T$.

Definition 5 (Strong convergence). Let $T_k \in BL(V, V)$ be a bounded linear operator from $V$ into $V$. Then, the sequence $\{T_k\}$ converges strongly to some $T \in BL(V, V)$ if $\lim_{k \to \infty} \|(T - T_k)x\|_V = 0 \ \forall x \in V$, which is denoted as $T_k \xrightarrow{s} T$.

Definition 6 (Weak convergence). Let $T_k \in BL(V, V)$ be a bounded linear operator from $V$ into $V$. Then, the sequence $\{T_k\}$ converges weakly to some $T \in BL(V, V)$ if

$$\forall F \in V^*, \ \forall x \in V: \quad \lim_{k \to \infty} |F(Tx) - F(T_k x)| = 0,$$

which is denoted as $T_k \xrightarrow{w} T$.

Remark 2. (Convergence in operator norm) $\Rightarrow$ (Strong convergence) $\Rightarrow$ (Weak convergence). The converse is not true, in general.

To show $(T_k \xrightarrow{u} T) \Rightarrow (T_k \xrightarrow{s} T)$, we proceed as follows: $\forall x \in V$, $\|(T - T_k)x\|_V \leq \|T - T_k\|_{ind} \, \|x\|_V$, which implies that, given $T_k \xrightarrow{u} T$, i.e., $\lim_{k \to \infty} \|T - T_k\|_{ind} = 0$, it follows that $\lim_{k \to \infty} \|(T - T_k)x\|_V = 0 \ \forall x \in V$, i.e., $T_k \xrightarrow{s} T$.

To show $(T_k \xrightarrow{s} T) \Rightarrow (T_k \xrightarrow{w} T)$, we proceed as follows: $T_k \xrightarrow{s} T \Rightarrow \lim_{k \to \infty} \|(T - T_k)x\|_V = 0 \ \forall x \in V$. Let $f \in V^*$; then, it follows from the linearity and boundedness of the functional $f$ that $|f((T - T_k)x)| \leq \|f\|_{ind} \, \|(T - T_k)x\|_V$. Therefore, $\forall x \in V \ \forall f \in V^*$, $\lim_{k \to \infty} |f(Tx) - f(T_k x)| = 0 \Rightarrow T_k \xrightarrow{w} T$.


We demonstrate the falsity of the converse by two counterexamples, one for each case.

(Strong convergence) $\nRightarrow$ (Convergence in operator norm): Let us define $x \triangleq \{\xi_n : n \in \mathbb{N}\}$ and a sequence of bounded linear operators $T_k : \ell^2 \to \ell^2 \ \forall k \in \mathbb{N}$ as:

$$T_k x \triangleq \{\underbrace{0, 0, \cdots, 0}_{\text{first } k \text{ terms}}, \xi_{k+1}, \xi_{k+2}, \cdots\}$$

Therefore, $T_k$ is a bounded linear operator, i.e., $T_k \in BL(\ell^2, \ell^2)$. Since $x \in \ell^2$, it follows that

$$\lim_{k \to \infty} \|T_k x\|_{\ell^2} = 0 \ \forall x \in \ell^2 \ \Rightarrow \ T_k \xrightarrow{s} 0_{BL(\ell^2, \ell^2)}$$

However, the limit does not converge in the induced norm $\lim_{k \to \infty} \sup_{\|x\|_{\ell^2} = 1} \|T_k x\|_{\ell^2}$, as seen by choosing $x = \{\underbrace{0, 0, \cdots, 0}_{\text{first } k \text{ terms}}, \xi_{k+1}, \xi_{k+2}, \cdots\}$ with $\|x\|_{\ell^2} = 1$, for which $\|T_k x\|_{\ell^2} = 1$; hence $T_k \not\xrightarrow{u} 0_{BL(\ell^2, \ell^2)}$.

Therefore, (Strong convergence) $\nRightarrow$ (Convergence in operator norm).

(Weak convergence) $\nRightarrow$ (Strong convergence): Let us define a sequence of bounded linear operators $T_k : \ell^2 \to \ell^2 \ \forall k \in \mathbb{N}$ as:

$$T_k x = \{\underbrace{0, 0, \cdots, 0}_{\text{first } k \text{ terms}}, \xi_1, \xi_2, \cdots\}$$

where $x \triangleq \{\xi_n : n \in \mathbb{N}\}$. It is given that $\{T_k\}$ is a sequence of bounded linear operators, i.e., each $T_k \in BL(\ell^2, \ell^2)$. Furthermore, in this Hilbert space setting, it follows from the Riesz Representation Theorem that every $f \in (\ell^2)^*$ can be represented as:

$$f(x) = \langle x, y \rangle_{\ell^2} = \sum_{n=1}^{\infty} \xi_n \eta_n, \quad \text{where } y = \{\eta_k : k \in \mathbb{N}\}$$

It follows by the Cauchy-Schwarz inequality that, as $k \to \infty$,

$$|f(T_k x)|^2 = |\langle T_k x, y \rangle|^2 \leq \sum_{n=1}^{\infty} |\xi_n|^2 \sum_{m=k+1}^{\infty} |\eta_m|^2 \to 0$$

However,

$$\|T_k x\|_{\ell^2} = \|x\|_{\ell^2} \ \forall k \in \mathbb{N} \ \Rightarrow \ \exists x \neq 0_{\ell^2} \text{ such that } \lim_{k \to \infty} \|T_k x\|_{\ell^2} \neq 0 \ \Rightarrow \ T_k \not\xrightarrow{s} 0_{BL(\ell^2, \ell^2)}$$

Therefore, (Weak convergence) $\nRightarrow$ (Strong convergence).
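The second counterexample can also be seen numerically: for the shift operators $T_k$ above, $\|T_k x\|$ does not decay, while $\langle T_k x, y \rangle \to 0$ for a fixed representer $y$. A small sketch follows, where the truncation of the $\ell^2$ vectors to finitely many terms is our assumption.

```python
import numpy as np

def shift(x, k):
    """The operator T_k of the second counterexample: right-shift by k positions."""
    out = np.zeros_like(x)
    out[k:] = x[:len(x) - k]
    return out

x = 1.0 / np.arange(1, 10_001)       # a vector in ell^2 (truncated to 10^4 terms)
y = 1.0 / np.arange(1, 10_001)       # representer of the functional f(.) = <., y>

for k in [1, 10, 100, 1000]:
    Tkx = shift(x, k)
    # ||T_k x|| stays ~||x|| (no strong convergence, up to truncation of the tail),
    # while <T_k x, y> -> 0 (weak convergence)
    print(k, np.linalg.norm(Tkx), np.dot(Tkx, y))
```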

Remark 3. It is noted that, for finite-dimensional vector spaces, the notions of strong convergence and weak convergence are indistinguishable. Equivalently, we make the following statement: In a finite-dimensional Banach space $V$, the weak topology generated by $V^*$ is the same as the strong topology generated by $V$.
However, in the analysis of stochastic processes, we deal with infinite-dimensional spaces of signal functions, in which weak convergence and strong convergence need not coincide. This is especially applicable to statistical signal processing, where the expectation of the estimation error is required to weakly converge to zero without the error signal itself strongly converging to zero.

Based on the concept of weak convergence, weak topology is defined as follows:

Definition 7 (Convergence in weak topology). Given a Banach space $X$, let there be a class of bounded linear functionals $\mathcal{F} \subseteq X^*$, and let $\Im(\mathcal{F})$ be the topology in $X$ generated by $\mathcal{F}$. Then, for a given vector/function $g \in X$, a sequence $\{f_n\} \subset X$ is said to converge to $g$ in the weak topology $\Im(\mathcal{F})$, denoted as $f_n \xrightarrow{w} g$ in $\Im(\mathcal{F})$, provided that $F_\alpha(f_n)$ converges strongly to $F_\alpha(g)$, denoted as $F_\alpha(f_n) \xrightarrow{s} F_\alpha(g)$, $\forall F_\alpha \in \mathcal{F}$.
Weak convergence in Definition 7 is a generalization of weak convergence as introduced in the functional analysis literature, which implies that a sequence $\{f_n\} \subset X$ converges weakly to some $g \in X$ if $G(f_n) \xrightarrow{s} G(g) \ \forall G \in X^*$ [10].

Remark 4. The concept of topological spaces and weak topology are important for learning using
statistical invariants (LUSI). In a machine learning paradigm, learning machines often compute
statistical invariants for specific problems with the objective of reducing the expected values of
errors in such a way that preserves these invariants. In contrast to classical machine learning that
employs the mechanism of strong convergence for approximations to the desired function, LUSI can
significantly increase the rate of convergence by combining the mechanisms of strong convergence
and weak convergence [17]. Furthermore, the notion of weak topology is also important when dealing
with shift spaces for signal analysis that uses symbolic dynamics, as explained in [18,19].

3. Hilbert Spaces for Signal Processing


This section introduces the concept of Hilbert spaces, which forms the backbone in
the disciplines of signal processing and other fields of engineering. Details are provided in
many textbooks such as Naylor and Sell [2].

Definition 8 (Hilbert Spaces). Let a vector space $X$ be defined over a field $K$, which is either $\mathbb{R}$ or $\mathbb{C}$. A function $\langle \cdot, \cdot \rangle : X \times X \to K$ is called an inner product if, for all $x, y, z \in X$ and $\forall \alpha \in K$, the following conditions hold:
1. (positive definiteness) $\langle x, x \rangle > 0$ when $x \neq 0$;
2. (additivity) $\langle (x + y), z \rangle = \langle x, z \rangle + \langle y, z \rangle$;
3. (homogeneity) $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$;
4. (conjugate symmetry) $\langle x, y \rangle = \overline{\langle y, x \rangle}$

Then, $(X, \langle \cdot, \cdot \rangle)$ is called an inner product space or a pre-Hilbert space, and a complete inner product space (i.e., where every Cauchy sequence converges in the space) is called a real (resp. complex) Hilbert space, depending on whether the vector space is defined over $\mathbb{R}$ (resp. $\mathbb{C}$).

The following two properties are immediate consequences of the four properties in Definition 8:
• $\langle x, (y + z) \rangle = \langle x, y \rangle + \langle x, z \rangle$;
• $\langle x, \alpha y \rangle = \bar{\alpha} \langle x, y \rangle$;

It is also noted that every inner product space is a normed space with the norm $\|x\| \triangleq \sqrt{\langle x, x \rangle} \ \forall x \in X$ [2].

Example 5. An example of a Hilbert space is the $\ell^2$ space of square-summable sequences. Given two sequences $x = \{x_n\}$ and $y = \{y_n\}$ in $\ell^2$, the inner product is given by $\langle x, y \rangle \triangleq \sum_{n=-\infty}^{\infty} x_n \bar{y}_n$. Two vectors $x$ and $y$ in a Hilbert space $H$ are said to be orthogonal if $\langle x, y \rangle = 0$. Given a subspace $V \subset H$, its orthogonal complement is denoted as $V^\perp \triangleq \{u \in H : \langle u, v \rangle = 0 \ \forall v \in V\}$; consequently, $V \oplus V^\perp = H$.

Hilbert spaces have many common interesting properties that make them important in optimization theory [12]. As we will see in the sequel, these properties form the core of many fundamental results in adaptive and statistical signal processing, and they are established through the following theorem.

Theorem 6 (Riesz Representation Theorem [2,5]). Let $H$ be a Hilbert space. Then, for every bounded linear functional $f : H \to \mathbb{C}$, there exists a unique $y \in H$ such that $f(x) = \langle y, x \rangle_H$ $\forall x \in H$.

Proof. The proof is given in [2], pp. 345–346.


Remark 5. For the Hilbert spaces $\ell^2$ (resp. $L^2$), this result can be obtained by using a theorem [5], which states that, given $p \in [1, \infty)$, $\ell^q$ (resp. $L^q$) is isometrically isomorphic to the dual space of $\ell^p$ (resp. $L^p$) provided that $\frac{1}{p} + \frac{1}{q} = 1$, where $q$ is called the conjugate of $p$. Since the conjugate of $p = 2$ is $q = 2$, it follows that $\ell^2$ is isometrically isomorphic to $(\ell^2)^*$, and similar relations hold for $L^2$ and $(L^2)^*$ (for example, see [3]); hence, $\ell^2$ and $L^2$ are reflexive. Generalization of this fact is stated as the following theorem.

Theorem 7. Every Hilbert space is reflexive, i.e., $H$ is isometrically isomorphic to its dual space $H^*$.

The proof of this theorem is given in many textbooks on functional analysis


(e.g., [5,10,20]).
Another important property of Hilbert spaces, which is widely used in signal processing in combination with the previous two properties (see Theorems 6 and 7), is given as the following theorem [5]:

Theorem 8 (Orthogonal Projections). Let $H$ be a Hilbert space, and let $V \subset H$ be a closed subspace of $H$, implying that $V$ is also a Hilbert space. Then, it follows that
(i) $H = V \oplus V^\perp$. That is, given $x \in H$, there exists a unique pair $v \in V$ and $u \in V^\perp$ such that $x = u + v$.
(ii) $v_V(x)$ is the unique vector in $V$ having minimal distance from a vector $x \in H$, while $u_{V^\perp}(x)$ is the unique vector in $V^\perp$ having minimal distance from $x$.
(iii) The orthogonal projections $x \mapsto v_V(x)$ and $x \mapsto u_{V^\perp}(x)$ are linear continuous operators, with norm $\leq 1$.

Proof. The proof is given in [2], pp. 300–305.

Remark 6. It follows from Theorem 8 that the decomposition in Equation (22) in Section 2 is indeed unique. Based on this fact, any random process generally consists of two unique orthogonal components: a predictable component and an unpredictable component. That is, if one wants to predict $x[n]$ by using $N$ past observations $\{x_{n-N}, x_{n-N+1}, \ldots, x_{n-1}\}$, then let $\hat{x}[n] = \sum_{k=1}^{N} a_k x[n-k]$ denote an optimal linear prediction of $x[n]$. Such a prediction can be obtained by applying the orthogonality principle, where the prediction error is given by $e[n] \triangleq x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{N} a_k x[n-k]$, and the process $x[n]$ can be expressed as:

$$x[n] = \hat{x}[n] + e[n] = \hat{x}[n] + \left( x[n] - \sum_{k=1}^{N} a_k x[n-k] \right)$$

Hence, the part $\hat{x}[n]$ represents the predictable part of $x[n]$, which corresponds to $x_p[n]$ in the Wold decomposition Theorem in Equation (22), while the error $e[n]$ represents the unpredictable part of $x[n]$, which corresponds to $x_r[n]$ in the Wold decomposition Theorem. That is, the regular process represents the difference between the random process and its optimal prediction. Therefore, the output of $H_{ca}^{-1}(z)$ represents only the new part of information, brought by $x[n]$, which cannot be extracted from the past observations. Therefore, the output of $H_{ca}^{-1}(z)$ is called the innovations process, as depicted in Figure 3b.
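The orthogonality principle in Remark 6 leads to the normal (Wiener-Hopf) equations for the optimal coefficients $\{a_k\}$. The sketch below estimates them from sample autocorrelations of a synthetic AR(1) process; the process parameter, sample size, and predictor order are our assumptions.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(4)
a, M, N = 0.9, 100_000, 3                 # AR(1) coefficient, sample size, predictor order
x = signal.lfilter([1.0], [1.0, -a], rng.standard_normal(M))  # synthetic test process

# Sample autocorrelation lags r[0..N]
r = np.array([x[:M-k] @ x[k:] / M for k in range(N + 1)])

# Orthogonality principle => normal (Wiener-Hopf) equations: R a = [r[1], ..., r[N]]
R = np.array([[r[abs(i - j)] for j in range(N)] for i in range(N)])
a_opt = np.linalg.solve(R, r[1:])
print(np.round(a_opt, 3))                 # ~ [0.9, 0, 0]: only one past sample matters
```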

Another interesting result on Hilbert spaces is stated in the following theorem [2,10].

Theorem 9 (Bessel Inequality). Let $H$ be a Hilbert space and let $V = \text{span}\{e_1, e_2, \ldots\}$ be a subspace of $H$. If $P_V : H \to V$ denotes the orthogonal projection of elements in $H$ into $V$, then the Bessel inequality

$$\sum_{k \geq 1} |\langle x, e_k \rangle|^2 = \|P_V x\|^2 \leq \|x\|^2 \qquad (25)$$

holds for every $x \in H$. Moreover,

$$\sum_{k \geq 1} \langle x, e_k \rangle \, e_k = P_V x \qquad (26)$$

This theorem has an important implication to signal processing as explained in the


following subsection.

3.1. Fourier Series Expansion in a Hilbert Space


In the Hilbert space $L^2([t_0, t_0 + T])$, which is the space of all square-integrable periodic functions $f : [t_0, t_0 + T] \to \mathbb{C}$ with $t_0 \in \mathbb{R}$ and a period $T \in (0, \infty)$, an inner product is defined as:

$$\langle f, g \rangle \triangleq \int_{t_0}^{t_0 + T} \overline{f(t)} \, g(t) \, dt \quad \forall f, g \in L^2([t_0, t_0 + T]) \qquad (27)$$

Let the set of functions $S \triangleq \{\varphi_n(t), n \in \mathbb{Z}\}$, where

$$\varphi_n(t) \triangleq \frac{e^{i\omega_n t}}{\sqrt{T}}; \quad \omega_n \triangleq 2\pi n / T \ \forall n \in \mathbb{Z} \qquad (28)$$

Then, it follows by setting $T = 2\pi$ that

$$\langle \varphi_m, \varphi_n \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-imt} \, e^{int} \, dt = \begin{cases} 1 & \text{if } m = n \\ 0 & \text{if } m \neq n \end{cases}$$

$$\text{and thus} \quad \langle \varphi_m, \varphi_n \rangle = \delta_{mn} \ \forall m, n \in \mathbb{Z} \qquad (29)$$


where $\delta_{mn}$ is called the Kronecker delta. Moreover, it turns out that $\text{span}(S)$ is dense in $L^2([t_0, t_0 + T])$, i.e., the completion $\overline{\text{span}(S)} = L^2([t_0, t_0 + T])$ (see [10]). Therefore, given any $f \in L^2([t_0, t_0 + T])$, there exists a sequence of scalars $\{c_k\} \subset \mathbb{C}$ such that

$$\lim_{N \to \infty} \Big\| f - \sum_{k=-N}^{N} c_k \varphi_k \Big\|_{L^2} = 0$$

That is,

$$f(t) \overset{ms}{=} \sum_{k=-\infty}^{\infty} c_k \varphi_k(t) \qquad (30)$$

where $\overset{ms}{=}$ denotes equality in the mean-square (i.e., $L^2$) sense. Therefore, $\{\varphi_n(t); n \in \mathbb{Z}\}$ is in fact an orthonormal basis of $L^2([t_0, t_0 + T])$. The infinite sum $\sum_{k=-\infty}^{\infty} c_k \varphi_k$ is called the Fourier series expansion of $f$, where $\{c_k\}$ are the Fourier coefficients. Using Equation (29) and taking the inner product of both sides of Equation (30) with $\varphi_n$, for any $n \in \mathbb{Z}$, yields

$$c_n = \langle \varphi_n, f \rangle \qquad (31)$$

Hence, Equation (30) can be rewritten in the following form:

$$f(t) \overset{ms}{=} \sum_{n=-\infty}^{\infty} \langle \varphi_n, f \rangle \, \varphi_n(t) \qquad (32)$$

Moreover, it follows from Equation (26) in Theorem 9 that

$$\|f\|_{L^2}^2 = \sum_{k=-\infty}^{\infty} |c_k|^2 \qquad (33)$$

In view of Equation (32), the Fourier expansion of any square-integrable signal $f \in L^2([t_0, t_0 + T])$ can be decomposed as a linear combination of harmonic modes $\varphi_k$ with frequencies $\omega_k$ [21], where each Fourier coefficient $c_k$ represents the signal's component associated with each mode $\varphi_k$. Furthermore, Equation (33) reveals how the signal's energy $\|f\|_{L^2}^2$ is distributed over the signal's components $\{c_k\}$ and demonstrates the important fact that, for each component $c_k$, the value $|c_k|^2$ represents the part of the signal's energy contributed by that component. This fact plays a central role in signal compression, where a signal $f \in L^2([t_0, t_0 + T])$ is approximated by using as few Fourier coefficients as possible; this is accomplished with a minimum approximation error by retaining those values of $\{c_k\}$ with large magnitudes and by discarding those coefficients with small magnitudes.
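The compression idea can be sketched numerically with the DFT as a discrete surrogate for the Fourier coefficients $\{c_k\}$; the test signal and the number of retained coefficients below are assumptions for illustration.

```python
import numpy as np

T = 2.0
t = np.linspace(0.0, T, 2048, endpoint=False)
f = np.sign(np.sin(2*np.pi*t/T))          # a square wave, viewed as an element of L^2([0, T])

c = np.fft.fft(f) / len(f)                # discrete surrogate for the coefficients {c_k}
keep = 20                                 # retain only the 20 largest-|c_k| coefficients
small = np.argsort(np.abs(c))[:-keep]
c_comp = c.copy()
c_comp[small] = 0.0
f_approx = np.real(np.fft.ifft(c_comp) * len(f))

energy_kept = np.sum(np.abs(c_comp)**2) / np.sum(np.abs(c)**2)  # energy share, per Equation (33)
rel_error = np.linalg.norm(f - f_approx) / np.linalg.norm(f)
print(energy_kept, rel_error)             # most energy retained, small L^2 error
```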
Now we summarize the main results of Fourier series expansion of periodic functions
as a theorem.

Theorem 10 (Fourier Series Theorem). Let $\{\varphi_n\}$ be an orthonormal set in a Hilbert space $H$. Then, the following statements are equivalent:
1. $\{\varphi_n\}$ is an orthonormal basis of $H$, i.e., $\{\varphi_n\}$ is a complete orthonormal set in $H$.
2. (Fourier series expansion) Any vector $x \in H$ can be expanded as: $x \overset{ms}{=} \sum_{n \in \mathbb{N}} \langle x, \varphi_n \rangle \varphi_n$. Note: The inner products $\langle x, \varphi_n \rangle$ are called Fourier coefficients of the vector $x$.
3. (Parseval Equality) For any two vectors $x, y \in H$, the inner product: $\langle x, y \rangle = \sum_{n \in \mathbb{N}} \langle x, \varphi_n \rangle \, \overline{\langle y, \varphi_n \rangle}$
4. The norm: $\|x\|^2 = \sum_{n \in \mathbb{N}} |\langle x, \varphi_n \rangle|^2 \ \forall x \in H$
5. Let $U$ be a subspace of $H$ such that $U$ contains the sequence $\{\varphi_n\}$. Then, $U$ is dense in $H$, i.e., $\overline{U} = H$.

Proof. The proof is given in [2], pp. 307–312.

3.2. Fourier Transform and Inverse Fourier Transform


For decomposition by Fourier series expansion, a function needs to be periodic, as seen in Section 3.1. To extend this analysis to non-periodic functions, we first consider square-integrable periodic functions $f : [-T/2, T/2] \to \mathbb{C}$ and let $T \to \infty$ so that the restriction to periodic functions can be removed. Then, a combination of Equations (28) and (31) yields:

$$c_n \overset{ms}{=} \frac{1}{\sqrt{T}} \int_{-T/2}^{T/2} e^{-i\omega_n t} \, f(t) \, dt \qquad (34)$$

and we define:

$$\hat{f}_T(\omega_n) \triangleq \sqrt{T} \, c_n \qquad (35)$$

Having $n \to \infty$ and $\omega_n \to \omega$ as $T \to \infty$, it follows that

$$\hat{f}(\omega) \triangleq \lim_{T, n \to \infty} \hat{f}_T(\omega_n) = \int_{-\infty}^{\infty} e^{-i\omega t} \, f(t) \, dt \qquad (36)$$

Now, we have the Fourier transform of a signal $f \in L^2(\mathbb{R})$. Since $L^2(\mathbb{R})$ is the completion of $L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, we impose a mild restriction: $f(t)$ is to be both absolute-value integrable and square-integrable. Nevertheless, this restriction is satisfied if $f$ is an analytic function [21].

To obtain the inverse Fourier transform, we substitute Equations (28) and (35) into Equation (30), which yields

$$f(t) \overset{ms}{=} \frac{1}{T} \sum_{n=-\infty}^{\infty} e^{i\omega_n t} \, \hat{f}_T(\omega_n) \qquad (37)$$

By defining $\omega_n \triangleq \frac{n}{T}$, we have $\Delta\omega_n \triangleq \omega_{n+1} - \omega_n = \frac{1}{T}$. Then, substitution of $\Delta\omega_n$ for $\frac{1}{T}$ into Equation (37) yields

$$f(t) \overset{ms}{=} \sum_{n=-\infty}^{\infty} \Delta\omega_n \, \hat{f}_T(\omega_n) \, e^{i\omega_n t} \qquad (38)$$

In the limits $T \to \infty$ and $n \to \infty$, Equation (38) becomes the inverse Fourier transform by using the Riemann sum [4]:

$$f(t) \overset{ms}{=} \int_{-\infty}^{\infty} e^{i\omega t} \, \hat{f}(\omega) \, d\omega \qquad (39)$$

Equation (33) can also be rewritten as:

$$\|f\|_T^2 = \sum_{n=-\infty}^{\infty} \Delta\omega_n \, |\hat{f}_T(\omega_n)|^2 \qquad (40)$$

This formula shows that a signal $f(t) \in L^2(\mathbb{R})$ has, at any given time $t$, (possibly) uncountably many harmonic components distributed over the frequency range $-\infty < \omega < \infty$, and the magnitude of the harmonic component at a frequency $\omega$ is given by the signal's Fourier transform $\hat{f}(\omega)$. By taking the limits $T \to \infty$ and $n \to \infty$, it follows from Equations (33) and (34) that

$$\|f\|_{L^2}^2 = \int_{-\infty}^{\infty} |f(t)|^2 \, dt = \int_{-\infty}^{\infty} |\hat{f}(\omega)|^2 \, d\omega = \|\hat{f}\|_{L^2}^2 \qquad (41)$$

The above relation is known as Plancherel's theorem [3], which implies that the total energy of the signal, obtained in the time domain $t \in \mathbb{R}$, is re-distributed over the frequency domain $\omega \in \mathbb{R}$ such that the energy density at each frequency $\omega$ is $|\hat{f}(\omega)|^2$. It is worth mentioning that the inner products of two functions $f$ and $g$ in the time domain and the frequency domain are related by:

$$\langle f, g \rangle_{L^2} = \langle \hat{f}, \hat{g} \rangle_{L^2} \qquad (42)$$

which is known as Parseval's identity [2].

In many signal processing applications, the signal $f$ is complex-valued with a discrete domain, i.e., $f : \mathbb{Z} \to \mathbb{C}$. Then the discrete-time Fourier transform (DTFT) is given by:

$$\hat{f}(\omega) = \sum_{n=-\infty}^{\infty} f(n) \, e^{-in\omega} \qquad (43)$$

From Equation (43), it follows that

$$|\hat{f}(\omega)| = \Big| \sum_{n=-\infty}^{\infty} f(n) \, e^{-in\omega} \Big| \leq \sum_{n=-\infty}^{\infty} |f(n) \, e^{-in\omega}| = \sum_{n=-\infty}^{\infty} |f(n)| \qquad (44)$$

Therefore, a sufficient condition that guarantees the DTFT to be well-defined is that $f \in \ell^1$. The original sequence can be recovered from its DTFT by the inverse discrete-time Fourier transform (IDTFT)

$$f(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat{f}(\omega) \, e^{in\omega} \, d\omega \qquad (45)$$

Expressing the frequency in Hertz instead of radians, i.e., by setting $\omega = 2\pi\xi$, it follows that

$$f(n) = \int_{-1/2}^{1/2} \hat{f}(\xi) \, e^{i2\pi n\xi} \, d\xi \qquad (46)$$
Although the Fourier transform plays a central role in signal analysis, it considers the time-averaged frequency behavior of the signal by integrating over the entire time domain $-\infty < t < \infty$. This property reduces the capability of capturing abrupt (i.e., rapid) changes which may occur in the signal; capturing such rapid changes in the signal is crucial in many applications, such as detection of faults and anomalies. In order to remedy this shortcoming of the Fourier transform, the signal is integrated over a time window, instead of integrating over the entire time domain. This gives rise to the so-called windowed Fourier transform (WFT), which augments the Fourier transform with a time-localization property that provides information about the signal simultaneously in the time and frequency domains [21,22]; a quantum-mechanics-based view of time-frequency localization is briefly explained in [23].

3.3. Windowed Fourier Transform in a Hilbert Space


A function $g : \mathbb{R} \to \mathbb{C}$ is said to have a compact support $B \subset \mathbb{R}$, and we say $\text{supp}(g) = B$, if $g$ vanishes outside its compact domain $B$, i.e., $g(t) = 0 \ \forall t \notin B$. Given a function $f : \mathbb{R} \to \mathbb{C}$ and a $t \in \mathbb{R}$, let us define

$$f_t(u) \triangleq \overline{g}(u - t) \, f(u) \qquad (47)$$

where $\overline{g}(\cdot)$ is the complex conjugate of $g(\cdot)$, and $\text{supp}(g) \subset [-T, 0]$ for a positive real number $T$. Hence $f_t$ is a localized version of $f$ and $\text{supp}(f_t) \subseteq [t - T, t]$ [21]. Then, the windowed Fourier transform (WFT) of $f$ is the Fourier transform of $f_t$, which is given as:

$$\tilde{f}(\omega, t) \triangleq \hat{f}_t(\omega) = \int_{-\infty}^{\infty} e^{-i\omega u} \, f_t(u) \, du \qquad (48)$$

and the inverse WFT is obtained as:

$$f(u) = \frac{1}{\|g\|^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g_{\omega,t}(u) \, \tilde{f}(\omega, t) \, d\omega \, dt \qquad (49)$$

where $g_{\omega,t}(u) \triangleq e^{i\omega u} \, g(u - t)$.

Following Equations (48) and (49), an inner product is defined as:

$$\langle g_{\omega,t}, f \rangle \triangleq \tilde{f}(\omega, t) \qquad (50)$$

Using Parseval's identity in Equation (42), we have

$$\langle g_{\omega,t}, f \rangle = \langle \hat{g}_{\omega,t}, \hat{f} \rangle = \tilde{f}(\omega, t) \qquad (51)$$

Example 6. An example of the window function $g$ is:

$$g(u) = \begin{cases} 1 + \cos(\pi u), & -1 \leq u \leq 1 \\ 0, & \text{otherwise} \end{cases}$$
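A discretized version of Equations (47) and (48) with the window of Example 6 can be sketched as follows; the test signal, grid spacing, and evaluation points are our assumptions. The WFT magnitude at time $t = -5$ responds to the local 1 Hz oscillation rather than the 3 Hz content elsewhere.

```python
import numpy as np

def g(u):
    """The raised-cosine window of Example 6, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 1.0 + np.cos(np.pi*u), 0.0)

def wft(f, u, omega, t, du):
    """Discretized Equations (47)-(48): Fourier transform of the localized f_t."""
    f_t = np.conj(g(u - t)) * f           # g is real-valued here, so conj is a no-op
    return np.sum(np.exp(-1j*omega*u) * f_t) * du

du = 1e-3
u = np.arange(-10.0, 10.0, du)
f = np.where(u < 0, np.sin(2*np.pi*1.0*u), np.sin(2*np.pi*3.0*u))  # frequency switch at u = 0

# Near t = -5 the signal oscillates at 1 Hz, and the WFT magnitude reflects that:
print(abs(wft(f, u, 2*np.pi*1.0, -5.0, du)))   # large response at 1 Hz
print(abs(wft(f, u, 2*np.pi*3.0, -5.0, du)))   # small response at 3 Hz
```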

As mentioned in a previous subsection, if the signal is discrete, its DTFT is used to provide a frequency representation of the signal. Although the signal $f(n)$ in this case is discrete in the time domain, its DTFT is a continuous function of the frequency $\omega$. However, most of the devices used in signal processing are digital, and therefore it is more convenient to deal with a discrete frequency representation of the signal. Moreover, the discrete signal $f(n)$ in many cases represents measurement data provided by some sensors, and such signals are usually of finite length. The discrete Fourier transform (DFT) is a useful tool in signal processing that accommodates these two issues, as explained below.

Given a finite-length discrete signal $f : \{0, \ldots, N-1\} \to \mathbb{C}$, the DFT is given as:

$$\tilde{f}[k] = \sum_{n=0}^{N-1} f[n] \, e^{-i2\pi k n / N} \qquad (52)$$
The inverse discrete Fourier transform (IDFT) is:

$$f[n] = \frac{1}{N} \sum_{k=0}^{N-1} \tilde{f}[k] \, e^{i2\pi k n / N} \qquad (53)$$
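Equations (52) and (53) translate directly into code; the direct $O(N^2)$ sums below are a sketch for clarity rather than efficiency, and agree with the fast Fourier transform.

```python
import numpy as np

def dft(f):
    """Direct O(N^2) implementation of Equation (52)."""
    N = len(f)
    n = np.arange(N)
    return np.array([np.sum(f * np.exp(-2j*np.pi*k*n/N)) for k in range(N)])

def idft(F):
    """Direct O(N^2) implementation of Equation (53)."""
    N = len(F)
    k = np.arange(N)
    return np.array([np.sum(F * np.exp(2j*np.pi*k*n/N)) / N for n in range(N)])

f = np.random.default_rng(5).standard_normal(64)
F = dft(f)
print(np.allclose(idft(F), f))        # True: the IDFT inverts the DFT
print(np.allclose(F, np.fft.fft(f)))  # True: agrees with the fast implementation
```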

Returning to the continuous WFT, if a function $f$ is windowed over a time interval, the resulting WFT has a time-localization property, as seen at the end of Section 3.2. Moreover, Equation (51) shows that the WFT of $f$ localizes $\hat{f}(\omega)$ to a neighborhood of $\omega$. Therefore, the WFT has both time and frequency localizations. However, due to the uncertainty principle, these two kinds of localization have different physical interpretations and are mutually exclusive in the sense that making a WFT $\tilde{f}(\omega, t)$ sharper in time makes it flatter in frequency and vice versa [21,23]. Moreover, the WFT is not efficient in scanning signals that involve time intervals much shorter or much longer than the window length $T$ [21]. To address these issues, the notion of the wavelet transform is introduced in Section 3.4, which produces an efficient analysis tool to capture signal features occurring over short and long intervals.

3.4. Wavelet Transform in a Hilbert Space


The continuous wavelet transform (CWT) of a function $f : \mathbb{R} \to \mathbb{C}$ is defined as:

$$\tilde{f}(s, t) \triangleq \int_{-\infty}^{\infty} \overline{\psi_{s,t}(u)} \, f(u) \, du = \langle \psi_{s,t}, f \rangle \qquad (54)$$

where $\psi_{s,t}$, called the wavelet, is a scaled and translated version

$$\psi_{s,t}(u) = |s|^{-p} \, \psi\!\left(\frac{u - t}{s}\right) \quad \text{where } s \neq 0 \qquad (55)$$

of what is called a mother (or basic) wavelet $\psi \triangleq \psi_{1,0}$; and $\overline{\psi_{s,t}}$ is the complex conjugate of $\psi_{s,t}$.
It is noted from Equation (55) that when $|s| > 1$, $\psi_{s,t}$ is a stretched version of $\psi$, and when $|s| < 1$, $\psi_{s,t}$ is a compressed version of $\psi$. Moreover, if $s < 0$, then $\psi_{s,t}$ is a reflected version of $\psi$. These stretching, compression, and reflection processes can be conveniently done on the time axis. The exponent term $p$ in Equation (55) is a real number that stretches or compresses $\psi$ along the vertical axis. The idea of using $p$ in Equation (55) is to keep a desired norm unchanged when scaling the wavelet $\psi_{s,t}$. For example, if $p = 1$, then both $\psi$ and $\psi_{s,t}$ have the same $L^1$ norm; and if $p = 1/2$, then $\psi$ and $\psi_{s,t}$ have the same $L^2$ norm [21].
Using Parseval's identity, Equation (54) can be written as

$$\tilde{f}(s, t) = \langle \psi_{s,t}, f \rangle = \langle \hat{\psi}_{s,t}, \hat{f} \rangle \qquad (56)$$

where $\hat{\psi}_{s,t}$ is the Fourier transform of $\psi_{s,t}$. This equality shows that the wavelet transform localizes signals in both the time and frequency domains, where the sharpness of these localizations is controlled by the scaling factor $s$ and the choice of the mother wavelet $\psi$.

Example 7. The Morlet wavelet is a (frequency-modulated) mother wavelet which is given in the time domain as:

$$\psi(u) = e^{-i2\pi\xi_0 u} \, e^{-u^2/2} \qquad (57)$$

whose Fourier transform is

$$\hat{\psi}(\xi) = e^{-(\xi - \xi_0)^2/2} \qquad (58)$$

where $\xi_0$ is the center frequency around which the signal is localized in the frequency domain.
Various forms of the mother wavelet $\psi$ have been reported in the wavelet literature [21,22]. All of these wavelet forms should satisfy the admissibility condition:

$$C_\psi \triangleq \int_{-\infty}^{\infty} \frac{|\hat{\psi}(\xi)|^2}{|\xi|} \, d\xi < \infty \qquad (59)$$

where $\hat{\psi}(\xi)$ is the Fourier transform of $\psi(t)$.


At a fixed scale $s$, the CWT of a signal $f(u)$ yields information relevant to the feature contained in the signal at the scale $s$, and the behavior of this feature over time is captured by translating $\psi_{s,t}$ over $t$. Then this process is repeated for different scales by changing $s$ to capture other signal features that are relevant to different scales.

Given the CWT of a signal $f \in L^2$, the original signal $f$ can be reconstructed by

$$f(u) \overset{ms}{=} \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \psi_{s,t}(u) \, \frac{\tilde{f}(s, t)}{s^2} \, ds \, dt \qquad (60)$$

where $C_\psi$ is a constant depending on the wavelet $\psi$, and Equation (60) shows that any signal $f \in L^2$ can be represented as a superposition of shifted and dilated wavelets [24].
For a discrete signal $f : \mathbb{Z} \to \mathbb{C}$, the discrete wavelet transform (DWT) is used with a discrete wavelet as:

$$\psi_{s,t}[u] = |s|^{-p} \, \psi\!\left(\frac{u - t}{s}\right) \qquad (61)$$

where $s$ is the scaling parameter and $t$ is the shifting parameter. The most commonly used discrete wavelets have the following values of the parameters: $s = 2^j$, $t = k2^j$, and $p = 1/2$, where $j$ is an integer that controls the scaling parameter and specifies the level of wavelet decomposition of the signal, and $k$ is another integer which controls the shifting parameter. Substitution of these values into Equation (61) yields the most common form of the discrete wavelet

$$\psi_{j,k}[n] = \frac{1}{\sqrt{2^j}} \, \psi\!\left(\frac{n - k2^j}{2^j}\right) \qquad (62)$$
Notice that large values of j result in large scaling parameters which stretch the wavelet
function and let the DWT capture low-frequency features in the signal. On the other hand,
small values of j would make the DWT more capable of capturing high-frequency features
by decreasing the scaling parameter [21,22].
Given a wavelet level j, the DWT of a sequence { f [n]} consists of the following two
parts:
The average coefficients $\{A_j[k2^j]\}$ are given by:

$$A_j[k2^j] \triangleq \sum_{n=-\infty}^{\infty} f[n] \, \phi_{j,k}[n] = \sum_{n=-\infty}^{\infty} f[n] \, \frac{1}{\sqrt{2^j}} \, \phi\!\left(\frac{n - k2^j}{2^j}\right) \qquad (63)$$

and the detail coefficients $\{D_j[k2^j]\}$ are described by:

$$D_j[k2^j] \triangleq \sum_{n=-\infty}^{\infty} f[n] \, \psi_{j,k}[n] = \sum_{n=-\infty}^{\infty} f[n] \, \frac{1}{\sqrt{2^j}} \, \psi\!\left(\frac{n - k2^j}{2^j}\right) \qquad (64)$$

where the scaling function $\phi_{j,k}[n]$ is associated with the wavelet function $\psi_{j,k}[n]$; full details are given in [21].
Let us now consider a special case of the DWT, where the analyses (i.e., computation of $\tilde{f}(\omega, t)$ (see Equation (50)) or $\tilde{f}(s, t)$ (see Equation (54)) or their discrete samples) are made directly from the relevant integration with the necessary values of time-frequency or time-scale parameters. Around 1980, a new method for performing the DWT was created, which is known as Multiresolution Analysis (MRA). This method is completely recursive and is therefore ideal for computation, as succinctly described below.
In MRA, we may think of the level-1 DWT of $f[n]$ as the output of two filters connected in parallel, consisting of a low-pass filter with the impulse response $g$ and a high-pass filter with the impulse response $h$, as seen in Figure 4. This is known as the filter bank implementation of the DWT, consisting of different levels $j$. The cutoff frequency of each filter in the filter bank equals half of the bandwidth of the respective input signal. Hence, the output of each filter has half of the bandwidth of the original sequence $f[n]$, so that it is subsampled by 2. That is,

$$\text{The average } A_1[n] \triangleq \sum_{k=-\infty}^{\infty} f[k] \, g[2n - k] \qquad (65)$$

$$\text{The detail } D_1[n] \triangleq \sum_{k=-\infty}^{\infty} f[k] \, h[2n - k] \qquad (66)$$

Therefore, given a level-$j$ DWT of a discrete-time signal $f[n]$, if $A_j[k2^j]$ in the sequence of average coefficients is passed through a parallel combination of identically structured filters $g$ and $h$, then the output is a sequence of level-$(j+1)$ DWT of $f[n]$, as seen in Figure 4. The features associated with different frequency components of the signal $f[n]$ can be captured by using a multilevel wavelet decomposition of $f[n]$ via iterative implementation of filter banks in the setting of time and frequency localization (see, for example, [21,23,24]).

Figure 4. Implementation of level-$j$ and level-$(j+1)$ MRA filter banks.
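A minimal sketch of one level of the filter bank in Figure 4 is given below, using the Haar pair as one common (assumed) choice of the low-pass $g$ and high-pass $h$; applying the same step again to the average coefficients gives the level-2 decomposition.

```python
import numpy as np

# Haar analysis filters: one common orthonormal choice of g (low-pass) and h (high-pass)
g = np.array([1.0,  1.0]) / np.sqrt(2)   # averages    -> A coefficients, Equation (65)
h = np.array([1.0, -1.0]) / np.sqrt(2)   # differences -> D coefficients, Equation (66)

def dwt_level(f, g, h):
    """One level of the filter bank in Figure 4: filter, then subsample by 2."""
    A = np.convolve(f, g)[1::2]          # average (approximation) coefficients
    D = np.convolve(f, h)[1::2]          # detail coefficients
    return A, D

f = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
A1, D1 = dwt_level(f, g, h)              # level-1 DWT
A2, D2 = dwt_level(A1, g, h)             # recursion: level-2 coefficients from A1
print(A1, D1)
```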

Example 8. Let us consider a function $f(t)$ having a wavelet transform $\tilde{f}(s, t)$, which can be interpreted as the "details" contained at fixed scales $s \neq 0$. This interpretation is especially useful in the discrete case for understanding the principles of MRA, as seen below.

Let $\phi(u)$ be a zero-mean unit-variance probability density function, which has the following properties:
• $\phi(u) \geq 0 \ \forall u \in \mathbb{R}$;
• $\int_{-\infty}^{\infty} \phi(u) \, du = 1$;
• $\int_{-\infty}^{\infty} u \, \phi(u) \, du = 0$;
• $\int_{-\infty}^{\infty} u^2 \, \phi(u) \, du = 1$.

Assuming that $\phi \in C^n$, i.e., $\phi$ is at least $n$ times differentiable, where $n \in \mathbb{N}$, it follows that $\lim_{u \to \pm\infty} \phi^{(n-1)}(u) = 0$. Now letting $\psi^n(u) \triangleq (-1)^n \phi^{(n)}(u)$, we have

$$\int_{-\infty}^{\infty} \psi^n(u) \, du = (-1)^n \left[ \phi^{(n-1)}(\infty) - \phi^{(n-1)}(-\infty) \right] = 0$$
Sci 2022, 4, 40 21 of 28

Thus, ψ_n satisfies the admissibility condition in Equation (59) and hence can be used to define a CWT.
For s ≠ 0 and t ∈ R, let φ_{s,t}(u) = |s|⁻¹ φ((u − t)/s) and ψⁿ_{s,t}(u) = |s|⁻¹ ψ_n((u − t)/s). Then, φ_{s,t} is a probability density with mean t and standard deviation |s|, and ψⁿ_{s,t} is qualified to be a wavelet family {ψⁿ} by setting p = 1 in Equation (61).
As a numerically explicit example, let φ represent the zero-mean unit-variance Gaussian density, i.e., φ(u) = exp(−u²/2)/√(2π). Since φ ∈ C^∞, n can be taken to be any positive integer. For instance, ψ₁(u) = −φ^{(1)}(u) = u exp(−u²/2)/√(2π) and ψ₂(u) = φ^{(2)}(u) = (u² − 1) exp(−u²/2)/√(2π), and so on. Because of the shape of its graph, −ψ₂ is popularly known as the Mexican hat mother wavelet, which is often used in engineering applications.
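The following short numerical check, under an assumed grid, verifies that the Gaussian-derivative wavelets ψ₁ and ψ₂ of this example integrate to zero (the admissibility requirement):

```python
import numpy as np

u = np.linspace(-8.0, 8.0, 4001)
du = u[1] - u[0]
gauss = np.exp(-u**2 / 2) / np.sqrt(2*np.pi)  # phi: standard Gaussian density
psi1 = u * gauss                              # psi_1(u) = -phi'(u)
psi2 = (u**2 - 1) * gauss                     # psi_2(u) = phi''(u)

# Admissibility: both wavelets integrate to zero (up to quadrature error)
print(np.sum(psi1) * du, np.sum(psi2) * du)   # both ~ 0
# -psi2 (the Mexican hat) peaks at u = 0 with value 1/sqrt(2*pi)
print(-psi2[np.argmin(np.abs(u))])
```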

3.5. Karhunen–Loève Expansion of Random Signals

The Karhunen–Loève (K-L) expansion is a powerful tool that generalizes the Fourier series expansion for the analysis of random time-dependent signals. The K-L expansion is frequently used in statistical signal processing and detection theory, representing a random process by deterministic time-dependent orthonormal functions with random-variable coefficients.

Theorem 11 (Karhunen–Loève Expansion). Let X(t) be a zero-mean, second-order random process, defined over [−T/2, T/2] where T ∈ (0, ∞), with a continuous covariance function K_XX(t, τ). Then, it follows that
$$X(t) \overset{ms}{=} \sum_{n=1}^{\infty} X_n \phi_n(t) \quad \forall t \in [-T/2, T/2] \qquad (67)$$
where the (countable) sequence of (deterministic) functions {φ_n(t)} is a complete orthonormal set of solutions to the following integral equation:
$$\int_{-T/2}^{T/2} d\tau\, K_{XX}(t,\tau)\,\phi_n(\tau) = \lambda_n \phi_n(t) \quad \forall t \in [-T/2, T/2] \qquad (68)$$
and the random coefficients X_n ≜ ∫_{−T/2}^{T/2} dt X(t) φ_n^{her}(t) are mutually statistically orthogonal, i.e., E[X_n X_m^{her}] = λ_n δ_{mn} with the Kronecker delta δ_{mn}.

Proof. The proof is given in [25,26].

Remark 7. The deterministic functions φ_n(t) are orthonormal in the following sense:
$$\int_{-T/2}^{T/2} dt\, \phi_n(t)\,\phi_m^{her}(t) = \delta_{mn}$$

Example 9 (K-L expansion of white noise). Let the covariance function of zero-mean stationary white noise w(t) be K_ww(t, τ) = σ² δ(t − τ). Then, the orthonormal functions φ_n(t) satisfy the K-L integral equation, for all n ∈ N, as:
$$\int_{-T/2}^{T/2} d\tau\, K_{ww}(t,\tau)\,\phi_n(\tau) = \sigma^2 \int_{-T/2}^{T/2} d\tau\, \delta(t-\tau)\,\phi_n(\tau) = \sigma^2 \phi_n(t)$$
It is also true that ∫_{−T/2}^{T/2} dτ K_ww(t, τ) φ_n(τ) = λ_n φ_n(t), which implies that λ_n φ_n(t) = σ² φ_n(t) ∀n ∈ N. Thus, the choice of these orthonormal functions is arbitrary, and all λ_n's are identically equal to σ². It is concluded that, for any zero-mean white noise, the K-L expansion functions {φ_n(t)} can be any set of orthonormal functions, with all eigenvalues λ_n = σ².
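More generally, Equation (68) can be solved numerically by discretizing the covariance on a time grid, which turns the integral equation into a matrix eigenproblem. The sketch below assumes an Ornstein-Uhlenbeck-type covariance and an arbitrary grid size purely for illustration:

```python
import numpy as np

# Discretize Eq. (68): the integral equation becomes (K dt) Phi = lambda Phi.
T, N = 2.0, 200
t = np.linspace(-T/2, T/2, N)
dt = t[1] - t[0]

# Assumed example covariance (Ornstein-Uhlenbeck type): K(t, tau) = exp(-|t - tau|)
K = np.exp(-np.abs(t[:, None] - t[None, :]))

lam, Phi = np.linalg.eigh(K * dt)                   # eigenvalues and eigenvectors
order = np.argsort(lam)[::-1]                       # sort by decreasing eigenvalue
lam, Phi = lam[order], Phi[:, order] / np.sqrt(dt)  # normalize: sum(phi^2) dt = 1

# One synthesized sample path via Eq. (67), assuming Gaussian coefficients:
Z = np.random.randn(N)
X = Phi @ (np.sqrt(np.maximum(lam, 0.0)) * Z)
```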

Example 10 (K-L expansion as an application to detection theory [25]). Let us assume that a waveform X(t) is observed over a finite time interval [−T/2, T/2] to decide whether it contains a recoverable signal buried in noise, or the signal is completely noise-corrupted (i.e., the signal cannot be recovered). In this regard, we formulate a binary hypothesis testing problem with the hypothesis H1 of having a recoverable signal and the hypothesis H0 of complete noise capture, i.e.,
$$X(t) = \begin{cases} s(t) + w(t) & : \text{if } H_1 \text{ is true} \\ w(t) & : \text{if } H_0 \text{ is true} \end{cases}$$
where the signal s(t) is a deterministic function of time, and the noise w(t) is modeled as zero-mean, unit-variance, white Gaussian. Using the K-L expansion, we simplify the above decision problem by replacing the waveform X(t) with a sequence {X_n}, which reduces to a sequence of simpler problems as:
$$X_n = \begin{cases} s_n + \omega_n & : \text{if } H_1 \text{ is true} \\ \omega_n & : \text{if } H_0 \text{ is true} \end{cases}$$
where s_n and ω_n are the respective (at most countably many) K-L coefficients of the signal s(t) and noise w(t).
Now we take the K-L transform (instead of the Fourier transform) of the received signal X(t), where the transform space is the space of sequences of K-L coefficients that are mutually statistically orthogonal random variables. Since the noise is zero-mean Gaussian and the K-L coefficients are mutually statistically orthogonal, the random variables ω_n become jointly independent, i.e., {ω_n} is a sequence of independent and identically distributed (iid) random variables. By selecting the first orthonormal function as:
$$\phi_1(t) = \frac{s(t)}{\sqrt{\int_{-T/2}^{T/2} d\theta\, s^2(\theta)}}$$
we can complete the rest of the orthonormal set {φ_n(t)} in a valid way. We also notice that all of the coefficients s_n, with the exception of s_1, will be zero; i.e., only X_1 is affected by the presence or absence of the recoverable signal. Thus, the distributed detection problem is reduced to the following scalar detection problem:
$$X_1 = \begin{cases} \sqrt{\int_{-T/2}^{T/2} d\theta\, s^2(\theta)} + \omega_1 & : \text{if } H_1 \text{ is true} \\ \omega_1 & : \text{if } H_0 \text{ is true} \end{cases}$$

We note that the scalar X_1 can be computed as:
$$X_1 = \frac{\int_{-T/2}^{T/2} d\theta\, X(\theta)\, s(\theta)}{\sqrt{\int_{-T/2}^{T/2} d\theta\, s^2(\theta)}}$$
which is commonly referred to as a matching operation. In fact, this operation can be performed by sampling the output of a filter whose impulse response is:
$$h(t) = \frac{s(T - t)}{\sqrt{\int_{-T/2}^{T/2} d\theta\, s^2(\theta)}}$$
where the parameter T should be chosen sufficiently large to make the impulse response causal. The output of this physically realizable filter at time T is then X_1. This filter is called a matched filter and is widely used in the disciplines of communications and pattern recognition.
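A minimal numerical sketch of this matching operation is given below; the sinusoidal pulse s(t) and the discretized approximation of unit-variance white noise are assumed choices. Under H1 the statistic concentrates near the square root of the signal energy, while under H0 it is approximately standard normal:

```python
import numpy as np

T, N = 1.0, 1000
t = np.linspace(-T/2, T/2, N)
dt = t[1] - t[0]
s = np.sin(2*np.pi*5*t)                   # assumed deterministic pulse s(t)
s_norm = np.sqrt(np.sum(s**2) * dt)       # sqrt of the signal energy

def statistic(x):
    """X_1 = (integral of x*s) / sqrt(integral of s^2), via Riemann sums."""
    return np.sum(x * s) * dt / s_norm

w = np.random.randn(N) / np.sqrt(dt)      # discretized unit-intensity white noise
print(statistic(s + w))                   # under H1: ~ s_norm + N(0, 1)
print(statistic(w))                       # under H0: ~ N(0, 1)
```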

3.6. Reproducing Kernel Hilbert Spaces


This subsection develops the concept of reproducing kernel Hilbert spaces (RKHS) [14], in which evaluation at each point of the domain is a bounded (equivalently, continuous) linear functional. This continuity (or boundedness) implies that if two functions f and g in the space are close to each other in norm (i.e., ‖f − g‖ is small in the function space), then f and g are also close to each other pointwise, i.e., |f(t) − g(t)| is also small for all t.
The RKHS has many engineering and scientific applications, including those in har-
monic analysis, wavelet analysis, and quantum mechanics. In particular, functions from
RKHS have special properties that make them useful for function estimation problems in
high-dimensional spaces, which is critically important in the fields of statistical learning
theory and machine learning [17]. In fact, every function in RKHS that minimizes an
empirical risk functional can be expressed as a linear combination of the kernel functions
evaluated at the training points. This procedure potentially simplifies the handling of the
problem from infinite-dimensional to finite-dimensional.
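A minimal sketch of this finite-dimensional reduction is kernel ridge regression, where the empirical-risk minimizer is a linear combination of kernel functions evaluated at the training points; the Gaussian kernel, synthetic data, and regularization weight below are all assumed choices:

```python
import numpy as np

def gauss_kernel(x, y, sigma=0.3):
    """Assumed Gaussian (RBF) kernel K(x, y) on scalar inputs."""
    return np.exp(-(x[:, None] - y[None, :])**2 / (2 * sigma**2))

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 30)
y_train = np.sin(3 * x_train) + 0.1 * rng.standard_normal(30)

lam = 1e-2                                # regularization weight (assumed)
K = gauss_kernel(x_train, x_train)
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

# The minimizer is f(x) = sum_i alpha_i K(x_i, x): finitely many coefficients
x_test = np.linspace(-1, 1, 200)
f_hat = gauss_kernel(x_test, x_train) @ alpha
```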
We now present a formal definition of reproducing kernel Hilbert spaces (RKHS). The
presented theory is often applied to real-valued Hilbert spaces and can be extended to
complex-valued Hilbert spaces; examples of complex-valued RKHS are spaces of analytic
functions.

Definition 9 (Reproducing Kernel Hilbert Spaces). Let T be an arbitrary non-empty set (e.g., the time domain or the spatial domain of a function), and let H be a Hilbert space of real-valued (resp. complex-valued) functions on T, equipped with pointwise vector addition and pointwise scalar multiplication, in which every function can be evaluated at each point t ∈ T. Then, H is defined to be a reproducing kernel Hilbert space (RKHS) if there exist a positive real M_t and a continuous linear functional L_t on H such that |L_t(f)| = |f(t)| ≤ M_t ‖f‖_H ∀t ∈ T ∀f ∈ H.
Note: Although each M_t is constrained to be a positive real, it is possible that sup_{t∈T} M_t = ∞.


Remark 8. Definition 9 is a rather weak condition that ensures the existence of an inner product and the evaluation of every function in H at every point of the domain T. From the application perspective, a more useful definition would be constructed via the inner product of a given function f ∈ H with another function K_t ∈ H, the so-called reproducing kernel function for the Hilbert space H; the RKHS takes its name from here.

To make Definition 9 more useful for many applications, we make use of the Riesz representation theorem (Theorem 6), which states that for each f ∈ H there exists a unique K_t ∈ H with the following reproducing property, taking values at any given t ∈ T as:
$$f(t) = L_t(f) = \langle K_t, f \rangle_H$$

For a given t ∈ T, the function K_t ∈ H takes values in R (resp. C). Considering another function K_τ ∈ H, associated with a parameter τ ∈ T and a corresponding functional L_τ on H, it follows that
$$K_t(\tau) = L_\tau(K_t) = \langle K_\tau, K_t \rangle_H$$
The above situation can be interpreted as follows: K_τ is a time translation of K_t from t to τ if the set T is the time domain of the functions in the Hilbert space. This allows us to redefine the reproducing kernel of the Hilbert space H as a function K : T × T → R (resp. C) given by K(t, τ) ≜ ⟨K_τ, K_t⟩_H.

Example 11 (Bandlimited approximation of the Dirac delta function in the RKHS setting). Let us consider the space of continuous signals that are also band-limited, with frequencies confined to the compact support [−2πΩ, 2πΩ] (in angular frequency), where the cutoff frequency Ω ∈ (0, ∞). It is noted that K_t(•) is a bandlimited version of the Dirac delta function, because K_t(τ) converges to the delta distribution δ(τ − t) in the weak sense as the cutoff frequency Ω tends to infinity.
Let us define T = R and H = {f ∈ C⁰(T) : supp(f̂) ⊆ [−Ω, Ω]}, where C⁰(T) is the space of continuous functions whose domain is T, the Fourier transform of f is f̂(ξ) ≜ ∫_R dt exp(−i2πξt) f(t), and the inverse Fourier transform of f̂(ξ) is f(t) ≜ ∫_R dξ exp(i2πξt) f̂(ξ). Then, it follows by the Cauchy-Schwarz inequality and the Plancherel theorem that:
$$|f(t)|^2 \leq \int_{-\Omega}^{\Omega} d\xi\, |e^{i2\pi\xi t}|^2 \int_{-\Omega}^{\Omega} d\xi\, |\hat{f}(\xi)|^2 = 2\Omega\, \|f\|_H^2$$
i.e., |f(t)| ≤ √(2Ω) ‖f‖_H.
It follows from the relation f(t) = L_t(f) = ⟨K_t, f⟩_H, established earlier, that the functional L_t and the RKHS kernel function K_t are bounded. Therefore, H is indeed an RKHS.
The kernel function in this case is chosen as:
$$K_t(\tau) = \frac{\sin\!\big(2\pi\Omega(\tau - t)\big)}{\pi(\tau - t)} = 2\Omega\,\mathrm{sinc}\!\big(2\Omega(\tau - t)\big)$$
which converges to δ(τ − t) in the weak sense as Ω → ∞. It then follows that, as Ω → ∞, the Fourier transform of the kernel K_t(τ) becomes
$$\hat{K}_t(\xi) = \int_{-\infty}^{\infty} d\tau\, \exp(-i2\pi\xi\tau)\, K_t(\tau) = \int_{-\infty}^{\infty} d\tau\, \exp(-i2\pi\xi\tau)\, \delta(\tau - t) = \exp(-i2\pi\xi t)$$

This is a consequence of frequency modulation due to the time-shifting property of the Fourier transform. Then, it follows by using the Plancherel theorem that:
$$\langle K_t, f \rangle_H = \int_{-\infty}^{\infty} d\tau\, \overline{K_t(\tau)}\, f(\tau) = \int_{-\infty}^{\infty} d\xi\, \overline{\hat{K}_t(\xi)}\, \hat{f}(\xi) = \int_{-\infty}^{\infty} d\xi\, \hat{f}(\xi)\, \exp(i2\pi\xi t) = f(t)$$
Thus, the reproducing property of the kernel is established as the cutoff frequency Ω → ∞.
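The reproducing property can also be checked numerically on a finite grid; in the sketch below, the bandwidths, grid, and bandlimited test signal are assumed choices, and the inner product is approximated by a Riemann sum:

```python
import numpy as np

Omega, B = 4.0, 2.0                       # kernel band Omega >= signal band B (assumed)
tau = np.linspace(-40.0, 40.0, 20001)
dtau = tau[1] - tau[0]

f = lambda t: 2*B*np.sinc(2*B*t)          # bandlimited test signal, supp(f^) in [-B, B]
K_t = lambda t: 2*Omega*np.sinc(2*Omega*(tau - t))  # sin(2*pi*Omega(.-t))/(pi(.-t))

t0 = 0.7
inner = np.sum(K_t(t0) * f(tau)) * dtau   # Riemann-sum approximation of <K_t0, f>_H
print(inner, f(t0))                       # nearly equal
```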

4. Summary and Conclusions


This paper sheds light on some of the key concepts from functional analysis, which provide a unified mathematical framework for solving problems in engineering and applied sciences, especially in modern signal processing. The simple (and yet elegant) way in which this framework facilitates the formulation of different topics in signal processing, illustrated here with several relevant examples, enables solving many problems in science and engineering by utilizing concepts from the discipline of functional analysis. Some of the important results of functional analysis can find their way into further advances in statistical signal processing and adaptive signal processing. Nevertheless, one of the main difficulties in doing so is the existing gap between the terminologies and technical languages used in these two (apparently different) fields; this paper attempts to (at least partially) bridge this gap.

Author Contributions: Conceptualization, N.F.G., A.R. and W.K.J.; methodology, N.F.G., A.R. and
W.K.J.; software, N.F.G. and A.R.; formal analysis, N.F.G. and A.R.; model preparation and validation,
N.F.G. and A.R.; data curation, N.F.G. and A.R.; writing—original draft preparation, N.F.G., A.R. and
W.K.J.; writing—review and editing, N.F.G., A.R. and W.K.J.; funding acquisition, A.R. All authors
have read and agreed to the published version of the manuscript.
Funding: The reported work has been supported in part by the U.S. Air Force Office of Scientific
Research under Grant No. FA9550-15-1-0400, by the U.S. Army Research Office under Grant No.
W911NF-20-1-0226, and by the U.S. National Science Foundation under Grant No. CNS-1932130.
Findings and conclusions or recommendations, expressed in this publication, are those of the authors
and do not necessarily reflect the views of the sponsoring agencies.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The data presented in this study are available on request from the
corresponding author.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design
of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or
in the decision to publish the results.

Appendix A. Preliminary Concepts


This appendix introduces several preliminary (but critical) concepts from real analysis, probability theory, and topology, mainly taken from Naylor and Sell [2], Royden [4], Stark and Woods [25], Ash [27], and Munkres [28]. These references, along with the materials cited therein, are expected to be helpful for an appropriate understanding of key concepts that are frequently encountered in the discipline of modern signal processing.

Appendix A.1. Metric Spaces and Topological Spaces


This subsection introduces rudimentary concepts of metric and topological spaces. Details are available in the aforementioned standard textbooks.

Definition A1 (Metric Spaces). Let X be a non-empty set. A function ρ : X × X → R, where


R is the space of real numbers, is called a metric (or a distance function) on X if the following
conditions hold for all x, y, z ∈ X:
(i) Positivity: ρ( x, y) ≥ 0, and ρ( x, y) = 0 iff x = y;
(ii) Symmetry: ρ( x, y) = ρ(y, x );
(iii) Triangular Inequality: ρ( x, z) ≤ ρ( x, y) + ρ(y, z).
The pair ( X, ρ) is called a metric space; if there is no ambiguity on ρ, the metric space is
denoted only by X.

Example A1. A well-known example of a metric space is the n-dimensional Euclidean space (Rⁿ, ρ), where n is a positive integer and ρ(x, y) ≜ √(Σ_{i=1}ⁿ |x_i − y_i|²) for every pair of vectors x = [x₁, . . . , x_n]ᵀ and y = [y₁, . . . , y_n]ᵀ in Rⁿ.

Remark A1. The set X, upon which the metric ρ can operate, is an arbitrary nonempty set. The
conditions (i)–(iii) in Definition A1 are obvious if ρ operates on Rn . However, in general, there
can be other types of metric operators and X may not be Rn ; an example is the Hamming distance
defined on sets of symbol sequences, which is widely used in error correction theory to measure the
distance between two code words.
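A minimal sketch of the Hamming distance on fixed-length binary strings (an assumed alphabet; any symbol set works) is given below; verifying the conditions of Definition A1 for it is immediate:

```python
def hamming(x, y):
    """Hamming distance between two equal-length symbol sequences."""
    assert len(x) == len(y), "defined only for sequences of equal length"
    return sum(a != b for a, b in zip(x, y))

print(hamming("10110", "11100"))   # 2; positivity, symmetry, triangle inequality hold
```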

Definition A2 (Open and Closed Sets). A set E ⊆ X in a metric space (X, ρ) is called open if, for each y ∈ E, there exists ε > 0 such that the open ball B_ε(y) ≜ {x ∈ X : ρ(x, y) < ε} of radius ε centered at y is contained in E. A set F ⊆ X is called closed if the complement X \ F is open in (X, ρ).

Definition A3 (Cauchy Sequence). A sequence {x_n} in a metric space (X, ρ) is called a Cauchy sequence if ∀ε > 0 ∃n(ε) ∈ N such that ρ(x_k, x_ℓ) < ε ∀k, ℓ > n(ε). In other words, ρ(x_k, x_ℓ) → 0 as k, ℓ → ∞.

Definition A4 (Completeness of a Metric Space). A metric space is called complete if every Cauchy sequence converges in the metric space.
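As a numerical illustration of Definitions A3 and A4, the sequence x_n = 1/n is Cauchy in R; within the incomplete space (0, 1] it remains Cauchy but fails to converge, since its limit 0 lies outside the space:

```python
# x_n = 1/n is Cauchy; in the incomplete space (0, 1] it has no limit point
x = [1.0 / n for n in range(1, 2001)]
tail = x[1900:]
eps = 1e-3
print(max(abs(a - b) for a in tail for b in tail) < eps)   # True: Cauchy tail
# The limit 0 of x_n in R lies outside (0, 1], so (0, 1] is not complete.
```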

Definition A5 (Sequential Compactness). A metric space ( X, ρ) is said to be sequentially


compact if every sequence of points { x1 , x2 , . . .} in ( X, ρ) contains a convergent subsequence
{ xn1 , xn2 , . . .}, and it is called sequentially precompact if every sequence of points { x1 , x2 , . . .} in
( X, ρ) contains a Cauchy subsequence { xn1 , xn2 , . . .} (see Definition A3 and also see [29]).

In contrast to metric spaces, where a distance function ρ on a set X is used to introduce


key concepts (e.g., neighborhood, open set, closed set, and convergence), a more powerful
approach is to specify a system in terms of open sets to introduce such properties; this leads
to the notion of a topological space (e.g., [28]).

Definition A6 (Topological Spaces). A topology on a nonempty set X is a collection ℑ of subsets of X, which has the following properties:
1. ℑ must contain the empty set ∅ and the set X.
2. Any union of members of ℑ, taken over an arbitrary (i.e., finite, countable, or uncountable) subcollection of ℑ, must be contained in ℑ.
3. The intersection of the members of any finite subcollection of ℑ must be contained in ℑ.
(A set is defined to be countable if it is bijective to the set, N, of positive integers; a finite or countably infinite set is often called at most countable. An infinite set that is not countable is called uncountable. For example, the set of integers Z ⊊ R is countable, while an interval (a, b) ≜ {x ∈ R : a < x < b} is uncountable. These concepts lead to the fundamental difference between "continuous-time (CT) analog" and "discrete-time (DT) digital" signal processing.)

Then, the pair (X, ℑ) is called a topological space, and the members of ℑ are called open sets of (X, ℑ); if B is an open set in (X, ℑ), the complement of B (i.e., X \ B) is called a closed set in (X, ℑ). If there is no confusion regarding ℑ, then ℑ is often omitted from (X, ℑ) and only X is referred to as a topological space.

Definition A7 (Topological Basis). A basis B for a topology (X, ℑ) is a collection of open sets in ℑ, called basis elements, if the following two conditions hold:
1. For each x ∈ X, there exists at least one basis element B ∈ B such that x ∈ B.
2. If x ∈ B₁ ∩ B₂, where B₁, B₂ ∈ B, then there exists a basis element B₃ ∈ B such that x ∈ B₃ and B₃ ⊆ B₁ ∩ B₂.

Appendix A.2. Random Variables and Stochastic Processes


This subsection introduces rudimentary concepts of random variables and stochastic
processes. Details are available in textbooks such as [25,27,30].

Definition A8 (Algebra and σ-Algebra). Let F be a (non-empty) collection of subsets of a (non-empty) set Ω having some or all of the following properties:
(a) Ω ∈ F.
(b) If A ∈ F then Aᶜ ∈ F, where Aᶜ ≜ Ω \ A.
(c) If A₁, A₂, . . . , A_n ∈ F then ∪_{i=1}ⁿ A_i ∈ F.
(d) If A₁, A₂, . . . ∈ F then ∪_{i=1}^∞ A_i ∈ F.
Then, F is called an algebra (or a field) if the properties (a), (b), and (c) are true. If, in addition, the property (d) is true, then F is called a σ-algebra (or a σ-field).

Remark A2. The largest σ-algebra of a nonempty set Ω is the collection of all subsets of Ω, which is the power set 2^Ω. On the other hand, the smallest σ-algebra consists of the two sets ∅ and Ω, i.e., the indiscrete σ-algebra {∅, Ω}.

Definition A9 (Borel Sets). Given a non-empty collection D of subsets of Ω, the smallest σ-


algebra containing D is called the σ-algebra generated by D . The Borel σ-algebra B(R) is the
σ-algebra generated by the collection of all open intervals {( a, b) : a, b ∈ R} in the usual topology
of R. Members of B(R) are called Borel sets.

Definition A10 (Measure). A countably additive measure µ on a σ-algebra F is a non-negative, extended real-valued function on F such that if {A₁, A₂, . . .} forms an at most countable (i.e., finite or countably infinite) collection of disjoint sets in F, then µ(∪_n A_n) = Σ_n µ(A_n). A measurable space is a pair (Ω, F), and a measure space is a triple (Ω, F, µ), where Ω is a non-empty set, F is a σ-algebra of subsets of Ω, and µ is a measure on F. The sets in F are called measurable sets.

Example A2. Let Ω = Rⁿ, where n ∈ N, and let the Borel σ-algebra B(Rⁿ) be the associated σ-algebra. Then, µ : B(Rⁿ) → [0, ∞] is called the n-dimensional Lebesgue measure, and (Rⁿ, B(Rⁿ), µ) is called the n-dimensional Lebesgue measure space. For n = 1, i.e., in the 1-dimensional real space R, given an interval S ∈ B(R), the measure µ(S) is the length of the interval S. Similarly, for two-dimensional (i.e., n = 2) and three-dimensional (i.e., n = 3) Lebesgue measures, µ(S) denotes the area and volume measures, respectively.

Definition A11 (Probability Spaces). If µ(Ω) = 1, then µ is called a probability measure,


usually denoted by P, and the triplet (Ω, F , P) is called a probability space.

Definition A12 (Measurable Functions). Let (Ω1 , F1 ) and (Ω2 , F2 ) be two measurable spaces.
A function f : (Ω1 , F1 ) → (Ω2 , F2 ) is called (F1 − F2 ) measurable if the inverse image
f −1 ( A) ∈ F1 ∀ A ∈ F2 . If Ω2 = R and F2 = B(R) then f is said to be Borel measurable.

Definition A13 (Random Variables and Random Sequences). A random variable X on a


probability space (Ω, F , P) is a Borel measurable function from Ω to R. Similarly, a sequence of
random variables { X1 , X2 , . . .} is called a discrete random process.

Remark A3. A function f : (R, U) → (R, U) is continuous in the usual topology if f⁻¹(A) is open for every open set A ∈ U. Therefore, any continuous function is Borel measurable. Furthermore, a function f : R → R that is continuous almost everywhere on R (i.e., except on a set of measure zero) is also Borel measurable. As another example, the unit step function on a compact set S ⊊ R, which is discontinuous in R in the usual topology, is also a Borel-measurable function.

Now, we introduce the concept of the expected value of a random variable on a probability space (Ω, F, P). Given a random variable X, the expected value of X is denoted as E[X] (see [25] or [30]). Along this line, two random variables X and Y on (Ω, F, P) are said to be equal in the mean square (ms) sense, denoted as X \overset{ms}{=} Y, if E[|X − Y|²] = 0 (see [25]). Similarly, two random variables X and Y are said to be equal in the almost sure (as) sense, denoted as X \overset{as}{=} Y, if X(ζ) ≠ Y(ζ) is allowed only for ζ ∈ S ⊆ Ω such that P[S] = 0.
Given a random process x(t), the autocorrelation is defined as:
$$r_x(t, \tau) \triangleq E\big[x(t)\, x^{her}(\tau)\big] \qquad (A1)$$
and the autocovariance is defined as:
$$c_x(t, \tau) \triangleq E\big[(x(t) - E[x(t)])\,(x(\tau) - E[x(\tau)])^{her}\big] \qquad (A2)$$
where the superscript her, called Hermitian, indicates the complex conjugate of a complex variable, or the conjugate transpose of a complex vector/matrix.
A random process x(t) is called stationary (in the strict sense) if its statistics are not
affected by a time translation [25], i.e., x(t) and x(t + ε) have the same statistics for any real
number ε. A random process x(t) is said to be wide-sense stationary [7,25] if
1. The expected value E[ x (t)] is a constant for all t;
2. The autocorrelation r x (t, τ ) depends only on the difference (t − τ ), not explicitly on
both t and τ.
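As a numerical illustration, the sketch below estimates the autocorrelation of an assumed AR(1) process, whose steady-state statistics are wide-sense stationary, so that the sample autocorrelation depends only on the lag:

```python
import numpy as np

rng = np.random.default_rng(1)
N, a = 20000, 0.9
x = np.zeros(N)
for n in range(1, N):                      # AR(1): x[n] = a x[n-1] + innovation
    x[n] = a * x[n - 1] + rng.standard_normal()

def r_hat(lag):
    """Sample autocorrelation at a given lag (real signal, so her = identity)."""
    return np.mean(x[lag:] * x[:N - lag])

# Depends only on the lag; theory gives a^lag / (1 - a^2) ~ 5.26 * 0.9^lag
print([round(r_hat(k), 2) for k in range(4)])
```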

References
1. Bachman, G.; Narici, L. Functional Analysis; Academic Press: New York, NY, USA, 1966.
2. Naylor, A.; Sell, G. Linear Operator Theory in Engineering and Science, 2nd ed.; Springer-Verlag: New York, NY, USA, 1982.

3. Rudin, W. Real and Complex Analysis; McGraw-Hill: Boston, MA, USA, 1987.
4. Royden, H. Real Analysis, 3rd ed.; Macmillan: New York, NY, USA, 1989.
5. Kreyszig, E. Introductory Functional Analysis with Applications; John Wiley & Sons: Hoboken, NJ, USA, 1978.
6. Bobrowski, A. Functional Analysis for Probability and Stochastic Processes; Cambridge University Press: Cambridge, UK, 2005.
7. Hayes, M. Statistical Digital Signal Processing and Modeling, 1st ed.; Wiley: Hoboken, NJ, USA, 1996.
8. Haykin, S. Adaptive Filter Theory, 4th ed.; Prentice Hall: Englewood Cliffs, NJ, USA, 2002.
9. Farhang-Boroujeny, B. Adaptive Filters Theory and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2003.
10. Bressan, A. Lecture Notes on Functional Analysis with Applications to Linear Partial Differential Equations; American Mathematical
Society: Providence, RI, USA, 2013.
11. Reed, M.; Simon, B. Methods of Modern Mathematical Physics Part 1: Functional Analysis; Academic Press: Cambridge, MA,
USA, 1980.
12. Luenberger, D. Optimization by Vector Space Methods; John Wiley & Sons: Hoboken, NJ, USA, 1969.
13. Desoer, C.; Vidyasagar, M. Feedback Systems: Input-Output Properties; Academic Press: Cambridge, MA, USA, 1975.
14. Therrien, C. Discrete Random Signals and Statistical Signal Processing; Prentice Hall: Englewood Cliffs, NJ, USA, 1992.
15. Proakis, J.; Manolakis, D. Digital Signal Processing: Principles, Algorithms, and Applications, 3rd ed.; Macmillan Publishing Company:
New York, NY, USA, 1998.
16. Oppenheim, A.; Schafer, R. Discrete-Time Signal Processing; Prentice Hall: Englewood Cliffs, NJ, USA, 1989.
17. Vapnik, V.; Izmailov, R. Rethinking statistical learning theory: Learning using statistical invariants. Mach. Learn. 2019,
108, 381–423. [CrossRef]
18. Ghalyan, N.F.; Ray, A. Symbolic Time Series Analysis for Anomaly Detection in Measure-invariant Ergodic Systems. J. Dyn. Syst.
Meas. Control. 2020, 142, 061003. [CrossRef]
19. Ghalyan, N.F.; Ray, A. Measure invariance of symbolic systems for low-delay detection of anomalous events. Mech. Syst. Signal
Process. 2021, 159, 107746. [CrossRef]
20. Lorch, E. Spectral Analysis; Oxford University Press: New York, NY, USA, 1962.
21. Kaiser, G. A Friendly Guide to Wavelets; Birkhauser: Boston, MA, USA, 1994.
22. Mallat, S. A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed.; Academic Press: Amsterdam, The Netherlands, 2009.
23. Ray, A. On State-space Modeling and Signal Localization in Dynamical Systems. ASME Lett. Dyn. Syst. Control. 2022, 2, 011006.
[CrossRef]
24. Vetterli, M.; Kovacevic, J. Wavelets and Subband Coding; Prentice-Hall, Inc.: Hoboken, NJ, USA, 1995.
25. Stark, H.; Woods, J. Probability and Random Processes with Applications to Signal Processing; Prentice-Hall: Upper Saddle River, NJ,
USA, 2002.
26. Helstrom, C. Elements of Signal Detection and Estimation; Prentice Hall: Englewood Cliffs, NJ, USA, 1995.
27. Ash, R. Real Analysis and Probability; Academic Press: Boston, MA, USA, 1972.
28. Munkres, J. Topology, 2nd ed.; Prentice-Hall: Upper Saddle River, NJ, USA, 2000.
29. Shilov, G. Elementary Real and Complex Analysis; Dover Publication Inc.: Mineola, NY, USA, 1996.
30. Papoulis, A. Probability, Random Variables, and Stochastic Processes, 2nd ed.; McGraw-Hill, Inc.: Boston, MA, USA, 1984.
