Universal Approximation
Anano Tamarashvili, Ani Okropiridze, Mariami Mamageishvili Universal Approximation November 28, 2024 1 / 39
Contents
1 Types of Convergence
2 Lebesgue Dominated Convergence Theorem
3 Hahn-Banach Theorem
4 Riesz Representation Theorem
5 Universal Approximation
6 Common Activation Functions
7 References
Types of Convergence
Pointwise Convergence
Definition
Suppose that (fn) is a sequence of functions fn : A → R and f : A → R. Then
fn → f pointwise on A if, for every x ∈ A and every ε > 0, there exists N ∈ N
such that |fn(x) − f(x)| < ε for all n > N.
Types of Convergence
Uniform Convergence
Definition
Suppose that (fn) is a sequence of functions fn : A → R and f : A → R. Then
fn → f uniformly on A if, for every ε > 0, there exists N ∈ N such that
|fn(x) − f(x)| < ε for all x ∈ A and all n > N.
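The difference between the two modes of convergence can be seen numerically. The classic example fn(x) = xⁿ on [0, 1) converges pointwise to 0 at every fixed x, but the supremum of |fn| over the interval stays near 1, so the convergence is not uniform. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

# f_n(x) = x**n on [0, 1): the pointwise limit is 0 at every fixed x < 1,
# but sup |f_n(x)| over the interval stays close to 1 for every n,
# so the convergence is not uniform.
x = np.linspace(0.0, 0.999, 1000)

for n in (1, 10, 100):
    fn = x ** n
    # the value at the fixed point x = 0.5 shrinks; the supremum does not
    print(n, fn[500], np.max(fn))
```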
Lebesgue Dominated Convergence Theorem
Theorem
Let X be a measure space, µ be a Borel measure on X, g : X → R be L1 and
{fn } be a sequence of measurable functions from X → R such that
|fn (x)| ≤ g(x) for all x ∈ X and {fn } converges pointwise to a function f . Then
f is integrable and
lim(n→∞) ∫_X fn(x) dµ(x) = ∫_X f(x) dµ(x).
Lebesgue Dominated Convergence Theorem
Example
Evaluate
lim(n→∞) ∫_R [n sin(x/n)] / [x(x² + 1)] dx.
Solution
Let x ∈ R and begin by defining
fn(x) = n sin(x/n) / (x(x² + 1)).
Each fn is measurable, and the sequence {fn} converges pointwise to 1/(1 + x²):
lim(n→∞) fn(x) = lim(n→∞) [sin(x/n) / (x/n)] · [1 / (1 + x²)] = 1 / (1 + x²),
since
lim(n→∞) sin(x/n) / (x/n) = 1.
Lebesgue Dominated Convergence Theorem
Continuation
From this, we also see that g(x) = 1/(1 + x²) works as a dominating function. Indeed,
g is integrable on R and
|fn(x)| = |sin(x/n) / (x/n)| · [1 / (1 + x²)] ≤ 1/(1 + x²) = g(x),
since |sin t| ≤ |t| for every t. Moreover,
∫_{−∞}^{∞} 1/(1 + x²) dx = [tan⁻¹(x)]_{−∞}^{∞} = π,
so by the Lebesgue Dominated Convergence Theorem the limit of the integrals is π.
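The value π predicted by the dominated convergence argument can be checked numerically. A small sketch, assuming SciPy is available (the function names are illustrative):

```python
import math
from scipy.integrate import quad

def integrand(x, n):
    # f_n(x) = n*sin(x/n) / (x*(x^2 + 1)); at x = 0 the limiting value is 1
    if x == 0.0:
        return 1.0
    return n * math.sin(x / n) / (x * (x ** 2 + 1))

def integral(n):
    # integrate f_n over the whole real line
    val, _err = quad(integrand, -math.inf, math.inf, args=(n,), limit=200)
    return val

for n in (1, 10, 100, 1000):
    print(n, integral(n))  # approaches pi as n grows
```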
Hahn-Banach Theorem
Theorem
Let V be a normed vector space over R and let A, B ⊂ V be non-empty, disjoint
convex subsets with A compact and B closed. Then there exist a continuous linear
map f : V → R, α ∈ R, and ϵ > 0 such that
f(x) ≤ α − ϵ for any x ∈ A
and
f(y) ≥ α + ϵ for any y ∈ B.
Corollary
Let V be a normed vector space over R and U ⊂ V be a linear subspace whose
closure is not all of V. Then there exists a continuous linear map f : V → R with
f ̸= 0 but f(u) = 0 for every u ∈ U.
Riesz Representation Theorem
Theorem
Let Ω be a compact subset of Rn and F : C(Ω) → R be a bounded linear functional
on the space of continuous real functions on Ω. Then there exists a signed Borel
measure µ on Ω such that for any f ∈ C(Ω), we have that
F(f) = ∫_Ω f(x) dµ(x).
Riesz Representation Theorem
Theorem
If T is a bounded linear functional on a Hilbert space H, then there exists some
g ∈ H such that for every f ∈ H we have
T (f ) = ⟨f, g⟩.
Moreover, ∥T ∥ = ∥g∥, where ∥T ∥ denotes the operator norm of T , and ∥g∥ is the
Hilbert space norm of g.
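In the finite-dimensional case H = Rᵈ with the usual dot product, the representing vector g can be exhibited directly, and the norm identity ∥T∥ = ∥g∥ is attained at f = g/∥g∥. A minimal sketch, assuming NumPy is available (the names T and g mirror the theorem):

```python
import numpy as np

rng = np.random.default_rng(1)
g = rng.normal(size=5)               # the representing vector in H = R^5

def T(f):
    # a bounded linear functional on R^5, represented as <f, g>
    return float(np.dot(f, g))

# the operator norm sup{|T(f)| : ||f|| = 1} is attained at f = g / ||g||
f_unit = g / np.linalg.norm(g)
print(T(f_unit), np.linalg.norm(g))  # the two values agree
```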
Riesz Representation Theorem
The Universal Approximation Theorem states that a neural network with at least
one hidden layer and a non-linear activation function can approximate any
continuous function on a compact domain to any desired accuracy. The LDCT
and Riesz Representation Theorem support this concept by providing the
mathematical foundation for convergence and approximation:
LDCT helps ensure that as the neural network’s weights are adjusted to
approximate the target function, the sequence of network approximations
converges in an integrable sense.
Riesz Representation allows us to interpret approximation errors as integrable
functionals, linking them to the network’s error and providing a structured
way to evaluate convergence in function space.
Universal Approximation
Definition
Given a topological space Ω, we define
C(Ω) := {f : Ω → R | f is continuous};
Σn(f) := {x ↦ c1 f(w1 · x + b1) + · · · + cM f(wM · x + bM) | M ∈ N, ci, bi ∈ R, wi ∈ Rn}.
Definition
Let Ω be a topological space and f : R → R. We say that a neural network with
activation function f is a universal approximator on Ω if Σn (f ) is dense in C(Ω),
the set of continuous functions from Ω to R.
Universal Approximation
Density of Σn(f) in C(Ω) means that for every continuous function f on Ω and
every ϵ > 0 there exists a network f̂ ∈ Σn(f) with
sup_{x∈Ω} |f̂(x) − f(x)| < ϵ.
That is, the neural network f̂(x) can approximate the function f(x) to within any
arbitrary degree of accuracy ϵ, given a sufficient number of neurons in the
hidden layer.
Universal Approximation
f̂(x) = Σ_{i=1}^{M} ci σ(wi · x + bi),
where:
M is the number of neurons in the hidden layer,
ci are the weights associated with the output layer,
wi and bi are the weights and biases of the hidden neurons, and
σ is the activation function (commonly non-linear).
By adjusting ci , wi , and bi , the neural network can approximate any continuous
function f (x) over a given domain.
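The formula above can be put to work directly: draw random hidden weights wi and bi, and fit only the output weights ci by least squares. As M grows, the sup-norm error on a continuous target typically shrinks. A minimal sketch, assuming NumPy is available (the target function and weight scales are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_network(target, M, x):
    # f_hat(x) = sum_i c_i * sigmoid(w_i * x + b_i): random hidden weights,
    # output weights c_i fitted by least squares on the sample points x
    w = rng.normal(scale=10.0, size=M)
    b = rng.uniform(-10.0, 10.0, size=M)
    H = sigmoid(np.outer(x, w) + b)            # hidden activations
    c, *_ = np.linalg.lstsq(H, target(x), rcond=None)
    return lambda t: sigmoid(np.outer(t, w) + b) @ c

x = np.linspace(0.0, 1.0, 400)
f = lambda t: np.sin(2 * np.pi * t) + 0.5 * t  # a continuous target on [0, 1]

for M in (2, 8, 32):
    f_hat = fit_network(f, M, x)
    # sup-norm error on the sample, typically shrinking as M grows
    print(M, np.max(np.abs(f_hat(x) - f(x))))
```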
Universal Approximation
Definition
An activation function f : R → R is n-discriminatory if the only signed Borel
measure µ for which
∫ f(w · x + b) dµ(x) = 0 for all w ∈ Rn and b ∈ R
is the zero measure.
This means that the activation function is discriminatory if the only way the
integral can vanish for every choice of w and b is for µ to be the zero measure.
Universal Approximation
Definition
An activation function f : R → R is discriminatory if it is n-discriminatory for all
natural numbers n.
This means that the activation function can distinguish between all possible linear
combinations of inputs in any dimensionality of the input space.
Common Activation Functions
Definition
The Rectified Linear Unit (also denoted ReLU) is a function R → R defined by
ReLU(x) = max(0, x)
Common Activation Functions
Definition
A function f : R → R is called a sigmoid if it satisfies the following two properties:
lim_{x→−∞} f(x) = 0 and lim_{x→+∞} f(x) = 1.
Common Activation Functions
Softmax(xi) = e^{xi} / Σ_j e^{xj}
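The three activation functions discussed in this section can be implemented in a few lines. A minimal sketch, assuming NumPy is available (the max-subtraction in softmax is a standard numerical-stability trick, not part of the definition above):

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x), applied elementwise
    return np.maximum(0.0, x)

def sigmoid(x):
    # maps R into (0, 1); tends to 0 at -inf and to 1 at +inf
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # subtracting the max leaves the result unchanged but avoids overflow
    z = np.exp(x - np.max(x))
    return z / z.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))    # [0. 0. 3.]
print(sigmoid(0.0))                        # 0.5
print(softmax(np.array([1.0, 2.0, 3.0])))  # a probability vector summing to 1
```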
Common Activation Functions
Sigmoid is often used in the output layer for binary classification problems.
It provides a probability score, indicating the likelihood of a data point
belonging to a particular class.
Example: For spam email detection, the Sigmoid output helps determine
whether an email is spam (1) or not spam (0).
Common Activation Functions
Softmax is commonly used in the output layer for tasks like digit
classification (0-9).
It assigns probabilities to each class, enabling the network to select the most
probable category.
Example: For digit recognition, Softmax ensures that the network outputs a
probability distribution over the 10 digits.
Common Activation Functions
ReLU passes positive inputs through with gradient 1, so gradients do not
vanish on active units. This helps neural networks learn hierarchical features,
making ReLU an excellent choice for deep networks.
Example: In object detection, ReLU aids in distinguishing between key
features, improving feature extraction.
Limitations of the Universal Approximation Theorem
Generalization
The theorem guarantees that the target function can be approximated, but it
does not guarantee generalization to new data. Cross-validation is often used
to ensure better generalization.
Training Difficulties
Although the theorem asserts the existence of an approximation, it offers no
guidance on efficient training. Gradient-based methods may get stuck in local
minima or saddle points, complicating the search for an optimal solution.
References
GeeksforGeeks, Universal Approximation Theorem for Neural Networks,
https://www.geeksforgeeks.org/universal-approximation-theorem-for-neural-networks/
Math Online, Applying Lebesgue's Dominated Convergence Theorem 1,
http://mathonline.wikidot.com/applying-lebesgue-s-dominated-convergence-theorem-1
Leonardo Ferreira Guilhoto, An Overview of Artificial Neural Networks for Mathematicians,
https://math.uchicago.edu/~may/REU2018/REUPapers/Guilhoto.pdf
Math3ma, The Dominated Convergence Theorem,
https://www.math3ma.com/blog/dominated-convergence-theorem