
Universal Approximation

Anano Tamarashvili, Ani Okropiridze, Mariami Mamageishvili

November 28, 2024

Contents

1 Types of Convergence

2 Lebesgue Dominated Convergence Theorem

3 Hahn-Banach Theorem

4 Riesz Representation Theorem

5 Universal Approximation

6 Common Activation Functions

7 References

Types of Convergence

Pointwise Convergence

Pointwise convergence defines the convergence of functions in terms of the convergence of their values at each point of their domain.
Definition
Suppose that (fn ) is a sequence of functions fn : A → R and f : A → R. Then
fn → f pointwise on A if fn (x) → f (x) as n → ∞ for every x ∈ A.

We say that the sequence (fn) converges pointwise if it converges pointwise to some function f, in which case

f(x) = \lim_{n \to \infty} f_n(x).


Uniform Convergence

Definition
Suppose that (fn ) is a sequence of functions fn : A → R and f : A → R. Then
fn → f uniformly on A if, for every ε > 0, there exists N ∈ N such that

n > N =⇒ |fn (x) − f (x)| < ε for all x ∈ A.
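
A minimal numerical sketch of the difference between the two notions: on [0, 1) the sequence f_n(x) = x^n converges pointwise to 0, but its supremum stays near 1, so the convergence is not uniform (the grid size and values of n below are illustrative choices).

```python
import numpy as np

# f_n(x) = x^n on [0, 1): the pointwise limit is 0, but sup|f_n - 0| stays near 1.
x = np.linspace(0.0, 0.999, 10_000)   # fine grid on [0, 1) (illustrative choice)

for n in (1, 10, 100, 1000):
    fn = x ** n
    print(f"n={n:5d}  f_n(0.5)={0.5 ** n:.2e}  sup|f_n| ~ {fn.max():.3f}")
# f_n(0.5) -> 0 as n grows, while the sup over [0, 1) remains close to 1,
# so f_n -> 0 pointwise but not uniformly.
```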


Lebesgue Dominated Convergence Theorem

Theorem
Let X be a measure space, µ be a Borel measure on X, g : X → R be L1 and
{fn } be a sequence of measurable functions from X → R such that
|fn (x)| ≤ g(x) for all x ∈ X and {fn } converges pointwise to a function f . Then
f is integrable and
\lim_{n \to \infty} \int f_n(x)\, d\mu(x) = \int f(x)\, d\mu(x).


Why is domination necessary?


Let’s see where things can go wrong if a sequence {fn } is not dominated by any
function. Take, for instance, the sequence of functions {fn } where for each n ∈ N
we define
f_n(x) = n\,\chi_{(0,1/n]}(x) = \begin{cases} n, & \text{if } 0 < x \le 1/n \\ 0, & \text{otherwise.} \end{cases}
Then fn → 0 pointwise. But notice there is no integrable function g such that
|fn (x)| ≤ g(x) for all x ∈ (0, 1] and for all n. This is because for large values of
n, the height of fn is tending towards infinity, or in other words, the fn are
unbounded. Since the area of each rectangle is 1, we see that the integral and the
limit do not commute in this example. Explicitly:
1 = \lim_{n \to \infty} \int_0^1 f_n(x)\, dx \ne \int_0^1 \lim_{n \to \infty} f_n(x)\, dx = 0,

where we have used the fact that \lim_{n \to \infty} f_n(x) = 0
and that the Riemann and Lebesgue integral coincide in this case.
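
A short numerical check of this computation (a sketch; the midpoint-rule grid is an illustrative choice):

```python
import numpy as np

def integral_fn(n, num_points=1_000_000):
    # Midpoint-rule approximation of the integral of f_n = n * chi_(0, 1/n]
    # over (0, 1].
    x = (np.arange(num_points) + 0.5) / num_points
    fn = np.where(x <= 1.0 / n, float(n), 0.0)
    return fn.mean()   # mean value times the interval length (= 1)

for n in (1, 10, 100, 1000):
    print(f"n={n:5d}  integral of f_n over (0,1] ~ {integral_fn(n):.4f}")
# Every integral is ~1, while the pointwise limit is the zero function,
# whose integral is 0 -- the limit and the integral do not commute here.
```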

An example using the DCT


Compute the following integral:

\lim_{n \to \infty} \int_{\mathbb{R}} \frac{n \sin(x/n)}{x(x^2 + 1)}\, dx.

Solution
Let x ∈ R and begin by defining

f_n(x) = \frac{n \sin(x/n)}{x(x^2 + 1)}

Each f_n is measurable and the sequence {f_n} converges pointwise to 1/(1 + x²):

\lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \frac{\sin(x/n)}{x/n} \cdot \frac{1}{1 + x^2} = \frac{1}{1 + x^2},

since

\lim_{n \to \infty} \frac{\sin(x/n)}{x/n} = 1.

Continuation

From this, we also see that g(x) = 1/(1 + x²) works as a dominating function. Indeed, g is integrable on R and

|f_n(x)| = \left| \frac{\sin(x/n)}{x/n} \right| \cdot \frac{1}{1 + x^2} \le \frac{1}{1 + x^2} = g(x),

since |sin(x)| ≤ |x| for all x.


Thus, we apply the Dominated Convergence Theorem (DCT) to conclude

\lim_{n \to \infty} \int_{\mathbb{R}} \frac{n \sin(x/n)}{x(x^2 + 1)}\, dx = \int_{-\infty}^{\infty} \lim_{n \to \infty} \frac{n \sin(x/n)}{x(x^2 + 1)}\, dx = \int_{-\infty}^{\infty} \frac{1}{1 + x^2}\, dx = \left[ \tan^{-1}(x) \right]_{-\infty}^{\infty} = \pi.
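
A numerical sanity check of this limit (a sketch; scipy's quad routine and the values of n below are illustrative choices):

```python
import numpy as np
from scipy.integrate import quad

def integrand(x, n):
    # n*sin(x/n)/x = sin(x/n)/(x/n); np.sinc(t) = sin(pi t)/(pi t), which
    # handles the removable singularity at x = 0.
    return np.sinc(x / (n * np.pi)) / (1.0 + x**2)

for n in (1, 5, 50, 500):
    value, _ = quad(integrand, -np.inf, np.inf, args=(n,))
    print(f"n={n:4d}  integral ~ {value:.6f}")
print(f"pi        = {np.pi:.6f}")   # the integrals approach pi, as the DCT predicts
```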


Example

Consider training a neural network that tries to approximate a continuous target function f. We create a sequence of network functions fn where each fn
represents the network’s output after n training steps. As n → ∞, fn should
ideally converge to the target function f . LDCT ensures that the integral of fn
(related to the network’s prediction error over the domain) converges to the
integral of f . This is important for proving that, in the limit, the network can
approximate f over the entire domain, allowing for universal approximation.


Hahn-Banach Theorem

Theorem (Hahn-Banach Theorem - Geometric Form)


Let V be a normed vector space and A, B ⊂ V be two non-empty, closed,
disjoint, and convex subsets such that one of them is compact. Then there exists
a continuous linear functional f ̸≡ 0, some α ∈ R, and an ϵ > 0 such that

f (x) ≤ α − ϵ for any x ∈ A

and
f (y) ≥ α + ϵ for any y ∈ B.

Corollary
Let V be a normed vector space over R and U ⊂ V be a closed linear subspace such that
U ̸= V . Then there exists a continuous linear map f : V → R with

f (x) = 0 for any x ∈ U, and f ̸≡ 0.


Riesz Representation Theorem

Theorem
Let Ω be a compact subset of Rn and F : C(Ω) → R be a bounded linear functional on the space of continuous real functions on Ω. Then there exists a signed Borel measure µ on Ω such that for any f ∈ C(Ω), we have that

F(f) = \int_{\Omega} f(x)\, d\mu(x).


The Riesz Representation Theorem connects linear functionals to integrals involving a measure, which is essential for understanding how neural networks can approximate functions in functional spaces. In functional approximation, we often want to evaluate how "close" an approximation is to a target function, and this theorem provides a structured way to express functionals (such as a loss function) as integrals.


Riesz Representation Theorem in Hilbert Space

Theorem
If T is a bounded linear functional on a Hilbert space H, then there exists some
g ∈ H such that for every f ∈ H we have

T (f ) = ⟨f, g⟩.

Moreover, ∥T ∥ = ∥g∥, where ∥T ∥ denotes the operator norm of T , and ∥g∥ is the
Hilbert space norm of g.
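
A finite-dimensional sketch of the theorem: in H = R^n with the dot product, the functional T(f) = a · f is represented by g = a, and its operator norm equals ‖g‖ (the random vectors and sampling-based norm estimate below are illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
a = rng.normal(size=n)      # defines the functional T(f) = a . f
g = a                       # its Riesz representer

f = rng.normal(size=n)
print("T(f)   =", a @ f)
print("<f, g> =", f @ g)    # the same value

# Estimate the operator norm sup_{||f|| = 1} |T(f)| by sampling unit vectors.
samples = rng.normal(size=(100_000, n))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
print("sup |T(f)| over sampled unit f ~", np.abs(samples @ a).max())
print("||g||                          =", np.linalg.norm(g))
```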


Example: Optimization in Machine Learning and Signal Processing

In many applications, we need to minimize functionals of the form J(f) = ∥f − g∥², where g ∈ H and H is a Hilbert space.
Using the Riesz Representation Theorem, the minimizer f is characterized by the inner product:

⟨f, g⟩ = ∥g∥².
Application: In support vector machines, we optimize using inner products
(kernels) where projections are onto a Hilbert space of functions.
The Riesz theorem underlies techniques for optimal filters, by allowing inner
product representations of functions.
This optimization concept is applied in designing classifiers and filters in
machine learning.


Example: Functional Representation in Sobolev Spaces (PDEs)
Sobolev spaces H¹(Ω) are Hilbert spaces of functions that, together with their weak derivatives, are square-integrable.
Given a functional L(v) = \int_{\Omega} f v\, dx arising from the data of a PDE, the theorem provides a unique u ∈ H¹(Ω) such that

L(v) = ⟨v, u⟩_{H¹(Ω)}.

Application: This representation is key in finite element methods for solving PDEs numerically.
For instance, the weak form of Poisson's equation −∆u = f on Ω with Dirichlet boundary conditions is solved by finding u ∈ H₀¹(Ω) such that

\int_{\Omega} \nabla u \cdot \nabla v\, dx = \int_{\Omega} f v\, dx \quad \text{for all } v \in H_0^1(\Omega).

This is widely used in engineering applications, like structural analysis and fluid dynamics.
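
A minimal 1D sketch of this idea: piecewise-linear finite elements for −u'' = f on (0, 1) with u(0) = u(1) = 0, where the weak form above becomes a tridiagonal linear system. The choice f(x) = sin(πx), whose exact solution is sin(πx)/π², and the lumped load vector are illustrative simplifications.

```python
import numpy as np

N = 100                          # number of interior nodes (illustrative choice)
h = 1.0 / (N + 1)                # mesh width
x = np.linspace(0.0, 1.0, N + 2)
f = lambda t: np.sin(np.pi * t)  # right-hand side; exact u = sin(pi x) / pi^2

# Stiffness matrix from the weak form  integral of u' v' dx  (tridiagonal).
A = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h

# Load vector  integral of f v dx, lumped as f(x_i) * h per hat function.
b = f(x[1:-1]) * h

u = np.concatenate(([0.0], np.linalg.solve(A, b), [0.0]))
exact = np.sin(np.pi * x) / np.pi**2
print("max error:", np.abs(u - exact).max())   # small; shrinks as N grows
```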

Example: Measure Theory and Probability

The Riesz Representation Theorem for continuous linear functionals ϕ on C([a, b]) states that there exists a unique measure µ such that

\phi(f) = \int f\, d\mu.

Application: This is foundational in probability, as it represents expectations in terms of integrals.
For a probability space with distribution µ, the expectation E[X] for a random variable X is represented as

E[X] = \int X\, d\mu.

This representation is crucial in defining probability distributions and expectations, used in fields like finance and statistical mechanics.
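
A small Monte Carlo sketch of this representation: the expectation is approximated by averaging samples drawn from µ (here an exponential distribution, chosen purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

# mu = Exp(1) distribution, for which E[X] = 1 and E[X^2] = 2.
samples = rng.exponential(scale=1.0, size=1_000_000)

print("E[X]   ~", samples.mean())         # close to 1
print("E[X^2] ~", (samples ** 2).mean())  # close to 2
```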


Connecting to the Universal Approximation Theorem

The Universal Approximation Theorem states that a neural network with at least
one hidden layer and a non-linear activation function can approximate any
continuous function on a compact domain to any desired accuracy. The LDCT
and Riesz Representation Theorem support this concept by providing the
mathematical foundation for convergence and approximation:
LDCT helps ensure that as the neural network’s weights are adjusted to
approximate the target function, the sequence of network approximations
converges in an integrable sense.
Riesz Representation allows us to interpret approximation errors as integrable
functionals, linking them to the network’s error and providing a structured
way to evaluate convergence in function space.


Universal Approximation

Definition
Given a topological space Ω, we define

C(Ω) := {f : Ω → R | f is continuous}.

For Ω ⊆ Rn and an activation function f : R → R, Σn(f) denotes the set of functions of the form

x ↦ \sum_{i=1}^{N} c_i\, f(w_i^T x + θ_i), with N ∈ N, c_i, θ_i ∈ R, and w_i ∈ Rn,

i.e. the functions computed by single-hidden-layer networks with activation f.

Definition
Let Ω be a topological space and f : R → R. We say that a neural network with
activation function f is a universal approximator on Ω if Σn (f ) is dense in C(Ω),
the set of continuous functions from Ω to R.


What is the Universal Approximation Theorem?

The Universal Approximation Theorem states that a feedforward neural network with a single hidden layer and a finite number of neurons can approximate any continuous function on a compact subset of Rn, given an appropriate activation function.
Formally, let C(K) be the space of continuous functions on a compact set
K ⊆ Rn . For any continuous function f ∈ C(K) and for any ϵ > 0, there exists a
feedforward neural network fˆ with a single hidden layer such that:

|f (x) − fˆ(x)| < ϵ for all x ∈ K

This means that the neural network fˆ(x) can approximate the function f (x) to
within any arbitrary degree of accuracy ϵ, given a sufficient number of neurons in
the hidden layer.


How Neural Networks Approximate Functions

Neural networks approximate functions by adjusting the weights and biases of their neurons. During training, the network iteratively adjusts these parameters to minimize the error between its predictions and the actual outputs.
Input Layer: Accepts input data.
Hidden Layers: Process the input through weighted connections and activation functions.
Output Layer: Produces the final result or prediction.


Neural Network Structure

A neural network's function fˆ(x) can be described mathematically as a composition of linear transformations and activation functions.
For a network with a single hidden layer, the output is given by:

\hat{f}(x) = \sum_{i=1}^{M} c_i\, \sigma(w_i^T x + b_i)

where:
M is the number of neurons in the hidden layer,
ci are the weights associated with the output layer,
wi and bi are the weights and biases of the hidden neurons, and
σ is the activation function (commonly non-linear).
By adjusting ci , wi , and bi , the neural network can approximate any continuous
function f (x) over a given domain.
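
A small sketch of this form in Python: with the hidden parameters w_i, b_i drawn at random and σ = tanh (illustrative choices), fitting only the outer coefficients c_i by least squares already approximates a smooth one-dimensional target closely, in the spirit of the theorem. Adjusting w_i and b_i as well, by gradient descent as discussed later, generally reduces the error further.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.tanh                          # a bounded, continuous, increasing activation

def hidden_features(x, w, b):
    # Hidden-layer activations sigma(w_i^T x + b_i) for 1-D inputs x: shape (N, M).
    return sigma(x @ w.T + b)

target = lambda x: np.sin(x) + 0.3 * x**2        # illustrative target on K = [-3, 3]
x_grid = np.linspace(-3, 3, 400).reshape(-1, 1)

M = 50                                   # number of hidden neurons
w = rng.normal(size=(M, 1))
b = rng.uniform(-3, 3, size=M)

H = hidden_features(x_grid, w, b)        # (400, M)
c, *_ = np.linalg.lstsq(H, target(x_grid).ravel(), rcond=None)

f_hat = H @ c                            # f_hat(x) = sum_i c_i sigma(w_i^T x + b_i)
print("max |f - f_hat| on K:", np.abs(f_hat - target(x_grid).ravel()).max())
```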


Compactness and Continuity

The theorem applies to functions defined on a compact set K ⊆ Rn.
A subset of Rn is compact if and only if it is closed and bounded (Heine–Borel).
Compactness ensures that a continuous function f (x) is bounded and uniformly continuous on K, which simplifies the approximation process.


Definition of n-Discriminatory Activation Function

Let n be a natural number. We say an activation function f : R → R is n-discriminatory if the only signed Borel measure µ such that

\int f(y \cdot x + \theta)\, d\mu(x) = 0 \quad \text{for all } y \in \mathbb{R}^n \text{ and } \theta \in \mathbb{R}

is the zero measure.
This implies that the activation function is discriminatory if the only way the
integral of the function can be zero for all inputs is if the measure is zero.


Intuition Behind n-Discriminatory Property

An n-discriminatory activation function is one that can distinguish between different input patterns for any dimensionality n.
It ensures that the network can separate different features or classes in the data by mapping them to non-zero outputs.
Example: The function effectively "filters" data by ensuring no non-trivial measure satisfies the integral condition.


Definition of Discriminatory Activation Function

Definition
An activation function f : R → R is discriminatory if it is n-discriminatory for all
natural numbers n.
This means that the activation function can distinguish between all possible linear
combinations of inputs in any dimensionality of the input space.


Importance of Discriminatory Activation Functions

Discriminatory activation functions are key to enabling neural networks to model complex, non-linear relationships.
They allow networks to separate data points or features that may not be
linearly separable.


Properties of Activation Functions

The theorem requires that σ(x) be:


Non-constant
Bounded
Continuous
Monotonically increasing
These properties allow the neural network to capture complex, non-linear
relationships in the data.

Common Activation Functions

ReLU Activation Function

Definition
The Rectified Linear Unit (also denoted ReLU) is a function R → R defined by

ReLU(x) = max(0, x)

Discriminatory Property: ReLU allows the network to "activate" only positive input signals, helping it discriminate between different feature types.
The ReLU activation function is commonly used in hidden layers of neural
networks.
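
A one-line sketch of ReLU and the sparse activations it produces (the sample values below are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)   # ReLU(x) = max(0, x), applied elementwise

z = np.array([-2.0, -0.5, 0.0, 0.7, 3.1])
print(relu(z))                                   # [0.  0.  0.  0.7 3.1]
print("fraction of active units:", (relu(z) > 0).mean())
```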


Sigmoid Activation Function

Definition
A function f : R → R is called a sigmoid if it satisfies the following two properties:

\lim_{x \to \infty} f(x) = 1 \quad \text{and} \quad \lim_{x \to -\infty} f(x) = 0.

The standard logistic sigmoid maps input values to the range (0, 1):

\sigma(x) = \frac{1}{1 + e^{-x}}
Discriminatory Property: Sigmoid produces outputs that can be interpreted
as probabilities, distinguishing between different classes.
The Sigmoid activation function is widely used in output layers for binary
classification tasks.
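
A short sketch of the logistic sigmoid, its limits, and its use as a binary-classification score (the decision threshold 0.5 and the sample logit are illustrative choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))    # approximately [0.00005, 0.5, 0.99995]
score = sigmoid(2.3)                             # sigmoid of a model's raw output (logit)
print("spam" if score > 0.5 else "not spam", f"(p = {score:.3f})")
```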


Softmax Activation Function

Softmax converts raw output values (logits) into probabilities:

\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}

Discriminatory Property: Softmax ensures that the sum of the output probabilities equals 1, allowing clear distinction between multiple classes.
Softmax is used in the output layer for multi-class classification problems.
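
A short sketch of softmax over ten digit-class logits; subtracting the maximum before exponentiating is a standard numerical-stability step (the logit values are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())     # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([1.2, 0.3, -0.5, 2.8, 0.0, -1.1, 0.7, 0.1, -0.3, 1.9])
probs = softmax(logits)
print(probs.sum())                            # 1.0 -- a valid probability distribution
print("predicted digit:", probs.argmax())     # index 3 holds the largest logit
```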


ReLU in Neural Networks

ReLU is effective at learning complex features in the hidden layers of a neural network.
By activating only positive inputs, ReLU provides sparse activations, helping
the network focus on the most important features.
Example: In image recognition, ReLU helps the network detect edges,
shapes, and textures effectively.


Sigmoid for Binary Classification

Sigmoid is often used in the output layer for binary classification problems.
It provides a probability score, indicating the likelihood of a data point
belonging to a particular class.
Example: For spam email detection, the Sigmoid output helps determine
whether an email is spam (1) or not spam (0).


Softmax for Multi-Class Classification

Softmax is commonly used in the output layer for tasks like digit
classification (0-9).
It assigns probabilities to each class, enabling the network to select the most
probable category.
Example: For digit recognition, Softmax ensures that the network outputs a
probability distribution over the 10 digits.


Training Neural Networks with Discriminatory Activation Functions

Discriminatory activation functions ensure that the gradients during training are non-zero, enabling the network to learn effectively.
By promoting differentiation between inputs, these functions allow the
network to adjust weights efficiently during backpropagation.


Gradient Descent and Backpropagation

Gradient Descent is an optimization method that minimizes the loss function by updating weights based on the gradient.
Backpropagation calculates the gradient of the loss with respect to each
weight and propagates it backward through the network.
Discriminatory activation functions enable faster convergence and better
learning.
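
A compact sketch of gradient descent with hand-written backpropagation for a one-hidden-layer sigmoid network fit to a one-dimensional target; the hidden width, learning rate, step count, and target function are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 200).reshape(-1, 1)      # (N, 1) inputs
y = np.sin(2 * x)                                # (N, 1) targets (illustrative)

M, lr = 30, 0.05                                 # hidden width, learning rate
w = rng.normal(size=(1, M)); b = np.zeros(M)     # hidden-layer parameters
c = 0.1 * rng.normal(size=(M, 1)); d = 0.0       # output-layer parameters

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5001):
    # Forward pass: y_hat = sigmoid(x w + b) c + d.
    a = sigmoid(x @ w + b)               # (N, M) hidden activations
    y_hat = a @ c + d                    # (N, 1) predictions
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass (chain rule for the mean squared error).
    g_y = 2.0 * (y_hat - y) / len(x)     # dLoss/dy_hat
    g_c = a.T @ g_y                      # gradient w.r.t. c
    g_d = g_y.sum()                      # gradient w.r.t. d
    g_a = g_y @ c.T                      # back through the output layer
    g_z = g_a * a * (1.0 - a)            # sigmoid'(z) = a (1 - a)
    g_w = x.T @ g_z                      # gradient w.r.t. w
    g_b = g_z.sum(axis=0)                # gradient w.r.t. b

    # Gradient-descent update.
    w -= lr * g_w; b -= lr * g_b; c -= lr * g_c; d -= lr * g_d

    if step % 1000 == 0:
        print(f"step {step:4d}  loss {loss:.4f}")   # the loss decreases over training
```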


ReLU Example with Gradient Flow

ReLU passes gradients through unchanged for positive inputs (its derivative is 1 there), which helps prevent vanishing gradients.
This helps neural networks learn hierarchical features, making ReLU an
excellent choice for deep networks.
Example: In object detection, ReLU aids in distinguishing between key
features, improving feature extraction.


Practical Limitations of the Universal Approximation Theorem

While the Universal Approximation Theorem is mathematically elegant, it has practical limitations:
Network Size and Efficiency
The theorem guarantees approximation of any continuous function, but it
does not specify the required network size. In some cases, achieving high
accuracy may require an impractically large number of neurons.
Overfitting
While a network may fit the training data perfectly, it risks overfitting,
leading to poor performance on unseen data. Regularization techniques, such
as dropout or early stopping, help mitigate this risk.


Generalization
The theorem concerns approximating a known function on its whole domain; it does not guarantee that a network fit to finite training data will generalize to new data. Cross-validation is often used to assess and improve generalization.
Training Difficulties
Although the theorem asserts the existence of an approximation, it offers no
guidance on efficient training. Gradient-based methods may get stuck in local
minima or saddle points, complicating the search for an optimal solution.


References

https://www.geeksforgeeks.org/universal-approximation-theorem-for-neural-networks/
http://mathonline.wikidot.com/applying-lebesgue-s-dominated-convergence-theorem-1
Leonardo Ferreira Guilhoto, An Overview of Artificial Neural Networks for Mathematicians. https://math.uchicago.edu/~may/REU2018/REUPapers/Guilhoto.pdf
https://www.math3ma.com/blog/dominated-convergence-theorem
