
Riemann Sum Optimization for Accurate Integrated Gradients Computation

Swadesh Swain, Indian Institute of Technology, Roorkee ([email protected])
Shree Singhi, Indian Institute of Technology, Roorkee ([email protected])

Abstract

Integrated Gradients (IG) is a widely used algorithm for attributing the outputs of a deep neural network to its input features. Because deep learning models lack closed-form integrals, IG is computed with inaccurate Riemann Sum approximations. This often introduces undesirable errors in the form of high levels of noise, leading to false insights into the model's decision-making process. We introduce a framework, RiemannOpt, that minimizes these errors by optimizing the sample point selection for the Riemann Sum. Our algorithm is highly versatile and applicable to IG as well as its derivatives such as BlurIG and Guided IG. RiemannOpt achieves up to a 20% improvement in Insertion Scores. Additionally, it enables users to cut computational costs by up to fourfold, making it highly functional for constrained environments.

1 Introduction

Deep Neural Network (DNN) classifiers for computer vision are increasingly being deployed in critical fields such as healthcare [4] and autonomous driving [3]. Hence, it has become crucial to understand the decision-making process of these models. This has led to a growing body of research focused on understanding how the predictions of these deep networks can be attributed to
specific regions of the image. An attribution method attempts to explain which inputs the model
considers to be most important for its outputs. Several gradient-based [20, 18, 19, 9, 17, 12] and
gradient-free [11, 14, 23, 5, 7, 30, 22, 29] attribution methods have been developed for deep learning
models. Integrated Gradient methods [27, 10, 24] are a specific class of gradient-based attribution
methods that compute a line integral of the gradients of the model over a path defined from a baseline
image to the given input.
The complex functional space of deep learning models is often considered a source of noise for many gradient-based attribution methods, resulting in undesirably high attribution to some background regions. For Integrated Gradients, Kapishnikov et al. [2021] claim that the source of noise is large gradients on the model surface, while Smilkov et al. [2017] argue that the source of error is the rapid fluctuation of the gradients of deep learning models.
Deep learning models do not have closed-form integrals, so their integrals are approximated by Riemann Sums [24]. This approximation involves sampling a number of points along the path and approximating the integral by interpolating between these points. Using more points for the Riemann Sum naturally results in cleaner saliency maps. However, most applications of Integrated Gradients require a high number of steps for the Riemann Sum [24, 16], generally between 20 and 1000, rendering Integrated Gradients practically infeasible for real-time applications. On the other hand, using fewer samples severely degrades the quality of the saliency map. This results in a trade-off between speed and performance.

Interpretable AI: Past, Present and Future Workshop at NeurIPS 2024


Figure 1: Visual comparison of Integrated Gradient methods with and without RiemannOpt. For IG, RiemannOpt suppresses the noise around the Spoonbill and also slightly concentrates stronger attribution scores on the mouse trap. Applying RiemannOpt to BlurIG significantly increases concentration on the subjects of images. GIG saliency maps remain perceptually similar.

Sotoudeh and Thakur [2019] have attempted to tackle the above issue of inaccurate Integrated Gradients computation by exactly computing the underlying integral using ExactLine. However, their approach is limited to neural networks composed of piece-wise linear operations. Most traditional models, like InceptionV3 [25], ViT [6] and ResNet [8], while being primarily composed of linear operations, also make use of non-linear operations like LayerNorm [1], GroupNorm [26] and Attention [2], thus prohibiting the use of ExactLine. Furthermore, ExactLine might not be considered ideal in cases with computational constraints, since it requires ∼14000 gradient computations per image for large models.
To overcome the redundancy caused by ineffective sampling schedules prevalent in Integrated Gradient methods, we introduce RiemannOpt, a framework to pre-determine optimal points for sampling to calculate Riemann Sums. The pre-determined points are specific only to the model. Hence, the computation to determine the points is done only once and does not need to be repeated for every image. Unlike ExactLine, our method does not impose any architectural constraints on the underlying model, with the additional benefit of requiring far fewer samples. We present qualitative and quantitative results for RiemannOpt on Integrated Gradients (IG) [24], Blur Integrated Gradients (BlurIG) [27] and Guided IG (GIG) [10]. Our method can be easily combined with existing IG-based methods, enabling them to generate cleaner saliency maps.

2 Background

In this section, we review the mathematical definition of IG [24], BlurIG [27], and GIG [10].

2.1 Integrated Gradients

Sundararajan et al. [2017] utilized the idea of a path function: $\gamma : [0, 1] \to \mathbb{R}^n$ is a smooth function that denotes a path within $\mathbb{R}^n$ from $x'$ to $x$, satisfying $\gamma(0) = x'$ and $\gamma(1) = x$. Further, they defined the path integrated gradients along the $i$-th dimension for an input $x$, given a baseline $x'$, obtained by integrating the gradients along the path $\gamma(\alpha)$ for $\alpha \in [0, 1]$, as:

$$I_i(x) = \int_0^1 \frac{\partial f(\gamma(\alpha))}{\partial \gamma_i(\alpha)} \frac{\partial \gamma_i(\alpha)}{\partial \alpha} \, d\alpha, \qquad (1)$$

where $f$ denotes a DNN classifier. Integrated Gradients (IG) [Sundararajan et al., 2017] originally defined the path method as the straight-line path $\gamma^{IG}(\alpha) = x' + \alpha \times (x - x')$ for $\alpha \in [0, 1]$. Later, BlurIG and GIG introduced non-linear paths that had their respective advantages over IG.
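For concreteness, IG along the straight-line path is typically computed with a left Riemann Sum over a schedule of sample points. The following is a minimal PyTorch sketch, not the original implementation; it assumes f returns the scalar target-class score, and the function name and argument layout are illustrative:

import torch

def integrated_gradients_left_riemann(f, x, x_baseline, alphas):
    # f: callable returning a scalar score (e.g., the target-class logit) for an input tensor
    # alphas: 1D tensor of k+1 sample points in [0, 1], with alphas[0] = 0 and alphas[-1] = 1
    attributions = torch.zeros_like(x)
    for i in range(len(alphas) - 1):
        # gamma(alpha_i) on the straight-line path; detach so gradients flow only from this point
        point = (x_baseline + alphas[i] * (x - x_baseline)).detach().requires_grad_(True)
        grad = torch.autograd.grad(f(point), point)[0]
        # g(alpha_i) * (alpha_{i+1} - alpha_i), with dgamma/dalpha = (x - x_baseline)
        attributions += grad * (x - x_baseline) * (alphas[i + 1] - alphas[i])
    return attributions

RiemannOpt only changes which alphas are passed in; a uniform schedule reproduces the standard computation.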

2.2 Blur Integrated Gradients

Xu et al. [2020] introduced Blur Integrated Gradients: for a given function $f : \mathbb{R}^{m \times n} \to [0, 1]$ representing a classifier, let $z(x, y)$ be the 2D input. BlurIG's path is defined by a Gaussian filter that progressively blurs the input. Formally:

$$\gamma^{BlurIG}(x, y, \alpha) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \frac{1}{\pi \alpha}\, e^{-\frac{m^2 + n^2}{\alpha}}\, z(x - m, y - n)$$

The final BlurIG computation is as follows:

$$I^{BlurIG}(x, y) ::= \int_{\infty}^{0} \frac{\partial f_c(\gamma^{BlurIG}(x, y, \alpha))}{\partial \gamma^{BlurIG}(x, y, \alpha)} \frac{\partial \gamma^{BlurIG}(x, y, \alpha)}{\partial \alpha} \, d\alpha$$
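A point on this path can be sketched with an off-the-shelf Gaussian filter: the kernel $e^{-(m^2+n^2)/\alpha}/(\pi\alpha)$ corresponds to a Gaussian with standard deviation $\sqrt{\alpha/2}$. The snippet below is an assumption-laden sketch (discretized, truncated kernel via SciPy), not the authors' code:

import numpy as np
from scipy.ndimage import gaussian_filter

def blurig_path_point(z, alpha):
    # z: 2D image as a numpy array; large alpha gives a heavily blurred (information-less) baseline,
    # alpha -> 0 recovers the original input.
    if alpha <= 0:
        return z.copy()
    return gaussian_filter(z, sigma=np.sqrt(alpha / 2.0))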

2.3 Guided Integrated Gradients

Guided IG [10] (GIG) follows an adaptive integration path $\gamma^{GIG}(\alpha)$, $\alpha \in [0, 1]$, to avoid high-gradient regions. An adaptive path is one that depends on the model being used:

$$\gamma^{GIG} = \underset{\gamma \in \Gamma}{\arg\min} \sum_{i=1}^{N} \int_0^1 \left| \frac{\partial f(\gamma(\alpha))}{\partial \gamma_i(\alpha)} \frac{\partial \gamma_i(\alpha)}{\partial \alpha} \right| d\alpha. \qquad (2)$$

After finding the optimal path $\gamma^{GIG}$, GIG computes the attribution values similarly to IG:

$$I_i^{GIG}(x) = \int_0^1 \frac{\partial f(\gamma^{GIG}(\alpha))}{\partial \gamma_i^{GIG}(\alpha)} \frac{\partial \gamma_i^{GIG}(\alpha)}{\partial \alpha} \, d\alpha. \qquad (3)$$

3 Methodology
In this section, we present a simple derivation to determine an upper bound on the error introduced by approximating a one-dimensional integral using a Riemann Sum. We then extend the definition to multi-dimensional line integrals and define the algorithm RiemannOpt uses to schedule samples to minimize this upper bound.

3.1 Error Minimization of Riemann Sums in 1D

We now present the derivation to estimate the error introduced by the left Riemann Sum approximation of a standard 1D integral $\int_{\alpha_0}^{\alpha_k} g(\alpha)\, d\alpha$, where $\{\alpha_i\}_{i=0}^{k}$ is the set of points at which the integrand, $g(\alpha)$, is evaluated.
The standard way to calculate the left Riemann Sum is:

$$R = \sum_{i=0}^{k-1} g(\alpha_i)(\alpha_{i+1} - \alpha_i) \qquad (4)$$

The integral can be broken down as:

$$I = \sum_{i=0}^{k-1} \int_{\alpha_i}^{\alpha_{i+1}} g(\alpha)\, d\alpha \qquad (5)$$

By applying the Taylor Series approximation around $\alpha_i$ in (5):

$$I \approx \sum_{i=0}^{k-1} \int_{\alpha_i}^{\alpha_{i+1}} \left[ g(\alpha_i) + (\alpha - \alpha_i)\, g'(\alpha_i) \right] d\alpha \approx \sum_{i=0}^{k-1} \left[ g(\alpha_i)(\alpha_{i+1} - \alpha_i) + g'(\alpha_i)\, \frac{(\alpha_{i+1} - \alpha_i)^2}{2} \right] \qquad (6)$$

By (4), (6), and the Triangle Inequality:

$$|R - I| \lesssim \frac{1}{2} \sum_{i=0}^{k-1} |g'(\alpha_i)|\, (\alpha_{i+1} - \alpha_i)^2 \qquad (7)$$
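As a quick sanity check of Equation (7), the following snippet compares the actual left Riemann Sum error against the first-order estimate for an illustrative integrand (not taken from the paper):

import numpy as np

# g(alpha) = alpha^2 on [0, 1]; the exact integral is 1/3.
g = lambda a: a ** 2
g_prime = lambda a: 2 * a

alphas = np.linspace(0.0, 1.0, 9)              # 8 equispaced intervals
widths = np.diff(alphas)
R = np.sum(g(alphas[:-1]) * widths)            # left Riemann Sum, Eq. (4)
actual_error = abs(R - 1.0 / 3.0)
estimate = 0.5 * np.sum(np.abs(g_prime(alphas[:-1])) * widths ** 2)   # Eq. (7)
print(actual_error, estimate)                  # ~0.060 vs ~0.055; they differ by the dropped higher-order terms

The estimate tracks the true error closely, which is what makes it a useful objective for choosing the sample points.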

3.2 Algorithm

IG computes d integrals (attributions) per image, one for each pixel. We treat each integral independently and use the derivation above to estimate the average error over all integrals. The input to the function, g, is multidimensional, resulting in a different Taylor Series expansion. However, the approximation remains mathematically sound since the integral corresponding to the i-th feature depends only on the gradient along that component, i.e., the error in the i-th dimension of the gradient contributes only to the i-th integral. We use this observation in conjunction with the finite difference approximation of the derivative to determine the optimal points for sampling a Riemann Sum for the dataset. The primary idea behind the algorithm is to approximate the average |g′(α)| for all input features on a small subset of images, ∼1% of the validation dataset, then compute the optimal sampling points and use them for the entire dataset.
The following tensors are used in Algorithm 1, where d is the dimensionality of the input:

• $I_{k \times d}$: The integrand evaluated at k equispaced points along the path.
• $C_{(k-1) \times d}$: Finite difference estimate of the derivative of I for all input features.
• $A_{k-1}$: Absolute derivative estimate of I averaged over features, corresponding to $|g'(\alpha)|$.

Algorithm 1 Estimation of Optimal Alphas

Inputs:
    A subset of m examples from the validation dataset: $X_i \in \mathbb{R}^d$, $i \in \{1, \dots, m\}$
    Number of sample points in a path: $k$
    Integrand of the IG method: $\frac{\partial f(\gamma(\alpha))}{\partial \gamma(\alpha)} \odot \frac{\partial \gamma(\alpha)}{\partial \alpha}$
Output:
    Optimal sampling points: $\{\alpha_j^*\}_{j=1}^{k}$
Initialization:
    Set $\{\alpha_j\}_{j=1}^{k}$ as $k$ linearly spaced scalars between the integral bounds
    $A \leftarrow$ initialize with zeros
for each $i$ in $\{1, \dots, m\}$ do    ▷ Loop over the subset of examples
    $I_j \leftarrow \frac{\partial f(\gamma(\alpha))}{\partial \gamma(\alpha)} \odot \frac{\partial \gamma(\alpha)}{\partial \alpha} \big|_{\alpha=\alpha_j}$ for $j$ in $\{1, \dots, k\}$
    $C_{(k-1) \times d} \leftarrow \frac{I_{j+1} - I_j}{\alpha_{j+1} - \alpha_j}$ for $j$ in $\{1, \dots, k-1\}$    ▷ Finite difference: $g'(\alpha) \approx \frac{g(\alpha + \Delta\alpha) - g(\alpha)}{\Delta\alpha}$
    Apply element-wise absolute value to $C$
    $A_{k-1}$ += average of $C_{(k-1) \times d}$ across all features    ▷ Estimate of $|g'(\alpha)|$
end for
Normalize $A$ by dividing by the number of examples $m$
$|g'(\alpha)| \leftarrow$ LinearlyInterpolate($A$, $\alpha$)    ▷ $\alpha \in [0, 1]$, $A \in \mathbb{R}^{k-1}$
$\{\alpha_j^*\}_{j=1}^{k} \leftarrow$ the set $\{\alpha_j\}_{j=1}^{k}$ that minimizes the upper-bound error defined by Equation (7)
return $\{\alpha_j^*\}_{j=1}^{k}$
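The estimation loop of Algorithm 1 can be sketched in a few lines of PyTorch, specialized here to the straight-line IG path (so the integrand is the gradient times (x − x′)). Function and variable names are illustrative, not the authors' implementation, and f is assumed to return the scalar target-class score:

import torch

def estimate_abs_gprime(f, examples, baselines, k):
    # examples, baselines: lists of input tensors (the small calibration subset, ~1% of validation)
    alphas = torch.linspace(0.0, 1.0, k)                    # k equispaced sample points
    A = torch.zeros(k - 1)
    for x, x0 in zip(examples, baselines):
        rows = []
        for a in alphas:
            point = (x0 + a * (x - x0)).detach().requires_grad_(True)
            grad = torch.autograd.grad(f(point), point)[0]
            rows.append((grad * (x - x0)).flatten())         # integrand I_j, one row per alpha
        I = torch.stack(rows)                                # shape: k x d
        C = (I[1:] - I[:-1]) / (alphas[1:] - alphas[:-1]).unsqueeze(1)   # finite differences
        A += C.abs().mean(dim=1)                             # average |g'(alpha_j)| over features
    return A / len(examples), alphas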

4 Experimental Setup and Metrics


In this section, we discuss the details of the implementation, dataset, model, and metrics used.

4.1 Experimental Setup

We use the original implementations with default parameters from the authors' code for IG, GIG, and BlurIG, and implement RiemannOpt as a pre-computation step that links with the original implementations. We present our results using InceptionV3 for 16, 32, 64 and 128 sample points on the correctly classified images of the ImageNet validation dataset (∼40K images). To estimate |g′(α)|, we apply Algorithm 1 with 128 samples to a set of 200 randomly selected, correctly classified images from the ImageNet validation dataset. Then, we use Powell's method [15] to determine the optimal set of sampling points. This has roughly the same computational cost as computing the saliency maps for the set of 200 images. Using RiemannOpt is still cost-effective since we only use a small number of images to calculate the sample points but are able to use these points for the entire dataset.
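The point-selection step can be sketched as follows: given the averaged |g′(α)| profile from Algorithm 1, SciPy's Powell optimizer is used to pick a schedule that minimizes the Equation (7) estimate. The parametrization (optimizing interior points, then sorting and clipping to [0, 1] with fixed endpoints) is our assumption, not necessarily the authors' exact setup:

import numpy as np
from scipy.optimize import minimize

def choose_sample_points(abs_gprime, grid, k):
    # abs_gprime: averaged |g'(alpha)| values (e.g., from Algorithm 1)
    # grid: increasing alpha values at which abs_gprime was estimated (same length)
    def bound(free_points):
        a = np.concatenate(([0.0], np.sort(np.clip(free_points, 0.0, 1.0)), [1.0]))
        widths = np.diff(a)
        g = np.interp(a[:-1], grid, abs_gprime)      # linear interpolation of |g'(alpha)|
        return 0.5 * np.sum(g * widths ** 2)         # upper-bound estimate of Eq. (7)

    x0 = np.linspace(0.0, 1.0, k + 1)[1:-1]          # start from a uniform schedule
    res = minimize(bound, x0, method="Powell")
    return np.concatenate(([0.0], np.sort(np.clip(res.x, 0.0, 1.0)), [1.0]))

Since the schedule depends only on the model (through the averaged |g′(α)|), this optimization runs once and the resulting points are reused for every image.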

4.2 Metrics

Previous works use the Insertion Score and Normalized Insertion Score to compare different attribu-
tion methods [27, 9, 10, 28, 14, 13]. It is critical to note that the purpose of the Insertion Score is
to measure the efficacy of a saliency map, i.e. it is not designed to measure how close the Riemann
Sum is to the actual integral. However, it is reasonable to assume that the true saliency map would
generally achieve better Insertion Scores than an inaccurate approximation since inaccurate estimates
introduce noise. Hence, we report the Insertion Scores and, additionally, employ the Axiom of
Completeness [24] to define a new metric that measures the quality of the saliency maps without the
need for this hypothesis.
According to the Axiom of Completeness, the sum of all feature attributions, determined by any Integrated Gradients method, must ideally add up to the difference between the output of f at x and x′. However, there is always an error due to inaccurate Riemann Sum estimates. Furthermore, Sundararajan et al. [2017] advise the developer to ensure that all feature attributions add up to f(x) − f(x′) (within 5%) and suggest increasing the number of samples if the error is greater. Since the ground truth is unavailable, it is non-trivial to determine the numerical accuracy of a computed saliency map. We use the relative error between the sum of feature attributions and f(x) − f(x′) to estimate the error. This metric is not infallible since the features' positive and negative errors partially offset each other during the summation. Using the Triangle Inequality, it can be easily shown that this metric is a lower bound on the true error. Nevertheless, it serves as a helpful proxy, since near-perfect saliency maps will have near-zero error, and highly erroneous maps will, on average, have high error even after the errors partially offset.
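A minimal sketch of this relative-error proxy, assuming f returns the scalar target-class score (an illustrative helper, not the authors' evaluation code):

import torch

def relative_completeness_error(f, x, x_baseline, attributions):
    # attributions: the computed saliency map for input x
    target = f(x) - f(x_baseline)            # completeness: attributions should sum to this
    return (attributions.sum() - target).abs() / target.abs()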

5 Results and Discussion

In this section, we compare the sampling points chosen by RiemannOpt to the linear schedules, followed by a qualitative and quantitative evaluation against the baselines. In the case of BlurIG, the sample points chosen by RiemannOpt differ markedly from the linearly spaced samples, as depicted in Figure 2. Every path starts with an information-less baseline image, x′, and gradually gains perceptible features as it moves towards the input image, x. Along the path, when the image becomes perceptible, the gradients change rapidly, resulting in large values of |g′(α)|. For BlurIG, the image features become perceptible at the end of the path, when most of the sharpening occurs. For IG and GIG, the image becomes perceptible as soon as its brightness crosses a certain threshold, α ≈ 0.1.

Figure 2: Estimated |g′(α)| and comparison of 16 linearly spaced samples with the 16 optimal samples chosen by RiemannOpt. High values of |g′(α)| indicate regions of the path where the gradients of the model are rapidly changing, i.e., regions where the image becomes perceptible to the model.

[Figure 3 plot: Insertion Score (↑) and Normalized Insertion Score (↑) versus number of samples (2^4 to 2^7) for IG, BlurIG, and GIG, each with and without RiemannOpt.]
Figure 3: We compare RiemannOpt against the baseline methods using the Insertion Score and Normalized Insertion Score. We observe a noticeable improvement for BlurIG and IG.

RiemannOpt always reduces the relative error and improves metric scores across all methods and sample counts, as depicted in Table 1 and Figure 3 respectively, with a noticeable enhancement for BlurIG. On the other hand, the improvement for GIG is not very significant. The path of GIG is theoretically fixed for a chosen model. However, because it employs an adaptive path, its practical implementation is highly dependent on the number of samples as well as the location of the samples, unlike BlurIG and IG. In the derivation in Section 3.1, we assumed that the path function was constant and independent of the sample points. The practical implementation of GIG breaks this assumption; this is a possible explanation for why GIG is not improved by RiemannOpt as much as the other methods are. In terms of relative error, RiemannOpt significantly reduces the number of samples required while maintaining comparable performance. Specifically, BlurIG + RiemannOpt achieves similar results with 16 samples as BlurIG with 64 samples. Additionally, IG + RiemannOpt with 16 samples performs comparably to IG with 32 samples, and GIG + RiemannOpt with 16 samples matches the performance of GIG with 128 samples. This makes RiemannOpt highly functional for computationally constrained environments.

Table 1: Relative Error (↓) across different methods

Method                   16 Samples   32 Samples   64 Samples   128 Samples
IG                       0.708        0.374        0.166        0.066
IG + RiemannOpt          0.404        0.223        0.123        0.065
BlurIG                   0.886        0.554        0.268        0.114
BlurIG + RiemannOpt      0.269        0.123        0.058        0.041
GIG                      0.786        0.788        0.725        0.612
GIG + RiemannOpt         0.666        0.731        0.711        0.610

6 Conclusion
In this paper, we present RiemannOpt, a highly efficient framework designed to optimize sample points in Riemann Sums for the computation of Integrated Gradients. Both qualitative and quantitative results demonstrate that RiemannOpt effectively reduces numerical errors in saliency maps and improves Insertion Scores by up to 20%, thereby enhancing the accuracy and reliability of attribution maps. RiemannOpt is adaptable, extending its applicability to any multi-dimensional line integral computation, including derivatives of Integrated Gradients such as BlurIG and GIG. Additionally, it enables users to cut computational costs by up to fourfold, significantly boosting efficiency. Opportunities for future work include extending RiemannOpt to further improve its suitability for Integrated Gradient methods that employ adaptive paths.

Acknowledgments and Disclosure of Funding
We would like to thank Aayan Yadav, Shweta Singh, Anupriya Kumari and Devansh Bhardwaj for their insights during the writing of this paper. We would also like to thank all members of the Data Science Group of IIT Roorkee for their invaluable support.

References
[1] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization, 2016.
[2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv, 2014.
[3] Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[4] Jorge Cuadros and George Bresnick. EyePACS: An adaptable telemedicine system for diabetic retinopathy screening. Journal of Diabetes Science and Technology, 2009.
[5] Piotr Dabkowski and Yarin Gal. Real time image saliency for black box classifiers. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2017.
[6] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
[7] Ruth C. Fong and Andrea Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[9] Andrei Kapishnikov, Tolga Bolukbasi, Fernanda Viegas, and Michael Terry. XRAI: Better attributions through regions. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[10] Andrei Kapishnikov, Ben Wedin, Besim Namik Avci, Michael Terry, Subhashini Venugopalan, and Tolga Bolukbasi. Guided integrated gradients: An adaptive path method for removing noise. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[11] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, 2017.
[12] Ettore Mariotti, Jose M. Alonso-Moral, and Albert Gatt. Measuring model understandability by means of Shapley additive explanations. In 2022 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2022.
[13] Deng Pan, Xin Li, and Dongxiao Zhu. Explaining deep neural network models with adversarial gradient integration. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, 2021.
[14] Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models, 2018.
[15] M. J. D. Powell. An efficient method for finding the minimum of a function of several variables without calculating derivatives. The Computer Journal, 1964.
[16] Kristina Preuer, Günter Klambauer, Friedrich Rippmann, Sepp Hochreiter, and Thomas Unterthiner. Interpretable deep learning in drug discovery, 2019.
[17] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2016.
[18] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
[19] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps, 2014.
[20] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
[21] Matthew Sotoudeh and Aditya V Thakur. Computing linear restrictions of neural networks. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2019.
[22] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net, 2015.
[23] Mukund Sundararajan and Amir Najmi. The many Shapley values for model explanation. In Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research, pages 9269–9278, 2020.
[24] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning. PMLR, 2017.
[25] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[26] Yuxin Wu and Kaiming He. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[27] S. Xu, S. Venugopalan, and M. Sundararajan. Attribution in scale and space. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, 2020.
[28] Ruo Yang, Binghui Wang, and Mustafa Bilgic. IDGI: A framework to eliminate explanation noise from integrated gradients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[29] Luisa M Zintgraf, Taco S Cohen, Tameem Adel, and Max Welling. Visualizing deep neural network decisions: Prediction difference analysis, 2017.
[30] Erik Štrumbelj and Igor Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 2013.
