Showing 1–26 of 26 results for author: Aitchison, L

Searching in archive stat.
  1. arXiv:2402.06525  [pdf, other]

    stat.ML cs.LG

    Flexible infinite-width graph convolutional networks and the importance of representation learning

    Authors: Ben Anson, Edward Milsom, Laurence Aitchison

    Abstract: A common theoretical approach to understanding neural networks is to take an infinite-width limit, at which point the outputs become Gaussian process (GP) distributed. This is known as a neural network Gaussian process (NNGP). However, the NNGP kernel is fixed, and tunable only through a small number of hyperparameters, eliminating any possibility of representation learning. This contrasts with fi…

    Submitted 9 February, 2024; originally announced February 2024.

  2. arXiv:2402.00809  [pdf, other]

    cs.LG stat.ML

    Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

    Authors: Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang

    Abstract: In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learni…

    Submitted 6 August, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

  3. arXiv:2310.17374  [pdf, other]

    stat.CO math.ST

    Using Autodiff to Estimate Posterior Moments, Marginals and Samples

    Authors: Sam Bowyer, Thomas Heap, Laurence Aitchison

    Abstract: Importance sampling is a popular technique in Bayesian inference: by reweighting samples drawn from a proposal distribution we are able to obtain samples and moment estimates from a Bayesian posterior over latent variables. Recent work, however, indicates that importance sampling scales poorly -- in order to accurately approximate the true posterior, the required number of importance samples grows…

    Submitted 18 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

  4. arXiv:2309.09814  [pdf, ps, other]

    stat.ML cs.LG

    Convolutional Deep Kernel Machines

    Authors: Edward Milsom, Ben Anson, Laurence Aitchison

    Abstract: Standard infinite-width limits of neural networks sacrifice the ability for intermediate layers to learn representations from data. Recent work (A theory of representation learning gives a deep generalisation of kernel methods, Yang et al. 2023) modified the Neural Network Gaussian Process (NNGP) limit of Bayesian neural networks so that representation learning is retained. Furthermore, they found…

    Submitted 26 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: ICLR 2024 Camera Ready Version

  5. arXiv:2305.14454  [pdf, other]

    stat.ML cs.LG

    An Improved Variational Approximate Posterior for the Deep Wishart Process

    Authors: Sebastian Ober, Ben Anson, Edward Milsom, Laurence Aitchison

    Abstract: Deep kernel processes are a recently introduced class of deep Bayesian models that have the flexibility of neural networks, but work entirely with Gram matrices. They operate by alternately sampling a Gram matrix from a distribution over positive semi-definite matrices, and applying a deterministic transformation. When the distribution is chosen to be Wishart, the model is called a deep Wishart pr…

    Submitted 23 May, 2023; originally announced May 2023.

  6. arXiv:2305.11022  [pdf, other]

    cs.LG cs.NE stat.ML

    Massively Parallel Reweighted Wake-Sleep

    Authors: Thomas Heap, Gavin Leech, Laurence Aitchison

    Abstract: Reweighted wake-sleep (RWS) is a machine learning method for performing Bayesian inference in a very general class of models. RWS draws $K$ samples from an underlying approximate posterior, then uses importance weighting to provide a better estimate of the true posterior. RWS then updates its approximate posterior towards the importance-weighted estimate of the true posterior. However, recent work…

    Submitted 18 May, 2023; originally announced May 2023.

  7. arXiv:2302.04081  [pdf, other]

    stat.ML cs.LG

    Decision trees compensate for model misspecification

    Authors: Hugh Panton, Gavin Leech, Laurence Aitchison

    Abstract: The best-performing models in ML are not interpretable. If we can explain why they outperform, we may be able to replicate these mechanisms and obtain both interpretability and performance. One example is decision trees and their descendant, gradient boosting machines (GBMs). These perform well in the presence of complex interactions, with tree depth governing the order of interactions. However, i…

    Submitted 8 February, 2023; originally announced February 2023.

  8. arXiv:2108.13097  [pdf, other]

    stat.ML cs.LG

    A theory of representation learning gives a deep generalisation of kernel methods

    Authors: Adam X. Yang, Maxime Robeyns, Edward Milsom, Ben Anson, Nandi Schoots, Laurence Aitchison

    Abstract: The successes of modern deep machine learning methods are founded on their ability to transform inputs across multiple layers to build good high-level representations. It is therefore critical to understand this process of representation learning. However, standard theoretical approaches (formally NNGPs) involving infinite width limits eliminate representation learning. We therefore develop a new…

    Submitted 25 May, 2023; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: Published in ICML 2023

  9. arXiv:2107.10125  [pdf, other]

    stat.ML cs.LG

    A variational approximate posterior for the deep Wishart process

    Authors: Sebastian W. Ober, Laurence Aitchison

    Abstract: Recent work introduced deep kernel processes as an entirely kernel-based alternative to NNs (Aitchison et al. 2020). Deep kernel processes flexibly learn good top-layer representations by alternately sampling the kernel from a distribution over positive semi-definite matrices and performing nonlinear transformations. A particular deep kernel process, the deep Wishart process (DWP), is of particula…

    Submitted 3 December, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: Accepted for publication at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021). 23 pages

  10. arXiv:2107.02495  [pdf, other]

    stat.ML cs.LG

    InfoNCE is variational inference in a recognition parameterised model

    Authors: Laurence Aitchison, Stoil Ganev

    Abstract: Here, we show that the InfoNCE objective is equivalent to the ELBO in a new class of probabilistic generative model, the recognition parameterised model (RPM). When we learn the optimal prior, the RPM ELBO becomes equal to the mutual information (MI; up to a constant), establishing a connection to pre-existing self-supervised learning methods such as InfoNCE. However, practical InfoNCE methods do…

    Submitted 10 August, 2023; v1 submitted 6 July, 2021; originally announced July 2021.

  11. arXiv:2106.05586  [pdf, other]

    stat.ML cs.LG

    Data augmentation in Bayesian neural networks and the cold posterior effect

    Authors: Seth Nabarro, Stoil Ganev, Adrià Garriga-Alonso, Vincent Fortuin, Mark van der Wilk, Laurence Aitchison

    Abstract: Bayesian neural networks that incorporate data augmentation implicitly use a "randomly perturbed log-likelihood [which] does not have a clean interpretation as a valid likelihood function" (Izmailov et al. 2021). Here, we provide several approaches to developing principled Bayesian neural networks incorporating data augmentation. We introduce a "finite orbit" setting which allows likelihoods t…

    Submitted 9 December, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

  12. BNNpriors: A library for Bayesian neural network inference with different prior distributions

    Authors: Vincent Fortuin, Adrià Garriga-Alonso, Mark van der Wilk, Laurence Aitchison

    Abstract: Bayesian neural networks have shown great promise in many applications where calibrated uncertainty estimates are crucial and can often also lead to a higher predictive performance. However, it remains challenging to choose a good prior distribution over their weights. While isotropic Gaussian priors are often chosen in practice due to their simplicity, they do not reflect our true prior beliefs w…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: Accepted for publication at Software Impacts

  13. arXiv:2103.00222   

    stat.ML cs.LG

    Variational Laplace for Bayesian neural networks

    Authors: Ali Unlu, Laurence Aitchison

    Abstract: We develop variational Laplace for Bayesian neural networks (BNNs) which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without the need for stochastic sampling of the neural-network weights. The Variational Laplace objective is simple to evaluate, as it is (in essence) the log-likelihood, plus weight-decay, plus a squared-gradient regularizer. Variational L…

    Submitted 20 July, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

    Comments: Accidental resubmission of new version of arXiv:2011.10443

  14. arXiv:2102.12959  [pdf, other]

    stat.ML cs.LG

    Bayesian OOD detection with aleatoric uncertainty and outlier exposure

    Authors: Xi Wang, Laurence Aitchison

    Abstract: Typical Bayesian approaches to OOD detection use epistemic uncertainty. Surprisingly from the Bayesian perspective, there are a number of methods that successfully use aleatoric uncertainty to detect OOD points (e.g. Hendrycks et al. 2018). In addition, it is difficult to use outlier exposure to improve a Bayesian OOD detection model, as it is not clear whether it is possible or desirable to increa…

    Submitted 28 October, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

  15. arXiv:2102.06571  [pdf, other]

    stat.ML cs.LG

    Bayesian Neural Network Priors Revisited

    Authors: Vincent Fortuin, Adrià Garriga-Alonso, Sebastian W. Ober, Florian Wenzel, Gunnar Rätsch, Richard E. Turner, Mark van der Wilk, Laurence Aitchison

    Abstract: Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, it is unclear whether these priors accurately reflect our true beliefs about the weight distributions or give optimal performance. To find better priors, we study summary statistics of neural network weights in networks trained using stochastic gradient descent (SGD). We find that convolution…

    Submitted 16 March, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: Accepted at ICLR 2022

  16. arXiv:2011.10443  [pdf, other]

    stat.ML cs.LG

    Variational Laplace for Bayesian neural networks

    Authors: Ali Unlu, Laurence Aitchison

    Abstract: We develop variational Laplace for Bayesian neural networks (BNNs) which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without the need for stochastic sampling of the neural-network weights. The Variational Laplace objective is simple to evaluate, as it is (in essence) the log-likelihood, plus weight-decay, plus a squared-gradient regularizer. Variational L…

    Submitted 10 August, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

  17. arXiv:2010.01590  [pdf, other]

    stat.ML cs.LG

    Deep kernel processes

    Authors: Laurence Aitchison, Adam X. Yang, Sebastian W. Ober

    Abstract: We define deep kernel processes in which positive definite Gram matrices are progressively transformed by nonlinear kernel functions and by sampling from (inverse) Wishart distributions. Remarkably, we find that deep Gaussian processes (DGPs), Bayesian neural networks (BNNs), infinite BNNs, and infinite BNNs with bottlenecks can all be written as deep kernel processes. For DGPs the equivalence ari…

    Submitted 30 May, 2021; v1 submitted 4 October, 2020; originally announced October 2020.

    Comments: 21 pages

  18. arXiv:2009.11677  [pdf, other]

    cs.LG cs.CY stat.ML

    Legally grounded fairness objectives

    Authors: Dylan Holden-Sim, Gavin Leech, Laurence Aitchison

    Abstract: Recent work has identified a number of formally incompatible operational measures for the unfairness of a machine learning (ML) system. As these measures all capture intuitively desirable aspects of a fair system, choosing "the one true" measure is not possible, and instead a reasonable approach is to minimize a weighted combination of measures. However, this simply raises the question of how to c…

    Submitted 24 September, 2020; originally announced September 2020.

  19. arXiv:2008.05913  [pdf, other]

    stat.ML cs.LG

    Semi-supervised learning objectives as log-likelihoods in a generative model of data curation

    Authors: Stoil Ganev, Laurence Aitchison

    Abstract: We currently do not have an understanding of semi-supervised learning (SSL) objectives such as pseudo-labelling and entropy minimization as log-likelihoods, which precludes the development of e.g. Bayesian SSL. Here, we note that benchmark image datasets such as CIFAR-10 are carefully curated, and we formulate SSL objectives as a log-likelihood in a generative model of data curation that was initi…

    Submitted 8 October, 2021; v1 submitted 13 August, 2020; originally announced August 2020.

  20. arXiv:2008.05912  [pdf, other]

    stat.ML cs.LG

    A statistical theory of cold posteriors in deep neural networks

    Authors: Laurence Aitchison

    Abstract: To get Bayesian neural networks to perform comparably to standard neural networks it is usually necessary to artificially reduce uncertainty using a "tempered" or "cold" posterior. This is extremely concerning: if the prior is accurate, Bayes inference/decision theory is optimal, and any artificial changes to the posterior should harm performance. While this suggests that the prior may be at fault…

    Submitted 27 April, 2021; v1 submitted 13 August, 2020; originally announced August 2020.

    Comments: Published at ICLR 2021 (https://openreview.net/forum?id=Rd138pWXMvG)

  21. arXiv:2005.08140  [pdf, other]

    stat.ML cs.LG

    Global inducing point variational posteriors for Bayesian neural networks and deep Gaussian processes

    Authors: Sebastian W. Ober, Laurence Aitchison

    Abstract: We consider the optimal approximate posterior over the top-layer weights in a Bayesian neural network for regression, and show that it exhibits strong dependencies on the lower-layer weights. We adapt this result to develop a correlated approximate posterior over the weights at all layers in a Bayesian neural network. We extend this approach to deep Gaussian processes, unifying inference in the tw…

    Submitted 22 June, 2021; v1 submitted 16 May, 2020; originally announced May 2020.

    Comments: Accepted for publication at the 38th International Conference on Machine Learning (ICML 2021, PMLR 139), 33 pages

  22. arXiv:1910.08013  [pdf, other]

    stat.ML cs.LG

    Why bigger is not always better: on finite and infinite neural networks

    Authors: Laurence Aitchison

    Abstract: Recent work has argued that neural networks can be understood theoretically by taking the number of channels to infinity, at which point the outputs become Gaussian process (GP) distributed. However, we note that infinite Bayesian neural networks lack a key facet of the behaviour of real neural networks: the fixed kernel, determined only by network hyperparameters, implies that they cannot do any…

    Submitted 24 June, 2020; v1 submitted 17 October, 2019; originally announced October 2019.

    Journal ref: ICML 2020

  23. arXiv:1808.05587  [pdf, other]

    stat.ML cs.LG

    Deep Convolutional Networks as shallow Gaussian Processes

    Authors: Adrià Garriga-Alonso, Carl Edward Rasmussen, Laurence Aitchison

    Abstract: We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the o…

    Submitted 4 May, 2019; v1 submitted 16 August, 2018; originally announced August 2018.

  24. arXiv:1807.07540  [pdf, other]

    stat.ML cs.LG

    Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods

    Authors: Laurence Aitchison

    Abstract: We formulate the problem of neural network optimization as Bayesian filtering, where the observations are the backpropagated gradients. While neural network optimization has previously been studied using natural gradient methods which are closely related to Bayesian inference, they were unable to recover standard optimizers such as Adam and RMSprop with a root-mean-square gradient normalizer, inst…

    Submitted 16 April, 2020; v1 submitted 19 July, 2018; originally announced July 2018.

  25. arXiv:1806.08593  [pdf, other]

    stat.ML cs.LG

    Tensor Monte Carlo: particle methods for the GPU era

    Authors: Laurence Aitchison

    Abstract: Multi-sample, importance-weighted variational autoencoders (IWAE) give tighter bounds and more accurate uncertainty estimates than variational autoencoders (VAE) trained with a standard single-sample objective. However, IWAEs scale poorly: as the latent dimensionality grows, they require exponentially many samples to retain the benefits of importance weighting. While sequential Monte-Carlo (SMC) c…

    Submitted 17 January, 2019; v1 submitted 22 June, 2018; originally announced June 2018.

  26. arXiv:1805.10958  [pdf, other]

    stat.ML cs.LG q-bio.NC

    Discrete flow posteriors for variational inference in discrete dynamical systems

    Authors: Laurence Aitchison, Vincent Adam, Srinivas C. Turaga

    Abstract: Each training step for a variational autoencoder (VAE) requires us to sample from the approximate posterior, so we usually choose simple (e.g. factorised) approximate posteriors in which sampling is an efficient computation that fully exploits GPU parallelism. However, such simple approximate posteriors are often insufficient, as they eliminate statistical dependencies in the posterior. While it i…

    Submitted 28 May, 2018; originally announced May 2018.