Deep Learning and genetic algorithms for cosmological Bayesian inference speed-up

Isidro Gómez-Vargas1, 2, a and J. Alberto Vázquez1, b


1 Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, 62210, Cuernavaca, Morelos, México.
2 Department of Astronomy, University of Geneva, Versoix, 1290, Switzerland.
a Electronic address: [email protected]
b Electronic address: [email protected]
(Dated: October 17, 2024)
arXiv:2405.03293v2 [astro-ph.IM] 15 Oct 2024
In this paper, we present a novel approach to accelerate the Bayesian inference process, focusing
specifically on the nested sampling algorithms. Bayesian inference plays a crucial role in cosmological
parameter estimation, providing a robust framework for extracting theoretical insights from observa-
tional data. However, its computational demands can be substantial, primarily due to the need for
numerous likelihood function evaluations. Our method utilizes the power of deep learning, employing
feedforward neural networks to approximate the likelihood function dynamically during the Bayesian
inference process. Unlike traditional approaches, our method trains neural networks on-the-fly using
the current set of live points as training data, without the need for pre-training. This flexibility
enables adaptation to various theoretical models and datasets. We perform the hyperparameter
optimization using genetic algorithms to suggest initial neural network architectures for learning each
likelihood function. Once sufficient accuracy is achieved, the neural network replaces the original
likelihood function. The implementation integrates with nested sampling algorithms and has been
thoroughly evaluated using both simple cosmological dark energy models and diverse observational
datasets. Additionally, we explore the potential of genetic algorithms for generating initial live
points within nested sampling inference, opening up new avenues for enhancing the efficiency and
effectiveness of Bayesian inference methods.

PACS numbers:

I. INTRODUCTION

Bayesian inference is a powerful tool in several scientific fields where it is essential to constrain mathematical models using experimental data. It allows parameter estimation and model comparison. In particular, it is the data analysis technique par excellence in observational cosmology, as it provides a robust method to obtain valuable statistical information from a theoretical model given a set of observational data. However, a significant disadvantage of Bayesian inference lies in its high computational cost; it requires a considerable number of likelihood function evaluations to generate sufficient samples from the posterior distribution. For example, even a small Bayesian inference task could involve thousands of samples and require thousands, or even millions, of likelihood evaluations.

Given the crucial importance of parameter estimation in the context of astronomical surveys, within the fields of cosmology and astrophysics, numerous valuable efforts have been made to mitigate the cost of the likelihood function calculation and thus speed up Bayesian inference. Some strategies provide an approximation of Bayesian inference by avoiding the computation of the full likelihood function, as suggested by [1–3]. Other efforts speed up the inference with different statistical techniques [4–8]. Alternatively, other works [9–12] introduced the concept of generating synthetic likelihood distributions. Furthermore, there is an emerging trend of exploiting machine learning tools to accelerate the Bayesian inference process [10, 13–16].

The use of artificial neural networks (ANNs) to approximate the likelihood function can greatly improve the efficiency of Bayesian inference [14, 16–21]. However, careful consideration of the trade-off between accuracy and speed is necessary, along with quality monitoring of the resulting posterior samples. In addition, neural networks present several drawbacks that must be taken into account for them to effectively aid the performance of Bayesian inference:

1. ANNs excel at interpolation, but not at extrapolation. Like all machine learning algorithms, ANNs generate models based on datasets, allowing them to learn data structures and predict unseen data within the bounds of the training region. In the Bayesian inference domain, new samples try to find better likelihood values, which could correspond to points outside the ranges of the random sample used for the ANN training.

2. The performance of ANNs depends on their hyperparameters. This is perhaps one of the most challenging issues facing neural networks. If the hyperparameters are not chosen carefully, the neural network models can be under- or over-fitted.

3. The selection of hyperparameters depends on the data. There is no unique architecture for an ANN. Each dataset requires certain hyperparameter configurations for an efficient training of the neural network.

4. Training an ANN requires computational resources.
It is a well-known fact that training a neural network can be computationally demanding, which seems contradictory when the goal is to reduce the computational time in a Bayesian inference process.

We will come back to these issues in Section IV by presenting how each of them is addressed by the method we propose.

Previous works using neural networks in cosmological parameter estimation save a large amount of computational time by training neural networks before the Bayesian inference process [14, 16, 22]; however, the pre-training time in these cases is expensive, and the trained neural networks are only useful for a specific configuration of backgrounds, models, and datasets. For this reason, our work is inspired by BAMBI [18, 19] and pyBambi [23], where neural networks are trained in real-time to learn the likelihood function, which is subsequently replaced within a nested sampling process. The strength of this approach lies in its ability to train the neural network in real-time and accelerate the Bayesian inference process without being restricted to a particular cosmological or theoretical model and specific datasets. In our method, we explore features beyond those of our predecessors, such as parallelism, a PyTorch implementation [24], and hyperparameter tuning. In addition, we exclusively use live points for training, to reduce the dispersion of the training dataset and to obtain results with higher accuracy. A criterion was also chosen to initiate our method that serves as a regulator of the trade-off between accuracy and speed. We also implemented an on-the-fly performance evaluation to accept or reject the neural network predictions. In addition, we have conducted a preliminary investigation on the use of genetic algorithms to generate the initial sample of live points in the nested sampling process.

The structure of the paper is as follows: Section II offers an overview of Bayesian inference and nested sampling. Section III provides a concise exposition of the machine learning fundamentals employed in this study. The concept and development of our machine learning strategies are detailed in Section IV. Section V and Section VI present our results, applied respectively to testing toy models and estimating cosmological parameters. In Section VII, we discuss our research findings and present our final reflections. Furthermore, the Appendix features preliminary results about the incorporation of genetic algorithms as initiators of the live points in a nested sampling execution.

II. STATISTICAL BACKGROUND

In this section, we provide an overview of Bayesian inference and neural networks. In particular, we focus on the nested sampling algorithm and feedforward neural networks.

A. Bayesian inference

Consider Bayes' theorem:

P(\theta|D) = \frac{P(D|\theta)\, P(\theta)}{P(D)},    (1)

where P(θ) denotes the prior distribution over the parameters θ, encapsulating any prior knowledge about them before observing the data. P(D|θ) represents the likelihood function, expressing the conditional probability of observing the data given the model. Finally, the Bayesian evidence P(D) serves as a normalization constant through likelihood marginalization:

P(D) = \int P(D|\theta)\, P(\theta)\, d^{N}\theta,    (2)

where N is the number of dimensions of the parameter space for θ.

It can be assumed that the measurement error ϵ is independent of θ and has a Probability Density Function (PDF) P_ϵ. In this case, the predicted value and the measurement error share the same distribution, therefore the likelihood function can be expressed as:

P(D|\theta) = P_{\epsilon}\big(D - f(x;\theta)\big),    (3)

and if the error ϵ ∼ N(0, C) has a normal distribution centered at zero with covariance matrix C, then we have the following:

P(D|\theta) = \frac{1}{(2\pi)^{N/2}\,|C|^{1/2}}\, e^{-\frac{1}{2}\,(D - f(x;\theta))^{T} C^{-1} (D - f(x;\theta))}.    (4)
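As a quick illustration of Eq. (4), a minimal NumPy sketch of the Gaussian log-likelihood is given below; the data vector D, the model function and the covariance C are hypothetical placeholders rather than the objects used in our released code.

import numpy as np

def gaussian_loglike(theta, D, model, C):
    """Log of Eq. (4): multivariate Gaussian likelihood for the residuals D - f(x; theta)."""
    residual = D - model(theta)                      # D - f(x; theta)
    chi2 = residual @ np.linalg.solve(C, residual)   # (D - f)^T C^{-1} (D - f)
    norm = 0.5 * (len(D) * np.log(2.0 * np.pi) + np.linalg.slogdet(C)[1])
    return -0.5 * chi2 - norm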
B. Nested sampling

Nested sampling (NS) belongs to a category of inference methods that estimate the Bayesian evidence, along with its uncertainty, by sampling the posterior probability density function. It was proposed by John Skilling in 2004 [25, 26]. The evidence, or marginalization of the likelihood function, is a key quantity in model comparison through the Bayes factor on the Jeffreys scale. It is a more rigorous technique [26, 27] than other widely used methods such as the information criteria approximations [28, 29]. NS works by computing the Bayesian evidence while assuming that the parameter space (prior volume or prior mass) shrinks by a certain factor. There are successful nested sampling implementations [30–32] and several applications in cosmology [33–37], astrophysics [38–40], gravitational wave analysis [41–43], biology [44, 45] and other scientific fields [46–48].

To understand the method proposed in this work, we briefly describe some considerations about the NS algorithm. For more details, we recommend Refs. [26, 31, 32]. First of all, the Bayesian evidence can be written as follows:
Z = \int L(\theta)\, \pi(\theta)\, d\theta,    (5)

where θ represents the free parameters, π(θ) is the prior density, and L is the likelihood function.

The basic idea of NS is to simplify the integration of the Bayesian evidence by mapping the parameter space into a unit hypercube. The fraction of the prior contained within an iso-likelihood contour L_c in the unit hypercube is called the prior volume (or prior mass):

X(L) = \int_{L(\theta) > L_c} \pi(\theta)\, d\theta.    (6)

The Bayesian evidence can then be reduced to a one-dimensional integral of the likelihood as a function of the prior volume X:

Z = \int_{0}^{1} L(X)\, dX.    (7)

NS starts with a specific number n_live of random points, termed live points, distributed within the prior volume defined by the constrained prior. These samples are ordered based on their likelihood values. During each iteration the worst point L_worst, with the lowest likelihood value, is removed. A new sample is then generated within a contour bounded by L_worst and with a likelihood L(θ) > L_worst. Equation (7) can be simplified as a Riemann sum:

Z \approx \sum_{i=1}^{N} L_i\, \omega_i,    (8)

where ω_i is the difference between the prior volumes of two consecutive points: ω_i = X_{i−1} − X_i. Throughout the process, NS retains the population of n_live live points and ultimately consolidates the final set of live points within a region of high probability. Depending on the sampling approach employed for the constrained prior, various nested sampling algorithms exist. For instance, MultiNest [30] utilizes rejection sampling within ellipsoids, whereas Polychord [31] generates points using slice sampling.

Several stopping criteria exist for terminating a nested sampling run; in this study, we adopt the remaining-evidence criterion, which is roughly outlined as follows:

\Delta Z_i \approx L_{\max} X_i,    (9)

hence defining the logarithmic ratio between the current estimated evidence and the remaining evidence as:

\Delta \ln Z_i \equiv \ln(Z_i + \Delta Z_i) - \ln Z_i,    (10)

referred to as dlogz hereafter in this paper. Stopping at a value dlogz implies sampling until only a fraction of the evidence remains unaccounted for.
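As an illustration of the workflow just described, a minimal run with the dynesty implementation [32] is sketched below; the two-dimensional Gaussian log-likelihood, the prior range, and the settings are arbitrary illustrative choices.

import numpy as np
import dynesty

def loglike(theta):
    # Simple 2D Gaussian log-likelihood centered at the origin.
    return -0.5 * np.sum(theta**2)

def prior_transform(u):
    # Map the unit hypercube [0, 1]^2 to a uniform prior on [-5, 5]^2.
    return 10.0 * u - 5.0

sampler = dynesty.NestedSampler(loglike, prior_transform, ndim=2, nlive=500)
sampler.run_nested(dlogz=0.01)      # stop when the remaining evidence fraction is small
results = sampler.results           # samples, weights and log-evidence
print(results.logz[-1], results.logzerr[-1])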
III. MACHINE LEARNING BACKGROUND

Machine learning is the field of Artificial Intelligence concerned with the mathematical modeling of datasets. Its methods identify inherent properties of datasets by minimizing a target function until it reaches a satisfactory value. Over the past few years, Artificial Neural Networks (ANNs) have emerged as the most successful type of machine learning model, giving rise to the field of deep learning. On the other hand, genetic algorithms are a special class of evolutionary algorithms, called metaheuristics, that facilitate function optimization without derivatives. This section offers a succinct overview of artificial neural networks and genetic algorithms.

A. Artificial neural networks

An artificial neural network (ANN) is a computational model inspired by biological synapses, aiming to replicate their behavior. It consists of interconnected layers of nodes, or neurons, serving as basic processing units. A fundamental type of ANN is the feedforward neural network, comprising input, hidden, and output layers. In such networks, the connections between neurons, known as weights, are the parameters of the model. Deep learning, a subset of machine learning, focuses exclusively on neural networks.

The intrinsic parameters of a neural network, known as hyperparameters, are set before training and include settings such as the number of layers and neurons, the number of epochs, and the activation functions. Parameters of the gradient descent and backpropagation algorithms [49], like batch size and learning rate, may also be hyperparameters. While some hyperparameters are predetermined, others are adjusted through tuning strategies.

ANNs are valued for their capacity to model large and complex datasets. The Universal Approximation Theorem asserts that an ANN with a single hidden layer and non-linear activation functions can model any nonlinear function [50], enhancing its utility for datasets with complex relationships. Even though an exhaustive review of ANNs is beyond the scope of this paper, great references exist in the literature [51, 52]. For a basic introduction to their algorithms in the cosmological context, we recommend reading [53].
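For reference, a minimal PyTorch [24] feedforward network of the type described above is sketched below; the widths, depth and activation are illustrative defaults, not the architectures selected in our runs.

import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Fully connected network: input -> hidden layers -> scalar output (e.g. a log-likelihood value)."""
    def __init__(self, n_inputs, n_hidden=100, n_layers=3):
        super().__init__()
        layers, width = [], n_inputs
        for _ in range(n_layers):
            layers += [nn.Linear(width, n_hidden), nn.ReLU()]
            width = n_hidden
        layers.append(nn.Linear(width, 1))   # single output neuron
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = FeedForward(n_inputs=5)
x = torch.rand(16, 5)    # batch of 16 points in a 5-dimensional parameter space
y = model(x)             # predicted values, shape (16, 1)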
B. Genetic algorithms

Genetic algorithms are optimization techniques inspired by the principles of population genetics, treating each potential solution to an optimization problem as an individual. Initially, a genetic algorithm generates a population comprising multiple individuals within the search space. Across iterations, or generations, the population evolves through operations like offspring generation, crossover, and mutation, progressively approaching the optimal solution of a target function.
Genetic algorithms excel in addressing large-scale nonlinear and nonconvex optimization problems in challenging search scenarios [54, 55].

To apply genetic algorithms to a specific problem, one must select the objective function to optimize, delineate the search space, and specify the genetic parameters such as crossover, mutation, and elitism. Probability values for the crossover and mutation operators are assigned, and a selection operator determines which individuals advance to the subsequent generation. Elitism, represented by a positive integer value, dictates the number of individuals guaranteed passage to the next generation. Overall, genetic algorithms initialize a population and iteratively modify its individuals through the operators and the objective function, progressively approaching the optimal solution of the target function.

While this paper does not delve deeply into the mathematical principles underlying genetic algorithms, interested readers are directed to the following references [56, 57], particularly for parameter estimation in cosmology [58].
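To make the loop described above concrete, a self-contained toy genetic algorithm (tournament selection, arithmetic crossover, Gaussian mutation and elitism) minimizing a simple function is sketched below; this is generic illustrative code, not the nnogada implementation.

import numpy as np

def genetic_minimize(f, bounds, n_ind=20, n_gen=30, p_cross=0.7, p_mut=0.3, elitism=1, rng=None):
    """Minimize f over a box defined by bounds = [(lo, hi), ...] with a simple genetic algorithm."""
    rng = rng or np.random.default_rng(0)
    lo, hi = np.array(bounds).T
    pop = rng.uniform(lo, hi, size=(n_ind, len(bounds)))      # initial population
    for _ in range(n_gen):
        fitness = np.array([f(ind) for ind in pop])
        order = np.argsort(fitness)                           # best (lowest) individuals first
        elite = pop[order[:elitism]]                          # elitism: guaranteed passage
        def pick():                                           # tournament selection of a parent
            i, j = rng.integers(0, n_ind, 2)
            return pop[i] if fitness[i] < fitness[j] else pop[j]
        children = []
        while len(children) < n_ind - elitism:
            a, b = pick(), pick()
            alpha = rng.random() if rng.random() < p_cross else 1.0
            child = alpha * a + (1.0 - alpha) * b             # arithmetic crossover
            if rng.random() < p_mut:                          # mutation: random perturbation
                child += rng.normal(0.0, 0.1 * (hi - lo))
            children.append(np.clip(child, lo, hi))
        pop = np.vstack([elite, children])
    fitness = np.array([f(ind) for ind in pop])
    return pop[np.argmin(fitness)]

best = genetic_minimize(lambda x: (x[0] - 3) ** 2 + (x[1] + 1) ** 2, [(-5, 5), (-5, 5)])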

IV. MACHINE LEARNING STRATEGIES

In this section, we outline our proposed method, which integrates machine learning techniques to implement neural networks and genetic algorithms within a nested sampling framework. Below we describe some deep learning techniques utilized in our training, elucidating their application:

• Data scaling. Since all samples within the parameter space are already scaled between 0 and 1 during nested sampling, no additional scaling is required for training the neural networks.

• Early stopping. This is a regularization technique that monitors the performance of a model on a validation set during training and stops the training process when the performance on the validation set starts to degrade, indicating overfitting. It helps to prevent overfitting and to choose the best weight configuration along the epochs of the training. By stopping the training process early, the generalization performance of the model can be improved, particularly when the training data is limited or noisy. We implement early stopping with a patience of 100 epochs to guarantee a minimum number of training epochs, given the smaller size of the dataset. However, our primary focus is on preserving the best-performing weights at the end of the training process.

• Dynamic learning rate. There are popular strategies for dynamic learning rates. However, our dynamic learning rate is only adjusted during the nested sampling run and not during the training of a specific neural network. For each new training of the neural network, the learning rate decreases by half. However, during each individual ANN training session, the learning rate remains constant within the adaptive gradient descent algorithm called Adam [59].

• Hyperparameter tuning. We have implemented the option of using genetic algorithms to find the architecture of the first trained neural network. For this purpose, we use the library nnogada [60]. For simplicity, in this work we use genetic algorithms over 3 generations with a population size of 5 to explore combinations of batch size (4 or 8), number of layers (2 or 3), learning rate (0.0005 or 0.001), and number of neurons per layer (50 or 100). In a nested sampling execution, where we can train the neural networks multiple times, we use these small configurations. This approach yields better results compared to not tuning hyperparameters and is more effective than using a hyperparameter grid [60].

We implemented our method inside the code SimpleMC [61, 62]¹, which uses the library dynesty [63] for its nested sampling algorithms. In all our neural network training, we use the mean squared error (MSE) as the loss function. If early stopping, with a patience of 100 epochs, does not stop the training, we select the configuration of weights that achieved the lowest MSE value; a sketch of this logic is shown below.

¹ The modified version of SimpleMC that includes our neuralike method is available at https://github.com/igomezv/simplemc_tests
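A minimal sketch of the early-stopping criterion and best-weight selection described above, written as a generic PyTorch training loop; model, train_loader and valid_loader are hypothetical stand-ins for the objects used in our code, and the default values are illustrative.

import copy
import torch

def train_with_early_stopping(model, train_loader, valid_loader,
                              lr=1e-3, epochs=500, patience=100):
    """Keep the weights with the lowest validation MSE; stop after `patience` epochs without improvement."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # Adam with a fixed rate within one training
    loss_fn = torch.nn.MSELoss()
    best_loss, best_state, wait = float("inf"), None, 0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x), y).item() for x, y in valid_loader)
        if val_loss < best_loss:
            best_loss, best_state, wait = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            wait += 1
            if wait >= patience:
                break                        # early stopping
    model.load_state_dict(best_state)        # restore the best-performing weights
    return model, best_loss

# Between successive trainings inside the nested sampling run the learning rate is halved,
# lr_next = 0.5 * lr_previous, while it stays constant within each individual training.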
A. neuralike method

Neural networks are widely acclaimed for their formidable capabilities in handling extensive datasets. However, several studies have shown their effectiveness in modeling small datasets as well, even demonstrating that neural models can accommodate a total number of weights exceeding the number of sample data points [64]. In addition, recent research has focused on novel approaches using neural networks with smaller datasets [65–67]. While it is true that models with a large number of parameters can be prone to overfitting, this risk can be mitigated through the use of regularization techniques such as dropout and early stopping. In our approach, these techniques, combined with genetic algorithms for optimizing the network's architecture and hyperparameters, ensure that our models generalize well even when the number of parameters exceeds the number of data points.

In nested sampling, as discussed in the previous section, there is a set of live points that maintains a constant number of elements. At a certain point in its execution, a new sample is extracted within a prior iso-likelihood or mass surface. Our goal is for the neural network to predict the likelihood of points within this prior volume. To do this, we train the neural network with only the current set of live points. These points, which typically number in the hundreds or thousands, are sufficient to effectively train a neural network and offer several advantages:

• The relatively small dataset size implies that the neural network training process is not computationally intensive.

• By excluding points outside the current prior volume, we can potentially avoid inaccurate predictions in regions where points would be rejected based on the original likelihood. The points that the neural network learns efficiently are those within the prior volume, because they have a higher probability of acceptance according to the original likelihood.

• The quantity of elements within the training set remains constant. Whether the neural network starts its training at the beginning of sampling or at a later stage, the element count does not vary. As a result, the majority of neural network hyperparameters could stay consistent across different datasets.

The likelihood function in cosmological parameter estimation can be quite complex, often involving various types of observational data and intricate numerical operations, such as integrals, derivatives, or approximation methods for solving differential equations. To address this complexity, the idea is to replace the analytical likelihood function with a trained neural network. This substitution reduces the problem to a simple matrix multiplication, where the optimal weights, obtained during ANN training, are stored in a binary file. Consequently, the evaluation of the likelihood becomes significantly faster. This acceleration is particularly advantageous in a Bayesian inference process, where the likelihood function may need to be evaluated thousands or even millions of times, making the reduction in computational time highly beneficial.
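Schematically, the substitution works as sketched below: the trained network replaces the analytical log-likelihood callable, with a fallback to the original function when a prediction falls outside a tolerance band around the log-likelihoods of the current live points (the same criterion that appears later in Algorithm 1). This is an illustrative sketch, not the released implementation.

import numpy as np
import torch

def make_neural_loglike(net, analytic_loglike, saved_logl, logl_tolerance=0.05):
    """Return a log-likelihood callable that uses the trained network `net`
    and falls back to the analytic function when a prediction looks unreliable."""
    lo = np.min(saved_logl) - logl_tolerance
    hi = np.max(saved_logl) + logl_tolerance
    def loglike(theta):
        with torch.no_grad():
            pred = net(torch.as_tensor(theta, dtype=torch.float32)).item()
        if lo < pred < hi:                   # prediction within the range spanned by the live points
            return pred
        return analytic_loglike(theta)       # otherwise keep the original likelihood
    return loglike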
Algorithm 1 provides an overview of our proposed methodology within a nested sampling execution. Concerning the neural network implementation, our primary focus is on the segment within the for loop. Once a predetermined number of samples has been reached, or when the flag dlogz_start is activated, the ANN leverages the current live points for its training. The benefit of utilizing only the set of live points is twofold: firstly, it facilitates swift training, and secondly, it ensures that the ANN learns likelihood values strictly within the prior volume. This region is precisely where new samples should be located.

It is important to note that the nested sampling process, including the selection of priors, typically uniform or Gaussian distributions, remains consistent with standard practices. Once the criteria for initiating ANN training are met, the live points are used to train the ANN. If the ANN's performance metrics meet the required threshold, the analytical likelihood function is replaced by the ANN to save computational time. While this substitution does not alter the fundamental nested sampling process, it can significantly enhance efficiency by reducing computational overhead.

B. Using genetic algorithms

We propose genetic algorithms, as implemented in our nnogada library [60], as an optional method to find the hyperparameters of the neural network within the neuralike workflow, as can be noticed in Algorithm 1. In large parameter estimation processes it is useful, despite the time required, to find the best neural network architecture.

On the other hand, we explored a first insight into the generation of the initial live points of a nested sampling process with genetic algorithms; it is analyzed in Appendix A. Although we have incorporated the use of genetic algorithms in our code, the primary focus of this paper is on our neuralike method (Section IV A). As such, further analysis of genetic algorithms in this context will be the subject of future research.

V. TOY MODELS

As a first step in testing our method, we use some toy models as log-likelihood functions. These toy models only generate samples within the Bayesian inference, without parameter estimation. However, they are useful to check the ability of the neural networks to learn, given a set of live points, the shape of these functions at runtime, as well as their respective values of the Bayesian evidence. We use the following toy models, with the mentioned training hyperparameters (see the sketch after this list):

• A Gaussian, f(x, y) = −(1/2)(x^2 + y^2 − xy). Learning rate 5 × 10^−3, 100 epochs, batch size of 1.

• Eggbox function, f(x, y) = (2 + cos(x/2.0) cos(y/2.0))^5.0. Learning rate 1 × 10^−4, 100 epochs, batch size of 1.

• Himmelblau's function, f(x, y) = (x^2 + y − 11)^2 + (x + y^2 − 7)^2. Learning rate 1 × 10^−4, 100 epochs, batch size of 1.
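Written as code, these toy log-likelihoods read as follows. This is a sketch: the functions are transcribed exactly as given above, and in practice a sign convention may be applied (e.g. negating Himmelblau's function) so that its minima become likelihood peaks.

import numpy as np

def loglike_gaussian(theta):
    x, y = theta
    return -0.5 * (x**2 + y**2 - x * y)

def loglike_eggbox(theta):
    x, y = theta
    return (2.0 + np.cos(x / 2.0) * np.cos(y / 2.0)) ** 5.0

def loglike_himmelblau(theta):
    x, y = theta
    return (x**2 + y - 11.0) ** 2 + (x + y**2 - 7.0) ** 2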
We have used these toy models as log-likelihood functions: Gaussian, egg-box, and Himmelblau. In Table I we show the results of the Bayesian evidence calculation with and without our method for the three toy models, while in Figure 1 we show the samples of the three functions, which at first glance are very similar. Based on these results, we can notice that for all these models, sampling with neural networks is slower than nested sampling alone; this is because the analytical functions are being evaluated directly, without sampling from an unknown posterior distribution. Nevertheless, these examples are very useful to verify the accuracy in calculating the Bayesian evidence and in sampling from the distribution. We can observe that both the log-Bayesian evidence and the plots of the nested sampling process without and with neural networks are consistent; however, as Table I shows, for more complex functions we need a lower value of dlogz_start, which means that we need to start learning the neural network at a later stage of nested sampling. Therefore, a lower dlogz_start parameter is needed to be more accurate but slower, and it is precisely this parameter that regulates the speed-accuracy trade-off.
using_neuralike = False
if livegenetic == True (optional) then
    Define Pmut and Pcross
    Generate a population P with Nind individuals
    Evolve the population through Ngen generations
else
    Generate Nlive live points
for i in range(iterations) do
    if (dlogz < dlogz_start) OR (nsamples >= nsamples_start) then
        if i % N == 0 AND using_neuralike == False then
            Use the nlive points as training dataset
            Optional: use genetic algorithms with nnogada to choose the best architecture
            Use the best architecture to model the likelihood
            if loss function < valid_loss then
                using_neuralike = True
                L = ANNmodel
            else
                continue with NS
        if min(saved_logl) - logl_tolerance < neuralike < max(saved_logl) + logl_tolerance then
            continue
        else
            L = logL
            using_neuralike = False
end

Algorithm 1: Nested sampling with neuralike. dlogz_start and nsamples_start are the two ways to start neuralike, with a dlogz value (recommended) or given a specific number of generated samples. The logl_tolerance parameter represents the neural network prediction tolerance required to be considered valid. saved_logl denotes the log-likelihoods of the current live points, and valid_loss determines the criterion for accepting or rejecting a neural network training. Any loss function value higher than valid_loss will be rejected. The variable logL represents the analytical log-likelihood function, while L can be either logL or ANNmodel, depending on the success of the neural model.

VI. COSMOLOGICAL PARAMETER ESTIMATION

Assuming the geometric unit system where ℏ = c = 8πG = 1, the Friedmann equation that describes the late-time dynamical evolution of a flat-ΛCDM model can be written as:

H(z)^2 = H_0^2 \left[ \Omega_{m,0}(1+z)^3 + (1 - \Omega_{m,0}) \right],    (11)

where H is the Hubble parameter and Ω_m is the matter density parameter; the subscript 0 attached to any quantity denotes its present-day (z = 0) value. In this case, the EoS for the dark energy is w(z) = −1.

A step further from the standard model is to consider the dark energy to be dynamical, where the evolution of its EoS is usually parameterized. A commonly used form of w(z) takes into account the next contribution of a Taylor expansion in terms of the scale factor, w(a) = w_0 + (1 − a) w_a, or in terms of redshift, w(z) = w_0 + w_a z/(1+z) (the CPL model [68, 69]). The parameters w_0 and w_a are real numbers such that at the present epoch w|_{z=0} = w_0 and dw/dz|_{z=0} = −w_a; we recover ΛCDM when w_0 = −1 and w_a = 0. Hence the Friedmann equation for the CPL parameterization turns out to be:

H(z)^2 = H_0^2 \left[ \Omega_{m,0}(1+z)^3 + (1 - \Omega_{m,0})(1+z)^{3(1+w_0+w_a)}\, e^{-\frac{3 w_a z}{1+z}} \right].    (12)
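Eq. (12) translates directly into a small function; the sketch below is a plain transcription of the formula, with an illustrative parameter ordering.

import numpy as np

def hubble_cpl(z, H0, Omega_m, w0, wa):
    """H(z) for the CPL parameterization, Eq. (12); w0 = -1, wa = 0 recovers flat LCDM, Eq. (11)."""
    de_term = (1.0 - Omega_m) * (1.0 + z) ** (3.0 * (1.0 + w0 + wa)) \
              * np.exp(-3.0 * wa * z / (1.0 + z))
    return H0 * np.sqrt(Omega_m * (1.0 + z) ** 3 + de_term)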

In this work, we use cosmological datasets from Type Ia supernovae (SNeIa), cosmic chronometers, growth rate measurements, baryon acoustic oscillations (BAO), and a point with Planck information. In the following, we briefly describe them:

• Type Ia supernovae. We use the Pantheon SNeIa compilation, a dataset of 1048 Type Ia supernovae with a covariance matrix of systematic errors C_sys ∈ R^{1048×1048} [70].
Model        log Z              log Z neuralike    dlogz_start   Valid loss   Samples   ANN samples
Gaussian     −2.13 ± 0.05       −2.16 ± 0.05       50            0.05         6774      6773
Eggbox       −235.83 ± 0.11     −235.82 ± 0.11     10            0.05         11794     3688
Himmelblau   −5.59 ± 0.09       −5.64 ± 0.09       5             0.05         10253     4528

TABLE I: Comparing Bayesian evidence for toy models with nested sampling alone and using neuralike. The column dlogz_start indicates the dlogz value marking the start of neural network training; higher values suggest earlier integration of neural networks into Bayesian sampling. Valid loss represents the threshold value of the loss function required for accepting a neural network as valid. The last two columns display the total number of samples generated through the nested sampling process and the subset produced by the trained neural networks.

FIG. 1: Comparison of neural likelihoods versus original likelihoods using toy models. Using 1000 live points.

• Cosmic chronometers. Cosmic chronometers, also known as Hubble distance (HD) measurements, are galaxies that evolve slowly and allow direct measurements of the Hubble parameter H(z). We use a compilation with 31 data points collected over several years within redshifts between 0.09 and 1.965 [71–78].

• BAO. We employ data from baryon acoustic oscillation (BAO) measurements with redshifts z < 2.36. They are from the SDSS Main Galaxy Sample (MGS) [79], the Six-Degree Field Galaxy Survey (6dFGS) [80], the SDSS DR12 Galaxy Consensus [81], BOSS DR14 quasars (eBOSS) [82], the Ly-α DR14 cross-correlation [83] and the Ly-α DR14 auto-correlation [84].

• Growth rate measurements. We use an extended version of the Gold-2017 compilation available in [85], which includes 22 independent measurements of fσ8(z) with their statistical errors obtained from redshift space distortion measurements across various surveys.

• Planck-15 information. We also consider a compressed version of the Planck-15 information, where the Cosmic Microwave Background (CMB) is treated as a BAO experiment located at redshift z = 1090, measuring the angular scale of the sound horizon. For more details, see Reference [62].

We executed three cases of parameter estimation to verify the performance of our method.
We start with one thousand live points and a model with five free parameters; then, we increase the number of live points and free parameters to test our method with higher dimensionality and a higher computational power demand (a larger number of live points). The results are compared with a nested sampling run with the same datasets and the same configuration (live points, stopping criterion, etc.) but without the ANN; this comparison aims to test the accuracy and speed-up achieved by our neuralike method. For this comparison, we report the parameter estimation and Bayesian evidence obtained with and without our method and, in addition, we calculate the Wasserstein distances [86] between the posterior samples of nested sampling without and with neuralike for each free parameter, considering their respective sampling weights.
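This comparison can be reproduced with the weighted one-dimensional Wasserstein distance available in scipy; the arrays below are placeholders for the 1D posterior samples and nested sampling weights of a single parameter from the two runs.

import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
samples_ns, weights_ns = rng.normal(0.3, 0.02, 5000), np.ones(5000)   # placeholder run without neuralike
samples_nl, weights_nl = rng.normal(0.3, 0.02, 5000), np.ones(5000)   # placeholder run with neuralike

distance = wasserstein_distance(samples_ns, samples_nl,
                                u_weights=weights_ns, v_weights=weights_nl)
print(distance)   # values close to zero indicate nearly identical 1D posteriors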
In the results, a baseline neural network architecture was employed, configured with the following hyperparameters: 3 hidden layers, a batch size of 32, a learning rate of 0.001 (utilizing the Adam gradient descent algorithm for optimization), 500 epochs, and an early stopping patience of 200 epochs. In scenarios where multiple neural networks were required, the learning rate was reduced following the previously mentioned approach. As for evaluating the accuracy of the neural networks, we adopted a valid_loss threshold of 0.05 for their training, and a logl_tolerance of 0.05 for their predictions.

A. Case 1

First, we perform the Bayesian inference for the CPL model using SNeIa from the Pantheon compilation, together with cosmic chronometers and BAO data. In this case, we consider only five free parameters: Ωm, Ωb h^2, h, wa, and w0. We use 1000 live points. Figure 3 shows our results, and we can notice that when dlogz_start = 10 the saved time is around 19%, while when dlogz_start = 5 it is only around 6%. According to Table II, both cases are in agreement with the log Z value for nested sampling alone. If we check Table III, we can notice that, in general, the samples of the dlogz_start = 5 run are more similar to the nested sampling posterior distributions; this can also be appreciated in the posterior plots shown in Figure 2. Although the case of dlogz_start = 5 saves less time than the case of dlogz_start = 10, it gains in accuracy.

B. Case 2

Secondly, we consider the same model, free parameters, and datasets as in Case 1. The purpose of this second case is to analyze the behavior of our method with a larger number of live points. It has three new considerations: a) the training set for the neural network should be better because it has a larger size, b) the number of operations in parallel for nested sampling is also larger, and c) we test whether the hypothesis holds that a larger number of live points can yield a better accuracy for the neural network earlier within the nested sampling process (i.e. at a higher value of dlogz_start). Therefore, we increase the number of live points to 4000 and use dlogz_start = 20; the outputs are included in Figure 3, showing an excellent concordance of the Bayesian evidence values with our method, and a speed-up of around 28.4%. Table II contains the results of the Bayesian evidence, and it can be noticed that the uncertainty in this case is in better agreement with nested sampling than in the two scenarios of Case 1. In addition, we can analyze Table III and conclude that, effectively, its performance has a similar quality to Case 1 with dlogz_start = 5; however, because it uses a higher dlogz_start value, the percentage of saved time is noticeable.

C. Case 3

Lastly, we included more data: fσ8 measurements and a point with Planck-15 information. To have more free parameters, eight in total, we consider contributions of the neutrino masses Σmν, the growth rate σ8, and the curvature Ωk. In this case, we also used 4000 live points. With these new considerations, we aim to test our method in higher dimensions and to involve a more complex likelihood function that demands more computational power with each evaluation. We made several tests, but we include the one corresponding to dlogz_start = 5, in which we obtain excellent results, as can be noticed in Table II. Due to the complexity of the likelihood, the full nested sampling process had to train three different neural networks, which allowed the use of erroneous predictions during sampling to be avoided.

We needed a lower value for the dlogz_start parameter due to the complexity of the model (given by the new free parameters); however, the saved time of around 19% with respect to nested sampling alone is remarkable, and the Wasserstein distances shown in Table III indicate that the posterior distributions of nested sampling with and without our method are similar, as can also be noticed in the posterior plots of Figure 4.

          dlogz_start   log Z             log Z neuralike    Samples   ANN samples   % saved time
Case 1    10            −536.39 ± 0.13    −536.62 ± 0.13     16001     9570          18.8
Case 1    5             −536.39 ± 0.13    −536.8 ± 0.13      16297     7873          5.7
Case 2    20            −536.38 ± 0.07    −536.39 ± 0.07     65216     46147         28.4
Case 3    5             −550.59 ± 0.09    −550.82 ± 0.09     95148     31923         19.6

TABLE II: Exploring Bayesian inference with nested sampling and neuralike. The definitions of the columns are consistent with those in Table I. Additionally, the % saved time quantifies the speed-up achieved using our method.

                              Ωm        Ωb h^2    h         w0        wa        Ωk        σ8        Σmν
Case 1a (dlogz_start = 10)    0.00480   0.00004   0.00428   0.01241   0.11283   −         −         −
Case 1b (dlogz_start = 5)     0.00091   0.00004   0.00518   0.00997   0.05761   −         −         −
Case 2 (dlogz_start = 20)     0.00095   0.00002   0.00111   0.00810   0.06279   −         −         −
Case 3 (dlogz_start = 5)      0.00055   0.00001   0.00054   0.00335   0.01396   0.00018   0.00971   0.01753

TABLE III: Wasserstein distances [86] between nested sampling posterior samples without and with neuralike, for each free parameter. The closer the value of this distance is to zero, the more similar are the distributions compared. This distance is implemented in scipy and takes into account the 1D posterior samples and their respective weights. Overall, parameters Ωm, Ωb h^2, and h exhibit relatively small distances across all cases. However, in Case 1a, higher values of the w0 and wa distances are observed, due to a higher dlogz_start value and fewer data points used. On the other hand, Case 3 demonstrates smaller (better) distances, attributed to the utilization of more data and a lower dlogz_start value.

VII. CONCLUSIONS

In this paper, we have introduced a novel method that incorporates a neural network trained on-the-fly to learn the likelihood function within a nested sampling process. The main objective is to avoid the time-consuming analytical likelihood function, thus increasing computational efficiency. We present the dlogz_start parameter as a tool to handle the trade-off between accuracy and computational speed. In addition, we incorporate several deep learning techniques to minimize the risk of inaccurate neural network predictions.

To verify the effectiveness of our method, we employed several toy models, demonstrating its ability to replicate a probability distribution with remarkable accuracy in the nested sampling framework. Furthermore, in the cosmological parameter estimation, by performing a comparative analysis using the CPL cosmological model and various datasets, we highlighted the potential of our method to significantly improve the speed of nested sampling processes without compromising the statistical reliability of the results. We found that, as the number of dimensions increased, our method produced a larger time reduction with a lower dlogz_start value.

Despite commencing neural network training relatively late in the nested sampling process, the overall time reduction was notable, as evidenced in Table II, showcasing reductions ranging from 6% to 19%. Potential errors in the neural network predictions were not found to be substantial because the training dataset comprised the live points. As such, the likelihood predictions are not expected to deviate significantly from the actual prior volume, which enhances the credibility and robustness of our method and instills confidence in its application in nested sampling. In addition, our constant monitoring of the ANN prediction accuracy against the actual likelihood value allows us to be more confident in the results obtained, because if the criteria were not met, the analytical function would be used again and, after a certain number of samples, another neural network would be retrained.

We also explored the potential utility of genetic algorithms in finding optimal neural network hyperparameters and in generating initial live points for nested sampling. Concerning the former, in scenarios where models are complex or high-dimensional, searching for an optimal architecture can be beneficial; however, our neuralike method allows this hyperparameter calibration to be optional, so that hyperparameters can also be set by hand. Regarding the latter, we provide some insight into the potential advantages of using genetic algorithms to generate live points in Appendix A; however, future studies will address further research on this topic.

In this work, we only used observations from the late universe, as our neuralike method is integrated with the SimpleMC code, which employs mainly background cosmology. However, our method is easily applicable to other types of observations, such as CMB data, an aspect we are currently working on.

We emphasize the importance of high accuracy in neural network predictions in observational cosmology, since accurate parameter estimation is crucial for a robust physical interpretation of the results. In light of the machine learning strategies proposed in this paper, we can have greater confidence in the use of neural networks to accelerate nested sampling processes without compromising the statistical quality of the results.

Acknowledgments

IGV thanks the CONACYT postdoctoral grant, the ICF-UNAM support, and Will Handley for his invaluable advice about nested sampling. JAV acknowledges the support provided by FOSEC SEP-CONACYT Investigación Básica A1-S-21925, FORDECYT-PRONACES-CONACYT 304001, and UNAM-DGAPA-PAPIIT IN117723. This work was performed thanks to the help of the computational unit of the ICF-UNAM and the clusters Chalcatzingo and Teopanzolco.

Data Availability

The implemented algorithm presented in this work is available at https://github.com/igomezv/neuralike and the original SimpleMC code at https://github.com/javazquez/SimpleMC, which contains the datasets used in this paper.

[1] Joël Akeret, Alexandre Refregier, Adam Amara, Sebas- [15] Isidro Gómez-Vargas, Ricardo Medel Esquivel, Ricardo
tian Seehars, and Caspar Hasner. Approximate bayesian García-Salcedo, and J Alberto Vázquez. Neural network
computation for forward modeling in cosmology. Jour- within a bayesian inference framework. J. Phys. Conf.
nal of Cosmology and Astroparticle Physics, 2015(08):043, Ser., 1723(1):012022, 2021.
2015. [16] Alessio Spurio Mancini, Davide Piras, Justin Alsing, Ben-
[2] Elise Jennings and Maeve Madigan. astroabc: an ap- jamin Joachimi, and Michael P Hobson. Cosmopower:
proximate bayesian computation sequential monte carlo emulating cosmological power spectra for accelerated
sampler for cosmological parameter estimation. Astron- bayesian inference from next-generation surveys. Monthly
omy and computing, 19:16–22, 2017. Notices of the Royal Astronomical Society, 511(2):1771–
[3] E.E.O. Ishida, S.D.P. Vitenti, M. Penna-Lima, J. Cisewski, 1788, 2022.
R.S. de Souza, A.M.M. Trindade, E. Cameron, and V.C. [17] T Auld, Michael Bridges, MP Hobson, and SF Gull. Fast
Busti. cosmoabc: Likelihood-free inference via population cosmological parameter estimation using neural networks.
monte carlo approximate bayesian computation. Astron- Monthly Notices of the Royal Astronomical Society: Let-
omy and Computing, 13:1–11, 2015. ters, 376(1):L11–L15, 2007. [arXiv: astro-ph/0608174].
[4] Aleksandr Petrosyan and Will Handley. Supernest: ac- [18] Philip Graff, Farhan Feroz, Michael P Hobson, and An-
celerated nested sampling applied to astrophysics and thony Lasenby. Bambi: blind accelerated multimodal
cosmology. Physical Sciences Forum, 5(1):51, 2023. bayesian inference. Monthly Notices of the Royal Astro-
[5] Joanna Dunkley, Martin Bucher, Martin Bucher, Mar- nomical Society, 421(1):169–180, 2012. [arXiv:1110.2997].
tin Bucher, Pedro G. Ferreira, Pedro G. Ferreira, Kav- [19] Philip Graff, Farhan Feroz, Michael P Hobson, and An-
ilan Moodley, Kavilan Moodley, Kavilan Moodley, and thony Lasenby. Skynet: an efficient and robust neu-
Constantinos Skordis. Fast and reliable markov chain ral network training tool for machine learning in astron-
monte carlo technique for cosmological parameter estima- omy. Monthly Notices of the Royal Astronomical Society,
tion. Monthly Notices of the Royal Astronomical Society, 441(2):1741–1759, 2014. [arXiv:1309.0790].
356(3):925–936, 2005. [20] Héctor J Hortúa, Riccardo Volpi, Dimitri Marinelli, and
[6] Thejs Brinckmann and Julien Lesgourgues. Montepython Luigi Malagò. Parameter estimation for the cosmic mi-
3: boosted mcmc sampler and other features. Physics of crowave background with bayesian neural networks. Phys-
the Dark Universe, 24:100260, 2019. ical Review D, 102(10):103509, 2020.
[7] Robert L Schuhmann, Benjamin Joachimi, and Hiranya V [21] Andreas Nygaard, Emil Brinch Holm, Steen Hannestad,
Peiris. Gaussianization for fast and accurate inference and Thomas Tram. Connect: A neural network based
from cosmological data. Monthly Notices of the Royal framework for emulating cosmological observables and
Astronomical Society, 459(2):1916–1928, 2016. cosmological parameter inference. Journal of Cosmology
[8] Antony Lewis. Efficient sampling of fast and slow cos- and Astroparticle Physics, 2023(05):025, 2023.
mological parameters. Physical Review D, 87(10):103529, [22] Augusto T Chantada, Susana J Landau, Pavlos Protopa-
2013. pas, Claudia G Scóccola, and Cecilia Garraffo. Nn bundle
[9] Masanori Sato, Kiyotomo Ichiki, and Tsutomu T Takeuchi. method applied to cosmology: an improvement in compu-
Copula cosmology: Constructing a likelihood function. tational times. arXiv preprint arXiv:2311.15955, 2023.
Physical review D, 83(2):023501, 2011. [23] Will Handley. pyBAMBI. https://pybambi.
[10] William A Fendt and Benjamin D Wandelt. Pico: param- readthedocs.io/en/latest/#, 2018. [Online: accessed
eters for the impatient cosmologist. The Astrophysical 9-January-2020].
Journal, 654(1):2, 2007. [24] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer,
[11] Marcos Pellejero-Ibanez, Raul E Angulo, Giovanni Aricó, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming
Matteo Zennaro, Sergio Contreras, and Jens Stücker. Cos- Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An
mological parameter estimation via iterative emulation of imperative style, high-performance deep learning library.
likelihoods. Monthly Notices of the Royal Astronomical Advances in neural information processing systems, 32,
Society, 499(4):5257–5268, 2020. 2019.
[12] Justin Alsing, Tom Charnock, Stephen Feeney, and Ben- [25] John Skilling. Nested sampling. AIP Conference Proceed-
jamin Wandelt. Fast likelihood-free cosmology with neural ings, 735(1):395–405, 2004.
density estimators and active learning. Monthly Notices of [26] John Skilling et al. Nested sampling for general bayesian
the Royal Astronomical Society, 488(3):4440–4458, 2019. computation. Bayesian analysis, 1(4):833–859, 2006.
[arXiv:1903.00007]. [27] Adrian E Raftery. Approximate bayes factors and account-
[13] Adam Moss. Accelerated bayesian inference using deep ing for model uncertainty in generalised linear models.
learning. Monthly Notices of the Royal Astronomical Biometrika, 83(2):251–266, 1996.
Society, 496(1):328–338, 2020. [28] Andrew R Liddle. Information criteria for astrophysical
[14] Hector J Hortua, Riccardo Volpi, Dimitri Marinelli, and model selection. Monthly Notices of the Royal Astronom-
Luigi Malago. Accelerating mcmc algorithms through ical Society: Letters, 377(1):L74–L78, 2007.
bayesian deep networks. arXiv preprint arXiv:2011.14276, [29] Andrew R Liddle, Pia Mukherjee, and David Parkin-
2020. son. Cosmological model selection. arXiv preprint astro-
11

ph/0608184, 2006. parameter inference in systems biology: application to an


[30] F Feroz, MP Hobson, and M Bridges. Multinest: an exemplar circadian model. BMC systems biology, 7(1):72,
efficient and robust bayesian inference tool for cosmol- 2013.
ogy and particle physics. Monthly Notices of the Royal [46] Lívia B Pártay, Albert P Bartók, and Gábor Csányi.
Astronomical Society, 398(4):1601–1614, 2009. Nested sampling for materials: The case of hard spheres.
[31] WJ Handley, MP Hobson, and AN Lasenby. Polychord: Physical Review E, 89(2):022302, 2014.
nested sampling for cosmology. Monthly Notices of the [47] Robert John Nicholas Baldock. Classical Statistical Me-
Royal Astronomical Society: Letters, 450(1):L61–L65, chanics with Nested Sampling. Springer, 2017.
2015. [arXiv:1502.01856]. [48] Béla Szekeres, Livia B Partay, and Edit Mátyus. Direct
[32] Josh Speagle and Kyle Barbary. dynesty: Dynamic nested computation of the quantum partition function by path-
sampling package. Astrophysics Source Code Library, integral nested sampling. Journal of chemical theory and
2018. computation, 14(8):4353–4359, 2018.
[33] Roberto Trotta, Farhan Feroz, Mike Hobson, and [49] David E Rumelhart, Geoffrey E Hinton, and Ronald J
Roberto Ruiz de Austri. Recent advances in bayesian Williams. Learning representations by back-propagating
inference in cosmology and astroparticle physics thanks errors. nature, 323(6088):533–536, 1986.
to the multinest algorithm. In Astrostatistical Challenges [50] Kurt Hornik, Maxwell Stinchcombe, and Halbert White.
for the New Astronomy, pages 107–119. Springer, 2013. Universal approximation of an unknown mapping and its
[34] David Parkinson, Pia Mukherjee, and Andrew Liddle. derivatives using multilayer feedforward networks. Neural
Cosmonest: Cosmological nested sampling. ascl, pages networks, 3(5):551–560, 1990.
ascl–1110, 2011. [51] Michael A Nielsen. Neural networks and deep learning,
[35] Pia Mukherjee, David Parkinson, and Andrew R Liddle. volume 25. Determination press San Francisco, CA, USA,
A nested sampling algorithm for cosmological model se- 2015.
lection. The Astrophysical Journal Letters, 638(2):L51, [52] Ian Goodfellow, Yoshua Bengio, Aaron Courville, and
2006. Yoshua Bengio. Deep learning, volume 1. MIT press
[36] Benjamin Audren, Julien Lesgourgues, Karim Benabed, Cambridge, 2016.
and Simon Prunet. Conservative constraints on early [53] Juan de Dios Rojas Olvera, Isidro Gómez-Vargas, and
cosmology with monte python. Journal of Cosmology and Jose Alberto Vázquez. Observational cosmology with
Astroparticle Physics, 2013(02):001, 2013. artificial neural networks. Universe, 8(2):120, 2022.
[37] Y Akrami, F Arroja, M Ashdown, J Aumont, C Bacci- [54] Kerry Gallagher and Malcolm Sambridge. Genetic algo-
galupi, M Ballardini, AJ Banday, RB Barreiro, N Bartolo, rithms: a powerful tool for large-scale nonlinear optimiza-
S Basak, et al. Planck 2018 results-x. constraints on tion problems. Computers & Geosciences, 20(7-8):1229–
inflation. Astronomy & Astrophysics, 641:A10, 2020. 1236, 1994.
[38] I Bernst, P Schilke, T Moeller, D Panoglou, V Ossenkopf, [55] SN Sivanandam and SN Deepa. Genetic algorithms.
M Roellig, J Stutzki, and D Muders. Magix: A generic tool Springer, 2008.
for fitting models to astrophysical data. In Astronomical [56] Colin R Reeves. Genetic algorithms for the operations
Data Analysis Software and Systems XX, volume 442, researcher. INFORMS journal on computing, 9(3):231–
page 505, 2011. 250, 1997.
[39] J Buchner, A Georgakakis, K Nandra, L Hsu, C Rangel, [57] Sourabh Katoch, Sumit Singh Chauhan, and Vijay Ku-
M Brightman, A Merloni, M Salvato, J Donley, and mar. A review on genetic algorithm: past, present, and
D Kocevski. X-ray spectral modelling of the agn obscuring future. Multimedia Tools and Applications, 80(5):8091–
region in the cdfs: Bayesian model selection and catalogue. 8126, 2021.
Astronomy & Astrophysics, 564:A125, 2014. [58] Ricardo Medel-Esquivel, Isidro Gómez-Vargas, Alejandro
[40] Enrico Corsaro and Joris De Ridder. Diamonds: A new A. Morales Sánchez, Ricardo García-Salcedo, and José
bayesian nested sampling tool-application to peak bag- Alberto Vázquez. Cosmological Parameter Estimation
ging of solar-like oscillations. Astronomy & Astrophysics, with Genetic Algorithms. Universe, 10(1):11, 2024.
571:A71, 2014. [59] Diederik P Kingma. Adam: A method for stochastic
[41] Farhan Feroz, Jonathan R Gair, Michael P Hobson, and optimization. arXiv preprint arXiv:1412.6980, 2014.
Edward K Porter. Use of the multinest algorithm for [60] Isidro Gómez-Vargas, Joshua Briones Andrade, and J Al-
gravitational wave data analysis. Classical and Quantum berto Vázquez. Neural networks optimized by genetic al-
Gravity, 26(21):215003, 2009. gorithms in cosmology. Physical Review D, 107(4):043509,
[42] Matthew Pitkin, Colin Gill, John Veitch, Erin Macdonald, 2023.
and Graham Woan. A new code for parameter estimation [61] JA Vazquez, I Gomez-Vargas, and A Slosar. Updated
in searches for gravitational waves from known pulsars. Journal of Physics: Conference Series, 363(1):012041, 2012.
[43] Walter Del Pozzo, John Veitch, and Alberto Vecchio. Testing general relativity using bayesian model selection: Applications to observations of gravitational waves from compact binary systems. Physical Review D, 83(8):082002, 2011.
[44] Nick Pullen and Richard J Morris. Bayesian model comparison and parameter inference in systems biology using nested sampling. PloS one, 9(2):e88419, 2014.
[45] Stuart Aitken and Ozgur E Akman. Nested sampling for
version of a simple mcmc code for cosmological parameter estimation where only expansion history matters. https://github.com/ja-vazquez/SimpleMC, 2020.
[62] Éric Aubourg, Stephen Bailey, Julian E Bautista, Florian Beutler, Vaishali Bhardwaj, Dmitry Bizyaev, Michael Blanton, Michael Blomqvist, Adam S Bolton, Jo Bovy, et al. Cosmological implications of baryon acoustic oscillation measurements. Physical Review D, 92(12):123516, 2015. [arXiv:1411.1074].
[63] Joshua S Speagle. dynesty: a dynamic nested sampling package for estimating bayesian posteriors and evidences. Monthly Notices of the Royal Astronomical Society, 493(3):3132–3158, 2020. [arXiv:1904.02180].
[64] Salvatore Ingrassia and Isabella Morlini. Neural network modeling for small datasets. Technometrics, 47(3):297–311, 2005.
[65] Hong-Wei Ng, Viet Dung Nguyen, Vassilios Vonikakis, and Stefan Winkler. Deep learning for emotion recognition on small datasets using transfer learning. In Proceedings of the 2015 ACM on international conference on multimodal interaction, pages 443–449, 2015.
[66] Antonello Pasini. Artificial neural networks for small dataset analysis. Journal of thoracic disease, 7(5):953, 2015.
[67] Isidro Gómez-Vargas, Ricardo Medel-Esquivel, Ricardo García-Salcedo, and J Alberto Vázquez. Neural network reconstructions for the hubble parameter, growth rate and distance modulus. The European Physical Journal C, 83(4):304, 2023.
[68] Michel Chevallier and David Polarski. Accelerating universes with scaling dark matter. International Journal of Modern Physics D, 10(02):213–223, 2001. [arXiv:gr-qc/0009008].
[69] Eric V Linder. Exploring the expansion history of the universe. Physical Review Letters, 90(9):091301, 2003. [arXiv:astro-ph/0208512].
[70] Daniel Moshe Scolnic, DO Jones, A Rest, YC Pan, R Chornock, RJ Foley, ME Huber, R Kessler, Gautham Narayan, AG Riess, et al. The complete light-curve sample of spectroscopically confirmed sne ia from pan-starrs1 and cosmological constraints from the combined pantheon sample. The Astrophysical Journal, 859(2):101, 2018. [arXiv:1710.00845].
[71] Raul Jimenez, Licia Verde, Tommaso Treu, and Daniel Stern. Constraints on the equation of state of dark energy and the hubble constant from stellar ages and the cosmic microwave background. The Astrophysical Journal, 593(2):622, 2003. [arXiv:astro-ph/0302560].
[72] Joan Simon, Licia Verde, and Raul Jimenez. Constraints on the redshift dependence of the dark energy potential. Physical Review D, 71(12):123001, 2005. [arXiv:astro-ph/0412269].
[73] Daniel Stern, Raul Jimenez, Licia Verde, Marc Kamionkowski, and S Adam Stanford. Cosmic chronometers: constraining the equation of state of dark energy. i: h(z) measurements. Journal of Cosmology and Astroparticle Physics, 2010(02):008, 2010. [arXiv:0907.3149].
[74] Michele Moresco, Licia Verde, Lucia Pozzetti, Raul Jimenez, and Andrea Cimatti. New constraints on cosmological parameters and neutrino properties using the expansion rate of the universe to z ∼ 1.75. Journal of Cosmology and Astroparticle Physics, 2012(07):053, 2012. [arXiv:1201.6658].
[75] Cong Zhang, Han Zhang, Shuo Yuan, Siqi Liu, Tong-Jie Zhang, and Yan-Chun Sun. Four new observational h(z) data from luminous red galaxies in the sloan digital sky survey data release seven. Research in Astronomy and Astrophysics, 14(10):1221, 2014. [arXiv:1207.4541].
[76] Michele Moresco. Raising the bar: new constraints on the hubble parameter with cosmic chronometers at z ∼ 2. Monthly Notices of the Royal Astronomical Society: Letters, 450(1):L16–L20, 2015. [arXiv:1503.01116].
[77] Michele Moresco, Lucia Pozzetti, Andrea Cimatti, Raul Jimenez, Claudia Maraston, Licia Verde, Daniel Thomas, Annalisa Citro, Rita Tojeiro, and David Wilkinson. A 6% measurement of the hubble parameter at z ∼ 0.45: direct evidence of the epoch of cosmic re-acceleration. Journal of Cosmology and Astroparticle Physics, 2016(05):014, 2016. [arXiv:1601.01701].
[78] AL Ratsimbazafy, SI Loubser, SM Crawford, CM Cress, BA Bassett, RC Nichol, and P Väisänen. Age-dating luminous red galaxies observed with the southern african large telescope. Monthly Notices of the Royal Astronomical Society, 467(3):3239–3254, 2017. [arXiv:1702.00418].
[79] Ashley J Ross, Lado Samushia, Cullan Howlett, Will J Percival, Angela Burden, and Marc Manera. The clustering of the sdss dr7 main galaxy sample–i. a 4 per cent distance measure at z = 0.15. Monthly Notices of the Royal Astronomical Society, 449(1):835–847, 2015.
[80] Florian Beutler, Chris Blake, Matthew Colless, D Heath Jones, Lister Staveley-Smith, Lachlan Campbell, Quentin Parker, Will Saunders, and Fred Watson. The 6df galaxy survey: baryon acoustic oscillations and the local hubble constant. Monthly Notices of the Royal Astronomical Society, 416(4):3017–3032, 2011.
[81] Shadab Alam, Metin Ata, Stephen Bailey, Florian Beutler, Dmitry Bizyaev, Jonathan A Blazek, Adam S Bolton, Joel R Brownstein, Angela Burden, Chia-Hsun Chuang, et al. The clustering of galaxies in the completed sdss-iii baryon oscillation spectroscopic survey: cosmological analysis of the dr12 galaxy sample. Monthly Notices of the Royal Astronomical Society, 470(3):2617–2652, 2017. [arXiv:1607.03155].
[82] Metin Ata, Falk Baumgarten, Julian Bautista, Florian Beutler, Dmitry Bizyaev, Michael R Blanton, Jonathan A Blazek, Adam S Bolton, Jonathan Brinkmann, Joel R Brownstein, et al. The clustering of the sdss-iv extended baryon oscillation spectroscopic survey dr14 quasar sample: first measurement of baryon acoustic oscillations between redshift 0.8 and 2.2. Monthly Notices of the Royal Astronomical Society, 473(4):4773–4794, 2018.
[83] Michael Blomqvist, Hélion Du Mas Des Bourboux, Victoria de Sainte Agathe, James Rich, Christophe Balland, Julian E Bautista, Kyle Dawson, Andreu Font-Ribera, Julien Guy, Jean-Marc Le Goff, et al. Baryon acoustic oscillations from the cross-correlation of lyα absorption and quasars in eboss dr14. Astronomy & Astrophysics, 629:A86, 2019.
[84] Victoria de Sainte Agathe, Christophe Balland, Hélion Du Mas Des Bourboux, Michael Blomqvist, Julien Guy, James Rich, Andreu Font-Ribera, Matthew M Pieri, Julian E Bautista, Kyle Dawson, et al. Baryon acoustic oscillations at z = 2.34 from the correlations of lyα absorption in eboss dr14. Astronomy & Astrophysics, 629:A85, 2019.
[85] Bryan Sagredo, Savvas Nesseris, and Domenico Sapone. Internal robustness of growth rate data. Physical Review D, 98(8):083543, 2018. [arXiv:1806.10822].
[86] Aaditya Ramdas, Nicolás García Trillos, and Marco Cuturi. On wasserstein two-sample testing and related families of nonparametric tests. Entropy, 19(2):47, 2017.
[87] David W Hogg and Daniel Foreman-Mackey. Data analysis recipes: Using markov chain monte carlo. The Astrophysical Journal Supplement Series, 236(1):11, 2018.
Model           eggbox            eggbox            ΛCDM              ΛCDM
Sampler         NS                NS+GA             NS                NS+GA
log Z           −236.16 ± 0.34    −235.07 ± 0.37    −532.87 ± 0.34    −532.70 ± 0.33
Ωm              −−                −−                0.31 ± 0.011      0.31 ± 0.011
Ωb h²           −−                −−                0.02 ± 0.0005     0.02 ± 0.0005
h               −−                −−                0.683 ± 0.009     0.683 ± 0.009
% saved time    −−                38                −−                23

TABLE IV: Nested sampling for the eggbox toy model and ΛCDM using 100 live points. In the NS+GA cases, we generate the first live points with genetic algorithms using a mutation probability of 0.5 and a crossover probability of 0.8.
Appendix A: Genetic algorithms as initial live points

Previously, we mentioned that neural networks are good at interpolating but not at extrapolating. Within the Bayesian inference process, we sample a posterior probability distribution whose shape is unknown beforehand. Although we have some idea of the range of new samples in parameter space, we cannot guarantee that the highest-likelihood point is already among the live points; this uncertainty may lead to inaccurate predictions for points close to the maximum-likelihood point. In reference [87], the authors propose using an optimizer to locate the best posterior sample, albeit at the expense of strictly probabilistic sampling. Generating the initial live points with genetic algorithms could be beneficial in cases where the Bayesian inference process must be stopped before convergence. In such circumstances, the partial posterior sampling aided by genetic algorithms will be more closely aligned with the maximum than one generated without them, which could make an analysis of the partial posterior sampling feasible. Although this foray into genetic algorithms requires further investigation, we have observed that when a small number of live points is used and the initial live points are produced by a genetic algorithm, the stopping criterion is reached more quickly.
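For illustration, the following is a minimal, self-contained sketch (in Python, with hypothetical names such as ga_live_points; it is not the implementation used for the results in this work) of how a population drawn uniformly within the prior bounds could be evolved for a few generations with the operator probabilities quoted in Table IV (crossover 0.8, mutation 0.5) before being handed to a nested sampler as initial live points. The eggbox log-likelihood is written in a commonly used form, which may differ from the exact definition behind Table IV; note also that points refined in this way are no longer uniform draws from the prior, so they should be used with the caveats discussed above.

import numpy as np

rng = np.random.default_rng(0)

# A common form of the eggbox toy log-likelihood (illustrative only; the exact
# definition used for Table IV may differ).
def log_like(theta):
    x, y = theta
    return (2.0 + np.cos(x / 2.0) * np.cos(y / 2.0)) ** 5

bounds = np.array([[0.0, 10.0 * np.pi],
                   [0.0, 10.0 * np.pi]])

def ga_live_points(n_live=100, n_gen=10, p_cross=0.8, p_mut=0.5):
    """Hypothetical helper: evolve a random population toward higher
    log-likelihood and return it as candidate initial live points."""
    dim = len(bounds)
    pop = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_live, dim))
    for _ in range(n_gen):
        fit = np.array([log_like(p) for p in pop])
        # Tournament selection: keep the fitter of two randomly chosen parents.
        ia, ib = rng.integers(0, n_live, size=(2, n_live))
        parents = np.where((fit[ia] > fit[ib])[:, None], pop[ia], pop[ib])
        children = parents.copy()
        # Uniform crossover between consecutive parents with probability p_cross.
        for i in range(0, n_live - 1, 2):
            if rng.random() < p_cross:
                mask = rng.random(dim) < 0.5
                children[i, mask] = parents[i + 1, mask]
                children[i + 1, mask] = parents[i, mask]
        # Gaussian mutation applied to a fraction p_mut of the individuals.
        mutants = rng.random(n_live) < p_mut
        step = 0.05 * (bounds[:, 1] - bounds[:, 0])
        children[mutants] += rng.normal(0.0, 1.0, size=(mutants.sum(), dim)) * step
        pop = np.clip(children, bounds[:, 0], bounds[:, 1])
    return pop

initial_live = ga_live_points()
print(initial_live.shape)  # (100, 2): candidates clustered near high-likelihood regions

The design choice here is deliberately simple: a tournament selection, uniform crossover and Gaussian mutation suffice to pull the population toward the likelihood peaks in a handful of generations, which mimics the clustering behaviour described in the text.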
As a first insight into the use of genetic algorithms to generate the initial live points, we show some results on the potential advantages that genetic algorithms could bring to a nested sampling run. Table IV shows examples in which using a GA to generate the first live points reduces the computational time without sacrificing the statistical results. It is worth noting that we use a low number of live points because this is the regime in which we observed the advantage: when a higher number of live points is used, NS alone is generally faster, since its points already populate a sparse region of the search space, whereas the GA clusters the points around the optima and loses exploration capacity. Nonetheless, there are scenarios in which only a small number of live points is available, and in these cases generating the initial sampling points with a GA could provide an advantage. A detailed exploration of this combination of GA with NS is left for further study; a sketch of how it could be wired into an existing sampler is given below.
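As a rough sketch of how GA-refined points could be injected into a nested sampling run, the example below uses dynesty [63]. We assume its NestedSampler accepts a live_points argument of the form [live_u, live_v, live_logl] (unit-cube coordinates, transformed parameters, and their log-likelihoods), which is available in recent releases but should be checked against the installed version; this is only an assumed wiring, not the pipeline used to produce Table IV.

import numpy as np
import dynesty

ndim = 2
lo, hi = 0.0, 10.0 * np.pi   # same eggbox box as in the previous sketch

def log_like(theta):
    x, y = theta
    return (2.0 + np.cos(x / 2.0) * np.cos(y / 2.0)) ** 5

def prior_transform(u):
    # Map the unit hypercube onto the parameter box.
    return lo + (hi - lo) * u

# GA-refined points (replace with the output of ga_live_points above);
# uniform draws are used here only to keep the example self-contained.
live_v = np.random.uniform(lo, hi, size=(100, ndim))
live_u = (live_v - lo) / (hi - lo)                    # back to the unit cube
live_logl = np.array([log_like(v) for v in live_v])

# Assumption: the `live_points` argument exists in the installed dynesty version.
sampler = dynesty.NestedSampler(log_like, prior_transform, ndim,
                                nlive=100,
                                live_points=[live_u, live_v, live_logl])
sampler.run_nested()
print(sampler.results.logz[-1])   # log-evidence, comparable to the NS columns of Table IV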
FIG. 2: Case 1. Posterior plots for CPL using Pantheon+HD+BAO with the proposed methods in this work.
FIG. 3: Case 2. Posterior plots for CPL using Pantheon+HD+BAO with the proposed methods in this work. We use 4000 live points.
FIG. 4: Case 3. 2D posterior plots for CPL with curvature using Pantheon+HD+BAO+fσ8+Planck with the proposed methods in this work. We use 4000 live points and consider 8 free parameters. In this case, because of its complexity, three neural networks were trained before one replaced the likelihood function; nevertheless, the Bayesian inference process using our method was 19.6% faster.