Deep Learning and Genetic Algorithms For Cosmological Bayesian Inference Speed-Up
the current set of live points as training data, without the need for pre-training. This flexibility
enables adaptation to various theoretical models and datasets. We perform the hyperparameter
optimization using genetic algorithms to suggest initial neural network architectures for learning each
likelihood function. Once sufficient accuracy is achieved, the neural network replaces the original
likelihood function. The implementation integrates with nested sampling algorithms and has been
thoroughly evaluated using both simple cosmological dark energy models and diverse observational
datasets. Additionally, we explore the potential of genetic algorithms for generating initial live
points within nested sampling inference, opening up new avenues for enhancing the efficiency and
effectiveness of Bayesian inference methods.
The structure of the paper is as follows: Section II offers an overview of Bayesian inference and nested sampling. Section III provides a concise exposition of the machine learning fundamentals employed in this study. The concept and development of our machine learning strategies are detailed in Section IV. Section V and Section VI present our results, applied respectively to testing toy models and estimating cosmological parameters. In Section VII, we discuss our research findings and present our final reflections. Furthermore, the Appendix features preliminary results about the incorporation of genetic algorithms as initiators of the live points in a nested sampling execution.

II. STATISTICAL BACKGROUND

In this section, we provide an overview of Bayesian inference and neural networks. In particular, we focus on the nested sampling algorithm and feedforward neural networks.

Nested sampling (NS) belongs to a category of inference methods that estimate the Bayesian evidence, along with its uncertainty, by sampling the posterior probability density function. It was proposed by John Skilling in 2004 [25, 26]. The evidence, or marginalization of the likelihood function, is a key quantity in model comparison through the Bayes factor on the Jeffreys' scale. It is a more rigorous technique [26, 27] than other widely used methods such as the information criteria approximations [28, 29]. NS works by computing the Bayesian evidence while assuming that the parameter space (prior volume or prior mass) shrinks by a certain factor. There are successful nested sampling implementations [30–32] and several applications in cosmology [33–37], astrophysics [38–40], gravitational wave analysis [41–43], biology [44, 45] and other scientific fields [46–48].

To understand the method proposed in this work, we briefly describe some considerations about the NS algorithm. For more details, we recommend Refs. [26, 31, 32]. First of all, the Bayesian evidence can be written as follows:
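In standard notation, for a likelihood $\mathcal{L}(\theta)$ and a prior $\pi(\theta)$ over the parameters $\theta$, the evidence is the integral of the likelihood over the prior:

\[
Z = \int \mathcal{L}(\theta)\, \pi(\theta)\, d\theta .
\]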
function. Genetic algorithms excel in addressing large-scale nonlinear and nonconvex optimization problems in challenging search scenarios [54, 55].

To apply genetic algorithms to a specific problem, one must select the objective function to optimize, delineate the search space, and specify the genetic parameters such as crossover, mutation, and elitism. Probability values for the crossover and mutation operators are assigned, and a selection operator determines which individuals advance to the subsequent generation. Elitism, represented by a positive integer value, dictates the number of individuals guaranteed passage to the next generation. Overall, genetic algorithms initialize a population and iteratively modify individuals through the operators and the objective function, progressively approaching the optimal solution of the target function.

While this paper does not delve deeply into the mathematical principles underlying genetic algorithms, interested readers are directed to the following references [56, 57], particularly for parameter estimation in cosmology [58].

the neural network, the learning rate decreases by half. However, during each individual ANN training session, the learning rate remains constant within the adaptive gradient descent algorithm called Adam [59].

• Hyperparameter tuning. We have implemented the option of using genetic algorithms to find the architecture of the first trained neural network. For this purpose, we use the library nnogada [60]. For simplicity, in this work we use genetic algorithms over 3 generations with a population size of 5 to explore combinations of batch size (4 or 8), number of layers (2 or 3), learning rate (0.0005 or 0.001), and number of neurons per layer (50 or 100). In a nested sampling execution, where we can train the neural networks multiple times, we use these small configurations (a minimal sketch of such a search is shown below). This approach yields better results compared to not tuning hyperparameters and is more effective than using a hyperparameter grid [60].
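As an illustration of the kind of search described above, the following is a minimal, self-contained sketch of a genetic algorithm over this discrete hyperparameter space; the fitness function is a toy stand-in for training and validating an ANN, and it does not use the actual nnogada API.

import random

# Discrete search space described in the text: batch size, number of layers,
# learning rate, and neurons per layer.
SEARCH_SPACE = {
    "batch_size": [4, 8],
    "num_layers": [2, 3],
    "learning_rate": [5e-4, 1e-3],
    "neurons": [50, 100],
}

def random_individual():
    """One candidate architecture: a random choice per hyperparameter."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def crossover(a, b, p_cross=0.8):
    """Uniform crossover applied with probability p_cross."""
    child = dict(a)
    if random.random() < p_cross:
        for key in SEARCH_SPACE:
            if random.random() < 0.5:
                child[key] = b[key]
    return child

def mutate(ind, p_mut=0.2):
    """Resample each gene with probability p_mut."""
    for key, values in SEARCH_SPACE.items():
        if random.random() < p_mut:
            ind[key] = random.choice(values)
    return ind

def evolve(fitness, n_gen=3, pop_size=5, elitism=1):
    """Small GA loop (3 generations, population of 5, as in the text).
    `fitness` maps a hyperparameter dict to a validation loss (lower is better);
    in a real run it would train an ANN on the current live points."""
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(n_gen):
        ranked = sorted(population, key=fitness)
        elite = ranked[:elitism]                    # elitism: best individuals pass unchanged
        parents = ranked[: max(2, pop_size // 2)]   # simple truncation selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - elitism)]
        population = elite + children
    return min(population, key=fitness)

# Toy usage: a fake fitness favouring fewer layers and more neurons.
best = evolve(lambda h: h["num_layers"] + 100.0 * h["learning_rate"] - 0.01 * h["neurons"])
print(best)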
a new sample is extracted within a prior iso-likelihood or mass surface. Our goal is for the neural network to predict the likelihood of points within this prior volume. To do this, we train the neural network with only the current set of live points. These points, which typically number in the hundreds or thousands, are sufficient to effectively train a neural network and have several advantages:

• The relatively small dataset size implies that the neural network training process is not computationally intensive.

• By excluding points outside the current prior volume, we can potentially avoid inaccurate predictions in regions where points would be rejected based on the original likelihood. The points that the neural network learns efficiently are those within the prior volume, because they have a higher probability of acceptance according to the original likelihood.

• The quantity of elements within the training set remains constant. Whether the neural network starts its training at the beginning of sampling or at a later stage, the element count does not vary. As a result, the majority of neural network hyperparameters could stay consistent across different datasets.

The likelihood function in cosmological parameter estimation can be quite complex, often involving various types of observational data and intricate numerical operations, such as integrals, derivatives, or approximation methods for solving differential equations. To address this complexity, the idea is to replace the analytical likelihood function with a trained neural network. This substitution reduces the problem to a simple matrix multiplication, where the optimal weights, obtained during ANN training, are stored in a binary file. Consequently, the evaluation of the likelihood becomes significantly faster. This acceleration is particularly advantageous in a Bayesian inference process, where the likelihood function may need to be evaluated thousands or even millions of times, making the reduction in computational time highly beneficial.

Algorithm 1 provides an overview of our proposed methodology within a nested sampling execution. Concerning the neural network implementation, our primary focus is on the segment within the for loop. Once a predetermined number of samples has been reached, or when the flag dlogz_start is activated, the ANN leverages the current live points for its training. The benefit of utilizing only the set of live points is twofold: firstly, it facilitates swift training, and secondly, it ensures that the ANN learns likelihood values strictly within the prior volume. This area is precisely where new samples should be located.

It is important to note that the nested sampling process, including the selection of priors, typically uniform or Gaussian distributions, remains consistent with standard practices. Once the criteria for initiating ANN training are met, the live points are used to train the ANN. If the ANN's performance metrics meet the required threshold, the analytical likelihood function is replaced by the ANN to save computational time. While this substitution does not alter the fundamental nested sampling process, it can significantly enhance efficiency by reducing computational overhead.

B. Using genetic algorithms

We propose genetic algorithms, as in our nnogada library [60], as an optional method to find the hyperparameters of the neural network as part of the neuralike workflow, as can be noticed in Algorithm 1. In large parameter estimation processes it is useful, despite the time required, to find the best neural network architecture.

On the other hand, we explored a first insight about the generation of the initial live points of a nested sampling process with genetic algorithms. It is analyzed in Appendix A. Although we have incorporated the use of genetic algorithms in our code, the primary focus of this paper is on our neuralike method (Section IV A). As such, further analysis of genetic algorithms in this context will be the subject of future research.

V. TOY MODELS

As a first step in testing our method, we use some toy models as log-likelihood functions. These toy models only generate samples within the Bayesian inference, without parameter estimation. However, it is useful to check the ability of the neural networks to learn, given a set of live points, the shape of these functions at runtime, and their respective values for the Bayesian evidence. We use the following toy models, with the mentioned hyperparameters (a code transcription of these functions is given below):

• A Gaussian, $f(x, y) = -\frac{1}{2}\left(x^2 + y^2 - xy\right)$. Learning rate $5\times10^{-3}$, 100 epochs, batch size of 1.

• Eggbox function, $f(x, y) = \left(2 + \cos\left(\frac{x}{2.0}\right)\cos\left(\frac{y}{2.0}\right)\right)^{5.0}$. Learning rate $1\times10^{-4}$, 100 epochs, batch size of 1.

• Himmelblau's function, $f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2$. Learning rate $1\times10^{-4}$, 100 epochs, batch size of 1.

We have used some toy models as log-likelihood functions: Gaussian, egg-box, and Himmelblau. In Table I, you can see the results of the Bayesian evidence calculation with and without our method for the three toy models, while in Figure 1, you can see the samples of the three functions, which at first glance are very similar.
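A direct transcription of these three functions, assuming NumPy; how they are wrapped for the sampler and any sign conventions follow the text above and are not shown here.

import numpy as np

def loglike_gaussian(x, y):
    """Gaussian toy model: f(x, y) = -(1/2)(x^2 + y^2 - xy)."""
    return -0.5 * (x**2 + y**2 - x * y)

def loglike_eggbox(x, y):
    """Egg-box toy model: f(x, y) = (2 + cos(x/2) cos(y/2))^5."""
    return (2.0 + np.cos(x / 2.0) * np.cos(y / 2.0)) ** 5.0

def loglike_himmelblau(x, y):
    """Himmelblau's function: f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2."""
    return (x**2 + y - 11.0) ** 2 + (x + y**2 - 7.0) ** 2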
using_neuralike = False
if livegenetic == True (optional) then
    Define Pmut and Pcross
    Generate a population P with Nind individuals
    Evolve population through Ngen generations
else
    Generate Nlive live points
end
for i in range(iterations) do
    if (dlogz < dlogz_start) OR (nsamples >= nsamples_start) then
        if i % N == 0 AND using_neuralike == False then
            Use nlive points as training dataset
            Optional: Use genetic algorithms with nnogada to choose the best architecture
            Use the best architecture to model the likelihood
            if loss function < valid_loss then
                using_neuralike = True
                L = ANNmodel
            else
                continue with NS
            end
        end
        if min(saved_logl) - logl_tolerance < neuralike < max(saved_logl) + logl_tolerance then
            continue
        else
            L = logL
            using_neuralike = False
        end
    end
end
Algorithm 1: Nested sampling with neuralike. dlogz_start and nsamples_start are the two ways to start neuralike, either with a dlogz value (recommended) or given a specific number of generated samples. The logl_tolerance parameter represents the neural network prediction tolerance required to be considered valid. saved_logl denotes the log-likelihoods of the current live points, and valid_loss determines the criterion for accepting or rejecting a neural network training; any loss function value higher than valid_loss will be rejected. The variable logL represents the analytical log-likelihood function, while L can be either logL or ANNmodel, depending on the success of the neural model.
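To make the control flow of Algorithm 1 concrete, the following is a schematic Python sketch of the switching logic only; the names (train_ann, live_params, etc.) are illustrative stand-ins and do not correspond to the actual SimpleMC/neuralike API.

import numpy as np

class NeuralikeSwitch:
    """Switching logic from Algorithm 1: replace the analytical log-likelihood
    with a neural network once it is accurate enough, and fall back if its
    predictions leave the range spanned by the saved log-likelihoods."""

    def __init__(self, loglike, train_ann, dlogz_start=10.0,
                 valid_loss=0.05, logl_tolerance=0.05):
        self.analytical = loglike          # analytical log-likelihood function
        self.train_ann = train_ann         # stand-in: returns (model, validation_loss)
        self.dlogz_start = dlogz_start
        self.valid_loss = valid_loss
        self.logl_tolerance = logl_tolerance
        self.model = None
        self.using_neuralike = False

    def maybe_train(self, live_params, live_loglikes, dlogz):
        """Train on the current live points once dlogz drops below dlogz_start."""
        if self.using_neuralike or dlogz >= self.dlogz_start:
            return
        model, loss = self.train_ann(np.asarray(live_params), np.asarray(live_loglikes))
        if loss < self.valid_loss:         # accept only sufficiently accurate networks
            self.model, self.using_neuralike = model, True
            self._lo = np.min(live_loglikes) - self.logl_tolerance
            self._hi = np.max(live_loglikes) + self.logl_tolerance

    def __call__(self, params):
        """Evaluate the likelihood; revert to the analytical one when a
        neural prediction falls outside the tolerated range."""
        if self.using_neuralike:
            pred = float(self.model(params))
            if self._lo < pred < self._hi:
                return pred
            self.using_neuralike = False   # reject the network; NS continues analytically
        return self.analytical(params)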
Based on these results, we can notice that for all these models, the speed of sampling using neural networks is slower than in the case of nested sampling alone; this is because the analytical functions are evaluated directly, without sampling from an unknown posterior distribution. Nevertheless, these examples are very useful to verify the accuracy in calculating the Bayesian evidence and in sampling from the distribution. We can observe that both the log-Bayesian evidence and the graphs of the nested sampling process without and with neural networks are consistent; however, as Table I shows, for more complex functions we need a lower value of dlogz_start, which means that we need to start training the neural network at a later stage of nested sampling. Therefore, a lower dlogz_start parameter is needed to be more accurate but slower, and it is precisely this parameter that regulates the speed-accuracy trade-off.

where H is the Hubble parameter and Ωm is the matter density parameter; the subscript 0 attached to any quantity denotes its present-day (z = 0) value. In this case, the EoS for the dark energy is w(z) = −1.

A step further from the standard model is to consider the dark energy as dynamic, where the evolution of its EoS is usually parameterized. A commonly used form of w(z) takes into account the next contribution of a Taylor expansion in terms of the scale factor, $w(a) = w_0 + (1-a)w_a$, or in terms of redshift, $w(z) = w_0 + w_a\, z/(1+z)$ (CPL model [68, 69]). The parameters $w_0$ and $w_a$ are real numbers such that at the present epoch $w|_{z=0} = w_0$ and $dw/da|_{a=1} = -w_a$; we recover ΛCDM when $w_0 = -1$ and $w_a = 0$. Hence the Friedmann equation for the CPL parameterization turns out to be:

\[
H(z)^2 = H_0^2 \left[ \Omega_{m,0}(1+z)^3 + (1-\Omega_{m,0})(1+z)^{3(1+w_0+w_a)}\, e^{-\frac{3 w_a z}{1+z}} \right], \tag{12}
\]
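A minimal NumPy sketch of Eq. (12); the default parameter values are only illustrative and a flat universe is assumed, as in the equation above.

import numpy as np

def hubble_cpl(z, H0=70.0, omega_m=0.3, w0=-1.0, wa=0.0):
    """H(z) for the CPL parameterization, Eq. (12).
    With w0 = -1 and wa = 0 it reduces to the LambdaCDM case."""
    z = np.asarray(z, dtype=float)
    de_term = ((1.0 - omega_m) * (1.0 + z) ** (3.0 * (1.0 + w0 + wa))
               * np.exp(-3.0 * wa * z / (1.0 + z)))
    return H0 * np.sqrt(omega_m * (1.0 + z) ** 3 + de_term)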
Model log Z log Z neuralike dlogz_start Valid loss Samples ANN samples
Gaussian −2.13 ± 0.05 −2.16 ± 0.05 50 0.05 6774 6773
Eggbox −235.83 ± 0.11 −235.82 ± 0.11 10 0.05 11794 3688
Himmelblau −5.59 ± 0.09 −5.64 ± 0.09 5 0.05 10253 4528
TABLE I: Comparing Bayesian evidence for toy models with nested sampling alone and using neuralike. The column dlogz_start
indicates the dlogz value marking the start of neural network training; higher values suggest earlier integration of neural networks into
Bayesian sampling. Valid loss represents the threshold value of the loss function required for accepting a neural network as valid. The last
two columns display the total number of samples generated through the nested sampling process and the subset produced by the trained
neural networks.
FIG. 1: Comparison of neural likelihoods versus original likelihoods using toy models. Using 1000 live points.
• Cosmic chronometers. Cosmic chronometers, also known as Hubble distance (HD) measurements, are galaxies that evolve slowly and allow direct measurements of the Hubble parameter H(z). We use a compilation with 31 data points collected over several years within redshifts between 0.09 and 1.965 [71–78].

• BAO. We employ data from Baryon Acoustic Oscillation (BAO) measurements with redshifts z < 2.36. They are from the SDSS Main Galaxy Sample (MGS) [79], the Six-Degree Field Galaxy Survey (6dFGS) [80], the SDSS DR12 Galaxy Consensus [81], BOSS DR14 quasars (eBOSS) [82], the Ly-α DR14 cross-correlation [83] and the Ly-α DR14 auto-correlation [84].

• Growth rate measurements. We used an extended version of the Gold-2017 compilation available in [85], which includes 22 independent measurements of fσ8(z) with their statistical errors obtained from redshift space distortion measurements across various surveys.

• Planck-15 information. We also consider a compressed version of the Planck-15 information, where the Cosmic Microwave Background (CMB) is treated as a BAO experiment located at redshift z = 1090, measuring the angular scale of the sound horizon. For more details, see Reference [62].
We executed three cases of parameter estimation to verify the performance of our method. We start with one thousand live points and a model with five free parameters; then, we increase the live points and free parameters to test our method with higher dimensionality and higher computational power demand (a larger number of live points). The results are compared with a nested sampling run with the same datasets and the same configuration (live points, stopping criterion, etc.) but without the ANN; this comparison aims to test the accuracy and speed-up achieved by our neuralike method. For this comparison, we report the parameter estimation and Bayesian evidence obtained with and without our method and, in addition, we calculate the Wasserstein distances [86] between the posterior nested sampling samples without and with neuralike for each free parameter, considering their respective sampling weights.

In the results, a baseline neural network architecture was employed, configured with the following hyperparameters: 3 hidden layers, a batch size of 32, a learning rate of 0.001 (utilizing the Adam gradient descent algorithm for optimization), 500 epochs, and an early stopping patience of 200 epochs. In scenarios where multiple neural networks were required, the learning rate was reduced following the previously mentioned approach. As for evaluating the accuracy of the neural networks, we adopted a valid_loss threshold of 0.05 for their training, and a logl_tolerance of 0.05 for their predictions.
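A minimal PyTorch sketch of a network with this baseline configuration (3 hidden layers, Adam with learning rate 0.001, batch size 32, up to 500 epochs, early-stopping patience of 200 epochs); the layer widths, the use of the training loss for early stopping, and the data handling are illustrative choices, not the actual neuralike code.

import copy
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def build_baseline(n_params, n_hidden=100):
    """Three hidden layers mapping a parameter vector to a log-likelihood value."""
    return nn.Sequential(
        nn.Linear(n_params, n_hidden), nn.ReLU(),
        nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        nn.Linear(n_hidden, 1),
    )

def train_baseline(X, y, epochs=500, batch_size=32, lr=1e-3, patience=200):
    """Adam + MSE training loop with early stopping."""
    X = torch.as_tensor(X, dtype=torch.float32)
    y = torch.as_tensor(y, dtype=torch.float32).reshape(-1, 1)
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    model = build_baseline(X.shape[1])
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    best_loss, best_state, epochs_since_best = float("inf"), None, 0
    for _ in range(epochs):
        running = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
            running += loss.item() * len(xb)
        running /= len(X)
        if running < best_loss:                        # keep the best weights seen so far
            best_loss = running
            best_state = copy.deepcopy(model.state_dict())
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:          # early stopping
                break
    model.load_state_dict(best_state)
    return model, best_loss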
A. Case 1

First, we perform the Bayesian inference for the CPL model using SNeIa from the Pantheon compilation, together with cosmic chronometers and BAO data. In this case, we consider only five free parameters: Ωm, Ωb h², h, wa, and w0. We use 1000 live points. Figure 2 shows our results, and we can notice that when dlogz_start = 10 the saved time is around 19%, and when dlogz_start = 5 it is only around 6%. According to Table II, both cases are in agreement with the logZ value for nested sampling alone. If we check Table III, we can notice that, in general, the samples of the dlogz_start = 5 run are more similar to the nested sampling posterior distributions; this can also be appreciated in the posterior plots shown in Figure 2. Although the case of dlogz_start = 5 saves less time than the case of dlogz_start = 10, it gains in accuracy.

B. Case 2

Secondly, we consider the same model, free parameters, and datasets as in Case 1. The difference in this second case is to analyze the behavior of our method with a larger number of live points. It has three new considerations: a) the training set for the neural network should be better because it has a larger size, b) the number of operations in parallel for nested sampling is also larger, and c) we test the hypothesis that a larger number of live points can yield better accuracy for the neural network earlier within the nested sampling process (i.e., at a higher value of dlogz_start). Therefore, we increase the number of live points to 4000 and set dlogz_start = 20; the outputs are included in Figure 3, showing an excellent concordance of the Bayesian evidence values with our method, and a speed-up of around 28.4%. Table II contains the results of the Bayesian evidence, and it can be noticed that the uncertainty in this case is in better agreement with nested sampling than in the two scenarios of Case 1. In addition, we can analyze Table III and conclude that, effectively, its performance has a similar quality to Case 1 with dlogz_start = 5; however, because it uses a higher dlogz_start value, the percentage of saved time is notable.

C. Case 3

Lastly, we included more data: fσ8 measurements and a point with Planck-15 information. To have more free parameters, eight in total, we consider contributions of the neutrino masses Σmν, growth rate σ8, and curvature Ωk. In this case, we also used 4000 live points. With these new considerations, we aim to test our method in higher dimensions and to involve a more complex likelihood function that demands more computational power with each evaluation. We made several tests, but we include the one corresponding to dlogz_start = 5, in which we obtain excellent results, as can be noticed in Table II. Due to the complexity of the likelihood, the full nested sampling process had to train three different neural networks, which allowed the use of erroneous predictions during sampling to be avoided.

We needed a lower value for the dlogz_start parameter due to the complexity of the model (given by the new free parameters); however, the saved time of around 19% with respect to nested sampling alone is remarkable, and the Wasserstein distances shown in Table III indicate that the posterior distributions between the nested sampling runs with and without our method are similar, as can also be noticed in the posterior plots of Figure 4.

VII. CONCLUSIONS

In this paper, we have introduced a novel method that incorporates a neural network trained on-the-fly to learn the likelihood function within a nested sampling process. The main objective is to avoid the time-consuming analytical likelihood function, thus increasing computational efficiency. We present the dlogz_start parameter as a tool to handle the trade-off between accuracy and computational speed. In addition, we incorporate several deep learning techniques to minimize the risk of inaccurate neural network predictions.
TABLE II: Exploring Bayesian Inference with Nested Sampling and neuralike. The definitions of the columns are consistent with those in
Table I. Additionally, the % saved time quantifies the speed-up achieved using our method.
Ωm Ωb h² h w0 wa Ωk σ8 Σmν
Case 1a (dlogz_start = 10) 0.00480 0.00004 0.00428 0.01241 0.11283 − − −
Case 1b (dlogz_start = 5) 0.00091 0.00004 0.00518 0.00997 0.05761 − − −
Case 2 (dlogz_start = 20) 0.00095 0.00002 0.00111 0.00810 0.06279 − − −
Case 3 (dlogz_start = 5) 0.00055 0.00001 0.00054 0.00335 0.01396 0.00018 0.00971 0.01753
TABLE III: Wasserstein distances [86] between nested sampling posterior samples without and with neuralike, for each free parameter.
The closer the value of this distance is to zero, the more similar are the distributions compared. This distance is implemented in scipy and
takes into account the 1D posterior samples and their respective weights. Overall, parameters Ωm , Ωb h2 , and h exhibit relatively small
distances across all cases. However, in Case 1a, higher values of w0 and wa distances are observed due to a higher dlogz_start value and
fewer data points used. On the other hand, Case 3 demonstrates smaller (better) distances, attributed to the utilization of more data and a
lower dlogz_start value.
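A short sketch of this comparison using scipy, as the caption describes; the sample and weight arrays are placeholders for the weighted 1D posterior samples of the two runs.

from scipy.stats import wasserstein_distance

def posterior_distances(samples_a, weights_a, samples_b, weights_b, names):
    """Weighted 1D Wasserstein distance per free parameter between two runs
    (rows = posterior samples, columns = parameters in the order of `names`)."""
    return {name: wasserstein_distance(samples_a[:, i], samples_b[:, i],
                                       u_weights=weights_a, v_weights=weights_b)
            for i, name in enumerate(names)}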
To verify the effectiveness of our method, we employed several toy models, demonstrating its ability to replicate a probability distribution with remarkable accuracy in the nested sampling framework. Furthermore, in the cosmological parameter estimation, by performing a comparative analysis using the CPL cosmological model and various datasets, we highlighted the potential of our method to significantly improve the speed of nested sampling processes without compromising the statistical reliability of the results. We found that, as the number of dimensions increased, our method produced a larger time reduction with a lower dlogz_start value.

Despite commencing neural network training relatively late in the nested sampling process, the overall time reduction was notable, as evidenced in Table II, showcasing reductions ranging from 6% to 19%. Potential errors in the neural network predictions were not found to be substantial because the training dataset comprised the live points. As such, the likelihood predictions are not expected to deviate significantly from the actual prior volume, which enhances the credibility and robustness of our method and instills confidence in its application in nested sampling. In addition, our constant monitoring of the ANN prediction accuracy against the actual likelihood value allows us to be more confident in the results obtained, because if the criteria were not met, the analytical function would be used again and, after a certain number of samples, another neural network would be retrained.

We also explore the potential utility of genetic algorithms in finding optimal neural network hyperparameters and in generating initial live points for nested sampling. Concerning the former, in scenarios where models are complex or high-dimensional, searching for an optimal architecture can be beneficial; however, our neuralike method allows this hyperparameter calibration to be optional, so that hyperparameters can also be set by hand. Regarding the latter, we provide some insight into the potential advantages of using genetic algorithms to generate live points in Appendix A; however, future studies will address further research on this topic.

In this work, we only used observations from the late universe, as our neuralike method is integrated with the SimpleMC code, which employs mainly background cosmology. However, our method is easily applicable to other types of observations, such as CMB data, an aspect we are currently working on.

We emphasize the importance of high accuracy in neural network predictions in observational cosmology, since accurate parameter estimation is crucial for a robust physical interpretation of the results. In light of the machine learning strategies proposed in this paper, we can have greater confidence in the use of neural networks to accelerate nested sampling processes, without compromising the statistical quality of the results.

Acknowledgments

IGV thanks the CONACYT postdoctoral grant, the ICF-UNAM support, and Will Handley for his invaluable advice about nested sampling. JAV acknowledges the support provided by FOSEC SEP-CONACYT Investigación Básica A1-S-21925, FORDECYT-PRONACES-CONACYT 304001, and UNAM-DGAPA-PAPIIT IN117723. This work was performed thanks to the help of the computational unit of the ICF-UNAM and the clusters Chalcatzingo and Teopanzolco.

Data Availability

The implemented algorithm presented in this work is available at https://github.com/igomezv/neuralike
[1] Joël Akeret, Alexandre Refregier, Adam Amara, Sebastian Seehars, and Caspar Hasner. Approximate bayesian computation for forward modeling in cosmology. Journal of Cosmology and Astroparticle Physics, 2015(08):043, 2015.
[2] Elise Jennings and Maeve Madigan. astroabc: an approximate bayesian computation sequential monte carlo sampler for cosmological parameter estimation. Astronomy and Computing, 19:16–22, 2017.
[3] E.E.O. Ishida, S.D.P. Vitenti, M. Penna-Lima, J. Cisewski, R.S. de Souza, A.M.M. Trindade, E. Cameron, and V.C. Busti. cosmoabc: Likelihood-free inference via population monte carlo approximate bayesian computation. Astronomy and Computing, 13:1–11, 2015.
[4] Aleksandr Petrosyan and Will Handley. Supernest: accelerated nested sampling applied to astrophysics and cosmology. Physical Sciences Forum, 5(1):51, 2023.
[5] Joanna Dunkley, Martin Bucher, Pedro G. Ferreira, Kavilan Moodley, and Constantinos Skordis. Fast and reliable markov chain monte carlo technique for cosmological parameter estimation. Monthly Notices of the Royal Astronomical Society, 356(3):925–936, 2005.
[6] Thejs Brinckmann and Julien Lesgourgues. Montepython 3: boosted mcmc sampler and other features. Physics of the Dark Universe, 24:100260, 2019.
[7] Robert L Schuhmann, Benjamin Joachimi, and Hiranya V Peiris. Gaussianization for fast and accurate inference from cosmological data. Monthly Notices of the Royal Astronomical Society, 459(2):1916–1928, 2016.
[8] Antony Lewis. Efficient sampling of fast and slow cosmological parameters. Physical Review D, 87(10):103529, 2013.
[9] Masanori Sato, Kiyotomo Ichiki, and Tsutomu T Takeuchi. Copula cosmology: Constructing a likelihood function. Physical Review D, 83(2):023501, 2011.
[10] William A Fendt and Benjamin D Wandelt. Pico: parameters for the impatient cosmologist. The Astrophysical Journal, 654(1):2, 2007.
[11] Marcos Pellejero-Ibanez, Raul E Angulo, Giovanni Aricó, Matteo Zennaro, Sergio Contreras, and Jens Stücker. Cosmological parameter estimation via iterative emulation of likelihoods. Monthly Notices of the Royal Astronomical Society, 499(4):5257–5268, 2020.
[12] Justin Alsing, Tom Charnock, Stephen Feeney, and Benjamin Wandelt. Fast likelihood-free cosmology with neural density estimators and active learning. Monthly Notices of the Royal Astronomical Society, 488(3):4440–4458, 2019. [arXiv:1903.00007].
[13] Adam Moss. Accelerated bayesian inference using deep learning. Monthly Notices of the Royal Astronomical Society, 496(1):328–338, 2020.
[14] Hector J Hortua, Riccardo Volpi, Dimitri Marinelli, and Luigi Malago. Accelerating mcmc algorithms through bayesian deep networks. arXiv preprint arXiv:2011.14276, 2020.
[15] Isidro Gómez-Vargas, Ricardo Medel Esquivel, Ricardo García-Salcedo, and J Alberto Vázquez. Neural network within a bayesian inference framework. J. Phys. Conf. Ser., 1723(1):012022, 2021.
[16] Alessio Spurio Mancini, Davide Piras, Justin Alsing, Benjamin Joachimi, and Michael P Hobson. Cosmopower: emulating cosmological power spectra for accelerated bayesian inference from next-generation surveys. Monthly Notices of the Royal Astronomical Society, 511(2):1771–1788, 2022.
[17] T Auld, Michael Bridges, MP Hobson, and SF Gull. Fast cosmological parameter estimation using neural networks. Monthly Notices of the Royal Astronomical Society: Letters, 376(1):L11–L15, 2007. [arXiv: astro-ph/0608174].
[18] Philip Graff, Farhan Feroz, Michael P Hobson, and Anthony Lasenby. Bambi: blind accelerated multimodal bayesian inference. Monthly Notices of the Royal Astronomical Society, 421(1):169–180, 2012. [arXiv:1110.2997].
[19] Philip Graff, Farhan Feroz, Michael P Hobson, and Anthony Lasenby. Skynet: an efficient and robust neural network training tool for machine learning in astronomy. Monthly Notices of the Royal Astronomical Society, 441(2):1741–1759, 2014. [arXiv:1309.0790].
[20] Héctor J Hortúa, Riccardo Volpi, Dimitri Marinelli, and Luigi Malagò. Parameter estimation for the cosmic microwave background with bayesian neural networks. Physical Review D, 102(10):103509, 2020.
[21] Andreas Nygaard, Emil Brinch Holm, Steen Hannestad, and Thomas Tram. Connect: A neural network based framework for emulating cosmological observables and cosmological parameter inference. Journal of Cosmology and Astroparticle Physics, 2023(05):025, 2023.
[22] Augusto T Chantada, Susana J Landau, Pavlos Protopapas, Claudia G Scóccola, and Cecilia Garraffo. Nn bundle method applied to cosmology: an improvement in computational times. arXiv preprint arXiv:2311.15955, 2023.
[23] Will Handley. pyBAMBI. https://pybambi.readthedocs.io/en/latest/#, 2018. [Online: accessed 9-January-2020].
[24] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.
[25] John Skilling. Nested sampling. AIP Conference Proceedings, 735(1):395–405, 2004.
[26] John Skilling et al. Nested sampling for general bayesian computation. Bayesian Analysis, 1(4):833–859, 2006.
[27] Adrian E Raftery. Approximate bayes factors and accounting for model uncertainty in generalised linear models. Biometrika, 83(2):251–266, 1996.
[28] Andrew R Liddle. Information criteria for astrophysical model selection. Monthly Notices of the Royal Astronomical Society: Letters, 377(1):L74–L78, 2007.
[29] Andrew R Liddle, Pia Mukherjee, and David Parkinson. Cosmological model selection. arXiv preprint astro-
ety, 493(3):3132–3158, 2020. [arXiv:1904.02180].
[64] Salvatore Ingrassia and Isabella Morlini. Neural network modeling for small datasets. Technometrics, 47(3):297–311, 2005.
[65] Hong-Wei Ng, Viet Dung Nguyen, Vassilios Vonikakis, and Stefan Winkler. Deep learning for emotion recognition on small datasets using transfer learning. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pages 443–449, 2015.
[66] Antonello Pasini. Artificial neural networks for small dataset analysis. Journal of Thoracic Disease, 7(5):953, 2015.
[67] Isidro Gómez-Vargas, Ricardo Medel-Esquivel, Ricardo García-Salcedo, and J Alberto Vázquez. Neural network reconstructions for the hubble parameter, growth rate and distance modulus. The European Physical Journal C, 83(4):304, 2023.
[68] Michel Chevallier and David Polarski. Accelerating universes with scaling dark matter. International Journal of Modern Physics D, 10(02):213–223, 2001. [arXiv: gr-qc/0009008].
[69] Eric V Linder. Exploring the expansion history of the universe. Physical Review Letters, 90(9):091301, 2003. [arXiv: astro-ph/0208512].
[70] Daniel Moshe Scolnic, DO Jones, A Rest, YC Pan, R Chornock, RJ Foley, ME Huber, R Kessler, Gautham Narayan, AG Riess, et al. The complete light-curve sample of spectroscopically confirmed sne ia from pan-starrs1 and cosmological constraints from the combined pantheon sample. The Astrophysical Journal, 859(2):101, 2018. [arXiv:1710.00845].
[71] Raul Jimenez, Licia Verde, Tommaso Treu, and Daniel Stern. Constraints on the equation of state of dark energy and the hubble constant from stellar ages and the cosmic microwave background. The Astrophysical Journal, 593(2):622, 2003. [arXiv: astro-ph/0302560].
[72] Joan Simon, Licia Verde, and Raul Jimenez. Constraints on the redshift dependence of the dark energy potential. Physical Review D, 71(12):123001, 2005. [arXiv: astro-ph/0412269].
[73] Daniel Stern, Raul Jimenez, Licia Verde, Marc Kamionkowski, and S Adam Stanford. Cosmic chronometers: constraining the equation of state of dark energy. i: h(z) measurements. Journal of Cosmology and Astroparticle Physics, 2010(02):008, 2010. [arXiv:0907.3149].
[74] Michele Moresco, Licia Verde, Lucia Pozzetti, Raul Jimenez, and Andrea Cimatti. New constraints on cosmological parameters and neutrino properties using the expansion rate of the universe to z ∼ 1.75. Journal of Cosmology and Astroparticle Physics, 2012(07):053, 2012. [arXiv:1201.6658].
[75] Cong Zhang, Han Zhang, Shuo Yuan, Siqi Liu, Tong-Jie Zhang, and Yan-Chun Sun. Four new observational h(z) data from luminous red galaxies in the sloan digital sky survey data release seven. Research in Astronomy and Astrophysics, 14(10):1221, 2014. [arXiv:1207.4541].
[76] Michele Moresco. Raising the bar: new constraints on the hubble parameter with cosmic chronometers at z ∼ 2. Monthly Notices of the Royal Astronomical Society: Letters, 450(1):L16–L20, 2015. [arXiv:1503.01116].
[77] Michele Moresco, Lucia Pozzetti, Andrea Cimatti, Raul Jimenez, Claudia Maraston, Licia Verde, Daniel Thomas, Annalisa Citro, Rita Tojeiro, and David Wilkinson. A 6% measurement of the hubble parameter at z ∼ 0.45: direct evidence of the epoch of cosmic re-acceleration. Journal of Cosmology and Astroparticle Physics, 2016(05):014, 2016. [arXiv:1601.01701].
[78] AL Ratsimbazafy, SI Loubser, SM Crawford, CM Cress, BA Bassett, RC Nichol, and P Väisänen. Age-dating luminous red galaxies observed with the southern african large telescope. Monthly Notices of the Royal Astronomical Society, 467(3):3239–3254, 2017. [arXiv:1702.00418].
[79] Ashley J Ross, Lado Samushia, Cullan Howlett, Will J Percival, Angela Burden, and Marc Manera. The clustering of the sdss dr7 main galaxy sample – i. a 4 per cent distance measure at z = 0.15. Monthly Notices of the Royal Astronomical Society, 449(1):835–847, 2015.
[80] Florian Beutler, Chris Blake, Matthew Colless, D Heath Jones, Lister Staveley-Smith, Lachlan Campbell, Quentin Parker, Will Saunders, and Fred Watson. The 6df galaxy survey: baryon acoustic oscillations and the local hubble constant. Monthly Notices of the Royal Astronomical Society, 416(4):3017–3032, 2011.
[81] Shadab Alam, Metin Ata, Stephen Bailey, Florian Beutler, Dmitry Bizyaev, Jonathan A Blazek, Adam S Bolton, Joel R Brownstein, Angela Burden, Chia-Hsun Chuang, et al. The clustering of galaxies in the completed sdss-iii baryon oscillation spectroscopic survey: cosmological analysis of the dr12 galaxy sample. Monthly Notices of the Royal Astronomical Society, 470(3):2617–2652, 2017. [arXiv:1607.03155].
[82] Metin Ata, Falk Baumgarten, Julian Bautista, Florian Beutler, Dmitry Bizyaev, Michael R Blanton, Jonathan A Blazek, Adam S Bolton, Jonathan Brinkmann, Joel R Brownstein, et al. The clustering of the sdss-iv extended baryon oscillation spectroscopic survey dr14 quasar sample: first measurement of baryon acoustic oscillations between redshift 0.8 and 2.2. Monthly Notices of the Royal Astronomical Society, 473(4):4773–4794, 2018.
[83] Michael Blomqvist, Hélion Du Mas Des Bourboux, Victoria de Sainte Agathe, James Rich, Christophe Balland, Julian E Bautista, Kyle Dawson, Andreu Font-Ribera, Julien Guy, Jean-Marc Le Goff, et al. Baryon acoustic oscillations from the cross-correlation of lyα absorption and quasars in eboss dr14. Astronomy & Astrophysics, 629:A86, 2019.
[84] Victoria de Sainte Agathe, Christophe Balland, Hélion Du Mas Des Bourboux, Michael Blomqvist, Julien Guy, James Rich, Andreu Font-Ribera, Matthew M Pieri, Julian E Bautista, Kyle Dawson, et al. Baryon acoustic oscillations at z = 2.34 from the correlations of lyα absorption in eboss dr14. Astronomy & Astrophysics, 629:A85, 2019.
[85] Bryan Sagredo, Savvas Nesseris, and Domenico Sapone. Internal robustness of growth rate data. Physical Review D, 98(8):083543, 2018. [arXiv:1806.10822].
[86] Aaditya Ramdas, Nicolás García Trillos, and Marco Cuturi. On wasserstein two-sample testing and related families of nonparametric tests. Entropy, 19(2):47, 2017.
[87] David W Hogg and Daniel Foreman-Mackey. Data analysis recipes: Using markov chain monte carlo. The Astrophysical Journal Supplement Series, 236(1):11, 2018.
TABLE IV: Nested sampling for the eggbox toy model and ΛCDM
using 100 live points. In the NS+GA cases, we generate the first
live points through genetic algorithms with a probability of
mutation equal to 0.5 and a probability of crossover of 0.8.
FIG. 2: Case 1. Posterior plots for CPL Pantheon+HD+BAO with the proposed methods in this work.
FIG. 3: Case 2. Posterior plots for CPL using Pantheon+HD+BAO with the proposed methods in this work. We use 4000 live points.
FIG. 4: Case 3. 2D posterior plots for CPL with curvature using Pantheon+HD+BAO+fσ8+Planck with the proposed methods in this work. We use 4000 live points and consider 8 free parameters. In this case, because of the complexity, three neural networks were trained before substituting the likelihood function; nevertheless, the Bayesian inference process using our method was 19.6% faster.