-
Gravitational Duals from Equations of State
Authors:
Yago Bea,
Raul Jimenez,
David Mateos,
Shuheng Liu,
Pavlos Protopapas,
Pedro Tarancón-Álvarez,
Pablo Tejerina-Pérez
Abstract:
Holography relates gravitational theories in five dimensions to four-dimensional quantum field theories in flat space. Under this map, the equation of state of the field theory is encoded in the black hole solutions of the gravitational theory. Solving the five-dimensional Einstein equations to determine the equation of state is an algorithmic, direct problem. Determining the gravitational theory that gives rise to a prescribed equation of state is a much more challenging, inverse problem. We present a novel approach to solve this problem based on physics-informed neural networks. The resulting algorithm is not only data-driven but also informed by the physics of the Einstein equations. We successfully apply it to theories with crossovers and with first- and second-order phase transitions.
Submitted 21 March, 2024;
originally announced March 2024.
-
Generating Images of the M87* Black Hole Using GANs
Authors:
Arya Mohan,
Pavlos Protopapas,
Keerthi Kunnumkai,
Cecilia Garraffo,
Lindy Blackburn,
Koushik Chatterjee,
Sheperd S. Doeleman,
Razieh Emami,
Christian M. Fromm,
Yosuke Mizuno,
Angelo Ricarte
Abstract:
In this paper, we introduce a novel data augmentation methodology based on Conditional Progressive Generative Adversarial Networks (CPGAN) to generate diverse black hole (BH) images, accounting for variations in spin and electron temperature prescriptions. These generated images are valuable resources for training deep learning algorithms to accurately estimate black hole parameters from observational data. Our model can generate BH images for any spin value within the range of [-1, 1], given an electron temperature distribution. To validate the effectiveness of our approach, we employ a convolutional neural network to predict the BH spin using both the GRMHD images and the images generated by our proposed model. Our results demonstrate a significant performance improvement when training is conducted with the augmented dataset while testing is performed using GRMHD simulated data, as indicated by the high $R^2$ score. Consequently, we propose that GANs can be employed as cost-effective models for black hole image generation and can reliably augment training datasets for other parameterization algorithms.
Submitted 1 December, 2023;
originally announced December 2023.
-
Faster Bayesian inference with neural network bundles and new results for $f(R)$ models
Authors:
Augusto T. Chantada,
Susana J. Landau,
Pavlos Protopapas,
Claudia G. Scóccola,
Cecilia Garraffo
Abstract:
In the last few years, there has been significant progress in the development of machine learning methods tailored to astrophysics and cosmology. We have recently applied one of these, namely, the neural network bundle method, to the cosmological scenario. Moreover, we showed that in some cases the computational times of the Bayesian inference process can be reduced. In this paper, we present an improvement to the neural network bundle method that results in a significant reduction of the computational times of the statistical analysis. The novel aspect consists of the use of the neural network bundle method to calculate the luminosity distance of type Ia supernovae, which is usually computed through an integral with numerical methods. In this work, we have applied this improvement to the Hu-Sawicki and Starobinsky $f(R)$ models. We also performed a statistical analysis with data from type Ia supernovae of the Pantheon+ compilation and cosmic chronometers. Another original aspect of this work is the different treatment we provide for the absolute magnitude of type Ia supernovae during the inference process, which results in different estimates of the distortion parameter than the ones obtained in the literature. We show that the statistical analyses carried out with our new method require lower computational times than the ones performed with both the numerical and the neural network method from our previous work. This reduction in time is more significant in the case of a difficult computational problem such as the ones addressed in this work.
Submitted 7 June, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
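The abstract above notes that the luminosity distance of type Ia supernovae is usually computed through a numerically evaluated integral. As a rough illustration of that conventional step (not the neural network bundle method, and not the paper's code), the sketch below integrates $1/E(z)$ with the trapezoidal rule. It assumes a flat $\Lambda$CDM expansion rate with radiation neglected as a stand-in for the $f(R)$ models, and the function names and default parameter values are hypothetical.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def expansion_rate_flat_lcdm(z, omega_m):
    """E(z) = H(z)/H0 for flat LambdaCDM, radiation neglected (illustrative stand-in model)."""
    return np.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def luminosity_distance(z, h0=70.0, omega_m=0.3, n_steps=2001):
    """Luminosity distance in Mpc: d_L = (1 + z) * (c / H0) * integral_0^z dz' / E(z').

    h0 is in km/s/Mpc; the default values are placeholders, not fitted results.
    """
    grid = np.linspace(0.0, z, n_steps)
    integrand = 1.0 / expansion_rate_flat_lcdm(grid, omega_m)
    # Trapezoidal rule written out explicitly on the uniform grid.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid))
    return (1.0 + z) * (C_KM_S / h0) * integral
```

In a Bayesian analysis this integral is re-evaluated for every supernova at every sampled parameter point, which is the repeated cost the abstract says the improved method avoids.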
-
Positional Encodings for Light Curve Transformers: Playing with Positions and Attention
Authors:
Daniel Moreno-Cartagena,
Guillermo Cabrera-Vives,
Pavlos Protopapas,
Cristobal Donoso-Oliva,
Manuel Pérez-Carrasco,
Martina Cádiz-Leyton
Abstract:
We conducted empirical experiments to assess the transferability of a light curve transformer to datasets with different cadences and magnitude distributions using various positional encodings (PEs). We proposed a new approach to incorporate the temporal information directly into the output of the last attention layer. Our results indicated that using trainable PEs leads to significant improvements in transformer performance and training times. Our proposed PE on attention can be trained faster than the traditional non-trainable PE transformer while achieving competitive results when transferred to other datasets.
Submitted 11 August, 2023;
originally announced August 2023.
-
Improving astroBERT using Semantic Textual Similarity
Authors:
Felix Grezes,
Thomas Allen,
Sergi Blanco-Cuaresma,
Alberto Accomazzi,
Michael J. Kurtz,
Golnaz Shapurian,
Edwin Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Timothy W. Hostetler,
Matthew R. Templeton,
Kelly E. Lockhart,
Shinyi Chen,
Jennifer Koch,
Taylor Jacovich,
Pavlos Protopapas
Abstract:
The NASA Astrophysics Data System (ADS) is an essential tool for researchers that allows them to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we:
- announce the first public release of the astroBERT language model;
- show how astroBERT improves over existing public language models on astrophysics specific tasks;
- and detail how ADS plans to harness the unique structure of scientific papers, the citation graph and citation context, to further improve astroBERT.
Submitted 29 November, 2022;
originally announced December 2022.
-
Semi-Supervised Classification and Clustering Analysis for Variable Stars
Authors:
R. Pantoja,
M. Catelan,
K. Pichara,
P. Protopapas
Abstract:
The immense amount of time series data produced by astronomical surveys has called for the use of machine learning algorithms to discover and classify several million celestial sources. In the case of variable stars, supervised learning approaches have become commonplace. However, these approaches need a considerable collection of expert-labeled light curves to achieve adequate performance, which is costly to construct. To solve this problem, we introduce two approaches. First, a semi-supervised hierarchical method, which requires substantially less training data than supervised methods. Second, a clustering analysis procedure that finds groups that may correspond to classes or sub-classes of variable stars. Both methods are primarily supported by dimensionality reduction of the data for visualization and to avoid the curse of dimensionality. We tested our methods with catalogs collected from the OGLE, CSS, and Gaia surveys. The semi-supervised method reaches a performance of around 90% for all three of our selected catalogs of variable stars using only 5% of the data for training. This method is suitable for classifying the main classes of variable stars when there is only a small amount of training data. Our clustering analysis confirms that most of the clusters found have a purity over 90% with respect to classes and 80% with respect to sub-classes, suggesting that this type of analysis can be used in large-scale variability surveys as an initial step to identify which classes or sub-classes of variable stars are present in the data and/or to build training sets, among many other possible applications.
Submitted 20 September, 2022;
originally announced September 2022.
-
Improving Astronomical Time-series Classification via Data Augmentation with Generative Adversarial Networks
Authors:
Germán García-Jara,
Pavlos Protopapas,
Pablo A. Estévez
Abstract:
Due to the latest advances in technology, telescopes with significant sky coverage will produce millions of astronomical alerts per night that must be classified both rapidly and automatically. Currently, classification consists of supervised machine learning algorithms whose performance is limited by the number of existing annotations of astronomical objects and their highly imbalanced class distributions. In this work, we propose a data augmentation methodology based on Generative Adversarial Networks (GANs) to generate a variety of synthetic light curves from variable stars. Our novel contributions, consisting of a resampling technique and an evaluation metric, can assess the quality of generative models in unbalanced datasets and identify GAN-overfitting cases that the Fréchet Inception Distance does not reveal. We applied our proposed model to two datasets taken from the Catalina and Zwicky Transient Facility surveys. The classification accuracy of variable stars is improved significantly when training with synthetic data and testing with real data with respect to the case of using only real data.
Submitted 13 May, 2022;
originally announced May 2022.
-
Cosmology-informed neural networks to solve the background dynamics of the Universe
Authors:
Augusto T. Chantada,
Susana J. Landau,
Pavlos Protopapas,
Claudia G. Scóccola,
Cecilia Garraffo
Abstract:
The field of machine learning has drawn increasing interest from various other fields due to the success of its methods at solving a plethora of different problems. An application of these has been to train artificial neural networks to solve differential equations without the need of a numerical solver. This particular application offers an alternative to conventional numerical methods, with advantages such as lower memory required to store solutions, parallelization, and, in some cases, a lower overall computational cost than its numerical counterparts. In this work, we train artificial neural networks to represent a bundle of solutions of the differential equations that govern the background dynamics of the Universe for four different models. The models we have chosen are $Λ\mathrm{CDM}$, the Chevallier-Polarski-Linder parametric dark energy model, a quintessence model with an exponential potential, and the Hu-Sawicki $f(R)$ model. We use the solutions that the networks provide to perform statistical analyses to estimate the values of each model's parameters with observational data; namely, estimates of the Hubble parameter from cosmic chronometers, type Ia supernovae data from the Pantheon compilation, and measurements from baryon acoustic oscillations. The results we obtain for all models match similar estimations done in the literature using numerical solvers. In addition, we estimate the error of the solutions that the trained networks provide by comparing them with the analytical solution when there is one, or to a high-precision numerical solution when there is not. Through these estimations we find that the error of the solutions is at most $\sim1\%$ in the region of the parameter space that concerns the $95\%$ confidence regions that we find using the data, for all models and all statistical analyses performed in this work.
Submitted 20 March, 2023; v1 submitted 5 May, 2022;
originally announced May 2022.
-
ASTROMER: A transformer-based embedding for the representation of light curves
Authors:
C. Donoso-Oliva,
I. Becker,
P. Protopapas,
G. Cabrera-Vives,
Vishnu M.,
Harsh Vardhan
Abstract:
Taking inspiration from natural language embeddings, we present ASTROMER, a transformer-based model to create representations of light curves. ASTROMER was pre-trained in a self-supervised manner, requiring no human-labeled data. We used millions of R-band light sequences to adjust the ASTROMER weights. The learned representation can be easily adapted to other surveys by re-training ASTROMER on new sources. The power of ASTROMER consists of using the representation to extract light curve embeddings that can enhance the training of other models, such as classifiers or regressors. As an example, we used ASTROMER embeddings to train two neural-based classifiers that use labeled variable stars from MACHO, OGLE-III, and ATLAS. In all experiments, ASTROMER-based classifiers outperformed a baseline recurrent neural network trained on light curves directly when limited labeled data was available. Furthermore, using ASTROMER embeddings decreases computational resources needed while achieving state-of-the-art results. Finally, we provide a Python library that includes all the functionalities employed in this work. The library, main code, and pre-trained weights are available at https://github.com/astromer-science
Submitted 9 November, 2022; v1 submitted 2 May, 2022;
originally announced May 2022.
-
Building astroBERT, a language model for Astronomy & Astrophysics
Authors:
Felix Grezes,
Sergi Blanco-Cuaresma,
Alberto Accomazzi,
Michael J. Kurtz,
Golnaz Shapurian,
Edwin Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Roman Chyla,
Stephen McDonald,
Timothy W. Hostetler,
Matthew R. Templeton,
Kelly E. Lockhart,
Nemanja Martinovic,
Shinyi Chen,
Chris Tanner,
Pavlos Protopapas
Abstract:
The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., similar and trending operators), but researchers are not yet allowed to fully leverage semantic search. For example, a query for "results from the Planck mission" should be able to distinguish between all the various meanings of Planck (person, mission, constant, institutions and more) without further clarification from the user. At ADS, we are applying modern machine learning and natural language processing techniques to our dataset of recent astronomy publications to train astroBERT, a deeply contextual language model based on research at Google. Using astroBERT, we aim to enrich the ADS dataset and improve its discoverability, and in particular we are developing our own named entity recognition tool. We present here our preliminary results and lessons learned.
Submitted 1 December, 2021;
originally announced December 2021.
-
StelNet: Hierarchical Neural Network for Automatic Inference in Stellar Characterization
Authors:
Cecilia Garraffo,
Pavlos Protopapas,
Jeremy J. Drake,
Ignacio Becker,
Phillip Cargile
Abstract:
Characterizing the fundamental parameters of stars from observations is crucial for studying the stars themselves, their planets, and the galaxy as a whole. Stellar evolution theory predicting the properties of stars as a function of stellar age and mass enables translating observables into physical stellar parameters by fitting the observed data to synthetic isochrones. However, the complexity of overlapping evolutionary tracks often makes this task numerically challenging, and with a precision that can be highly variable, depending on the area of the parameter space the observation lies in. This work presents StelNet, a Deep Neural Network trained on stellar evolutionary tracks that quickly and accurately predicts mass and age from absolute luminosity and effective temperature for stars with close to solar metallicity. The underlying model makes no assumption on the evolutionary stage and includes the pre-main sequence phase. We use bootstrapping and train many models to quantify the uncertainty of the model. To break the model's intrinsic degeneracy resulting from overlapping evolutionary paths, we also built a hierarchical model that retrieves realistic posterior probability distributions of the stellar mass and age. We further test and train StelNet using a sample of stars with well-determined masses and ages from the literature.
Submitted 14 June, 2021;
originally announced June 2021.
-
The effect of phased recurrent units in the classification of multiple catalogs of astronomical lightcurves
Authors:
C. Donoso-Oliva,
G. Cabrera-Vives,
P. Protopapas,
R. Carrasco-Davis,
P. A. Estevez
Abstract:
In the new era of very large telescopes, where data is crucial to expand scientific knowledge, we have witnessed many deep learning applications for the automatic classification of lightcurves. Recurrent neural networks (RNNs) are one of the models used for these applications, and the LSTM unit stands out for being an excellent choice for the representation of long time series. In general, RNNs assume observations at discrete times, which may not suit the irregular sampling of lightcurves. A traditional technique to address irregular sequences consists of adding the sampling time to the network's input, but this is not guaranteed to capture sampling irregularities during training. Alternatively, the Phased LSTM unit has been created to address this problem by updating its state using the sampling times explicitly. In this work, we study the effectiveness of the LSTM and Phased LSTM based architectures for the classification of astronomical lightcurves. We use seven catalogs containing periodic and nonperiodic astronomical objects. Our findings show that LSTM outperformed PLSTM on 6/7 datasets. However, the combination of both units enhances the results in all datasets.
Submitted 7 June, 2021;
originally announced June 2021.
-
The Automatic Learning for the Rapid Classification of Events (ALeRCE) Alert Broker
Authors:
F. Förster,
G. Cabrera-Vives,
E. Castillo-Navarrete,
P. A. Estévez,
P. Sánchez-Sáez,
J. Arredondo,
F. E. Bauer,
R. Carrasco-Davis,
M. Catelan,
F. Elorrieta,
S. Eyheramendy,
P. Huijse,
G. Pignata,
E. Reyes,
I. Reyes,
D. Rodríguez-Mancini,
D. Ruz-Mieres,
C. Valenzuela,
I. Alvarez-Maldonado,
N. Astorga,
J. Borissova,
A. Clocchiatti,
D. De Cicco,
C. Donoso-Oliva,
M. J. Graham
, et al. (15 additional authors not shown)
Abstract:
We introduce the Automatic Learning for the Rapid Classification of Events (ALeRCE) broker, an astronomical alert broker designed to provide a rapid and self-consistent classification of large etendue telescope alert streams, such as that provided by the Zwicky Transient Facility (ZTF) and, in the future, the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). ALeRCE is a Chilean-led broker run by an interdisciplinary team of astronomers and engineers, working to become intermediaries between survey and follow-up facilities. ALeRCE uses a pipeline which includes the real-time ingestion, aggregation, cross-matching, machine learning (ML) classification, and visualization of the ZTF alert stream. We use two classifiers: a stamp-based classifier, designed for rapid classification, and a light-curve-based classifier, which uses the multi-band flux evolution to achieve a more refined classification. We describe in detail our pipeline, data products, tools and services, which are made public for the community (see https://alerce.science). Since we began operating our real-time ML classification of the ZTF alert stream in early 2019, we have grown a large community of active users around the globe. We describe our results to date, including the real-time processing of $9.7\times10^7$ alerts, the stamp classification of $1.9\times10^7$ objects, the light curve classification of $8.5\times10^5$ objects, the report of 3088 supernova candidates, and different experiments using LSST-like alert streams. Finally, we discuss the challenges ahead to go from a single-stream of alerts such as ZTF to a multi-stream ecosystem dominated by LSST.
Submitted 7 August, 2020;
originally announced August 2020.
-
Scalable End-to-end Recurrent Neural Network for Variable star classification
Authors:
Ignacio Becker,
Karim Pichara,
Márcio Catelan,
Pavlos Protopapas,
Carlos Aguirre,
Fatemeh Nikzat
Abstract:
During the last decade, considerable effort has been made to perform automatic classification of variable stars using machine learning techniques. Traditionally, light curves are represented as a vector of descriptors or features used as input for many algorithms. Some features are computationally expensive and cannot be updated quickly, and hence cannot be applied to large datasets such as the LSST. Previous work has been done to develop alternative unsupervised feature extraction algorithms for light curves, but the cost of doing so still remains high. In this work, we propose an end-to-end algorithm that automatically learns the representation of light curves that allows an accurate automatic classification. We study a series of deep learning architectures based on Recurrent Neural Networks and test them in automated classification scenarios. Our method uses minimal data preprocessing, can be updated with a low computational cost for new observations and light curves, and can scale up to massive datasets. We transform each light curve into an input matrix representation whose elements are the differences in time and magnitude, and the outputs are classification probabilities. We test our method in three surveys: OGLE-III, Gaia and WISE. We obtain accuracies of about $95\%$ in the main classes and $75\%$ in the majority of subclasses. We compare our results with the Random Forest classifier and obtain competitive accuracies while being faster and scalable. The analysis shows that the computational complexity of our approach grows linearly with the light curve size, while the cost of the traditional approach grows as $N\log{(N)}$.
Submitted 3 February, 2020;
originally announced February 2020.
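The abstract above describes transforming each light curve into a matrix whose elements are differences in time and magnitude. A minimal sketch of that idea follows, under the assumption that the differences are taken between consecutive observations after sorting by time. The abstract does not specify the exact layout (windowing, stride, normalization), so the function name and the two-column arrangement here are hypothetical.

```python
import numpy as np


def light_curve_to_delta_matrix(times, magnitudes):
    """Return an (N - 1, 2) array with columns [delta_time, delta_magnitude].

    Assumes differences between consecutive observations; the paper's
    actual matrix layout may differ.
    """
    times = np.asarray(times, dtype=float)
    magnitudes = np.asarray(magnitudes, dtype=float)
    if times.ndim != 1 or times.shape != magnitudes.shape:
        raise ValueError("times and magnitudes must be 1-D arrays of equal length")
    order = np.argsort(times)  # irregular surveys may deliver unsorted epochs
    delta_t = np.diff(times[order])
    delta_m = np.diff(magnitudes[order])
    return np.stack([delta_t, delta_m], axis=1)
```

Each row depends only on one pair of neighbouring observations, so the cost of building the representation is linear in the number of observations apart from the initial sort, and a new observation only appends one row.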
-
Streaming Classification of Variable Stars
Authors:
Lukas Zorich,
Karim Pichara,
Pavlos Protopapas
Abstract:
In recent years, automatic classification of variable stars has received substantial attention. Using machine learning techniques for this task has proven to be quite useful. Typically, machine learning classifiers used for this task require a fixed training set, and the training process is performed offline. Upcoming surveys such as the Large Synoptic Survey Telescope (LSST) will generate new observations daily, so an automatic classification system able to create alerts online will be mandatory. A system with those characteristics must be able to update itself incrementally. Unfortunately, after training, most machine learning classifiers do not support the inclusion of new observations in light curves; they need to be re-trained from scratch. Naively re-training from scratch is not an option in streaming settings, mainly because of the expensive pre-processing routines required to obtain a vector representation of light curves (features) each time we include new observations. In this work, we propose a streaming probabilistic classification model; it uses a set of newly designed features that work incrementally. With this model, we can have a machine learning classifier that updates itself in real time with new observations. To test our approach, we simulate a streaming scenario with light curves from the CoRot, OGLE and MACHO catalogs. Results show that our model achieves high classification performance while being an order of magnitude faster than traditional classification approaches.
Submitted 4 December, 2019;
originally announced December 2019.
-
An Information Theory Approach on Deciding Spectroscopic Follow Ups
Authors:
Javiera Astudillo,
Pavlos Protopapas,
Karim Pichara,
Pablo Huijse
Abstract:
Classification and characterization of variable and transient phenomena are critical for astrophysics and cosmology. These objects are commonly studied using photometric time series or spectroscopic data. Given that many ongoing and future surveys are in the time domain, and given that adding spectra provides further insights but requires more observational resources, it would be valuable to know which objects we should prioritize for obtaining a spectrum in addition to time series. We propose a methodology in a probabilistic setting that determines a priori which objects are worth taking a spectrum of to obtain better insights, where we define 'insight' as the type of the object (classification). Objects whose spectra we query are reclassified using their full spectrum information. We first train two classifiers, one that uses photometric data and another that uses photometric and spectroscopic data together. Then, for each photometric object, we estimate the probability of each possible spectrum outcome. We combine these models in various probabilistic frameworks (strategies), which are used to guide the selection of follow-up observations. The best strategy depends on the intended use, whether it is gaining more confidence or accuracy. For a given number of candidate objects (127, equal to 5% of the dataset) for taking spectra, we improve class prediction accuracy by 37%, as opposed to 20% for the best non-naive (non-random) baseline strategy. Our approach provides a general framework for follow-up strategies and can be extended beyond classification to include other forms of follow-up beyond spectroscopy.
△ Less
Submitted 6 November, 2019;
originally announced November 2019.
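The abstract above combines a photometric classifier, a photometric-plus-spectroscopic classifier, and per-object probabilities of each spectrum outcome into follow-up strategies. As a minimal sketch of what one such strategy could look like (an assumption for illustration, not necessarily any of the strategies used in the paper), the code below ranks objects by the expected drop in class entropy if a spectrum were taken. All function names and the toy inputs are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def expected_entropy_drop(photo_probs, outcome_probs, posterior_by_outcome):
    """Expected reduction in class entropy from taking a spectrum.

    photo_probs: class probabilities from the photometry-only classifier.
    outcome_probs: probability of each possible spectrum outcome.
    posterior_by_outcome: class probabilities from the combined classifier,
        one distribution per spectrum outcome (same order as outcome_probs).
    """
    expected_after = sum(
        q * entropy(post) for q, post in zip(outcome_probs, posterior_by_outcome)
    )
    return entropy(photo_probs) - expected_after


def rank_for_followup(objects, budget):
    """Return the `budget` object names with the largest expected entropy drop.

    objects: dict mapping a name to the three inputs of expected_entropy_drop.
    """
    ranked = sorted(
        objects, key=lambda name: expected_entropy_drop(*objects[name]), reverse=True
    )
    return ranked[:budget]
```

An object that is already confidently classified gains little from a spectrum, while an ambiguous object whose spectrum would be decisive ranks first. Other criteria, such as expected accuracy gain, would slot into the same ranking function.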
-
An Algorithm for the Visualization of Relevant Patterns in Astronomical Light Curves
Authors:
Christian Pieringer,
Karim Pichara,
Márcio Catelán,
Pavlos Protopapas
Abstract:
In recent years, the classification of variable stars with Machine Learning has become a mainstream area of research. Recently, visualization of time series has been attracting more attention in data science as a tool to help scientists visually recognize significant patterns in complex dynamics. Within the Machine Learning literature, dictionary-based methods have been widely used to encode relevant parts of image data. These methods intrinsically assign a degree of importance to patches in pictures, according to their contribution to the image reconstruction. Inspired by dictionary-based techniques, we present an approach that naturally provides the visualization of salient parts in astronomical light curves, making the analogy between image patches and relevant pieces in time series. Our approach encodes the most meaningful patterns such that we can approximately reconstruct light curves by just using the encoded information. We test our method on light curves from the OGLE-III and StarLight databases. Our results show that the proposed model delivers an automatic and intuitive visualization of relevant light curve parts, such as local peaks and drops in magnitude.
Submitted 7 March, 2019;
originally announced March 2019.
-
Multiband galaxy morphologies for CLASH: a convolutional neural network transferred from CANDELS
Authors:
Manuel Pérez-Carrasco,
Guillermo Cabrera-Vives,
Monserrat Martinez-Marín,
Pierluigi Cerulo,
Ricardo Demarco,
Pavlos Protopapas,
Julio Godoy,
Marc Huertas-Company
Abstract:
We present visual-like morphologies over 16 photometric bands, from ultra-violet to near infrared, for 8,412 galaxies in the Cluster Lensing And Supernova survey with Hubble (CLASH) obtained by a convolutional neural network (CNN) model. Our model follows the CANDELS main morphological classification scheme, obtaining the probability for each galaxy at each CLASH band of being spheroid, disk, irregular, point source, or unclassifiable. Our catalog contains morphologies for each galaxy with Hmag < 24.5 in every filter where the galaxy is observed. We trained an initial CNN model using approximately 7,500 expert eyeball labels from The Cosmic Assembly Near-IR Deep Extragalactic Legacy Survey (CANDELS). We created eyeball labels for 100 randomly selected galaxies for each of the 16 CLASH filters (1,600 galaxy images in total), where each image was classified by at least five of us. We use these labels to fine-tune the network in order to accurately predict labels for the CLASH data and to evaluate the performance of our model. We achieve a root-mean-square error of 0.0991 on the test set. We show that our proposed fine-tuning technique reduces the number of labeled images needed for training, as compared to directly training over the CLASH data, and achieves a better performance. This approach is very useful to minimize eyeball labeling efforts when classifying unlabeled data from new surveys. This will become particularly useful for massive datasets such as those from near-future surveys like EUCLID or the LSST. Our catalog consists of predicted probabilities for each galaxy by morphology in their different bands and is made publicly available at http://www.inf.udec.cl/~guille/data/Deep-CLASH.csv.
Submitted 17 October, 2018;
originally announced October 2018.
-
The High Cadence Transient Survey (HiTS): Compilation and characterization of light-curve catalogs
Authors:
Jorge Martínez-Palomera,
Francisco Förster,
Pavlos Protopapas,
Juan Carlos Maureira,
Paulina Lira,
Guillermo Cabrera-Vives,
Pablo Huijse,
Lluis Galbany,
Thomas de Jaeger,
Santiago González-Gaitán,
Gustavo Medina,
Giuliano Pignata,
Jaime San Martín,
Mario Hamuy,
Ricardo R. Muñoz
Abstract:
The High Cadence Transient Survey (HiTS) aims to discover and study transient objects with characteristic timescales between hours and days, such as pulsating, eclipsing and exploding stars. This survey represents a unique laboratory to explore large etendue observations from cadences of about 0.1 days and to test new computational tools for the analysis of large data. This work follows a fully \textit{Data Science} approach: from the raw data to the analysis and classification of variable sources. We compile a catalog of ${\sim}15$ million object detections and a catalog of ${\sim}2.5$ million light-curves classified by variability. The typical depth of the survey is $24.2$, $24.3$, $24.1$ and $23.8$ in $u$, $g$, $r$ and $i$ bands, respectively. We classified all point-like non-moving sources by first extracting features from their light-curves and then applying a Random Forest classifier. For the classification, we used a training set constructed using a combination of cross-matched catalogs, visual inspection, transfer/active learning and data augmentation. The classification model consists of several Random Forest classifiers organized in a hierarchical scheme. The classifier accuracy estimated on a test set is approximately $97\%$. In the unlabeled data, $3\,485$ sources were classified as variables, of which $1\,321$ were classified as periodic. Among the periodic classes we discovered, with high confidence, 1 $δ$ Scuti star, 39 eclipsing binaries, 48 rotational variables and 90 RR Lyrae stars, and among the non-periodic classes we discovered 1 cataclysmic variable, 630 QSOs, and 1 supernova candidate. The first data release can be accessed in the project archive of HiTS.
Submitted 7 September, 2018; v1 submitted 3 September, 2018;
originally announced September 2018.
-
Deep Learning for Image Sequence Classification of Astronomical Events
Authors:
Rodrigo Carrasco-Davis,
Guillermo Cabrera-Vives,
Francisco Förster,
Pablo A. Estévez,
Pablo Huijse,
Pavlos Protopapas,
Ignacio Reyes,
Jorge Martínez-Palomera,
Cristóbal Donoso
Abstract:
We propose a new sequential classification model for astronomical objects based on a recurrent convolutional neural network (RCNN) which uses sequences of images as inputs. This approach avoids the computation of light curves or difference images. This is the first time that sequences of images are used directly for the classification of variable objects in astronomy. The second contribution of this work is the image simulation process. We generate synthetic image sequences that take into account the instrumental and observing conditions, obtaining a realistic set of movies for each astronomical object. The simulated dataset is used to train our RCNN classifier. This approach allows us to generate datasets to train and test our RCNN model for different astronomical surveys and telescopes. We aim to build a simulated dataset whose distribution is close enough to that of the real dataset that fine-tuning can match the distributions of the real and simulated datasets. To test the RCNN classifier trained with the synthetic dataset, we used real-world data from the High Cadence Transient Survey (HiTS), obtaining an average recall of 85%, improved to 94% after fine-tuning with 10 real samples per class. We compare the results of our model with those of a light curve random forest classifier. The proposed RCNN with fine-tuning has a similar performance on the HiTS dataset compared to the light curve classifier, trained on an augmented training set with 10 real samples per class. The RCNN approach presents several advantages in an alert stream classification scenario, such as a reduction of the data pre-processing, faster online evaluation and easier performance improvement using a few real data samples. These results encourage us to use this method for alert broker systems that will process alert streams generated by new telescopes such as the Large Synoptic Survey Telescope.
Submitted 7 November, 2018; v1 submitted 10 July, 2018;
originally announced July 2018.
-
Unraveling the Spectral Energy Distributions of Clustered YSOs
Authors:
Juan R. Martínez-Galarza,
Pavlos Protopapas,
Howard A. Smith,
Esteban F. E. Morales
Abstract:
Stars form in clustered environments, but how they form when the available resources are shared is still not well understood. A related question is whether the IMF is in fact universal across galactic environments, a galactic initial mass function (IGIMF), or whether it is an average of local IMFs. One of the long-standing problems in resolving this question and in the study of young clusters is observational: the emission from multiple sources is frequently seen as blended because at different wavelengths or with different telescopes the beam sizes are different. The confusion hinders our ability to fully characterize clustered star formation. Here we present a new method that uses a genetic algorithm and Bayesian inference to fit the blended SEDs and images of individual YSOs in confused clusters. We apply this method to the infrared photometry of a sample comprising 70 Spitzer-selected, low-mass ($M_{\rm{cl}}<100~\rm{M}_{\odot}$) young clusters in the galactic plane, and use the derived physical parameters to investigate the distributions of masses and evolutionary stages of clustered YSOs, and the implications of those distributions for studies of the IMF and the different models of star formation. We find that for low-mass clusters composed of class I and class II YSOs, there exists a non-trivial relationship between the total stellar mass of the cluster ($M_{\rm{cl}}$) and the mass of its most massive member ($m_{\rm{max}}$). The properties of the derived correlation are most compatible with the random sampling of a Kroupa IMF, with a fundamental high-mass limit of $150~\rm{M}_{\odot}$. Our results are also compatible with SPH models that predict a dynamical termination of the accretion in protostars, with massive stars undergoing this stopping at later times in their evolution.
Submitted 28 March, 2018;
originally announced March 2018.
-
Automatic Survey-Invariant Variable Star Classification
Authors:
Patricio Benavente,
Pavlos Protopapas,
Karim Pichara
Abstract:
Machine learning techniques have been successfully used to classify variable stars on widely-studied astronomical surveys. These datasets have been available to astronomers long enough to allow them to perform deep analysis of several variable sources and to generate useful catalogs of identified variable stars. The products of these studies are labeled data that enable supervised learning models to be trained successfully. However, when these models are blindly applied to data from new sky surveys their performance drops significantly. Furthermore, unlabeled data becomes available at a much higher rate than its labeled counterpart, since labeling is a manual and time-consuming effort. Domain adaptation techniques aim to learn from a domain where labeled data is available, the \textit{source domain}, and through some adaptation perform well on a different domain, the \textit{target domain}. We propose a full probabilistic model that represents the joint distribution of features from two surveys as well as a probabilistic transformation of the features from one survey to the other. This allows us to transfer labeled data to a study where it is not available and to effectively run a variable star classification model in a new survey. Our model represents the features of each domain as a Gaussian mixture and models the transformation as a translation, rotation and scaling of each separate component. We perform tests using three different variability catalogs: EROS, MACHO, and HiTS, which differ in, among other things, the number of observations per star, cadence, observational time and optical bands observed.
Submitted 29 January, 2018;
originally announced January 2018.
-
Uncertain classification of Variable Stars: handling observational GAPS and noise
Authors:
Nicolas Castro,
Pavlos Protopapas,
Karim Pichara
Abstract:
Automatic classification methods applied to sky surveys have revolutionized the astronomical target selection process. Most surveys generate a vast amount of time series, or "lightcurves", that represent the brightness variability of stellar objects in time. Unfortunately, the observations of a lightcurve take several years to complete, producing truncated time series to which automatic classifiers are generally not applied until they are finished. This happens because state-of-the-art methods rely on a variety of statistical descriptors or features that present an increasing degree of dispersion when the number of observations decreases, which reduces their precision. In this paper we propose a novel method that increases the performance of automatic classifiers of variable stars by incorporating the deviations that scarcity of observations produces. Our method uses Gaussian Process Regression to form a probabilistic model of each lightcurve's observations. Then, based on this model, bootstrapped samples of the time series features are generated. Finally, a bagging approach is used to improve the overall performance of the classification. We perform tests on the MACHO and OGLE catalogs; the results show that our method effectively classifies some variability classes using a small fraction of the original observations. For example, we found that RR Lyrae stars can be classified with around 80\% accuracy just by observing the first 5\% of the whole lightcurves' observations in the MACHO and OGLE catalogs. We believe these results prove that, when studying lightcurves, it is important to consider the features' error and how the measurement process impacts it.
Submitted 29 January, 2018;
originally announced January 2018.
-
A dwarf planet class object in the 21:5 resonance with Neptune
Authors:
Matthew J. Holman,
Matthew J. Payne,
Wesley Fraser,
Pedro Lacerda,
Michele T. Bannister,
Michael Lackner,
Ying-Tung Chen,
Hsing Wen Lin,
Kenneth W. Smith,
Rositako Kotanekova,
David Young,
K. Chambers,
S. Chastel,
L. Denneau,
A. Fitzsimmons,
H. Flewelling,
Tommy Grav,
M. Huber,
Nick Induni,
Rolf-Peter Kudritzki,
Alex Krolewski,
R. Jedicke,
N. Kaiser,
E. Lilly,
E. Magnier
, et al. (11 additional authors not shown)
Abstract:
We report the discovery of a $H_r = 3.4\pm0.1$ dwarf planet candidate by the Pan-STARRS Outer Solar System Survey. 2010 JO$_{179}$ is red with $(g-r)=0.88 \pm 0.21$, roughly round, and slowly rotating, with a period of $30.6$ hr. Estimates of its albedo imply a diameter of 600--900~km. Observations spanning 2005--2016 provide an exceptionally well-determined orbit for 2010 JO$_{179}$, with a semi-major axis of $78.307\pm0.009$ au; distant orbits known to this precision are rare. We find that 2010 JO$_{179}$ librates securely within the 21:5 mean-motion resonance with Neptune on hundred-megayear time scales, joining the small but growing set of known distant dwarf planets on metastable resonant orbits. These imply a substantial trans-Neptunian population that shifts between stability in high-order resonances, the detached population, and the eroding population of the scattering disk.
Submitted 15 September, 2017;
originally announced September 2017.
-
Robust period estimation using mutual information for multi-band light curves in the synoptic survey era
Authors:
Pablo Huijse,
Pablo A. Estevez,
Francisco Forster,
Scott F. Daniel,
Andrew J. Connolly,
Pavlos Protopapas,
Rodrigo Carrasco,
Jose C. Principe
Abstract:
The Large Synoptic Survey Telescope (LSST) will produce an unprecedented amount of light curves using six optical bands. Robust and efficient methods that can aggregate data from multidimensional sparsely-sampled time series are needed. In this paper we present a new method for light curve period estimation based on the quadratic mutual information (QMI). The proposed method does not assume a particular model for the light curve nor its underlying probability density and it is robust to non-Gaussian noise and outliers. By combining the QMI from several bands the true period can be estimated even when no single-band QMI yields the period. Period recovery performance as a function of average magnitude and sample size is measured using 30,000 synthetic multi-band light curves of RR Lyrae and Cepheid variables generated by the LSST Operations and Catalog simulators. The results show that aggregating information from several bands is highly beneficial in LSST sparsely-sampled time series, obtaining an absolute increase in period recovery rate up to 50%. We also show that the QMI is more robust to noise and light curve length (sample size) than the multiband generalizations of the Lomb Scargle and Analysis of Variance periodograms, recovering the true period in 10-30% more cases than its competitors. A python package containing efficient Cython implementations of the QMI and other methods is provided.
Submitted 11 September, 2017;
originally announced September 2017.
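As a rough illustration of the multi-band aggregation idea in the abstract above, the sketch below scores each trial period per band and sums the scores across bands before picking the best period. It does not implement the quadratic mutual information; a simple phase-dispersion statistic is swapped in as a stand-in, and the function names, binning scheme and trial grid are assumptions made for illustration.

```python
import math


def phase_dispersion(times, mags, period, n_bins=10):
    """Within-bin scatter of the folded light curve, relative to total scatter.

    Values near 0 mean the folded curve is tight (a good trial period);
    values near 1 mean folding at this period explains nothing.
    """
    bins = [[] for _ in range(n_bins)]
    for t, m in zip(times, mags):
        phase = (t / period) % 1.0
        bins[min(int(phase * n_bins), n_bins - 1)].append(m)
    mean = sum(mags) / len(mags)
    total = sum((m - mean) ** 2 for m in mags)
    within = 0.0
    for members in bins:
        if members:
            mu = sum(members) / len(members)
            within += sum((m - mu) ** 2 for m in members)
    return within / total


def best_period(bands, trial_periods):
    """Pick the trial period that minimizes the dispersion summed over bands.

    bands: list of (times, mags) pairs, one per photometric band.
    """
    return min(
        trial_periods,
        key=lambda p: sum(phase_dispersion(t, m, p) for t, m in bands),
    )
```

Summing per-band scores lets sparsely sampled bands reinforce one another, which is the qualitative effect the abstract reports. A real implementation would also need to handle measurement errors and harmonics of the true period.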
-
Detection of Time Lags Between Quasar Continuum Emission Bands based on Pan-STARRS Light-curves
Authors:
Yan-Fei Jiang,
Paul J. Green,
Jenny E. Greene,
Eric Morganson,
Yue Shen,
Anna Pancoast,
Chelsea L. MacLeod,
Scott F. Anderson,
W. N. Brandt,
C. J. Grier,
H. W. Rix,
John J. Ruan,
Pavlos Protopapas,
Caroline Scott,
W. S. Burgett,
K. W. Hodapp,
M. E. Huber,
N. Kaiser,
R. P. Kudritzki,
E. A. Magnier,
N. Metcalfe,
J. T. Tonry,
R. J. Wainscoat,
C. Waters
Abstract:
We study the time lags between the continuum emission of quasars at different wavelengths, based on more than four years of multi-band ($g$, $r$, $i$, $z$) light-curves in the Pan-STARRS Medium Deep Fields. As photons from different bands emerge from different radial ranges in the accretion disk, the lags constrain the sizes of the accretion disks. We select 240 quasars with redshifts $z \approx 1$ or $z \approx 0.3$ that are relatively emission line free. The light curves are sampled from day to month timescales, which makes it possible to detect lags on the scale of the light crossing time of the accretion disks. With the code JAVELIN, we detect typical lags of several days in the rest frame between the $g$ band and the $riz$ bands. The detected lags are $\sim 2-3$ times larger than the light crossing time estimated from the standard thin disk model, consistent with the recently measured lag in NGC5548 and micro-lensing measurements of quasars. The lags in our sample are found to increase with increasing luminosity. Furthermore, the increase in lags going from $g-r$ to $g-i$ and then to $g-z$ is slower than predicted in the thin disk model, particularly for high luminosity quasars. The radial temperature profile in the disk must be different from what is assumed. We also find evidence that the lags decrease with increasing line ratios between ultraviolet FeII lines and MgII, which may point to changes in the accretion disk structure at higher metallicity.
Submitted 27 December, 2016;
originally announced December 2016.
-
Clustering Based Feature Learning on Variable Stars
Authors:
Cristóbal Mackenzie,
Karim Pichara,
Pavlos Protopapas
Abstract:
The success of automatic classification of variable stars strongly depends on the lightcurve representation. Usually, lightcurves are represented as a vector of many statistical descriptors designed by astronomers, called features. These descriptors commonly demand significant computational power to calculate, require substantial research effort to develop and do not guarantee good performance on the final classification task. Today, lightcurve representation is not entirely automatic; algorithms that extract lightcurve features are designed by humans and must be manually tuned for every survey. The vast amounts of data that will be generated in future surveys like LSST mean astronomers must develop analysis pipelines that are both scalable and automated. Recently, substantial efforts have been made in the machine learning community to develop methods that replace expert-designed and manually tuned features with features that are automatically learned from data. In this work we present what is, to our knowledge, the first unsupervised feature learning algorithm designed for variable stars. Our method first extracts a large number of lightcurve subsequences from a given set of photometric data, which are then clustered to find common local patterns in the time series. Representatives of these patterns, called exemplars, are then used to transform lightcurves of a labeled set into a new representation that can then be used to train an automatic classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias generated when the learning process is done only with labeled data. We test our method on MACHO and OGLE datasets; the results show that the classification performance we achieve is as good as, and in some cases better than, the performance achieved using traditional features, while the computational cost is significantly lower.
Submitted 29 February, 2016;
originally announced February 2016.
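The pipeline described in the abstract above (extract subsequences, find exemplars, re-encode each lightcurve against them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes evenly sampled magnitudes, replaces the clustering step with a greedy farthest-point selection, and encodes a lightcurve as a histogram of nearest-exemplar assignments. All names are hypothetical.

```python
def subsequences(values, width, step):
    """Sliding windows, each shifted to zero mean so only local shape matters."""
    windows = []
    for start in range(0, len(values) - width + 1, step):
        window = values[start:start + width]
        mu = sum(window) / width
        windows.append([v - mu for v in window])
    return windows


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pick_exemplars(windows, k):
    """Greedy farthest-point selection, a simple stand-in for a clustering step."""
    exemplars = [windows[0]]
    while len(exemplars) < k:
        farthest = max(windows, key=lambda w: min(sqdist(w, e) for e in exemplars))
        exemplars.append(farthest)
    return exemplars


def encode(values, exemplars, width, step):
    """Represent a lightcurve as the fraction of its windows nearest each exemplar."""
    counts = [0] * len(exemplars)
    for window in subsequences(values, width, step):
        nearest = min(range(len(exemplars)), key=lambda i: sqdist(window, exemplars[i]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts]
```

The resulting fixed-length vector can be fed to any standard classifier. Because the exemplars can be chosen from unlabeled lightcurves, labels are only needed for that final training step.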
-
Meta Classification for Variable Stars
Authors:
Karim Pichara,
Pavlos Protopapas,
Daniel León
Abstract:
The need for the development of automatic tools to explore astronomical databases has been recognized since the inception of CCDs and modern computers. Astronomers have already developed solutions to tackle several science problems, such as automatic classification of stellar objects, outlier detection, and globular cluster identification, among others. New science problems emerge and it is critical to be able to re-use the models learned before, without rebuilding everything from the beginning when the science problem changes. In this paper, we propose a new meta-model that automatically integrates existing classification models of variable stars. The proposed meta-model incorporates existing models that are trained in a different context, answering different questions and using different representations of data. Conventional mixture-of-experts algorithms in the machine learning literature cannot be used since each expert (model) uses different inputs. We also consider the computational complexity of the model by using the most expensive models only when necessary. We test our model with EROS-2 and MACHO datasets, and we show that we solve most of the classification challenges only by training a meta-model to learn how to integrate the previous experts.
Submitted 12 January, 2016;
originally announced January 2016.
-
Computational Intelligence Challenges and Applications on Large-Scale Astronomical Time Series Databases
Authors:
Pablo Huijse,
Pablo A. Estevez,
Pavlos Protopapas,
Jose C. Principe,
Pablo Zegers
Abstract:
Time-domain astronomy (TDA) is facing a paradigm shift caused by the exponential growth of the sample size, data complexity and data generation rates of new astronomical sky surveys. For example, the Large Synoptic Survey Telescope (LSST), which will begin operations in northern Chile in 2022, will generate a nearly 150 Petabyte imaging dataset of the southern hemisphere sky. The LSST will stream data at rates of 2 Terabytes per hour, effectively capturing an unprecedented movie of the sky. The LSST is expected not only to improve our understanding of time-varying astrophysical objects, but also to reveal a plethora of yet unknown faint and fast-varying phenomena. To cope with a change of paradigm to data-driven astronomy, the fields of astroinformatics and astrostatistics have been created recently. The new data-oriented paradigms for astronomy combine statistics, data mining, knowledge discovery, machine learning and computational intelligence, in order to provide the automated and robust methods needed for the rapid detection and classification of known astrophysical objects as well as the unsupervised characterization of novel phenomena. In this article we present an overview of machine learning and computational intelligence applications to TDA. Future big data challenges and new lines of research in TDA, focusing on the LSST, are identified and discussed from the viewpoint of computational intelligence/machine learning. Interdisciplinary collaboration will be required to cope with the challenges posed by the deluge of astronomical data coming from the LSST.
Submitted 25 September, 2015;
originally announced September 2015.
-
FATS: Feature Analysis for Time Series
Authors:
Isadora Nun,
Pavlos Protopapas,
Brandon Sim,
Ming Zhu,
Rahul Dave,
Nicolas Castro,
Karim Pichara
Abstract:
In this paper, we present the FATS (Feature Analysis for Time Series) library. FATS is a Python library which facilitates and standardizes feature extraction for time series data. In particular, we focus on one application: feature extraction for astronomical light curve data, although the library is generalizable for other uses. We detail the methods and features implemented for light curve analysis, and present examples for its usage.
Submitted 31 August, 2015; v1 submitted 29 May, 2015;
originally announced June 2015.
-
A Novel, Fully Automated Pipeline for Period Estimation in the EROS 2 Data Set
Authors:
Pavlos Protopapas,
Pablo Huijse,
Pablo A. Estevez,
Pablo Zegers,
Jose C. Principe
Abstract:
We present a new method to discriminate periodic from non-periodic irregularly sampled lightcurves. We introduce a periodic kernel and maximize a similarity measure derived from information theory to estimate the periods and a discriminator factor. We tested the method on a dataset containing 100,000 synthetic periodic and non-periodic lightcurves with various periods, amplitudes and shapes genera…
▽ More
We present a new method to discriminate periodic from non-periodic irregularly sampled lightcurves. We introduce a periodic kernel and maximize a similarity measure derived from information theory to estimate the periods and a discriminator factor. We tested the method on a dataset containing 100,000 synthetic periodic and non-periodic lightcurves with various periods, amplitudes and shapes generated using a multivariate generative model. We correctly identified periodic and non-periodic lightcurves with a completeness of 90% and a precision of 95%, for lightcurves with a signal-to-noise ratio (SNR) larger than 0.5. We characterize the efficiency and reliability of the model using these synthetic lightcurves and applied the method on the EROS-2 dataset. A crucial consideration is the speed at which the method can be executed. Using hierarchical search and some simplification on the parameter search we were able to analyze 32.8 million lightcurves in 18 hours on a cluster of GPGPUs. Using the sensitivity analysis on the synthetic dataset, we infer that 0.42% in the LMC and 0.61% in the SMC of the sources show periodic behavior. The training set, the catalogs and source code are all available in http://timemachine.iic.harvard.edu.
Submitted 4 December, 2014;
originally announced December 2014.
-
Supervised detection of anomalous light-curves in massive astronomical catalogs
Authors:
Isadora Nun,
Karim Pichara,
Pavlos Protopapas,
Dae-Won Kim
Abstract:
The development of synoptic sky surveys has led to a massive amount of data for which the resources needed for analysis are beyond human capabilities. To process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new method to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all the information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered an outlier insofar as it has a low joint probability. Our method is suitable for exploring massive datasets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. We divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up for a deeper analysis.
Submitted 27 May, 2015; v1 submitted 18 April, 2014;
originally announced April 2014.
-
The EPOCH Project: I. Periodic variable stars in the EROS-2 LMC database
Authors:
Dae-Won Kim,
Pavlos Protopapas,
Coryn A. L. Bailer-Jones,
Yong-Ik Byun,
Seo-Won Chang,
Jean-Baptiste Marquette,
Min-Su Shin
Abstract:
The EPOCH (EROS-2 periodic variable star classification using machine learning) project aims to detect periodic variable stars in the EROS-2 light curve database. In this paper, we present the first result of the classification of periodic variable stars in the EROS-2 LMC database. To classify these variables, we first built a training set by compiling known variables in the Large Magellanic Cloud area from the OGLE and MACHO surveys. We crossmatched these variables with the EROS-2 sources and extracted 22 variability features from 28 392 light curves of the corresponding EROS-2 sources. We then used the random forest method to classify the EROS-2 sources in the training set. We designed the model to separate not only $δ$ Scuti stars, RR Lyraes, Cepheids, eclipsing binaries, and long-period variables, the superclasses, but also their subclasses, such as RRab, RRc, RRd, and RRe for RR Lyraes, and similarly for the other variable types. The model trained using only the superclasses shows 99% recall and precision, while the model trained on all subclasses shows 87% recall and precision. We applied the trained model to the entire EROS-2 LMC database, which contains about 29 million sources, and found 117 234 periodic variable candidates. Out of these 117 234 periodic variables, 55 285 have not been discovered by either OGLE or MACHO variability studies. This set comprises 1 906 $δ$ Scuti stars, 6 607 RR Lyraes, 638 Cepheids, 178 Type II Cepheids, 34 562 eclipsing binaries, and 11 394 long-period variables. A catalog of these EROS-2 LMC periodic variable stars will be available online at http://stardb.yonsei.ac.kr and at the CDS website (http://vizier.u-strasbg.fr/viz-bin/VizieR).
Submitted 28 March, 2014; v1 submitted 24 March, 2014;
originally announced March 2014.
-
The expansion rate of the intermediate Universe in light of Planck
Authors:
Licia Verde,
Pavlos Protopapas,
Raul Jimenez
Abstract:
We use cosmology-independent measurements of the expansion history in the redshift range 0.1 < z < 1.2 and compare them with the Cosmic Microwave Background-derived expansion history predictions. The motivation is to investigate if the tension between the local (cosmology-independent) Hubble constant H0 value and the Planck-derived H0 is also present at other redshifts. We conclude that there is no tension between Planck and cosmology-independent measurements of the Hubble parameter H(z) at 0.1 < z < 1.2 for the LCDM model (odds of tension are only 1:15, statistically not significant). Considering extensions of the LCDM model does not improve these odds (actually makes them worse), thus favouring the simpler model over its extensions. On the other hand, the H(z) data are also not in tension with the local H0 measurements, but the combination of all three data sets shows a highly significant tension (odds ~ 1:400). Thus the new data deepen the mystery of the mismatch between Planck and local H0 measurements, and cannot univocally determine whether it is an effect localised at a particular redshift. Having said this, we find that assuming the NGC4258 maser distance as the correct anchor for H0 brings the odds to comfortable values.
Further, using only the expansion history measurements we constrain, within the LCDM model, H0 = 68.5 ± 3.5 and Omega_m = 0.32 ± 0.05 without relying on any CMB prior. We also address the question of how smooth the expansion history of the universe is given the cosmology-independent data and conclude that there is no evidence for deviations from smoothness in the expansion history, nor for variations with time in the value of the equation of state of dark energy.
Submitted 10 March, 2014;
originally announced March 2014.
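To make the quantities constrained in the abstract above concrete, the following minimal sketch evaluates the expansion rate H(z) for an LCDM model over the quoted redshift range. Spatial flatness, the neglect of radiation, and the unit of H0 (km/s/Mpc) are assumptions made here for illustration and are not stated in the abstract; the function name `hubble_rate` is hypothetical. The defaults are the central values quoted above.

```python
import math

def hubble_rate(z, h0=68.5, omega_m=0.32):
    """Expansion rate H(z) for a flat LCDM model (flatness is assumed here).

    H(z) = H0 * sqrt(Omega_m * (1 + z)^3 + Omega_Lambda),
    with Omega_Lambda = 1 - Omega_m and radiation neglected.
    """
    omega_lambda = 1.0 - omega_m  # follows from the flatness assumption
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + omega_lambda)

# Evaluate at a few redshifts inside the range 0.1 < z < 1.2 discussed above.
for z in (0.1, 0.5, 1.2):
    print(f"z = {z:.1f}  H(z) = {hubble_rate(z):.1f}")
```

By construction the function returns H0 at z = 0 and grows monotonically with redshift; propagating the quoted uncertainties would require the parameter covariance, which the abstract does not give.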
-
Pan-STARRS 1 observations of the unusual active Centaur P/2011 S1 (Gibbs)
Authors:
H. W. Lin,
Y. T. Chen,
P. Lacerda,
W. H. Ip,
M. Holman,
P. Protopapas,
W. P. Chen,
W. S. Burgett,
K. C. Chambers,
H. Flewelling,
M. E. Huber,
R. Jedicke,
N. Kaiser,
E. A. Magnier,
N. Metcalfe,
P. A. Price
Abstract:
P/2011 S1 (Gibbs) is an outer solar system comet or active Centaur with a similar orbit to that of the famous 29P/Schwassmann-Wachmann 1. P/2011 S1 (Gibbs) has been observed by the Pan-STARRS 1 (PS1) sky survey from 2010 to 2012. The resulting data allow us to perform multi-color studies of the nucleus and coma of the comet. Analysis of PS1 images reveals that P/2011 S1 (Gibbs) has a small nucleus $< 4$ km radius, with colors $g_{P1}-r_{P1} = 0.5 \pm 0.02$, $r_{P1}-i_{P1} = 0.12 \pm 0.02$ and $i_{P1}-z_{P1} = 0.46 \pm 0.03$. The comet remained active from 2010 to 2012, with a model-dependent mass-loss rate of $\sim100$ kg s$^{-1}$. The mass-loss rate per unit surface area of P/2011 S1 (Gibbs) is as high as that of 29P/Schwassmann-Wachmann 1, making it one of the most active Centaurs. The mass-loss rate also varies with time from $\sim 40$ kg s$^{-1}$ to 150 kg s$^{-1}$. Due to its rather circular orbit, we propose that P/2011 S1 (Gibbs) has 29P/Schwassmann-Wachmann 1-like outbursts that control the outgassing rate. The results indicate that it may have a similar surface composition to that of 29P/Schwassmann-Wachmann 1.
Our numerical simulations show that the future orbital evolution of P/2011 S1 (Gibbs) is more similar to that of the main population of Centaurs than to that of 29P/Schwassmann-Wachmann 1. The results also demonstrate that P/2011 S1 (Gibbs) is dynamically unstable and can only remain near its current orbit for roughly a thousand years.
Submitted 25 February, 2014;
originally announced February 2014.
-
Automatic Classification of Variable Stars in Catalogs with missing data
Authors:
Karim Pichara,
Pavlos Protopapas
Abstract:
We present an automatic classification method for astronomical catalogs with missing data. We use Bayesian networks, a probabilistic graphical model, that allow us to perform inference to predict missing values given observed data and dependency relationships between variables. To learn a Bayesian network from incomplete data, we use an iterative algorithm that utilises sampling methods and expectation maximization to estimate the distributions and probabilistic dependencies of variables from data with missing values. To test our model we use three catalogs with missing data (SAGE, 2MASS and UBVI) and one complete catalog (MACHO). We examine how classification accuracy changes when information from missing data catalogs is included, how our method compares to traditional missing data approaches and at what computational cost. Integrating these catalogs with missing data, we find that classification of variable objects improves by a few percent and by 15% for quasar detection while keeping the computational cost the same.
Submitted 29 October, 2013;
originally announced October 2013.
-
Planck and the local Universe: quantifying the tension
Authors:
Licia Verde,
Pavlos Protopapas,
Raul Jimenez
Abstract:
We use the latest Planck constraints, and in particular constraints on the derived parameters (Hubble constant and age of the Universe) for the local universe and compare them with local measurements of the same quantities. We propose a way to quantify whether cosmological parameter constraints from two different experiments are in tension or not. Our statistic, T, is an evidence ratio and therefore can be interpreted with the widely used Jeffreys' scale. We find that in the framework of the LCDM model, the Planck-inferred two-dimensional, joint, posterior distribution for the Hubble constant and age of the Universe is in "strong" tension with the local measurements; the odds being ~ 1:50. We explore several possibilities for explaining this tension and examine the consequences both in terms of unknown errors and deviations from the LCDM model. In some one-parameter LCDM model extensions, tension is reduced whereas in other extensions, tension is instead increased. In particular, small total neutrino masses are favored and a total neutrino mass above 0.15 eV makes the tension "highly significant" (odds ~ 1:150). A consequence of accepting this interpretation of the tension is that the degenerate neutrino hierarchy is highly disfavoured by cosmological data and the direct hierarchy is slightly favored over the inverse.
Submitted 28 June, 2013;
originally announced June 2013.
-
An improved quasar detection method in EROS-2 and MACHO LMC datasets
Authors:
Karim Pichara,
Pavlos Protopapas,
Dae-Won Kim,
Jean-Baptiste Marquette,
Patrick Tisserand
Abstract:
We present a new classification method for quasar identification in the EROS-2 and MACHO datasets based on a boosted version of the Random Forest classifier. We use a set of variability features including parameters of a continuous autoregressive model. We prove that continuous autoregressive parameters are very important discriminators in the classification process. We create two training sets (one for EROS-2 and one for MACHO datasets) using known quasars found in the LMC. Our model achieves about 90% precision and 86% recall on both the EROS-2 and MACHO training sets, improving on the accuracy of state-of-the-art models in quasar detection. We apply the model to the complete EROS-2 and MACHO LMC datasets, which include 28 million objects, finding 1160 and 2551 candidates respectively. To further validate our list of candidates, we crossmatched our list with 663 previously known strong candidates, getting 74% of matches for MACHO and 40% in EROS-2. The main difference in matching level arises because EROS-2 is a slightly shallower survey, which translates to significantly lower signal-to-noise ratio lightcurves.
Submitted 1 April, 2013;
originally announced April 2013.
-
Statistical Properties of Galactic δ Scuti Stars: Revisited
Authors:
Seo-Won Chang,
Pavlos Protopapas,
Dae-Won Kim,
Yong-Ik Byun
Abstract:
We present statistical characteristics of 1,578 δ Scuti stars including nearby field stars and cluster member stars within the Milky Way. We obtained 46% of these stars (718 stars) from the works done by Rodríguez and collected the remaining 54% of stars (860 stars) from other literature. We updated the entries with the latest information on sky coordinates, color, rotational velocity, spectral type, period, amplitude and binarity. The majority of our sample are well characterized in terms of typical period range (0.02-0.25 days), pulsation amplitudes (<0.5 mag) and spectral types (A-F type). Given this list of δ Scuti stars, we examined relations between their physical properties (i.e., periods, amplitudes, spectral types and rotational velocities) for field stars and cluster members, and confirmed that the correlations of properties are not significantly different from those reported in Rodríguez's works. All the δ Scuti stars are cross-matched with several X-ray and UV catalogs, resulting in 27 X-ray and 41 UV-only counterparts. These counterparts are interesting targets for further study because of their rarity and uniqueness in showing δ Scuti-type variability and X-ray/UV emission at the same time. The compiled catalog can be accessed through the web interface http://stardb.yonsei.ac.kr/DeltaScuti
Submitted 5 March, 2013;
originally announced March 2013.
-
The TAOS Project: Results From Seven Years of Survey Data
Authors:
Z. -W. Zhang,
M. J. Lehner,
J. -H. Wang,
C. -Y. Wen,
S. -Y. Wang,
S. -K. King,
Á. P. Granados,
C. Alcock,
T. Axelrod,
F. B. Bianco,
Y. -I. Byun,
W. P. Chen,
N. K. Coehlo,
K. H. Cook,
I. de Pater,
D. -W. Kim,
T. Lee,
J. J. Lissauer,
S. L. Marshall,
P. Protopapas,
J. A. Rice,
M. E. Schwamb
Abstract:
The Taiwanese-American Occultation Survey (TAOS) aims to detect serendipitous occultations of stars by small (about 1 km diameter) objects in the Kuiper Belt and beyond. Such events are very rare (<0.001 events per star per year) and short in duration (about 200 ms), so many stars must be monitored at a high readout cadence. TAOS monitors typically around 500 stars simultaneously at a 5 Hz readout cadence with four telescopes located at Lulin Observatory in central Taiwan. In this paper, we report the results of the search for small Kuiper Belt Objects (KBOs) in seven years of data. No occultation events were found, resulting in a 95% c.l. upper limit on the slope of the faint end of the KBO size distribution of q = 3.34 to 3.82, depending on the surface density at the break in the size distribution at a diameter of about 90 km.
Submitted 25 January, 2013;
originally announced January 2013.
-
Semi-parametric Robust Event Detection for Massive Time-Domain Databases
Authors:
Alexander W Blocker,
Pavlos Protopapas
Abstract:
The detection and analysis of events within massive collections of time-series has become an extremely important task for time-domain astronomy. In particular, many scientific investigations (e.g. the analysis of microlensing and other transients) begin with the detection of isolated events in irregularly-sampled series with both non-linear trends and non-Gaussian noise. We outline a semi-parametric, robust, parallel method for identifying variability and isolated events at multiple scales in the presence of the above complications. This approach harnesses the power of Bayesian modeling while maintaining much of the speed and scalability of more ad-hoc machine learning approaches. We also contrast this work with event detection methods from other fields, highlighting the unique challenges posed by astronomical surveys. Finally, we present results from the application of this method to 87.2 million EROS-2 sources, where we have obtained a greater than 100-fold reduction in candidates for certain types of phenomena while creating high-quality features for subsequent analyses.
Submitted 19 January, 2013; v1 submitted 14 January, 2013;
originally announced January 2013.
-
An Information Theoretic Algorithm for Finding Periodicities in Stellar Light Curves
Authors:
Pablo Huijse,
Pablo A. Estevez,
Pavlos Protopapas,
Pablo Zegers,
Jose C. Principe
Abstract:
We propose a new information theoretic metric for finding periodicities in stellar light curves. Light curves are astronomical time series of brightness over time, and are characterized as being noisy and unevenly sampled. The proposed metric combines correntropy (generalized correlation) with a periodic kernel to measure similarity among samples separated by a given period. The new metric provides a periodogram, called Correntropy Kernelized Periodogram (CKP), whose peaks are associated with the fundamental frequencies present in the data. The CKP does not require any resampling, slotting or folding scheme as it is computed directly from the available samples. CKP is the main part of a fully-automated pipeline for periodic light curve discrimination to be used in astronomical survey databases. We show that the CKP method outperformed the slotted correntropy, and conventional methods used in astronomy for periodicity discrimination and period estimation tasks, using a set of light curves drawn from the MACHO survey. The proposed metric achieved 97.2% of true positives with 0% of false positives at the confidence level of 99% for the periodicity discrimination task; and 88% of hits with 11.6% of multiples and 0.4% of misses in the period estimation task.
Submitted 11 December, 2012;
originally announced December 2012.
-
IVOA Recommendation: Spectrum Data Model 1.1
Authors:
Jonathan McDowell,
Doug Tody,
Tamas Budavari,
Markus Dolensky,
Inga Kamp,
Kelly McCusker,
Pavlos Protopapas,
Arnold Rots,
Randy Thompson,
Frank Valdes,
Petr Skoda,
Bruno Rino,
Sebastien Derriere,
Jesus Salgado,
Omar Laurino,
the IVOA Data Access Layer,
Data Model Working Groups
Abstract:
We present a data model describing the structure of spectrophotometric datasets with spectral and temporal coordinates and associated metadata. This data model may be used to represent spectra, time series data, segments of SED (Spectral Energy Distributions) and other spectral or temporal associations.
Submitted 13 April, 2012;
originally announced April 2012.
-
Infinite Shift-invariant Grouped Multi-task Learning for Gaussian Processes
Authors:
Yuyang Wang,
Roni Khardon,
Pavlos Protopapas
Abstract:
Multi-task learning leverages shared information among data sets to improve the learning performance of individual tasks. The paper applies this framework to data where each task is a phase-shifted periodic time series. In particular, we develop a novel Bayesian nonparametric model capturing a mixture of Gaussian processes where each task is a sum of a group-specific function and a component capturing individual variation, in addition to each task being phase shifted. We develop an efficient EM algorithm to learn the parameters of the model. As a special case we obtain the Gaussian mixture model and EM algorithm for phase-shifted periodic time series. Furthermore, we extend the proposed model by using a Dirichlet Process prior, thereby leading to an infinite mixture model that is capable of doing automatic model selection. A Variational Bayesian approach is developed for inference in this model. Experiments in regression, classification and class discovery demonstrate the performance of the proposed models using both synthetic data and real-world time series data from astrophysics. Our methods are particularly useful when the time series are sparsely and non-synchronously sampled.
Submitted 20 May, 2013; v1 submitted 5 March, 2012;
originally announced March 2012.
-
Period Estimation in Astronomical Time Series Using Slotted Correntropy
Authors:
Pablo Huijse,
Pablo A. Estévez,
Pablo Zegers,
José Príncipe,
Pavlos Protopapas
Abstract:
In this letter, we propose a method for period estimation in light curves from periodic variable stars using correntropy. Light curves are astronomical time series of stellar brightness over time, and are characterized as being noisy and unevenly sampled. We propose to use slotted time lags in order to estimate correntropy directly from irregularly sampled time series. A new information theoretic metric is proposed for discriminating among the peaks of the correntropy spectral density. The slotted correntropy method outperformed slotted correlation, string length, VarTools (Lomb-Scargle periodogram and Analysis of Variance), and SigSpec applications on a set of light curves drawn from the MACHO survey.
Submitted 13 December, 2011;
originally announced December 2011.
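The idea of estimating correntropy with slotted time lags, as described in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `slotted_correntropy`, the unnormalized Gaussian kernel, the nearest-slot assignment rule, and all parameter names are choices made here. Each pair of samples contributes to the slot whose central lag is closest to the pair's time difference, so no resampling onto a regular grid is needed.

```python
import math

def slotted_correntropy(times, values, slot_width, n_slots, sigma=1.0):
    """Estimate correntropy at lags k * slot_width from irregular samples.

    Every sample pair whose time difference lies within half a slot of
    k * slot_width contributes a Gaussian kernel of its value difference
    to slot k. Slots that receive no pairs are returned as None.
    """
    sums = [0.0] * n_slots
    counts = [0] * n_slots
    n = len(times)
    for i in range(n):
        for j in range(i, n):
            lag = abs(times[j] - times[i])
            k = int(math.floor(lag / slot_width + 0.5))  # nearest slot index
            if k < n_slots:
                diff = values[j] - values[i]
                # Unnormalized Gaussian kernel (a simplification made here).
                sums[k] += math.exp(-diff * diff / (2.0 * sigma * sigma))
                counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Toy usage: a sinusoid with period 2.0 observed at uneven times.
times = [0.0, 0.3, 0.55, 1.1, 1.4, 2.05, 2.6, 3.1, 3.45, 4.0]
values = [math.sin(2.0 * math.pi * t / 2.0) for t in times]
print(slotted_correntropy(times, values, slot_width=0.5, n_slots=6))
```

For a periodic signal the estimate tends to be larger at lags near multiples of the period, where paired samples have similar values. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but not for survey-scale data.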
-
Nonparametric Bayesian Estimation of Periodic Functions
Authors:
Yuyang Wang,
Roni Khardon,
Pavlos Protopapas
Abstract:
Many real world problems exhibit patterns that have periodic behavior. For example, in astrophysics, periodic variable stars play a pivotal role in understanding our universe. An important step when analyzing data from such processes is the problem of identifying the period: estimating the period of a periodic function based on noisy observations made at irregularly spaced time points. This problem is still a difficult challenge despite extensive study in different disciplines. The paper makes several contributions toward solving this problem. First, we present a nonparametric Bayesian model for period finding, based on Gaussian Processes (GP), that does not make strong assumptions on the shape of the periodic function. As our experiments demonstrate, the new model leads to significantly better results in period estimation when the target function is non-sinusoidal. Second, we develop a new algorithm for parameter optimization for GP which is useful when the likelihood function is very sensitive to the setting of the hyper-parameters with numerous local minima, as in the case of period estimation. The algorithm combines gradient optimization with grid search and incorporates several mechanisms to overcome the high complexity of inference with GP. Third, we develop a novel approach for using domain knowledge, in the form of a probabilistic generative model, and incorporate it into the period estimation algorithm. Experimental results on astrophysics data validate our approach showing significant improvement over the state of the art in this domain.
Submitted 6 March, 2012; v1 submitted 5 November, 2011;
originally announced November 2011.
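The abstract above describes period finding with Gaussian Processes without specifying the covariance function. As a hedged illustration only, the sketch below uses the standard exp-sine-squared periodic kernel; choosing this kernel, the function name `periodic_kernel`, and the parameter names are assumptions made here and may differ from the paper's model. The kernel makes function values one full period apart perfectly correlated, which is why the period appears as a hyper-parameter to be optimized.

```python
import math

def periodic_kernel(t1, t2, period, length_scale=1.0, variance=1.0):
    """Exp-sine-squared covariance between observation times t1 and t2.

    k(t1, t2) = variance * exp(-2 * sin^2(pi * (t1 - t2) / period) / length_scale^2)
    """
    s = math.sin(math.pi * (t1 - t2) / period)
    return variance * math.exp(-2.0 * s * s / (length_scale ** 2))

# The covariance depends on the trial period: times one true period apart are
# fully correlated only when the trial period matches (or divides) the spacing.
for trial_period in (1.0, 2.5, 4.0):
    print(trial_period, periodic_kernel(0.0, 2.5, trial_period))
```

Because the covariance oscillates as the trial period changes, a likelihood built on it has many local optima in the period, consistent with the sensitivity to hyper-parameters that the abstract mentions.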
-
A Refined QSO Selection Method Using Diagnostics Tests: 663 QSO Candidates in the LMC
Authors:
Dae-Won Kim,
Pavlos Protopapas,
Markos Trichas,
Michael Rowan-Robinson,
Roni Khardon,
Charles Alcock,
Yong-Ik Byun
Abstract:
We present 663 QSO candidates in the Large Magellanic Cloud (LMC) selected using multiple diagnostics. We started with a set of 2,566 QSO candidates from our previous work selected using time variability of the MACHO LMC lightcurves. We then obtained additional information for the candidates by crossmatching them with the Spitzer SAGE, the MACHO UBVI, the 2MASS, the Chandra and the XMM catalogs. Using this information, we specified six diagnostic features based on mid-IR colors, photometric redshifts using SED template fitting, and X-ray luminosities in order to further discriminate high confidence QSO candidates in the absence of spectral information. We then trained a one-class SVM (Support Vector Machine) model using the diagnostic features of the 58 confirmed MACHO QSOs. We applied the trained model to the original candidates and finally selected 663 high confidence QSO candidates. Furthermore, we crossmatched these 663 QSO candidates with the 144 newly confirmed QSOs and 275 non-QSOs in the LMC fields. On the basis of the counterpart analysis, we found that the false positive rate is less than 1%.
Submitted 31 December, 2011; v1 submitted 25 October, 2011;
originally announced October 2011.
-
QSO Selection Algorithm Using Time Variability and Machine Learning: Selection of 1,620 QSO Candidates from MACHO LMC Database
Authors:
Dae-Won Kim,
Pavlos Protopapas,
Yong-Ik Byun,
Charles Alcock,
Roni Khardon,
Markos Trichas
Abstract:
We present a new QSO selection algorithm using a Support Vector Machine (SVM), a supervised classification method, on a set of extracted time series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars and microlensing events using 58 known QSOs, 1,629 variable stars and 4,288 non-variables from the MAssive Compact Halo Object (MACHO) database as a training set. To estimate the efficiency and the accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ~80% of known QSOs with a 25% false positive rate. The majority of the false positives are Be stars.
We applied the trained model to the MACHO Large Magellanic Cloud (LMC) dataset, which consists of 40 million lightcurves, and found 1,620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false positive rate, we crossmatched the candidates with astronomical catalogs including the Spitzer Surveying the Agents of a Galaxy's Evolution (SAGE) LMC catalog and a few X-ray catalogs. The results further suggest that the majority of the candidates, more than 70%, are QSOs.
Submitted 19 April, 2011; v1 submitted 17 January, 2011;
originally announced January 2011.
-
Trans-Neptunian Objects with Hubble Space Telescope ACS/WFC
Authors:
Cesar I. Fuentes,
Matthew J. Holman,
David E. Trilling,
Pavlos Protopapas
Abstract:
We introduce a novel search technique that can identify trans-Neptunian objects in three to five exposures of a pointing within a single Hubble Space Telescope orbit. The process is fast enough to allow the discovery of candidates soon after the data are available. This allows sufficient time to schedule follow-up observations with HST within a month. We report the discovery of 14 slow-moving objects found within 5° of the ecliptic in archival data taken with the Wide Field Channel of the Advanced Camera for Surveys. The luminosity function of these objects is consistent with previous ground-based and space-based results. We show evidence that the size distribution of both high and low inclination populations is similar for objects smaller than 100 km, as expected from collisional evolution models, while their size distributions differ for brighter objects. We suggest the two populations formed in different parts of the protoplanetary disk and, after being dynamically mixed, have collisionally evolved together. Among the objects discovered there is an equal-mass binary with an angular separation of ~0.53″.
Submitted 12 August, 2010;
originally announced August 2010.
-
The TAOS Project Stellar Variability II. Detection of 15 Variable Stars
Authors:
S. Mondal,
C. C. Lin,
W. P. Chen,
Z. -W. Zhang,
C. Alcock,
T. Axelrod,
F. B. Bianco,
Y. -I. Byun,
N. K. Coehlo,
K. H. Cook,
R. Dave,
D. -W. Kim,
S. -K. King,
T. Lee,
M. J. Lehner,
H. -C. Lin,
S. L. Marshal,
P. Protopapas,
J. A. Rice,
M. E. Schwamb,
J. -H. Wang,
S. -Y. Wang,
C. -Y. Wen
Abstract:
The Taiwanese-American Occultation Survey (TAOS) project has collected more than a billion photometric measurements since 2005 January. These sky survey data, covering timescales from a fraction of a second to a few hundred days, are a useful source to study stellar variability. A total of 167 star fields, mostly along the ecliptic plane, have been selected for photometric monitoring with the TAOS telescopes. This paper presents our initial analysis of a search for periodic variable stars from the time-series TAOS data on one particular TAOS field, No. 151 (RA = 17h 30m 6.67s, Dec = 27° 17′ 30″, J2000), which had been observed over 47 epochs in 2005. A total of 81 candidate variables are identified in the 3 square degree field, with magnitudes in the range 8 < R < 16. On the basis of the periodicity and shape of the lightcurves, 29 variables, 15 of which were previously unknown, are classified as RR Lyrae, Cepheid, delta Scuti, SX Phoenicis, semi-regular and eclipsing binaries.
Submitted 12 March, 2010;
originally announced March 2010.