Statistics
- [1] arXiv:2407.18287 [pdf, other]
-
Title: Estimating the number of clusters of a Block Markov Chain
Comments: 46 pages, 13 figures, 6 tables, 7 algorithms
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
Clustering algorithms frequently require the number of clusters to be chosen in advance, but it is usually not clear how to do this. To tackle this challenge when clustering within sequential data, we present a method for estimating the number of clusters when the data is a trajectory of a Block Markov Chain. Block Markov Chains are Markov Chains that exhibit a block structure in their transition matrix. The method considers a matrix that counts the number of transitions between different states within the trajectory, and transforms this into a spectral embedding whose dimension is set via singular value thresholding. The number of clusters is subsequently estimated via density-based clustering of this spectral embedding, an approach inspired by literature on the Stochastic Block Model. By leveraging and augmenting recent results on the spectral concentration of random matrices with Markovian dependence, we show that the method is asymptotically consistent - in spite of the dependencies between the count matrix's entries, and even when the count matrix is sparse. We also present a numerical evaluation of our method, and compare it to alternatives.
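The first two steps described in this abstract (a transition count matrix, then a rank choice by singular value thresholding) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names, the toy block transition matrix, and the threshold value `100.0` are all assumptions made here, the abstract does not say how the threshold is chosen, and the density-based clustering step is omitted.

```python
import numpy as np

def simulate_bmc(P_block, sizes, T, rng):
    """Simulate a Block Markov Chain trajectory (toy model, assumed here).

    P_block[a, b] is the probability of jumping from cluster a to cluster b;
    the next state is then drawn uniformly within the target cluster.
    """
    labels = np.repeat(np.arange(len(sizes)), sizes)   # cluster of each state
    members = [np.flatnonzero(labels == k) for k in range(len(sizes))]
    x = np.empty(T, dtype=int)
    x[0] = rng.integers(labels.size)
    for t in range(1, T):
        k = rng.choice(len(sizes), p=P_block[labels[x[t - 1]]])
        x[t] = rng.choice(members[k])
    return x

def count_matrix(x, n):
    """N[i, j] = number of observed transitions i -> j in the trajectory."""
    N = np.zeros((n, n), dtype=int)
    np.add.at(N, (x[:-1], x[1:]), 1)
    return N

def embedding_dimension(N, threshold):
    """Number of singular values of N above a user-chosen threshold."""
    s = np.linalg.svd(N, compute_uv=False)
    return int(np.sum(s > threshold))

rng = np.random.default_rng(0)
P_block = np.array([[0.8, 0.1, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.1, 0.1, 0.8]])
sizes = [20, 20, 20]
x = simulate_bmc(P_block, sizes, T=20000, rng=rng)
N = count_matrix(x, n=sum(sizes))
print(N.sum())                                   # total number of transitions
print(embedding_dimension(N, threshold=100.0))   # threshold is illustrative only
```

Every trajectory of length T contributes exactly T - 1 transitions, so the entries of the count matrix sum to T - 1 regardless of the block structure.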
- [2] arXiv:2407.18314 [pdf, other]
-
Title: Higher Partials of fStress
Comments: 27 pages, R code, C code
Subjects: Computation (stat.CO); Methodology (stat.ME)
We define *fDistances*, which generalize Euclidean distances, squared distances, and log distances. The least squares loss function to fit fDistances to dissimilarity data is *fStress*. We give formulas and R/C code to compute partial derivatives of orders one to four of fStress, relying heavily on the use of Faà di Bruno's chain rule formula for higher derivatives.
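For reference, Faà di Bruno's formula can be written in its standard Bell-polynomial form as below. This is the textbook statement for a scalar composition, not a formula quoted from the paper; $B_{n,k}$ denotes the partial exponential Bell polynomials.

```latex
\frac{d^n}{dx^n} f\bigl(g(x)\bigr)
  = \sum_{k=1}^{n} f^{(k)}\bigl(g(x)\bigr)\,
    B_{n,k}\bigl(g'(x),\, g''(x),\, \dots,\, g^{(n-k+1)}(x)\bigr)

% Low-order cases:
%   n = 2:  f''(g)\,(g')^2 + f'(g)\,g''
%   n = 3:  f'''(g)\,(g')^3 + 3 f''(g)\,g'\,g'' + f'(g)\,g'''
```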
- [3] arXiv:2407.18341 [pdf, other]
-
Title: Shrinking Coarsened Win Ratio and Testing of Composite Endpoint
Subjects: Methodology (stat.ME)
Composite endpoints consisting of both terminal and non-terminal events, such as death and hospitalization, are frequently used as primary endpoints in cardiovascular clinical trials. The Win Ratio method (WR) proposed by Pocock et al. (2012) [1] employs a hierarchical structure to combine fatal and non-fatal events by giving death information an absolute priority, which adversely affects power if the treatment effect is mainly on the non-fatal outcomes. We hereby propose the Shrinking Coarsened Win Ratio method (SCWR) that releases the strict hierarchical structure of the standard WR by adding stages with coarsened thresholds shrinking to zero. A weighted adaptive approach is developed to determine the thresholds in SCWR. This method preserves the good statistical properties of the standard WR and has a greater capacity to detect treatment effects on non-fatal events. We show that SCWR has an overall more favorable performance than WR in our simulation that addresses the influence of follow-up time, the association between events, and the treatment effect levels, as well as a case study based on the Digitalis Investigation Group clinical trial data.
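The strict hierarchy of the standard win ratio, where death is compared first and hospitalization only breaks ties, can be sketched as follows. This is a simplified illustration of the unmatched WR only, not of SCWR: it assumes every patient is observed over the same follow-up window (no censoring), and the function names and toy data are hypothetical.

```python
from itertools import product

def compare(t_time, c_time):
    """Compare one event type for a (treated, control) pair.

    Times are event times, or None if the event did not occur during the
    common follow-up window. Returns +1 (treated wins), -1 (treated loses),
    or 0 (tie). A later or absent event is the better outcome.
    """
    if t_time is None and c_time is None:
        return 0
    if t_time is None:
        return 1
    if c_time is None:
        return -1
    return (t_time > c_time) - (t_time < c_time)

def win_ratio(treated, control):
    """Unmatched win ratio; each patient is a (death_time, hosp_time) tuple."""
    wins = losses = 0
    for t, c in product(treated, control):
        result = compare(t[0], c[0])        # death has absolute priority
        if result == 0:
            result = compare(t[1], c[1])    # hospitalization only breaks ties
        wins += result == 1
        losses += result == -1
    return wins / losses                    # undefined if there are no losses

treated = [(None, 5.0), (None, None), (8.0, 3.0)]
control = [(None, 2.0), (6.0, 1.0), (None, None)]
print(win_ratio(treated, control))          # 5 wins / 3 losses
```

In this toy data, any pair decided on death never reaches the hospitalization comparison, which is the behavior the abstract describes as costing power when the treatment effect is mainly on non-fatal outcomes.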
- [4] arXiv:2407.18360 [pdf, other]
-
Title: Organizational Effectiveness: A New Strategy to Leverage Multisite Randomized Trials for Valid Assessment
Authors: Guanglei Hong (University of Chicago), Jonah Deutsch (Mathematica), Peter Kress (Mathematica), Jose Eos Trinidad (University of California-Berkeley), Zhengyan Xu (University of Pennsylvania)
Subjects: Applications (stat.AP)
In education, health, and human services, an intervention program is usually implemented by many local organizations. Determining which organizations are more effective is essential for theoretically characterizing effective practices and for intervening to enhance the capacity of ineffective organizations. In multisite randomized trials, site-specific intention-to-treat (ITT) effects are likely invalid indicators for organizational effectiveness and may lead to inequitable decisions. This is because sites differ in their local ecological conditions including client composition, alternative programs, and community context. Applying the potential outcomes framework, this study proposes a mathematical definition for the relative effectiveness of an organization. The estimand contrasts the performance of a focal organization with those that share the features of its local ecological conditions. The identification relies on relatively weak assumptions by leveraging observed control group outcomes that capture the confounding impacts of alternative programs and community context. We propose a two-step mixed-effects modeling (2SME) procedure. Simulations demonstrate significant improvements when compared with site-specific ITT analyses or analyses that only adjust for between-site differences in the observed baseline participant composition. We illustrate its use through an evaluation of the relative effectiveness of individual Job Corps centers by reanalyzing data from the National Job Corps Study, a multisite randomized trial that included 100 Job Corps centers nationwide serving disadvantaged youths. The new strategy promises to alleviate consequential misclassifications of some of the most effective Job Corps centers as least effective and vice versa.
- [5] arXiv:2407.18377 [pdf, other]
-
Title: Bayesian Nowcasting Data Breach IBNR Incidents
Subjects: Applications (stat.AP)
The reporting delay in data breach incidents poses a formidable challenge for Incurred But Not Reported (IBNR) studies, complicating reserve estimation for actuarial professionals. This work presents a novel Bayesian nowcasting model designed to accurately model and predict the number of IBNR data breach incidents. Leveraging a Bayesian modeling framework, the model integrates time and heterogeneous effects to enhance predictive accuracy. Synthetic and empirical studies demonstrate the superior performance of the proposed model, highlighting its efficacy in addressing the complexities of IBNR estimation. Furthermore, we examine reserve estimation for IBNR incidents using the proposed model, shedding light on its implications for actuarial practice.
- [6] arXiv:2407.18389 [pdf, other]
-
Title: Doubly Robust Targeted Estimation of Conditional Average Treatment Effects for Time-to-event Outcomes with Competing Risks
Comments: 42 pages, 8 figures
Subjects: Methodology (stat.ME); Applications (stat.AP)
In recent years, precision treatment strategies have gained significant attention in medical research, particularly for patient care. We propose a novel framework for estimating conditional average treatment effects (CATE) in time-to-event data with competing risks, using ICU patients with sepsis as an illustrative example. Our approach, based on cumulative incidence functions and targeted maximum likelihood estimation (TMLE), achieves both asymptotic efficiency and double robustness. The primary contribution of this work lies in our derivation of the efficient influence function for the targeted causal parameter, CATE. We established the theoretical proofs for these properties, and subsequently confirmed them through simulations. Our TMLE framework is flexible, accommodating various regression and machine learning models, making it applicable in diverse scenarios. In order to identify variables contributing to treatment effect heterogeneity and to facilitate accurate estimation of CATE, we developed two distinct variable importance measures (VIMs). This work provides a powerful tool for optimizing personalized treatment strategies, furthering the pursuit of precision medicine.
- [7] arXiv:2407.18432 [pdf, other]
-
Title: Accounting for reporting delays in real-time phylodynamic analyses with preferential sampling
Comments: 17 pages, 5 figures in the main text
Subjects: Methodology (stat.ME); Populations and Evolution (q-bio.PE)
The COVID-19 pandemic demonstrated that fast and accurate analysis of continually collected infectious disease surveillance data is crucial for situational awareness and policy making. Coalescent-based phylodynamic analysis can use genetic sequences of a pathogen to estimate changes in its effective population size, a measure of genetic diversity. These changes in effective population size can be connected to the changes in the number of infections in the population of interest under certain conditions. Phylodynamics is an important set of tools because its methods are often resilient to the ascertainment biases present in traditional surveillance data (e.g., preferentially testing symptomatic individuals). Unfortunately, it takes weeks or months to sequence and deposit the sampled pathogen genetic sequences into a database, making them available for such analyses. These reporting delays severely decrease precision of phylodynamic methods closer to present time, and for some models can lead to extreme biases. Here we present a method that affords reliable estimation of the effective population size trajectory closer to the time of data collection, allowing for policy decisions to be based on more recent data. Our work uses readily available historic times between sampling and sequencing for a population of interest, and incorporates this information into the sampling model to mitigate the effects of reporting delay in real-time analyses. We illustrate our methodology on simulated data and on SARS-CoV-2 sequences collected in the state of Washington in 2021.
- [8] arXiv:2407.18572 [pdf, other]
-
Title: Bernoulli amputation
Subjects: Applications (stat.AP); Statistics Theory (math.ST); Other Statistics (stat.OT)
An approach to amputation, the process of introducing missing values to a complete dataset, is presented. It allows missingness indicators to be constructed in a flexible and principled way via copulas and Bernoulli margins, and dependence to be incorporated in missingness patterns. Besides more classical missingness models such as missing completely at random, missing at random, and missing not at random, the approach is able to model structured missingness such as block missingness and, via mixtures, monotone missingness, which are patterns of missing data frequently found in real-life datasets. Properties such as joint missingness probabilities or missingness correlation are derived mathematically. The approach is demonstrated with mathematical examples and empirical illustrations in terms of a well-known dataset.
- [9] arXiv:2407.18612 [pdf, other]
-
Title: Integration of Structural Equation Modeling and Bayesian Networks in the Context of Causal Inference: A Case Study on Personal Positive Youth Development
Comments: 38 pages, 9 figures, 2 tables
Subjects: Methodology (stat.ME)
In this study, the combined use of structural equation modeling (SEM) and Bayesian network modeling (BNM) in causal inference analysis is revisited. The perspective highlights the debate between proponents of using BNM as either an exploratory phase or even as the sole phase in the definition of structural models, and those advocating for SEM as the superior alternative for exploratory analysis. The individual strengths and limitations of SEM and BNM are recognized, but this exploration evaluates the contention between utilizing SEM's robust structural inference capabilities and the dynamic probabilistic modeling offered by BNM. A case study is presented, based on the work of \citet{balaguer_2022} on a structural model for personal positive youth development (*PYD*) as a function of positive parenting (*PP*) and perception of the climate and functioning of the school (*CFS*). Finally, the paper takes a clear stance on the analytical primacy of SEM in exploratory causal analysis, while acknowledging the potential of BNM in subsequent phases.
- [10] arXiv:2407.18650 [pdf, other]
-
Title: Achieving interpretable machine learning by functional decomposition of black-box models into explainable predictor effects
Authors: David Köhler (1), David Rügamer (2 and 3), Matthias Schmid (1) ((1) Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn, (2) Department of Statistics, LMU Munich, (3) Munich Center for Machine Learning)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Machine learning (ML) has seen significant growth in both popularity and importance. The high prediction accuracy of ML models is often achieved through complex black-box architectures that are difficult to interpret. This interpretability problem has been hindering the use of ML in fields like medicine, ecology and insurance, where an understanding of the inner workings of the model is paramount to ensure user acceptance and fairness. The need for interpretable ML models has boosted research in the field of interpretable machine learning (IML). Here we propose a novel approach for the functional decomposition of black-box predictions, which is considered a core concept of IML. The idea of our method is to replace the prediction function by a surrogate model consisting of simpler subfunctions. Similar to additive regression models, these functions provide insights into the direction and strength of the main feature contributions and their interactions. Our method is based on a novel concept termed stacked orthogonality, which ensures that the main effects capture as much functional behavior as possible and do not contain information explained by higher-order interactions. Unlike earlier functional IML approaches, it is neither affected by extrapolation nor by hidden feature interactions. To compute the subfunctions, we propose an algorithm based on neural additive modeling and an efficient post-hoc orthogonalization procedure.
- [11] arXiv:2407.18685 [pdf, other]
-
Title: On the impossibility of detecting a late change-point in the preferential attachment random graph model
Subjects: Statistics Theory (math.ST); Probability (math.PR)
We consider the problem of late change-point detection under the preferential attachment random graph model with time dependent attachment function. This can be formulated as a hypothesis testing problem where the null hypothesis corresponds to a preferential attachment model with a constant affine attachment parameter $\delta_0$ and the alternative corresponds to a preferential attachment model where the affine attachment parameter changes from $\delta_0$ to $\delta_1$ at a time $\tau_n = n - \Delta_n$ where $0\leq \Delta_n \leq n$ and $n$ is the size of the graph. It was conjectured in Bet et al. that when observing only the unlabeled graph, detection of the change is not possible for $\Delta_n = o(n^{1/2})$. In this work, we make a step towards proving the conjecture by proving the impossibility of detecting the change when $\Delta_n = o(n^{1/3})$. We also study change-point detection in the case where the labelled graph is observed and show that change-point detection is possible if and only if $\Delta_n \to \infty$, thereby exhibiting a strong difference between the two settings.
- [12] arXiv:2407.18721 [pdf, other]
-
Title: Ensemble Kalman inversion approximate Bayesian computation
Subjects: Methodology (stat.ME); Computation (stat.CO)
Approximate Bayesian computation (ABC) is the most popular approach to inferring parameters in the case where the data model is specified in the form of a simulator. It is not possible to directly implement standard Monte Carlo methods for inference in such a model, due to the likelihood not being available to evaluate pointwise. The main idea of ABC is to perform inference on an alternative model with an approximate likelihood (the ABC likelihood), estimated at each iteration from points simulated from the data model. The central challenge of ABC is then to trade off the bias (introduced by approximating the model) against the variance introduced by estimating the ABC likelihood. Stabilising the variance of the ABC likelihood requires a computational cost that is exponential in the dimension of the data; thus the most common approach to reducing variance is to perform inference conditional on summary statistics. In this paper we introduce a new approach to estimating the ABC likelihood: using iterative ensemble Kalman inversion (IEnKI) (Iglesias, 2016; Iglesias et al., 2018). We first introduce new estimators of the marginal likelihood in the case of a Gaussian data model using the IEnKI output, then show how this may be used in ABC. Performance is illustrated on the Lotka-Volterra model, where we observe substantial improvements over standard ABC and other commonly-used approaches.
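The basic ABC idea referred to in this abstract (simulate from the data model, compare summary statistics, keep parameters that come close) can be sketched with plain rejection sampling. This is the textbook baseline, not the IEnKI estimator the paper introduces; the toy Gaussian simulator, the uniform prior, the tolerance `eps`, and the function names are all assumptions made for illustration.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, summary, eps, n_draws, rng):
    """Keep prior draws whose simulated summary is within eps of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, rng))
        if abs(s_sim - s_obs) <= eps:       # tolerance controls the ABC bias
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(1)
observed = rng.normal(loc=2.0, scale=1.0, size=50)   # toy data, true mean 2

posterior = abc_rejection(
    observed,
    simulate=lambda theta, rng: rng.normal(loc=theta, scale=1.0, size=50),
    prior_sample=lambda rng: rng.uniform(-5.0, 5.0),
    summary=np.mean,                        # one-dimensional summary statistic
    eps=0.1,
    n_draws=20000,
    rng=rng,
)
print(len(posterior), posterior.mean())
```

Shrinking `eps` reduces the approximation bias but lowers the acceptance rate, which is the bias-variance trade-off the abstract describes.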
- [13] arXiv:2407.18755 [pdf, other]
-
Title: Score matching through the roof: linear, nonlinear, and latent variables causal discovery
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Methodology (stat.ME)
Causal discovery from observational data holds great promise, but existing methods rely on strong assumptions about the underlying causal structure, often requiring full observability of all relevant variables. We tackle these challenges by leveraging the score function $\nabla \log p(X)$ of observed variables for causal discovery and propose the following contributions. First, we generalize the existing results of identifiability with the score to additive noise models with minimal requirements on the causal mechanisms. Second, we establish conditions for inferring causal relations from the score even in the presence of hidden variables; this result is two-faced: we demonstrate the score's potential as an alternative to conditional independence tests to infer the equivalence class of causal graphs with hidden variables, and we provide the necessary conditions for identifying direct causes in latent variable models. Building on these insights, we propose a flexible algorithm for causal discovery across linear, nonlinear, and latent variable models, which we empirically validate.
- [14] arXiv:2407.18801 [pdf, other]
-
Title: Usual stochastic orderings of the second-order statistics with dependent heterogeneous semi-parametric distribution random variables
Subjects: Statistics Theory (math.ST)
This manuscript investigates stochastic comparisons of second-order statistics from dependent and heterogeneous observations drawn from a general semi-parametric family of distributions. Sufficient conditions for the usual stochastic order of the second-order statistics from dependent and heterogeneous observations are established under the p-larger order and the reciprocally majorization order. Some numerical examples are given to illustrate the theoretical findings. In addition, the theoretical results are applied to two important models. Finally, we carry out a reliability analysis on a real data set.
- [15] arXiv:2407.18802 [pdf, other]
-
Title: Log-Concave Coupling for Sampling Neural Net Posteriors
Comments: This research was presented at the International Symposium on Information Theory (ISIT). Athens, Greece, July 11, 2024. The material was also presented in the 2024 Shannon Lecture
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
In this work, we present a sampling algorithm for single hidden layer neural networks. This algorithm is built upon a recursive series of Bayesian posteriors using a method we call Greedy Bayes. Sampling of the Bayesian posterior for neuron weight vectors $w$ of dimension $d$ is challenging because of its multimodality. Our algorithm to tackle this problem is based on a coupling of the posterior density for $w$ with an auxiliary random variable $\xi$.
The resulting reverse conditional $w|\xi$ of neuron weights given auxiliary random variable is shown to be log concave. In the construction of the posterior distributions we provide some freedom in the choice of the prior. In particular, for Gaussian priors on $w$ with suitably small variance, the resulting marginal density of the auxiliary variable $\xi$ is proven to be strictly log concave for all dimensions $d$. For a uniform prior on the unit $\ell_1$ ball, evidence is given that the density of $\xi$ is again strictly log concave for sufficiently large $d$.
The score of the marginal density of the auxiliary random variable $\xi$ is determined by an expectation over $w|\xi$ and thus can be computed by various rapidly mixing Markov Chain Monte Carlo methods. Moreover, the computation of the score of $\xi$ permits methods of sampling $\xi$ by a stochastic diffusion (Langevin dynamics) with drift function built from this score. With such dynamics, information-theoretic methods pioneered by Bakry and Emery show that accurate sampling of $\xi$ is obtained rapidly when its density is indeed strictly log-concave. After that, one more draw from $w|\xi$ produces neuron weights $w$ whose marginal distribution is the desired posterior.
- [16] arXiv:2407.18808 [pdf, other]
-
Title: Learning Chaotic Systems and Long-Term Predictions with Neural Jump ODEs
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Dynamical Systems (math.DS); Probability (math.PR)
The Path-dependent Neural Jump ODE (PD-NJ-ODE) is a model for online prediction of generic (possibly non-Markovian) stochastic processes with irregular (in time) and potentially incomplete (with respect to coordinates) observations. It is a model for which convergence to the $L^2$-optimal predictor, which is given by the conditional expectation, is established theoretically. Thereby, the training of the model is solely based on a dataset of realizations of the underlying stochastic process, without the need of knowledge of the law of the process. In the case where the underlying process is deterministic, the conditional expectation coincides with the process itself. Therefore, this framework can equivalently be used to learn the dynamics of ODE or PDE systems solely from realizations of the dynamical system with different initial conditions. We showcase the potential of our method by applying it to the chaotic system of a double pendulum. When training the standard PD-NJ-ODE method, we see that the prediction starts to diverge from the true path after about half of the evaluation time. In this work we enhance the model with two novel ideas, which independently of each other improve the performance of our modelling setup. The resulting dynamics match the true dynamics of the chaotic system very closely. The same enhancements can be used to provably enable the PD-NJ-ODE to learn long-term predictions for general stochastic datasets, where the standard model fails. This is verified in several experiments.
- [17] arXiv:2407.18835 [pdf, other]
-
Title: Robust Estimation of Polychoric Correlation
Comments: 50 pages (30 main text), 13 figures (8 main text), 9 tables (4 main text). arXiv admin note: text overlap with arXiv:2403.11954
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP); Other Statistics (stat.OT)
Polychoric correlation is often an important building block in the analysis of rating data, particularly for structural equation models. However, the commonly employed maximum likelihood (ML) estimator is highly susceptible to misspecification of the polychoric correlation model, for instance through violations of latent normality assumptions. We propose a novel estimator that is designed to be robust to partial misspecification of the polychoric model, that is, the model is only misspecified for an unknown fraction of observations, for instance (but not limited to) careless respondents. In contrast to existing literature, our estimator makes no assumption on the type or degree of model misspecification. It furthermore generalizes ML estimation and is consistent as well as asymptotically normally distributed. We demonstrate the robustness and practical usefulness of our estimator in simulation studies and an empirical application on a Big Five administration. In the latter, the polychoric correlation estimates of our estimator and ML differ substantially, which, after further inspection, is likely due to the presence of careless respondents that the estimator helps identify.
- [18] arXiv:2407.18885 [pdf, other]
-
Title: Simulation Experiment Design for Calibration via Active Learning
Subjects: Methodology (stat.ME)
Simulation models take parameters as input and return outputs that help explain the behavior of complex systems. Calibration is the process of estimating the values of the parameters in a simulation model in light of observed data from the system that is being simulated. When simulation models are expensive, emulators are built with simulation data as a computationally efficient approximation of an expensive model. An emulator then can be used to predict model outputs, instead of repeatedly running an expensive simulation model during the calibration process. Sequential design with an intelligent selection criterion can guide the process of collecting simulation data to build an emulator, making the calibration process more efficient and effective. This article proposes two novel criteria for sequentially acquiring new simulation data in an active learning setting by considering uncertainties on the posterior density of parameters. Analysis of several simulation experiments and real-data simulation experiments from epidemiology demonstrates that the proposed approaches result in improved posterior and field predictions.
- [19] arXiv:2407.18905 [pdf, other]
-
Title: The nph2ph-transform: applications to the statistical analysis of completed clinical trials
Subjects: Methodology (stat.ME)
We present several illustrations from completed clinical trials on a statistical approach that allows us to gain useful insights regarding the time dependency of treatment effects. Our approach leans on a simple proposition: all non-proportional hazards (NPH) models are equivalent to a proportional hazards model. The nph2ph transform brings an NPH model into a PH form. We often find very simple approximations for this transform, enabling us to analyze complex NPH observations as though they had arisen under proportional hazards. Many techniques become available to us, and we use these to understand treatment effects better.
New submissions for Monday, 29 July 2024 (showing 19 of 19 entries)
- [20] arXiv:2407.18257 (cross-list from cs.NE) [pdf, other]
-
Title: Estimation of Distribution Algorithms with Matrix Transpose in Bayesian Learning
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Estimation of distribution algorithms (EDAs) constitute a new branch of evolutionary optimization algorithms, providing effective and efficient optimization performance in a variety of research areas. Recent studies have proposed new EDAs that employ mutation operators in standard EDAs to increase the population diversity. We present a new mutation operator, a matrix transpose, specifically designed for Bayesian structure learning, and we evaluate its performance in Bayesian structure learning. The results indicate that EDAs with transpose mutation give markedly better performance than conventional EDAs.
- [21] arXiv:2407.18268 (cross-list from physics.ao-ph) [pdf, other]
-
Title: Spatial analysis of tails of air pollution PDFs in Europe
Comments: arXiv admin note: substantial text overlap with arXiv:2203.04296
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Dynamical Systems (math.DS); Applications (stat.AP)
Outdoor air pollution is estimated to cause a huge number of premature deaths worldwide; it catalyses many diseases on a variety of time scales and has a detrimental effect on the environment. In light of these impacts it is necessary to obtain a better understanding of the dynamics and statistics of measured air pollution concentrations, including temporal fluctuations of observed concentrations and spatial heterogeneities. Here we present an extensive analysis for measured data from Europe. The observed probability density functions (PDFs) of air pollution concentrations depend very much on the spatial location and on the pollutant substance. We analyse a large number of time series data from 3544 different European monitoring sites and show that the PDFs of nitric oxide ($NO$), nitrogen dioxide ($NO_{2}$) and particulate matter ($PM_{10}$ and $PM_{2.5}$) concentrations generically exhibit heavy tails. These are asymptotically well approximated by $q$-exponential distributions with a given entropic index $q$ and width parameter $\lambda$. We observe that the power-law parameter $q$ and the width parameter $\lambda$ vary widely for the different spatial locations. We present the results of our data analysis in the form of a map that shows which parameters $q$ and $\lambda$ are most relevant in a given region. A variety of interesting spatial patterns are observed that correlate with properties of the geographical region. We also present results on typical time scales associated with the dynamical behaviour.
- [22] arXiv:2407.18313 (cross-list from math.NA) [pdf, other]
-
Title: Majorizing Stress Formula Two
Comments: 16 pages, 4 figures, R code
Subjects: Numerical Analysis (math.NA); Computation (stat.CO); Machine Learning (stat.ML)
Modifications of the smacof algorithm for multidimensional scaling are proposed that provide a convergent majorization algorithm for Kruskal's stress formula two.
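As background for the majorization idea, the classical smacof iteration for raw stress with unit weights (the Guttman transform) can be sketched as below. This is the standard baseline only, not the modified algorithm for stress formula two that the paper proposes; the function names and the random toy configuration are assumptions made for illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def raw_stress(X, delta):
    """Sum of squared differences between dissimilarities and distances."""
    D = pairwise_distances(X)
    iu = np.triu_indices_from(D, k=1)
    return ((delta[iu] - D[iu]) ** 2).sum()

def guttman_update(X, delta):
    """One smacof step for raw stress with unit weights: X <- B(X) X / n."""
    n = X.shape[0]
    D = pairwise_distances(X)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(D > 0, delta / D, 0.0)
    B = -ratio                              # off-diagonal: -delta_ij / d_ij
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))     # diagonal makes rows sum to zero
    return B @ X / n

rng = np.random.default_rng(0)
delta = pairwise_distances(rng.normal(size=(10, 2)))   # toy dissimilarities
X = rng.normal(size=(10, 2))                           # random start
stresses = [raw_stress(X, delta)]
for _ in range(50):
    X = guttman_update(X, delta)
    stresses.append(raw_stress(X, delta))
print(stresses[0], stresses[-1])
```

Because each step minimizes a majorizing function of the stress, the recorded stress values never increase from one iteration to the next.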
- [23] arXiv:2407.18397 (cross-list from cs.LG) [pdf, other]
-
Title: Gaussian Process Kolmogorov-Arnold Networks
Comments: related code: this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we introduce a probabilistic extension to Kolmogorov-Arnold Networks (KANs) by incorporating Gaussian processes (GPs) as non-linear neurons, which we refer to as GP-KAN. A fully analytical approach to handling the output distribution of one GP as an input to another GP is achieved by considering the function inner product of a GP function sample with the input distribution. These GP neurons exhibit robust non-linear modelling capabilities while using few parameters and can be easily and fully integrated in a feed-forward network structure. They provide inherent uncertainty estimates to the model prediction and can be trained directly on the log-likelihood objective function, without needing variational lower bounds or approximations. In the context of MNIST classification, a model based on GP-KAN of 80 thousand parameters achieved 98.5% prediction accuracy, compared to current state-of-the-art models with 1.5 million parameters.
- [24] arXiv:2407.18402 (cross-list from cs.LG) [pdf, other]
-
Title: The seismic purifier: An unsupervised approach to seismic signal detection via representation learning
Comments: Submitted to IEEE-TGRS
Subjects: Machine Learning (cs.LG); Geophysics (physics.geo-ph); Machine Learning (stat.ML)
In this paper, we develop an unsupervised learning approach to earthquake detection. We train a specific class of deep auto-encoders that learn to reproduce the input waveforms after a data-compressive bottleneck, and then use a simple triggering algorithm at the bottleneck to label waveforms as noise or signal.
Our approach is motivated by the intuition that efficient compression of data should represent signals differently from noise, and is facilitated by a time-axis-preserving approach to auto-encoding and intuitively-motivated choices on the architecture and triggering.
We demonstrate that the detection performance of the unsupervised approach is comparable to, and in some cases better than, some of the state-of-the-art supervised methods. Moreover, it has strong *cross-dataset generalization*. By experimenting with various modifications, we demonstrate that the detection performance is insensitive to various technical choices made in the algorithm.
Our approach has the potential to be useful for other signal detection problems with time series data.
- [25] arXiv:2407.18488 (cross-list from cs.LG) [pdf, other]
-
Title: Conversational Dueling Bandits in Generalized Linear ModelsSubjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
Conversational recommendation systems elicit user preferences by interacting with users to obtain their feedback on recommended commodities. Such systems utilize a multi-armed bandit framework to learn user preferences in an online manner and have achieved great success in recent years. However, existing conversational bandit methods have several limitations. First, they only enable users to provide explicit binary feedback on the recommended items or categories, leading to ambiguity in interpretation. In practice, users are usually faced with more than one choice. Relative feedback, known for its informativeness, has gained increasing popularity in recommendation system design. Moreover, current contextual bandit methods mainly work under linear reward assumptions, ignoring practical non-linear reward structures in generalized linear models. Therefore, in this paper, we introduce relative feedback-based conversations into conversational recommendation systems through the integration of dueling bandits in generalized linear models (GLM) and propose a novel conversational dueling bandit algorithm called ConDuel. Theoretical analyses of regret upper bounds and empirical validations on synthetic and real-world data underscore ConDuel's efficacy. We also demonstrate the potential to extend our algorithm to multinomial logit bandits with theoretical and experimental guarantees, which further supports the applicability of the proposed framework.
- [26] arXiv:2407.18504 (cross-list from q-fin.CP) [pdf, other]
-
Title: Multilevel Monte Carlo in Sample Average Approximation: Convergence, Complexity and ApplicationSubjects: Computational Finance (q-fin.CP); Computation (stat.CO)
In this paper, we examine the Sample Average Approximation (SAA) procedure within a framework where the Monte Carlo estimator of the expectation is biased. We also introduce Multilevel Monte Carlo (MLMC) in the SAA setup to enhance the computational efficiency of solving optimization problems. In this context, we conduct a thorough analysis, exploiting Cramér's large deviation theory, to establish uniform convergence, quantify the convergence rate, and determine the sample complexity for both standard Monte Carlo and MLMC paradigms. Additionally, we perform a root-mean-squared error analysis utilizing tools from empirical process theory to derive sample complexity without relying on the finite moment condition typically required for uniform convergence results. Finally, we validate our findings and demonstrate the advantages of the MLMC estimator through numerical examples, estimating Conditional Value-at-Risk (CVaR) in the Geometric Brownian Motion and nested expectation framework.
- [27] arXiv:2407.18519 (cross-list from cs.LG) [pdf, other]
-
Title: TCGPN: Temporal-Correlation Graph Pre-trained Network for Stock ForecastingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistical Finance (q-fin.ST); Machine Learning (stat.ML)
Recently, the incorporation of both temporal features and the correlation across time series has become an effective approach in time series prediction. Spatio-Temporal Graph Neural Networks (STGNNs) demonstrate good performance on many temporal-correlation forecasting problems. However, when applied to tasks lacking periodicity, such as stock data prediction, the effectiveness and robustness of STGNNs are found to be unsatisfactory. STGNNs are also limited by memory constraints, so they cannot handle problems with a large number of nodes. In this paper, we propose a novel approach called the Temporal-Correlation Graph Pre-trained Network (TCGPN) to address these limitations. TCGPN utilizes a temporal-correlation fusion encoder to obtain a mixed representation, together with a pre-training method built on carefully designed temporal and correlation pre-training tasks. The entire structure is independent of the number and order of nodes, so better results can be obtained through various data enhancements. Moreover, memory consumption during training can be significantly reduced through multiple sampling. Experiments are conducted on the real stock market data sets CSI300 and CSI500, which exhibit minimal periodicity. We fine-tune a simple MLP in downstream tasks and achieve state-of-the-art results, validating the capability to capture more robust temporal correlation patterns.
- [28] arXiv:2407.18597 (cross-list from cs.LG) [pdf, other]
-
Title: Reinforcement Learning for Sustainable Energy: A SurveyComments: 22 pages excluding references, 40 pages including references, 7 imagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Systems and Control (eess.SY); Machine Learning (stat.ML)
The transition to sustainable energy is a key challenge of our time, requiring modifications in the entire pipeline of energy production, storage, transmission, and consumption. At every stage, new sequential decision-making challenges emerge, ranging from the operation of wind farms to the management of electrical grids or the scheduling of electric vehicle charging stations. All such problems are well suited for reinforcement learning, the branch of machine learning that learns behavior from data. Therefore, numerous studies have explored the use of reinforcement learning for sustainable energy. This paper surveys this literature with the intention of bridging both the underlying research communities: energy and machine learning. After a brief introduction of both fields, we systematically list relevant sustainability challenges, how they can be modeled as a reinforcement learning problem, and what solution approaches currently exist in the literature. Afterwards, we zoom out and identify overarching reinforcement learning themes that appear throughout sustainability, such as multi-agent, offline, and safe reinforcement learning. Lastly, we also cover standardization of environments, which will be crucial for connecting both research fields, and highlight potential directions for future work. In summary, this survey provides an extensive overview of reinforcement learning methods for sustainable energy, which may play a vital role in the energy transition.
- [29] arXiv:2407.18609 (cross-list from cs.LG) [pdf, other]
-
Title: Denoising Lévy Probabilistic ModelsSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Investigating noise distributions beyond the Gaussian in diffusion generative models is an open problem. The Gaussian case has seen success experimentally and theoretically, fitting a unified SDE framework for score-based and denoising formulations. Recent studies suggest heavy-tailed noise distributions can address mode collapse and manage datasets with class imbalance, heavy tails, or outliers. Yoon et al. (NeurIPS 2023) introduced the Lévy-Ito model (LIM), extending the SDE framework to heavy-tailed SDEs with $\alpha$-stable noise. Despite its theoretical elegance and performance gains, LIM's complex mathematics may limit its accessibility and broader adoption. This study takes a simpler approach by extending the denoising diffusion probabilistic model (DDPM) with $\alpha$-stable noise, creating the denoising Lévy probabilistic model (DLPM). Using elementary proof techniques, we show DLPM reduces to running vanilla DDPM with minimal changes, so existing implementations can be reused almost as-is. DLPM and LIM have different training algorithms and, unlike the Gaussian case, they admit different backward processes and sampling algorithms. Our experiments demonstrate that DLPM achieves better coverage of the tails of the data distribution, improved generation of unbalanced datasets, and faster computation times with fewer backward steps.
- [30] arXiv:2407.18633 (cross-list from math.PR) [pdf, other]
-
Title: On stable central limit theorems for multivariate discrete-time martingalesSubjects: Probability (math.PR); Statistics Theory (math.ST)
We provide a systematic approach to stable central limit theorems for d-dimensional martingale difference arrays and martingale difference sequences. The conditions imposed are straightforward extensions of the univariate case.
- [31] arXiv:2407.18648 (cross-list from physics.app-ph) [pdf, other]
-
Title: Fast and Reliable Probabilistic Reflectometry Inversion with Prior-Amortized Neural Posterior EstimationAuthors: Vladimir Starostin, Maximilian Dax, Alexander Gerlach, Alexander Hinderhofer, Álvaro Tejero-Cantero, Frank SchreiberSubjects: Applied Physics (physics.app-ph); Soft Condensed Matter (cond-mat.soft); Machine Learning (cs.LG); Machine Learning (stat.ML)
Reconstructing the structure of thin films and multilayers from measurements of scattered X-rays or neutrons is key to progress in physics, chemistry, and biology. However, finding all structures compatible with reflectometry data is computationally prohibitive for standard algorithms, which typically results in unreliable analysis with only a single potential solution identified. We address this lack of reliability with a probabilistic deep learning method that identifies all realistic structures in seconds, setting new standards in reflectometry. Our method, Prior-Amortized Neural Posterior Estimation (PANPE), combines simulation-based inference with novel adaptive priors that inform the inference network about known structural properties and controllable experimental conditions. PANPE networks support key scenarios such as high-throughput sample characterization, real-time monitoring of evolving structures, or the co-refinement of several experimental data sets, and can be adapted to provide fast, reliable, and flexible inference across many other inverse problems.
- [32] arXiv:2407.18698 (cross-list from cs.CL) [pdf, other]
-
Title: Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text GenerationSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Decoding from the output distributions of large language models to produce high-quality text is a complex challenge in language modeling. Various approaches, such as beam search, sampling with temperature, $k-$sampling, nucleus $p-$sampling, typical decoding, contrastive decoding, and contrastive search, have been proposed to address this problem, aiming to improve coherence, diversity, as well as resemblance to human-generated text. In this study, we introduce adaptive contrastive search, a novel decoding strategy extending contrastive search by incorporating an adaptive degeneration penalty, guided by the estimated uncertainty of the model at each generation step. This strategy is designed to enhance both the creativity and diversity of the language modeling process while at the same time producing coherent and high-quality generated text output. Our findings indicate performance enhancement in both aspects, across different model architectures and datasets, underscoring the effectiveness of our method in text generation tasks. Our code base, datasets, and models are publicly available.
- [33] arXiv:2407.18699 (cross-list from cs.CR) [pdf, other]
-
Title: A Public Dataset For the ZKsync RollupComments: 12 pages, 12 figuresSubjects: Cryptography and Security (cs.CR); Applications (stat.AP)
Despite blockchain data being publicly available, practical challenges and high costs often hinder its effective use by researchers, thus limiting data-driven research and exploration in the blockchain space. This is especially true when it comes to Layer 2 (L2) ecosystems, and ZKsync, in particular. To address these issues, we have curated a dataset from 1 year of activity extracted from a ZKsync Era archive node and made it freely available to external parties. In this paper, we provide details on this dataset and how it was created, showcase a few example analyses that can be performed with it, and discuss some future research directions. We also publish and share the code used in our analysis on GitHub to promote reproducibility and to support further research.
- [34] arXiv:2407.18707 (cross-list from cs.LG) [pdf, other]
-
Title: Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior SelectionSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Infinitely wide or deep neural networks (NNs) with independent and identically distributed (i.i.d.) parameters have been shown to be equivalent to Gaussian processes. Because of the favorable properties of Gaussian processes, this equivalence is commonly employed to analyze neural networks and has led to various breakthroughs over the years. However, neural networks and Gaussian processes are equivalent only in the limit; in the finite case there are currently no methods available to approximate a trained neural network with a Gaussian model with bounds on the approximation error. In this work, we present an algorithmic framework to approximate a neural network of finite width and depth, and with not necessarily i.i.d. parameters, with a mixture of Gaussian processes with error bounds on the approximation error. In particular, we consider the Wasserstein distance to quantify the closeness between probabilistic models and, by relying on tools from optimal transport and Gaussian processes, we iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Crucially, for any NN and $\epsilon >0$ our approach is able to return a mixture of Gaussian processes that is $\epsilon$-close to the NN at a finite set of input points. Furthermore, we rely on the differentiability of the resulting error bound to show how our approach can be employed to tune the parameters of a NN to mimic the functional behavior of a given Gaussian process, e.g., for prior selection in the context of Bayesian inference. We empirically investigate the effectiveness of our results on both regression and classification problems with various neural network architectures. Our experiments highlight how our results can represent an important step towards understanding neural network predictions and formally quantifying their uncertainty.
- [35] arXiv:2407.18896 (cross-list from eess.SP) [pdf, other]
-
Title: Multi-Channel Factor Analysis: Identifiability and AsymptoticsJournal-ref: IEEE Transactions on Signal Processing (2024)Subjects: Signal Processing (eess.SP); Statistics Theory (math.ST)
Recent work by Ramírez et al. [2] has introduced Multi-Channel Factor Analysis (MFA) as an extension of factor analysis to multi-channel data that allows for latent factors common to all channels as well as factors specific to each channel. This paper validates the MFA covariance model and analyzes the statistical properties of the MFA estimators. In particular, a thorough investigation of model identifiability under varying latent factor structures is conducted, and sufficient conditions for generic global identifiability of MFA are obtained. The development of these identifiability conditions enables asymptotic analysis of estimators obtained by maximizing a Gaussian likelihood, which are shown to be consistent and asymptotically normal even under misspecification of the latent factor distribution.
- [36] arXiv:2407.18909 (cross-list from astro-ph.CO) [pdf, other]
-
Title: Hybrid summary statistics: neural weak lensing inference beyond the power spectrumComments: 16 pages, 11 figures. Submitted to JCAP. We provide publicly available code at this https URLSubjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Machine Learning (stat.ML); Other Statistics (stat.OT)
In inference problems, we often have domain knowledge which allows us to define summary statistics that capture most of the information content in a dataset. In this paper, we present a hybrid approach, where such physics-based summaries are augmented by a set of compressed neural summary statistics that are optimised to extract the extra information that is not captured by the predefined summaries. The resulting statistics are very powerful inputs to simulation-based or implicit inference of model parameters. We apply this generalisation of Information Maximising Neural Networks (IMNNs) to parameter constraints from tomographic weak gravitational lensing convergence maps to find summary statistics that are explicitly optimised to complement angular power spectrum estimates. We study several dark matter simulation resolutions in low- and high-noise regimes. We show that i) the information-update formalism extracts at least $3\times$ and up to $8\times$ as much information as the angular power spectrum in all noise regimes, ii) the network summaries are highly complementary to existing 2-point summaries, and iii) our formalism allows for networks with smaller, physically-informed architectures to match much larger regression networks with far fewer simulations needed to obtain asymptotically optimal inference.
Cross submissions for Monday, 29 July 2024 (showing 17 of 17 entries)
- [37] arXiv:2307.15330 (replaced) [pdf, other]
-
Title: Group integrative dynamic factor models with application to multiple subject brain connectivitySubjects: Methodology (stat.ME); Applications (stat.AP)
This work introduces a novel framework for dynamic factor model-based group-level analysis of multi-subject time series data, called GRoup Integrative DYnamic factor (GRIDY) models. The framework identifies and characterizes inter-subject similarities and differences between two pre-determined groups by considering a combination of group spatial information and individual temporal dynamics. Furthermore, it enables the identification of intra-subject similarities and differences over time by employing different model configurations for each subject. Methodologically, the framework combines a novel principal angle-based rank selection algorithm and a non-iterative integrative analysis framework. Inspired by simultaneous component analysis, this approach also reconstructs identifiable latent factor series with flexible covariance structures. The performance of the GRIDY models is evaluated through simulations conducted under various scenarios. An application is also presented to compare resting-state functional MRI data collected from multiple subjects in autism spectrum disorder and control groups.
- [38] arXiv:2401.11096 (replaced) [pdf, other]
-
Title: Asymptotic Normality of the Conditional Value-at-Risk based Pickands EstimatorSubjects: Statistics Theory (math.ST); Other Statistics (stat.OT)
The Pickands estimator for the extreme value index is beneficial due to its universal consistency and its location and scale invariance, which set it apart from other types of estimators. However, similar to many extreme value index estimators, it is marked by poor asymptotic efficiency. Chen (2021) introduces a Conditional Value-at-Risk (CVaR)-based Pickands estimator, establishes its consistency, and demonstrates through simulations that this estimator significantly reduces mean squared error while preserving its location and scale invariance. The initial focus of this paper is on demonstrating the weak convergence of the empirical CVaR in functional space. Subsequently, based on the established weak convergence, the paper presents the asymptotic normality of the CVaR-based Pickands estimator. It further supports these theoretical findings with empirical evidence obtained through simulation studies.
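For orientation, the classical Pickands estimator that this entry builds on can be sketched in a few lines. This is only the textbook quantile-based form, not the CVaR-based variant studied in the paper; the function names and the deterministic Pareto-quantile sample are illustrative assumptions.

```python
import math


def pickands_estimator(sample, k):
    """Classical Pickands estimate of the extreme value index.

    Uses the order statistics X_(n-k+1), X_(n-2k+1), X_(n-4k+1)
    (1-based notation) of the sorted sample.
    """
    x = sorted(sample)
    n = len(x)
    if k < 1 or 4 * k > n:
        raise ValueError("need 1 <= k and 4k <= n")
    a = x[n - k]        # X_(n-k+1)
    b = x[n - 2 * k]    # X_(n-2k+1)
    c = x[n - 4 * k]    # X_(n-4k+1)
    return math.log((a - b) / (b - c)) / math.log(2.0)


def pareto_quantile_sample(n, xi):
    """Hypothetical helper: deterministic Pareto quantiles with tail index xi."""
    return [(1.0 - (j - 0.5) / n) ** (-xi) for j in range(1, n + 1)]


# Usage: the estimate should be close to the true index xi = 0.5.
estimate = pickands_estimator(pareto_quantile_sample(10000, 0.5), k=100)
print(round(estimate, 3))
```

Because the estimator depends only on ratios of differences of order statistics, shifting or rescaling the sample leaves it unchanged, which is the location and scale invariance mentioned in the abstract.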
- [39] arXiv:2402.08012 (replaced) [pdf, other]
-
Title: Online Differentially Private Synthetic Data GenerationComments: 20 pagesSubjects: Statistics Theory (math.ST); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Probability (math.PR)
We present a polynomial-time algorithm for online differentially private synthetic data generation. For a data stream within the hypercube $[0,1]^d$ and an infinite time horizon, we develop an online algorithm that generates a differentially private synthetic dataset at each time $t$. This algorithm achieves a near-optimal accuracy bound of $O(\log(t)t^{-1/d})$ for $d\geq 2$ and $O(\log^{4.5}(t)t^{-1})$ for $d=1$ in the 1-Wasserstein distance. This result extends the previous work on the continual release model for counting queries to Lipschitz queries. Compared to the offline case, where the entire dataset is available at once, our approach requires only an extra polylog factor in the accuracy bound.
- [40] arXiv:2404.04398 (replaced) [pdf, other]
-
Title: Bayesian Methods for Modeling Cumulative Exposure to Extensive Environmental Health HazardsSubjects: Methodology (stat.ME); Applications (stat.AP)
Measuring the impact of an environmental point source exposure on the risk of disease, like cancer or childhood asthma, is well-developed. Modeling how an environmental health hazard that is extensive in space, like a wastewater canal, impacts disease risk is not. We propose a novel Bayesian generative semiparametric model for characterizing the cumulative spatial exposure to an environmental health hazard that is not well-represented by a single point in space. The model couples a dose-response model with a log-Gaussian Cox process integrated against a distance kernel with an unknown length-scale. We show that this model is a well-defined Bayesian inverse model, namely that the posterior exists under a Gaussian process prior for the log-intensity of exposure, and that a simple integral approximation adequately controls the computational error. We quantify the finite-sample properties and the computational tractability of the discretization scheme in a simulation study. Finally, we apply the model to survey data on household risk of childhood diarrheal illness from exposure to a system of wastewater canals in Mezquital Valley, Mexico.
- [41] arXiv:2405.05638 (replaced) [pdf, other]
-
Title: A Correlation-induced Finite Difference EstimatorSubjects: Methodology (stat.ME); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
Estimating stochastic gradients is pivotal in fields like service systems within operations research. The classical method for this estimation is the finite difference approximation, which entails generating samples at perturbed inputs. Nonetheless, practical challenges persist in determining the perturbation and obtaining an optimal finite difference estimator in the sense of possessing the smallest mean squared error (MSE). To tackle this problem, we propose a double sample-recycling approach in this paper. Firstly, pilot samples are recycled to estimate the optimal perturbation. Secondly, recycling these pilot samples again and generating new samples at the estimated perturbation leads to an efficient finite difference estimator. We analyze its bias, variance, and MSE. Our analyses demonstrate a reduction in asymptotic variance, and in some cases, a decrease in asymptotic bias, compared to the optimal finite difference estimator. Therefore, our proposed estimator consistently coincides with, or even outperforms, the optimal finite difference estimator. In numerical experiments, we apply the estimator in several examples, and numerical results demonstrate its robustness, as well as its agreement with the theory presented, especially in the case of small sample sizes.
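As a point of reference, a plain central finite difference estimator of a stochastic gradient can be sketched as follows. This is the generic baseline with a fixed, user-chosen perturbation, not the sample-recycling estimator proposed in the paper; `simulate`, `noisy_quadratic`, and all parameter values are hypothetical.

```python
import random


def central_fd_estimate(simulate, theta, h, n, rng):
    """Central finite difference estimate of d/dtheta E[simulate(theta)].

    Averages n noisy samples at theta + h and n at theta - h,
    then divides the difference of the two sample means by 2h.
    """
    up = sum(simulate(theta + h, rng) for _ in range(n)) / n
    down = sum(simulate(theta - h, rng) for _ in range(n)) / n
    return (up - down) / (2.0 * h)


def noisy_quadratic(theta, rng):
    """Hypothetical simulator: theta^2 observed with Gaussian noise."""
    return theta * theta + rng.gauss(0.0, 0.1)


# Usage: the true derivative of theta^2 at theta = 1 is 2.
rng = random.Random(0)
print(central_fd_estimate(noisy_quadratic, theta=1.0, h=0.1, n=10000, rng=rng))
```

The perturbation `h` controls the trade-off the abstract refers to: a smaller `h` typically reduces bias but inflates variance, since the noise in the sample means is divided by `2h`.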
- [42] arXiv:2405.06763 (replaced) [pdf, other]
-
Title: Post-selection inference for causal effects after causal discoverySubjects: Methodology (stat.ME)
Algorithms for constraint-based causal discovery select graphical causal models among a space of possible candidates (e.g., all directed acyclic graphs) by executing a sequence of conditional independence tests. These may be used to inform the estimation of causal effects (e.g., average treatment effects) when there is uncertainty about which covariates ought to be adjusted for, or which variables act as confounders versus mediators. However, naively using the data twice, for model selection and estimation, would lead to invalid confidence intervals. Moreover, if the selected graph is incorrect, the inferential claims may apply to a selected functional that is distinct from the actual causal effect. We propose an approach to post-selection inference that is based on a resampling and screening procedure, which essentially performs causal discovery multiple times with randomly varying intermediate test statistics. Then, an estimate of the target causal effect and corresponding confidence sets are constructed from a union of individual graph-based estimates and intervals. We show that this construction has asymptotically correct coverage for the true causal effect parameter. Importantly, the guarantee holds for a fixed population-level effect, not a data-dependent or selection-dependent quantity. Most of our exposition focuses on the PC-algorithm for learning directed acyclic graphs and the multivariate Gaussian case for simplicity, but the approach is general and modular, so it may be used with other conditional independence based discovery algorithms and distributional families.
- [43] arXiv:2405.18722 (replaced) [pdf, other]
-
Title: Adaptive and Efficient Learning with Blockwise Missing and Semi-Supervised DataSubjects: Methodology (stat.ME)
Data fusion is an important way to realize powerful and generalizable analyses across multiple sources. However, differing data collection capabilities across sources have become a prominent issue in practice. This can result in blockwise missingness (BM) of covariates, which is troublesome for integration. Meanwhile, the high cost of obtaining gold-standard labels can cause the response to be missing for a large proportion of samples, known as the semi-supervised (SS) problem. In this paper, we consider a challenging scenario confronting both the BM and SS issues, and propose a novel Data-adaptive projecting Estimation approach for data FUsion in the SEmi-supervised setting (DEFUSE). Starting with a complete-data-only estimator, it involves two successive projection steps to reduce its variance without incurring bias. Compared to existing approaches, DEFUSE achieves a two-fold improvement. First, it leverages the BM labeled sample more efficiently through a novel data-adaptive projection approach robust to model misspecification on the missing covariates, leading to better variance reduction. Second, our method further incorporates the large unlabeled sample to enhance the estimation efficiency through imputation and projection. Compared to the previous SS setting with complete covariates, our work reveals a more essential role of the unlabeled sample in the BM setting. These advantages are justified in asymptotic and simulation studies. We also apply DEFUSE to the risk modeling and inference of heart diseases with the MIMIC-III electronic medical record (EMR) data.
- [44] arXiv:2405.19995 (replaced) [pdf, other]
-
Title: Symmetries in Overparametrized Neural Networks: A Mean-Field ViewSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
We develop a Mean-Field (MF) view of the learning dynamics of overparametrized Artificial Neural Networks (NN) under data symmetric in law with respect to the action of a general compact group $G$. We consider for this a class of generalized shallow NNs given by an ensemble of $N$ multi-layer units, jointly trained using stochastic gradient descent (SGD) and possibly symmetry-leveraging (SL) techniques, such as Data Augmentation (DA), Feature Averaging (FA) or Equivariant Architectures (EA). We introduce the notions of weakly and strongly invariant laws (WI and SI) on the parameter space of each single unit, corresponding, respectively, to $G$-invariant distributions, and to distributions supported on parameters fixed by the group action (which encode EA). This allows us to define symmetric models compatible with taking $N\to\infty$ and give an interpretation of the asymptotic dynamics of DA, FA and EA in terms of Wasserstein Gradient Flows describing their MF limits. When activations respect the group action, we show that, for symmetric data, DA, FA and freely-trained models obey the exact same MF dynamic, which stays in the space of WI laws and minimizes therein the population risk. We also give a counterexample to the general attainability of an optimum over SI laws. Despite this, quite remarkably, we show that the set of SI laws is also preserved by the MF dynamics even when freely trained. This sharply contrasts with the finite-$N$ setting, in which EAs are generally not preserved by unconstrained SGD. We illustrate the validity of our findings as $N$ gets larger in a teacher-student experimental setting, training a student NN to learn from a WI, SI or arbitrary teacher model through various SL schemes. Finally, we deduce a data-driven heuristic to discover the largest subspace of parameters supporting SI distributions for a problem, which could be used for designing EA with minimal generalization error.
- [45] arXiv:2209.04329 (replaced) [pdf, other]
-
Title: Heterogeneous Treatment Effect Bounds under Sample Selection with an Application to the Effects of Social Media on Political PolarizationSubjects: Econometrics (econ.EM); Machine Learning (stat.ML)
We propose a method for estimation and inference for bounds for heterogeneous causal effect parameters in general sample selection models where the treatment can affect whether an outcome is observed and no exclusion restrictions are available. The method provides conditional effect bounds as functions of policy relevant pre-treatment variables. It allows for conducting valid statistical inference on the unidentified conditional effects. We use a flexible debiased/double machine learning approach that can accommodate non-linear functional forms and high-dimensional confounders. Easily verifiable high-level conditions for estimation, misspecification robust confidence intervals, and uniform confidence bands are provided as well. We re-analyze data from a large scale field experiment on Facebook on counter-attitudinal news subscription with attrition. Our method yields substantially tighter effect bounds compared to conventional methods and suggests depolarization effects for younger users.
- [46] arXiv:2306.08620 (replaced) [pdf, other]
-
Title: Anticipatory Music TransformerComments: TMLR accepted versionSubjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by problems arising in the control of symbolic music generation. We focus on infilling control tasks, whereby the controls are a subset of the events themselves, and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset. These models match the performance of autoregressive models for prompted music generation, with the additional capability to perform infilling control tasks, including accompaniment. Human evaluators report that an anticipatory model produces accompaniments with similar musicality to even music composed by humans over a 20-second clip.
- [47] arXiv:2307.16502 (replaced) [pdf, other]
-
Title: Percolated stochastic block model via EM algorithm and belief propagation with non-backtracking spectraComments: 30 pages, 16 figuresSubjects: Combinatorics (math.CO); Methodology (stat.ME)
Whereas Laplacian and modularity based spectral clustering is apt for dense graphs, recent results show that for sparse ones, the non-backtracking spectrum is the best candidate to find assortative clusters of nodes. Here belief propagation in the sparse stochastic block model is derived with arbitrary given model parameters, which results in a non-linear system of equations; with linear approximation, the spectrum of the non-backtracking matrix is able to specify the number $k$ of clusters. Then the model parameters themselves can be estimated by the EM algorithm. Bond percolation in the assortative model is considered in the following two senses: the within- and between-cluster edge probabilities decrease with the number of nodes and edges coming into existence in this way are retained with probability $\beta$. As a consequence, the optimal $k$ is the number of the structural real eigenvalues (greater than $\sqrt{c}$, where $c$ is the average degree) of the non-backtracking matrix of the graph. Assuming these eigenvalues $\mu_1 >\dots > \mu_k$ are distinct, the multiple phase transitions obtained for $\beta$ are $\beta_i =\frac{c}{\mu_i^2}$; further, at $\beta_i$ the number of detectable clusters is $i$, for $i=1,\dots ,k$. Inflation-deflation techniques are also discussed to classify the nodes themselves, which can serve as the basis for sparse spectral clustering.
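The counting rule and the thresholds $\beta_i = c/\mu_i^2$ stated in this abstract amount to simple arithmetic once the real eigenvalues of the non-backtracking matrix are known. A minimal sketch, assuming those eigenvalues have already been computed elsewhere; the function name and the example values are hypothetical.

```python
import math


def percolation_thresholds(real_eigenvalues, avg_degree):
    """Count structural eigenvalues and compute beta_i = c / mu_i^2.

    real_eigenvalues: real eigenvalues of the non-backtracking matrix
                      (assumed to be supplied by the caller).
    avg_degree:       the average degree c of the graph.
    Returns (k, betas), with betas ordered by decreasing mu_i.
    """
    bound = math.sqrt(avg_degree)
    # Structural eigenvalues are those exceeding sqrt(c).
    structural = sorted((mu for mu in real_eigenvalues if mu > bound), reverse=True)
    betas = [avg_degree / (mu * mu) for mu in structural]
    return len(structural), betas


# Usage with made-up eigenvalues and c = 4 (so sqrt(c) = 2).
k, betas = percolation_thresholds([4.0, 3.0, 1.5, -2.5], avg_degree=4.0)
print(k, betas)
```

Since the $\mu_i$ are decreasing, the returned thresholds are increasing, matching the statement that $i$ clusters are detectable at $\beta_i$.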
- [48] arXiv:2309.03731 (replaced) [pdf, other]
-
Title: Using representation balancing to learn conditional-average dose responses from clustered data
Comments: 21 pages, 7 figures, v2: updated methodology and experiments
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
Estimating a unit's responses to interventions with an associated dose, the "conditional average dose response" (CADR), is relevant in a variety of domains, from healthcare to business, economics, and beyond. Such a response typically needs to be estimated from observational data, which introduces several challenges. That is why the machine learning (ML) community has proposed several tailored CADR estimators. Yet most of these methods require strong assumptions on the distribution of data and the assignment of interventions, which go beyond the standard assumptions in causal inference. Whereas previous works have so far focused on smooth shifts in covariate distributions across doses, in this work we study estimating CADR from clustered data in which different doses are assigned to different segments of a population. On a novel benchmarking dataset, we show the impacts of clustered data on model performance and propose an estimator, CBRNet, that learns cluster-agnostic and hence dose-agnostic covariate representations through representation balancing for unbiased CADR inference. We run extensive experiments to illustrate the workings of our method and compare it with the state of the art in ML for CADR estimation.
- [49] arXiv:2310.09319 (replaced) [pdf, other]
-
Title: Topological Data Analysis in smart manufacturing: State of the art and future directions
Comments: Work accepted for publication in the Journal of Manufacturing Systems
Subjects: Machine Learning (cs.LG); Algebraic Topology (math.AT); Applications (stat.AP)
Topological Data Analysis (TDA) is a discipline that applies algebraic topology techniques to analyze complex, multi-dimensional data. Although it is a relatively new field, TDA has been widely and successfully applied across various domains, such as medicine, materials science, and biology. This survey provides an overview of the state of the art of TDA within a dynamic and promising application area: industrial manufacturing and production, particularly within the Industry 4.0 context. We have conducted a rigorous and reproducible literature search focusing on TDA applications in industrial production and manufacturing settings. The identified works are categorized based on their application areas within the manufacturing process and the types of input data. We highlight the principal advantages of TDA tools in this context, address the challenges encountered and the future potential of the field. Furthermore, we identify TDA methods that are currently underexploited in specific industrial areas and discuss how their application could be beneficial, with the aim of stimulating further research in this field. This work seeks to bridge the theoretical advancements in TDA with the practical needs of industrial production. Our goal is to serve as a guide for practitioners and researchers applying TDA in industrial production and manufacturing systems. We advocate for the untapped potential of TDA in this domain and encourage continued exploration and research.
- [50] arXiv:2311.05241 (replaced) [pdf, other]
-
Title: When Meta-Learning Meets Online and Continual Learning: A Survey
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Over the past decade, deep neural networks have demonstrated significant success using the training scheme that involves mini-batch stochastic gradient descent on extensive datasets. Expanding upon this accomplishment, there has been a surge in research exploring the application of neural networks in other learning scenarios. One notable framework that has garnered significant attention is meta-learning. Often described as "learning to learn," meta-learning is a data-driven approach to optimize the learning algorithm. Other branches of interest are continual learning and online learning, both of which involve incrementally updating a model with streaming data. While these frameworks were initially developed independently, recent works have started investigating their combinations, proposing novel problem settings and learning algorithms. However, due to the elevated complexity and lack of unified terminology, discerning differences between the learning frameworks can be challenging even for experienced researchers. To facilitate a clear understanding, this paper provides a comprehensive survey that organizes various problem settings using consistent terminology and formal descriptions. By offering an overview of these learning paradigms, our work aims to foster further advancements in this promising area of research.
- [51] arXiv:2401.13929 (replaced) [pdf, other]
-
Title: HMM for Discovering Decision-Making Dynamics Using Reinforcement Learning Experiments
Subjects: Machine Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME); Machine Learning (stat.ML)
Major depressive disorder (MDD) presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks that involve making choices or responding to stimuli that are associated with different outcomes. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing to characterize how patients make decisions in behavioral tasks. Recent findings suggest the inadequacy of characterizing reward learning solely based on a single RL model; instead, there may be a switching of decision-making processes between multiple strategies. An important scientific question is how the dynamics of learning strategies in decision-making affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task (PRT) within the EMBARC study, we propose a novel RL-HMM framework for analyzing reward-based decision-making. Our model accommodates learning strategy switching between two distinct approaches under a hidden Markov model (HMM): subjects making decisions based on the RL model or opting for random choices. We account for continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient EM algorithm for parameter estimation and employ a nonparametric bootstrap for inference. We apply our approach to the EMBARC study to show that MDD patients are less engaged in RL compared to the healthy controls, and engagement is associated with brain activities in the negative affect circuitry during an emotional conflict task.
- [52] arXiv:2406.01253 (replaced) [pdf, other]
-
Title: animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics
Authors: Julian C. Schäfer-Zimmermann, Vlad Demartsev, Baptiste Averly, Kiran Dhanjal-Adams, Mathieu Duteil, Gabriella Gall, Marius Faiß, Lily Johnson-Ulrich, Dan Stowell, Marta B. Manser, Marie A. Roch, Ariana Strandburg-Peshkin
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM); Applications (stat.AP)
Bioacoustic research, vital for understanding animal behavior, conservation, and ecology, faces a monumental challenge: analyzing vast datasets where animal vocalizations are rare. While deep learning techniques are becoming standard, adapting them to bioacoustics remains difficult. We address this with animal2vec, an interpretable large transformer model, and a self-supervised training scheme tailored for sparse and unbalanced bioacoustic data. It learns from unlabeled audio and then refines its understanding with labeled data. Furthermore, we introduce and publicly release MeerKAT: Meerkat Kalahari Audio Transcripts, a dataset of meerkat (Suricata suricatta) vocalizations with millisecond-resolution annotations, the largest labeled dataset on non-human terrestrial mammals currently available. Our model outperforms existing methods on MeerKAT and the publicly available NIPS4Bplus birdsong dataset. Moreover, animal2vec performs well even with limited labeled data (few-shot learning). animal2vec and MeerKAT provide a new reference point for bioacoustic research, enabling scientists to analyze large amounts of data even with scarce ground truth information.
- [53] arXiv:2406.02611 (replaced) [pdf, other]
-
Title: LOLA: LLM-Assisted Online Learning Algorithm for Content Experiments
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In the rapidly evolving digital content landscape, media firms and news publishers require automated and efficient methods to enhance user engagement. This paper introduces the LLM-Assisted Online Learning Algorithm (LOLA), a novel framework that integrates Large Language Models (LLMs) with adaptive experimentation to optimize content delivery. Leveraging a large-scale dataset from Upworthy, which includes 17,681 headline A/B tests, we first investigate three pure-LLM approaches: prompt-based methods, embedding-based classification models, and fine-tuned open-source LLMs. We find that prompt-based approaches perform poorly, achieving no more than 65% accuracy in identifying the catchier headline. In contrast, both OpenAI-embedding-based classification models and fine-tuned Llama-3 with 8 billion parameters achieve an accuracy of around 82-84%. We then introduce LOLA, which combines the best pure-LLM approach with the Upper Confidence Bound algorithm to allocate traffic and maximize clicks adaptively. Our numerical experiments on Upworthy data show that LOLA outperforms the standard A/B test method (the current status quo at Upworthy), pure bandit algorithms, and pure-LLM approaches, particularly in scenarios with limited experimental traffic. Our approach is scalable and applicable to content experiments across various settings where firms seek to optimize user engagement, including digital advertising and social media recommendations.
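For readers unfamiliar with the Upper Confidence Bound idea mentioned in this abstract, the following is a minimal sketch of plain UCB1 allocating impressions among competing headlines. It is not LOLA itself: the LLM-based predictions that LOLA combines with the bandit are not modelled here, and the function name `ucb1`, the simulated click rates, and the seed are all hypothetical.

```python
import math
import random

def ucb1(click_rates, rounds, seed=0):
    """Simulate UCB1 traffic allocation over headlines.

    `click_rates` are hypothetical true click probabilities, used only to
    simulate feedback. Returns the number of impressions per headline.
    """
    rng = random.Random(seed)
    n_arms = len(click_rates)
    pulls = [0] * n_arms      # impressions shown per headline
    clicks = [0.0] * n_arms   # clicks observed per headline
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # show every headline once before using the index
        else:
            # Empirical click rate plus an exploration bonus that shrinks
            # as a headline accumulates impressions.
            arm = max(
                range(n_arms),
                key=lambda a: clicks[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < click_rates[arm] else 0.0
        pulls[arm] += 1
        clicks[arm] += reward
    return pulls

pulls = ucb1([0.2, 0.5, 0.8], rounds=5000)
print(pulls)  # most impressions should go to the headline with the highest rate
```

Unlike a fixed-split A/B test, the allocation shifts toward the better-performing headline as evidence accumulates, which is the property the abstract relies on when experimental traffic is limited.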
- [54] arXiv:2406.10242 (replaced) [pdf, other]
-
Title: Physics-Guided Actor-Critic Reinforcement Learning for Swimming in Turbulence
Comments: 23 pages, 6 figures
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD); Fluid Dynamics (physics.flu-dyn); Machine Learning (stat.ML)
Turbulent diffusion causes particles placed in proximity to separate. We investigate the required swimming efforts to maintain a particle close to its passively advected counterpart. We explore optimally balancing these efforts with the intended goal by developing and comparing a novel Physics-Informed Reinforcement Learning (PIRL) strategy with prescribed control (PC) and standard physics-agnostic Reinforcement Learning strategies. Our PIRL scheme, coined the Actor-Physicist, is an adaptation of the Actor-Critic algorithm in which the Neural Network parameterized Critic is replaced with an analytically derived physical heuristic function (the physicist). This strategy is then compared with an analytically computed optimal PC policy derived from a stochastic optimal control formulation and standard physics-agnostic Actor-Critic type algorithms.
- [55] arXiv:2407.09186 (replaced) [pdf, other]
-
Title: Variational Inference via Smoothed Particle Hydrodynamics
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
A new variational inference method, SPH-ParVI, based on smoothed particle hydrodynamics (SPH), is proposed for sampling partially known densities (e.g. up to a constant) or sampling using gradients. SPH-ParVI simulates the flow of a fluid under external effects driven by the target density; transient or steady state of the fluid approximates the target density. The continuum fluid is modelled as an interacting particle system (IPS) via SPH, where each particle carries smoothed properties, interacts and evolves as per the Navier-Stokes equations. This mesh-free, Lagrangian simulation method offers fast, flexible, scalable and deterministic sampling and inference for a class of probabilistic models such as those encountered in Bayesian inference and generative modelling.
- [56] arXiv:2407.15245 (replaced) [pdf, other]
-
Title: Weyl Calculus and Exactly Solvable Schrödinger Bridges with Quadratic State Cost
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY); Mathematical Physics (math-ph); Machine Learning (stat.ML)
The Schrödinger bridge, a stochastic dynamical generalization of optimal mass transport, exhibits a learning-control duality. Viewed as a stochastic control problem, the Schrödinger bridge finds an optimal control policy that steers given joint state statistics to another while minimizing the total control effort subject to controlled diffusion and deadline constraints. Viewed as a stochastic learning problem, the Schrödinger bridge finds the most-likely distribution-valued trajectory connecting endpoint distributional observations, i.e., solves the two-point boundary-constrained maximum likelihood problem over the manifold of probability distributions. Recent works have shown that solving the Schrödinger bridge problem with state cost requires finding the Markov kernel associated with a reaction-diffusion PDE where the state cost appears as a state-dependent reaction rate. We explain how ideas from Weyl calculus in quantum mechanics, specifically the Weyl operator and the Weyl symbol, can help determine such Markov kernels. We illustrate these ideas by explicitly finding the Markov kernel for the case of quadratic state cost via Weyl calculus, recovering our earlier results but avoiding tedious computation with Hermite polynomials.
- [57] arXiv:2407.15439 (replaced) [pdf, other]
-
Title: Merit-based Fair Combinatorial Semi-Bandit with Unrestricted Feedback Delays
Comments: 28 pages, 9 figures, accepted for 27th European Conference on Artificial Intelligence (ECAI 2024), Source code added
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study the stochastic combinatorial semi-bandit problem with unrestricted feedback delays under merit-based fairness constraints. This is motivated by applications such as crowdsourcing and online advertising, where feedback is not immediately available and fairness among different choices (or arms) is crucial. We consider two types of unrestricted feedback delays: reward-independent delays, where the feedback delays are independent of the rewards, and reward-dependent delays, where the feedback delays are correlated with the rewards. Furthermore, we introduce merit-based fairness constraints to ensure a fair selection of the arms. We define the reward regret and the fairness regret and present new bandit algorithms to select arms under unrestricted feedback delays based on their merits. We prove that our algorithms all achieve sublinear expected reward regret and expected fairness regret, with a dependence on the quantiles of the delay distribution. We also conduct extensive experiments using synthetic and real-world data and show that our algorithms can fairly select arms with different feedback delays.