Methodology
- [1] arXiv:2407.17534 [pdf, other]
Title: Extension of W-method and A-learner for multiple binary outcomes
Subjects: Methodology (stat.ME)
In this study, we compared two groups, in which subjects were assigned to either the treatment or the control group. In such trials, if the efficacy of the treatment cannot be demonstrated in a population that meets the eligibility criteria, identifying the subgroups for which the treatment is effective is desirable. Such subgroups can be identified by estimating heterogeneous treatment effects (HTE). In recent years, methods for estimating HTE have increasingly relied on complex models. Although these models improve the estimation accuracy, they often sacrifice interpretability. Despite significant advancements in the methods for continuous or univariate binary outcomes, methods for multiple binary outcomes are less prevalent, and existing interpretable methods, such as the W-method and A-learner, while capable of estimating HTE for a single binary outcome, still fail to capture the correlation structure when applied to multiple binary outcomes. We thus propose two methods for estimating HTE for multiple binary outcomes: one based on the W-method and the other based on the A-learner. We also demonstrate that the conventional A-learner introduces bias in the estimation of the treatment effect. The proposed method employs a framework based on reduced-rank regression to capture the correlation structure among multiple binary outcomes. We correct for the bias inherent in the A-learner estimates and investigate the impact of this bias through numerical simulations. Finally, we demonstrate the effectiveness of the proposed method using a real data application.
- [2] arXiv:2407.17592 [pdf, other]
Title: Robust Maximum $L_q$-Likelihood Covariance Estimation for Replicated Spatial Data
Subjects: Methodology (stat.ME); Computation (stat.CO)
Parameter estimation with the maximum $L_q$-likelihood estimator (ML$q$E) is an alternative to the maximum likelihood estimator (MLE) that considers the $q$-th power of the likelihood values for some $q<1$. In this method, extreme values are down-weighted because of their lower likelihood values, which yields robust estimates. In this work, we study the properties of the ML$q$E for spatial data with replicates. We investigate the asymptotic properties of the ML$q$E for Gaussian random fields with a Matérn covariance function, and carry out simulation studies to investigate the numerical performance of the ML$q$E. We show that it can provide more robust and stable estimation results when some of the replicates in the spatial data contain outliers. In addition, we develop a mechanism to find the optimal choice of the hyper-parameter $q$ for the ML$q$E. The robustness of our approach is further verified on a United States precipitation dataset. Compared with other robust methods for spatial data, our proposal is more intuitive and easier to understand, yet it performs well when dealing with datasets containing outliers.
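The down-weighting idea can be illustrated on a much simpler problem than the spatial setting of the abstract. The following is a minimal sketch, assuming the commonly used form of the $L_q$ function, $L_q(u) = (u^{1-q} - 1)/(1-q)$, and a univariate Gaussian with known standard deviation; neither assumption comes from the listing, and the function names are hypothetical. Under these assumptions the estimating equation for the mean becomes a weighted average with weights $f(x_i;\mu)^{1-q}$, which can be solved by fixed-point iteration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mlq_mean(xs, sigma=1.0, q=0.8, iters=200):
    """Fixed-point iteration for the MLqE of a Gaussian mean (sigma known).

    Assumes L_q(u) = (u**(1 - q) - 1) / (1 - q); each observation is then
    weighted by its density raised to the power (1 - q).
    """
    mu = sorted(xs)[len(xs) // 2]  # start from (an approximate) median
    for _ in range(iters):
        weights = [normal_pdf(x, mu, sigma) ** (1.0 - q) for x in xs]
        mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return mu


# Eight well-behaved points around zero plus two gross outliers.
data = [-1.2, -0.7, -0.3, 0.0, 0.1, 0.4, 0.8, 1.1, 9.0, 10.0]
sample_mean = sum(data) / len(data)
robust_mean = mlq_mean(data)
print(sample_mean, robust_mean)
```

On this toy data the two outliers receive weights close to zero, so the $L_q$ estimate stays near the bulk of the observations while the sample mean is pulled toward the outliers.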
- [3] arXiv:2407.17666 [pdf, html, other]
Title: Causal estimands and identification of time-varying effects in non-stationary time series from N-of-1 mobile device data
Authors: Xiaoxuan Cai, Li Zeng, Charlotte Fowler, Lisa Dixon, Dost Ongur, Justin T. Baker, Jukka-Pekka Onnela, Linda Valeri
Subjects: Methodology (stat.ME)
Mobile technology (mobile phones and wearable devices) generates continuous data streams encompassing outcomes, exposures and covariates, presented as intensive longitudinal or multivariate time series data. The high frequency of measurements enables granular and dynamic evaluation of treatment effects, revealing their persistence and accumulation over time. Existing methods predominantly focus on contemporaneous, temporal-average, or population-average effects and assume stationarity or invariance of treatment effects over time; they are therefore inadequate, both conceptually and statistically, to capture dynamic treatment effects in personalized mobile health data. We here propose new causal estimands for multivariate time series in N-of-1 studies. These estimands summarize how time-varying exposures impact outcomes in both the short and the long term. We propose identifiability assumptions and a g-formula estimator that accounts for exposure-outcome and outcome-covariate feedback. The g-formula employs a state space model framework to accommodate time-varying behavior of treatment effects in non-stationary time series. We apply the proposed method to a multi-year smartphone observational study of bipolar patients and estimate the dynamic effect of phone-based communication on the mood of patients with bipolar disorder in an N-of-1 setting. Our approach reveals substantial heterogeneity in treatment effects over time and across individuals. A simulation-based strategy is also proposed for the development of a short-term, dynamic, and personalized treatment recommendation based on a patient's past information, in combination with a novel positivity diagnostics plot, validating proper causal inference in time series data.
- [4] arXiv:2407.17694 [pdf, html, other]
Title: Doubly Robust Conditional Independence Testing with Generative Neural Networks
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
This article addresses the problem of testing the conditional independence of two generic random vectors $X$ and $Y$ given a third random vector $Z$, which plays an important role in statistical and machine learning applications. We propose a new non-parametric testing procedure that avoids explicitly estimating any conditional distributions but instead requires sampling from the two marginal conditional distributions of $X$ given $Z$ and $Y$ given $Z$. We further propose using a generative neural network (GNN) framework to sample from these approximated marginal conditional distributions, which tends to mitigate the curse of dimensionality due to its adaptivity to any low-dimensional structures and smoothness underlying the data. Theoretically, our test statistic is shown to enjoy a doubly robust property against GNN approximation errors, meaning that the test statistic retains all desirable properties of the oracle test statistic utilizing the true marginal conditional distributions, as long as the product of the two approximation errors decays to zero faster than the parametric rate. Asymptotic properties of our statistic and the consistency of a bootstrap procedure are derived under both null and local alternatives. Extensive numerical experiments and real data analysis illustrate the effectiveness and broad applicability of our proposed test.
- [5] arXiv:2407.17804 [pdf, other]
Title: Bayesian Spatiotemporal Wombling
Comments: 198 pages
Subjects: Methodology (stat.ME)
Stochastic process models for spatiotemporal data underlying random fields find substantial utility in a range of scientific disciplines. Subsequent to predictive inference on the values of the random field (or spatial surface indexed continuously over time) at arbitrary space-time coordinates, scientific interest often turns to gleaning information regarding zones of rapid spatial-temporal change. We develop Bayesian modeling and inference for directional rates of change along a given surface. These surfaces, which demarcate regions of rapid change, are referred to as ``wombling'' surface boundaries. Existing methods for studying such changes have often been associated with curves and are not easily extendable to surfaces resulting from curves evolving over time. Our current contribution devises a fully model-based inferential framework for analyzing differential behavior in spatiotemporal responses by formalizing the notion of a ``wombling'' surface boundary using conventional multi-linear vector analytic frameworks and geometry followed by posterior predictive computations using triangulated surface approximations. We illustrate our methodology with comprehensive simulation experiments followed by multiple applications in environmental and climate science; pollutant analysis in environmental health; and brain imaging.
- [6] arXiv:2407.17848 [pdf, html, other]
Title: Bayesian Benchmarking Small Area Estimation via Entropic Tilting
Comments: 26 pages
Subjects: Methodology (stat.ME)
Benchmarking estimation and its risk evaluation is a practically important issue in small area estimation. While hierarchical Bayesian methods have been widely adopted in small area estimation, a unified Bayesian approach to benchmarking estimation has not been fully discussed. This work employs an entropic tilting method to modify the posterior distribution of the small area parameters to meet the benchmarking constraint, which enables us to obtain benchmarked point estimation as well as reasonable uncertainty quantification. Using conditionally independent structures of the posterior, we first introduce general Monte Carlo methods for obtaining a benchmarked posterior and then show that the benchmarked posterior can be obtained in an analytical form for some representative small area models. We demonstrate the usefulness of the proposed method through simulation and empirical studies.
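As a rough illustration of the Monte Carlo route mentioned above, entropic (exponential) tilting reweights posterior draws so that a chosen summary matches a target while staying as close as possible, in Kullback-Leibler divergence, to the original equal weights. The sketch below is not the authors' implementation: it assumes a single scalar benchmarking constraint, a hypothetical list `aggregate_draws` holding the benchmark aggregate computed from each posterior draw, and a simple bisection search for the tilting parameter.

```python
import math


def tilted(values, lam):
    """Normalised weights proportional to exp(lam * value)."""
    shift = max(lam * v for v in values)  # stabilise the exponentials
    raw = [math.exp(lam * v - shift) for v in values]
    total = sum(raw)
    return [r / total for r in raw]


def tilt_weights(values, target, lo=-50.0, hi=50.0, iters=200):
    """Find weights whose weighted mean of `values` equals `target`.

    The weighted mean is increasing in lam, so bisection is sufficient.
    `target` must lie strictly between min(values) and max(values), and the
    bracket [lo, hi] is an assumption that may need widening.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        w = tilted(values, mid)
        mean = sum(wi * v for wi, v in zip(w, values))
        if mean < target:
            lo = mid
        else:
            hi = mid
    return tilted(values, 0.5 * (lo + hi))


# Hypothetical benchmark aggregate evaluated on six posterior draws.
aggregate_draws = [9.0, 9.5, 10.0, 10.5, 11.0, 11.5]
weights = tilt_weights(aggregate_draws, target=10.0)
benchmarked_mean = sum(w * v for w, v in zip(weights, aggregate_draws))
print(weights, benchmarked_mean)
```

The same weights can then be applied to any function of the draws, which is how a reweighting of this kind yields both benchmarked point estimates and uncertainty summaries.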
- [7] arXiv:2407.17920 [pdf, html, other]
Title: Tobit Exponential Smoothing, towards an enhanced demand planning in the presence of censored data
Comments: 18 pages, 5 figures, 3 tables
Subjects: Methodology (stat.ME)
ExponenTial Smoothing (ETS) is a widely adopted forecasting technique in both research and practical applications. One critical development in ETS was the establishment of a robust statistical foundation based on state space models with a single source of error. However, an important challenge in ETS that remains unsolved is censored data estimation. This issue is critical in supply chain management, in particular when companies have to deal with stockouts. This work solves that problem by proposing the Tobit ETS, which extends the use of ETS models to handle censored data efficiently. This advancement builds upon the linear models taxonomy and extends it to encompass censored data scenarios. The results show that the Tobit ETS considerably reduces the forecast bias. Real and simulated data from the airline and supply chain industries are used to corroborate the findings.
- [8] arXiv:2407.18077 [pdf, html, other]
Title: An Alternating Direction Method of Multipliers Algorithm for the Weighted Fused LASSO Signal Approximator
Subjects: Methodology (stat.ME)
We present an Alternating Direction Method of Multipliers (ADMM) algorithm designed to solve the Weighted Generalized Fused LASSO Signal Approximator (wFLSA). First, we show that wFLSAs can always be reformulated as a Generalized LASSO problem. With the availability of algorithms tailored to the Generalized LASSO, the issue appears to be, in principle, resolved. However, the computational complexity of these algorithms is high, with a time complexity of $O(p^4)$ for a single iteration, where $p$ represents the number of coefficients. To overcome this limitation, we propose an ADMM algorithm specifically tailored for wFLSA-equivalent problems, significantly reducing the complexity to $O(p^2)$. Our algorithm is publicly accessible through the R package wflsa.
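For orientation, the textbook scaled-form ADMM iteration for a Generalized LASSO problem with identity design, $\min_\beta \tfrac12\|y-\beta\|_2^2 + \lambda\|D\beta\|_1$, can be sketched as follows. This is the generic scheme, not the tailored algorithm of the paper or the interface of the wflsa package: the function names are hypothetical, and the dense linear solve used here does not reproduce the $O(p^2)$ per-iteration cost claimed in the abstract.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def admm_generalized_lasso(y, D, lam, rho=1.0, iters=500):
    """Minimise 0.5 * ||y - beta||^2 + lam * ||D @ beta||_1 with ADMM.

    Uses the splitting z = D @ beta with scaled dual variable u.
    """
    p = y.shape[0]
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])
    A = np.eye(p) + rho * D.T @ D  # fixed system matrix of the beta-update
    beta = y.copy()
    for _ in range(iters):
        beta = np.linalg.solve(A, y + rho * D.T @ (z - u))
        z = soft_threshold(D @ beta + u, lam / rho)
        u = u + D @ beta - z
    return beta


def chain_difference_matrix(p):
    """Rows e_{i+1} - e_i: penalises differences of neighbouring coefficients."""
    D = np.zeros((p - 1, p))
    for i in range(p - 1):
        D[i, i], D[i, i + 1] = -1.0, 1.0
    return D


y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
D = chain_difference_matrix(5)
unpenalised = admm_generalized_lasso(y, D, lam=0.0)
fully_fused = admm_generalized_lasso(y, D, lam=100.0)
print(unpenalised, fully_fused)
```

Pairwise weights can be encoded in this generic form by scaling the corresponding rows of `D`, which is one way to read the reformulation result stated in the abstract.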
- [9] arXiv:2407.18166 [pdf, html, other]
Title: Identification and multiply robust estimation of causal effects via instrumental variables from an auxiliary heterogeneous population
Subjects: Methodology (stat.ME)
Evaluating causal effects in a primary population of interest with unmeasured confounders is challenging. Although instrumental variables (IVs) are widely used to address unmeasured confounding, they may not always be available in the primary population. Fortunately, IVs might have been used in previous observational studies on similar causal problems, and these auxiliary studies can be useful to infer causal effects in the primary population, even if they represent different populations. However, existing methods often assume homogeneity or equality of conditional average treatment effects between the primary and auxiliary populations, which may be restrictive in practice. This paper aims to remove the homogeneity requirement and establish a novel identifiability result allowing for different conditional average treatment effects across populations. We also construct a multiply robust estimator that remains consistent despite partial misspecifications of the observed data model and achieves local efficiency if all nuisance models are correct. The proposed approach is illustrated through simulation studies. Finally, we apply our approach by leveraging data from lower-income individuals, with cigarette price as a valid IV, to evaluate the causal effect of smoking on physical functional status in a higher-income group where strong IVs are not available.
New submissions for Friday, 26 July 2024 (showing 9 of 9 entries)
- [10] arXiv:2407.17565 (cross-list from astro-ph.GA) [pdf, html, other]
Title: Periodicity significance testing with null-signal templates: reassessment of PTF's SMBH binary candidates
Comments: 13 pages, 12 figures
Subjects: Astrophysics of Galaxies (astro-ph.GA); High Energy Astrophysical Phenomena (astro-ph.HE); Instrumentation and Methods for Astrophysics (astro-ph.IM); Methodology (stat.ME)
Periodograms are widely employed for identifying periodicity in time series data, yet they often struggle to accurately quantify the statistical significance of detected periodic signals when the data complexity precludes reliable simulations. We develop a data-driven approach to address this challenge by introducing a null-signal template (NST). The NST is created by carefully randomizing the period of each cycle in the periodogram template, rendering it non-periodic. It has the same frequentist properties as a periodic signal template regardless of the noise probability distribution, and we show with simulations that the distribution of false positives is the same as with the original periodic template, regardless of the underlying data. Thus, performing a periodicity search with the NST acts as an effective simulation of the null (no-signal) hypothesis, without having to simulate the noise properties of the data. We apply the NST method to the supermassive black hole binaries (SMBHB) search in the Palomar Transient Factory (PTF), where Charisi et al. had previously proposed 33 high signal to (white) noise candidates utilizing simulations to quantify their significance. Our approach reveals that these simulations do not capture the complexity of the real data. There are no statistically significant periodic signal detections above the non-periodic background. To improve the search sensitivity we introduce a Gaussian quadrature based algorithm for the Bayes Factor with correlated noise as a test statistic, in contrast to the standard signal to white noise. We show with simulations that this improves sensitivity to true signals by more than an order of magnitude. However, using the Bayes Factor approach also results in no statistically significant detections in the PTF data.
- [11] arXiv:2407.17658 (cross-list from stat.AP) [pdf, html, other]
Title: Semiparametric Piecewise Accelerated Failure Time Model for the Analysis of Immune-Oncology Clinical Trials
Subjects: Applications (stat.AP); Methodology (stat.ME)
The effectiveness of immune-oncology chemotherapies has been reported in recent clinical trials. The Kaplan-Meier estimates of the survival functions of the immune therapy and the control have often suggested the presence of a lag time before the immune therapy begins to act. This implies that the hazard ratio under the proportional hazards assumption is not an appealing summary, and many alternatives, such as the restricted mean survival time, have been investigated. In addition to such an overall summary of the treatment contrast, the lag time is also an important feature of the treatment effect. Identical survival functions up to the lag time imply that patients who are likely to die before the lag time would not benefit from the treatment, so identifying such patients is very important. We propose a semiparametric piecewise accelerated failure time model and an inference procedure based on the semiparametric maximum likelihood method. It provides not only an overall treatment summary but also a framework for identifying, in a unified way, patients who benefit less from the immune therapy. Numerical experiments confirm that each parameter can be estimated with minimal bias. Through a real data analysis, we illustrate the evaluation of the effect of immune-oncology therapy and the characterization of covariates for which patients are unlikely to benefit from the treatment.
Cross submissions for Friday, 26 July 2024 (showing 2 of 2 entries)
- [12] arXiv:2106.11043 (replaced) [pdf, html, other]
Title: Scalable Bayesian inference for time series via divide-and-conquer
Subjects: Methodology (stat.ME); Computation (stat.CO)
Bayesian computational algorithms tend to scale poorly as data size increases. This has motivated divide-and-conquer-based approaches for scalable inference. These divide the data into subsets, perform inference for each subset in parallel, and then combine these inferences. While appealing theoretical properties and practical performance have been demonstrated for independent observations, scalable inference for dependent data remains challenging. In this work, we study the problem of Bayesian inference from very long time series. The literature in this area focuses mainly on approximate approaches that usually lack rigorous theoretical guarantees and may provide arbitrarily poor accuracy in practice. We propose a simple and scalable divide-and-conquer method, and provide accuracy guarantees. Numerical simulations and real data applications demonstrate the effectiveness of our approach.
- [13] arXiv:2202.07277 (replaced) [pdf, other]
Title: Performing global sensitivity analysis on simulations of a continuous-time Markov chain model motivated by epidemiology
Authors: Henri Mermoz Kouye (INRAE, MaIAGE, AIRSEA), Gildas Mazo (INRAE, MaIAGE), Clémentine Prieur (AIRSEA), Elisabeta Vergu (INRAE, MaIAGE)
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
In this paper we apply a methodology introduced in Navarro Jimenez et al (2016) in the framework of chemical reaction networks to perform a global sensitivity analysis on simulations of a continuous-time Markov chain model motivated by epidemiology. Our goal is to quantify not only the effects of uncertain parameters such as epidemic parameters (transmission rate, mean sojourn duration in compartments), but also those of intrinsic randomness and interactions between epidemic parameters and intrinsic randomness. For that purpose, following what was proposed in Navarro Jimenez et al, we leverage three exact simulation algorithms for continuous-time Markov chains from the state of the art which we combine with common tools from variance-based sensitivity analysis as introduced in Sobol (1993). Also, we discuss the impact of the choice of the simulation algorithm used for the simulations on the results of sensitivity analysis. Such a discussion is new, at least to our knowledge. In a numerical section, we implement and compare three sensitivity analyses based on simulations obtained from different exact simulation algorithms of a SARS-CoV-2 epidemic model.
- [14] arXiv:2202.08728 (replaced) [pdf, html, other]
Title: Nonparametric extensions of randomized response for private confidence sets
Comments: 50 pages, 7 figures, to appear in the 2023 International Conference on Machine Learning with an Oral Presentation
Subjects: Methodology (stat.ME); Cryptography and Security (cs.CR); Statistics Theory (math.ST); Machine Learning (stat.ML)
This work derives methods for performing nonparametric, nonasymptotic statistical inference for population means under the constraint of local differential privacy (LDP). Given bounded observations $(X_1, \dots, X_n)$ with mean $\mu^\star$ that are privatized into $(Z_1, \dots, Z_n)$, we present confidence intervals (CI) and time-uniform confidence sequences (CS) for $\mu^\star$ when only given access to the privatized data. To achieve this, we study a nonparametric and sequentially interactive generalization of Warner's famous ``randomized response'' mechanism, satisfying LDP for arbitrary bounded random variables, and then provide CIs and CSs for their means given access to the resulting privatized observations. For example, our results yield private analogues of Hoeffding's inequality in both fixed-time and time-uniform regimes. We extend these Hoeffding-type CSs to capture time-varying (non-stationary) means, and conclude by illustrating how these methods can be used to conduct private online A/B tests.
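The starting point of the abstract, Warner's classical randomized response for a single bit, is easy to sketch together with a Hoeffding-style interval for the underlying mean. The code below covers only this binary, non-interactive, fixed-sample special case; it is not the nonparametric mechanism or the confidence sequences derived in the paper, and the function names are hypothetical. Each bit is reported truthfully with probability $p = e^{\varepsilon}/(1+e^{\varepsilon})$, which satisfies $\varepsilon$-LDP, and the estimator inverts the relation $\mathbb{E}[Z] = (2p-1)\mu^\star + (1-p)$.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def private_mean_ci(reports, epsilon, alpha=0.05):
    """Debiased mean estimate and Hoeffding interval from privatised bits."""
    n = len(reports)
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    z_bar = sum(reports) / n
    # E[Z] = (2p - 1) * mu + (1 - p), so invert this affine map.
    scale = 2.0 * p_truth - 1.0
    mu_hat = (z_bar - (1.0 - p_truth)) / scale
    # Hoeffding bound for the mean of n variables in [0, 1], rescaled.
    half_width = math.sqrt(math.log(2.0 / alpha) / (2.0 * n)) / scale
    return mu_hat, max(0.0, mu_hat - half_width), min(1.0, mu_hat + half_width)


rng = random.Random(0)
true_bits = [1] * 700 + [0] * 300
reports = [randomized_response(b, epsilon=1.0, rng=rng) for b in true_bits]
print(private_mean_ci(reports, epsilon=1.0))
```

The factor $1/(2p-1)$ in the interval width makes the cost of privacy explicit: as $\varepsilon$ shrinks, $p$ approaches $1/2$ and the interval widens.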
- [15] arXiv:2305.14275 (replaced) [pdf, html, other]
Title: Variational Inference with Coverage Guarantees in Simulation-Based Inference
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)
Amortized variational inference is an often employed framework in simulation-based inference that produces a posterior approximation that can be rapidly computed given any new observation. Unfortunately, there are few guarantees about the quality of these approximate posteriors. We propose Conformalized Amortized Neural Variational Inference (CANVI), a procedure that is scalable, easily implemented, and provides guaranteed marginal coverage. Given a collection of candidate amortized posterior approximators, CANVI constructs conformalized predictors based on each candidate, compares the predictors using a metric known as predictive efficiency, and returns the most efficient predictor. CANVI ensures that the resulting predictor constructs regions that contain the truth with a user-specified level of probability. CANVI is agnostic to design decisions in formulating the candidate approximators and only requires access to samples from the forward model, permitting its use in likelihood-free settings. We prove lower bounds on the predictive efficiency of the regions produced by CANVI and explore how the quality of a posterior approximation relates to the predictive efficiency of prediction regions based on that approximation. Finally, we demonstrate the accurate calibration and high predictive efficiency of CANVI on a suite of simulation-based inference benchmark tasks and an important scientific task: analyzing galaxy emission spectra.
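The marginal coverage guarantee referred to above is of the kind provided by split conformal prediction, whose calibration step is short enough to sketch. The code below shows only that generic step and is not the CANVI procedure itself; the score (for example, the reciprocal of an approximate posterior density evaluated at the true parameter) is a hypothetical choice, and the function names are illustrative.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score.

    Returns infinity when the calibration set is too small for the requested
    level, in which case the prediction region is the whole space.
    """
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf
    return sorted(calibration_scores)[k - 1]


def in_region(score, threshold):
    """A candidate parameter is kept when its score does not exceed the threshold."""
    return score <= threshold


# Hypothetical nonconformity scores from nine calibration simulations.
scores = [3.0, 1.0, 4.0, 2.0, 9.0, 5.0, 8.0, 6.0, 7.0]
tau = conformal_threshold(scores, alpha=0.1)
print(tau, in_region(4.5, tau))
```

With several candidate approximators, a threshold of this kind would be computed for each, and the resulting regions compared by size, which matches the selection step the abstract describes in terms of predictive efficiency.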
- [16] arXiv:2309.16373 (replaced) [pdf, html, other]
Title: Regularization and Model Selection for Ordinal-on-Ordinal Regression with Applications to Food Products' Testing and Survey Data
Subjects: Methodology (stat.ME); Applications (stat.AP)
Ordinal data are quite common in applied statistics. Although some model selection and regularization techniques for categorical predictors and ordinal response models have been developed over the past few years, less work has been done concerning ordinal-on-ordinal regression. Motivated by a consumer test and a survey on the willingness to pay for luxury food products consisting of Likert-type items, we propose a strategy for smoothing and selecting ordinally scaled predictors in the cumulative logit model. First, the group lasso is modified by the use of difference penalties on neighboring dummy coefficients, thus taking into account the predictors' ordinal structure. Second, a fused lasso-type penalty is presented for the fusion of predictor categories and factor selection. The performance of both approaches is evaluated in simulation studies and on real-world data.
- [17] arXiv:2310.16213 (replaced) [pdf, html, other]
Title: Bayes Factors Based on Test Statistics and Non-Local Moment Prior Densities
Subjects: Methodology (stat.ME)
We describe Bayes factors based on z, t, $\chi^2$, and F statistics when non-local moment prior distributions are used to define alternative hypotheses. The non-local alternative prior distributions are centered on standardized effects. The prior densities include a dispersion parameter that can be used to model prior precision and the variation of effect sizes across replicated experiments. We examine the convergence rates of Bayes factors under true null and true alternative hypotheses and show how these Bayes factors can be used to construct Bayes factor functions. An example illustrates the application of resulting Bayes factors to psychological experiments.
- [18] arXiv:2402.19214 (replaced) [pdf, html, other]
Title: A Bayesian approach with Gaussian priors to the inverse problem of source identification in elliptic PDEs
Comments: 21 pages, 8 figures, 5 tables. To appear in BAYSM 2023 proceedings
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
We consider the statistical linear inverse problem of making inference on an unknown source function in an elliptic partial differential equation from noisy observations of its solution. We employ nonparametric Bayesian procedures based on Gaussian priors, leading to convenient conjugate formulae for posterior inference. We review recent results providing theoretical guarantees on the quality of the resulting posterior-based estimation and uncertainty quantification, and we discuss the application of the theory to the important classes of Gaussian series priors defined on the Dirichlet-Laplacian eigenbasis and Matérn process priors. We provide an implementation of posterior inference for both classes of priors, and investigate its performance in a numerical simulation study.
- [19] arXiv:2405.07910 (replaced) [pdf, other]
Title: A Unification of Exchangeability and Continuous Exposure and Confounder Measurement Errors: Probabilistic Exchangeability
Comments: Revisited/Revised
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Exchangeability concerning a continuous exposure, X, may be assumed to identify average exposure effects of X, AEE(X). When X is measured with error (Xep), three challenges arise. First, exchangeability regarding Xep does not equal exchangeability regarding X. Second, the non-differential error assumption (NDEA) could be overly stringent in practice. Third, a definition of exchangeability that implies that AEE(Xep) can differ from AEE(X) is lacking. To address these challenges, this article proposes unifying exchangeability and exposure/confounder measurement errors with three novel concepts. The first, Probabilistic Exchangeability (PE), is an exchangeability assumption that allows for the difference between AEE(Xep) and AEE(X). The second concept, Emergent Pseudo Confounding (EPC), describes the bias introduced by exposure measurement error through mechanisms that resemble confounding. The third, Emergent Confounding, describes when bias due to confounder measurement error arises. PE requires adjustment for E(P)C, which can be performed like confounding adjustment. Under PE, the coefficient of determination (R2) in the regression of Xep against X may sometimes be sufficient to measure the difference between AEE(Xep) and AEE(X) in risk difference and ratio scales. This paper provides comprehensive insight into when AEE(Xep) is a surrogate of AEE(X). Differential errors could be addressed and may not compromise causal inference.