
Showing 1–50 of 592 results for author: Wang, X

Searching in archive stat.
  1. arXiv:2408.17276  [pdf, other]

    stat.ML cs.LG

    Minimax and Communication-Efficient Distributed Best Subset Selection with Oracle Property

    Authors: Jingguo Lan, Hongmei Lin, Xueqin Wang

    Abstract: The explosion of large-scale data in fields such as finance, e-commerce, and social media has outstripped the processing capabilities of single-machine systems, driving the need for distributed statistical inference methods. Traditional approaches to distributed inference often struggle with achieving true sparsity in high-dimensional datasets and involve high computational costs. We propose a nov…

    Submitted 30 August, 2024; originally announced August 2024.

  2. arXiv:2408.08567  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    S$^3$Attention: Improving Long Sequence Attention with Smoothed Skeleton Sketching

    Authors: Xue Wang, Tian Zhou, Jianqing Zhu, Jialin Liu, Kun Yuan, Tao Yao, Wotao Yin, Rong Jin, HanQin Cai

    Abstract: Attention based models have achieved many remarkable breakthroughs in numerous applications. However, the quadratic complexity of Attention makes the vanilla Attention based models hard to apply to long sequence tasks. Various improved Attention structures are proposed to reduce the computation cost by inducing low rankness and approximating the whole sequence by sub-sequences. The most challengin…

    Submitted 23 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  3. arXiv:2408.08493  [pdf, other]

    cs.LG stat.ML

    Fishers Harvest Parallel Unlearning in Inherited Model Networks

    Authors: Xiao Liu, Mingyuan Li, Xu Wang, Guangsheng Yu, Wei Ni, Lixiang Li, Haipeng Peng, Renping Liu

    Abstract: Unlearning in various learning frameworks remains challenging, with the continuous growth and updates of models exhibiting complex inheritance relationships. This paper presents a novel unlearning framework, which enables fully parallel unlearning among models exhibiting inheritance. A key enabler is the new Unified Model Inheritance Graph (UMIG), which captures the inheritance using a Directed Ac…

    Submitted 20 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  4. arXiv:2408.05990  [pdf, other]

    stat.ML cs.LG

    Parameters Inference for Nonlinear Wave Equations with Markovian Switching

    Authors: Yi Zhang, Zhikun Zhang, Xiangjun Wang

    Abstract: Traditional partial differential equations with constant coefficients often struggle to capture abrupt changes in real-world phenomena, leading to the development of variable coefficient PDEs and Markovian switching models. Recently, research has introduced the concept of PDEs with Markov switching models, established their well-posedness and presented numerical methods. However, there has been li…

    Submitted 30 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  5. arXiv:2408.05740  [pdf, other]

    cs.LG cs.AI stat.ML

    MTSCI: A Conditional Diffusion Model for Multivariate Time Series Consistent Imputation

    Authors: Jianping Zhou, Junhao Li, Guanjie Zheng, Xinbing Wang, Chenghu Zhou

    Abstract: Missing values are prevalent in multivariate time series, compromising the integrity of analyses and degrading the performance of downstream tasks. Consequently, research has focused on multivariate time series imputation, aiming to accurately impute the missing values based on available observations. A key research question is how to ensure imputation consistency, i.e., intra-consistency between…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures, accepted by CIKM2024

  6. arXiv:2408.01697  [pdf, other]

    cs.LG cs.AI stat.ML

    Invariant Graph Learning Meets Information Bottleneck for Out-of-Distribution Generalization

    Authors: Wenyu Mao, Jiancan Wu, Haoyang Liu, Yongduo Sui, Xiang Wang

    Abstract: Graph out-of-distribution (OOD) generalization remains a major challenge in graph learning since graph neural networks (GNNs) often suffer from severe performance degradation under distribution shifts. Invariant learning, aiming to extract invariant features across varied distributions, has recently emerged as a promising approach for OOD generation. Despite the great success of invariant learning…

    Submitted 3 August, 2024; originally announced August 2024.

  7. arXiv:2408.01509  [pdf, other]

    stat.AP

    Reconstructing and Forecasting Marine Dynamic Variable Fields across Space and Time Globally and Gaplessly

    Authors: Zhixi Xiong, Yukang Jiang, Wenfang Lu, Xueqin Wang, Ting Tian

    Abstract: Spatiotemporal projections in marine science are essential for understanding ocean systems and their impact on Earth's climate. However, existing AI-based and statistics-based inversion methods face challenges in leveraging ocean data, generating continuous outputs, and incorporating physical constraints. We propose the Marine Dynamic Reconstruction and Forecast Neural Networks (MDRF-Net), which i…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 35 pages, 6 figures

  8. arXiv:2407.04248  [pdf, other]

    stat.ML cs.LG

    Machine Learning for Complex Systems with Abnormal Pattern by Exception Maximization Outlier Detection Method

    Authors: Zhikun Zhang, Yiting Duan, Xiangjun Wang, Mingyuan Zhang

    Abstract: This paper proposes a novel fast online methodology for outlier detection called the exception maximization outlier detection method (EMODM), which employs probabilistic models and statistical algorithms to detect abnormal patterns from the outputs of complex systems. The EMODM is based on a two-state Gaussian mixture model and demonstrates strong performance in probability anomaly detection workin…

    Submitted 5 July, 2024; originally announced July 2024.

  9. Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I.

    Authors: Harrie Oosterhuis, Rolf Jagerman, Zhen Qin, Xuanhui Wang, Michael Bendersky

    Abstract: The traditional evaluation of information retrieval (IR) systems is generally very costly as it requires manual relevance annotation from human experts. Recent advancements in generative artificial intelligence -- specifically large language models (LLMs) -- can generate relevance annotations at an enormous scale with relatively small computational costs. Potentially, this could alleviate the cost…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: KDD '24

  10. arXiv:2406.15523  [pdf, other]

    cs.LG stat.ML

    Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

    Authors: Yili Wang, Yixin Liu, Xu Shen, Chenyu Li, Kaize Ding, Rui Miao, Ying Wang, Shirui Pan, Xin Wang

    Abstract: To build safe and reliable graph machine learning systems, unsupervised graph-level anomaly detection (GLAD) and unsupervised graph-level out-of-distribution (OOD) detection (GLOD) have received significant attention in recent years. Though those two lines of research indeed share the same objective, they have been studied independently in the community due to distinct evaluation setups, creating…

    Submitted 21 June, 2024; originally announced June 2024.

  11. arXiv:2406.12017  [pdf, other]

    stat.ML cs.LG stat.CO

    Sparsity-Constraint Optimization via Splicing Iteration

    Authors: Zezhi Wang, Jin Zhu, Junxian Zhu, Borui Tang, Hongmei Lin, Xueqin Wang

    Abstract: Sparsity-constraint optimization has wide applicability in signal processing, statistics, and machine learning. Existing fast algorithms must burdensomely tune parameters, such as the step size or the implementation of precise stop criteria, which may be challenging to determine in practice. To address this issue, we develop an algorithm named Sparsity-Constraint Optimization via sPlicing itEratio…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 34 pages

  12. arXiv:2406.10473  [pdf, other]

    stat.ME

    Design-based variance estimation of the Hájek effect estimator in stratified and clustered experiments

    Authors: Xinhe Wang, Ben B. Hansen

    Abstract: Randomized controlled trials (RCTs) are used to evaluate treatment effects. When individuals are grouped together, clustered RCTs are conducted. Stratification is recommended to reduce imbalance of baseline covariates between treatment and control. In practice, this can lead to comparisons between clusters of very different sizes. As a result, direct adjustment estimators that average differences…

    Submitted 19 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  13. arXiv:2406.08097  [pdf, other]

    cs.LG stat.AP stat.ME

    Inductive Global and Local Manifold Approximation and Projection

    Authors: Jungeum Kim, Xiao Wang

    Abstract: Nonlinear dimensional reduction with the manifold assumption, often called manifold learning, has proven its usefulness in a wide range of high-dimensional data analysis. The significant impact of t-SNE and UMAP has catalyzed intense research interest, seeking further innovations toward visualizing not only the local but also the global structure information of the data. Moreover, there have been…

    Submitted 12 June, 2024; originally announced June 2024.

  14. arXiv:2406.06426  [pdf, other]

    stat.ME

    Biomarker-Guided Adaptive Enrichment Design with Threshold Detection for Clinical Trials with Time-to-Event Outcome

    Authors: Kaiyuan Hua, Hwanhee Hong, Xiaofei Wang

    Abstract: Biomarker-guided designs are increasingly used to evaluate personalized treatments based on patients' biomarker status in Phase II and III clinical trials. With adaptive enrichment, these designs can improve the efficiency of evaluating the treatment effect in biomarker-positive patients by increasing their proportion in the randomized trial. While time-to-event outcomes are often used as the prim…

    Submitted 10 June, 2024; originally announced June 2024.

  15. arXiv:2406.03849  [pdf]

    cs.LG stat.AP stat.ML

    A Noise-robust Multi-head Attention Mechanism for Formation Resistivity Prediction: Frequency Aware LSTM

    Authors: Yongan Zhang, Junfeng Zhao, Jian Li, Xuanran Wang, Youzhuang Sun, Yuntian Chen, Dongxiao Zhang

    Abstract: The prediction of formation resistivity plays a crucial role in the evaluation of oil and gas reservoirs, identification and assessment of geothermal energy resources, groundwater detection and monitoring, and carbon capture and storage. However, traditional well logging techniques fail to measure accurate resistivity in cased boreholes, and the transient electromagnetic method for cased borehole…

    Submitted 6 June, 2024; originally announced June 2024.

  16. arXiv:2405.20970  [pdf, other]

    stat.ML cs.LG

    PUAL: A Classifier on Trifurcate Positive-Unlabeled Data

    Authors: Xiaoke Wang, Xiaochen Yang, Rui Zhu, Jing-Hao Xue

    Abstract: Positive-unlabeled (PU) learning aims to train a classifier using the data containing only labeled-positive instances and unlabeled instances. However, existing PU learning methods are generally hard to achieve satisfactory performance on trifurcate data, where the positive instances distribute on both sides of the negative instances. To address this issue, firstly we propose a PU classifier with…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 24 pages, 6 figures

  17. arXiv:2405.18932  [pdf, other]

    stat.ML cs.LG

    A Mallows-like Criterion for Anomaly Detection with Random Forest Implementation

    Authors: Gaoxiang Zhao, Lu Wang, Xiaoqiang Wang

    Abstract: The effectiveness of anomaly signal detection can be significantly undermined by the inherent uncertainty of relying on one specified model. Under the framework of model average methods, this paper proposes a novel criterion to select the weights on aggregation of multiple models, wherein the focal loss function accounts for the classification of extremely imbalanced data. This strategy is further…

    Submitted 29 May, 2024; originally announced May 2024.

  18. arXiv:2405.15991  [pdf, other]

    cs.LG cs.AI stat.ML

    Rényi Neural Processes

    Authors: Xuesong Wang, He Zhao, Edwin V. Bonilla

    Abstract: Neural Processes (NPs) are variational frameworks that aim to represent stochastic processes with deep neural networks. Despite their obvious benefits in uncertainty estimation for complex distributions via data-driven priors, NPs enforce network parameter sharing between the conditional prior and posterior distributions, thereby risking introducing a misspecified prior. We hereby propose Rényi Ne…

    Submitted 24 May, 2024; originally announced May 2024.

  19. arXiv:2405.12838  [pdf, ps, other]

    quant-ph stat.CO

    Quantum Non-Identical Mean Estimation: Efficient Algorithms and Fundamental Limits

    Authors: Jiachen Hu, Tongyang Li, Xinzhao Wang, Yecheng Xue, Chenyi Zhang, Han Zhong

    Abstract: We systematically investigate quantum algorithms and lower bounds for mean estimation given query access to non-identically distributed samples. On the one hand, we give quantum mean estimators with quadratic quantum speed-up given samples from different bounded or sub-Gaussian random variables. On the other hand, we prove that, in general, it is impossible for any quantum algorithm to achieve qua…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 31 pages, 0 figure. To appear in the 19th Theory of Quantum Computation, Communication and Cryptography (TQC 2024)

  20. arXiv:2405.08699  [pdf]

    stat.ML cs.LG

    Weakly-supervised causal discovery based on fuzzy knowledge and complex data complementarity

    Authors: Wenrui Li, Wei Zhang, Qinghao Zhang, Xuegong Zhang, Xiaowo Wang

    Abstract: Causal discovery based on observational data is important for deciphering the causal mechanism behind complex systems. However, the effectiveness of existing causal discovery methods is limited due to inferior prior knowledge, domain inconsistencies, and the challenges of high-dimensional datasets with small sample sizes. To address this gap, we propose a novel weakly-supervised fuzzy knowledge an…

    Submitted 14 May, 2024; originally announced May 2024.

  21. arXiv:2405.07138  [pdf, other]

    stat.ME

    Large-dimensional Robust Factor Analysis with Group Structure

    Authors: Yong He, Xiaoyang Ma, Xingheng Wang, Yalin Wang

    Abstract: In this paper, we focus on exploiting the group structure for large-dimensional factor models, which captures the homogeneous effects of common factors on individuals within the same group. In view of the fact that datasets in macroeconomics and finance are typically heavy-tailed, we propose to identify the unknown group structure using the agglomerative hierarchical clustering algorithm and an in…

    Submitted 11 May, 2024; originally announced May 2024.

  22. arXiv:2405.06613  [pdf, other]

    stat.ME

    Simultaneously detecting spatiotemporal changes with penalized Poisson regression models

    Authors: Zerui Zhang, Xin Wang, Xin Zhang, Jing Zhang

    Abstract: In the realm of large-scale spatiotemporal data, abrupt changes are commonly occurring across both spatial and temporal domains. This study aims to address the concurrent challenges of detecting change points and identifying spatial clusters within spatiotemporal count data. We introduce an innovative method based on the Poisson regression model, employing doubly fused penalization to unveil the u…

    Submitted 10 May, 2024; originally announced May 2024.

  23. arXiv:2405.03734  [pdf, other]

    cs.HC cs.AI stat.AP

    FOKE: A Personalized and Explainable Education Framework Integrating Foundation Models, Knowledge Graphs, and Prompt Engineering

    Authors: Silan Hu, Xiaoning Wang

    Abstract: Integrating large language models (LLMs) and knowledge graphs (KGs) holds great promise for revolutionizing intelligent education, but challenges remain in achieving personalization, interactivity, and explainability. We propose FOKE, a Forest Of Knowledge and Education framework that synergizes foundation models, knowledge graphs, and prompt engineering to address these challenges. FOKE introduce…

    Submitted 6 May, 2024; originally announced May 2024.

  24. arXiv:2404.19242  [pdf, other]

    cs.CV eess.IV stat.ME

    A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems

    Authors: Xin Ma, Puchen Zhu, Xiao Li, Xiaoyin Zheng, Jianshu Zhou, Xuchen Wang, Kwok Wai Samuel Au

    Abstract: Depth position highly affects lens distortion, especially in close-range photography, which limits the measurement accuracy of existing stereo vision systems. Moreover, traditional depth-dependent distortion models and their calibration methods have remained complicated. In this work, we propose a minimal set of parameters based depth-dependent distortion model (MDM), which considers the radial an…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement

  25. arXiv:2404.18008  [pdf, other]

    cs.LG stat.AP

    Implicit Generative Prior for Bayesian Neural Networks

    Authors: Yijia Liu, Xiao Wang

    Abstract: Predictive uncertainty quantification is crucial for reliable decision-making in various applied domains. Bayesian neural networks offer a powerful framework for this task. However, defining meaningful priors and ensuring computational efficiency remain significant challenges, especially for complex real-world applications. This paper addresses these challenges by proposing a novel neural adaptive…

    Submitted 27 April, 2024; originally announced April 2024.

  26. arXiv:2404.14786  [pdf, other]

    cs.AI cs.LG stat.ME

    RealTCD: Temporal Causal Discovery from Interventional Data with Large Language Model

    Authors: Peiwen Li, Xin Wang, Zeyang Zhang, Yuan Meng, Fang Shen, Yue Li, Jialong Wang, Yang Li, Wenweu Zhu

    Abstract: In the field of Artificial Intelligence for Information Technology Operations, causal discovery is pivotal for operation and maintenance of graph construction, facilitating downstream industrial tasks such as root cause analysis. Temporal causal discovery, as an emerging method, aims to identify temporal causal relationships between variables directly from observations by utilizing interventional…

    Submitted 26 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  27. arXiv:2404.13557  [pdf, other]

    stat.ML cs.LG

    Preconditioned Neural Posterior Estimation for Likelihood-free Inference

    Authors: Xiaoyu Wang, Ryan P. Kelly, David J. Warne, Christopher Drovandi

    Abstract: Simulation based inference (SBI) methods enable the estimation of posterior distributions when the likelihood function is intractable, but where model simulation is feasible. Popular neural approaches to SBI are the neural posterior estimator (NPE) and its sequential version (SNPE). These methods can outperform statistical SBI approaches such as approximate Bayesian computation (ABC), particularly…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 31 pages, 11 figures

  28. arXiv:2404.11579  [pdf, other]

    stat.ME

    Spatial Heterogeneous Additive Partial Linear Model: A Joint Approach of Bivariate Spline and Forest Lasso

    Authors: Xin Zhang, Shan Yu, Zhengyuan Zhu, Xin Wang

    Abstract: Identifying spatial heterogeneous patterns has attracted a surge of research interest in recent years, due to its important applications in various scientific and engineering fields. In practice the spatially heterogeneous components are often mixed with components which are spatially smooth, making the task of identifying the heterogeneous regions more challenging. In this paper, we develop an ef…

    Submitted 3 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  29. arXiv:2404.08667  [pdf, other]

    eess.SY stat.AP

    Traffic State Estimation and Uncertainty Quantification at Signalized Intersections with Low Penetration Rate Vehicle Trajectory Data

    Authors: Xingmin Wang, Zihao Wang, Zachary Jerome, Henry X. Liu

    Abstract: This paper studies the traffic state estimation problem at signalized intersections with low penetration rate vehicle trajectory data. While many existing studies have proposed different methods to estimate unknown traffic states and parameters (e.g., penetration rate, queue length) with this data, most of them only provide a point estimation without knowing the uncertainty of these estimated valu…

    Submitted 1 April, 2024; originally announced April 2024.

  30. arXiv:2404.08128  [pdf, other]

    stat.ME

    Inference of treatment effect and its regional modifiers using restricted mean survival time in multi-regional clinical trials

    Authors: Kaiyuan Hua, Hwanhee Hong, Xiaofei Wang

    Abstract: Multi-regional clinical trials (MRCTs) play an increasingly crucial role in global pharmaceutical development by expediting data gathering and regulatory approval across diverse patient populations. However, differences in recruitment practices and regional demographics often lead to variations in study participant characteristics, potentially biasing treatment effect estimates and undermining tre…

    Submitted 11 April, 2024; originally announced April 2024.

  31. arXiv:2404.01191  [pdf, other]

    stat.ME

    A Semiparametric Approach for Robust and Efficient Learning with Biobank Data

    Authors: Molei Liu, Xinyi Wang, Chuan Hong

    Abstract: With the increasing availability of electronic health records (EHR) linked with biobank data for translational research, a critical step in realizing its potential is to accurately classify phenotypes for patients. Existing approaches to achieve this goal are based on error-prone EHR surrogate outcomes, assisted and validated by a small set of labels obtained via medical chart review, which may al…

    Submitted 1 April, 2024; originally announced April 2024.

  32. arXiv:2403.18540  [pdf, other]

    stat.ML cs.LG stat.CO

    skscope: Fast Sparsity-Constrained Optimization in Python

    Authors: Zezhi Wang, Jin Zhu, Peng Chen, Huiyang Peng, Xiaoke Zhang, Anran Wang, Junxian Zhu, Xueqin Wang

    Abstract: Applying iterative solvers on sparsity-constrained optimization (SCO) requires tedious mathematical deduction and careful programming/debugging that hinders these solvers' broad impact. In the paper, the library skscope is introduced to overcome such an obstacle. With skscope, users can solve the SCO by just programming the objective function. The convenience of skscope is demonstrated through two…

    Submitted 22 August, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 4 pages; add experiment

  33. Green's matching: an efficient approach to parameter estimation in complex dynamic systems

    Authors: Jianbin Tan, Guoyu Zhang, Xueqin Wang, Hui Huang, Fang Yao

    Abstract: Parameters of differential equations are essential to characterize intrinsic behaviors of dynamic systems. Numerous methods for estimating parameters in dynamic systems are computationally and/or statistically inadequate, especially for complex systems with general-order differential operators, such as motion dynamics. This article presents Green's matching, a computationally tractable and statist…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 40 pages, 4 figures

    Journal ref: Journal of the Royal Statistical Society: Series B, 2024

  34. arXiv:2403.06942  [pdf, other]

    eess.SY cs.LG stat.ML

    Grid Monitoring and Protection with Continuous Point-on-Wave Measurements and Generative AI

    Authors: Lang Tong, Xinyi Wang, Qing Zhao

    Abstract: Purpose: This article presents a case for a next-generation grid monitoring and control system, leveraging recent advances in generative artificial intelligence (AI), machine learning, and statistical inference. Advancing beyond earlier generations of wide-area monitoring systems built upon supervisory control and data acquisition (SCADA) and synchrophasor technologies, we argue for a monitoring an…

    Submitted 11 March, 2024; originally announced March 2024.

  35. arXiv:2403.04015  [pdf, other]

    cs.LG cs.AI stat.ML

    Knockoff-Guided Feature Selection via A Single Pre-trained Reinforced Agent

    Authors: Xinyuan Wang, Dongjie Wang, Wangyang Ying, Rui Xie, Haifeng Chen, Yanjie Fu

    Abstract: Feature selection prepares the AI-readiness of data by eliminating redundant features. Prior research falls into two primary categories: i) Supervised Feature Selection, which identifies the optimal feature subset based on their relevance to the target variable; ii) Unsupervised Feature Selection, which reduces the feature space dimensionality by capturing the essential information within the feat…

    Submitted 6 March, 2024; originally announced March 2024.

  36. arXiv:2402.14438  [pdf, ps, other]

    stat.ME

    Efficiency-improved doubly robust estimation with non-confounding predictive covariates

    Authors: Shanshan Luo, Mengchen Shi, Wei Li, Xueli Wang, Zhi Geng

    Abstract: In observational studies, covariates with substantial missing data are often omitted, despite their strong predictive capabilities. These excluded covariates are generally believed not to simultaneously affect both treatment and outcome, indicating that they are not genuine confounders and do not impact the identification of the average treatment effect (ATE). In this paper, we introduce an altern…

    Submitted 22 February, 2024; originally announced February 2024.

  37. arXiv:2402.13870  [pdf, ps, other]

    cs.LG eess.SP stat.AP

    Generative Probabilistic Time Series Forecasting and Applications in Grid Operations

    Authors: Xinyi Wang, Lang Tong, Qing Zhao

    Abstract: Generative probabilistic forecasting produces future time series samples according to the conditional probability distribution given past time series observations. Such techniques are essential in risk-based decision-making and planning under uncertainty with broad applications in grid operations, including electricity price forecasting, risk-based economic dispatch, and stochastic optimizations.…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted at CISS 2024. arXiv admin note: text overlap with arXiv:2306.03782

  38. arXiv:2402.11283  [pdf, other]

    math.NA stat.ML

    Deep adaptive sampling for surrogate modeling without labeled data

    Authors: Xili Wang, Kejun Tang, Jiayu Zhai, Xiaoliang Wan, Chao Yang

    Abstract: Surrogate modeling is of great practical significance for parametric differential equation systems. In contrast to classical numerical methods, using physics-informed deep learning methods to construct simulators for such systems is a promising direction due to its potential to handle high dimensionality, which requires minimizing a loss over a training set of random samples. However, the random s…

    Submitted 17 February, 2024; originally announced February 2024.

  39. arXiv:2402.10797  [pdf, other]

    cs.MS cs.LG stat.CO stat.ML

    BlackJAX: Composable Bayesian inference in JAX

    Authors: Alberto Cabezas, Adrien Corenflos, Junpeng Lao, Rémi Louf, Antoine Carnec, Kaustubh Chaudhari, Reuben Cohn-Gordon, Jeremie Coullon, Wei Deng, Sam Duffield, Gerardo Durán-Martín, Marcin Elantkowski, Dan Foreman-Mackey, Michele Gregori, Carlos Iguaran, Ravin Kumar, Martin Lysy, Kevin Murphy, Juan Camilo Orduz, Karm Patel, Xi Wang, Rob Zinkov

    Abstract: BlackJAX is a library implementing sampling and variational inference algorithms commonly used in Bayesian computation. It is designed for ease of use, speed, and modularity by taking a functional approach to the algorithms' implementation. BlackJAX is written in Python, using JAX to compile and run NumPy-like samplers and variational methods on CPUs, GPUs, and TPUs. The library integrates well w…

    Submitted 22 February, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Companion paper for the library https://github.com/blackjax-devs/blackjax Update: minor changes and updated the list of authors to include technical contributors

  40. arXiv:2402.08412  [pdf, other]

    stat.ML cs.LG math.DS math.ST

    Interacting Particle Systems on Networks: joint inference of the network and the interaction kernel

    Authors: Quanjun Lang, Xiong Wang, Fei Lu, Mauro Maggioni

    Abstract: Modeling multi-agent systems on networks is a fundamental challenge in a wide variety of disciplines. We jointly infer the weight matrix of the network and the interaction kernel, which determine respectively which agents interact with which others and the rules of such interactions from data consisting of multiple trajectories. The estimator we propose leads naturally to a non-convex optimization…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: 53 pages, 17 figures

    MSC Class: 62F12; 82C22

  41. arXiv:2402.02687  [pdf, other]

    cs.LG cs.AI stat.ML

    Poisson Process for Bayesian Optimization

    Authors: Xiaoxing Wang, Jiaxing Li, Chao Xue, Wei Liu, Weifeng Liu, Xiaokang Yang, Junchi Yan, Dacheng Tao

    Abstract: Bayesian Optimization (BO) is a sample-efficient black-box optimizer, and extensive methods have been proposed to build the absolute function response of the black-box function through a probabilistic surrogate model, including Tree-structured Parzen Estimator (TPE), random forest (SMAC), and Gaussian process (GP). However, few methods have been explored to estimate the relative rankings of candidat…

    Submitted 4 February, 2024; originally announced February 2024.

  42. arXiv:2402.01121  [pdf, other]

    stat.ME

    Non-linear Mendelian randomization with Two-stage prediction estimation and Control function estimation

    Authors: Xinpei Wang, Tao Huang, Jinzhu Jia

    Abstract: Most of the existing Mendelian randomization (MR) methods are limited by the assumption of linear causality between exposure and outcome, and the development of new non-linear MR methods is highly desirable. We introduce two-stage prediction estimation and control function estimation from econometrics to MR and extend them to non-linear causality. We give conditions for parameter identification an…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 9 pages, 4 figures

  43. arXiv:2401.15680  [pdf, other]

    stat.ME

    How to achieve model-robust inference in stepped wedge trials with model-based methods?

    Authors: Bingkai Wang, Xueqi Wang, Fan Li

    Abstract: A stepped wedge design is a unidirectional crossover design where clusters are randomized to distinct treatment sequences. While model-based analysis of stepped wedge designs is standard practice to evaluate treatment effects accounting for clustering and adjusting for covariates, their properties under misspecification have not been systematically explored. In this article, we focus on model-base…

    Submitted 13 August, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  44. arXiv:2401.11070  [pdf, other]

    stat.ME

    Efficient Data Reduction Strategies for Big Data and High-Dimensional LASSO Regressions

    Authors: Xin Wang, Min Yang, William Li

    Abstract: The IBOSS approach proposed by Wang et al. (2019) selects the most informative subset of n points. It assumes that the ordinary least squares method is used and requires that the number of variables, p, is not large. However, in many practical problems, p is very large and penalty-based model fitting methods such as LASSO are used. We study the big data problems, in which both n and p are large. In…

    Submitted 19 January, 2024; originally announced January 2024.

  45. arXiv:2311.17303  [pdf, other

    cs.LG cs.AI stat.ME

    Enhancing the Performance of Neural Networks Through Causal Discovery and Integration of Domain Knowledge

    Authors: Xiaoge Zhang, Xiao-Lin Wang, Fenglei Fan, Yiu-Ming Cheung, Indranil Bose

    Abstract: In this paper, we develop a generic methodology to encode hierarchical causality structure among observed variables into a neural network in order to improve its predictive performance. The proposed methodology, called causality-informed neural network (CINN), leverages three coherent steps to systematically map the structural causal knowledge into the layer-to-layer design of neural network while…

    Submitted 30 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  46. arXiv:2311.16852  [pdf, other

    math.ST math.PR stat.ML

    Optimal minimax rate of learning interaction kernels

    Authors: Xiong Wang, Inbar Seroussi, Fei Lu

    Abstract: Nonparametric estimation of nonlocal interaction kernels is crucial in various applications involving interacting particle systems. The inference challenge, situated at the nexus of statistical learning and inverse problems, comes from the nonlocal dependency. A central question is whether the optimal minimax rate of convergence for this problem aligns with the rate of $M^{-\frac{2β}{2β+1}}$ in cl…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 42 pages, 1 figure

    MSC Class: 62G08; 62G20; 60B20

  47. arXiv:2311.14332  [pdf, other

    cs.LG stat.ML

    GATGPT: A Pre-trained Large Language Model with Graph Attention Network for Spatiotemporal Imputation

    Authors: Yakun Chen, Xianzhi Wang, Guandong Xu

    Abstract: The analysis of spatiotemporal data is increasingly utilized across diverse domains, including transportation, healthcare, and meteorology. In real-world settings, such data often contain missing elements due to issues like sensor malfunctions and data transmission errors. The objective of spatiotemporal imputation is to estimate these missing values by understanding the inherent spatial and tempo…

    Submitted 24 November, 2023; originally announced November 2023.

  48. arXiv:2311.02618  [pdf, other

    stat.AP

    Regionalization of China's PM2.5 through Robust Spatio temporal Functional Clustering Method

    Authors: Tingyin Wang, Xueqin Wang, Xiaobo Guo, Heping Zhang

    Abstract: The patterns of particulate matter with diameters that are generally 2.5 micrometers and smaller (PM2.5) are heterogeneous in China nationwide but can be homogeneous region-wide. To reduce the adverse effects from PM2.5, policymakers need to develop location-specific regulations based on nationwide clustering analysis of PM2.5 concentrations. However, such an analysis is challenging because the da…

    Submitted 5 November, 2023; originally announced November 2023.

  49. arXiv:2311.02574  [pdf, other

    stat.ME

    Semi-supervised Estimation of Event Rate with Doubly-censored Survival Data

    Authors: Yang Wang, Qingning Zhou, Tianxi Cai, Xuan Wang

    Abstract: Electronic Health Record (EHR) has emerged as a valuable source of data for translational research. To leverage EHR data for risk prediction and subsequently clinical decision support, clinical endpoints are often time to onset of a clinical condition of interest. Precise information on clinical event times is often not directly available and requires labor-intensive manual chart review to ascerta…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: 44 pages, 9 figures

  50. arXiv:2310.13162  [pdf, other

    stat.ME

    Network Meta-Analysis of Time-to-Event Endpoints with Individual Participant Data using Restricted Mean Survival Time Regression

    Authors: Kaiyuan Hua, Xiaofei Wang, Hwanhee Hong

    Abstract: Restricted mean survival time (RMST) models have gained popularity when analyzing time-to-event outcomes because RMST models offer more straightforward interpretations of treatment effects with fewer assumptions than hazard ratios commonly estimated from Cox models. However, few network meta-analysis (NMA) methods have been developed using RMST. In this paper, we propose advanced RMST NMA models w…

    Submitted 19 October, 2023; originally announced October 2023.