Showing 1–50 of 180 results for author: Li, Q

Searching in archive stat.
  1. arXiv:2407.12996  [pdf, other]

    stat.ML cs.LG

    Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance

    Authors: Haiquan Lu, Xiaotian Liu, Yefan Zhou, Qunli Li, Kurt Keutzer, Michael W. Mahoney, Yujun Yan, Huanrui Yang, Yaoqing Yang

    Abstract: Recent studies on deep ensembles have identified the sharpness of the local minima of individual learners and the diversity of the ensemble members as key factors in improving test-time performance. Building on this, our study investigates the interplay between sharpness and diversity within deep ensembles, illustrating their crucial role in robust generalization to both in-distribution (ID) and o…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2406.18189  [pdf, other]

    stat.ME math.ST

    Functional knockoffs selection with applications to functional data analysis in high dimensions

    Authors: Xinghao Qiao, Mingya Long, Qizhai Li

    Abstract: The knockoffs is a recently proposed powerful framework that effectively controls the false discovery rate (FDR) for variable selection. However, none of the existing knockoff solutions are directly suited to handle multivariate or high-dimensional functional data, which has become increasingly prevalent in various scientific applications. In this paper, we propose a novel functional model-X knock…

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.08209  [pdf, other]

    stat.ML cs.LG math.OC

    Forward-Euler time-discretization for Wasserstein gradient flows can be wrong

    Authors: Yewei Xu, Qin Li

    Abstract: In this note, we examine the forward-Euler discretization for simulating Wasserstein gradient flows. We provide two counter-examples showcasing the failure of this discretization even for a simple case where the energy functional is defined as the KL divergence against some nicely structured probability densities. A simple explanation of this failure is also discussed.

    Submitted 12 June, 2024; originally announced June 2024.

    MSC Class: 65M12

  4. arXiv:2405.17079  [pdf, other]

    stat.ML cs.LG

    Learning with User-Level Local Differential Privacy

    Authors: Puning Zhao, Li Shen, Rongfei Fan, Qingming Li, Huiwen Wu, Jiafei Wu, Zhe Liu

    Abstract: User-level privacy is important in distributed systems. Previous research primarily focuses on the central model, while the local models have received much less attention. Under the central model, user-level DP is strictly stronger than the item-level one. However, under the local model, the relationship between user-level and item-level LDP becomes more complex, thus the analysis is crucially dif…

    Submitted 27 May, 2024; originally announced May 2024.

  5. arXiv:2404.09194  [pdf, other]

    stat.ME

    Bayesian modeling of co-occurrence microbial interaction networks

    Authors: Tejasv Bedi, Bencong Zhu, Michael L. Neugent, Kevin C. Lutz, Nicole J. De Nisco, Qiwei Li

    Abstract: The human body consists of microbiomes associated with the development and prevention of several diseases. These microbial organisms form several complex interactions that are informative to the scientific community for explaining disease progression and prevention. Contrary to the traditional view of the microbiome as a singular, assortative network, we introduce a novel statistical approach usin…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 25 pages

  6. arXiv:2403.17670  [pdf, other]

    stat.ME

    A family of Chatterjee's correlation coefficients and their properties

    Authors: Muhong Gao, Qizhai Li

    Abstract: Quantifying the strength of functional dependence between random scalars $X$ and $Y$ is an important statistical problem. While many existing correlation coefficients excel in identifying linear or monotone functional dependence, they fall short in capturing general non-monotone functional relationships. In response, we propose a family of correlation coefficients $ξ^{(h,F)}_n$, characterized by a…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 27 pages, 4 figures

    MSC Class: 62H20; 62G05

  7. arXiv:2402.15515  [pdf]

    cs.AI q-bio.QM stat.AP

    Feasibility of Identifying Factors Related to Alzheimer's Disease and Related Dementia in Real-World Data

    Authors: Aokun Chen, Qian Li, Yu Huang, Yongqiu Li, Yu-neng Chuang, Xia Hu, Serena Guo, Yonghui Wu, Yi Guo, Jiang Bian

    Abstract: A comprehensive view of factors associated with AD/ADRD will significantly aid in studies to develop new treatments for AD/ADRD and identify high-risk populations and patients for prevention efforts. In our study, we summarized the risk factors for AD/ADRD by reviewing existing meta-analyses and review articles on risk and preventive factors for AD/ADRD. In total, we extracted 477 risk factors in…

    Submitted 3 February, 2024; originally announced February 2024.

  8. arXiv:2401.09259  [pdf, other]

    math.NA math.DS stat.ML

    Mitigating distribution shift in machine learning-augmented hybrid simulation

    Authors: Jiaxi Zhao, Qianxiao Li

    Abstract: We study the problem of distribution shift generally arising in machine-learning augmented hybrid simulation, where parts of simulation algorithms are replaced by data-driven surrogates. We first establish a mathematical framework to understand the structure of machine-learning augmented hybrid simulation problems, and the cause and effect of the associated distribution shift. We show correlations…

    Submitted 17 January, 2024; originally announced January 2024.

    MSC Class: 68T99; 65M15; 37M05

  9. arXiv:2401.04856  [pdf, other]

    cs.LG stat.ML

    A Good Score Does not Lead to A Good Generative Model

    Authors: Sixu Li, Shi Chen, Qin Li

    Abstract: Score-based Generative Models (SGMs) is one leading method in generative modeling, renowned for their ability to generate high-quality samples from complex, high-dimensional data distributions. The method enjoys empirical success and is supported by rigorous theoretical convergence properties. In particular, it has been shown that SGMs can generate samples from a distribution that is close to the…

    Submitted 27 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  10. arXiv:2401.00521  [pdf, other]

    cs.LG cs.AI stat.AP

    Multi-spatial Multi-temporal Air Quality Forecasting with Integrated Monitoring and Reanalysis Data

    Authors: Yuxiao Hu, Qian Li, Xiaodan Shi, Jinyue Yan, Yuntian Chen

    Abstract: Accurate air quality forecasting is crucial for public health, environmental monitoring and protection, and urban planning. However, existing methods fail to effectively utilize multi-scale information, both spatially and temporally. Spatially, there is a lack of integration between individual monitoring stations and city-wide scales. Temporally, the periodic nature of air quality variations is of…

    Submitted 31 December, 2023; originally announced January 2024.

  11. arXiv:2312.08670  [pdf, other]

    stat.ME cs.AI cs.LG

    Temporal-Spatial Entropy Balancing for Causal Continuous Treatment-Effect Estimation

    Authors: Tao Hu, Honglong Zhang, Fan Zeng, Min Du, XiangKun Du, Yue Zheng, Quanqi Li, Mengran Zhang, Dan Yang, Jihao Wu

    Abstract: In the field of intracity freight transportation, changes in order volume are significantly influenced by temporal and spatial factors. When building subsidy and pricing strategies, predicting the causal effects of these strategies on order volume is crucial. In the process of calculating causal effects, confounding variables can have an impact. Traditional methods to control confounding variables…

    Submitted 18 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 10 pages

  12. arXiv:2312.08324  [pdf, other]

    stat.AP

    Bayesian Nonparametric Clustering with Feature Selection for Spatially Resolved Transcriptomics Data

    Authors: Bencong Zhu, Guanyu Hu, Yang Xie, Lin Xu, Xiaodan Fan, Qiwei Li

    Abstract: The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has reshaped genomic studies by enabling high-throughput gene expression profiling while preserving spatial and morphological context. Nevertheless, there are inherent challenges associated with these new high-dimensional spatial data, such as zero-inflation, over-dispersion, and heterogeneity. These…

    Submitted 13 December, 2023; originally announced December 2023.

  13. arXiv:2312.07067  [pdf, other]

    cs.LG cs.CR cs.CV stat.AP

    Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training

    Authors: Qian Li, Yuxiao Hu, Yinpeng Dong, Dongxiao Zhang, Yuntian Chen

    Abstract: Adversarial training is often formulated as a min-max problem, however, concentrating only on the worst adversarial examples causes alternating repetitive confusion of the model, i.e., previously defended or correctly classified samples are not defensible or accurately classifiable in subsequent adversarial training. We characterize such non-ignorable samples as "hiders", which reveal the hidden h…

    Submitted 12 December, 2023; originally announced December 2023.

  14. arXiv:2312.03967  [pdf, other]

    stat.ME

    Test-negative designs with various reasons for testing: statistical bias and solution

    Authors: Mengxin Yu, Kendrick Qijun Li, Nicholas Jewell, Eric Tchetgen Tchetgen, Dylan Small, Xu Shi, Bingkai Wang

    Abstract: Test-negative designs are widely used for post-market evaluation of vaccine effectiveness, particularly in cases where randomization is not feasible. Differing from classical test-negative designs where only healthcare-seekers with symptoms are included, recent test-negative designs have involved individuals with various reasons for testing, especially in an outbreak setting. While including these…

    Submitted 21 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

  15. arXiv:2311.05067  [pdf, other]

    cs.LG cs.AI stat.ML

    Accelerating Exploration with Unlabeled Prior Data

    Authors: Qiyang Li, Jason Zhang, Dibya Ghosh, Amy Zhang, Sergey Levine

    Abstract: Learning to solve tasks from a sparse reward signal is a major challenge for standard reinforcement learning (RL) algorithms. However, in the real world, agents rarely need to solve sparse reward tasks entirely from scratch. More often, we might possess prior experience to draw on that provides considerable guidance about which actions and outcomes are possible in the world, which we can use to ex…

    Submitted 20 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 25 pages, 16 figures, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  16. arXiv:2310.08867  [pdf]

    cs.LG cs.DB stat.ME

    A Survey of Methods for Handling Disk Data Imbalance

    Authors: Shuangshuang Yuan, Peng Wu, Yuehui Chen, Qiang Li

    Abstract: Class imbalance exists in many classification problems, and since the data is designed for accuracy, imbalance in data classes can lead to classification challenges with a few classes having higher misclassification costs. The Backblaze dataset, a widely used dataset related to hard discs, has a small amount of failure data and a large amount of health data, which exhibits a serious class imbalanc…

    Submitted 13 October, 2023; originally announced October 2023.

  17. arXiv:2308.14671  [pdf, other]

    stat.ME stat.AP stat.CO stat.OT

    A generalized Bayesian stochastic block model for microbiome community detection

    Authors: Kevin C. Lutz, Michael L. Neugent, Tejasv Bedi, Nicole J. De Nisco, Qiwei Li

    Abstract: Advances in next-generation sequencing technology have enabled the high-throughput profiling of metagenomes and accelerated the microbiome study. Recently, there has been a rise in quantitative studies that aim to decipher the microbiome co-occurrence network and its underlying community structure based on metagenomic sequence data. Uncovering the complex microbiome community structure is essentia…

    Submitted 28 August, 2023; originally announced August 2023.

  18. arXiv:2307.01389  [pdf, other]

    cs.LG stat.ME

    Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer's Disease Progression via Counterfactual Inference

    Authors: Haixing Dai, Mengxuan Hu, Qing Li, Lu Zhang, Lin Zhao, Dajiang Zhu, Ibai Diez, Jorge Sepulcre, Fan Zhang, Xingyu Gao, Manhua Liu, Quanzheng Li, Sheng Li, Tianming Liu, Xiang Li

    Abstract: Alzheimer's disease (AD) is a neurodegenerative disorder that is beginning with amyloidosis, followed by neuronal loss and deterioration in structure, function, and cognition. The accumulation of amyloid-beta in the brain, measured through 18F-florbetapir (AV45) positron emission tomography (PET) imaging, has been widely used for early diagnosis of AD. However, the relationship between amyloid-bet…

    Submitted 3 July, 2023; originally announced July 2023.

  19. arXiv:2306.01675  [pdf, other]

    stat.ME physics.soc-ph

    Bayesian Segmentation Modeling of Epidemic Growth

    Authors: Tejasv Bedi, Yanxun Xu, Qiwei Li

    Abstract: Tracking the spread of infectious disease during a pandemic has posed a great challenge to the governments and health sectors on a global scale. To facilitate informed public health decision-making, the concerned parties usually rely on short-term daily and weekly projections generated via predictive modeling. Several deterministic and stochastic epidemiological models, including growth and compar…

    Submitted 2 June, 2023; originally announced June 2023.

  20. arXiv:2305.08204  [pdf, other]

    stat.CO stat.ME

    glmmPen: High Dimensional Penalized Generalized Linear Mixed Models

    Authors: Hillary M. Heiling, Naim U. Rashid, Quefeng Li, Joseph G. Ibrahim

    Abstract: Generalized linear mixed models (GLMMs) are widely used in research for their ability to model correlated outcomes with non-Gaussian conditional distributions. The proper selection of fixed and random effects is a critical part of the modeling process since model misspecification may lead to significant bias. However, the joint selection of fixed and random effects has historically been limited to…

    Submitted 16 April, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

  21. arXiv:2305.08201  [pdf, ps, other]

    stat.ME stat.CO

    Efficient Computation of High-Dimensional Penalized Generalized Linear Mixed Models by Latent Factor Modeling of the Random Effects

    Authors: Hillary M. Heiling, Naim U. Rashid, Quefeng Li, Xianlu L. Peng, Jen Jen Yeh, Joseph G. Ibrahim

    Abstract: Modern biomedical datasets are increasingly high dimensional and exhibit complex correlation structures. Generalized Linear Mixed Models (GLMMs) have long been employed to account for such dependencies. However, proper specification of the fixed and random effects in GLMMs is increasingly difficult in high dimensions, and computational complexity grows with increasing dimension of the random effec…

    Submitted 16 April, 2024; v1 submitted 14 May, 2023; originally announced May 2023.

  22. arXiv:2304.10466  [pdf, other]

    cs.LG cs.AI stat.ML

    Efficient Deep Reinforcement Learning Requires Regulating Overfitting

    Authors: Qiyang Li, Aviral Kumar, Ilya Kostrikov, Sergey Levine

    Abstract: Deep reinforcement learning algorithms that learn policies by trial-and-error must learn from limited amounts of data collected by actively interacting with the environment. While many prior works have shown that proper regularization techniques are crucial for enabling data-efficient RL, a general understanding of the bottlenecks in data-efficient RL has remained unclear. Consequently, it has bee…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: 26 pages, 18 figures, 3 tables, The International Conference on Learning Representations (ICLR) 2023

  23. arXiv:2303.07050  [pdf, other]

    stat.AP

    Evaluation of wait time saving effectiveness of triage algorithms

    Authors: Yee Lam Elim Thompson, Gary M Levine, Weijie Chen, Berkman Sahiner, Qin Li, Nicholas Petrick, Jana G Delfino, Miguel A Lago, Qian Cao, Qin Li, Frank W Samuelson

    Abstract: In the past decade, Artificial Intelligence (AI) algorithms have made promising impacts to transform healthcare in all aspects. One application is to triage patients' radiological medical images based on the algorithm's binary outputs. Such AI-based prioritization software is known as computer-aided triage and notification (CADt). Their main benefit is to speed up radiological review of images wit…

    Submitted 13 March, 2023; originally announced March 2023.

  24. arXiv:2212.09160  [pdf, other]

    math.OC stat.ME

    Stochastic Economic Dispatch Considering Demand Response and Endogenous Uncertainty

    Authors: Nasrin Bayat, Qifeng Li, Joon-Hyuk Park

    Abstract: This paper considers endogenous uncertainty (EnU) in the stochastic economic dispatch (SED) problem, where the endogenous uncertainty means decision dependent uncertainty. In this problem, demand response (DR) commitment is the source of the EnU. Nevertheless, EnU is not well considered in existing literature. Our first contribution is to build up an optimization model of DR-involved SED under EnU…

    Submitted 31 May, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

  25. arXiv:2212.08771  [pdf, other]

    stat.AP cs.LG

    Assign Experiment Variants at Scale in Online Controlled Experiments

    Authors: Qike Li, Samir Jamkhande, Pavel Kochetkov, Pai Liu

    Abstract: Online controlled experiments (A/B tests) have become the gold standard for learning the impact of new product features in technology companies. Randomization enables the inference of causality from an A/B test. The randomized assignment maps end users to experiment buckets and balances user characteristics between the groups. Therefore, experiments can attribute any outcome differences between th…

    Submitted 16 December, 2022; originally announced December 2022.

  26. arXiv:2211.03258  [pdf, other]

    astro-ph.IM hep-ph physics.data-an stat.CO

    Nested sampling statistical errors

    Authors: Andrew Fowlie, Qiao Li, Huifang Lv, Yecheng Sun, Jia Zhang, Le Zheng

    Abstract: Nested sampling (NS) is a popular algorithm for Bayesian computation. We investigate statistical errors in NS both analytically and numerically. We show two analytic results. First, we show that the leading terms in Skilling's expression using information theory match the leading terms in Keeton's expression from an analysis of moments. This approximate agreement was previously only known numerica…

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: 12 pages + appendices, 3 figures

  27. arXiv:2210.06025  [pdf, other]

    stat.ME math.ST

    Bregman Divergence-Based Data Integration with Application to Polygenic Risk Score (PRS) Heterogeneity Adjustment

    Authors: Qinmengge Li, Matthew T. Patrick, Haihan Zhang, Chachrit Khunsriraksakul, Philip E. Stuart, Johann E. Gudjonsson, Rajan Nair, James T. Elder, Dajiang J. Liu, Jian Kang, Lam C. Tsoi, Kevin He

    Abstract: Polygenic risk scores (PRS) have recently received much attention for genetics risk prediction. While successful for the Caucasian population, the PRS based on the minority population suffer from small sample sizes, high dimensionality and low signal-to-noise ratios, exacerbating already severe health disparities. Due to population heterogeneity, direct trans-ethnic prediction by utilizing the Cau…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: 35 pages, 6 figures

  28. arXiv:2209.13779  [pdf]

    astro-ph.SR stat.ML

    Solar Flare Index Prediction Using SDO/HMI Vector Magnetic Data Products with Statistical and Machine Learning Methods

    Authors: Hewei Zhang, Qin Li, Yanxing Yang, Ju Jing, Jason T. L. Wang, Haimin Wang, Zuofeng Shang

    Abstract: Solar flares, especially the M- and X-class flares, are often associated with coronal mass ejections (CMEs). They are the most important sources of space weather effects, that can severely impact the near-Earth environment. Thus it is essential to forecast flares (especially the M-and X-class ones) to mitigate their destructive and hazardous consequences. Here, we introduce several statistical and…

    Submitted 1 December, 2022; v1 submitted 27 September, 2022; originally announced September 2022.

    Journal ref: The Astrophysical Journal Supplement Series (2022), Volume 263, Number 2

  29. arXiv:2209.12388  [pdf, other]

    stat.ME stat.AP

    Joint and Individual Component Regression

    Authors: Peiyao Wang, Haodong Wang, Quefeng Li, Dinggang Shen, Yufeng Liu

    Abstract: Multi-group data are commonly seen in practice. Such data structure consists of data from multiple groups and can be challenging to analyze due to data heterogeneity. We propose a novel Joint and Individual Component Regression (JICO) model to analyze multi-group data. In particular, our proposed model decomposes the response into shared and group-specific components, which are driven by low-rank…

    Submitted 25 September, 2022; originally announced September 2022.

  30. arXiv:2208.02246  [pdf, other]

    cs.LG cs.AI stat.ML

    AdaCat: Adaptive Categorical Discretization for Autoregressive Models

    Authors: Qiyang Li, Ajay Jain, Pieter Abbeel

    Abstract: Autoregressive generative models can estimate complex continuous data distributions, like trajectory rollouts in an RL environment, image intensities, and audio. Most state-of-the-art models discretize continuous data into several bins and use categorical distributions over the bins to approximate the continuous data distribution. The advantage is that the categorical distribution can easily expre…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: Uncertainty in Artificial Intelligence (UAI) 2022; 13 pages, 4 figures

  31. arXiv:2208.01237  [pdf, ps, other]

    stat.ME

    Doubly Robust Proximal Causal Inference under Confounded Outcome-Dependent Sampling

    Authors: Kendrick Qijun Li, Xu Shi, Wang Miao, Eric Tchetgen Tchetgen

    Abstract: Unmeasured confounding and selection bias are often of concern in observational studies and may invalidate a causal analysis if not appropriately accounted for. Under outcome-dependent sampling, a latent factor that has causal effects on the treatment, outcome, and sample selection process may cause both unmeasured confounding and selection bias, rendering standard causal parameters unidentifiable…

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: 43 pages, 1 figure

  32. arXiv:2207.00100  [pdf, other]

    stat.ME math.ST

    A Bayesian 'sandwich' for variance estimation

    Authors: Kendrick Qijun Li, Kenneth Martin Rice

    Abstract: Large-sample Bayesian analogs exist for many frequentist methods, but are less well-known for the widely-used 'sandwich' or 'robust' variance estimates. We review existing approaches to Bayesian analogs of sandwich variance estimates and propose a new analog, as the Bayes rule under a form of balanced loss function, that combines elements of standard parametric inference with fidelity of the data…

    Submitted 2 November, 2023; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: 11 pages, 2 figures

  33. arXiv:2203.12509  [pdf, other]

    stat.ME

    Double Negative Control Inference in Test-Negative Design Studies of Vaccine Effectiveness

    Authors: Kendrick Qijun Li, Xu Shi, Wang Miao, Eric Tchetgen Tchetgen

    Abstract: The test-negative design (TND) has become a standard approach to evaluate vaccine effectiveness against the risk of acquiring infectious diseases in real-world settings, such as Influenza, Rotavirus, Dengue fever, and more recently COVID-19. In a TND study, individuals who experience symptoms and seek care are recruited and tested for the infectious disease which defines cases and controls. Despit…

    Submitted 8 March, 2023; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: 78 pages, 4 figures, 5 tables

  34. arXiv:2202.10670  [pdf, other]

    stat.ML cs.LG

    From Optimization Dynamics to Generalization Bounds via Łojasiewicz Gradient Inequality

    Authors: Fusheng Liu, Haizhao Yang, Soufiane Hayou, Qianxiao Li

    Abstract: Optimization and generalization are two essential aspects of statistical machine learning. In this paper, we propose a framework to connect optimization with generalization by analyzing the generalization error based on the optimization trajectory under the gradient flow algorithm. The key ingredient of this framework is the Uniform-LGI, a property that is generally satisfied when training machine…

    Submitted 12 October, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

    Journal ref: Transactions on Machine Learning Research 2022

  35. arXiv:2201.06332  [pdf]

    stat.AP

    Optimal monitoring location for risk tracking of geotechnical systems: theory and application to tunneling excavation risks

    Authors: Zeyu Wang, Abdollah Shafieezadeh, Xiong Xiao, Xiaowei Wang, Quanwang Li

    Abstract: The maturity of structural health monitoring technology brings ever-increasing opportunities for geotechnical structures and underground infrastructure systems to track the risk of structural failure, such as settlement-induced building damage, based on the monitored data. Reliability updating techniques can offer solutions to estimate the probability of failing to meet a prescribed objective usin…

    Submitted 17 January, 2022; originally announced January 2022.

  36. arXiv:2112.13356  [pdf, other]

    stat.ME

    Transfer Learning in High-dimensional Semi-parametric Graphical Models with Application to Brain Connectivity Analysis

    Authors: Yong He, Qiushi Li, Qinqin Hu, Lei Liu

    Abstract: Transfer learning has drawn growing attention with the target of improving statistical efficiency of one study (dataset) by digging information from similar and related auxiliary studies (datasets). In the article, we consider transfer learning problem in estimating undirected semi-parametric graphical model. We propose an algorithm called Trans-Copula-CLIME for estimating undirected graphical mod…

    Submitted 26 December, 2021; originally announced December 2021.

  37. Incorporating Surrogate Information for Adaptive Subgroup Enrichment Design with Sample Size Re-estimation

    Authors: Liwen Wu, Qing Li, Mengya Liu, Jianchang Lin

    Abstract: Adaptive subgroup enrichment design is an efficient design framework that allows accelerated development for investigational treatments while also having flexibility in population selection within the course of the trial. The adaptive decision at the interim analysis is commonly made based on the conditional probability of trial success. However, one of the critical challenges for such adaptive de…

    Submitted 3 November, 2021; originally announced November 2021.

  38. arXiv:2110.05291  [pdf, other]

    cs.LG stat.ML

    Graph Neural Network Guided Local Search for the Traveling Salesperson Problem

    Authors: Benjamin Hudson, Qingbiao Li, Matthew Malencia, Amanda Prorok

    Abstract: Solutions to the Traveling Salesperson Problem (TSP) have practical applications to processes in transportation, logistics, and automation, yet must be computed with minimal delay to satisfy the real-time nature of the underlying tasks. However, solving large TSP instances quickly without sacrificing solution quality remains challenging for current approximate algorithms. To close this gap, we pre…

    Submitted 4 April, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

  39. arXiv:2110.02926  [pdf, ps, other]

    cs.LG math.NA stat.ML

    On the Global Convergence of Gradient Descent for multi-layer ResNets in the mean-field regime

    Authors: Zhiyan Ding, Shi Chen, Qin Li, Stephen Wright

    Abstract: Finding the optimal configuration of parameters in ResNet is a nonconvex minimization problem, but first-order methods nevertheless find the global optimum in the overparameterized regime. We study this phenomenon with mean-field analysis, by translating the training process of ResNet to a gradient-flow partial differential equation (PDE) and examining the convergence properties of this limiting p…

    Submitted 28 November, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: text overlap with arXiv:2105.14417

  40. Can We Leverage Predictive Uncertainty to Detect Dataset Shift and Adversarial Examples in Android Malware Detection?

    Authors: Deqiang Li, Tian Qiu, Shuo Chen, Qianmu Li, Shouhuai Xu

    Abstract: The deep learning approach to detecting malicious software (malware) is promising but has yet to tackle the problem of dataset shift, namely that the joint distribution of examples and their labels associated with the test set is different from that of the training set. This problem causes the degradation of deep learning models without users' notice. In order to alleviate the problem, one approac…

    Submitted 20 October, 2021; v1 submitted 20 September, 2021; originally announced September 2021.

    Comments: Accepted by ACSAC'2021

    MSC Class: 62

  41. arXiv:2107.10013  [pdf]

    eess.SY cs.AI stat.ML

    Optimal Operation of Power Systems with Energy Storage under Uncertainty: A Scenario-based Method with Strategic Sampling

    Authors: Ren Hu, Qifeng Li

    Abstract: The multi-period dynamics of energy storage (ES), intermittent renewable generation and uncontrollable power loads, make the optimization of power system operation (PSO) challenging. A multi-period optimal PSO under uncertainty is formulated using the chance-constrained optimization (CCO) modeling paradigm, where the constraints include the nonlinear energy storage and AC power flow models. Based…

    Submitted 21 July, 2021; originally announced July 2021.

    MSC Class: 68T09

  42. arXiv:2107.09355  [pdf, ps, other]

    cs.LG stat.ML

    Approximation Theory of Convolutional Architectures for Time Series Modelling

    Authors: Haotian Jiang, Zhong Li, Qianxiao Li

    Abstract: We study the approximation properties of convolutional architectures applied to time series modelling, which can be formulated mathematically as a functional approximation problem. In the recurrent setting, recent results reveal an intricate connection between approximation efficiency and memory structures in the data generation process. In this paper, we derive parallel results for convolutional…

    Submitted 20 July, 2021; originally announced July 2021.

    Comments: Published version

    MSC Class: 68W25; 68T07; 37M10

    ACM Class: I.2.6

  43. arXiv:2107.06040  [pdf, ps, other]

    stat.ME math.ST

    The Cauchy Combination Test under Arbitrary Dependence Structures

    Authors: Mingya Long, Zhengbang Li, Wei Zhang, Qizhai Li

    Abstract: Aggregating multiple effects is often encountered in large-scale data analysis where the fraction of significant effects is generally small. Many existing methods cannot handle it effectively because of lack of computational accuracy for small p-values. The Cauchy combination test (abbreviated as CCT) ( J Am Statist Assoc, 2020, 115(529):393-402) is a powerful and computational effective test to a…

    Submitted 2 August, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: 51 pages, 6 figures

  44. arXiv:2106.13508  [pdf, other]

    stat.CO math.OC stat.ME

    MARS: A second-order reduction algorithm for high-dimensional sparse precision matrices estimation

    Authors: Qian LI, Binyan Jiang, Defeng Sun

    Abstract: Estimation of the precision matrix (or inverse covariance matrix) is of great importance in statistical data analysis and machine learning. However, as the number of parameters scales quadratically with the dimension $p$, computation becomes very challenging when $p$ is large. In this paper, we propose an adaptive sieving reduction algorithm to generate a solution path for the estimation of precis…

    Submitted 1 November, 2022; v1 submitted 25 June, 2021; originally announced June 2021.

  45. arXiv:2105.14417  [pdf, ps, other]

    cs.LG math.NA stat.ML

    Overparameterization of deep ResNet: zero loss and mean-field analysis

    Authors: Zhiyan Ding, Shi Chen, Qin Li, Stephen Wright

    Abstract: Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a global optimizer with perfect fit (zero-loss) in many practical situations. We examine this phenomenon for the case of Residual Neural Networks (ResNet) with smooth activation functions in a limiting regime in which both…

    Submitted 9 November, 2021; v1 submitted 29 May, 2021; originally announced May 2021.

  46. arXiv:2105.12898  [pdf, other]

    cs.AI cs.LG stat.ML

    Stochastic Intervention for Causal Effect Estimation

    Authors: Tri Dung Duong, Qian Li, Guandong Xu

    Abstract: Causal inference methods are widely applied in various decision-making domains such as precision medicine, optimal policy and economics. Central to these applications is the treatment effect estimation of intervention strategies. Current estimation methods are mostly restricted to the deterministic treatment, which, however, is unable to address the stochastic space treatment policies. Moreover, pr…

    Submitted 26 May, 2021; originally announced May 2021.

    Comments: Accepted in IJCNN 21

  47. arXiv:2104.13957  [pdf, other]

    stat.AP q-bio.GN

    A Bayesian Modified Ising Model for Identifying Spatially Variable Genes from Spatial Transcriptomics Data

    Authors: Xi Jiang, Qiwei Li, Guanghua Xiao

    Abstract: A recent technology breakthrough in spatial molecular profiling has enabled the comprehensive molecular characterizations of single cells while preserving spatial information. It provides new opportunities to delineate how cells from different origins form tissues with distinctive structures and functions. One immediate question in spatial molecular profiling data analysis is to identify genes who…

    Submitted 5 October, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

    Comments: Version 3

  48. arXiv:2103.11269  [pdf]

    cs.LG stat.ML

    Development and Validation of a Deep Learning Model for Prediction of Severe Outcomes in Suspected COVID-19 Infection

    Authors: Varun Buch, Aoxiao Zhong, Xiang Li, Marcio Aloisio Bezerra Cavalcanti Rockenbach, Dufan Wu, Hui Ren, Jiahui Guan, Andrew Liteplo, Sayon Dutta, Ittai Dayan, Quanzheng Li

    Abstract: COVID-19 patient triaging with predictive outcome of the patients upon first presentation to the emergency department (ED) is crucial for improving patient prognosis, as well as better hospital resources management and cross-infection control. We trained a deep feature fusion model to predict patient outcomes, where the model inputs were EHR data including demographic information, co-morbidities, vital sig…

    Submitted 28 March, 2021; v1 submitted 20 March, 2021; originally announced March 2021.

    Comments: Varun Buch, Aoxiao Zhong and Xiang Li contribute equally to this work

  49. arXiv:2102.04279  [pdf, ps, other]

    stat.ML cs.LG math.NA

    Constrained Ensemble Langevin Monte Carlo

    Authors: Zhiyan Ding, Qin Li

    Abstract: The classical Langevin Monte Carlo method looks for samples from a target distribution by descending the samples along the gradient of the target distribution. The method enjoys a fast convergence rate. However, the numerical cost is sometimes high because each iteration requires the computation of a gradient. One approach to eliminate the gradient computation is to employ the concept of ``ensembl…

    Submitted 29 October, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

  50. arXiv:2101.03133  [pdf]

    stat.AP cs.SI

    Infections Forecasting and Intervention Effect Evaluation for COVID-19 via a Data-Driven Markov Process and Heterogeneous Simulation

    Authors: Quan-Lin Li, Chengliang Wang, Yiming Xu, Chi Zhang, Yanxia Chang, Xiaole Wu, Zhen-Ping Fan, Zhi-Guo Liu

    Abstract: The Coronavirus Disease 2019 (COVID-19) pandemic has caused a tremendous number of deaths and a devastating impact on economic development all over the world. Thus, it is paramount to control its further transmission, for which purpose it is necessary to find the mechanism of its transmission process and evaluate the effect of different control strategies. To deal with these issues, we describe…

    Submitted 7 January, 2021; originally announced January 2021.

    Comments: 29 pages, 10 figures

    MSC Class: 90B15; 90B22; 60J27; 68P01; 68T07; 68T09 ACM Class: E.m; G.1; I.6; I.2