Showing 1–50 of 607 results for author: Liu, Y

Searching in archive stat.
  1. arXiv:2408.13702  [pdf, ps, other]

    stat.AP

    Examining Differential Item Functioning (DIF) in Self-Reported Health Survey Data: Via Multilevel Modeling

    Authors: Dandan Chen Kaptur, Yiqing Liu, Bradley Kaptur, Nicholas Peterman, Jinming Zhang, Justin Kern, Carolyn Anderson

    Abstract: Few health-related constructs or measures have received critical evaluation in terms of measurement equivalence, such as self-reported health survey data. Differential item functioning (DIF) analysis is crucial for evaluating measurement equivalence in self-reported health surveys, which are often hierarchical in structure. While traditional DIF methods rely on single-level models, multilevel mode…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: preprint, 11 pages (excluding references)

  2. arXiv:2408.07890  [pdf, other]

    stat.ML cs.LG

    Local Causal Discovery with Background Knowledge

    Authors: Qingyuan Zheng, Yue Liu, Yangbo He

    Abstract: Causality plays a pivotal role in various fields of study. Based on the framework of causal graphical models, previous works have proposed identifying whether a variable is a cause or non-cause of a target in every Markov equivalent graph solely by learning a local structure. However, the presence of prior knowledge, often represented as a partially known causal graph, is common in many causal mod…

    Submitted 14 August, 2024; originally announced August 2024.

  3. arXiv:2408.05765  [pdf, other]

    cs.LG stat.ML

    Scalable and Adaptive Spectral Embedding for Attributed Graph Clustering

    Authors: Yunhui Liu, Tieke He, Qing Wu, Tao Zheng, Jianhua Zhao

    Abstract: Attributed graph clustering, which aims to group the nodes of an attributed graph into disjoint clusters, has made promising advancements in recent years. However, most existing methods face challenges when applied to large graphs due to the expensive computational cost and high memory usage. In this paper, we introduce Scalable and Adaptive Spectral Embedding (SASE), a simple attributed graph clu…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted by CIKM 2024 (Short Paper)

  4. arXiv:2407.20235  [pdf]

    cs.CY stat.AP

    Solve the Refugee Crisis with Data

    Authors: Yunfei Liu

    Abstract: In this study, we addressed the refugee crisis through two main models. For predicting the ultimate number of refugees, we first established a Logistic Regression Model, but due to the limited data points, its prediction accuracy was suboptimal. Consequently, we incorporated Gray Theory to develop the Gray Verhulst Model, which provided scientifically sound and reasonable predictions. Statistical…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 19 pages, 6 figures, 7 tables

  5. arXiv:2407.03420  [pdf, other]

    stat.ME stat.AP

    Balancing events, not patients, maximizes power of the logrank test: and other insights on unequal randomization in survival trials

    Authors: Godwin Yung, Kaspar Rufibach, Marcel Wolbers, Ray Lin, Yi Liu

    Abstract: We revisit the question of what randomization ratio (RR) maximizes power of the logrank test in event-driven survival trials under proportional hazards (PH). By comparing three approximations of the logrank test (Schoenfeld, Freedman, Rubinstein) to empirical simulations, we find that the RR that maximizes power is the RR that balances number of events across treatment arms at the end of the trial…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 17 pages, 3 figures, 2 tables

  6. arXiv:2407.01015  [pdf, other]

    stat.ML cs.LG

    Bayesian Entropy Neural Networks for Physics-Aware Prediction

    Authors: Rahul Rathnakumar, Jiayu Huang, Hao Yan, Yongming Liu

    Abstract: This paper addresses the need for deep learning models to integrate well-defined constraints into their outputs, driven by their application in surrogate models, learning with limited data and partial information, and scenarios requiring flexible model behavior to incorporate non-data sample information. We introduce Bayesian Entropy Neural Networks (BENN), a framework grounded in Maximum Entropy…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 15 pages

    ACM Class: I.5.1

  7. arXiv:2407.00716  [pdf, other]

    stat.ME

    On a General Theoretical Framework of Reliability

    Authors: Yang Liu, Jolynn Pek, Alberto Maydeu-Olivares

    Abstract: Reliability is an essential measure of how closely observed scores represent latent scores (reflecting constructs), assuming some latent variable measurement model. We present a general theoretical framework of reliability, placing emphasis on measuring association between latent and observed scores. This framework was inspired by McDonald's (2011) regression framework, which highlighted the coeff…

    Submitted 30 June, 2024; originally announced July 2024.

  8. arXiv:2406.15523  [pdf, other]

    cs.LG stat.ML

    Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

    Authors: Yili Wang, Yixin Liu, Xu Shen, Chenyu Li, Kaize Ding, Rui Miao, Ying Wang, Shirui Pan, Xin Wang

    Abstract: To build safe and reliable graph machine learning systems, unsupervised graph-level anomaly detection (GLAD) and unsupervised graph-level out-of-distribution (OOD) detection (GLOD) have received significant attention in recent years. Though those two lines of research indeed share the same objective, they have been studied independently in the community due to distinct evaluation setups, creating…

    Submitted 21 June, 2024; originally announced June 2024.

  9. arXiv:2406.10554  [pdf, other]

    stat.ME stat.AP

    Causal Inference with Outcomes Truncated by Death and Missing Not at Random

    Authors: Wei Li, Yuan Liu, Shanshan Luo, Zhi Geng

    Abstract: In clinical trials, principal stratification analysis is commonly employed to address the issue of truncation by death, where a subject dies before the outcome can be measured. However, in practice, many survivor outcomes may remain uncollected or be missing not at random, posing a challenge to standard principal stratification analyses. In this paper, we explore the identification, estimation, an…

    Submitted 2 August, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  10. arXiv:2406.08968  [pdf, other]

    stat.ME

    Covariate Selection for Optimizing Balance with Covariate-Adjusted Response-Adaptive Randomization

    Authors: Ziqing Guo, Yang Liu, Lucy Xia

    Abstract: Balancing influential covariates is crucial for valid treatment comparisons in clinical studies. While covariate-adaptive randomization is commonly used to achieve balance, its performance can be inadequate when the number of baseline covariates is large. It is therefore essential to identify the influential factors associated with the outcome and ensure balance among these critical covariates. In…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 54 pages, 4 figures

  11. arXiv:2406.05340  [pdf, other]

    stat.ME stat.ML

    Selecting the Number of Communities for Weighted Degree-Corrected Stochastic Block Models

    Authors: Yucheng Liu, Xiaodong Li

    Abstract: We investigate how to select the number of communities for weighted networks without a full likelihood modeling. First, we propose a novel weighted degree-corrected stochastic block model (DCSBM), in which the mean adjacency matrix is modeled as the same as in standard DCSBM, while the variance profile matrix is assumed to be related to the mean adjacency matrix through a given variance function.…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 3 figures, 2 tables

  12. arXiv:2405.20782  [pdf, other]

    cs.CR cs.IT stat.ML

    Universal Exact Compression of Differentially Private Mechanisms

    Authors: Yanxiao Liu, Wei-Ning Chen, Ayfer Özgür, Cheuk Ting Li

    Abstract: To reduce the communication cost of differential privacy mechanisms, we introduce a novel construction, called Poisson private representation (PPR), designed to compress and simulate any local randomizer while ensuring local differential privacy. Unlike previous simulation-based local differential privacy mechanisms, PPR exactly preserves the joint distribution of the data and the output of the or…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 30 pages, 3 figures

  13. arXiv:2405.15204  [pdf, other]

    stat.ME

    A New Fit Assessment Framework for Common Factor Models Using Generalized Residuals

    Authors: Youjin Sung, Youngjin Han, Yang Liu

    Abstract: Standard common factor models, such as the linear normal factor model, rely on strict parametric assumptions, which require rigorous model-data fit assessment to prevent fallacious inferences. However, overall goodness-of-fit diagnostics conventionally used in factor analysis do not offer diagnostic information on where the misfit originates. In the current work, we propose a new fit assessment fr…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 40 pages, 12 figures

  14. arXiv:2405.09331  [pdf, other]

    stat.ME stat.ML

    Multi-Source Conformal Inference Under Distribution Shift

    Authors: Yi Liu, Alexander W. Levis, Sharon-Lise Normand, Larry Han

    Abstract: Recent years have experienced increasing utilization of complex machine learning models across multiple sources of data to inform more generalizable decision-making. However, distribution shifts across data sources and privacy concerns related to sharing individual-level data, coupled with a lack of uncertainty quantification from machine learning predictions, make it challenging to achieve valid…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024, 39 pages, 13 figures

  15. arXiv:2405.07397  [pdf, other]

    stat.ME

    The Spike-and-Slab Quantile LASSO for Robust Variable Selection in Cancer Genomics Studies

    Authors: Yuwen Liu, Jie Ren, Shuangge Ma, Cen Wu

    Abstract: Data irregularity in cancer genomics studies has been widely observed in the form of outliers and heavy-tailed distributions in the complex traits. In the past decade, robust variable selection methods have emerged as powerful alternatives to the non-robust ones to identify important genes associated with heterogeneous disease traits and build superior predictive models. In this study, to keep the…

    Submitted 12 May, 2024; originally announced May 2024.

  16. arXiv:2405.05733  [pdf, other]

    stat.ML cs.LG

    Batched Stochastic Bandit for Nondegenerate Functions

    Authors: Yu Liu, Yunlu Shu, Tianyu Wang

    Abstract: This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{\mathcal{O}} ( A_{+}^d \sqrt{T} )$. In addition, GN only needs…

    Submitted 29 August, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: 34 pages, 14 colored figures

  17. arXiv:2405.01709  [pdf, other]

    stat.ME math.ST stat.ML

    Minimax Regret Learning for Data with Heterogeneous Subgroups

    Authors: Weibin Mo, Weijing Tang, Songkai Xue, Yufeng Liu, Ji Zhu

    Abstract: Modern complex datasets often consist of various sub-populations. To develop robust and generalizable methods in the presence of sub-population heterogeneity, it is important to guarantee a uniform learning performance instead of an average one. In many applications, prior information is often available on which sub-population or group the data points belong to. Given the observed groups of data,…

    Submitted 2 May, 2024; originally announced May 2024.

  18. arXiv:2404.18008  [pdf, other]

    cs.LG stat.AP

    Implicit Generative Prior for Bayesian Neural Networks

    Authors: Yijia Liu, Xiao Wang

    Abstract: Predictive uncertainty quantification is crucial for reliable decision-making in various applied domains. Bayesian neural networks offer a powerful framework for this task. However, defining meaningful priors and ensuring computational efficiency remain significant challenges, especially for complex real-world applications. This paper addresses these challenges by proposing a novel neural adaptive…

    Submitted 27 April, 2024; originally announced April 2024.

  19. arXiv:2404.17592  [pdf, other]

    cs.IR cs.LG stat.ML

    Low-Rank Online Dynamic Assortment with Dual Contextual Information

    Authors: Seong Jin Lee, Will Wei Sun, Yufeng Liu

    Abstract: As e-commerce expands, delivering real-time personalized recommendations from vast catalogs poses a critical challenge for retail platforms. Maximizing revenue requires careful consideration of both individual customer characteristics and available item features to optimize assortments over time. In this paper, we consider the dynamic assortment problem with dual contexts -- user and item features…

    Submitted 19 April, 2024; originally announced April 2024.

  20. arXiv:2404.16709  [pdf, other]

    stat.ME

    Understanding Reliability from a Regression Perspective

    Authors: Yang Liu, Jolynn Pek, Alberto Maydeu-Olivares

    Abstract: Reliability is an important quantification of measurement precision based on a latent variable measurement model. Inspired by McDonald (2011), we present a regression framework of reliability, placing emphasis on whether latent or observed scores serve as the regression outcome. Our theory unifies two extant perspectives of reliability: (a) classical test theory (measurement decomposition), and (b…

    Submitted 25 April, 2024; originally announced April 2024.

  21. arXiv:2404.13016  [pdf, other]

    cs.CV cs.LG stat.ML

    Optimizing Calibration by Gaining Aware of Prediction Correctness

    Authors: Yuchi Liu, Lei Wang, Yuli Zou, James Zou, Liang Zheng

    Abstract: Model calibration aims to align confidence with prediction correctness. The Cross-Entropy (CE) loss is widely used for calibrator training, which enforces the model to increase confidence on the ground truth class. However, we find the CE loss has intrinsic limitations. For example, for a narrow misclassification, a calibrator trained by the CE loss often produces high confidence on the wrongly pr…

    Submitted 24 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  22. arXiv:2404.07906  [pdf]

    stat.ME q-bio.BM

    WiNNbeta: Batch and drift correction method by white noise normalization for metabolomic studies

    Authors: Olga Demler, Franco Giulianini, Yanyan Liu, Malte Londschien, Anja Sjöström, Tanmay Tanna, Heike Luttmann-Gibson, Antoine Jeanrenaud

    Abstract: We developed a method called batch and drift correction method by White Noise Normalization (WiNNbeta) to correct individual metabolites for batch effects and drifts. This method tests for white noise properties to identify metabolites in need of correction and corrects them by using fine-tuned splines. To test the method performance we applied WiNNbeta to LC-MS data from our metabolomic studies a…

    Submitted 11 April, 2024; originally announced April 2024.

  23. arXiv:2403.15711  [pdf, other]

    cs.LG stat.ME stat.ML

    Identifiable Latent Neural Causal Models

    Authors: Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

    Abstract: Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data. It is particularly good at predictions under unseen distribution shifts, because these shifts can generally be interpreted as consequences of interventions. Hence leveraging seen distribution shifts becomes a natural strategy to help identifying causal representations, which in…

    Submitted 23 March, 2024; originally announced March 2024.

  24. arXiv:2403.15421  [pdf, other]

    eess.SP cs.LG stat.AP

    Agile gesture recognition for low-power applications: customisation for generalisation

    Authors: Ying Liu, Liucheng Guo, Valeri A. Makarov, Alexander Gorban, Evgeny Mirkes, Ivan Y. Tyukin

    Abstract: Automated hand gesture recognition has long been a focal point in the AI community. Traditionally, research in this field has predominantly focused on scenarios with access to a continuous flow of hand images. This focus has been driven by the widespread use of cameras and the abundant availability of image data. However, there is an increasing demand for gesture recognition technologies that op…

    Submitted 12 March, 2024; originally announced March 2024.

  25. arXiv:2403.14813  [pdf, other]

    stat.ML cs.HC cs.LG

    Curvature Augmented Manifold Embedding and Learning

    Authors: Yongming Liu

    Abstract: A new dimensional reduction (DR) and data visualization method, Curvature-Augmented Manifold Embedding and Learning (CAMEL), is proposed. The key novel contribution is to formulate the DR problem as a mechanistic/physics model, where the force field among nodes (data points) is used to find an n-dimensional manifold representation of the data sets. Compared with many existing attractive-repulsive…

    Submitted 21 March, 2024; originally announced March 2024.

  26. arXiv:2403.12284  [pdf, other]

    math.ST q-bio.QM stat.AP stat.ME

    The Wreaths of KHAN: Uniform Graph Feature Selection with False Discovery Rate Control

    Authors: Jiajun Liang, Yue Liu, Doudou Zhou, Sinian Zhang, Junwei Lu

    Abstract: Graphical models find numerous applications in biology, chemistry, sociology, neuroscience, etc. While substantial progress has been made in graph estimation, it remains largely unexplored how to select significant graph signals with uncertainty assessment, especially those graph features related to topological structures including cycles (i.e., wreaths), cliques, hubs, etc. These features play a…

    Submitted 18 March, 2024; originally announced March 2024.

  27. arXiv:2403.07185  [pdf, other]

    cs.LG stat.ML

    Uncertainty in Graph Neural Networks: A Survey

    Authors: Fangxin Wang, Yuqing Liu, Kay Liu, Yibo Wang, Sourav Medya, Philip S. Yu

    Abstract: Graph Neural Networks (GNNs) have been extensively used in various real-world applications. However, the predictive uncertainty of GNNs stemming from diverse sources such as inherent randomness in data and model training errors can lead to unstable and erroneous predictions. Therefore, identifying, quantifying, and utilizing uncertainty are essential to enhance the performance of the model for the…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 13 main pages, 3 figures, 1 table. Under review

  28. arXiv:2402.18130  [pdf, ps, other]

    stat.ME math.ST

    Sequential Change-point Detection for Compositional Time Series with Exogenous Variables

    Authors: Yajun Liu, Beth Andrews

    Abstract: Sequential change-point detection for time series enables us to sequentially check the hypothesis that the model still holds as more and more data are observed. It is widely used in data monitoring in practice. In this work, we consider sequential change-point detection for compositional time series, time series in which the observations are proportions. For fitting compositional time series, we p…

    Submitted 28 February, 2024; originally announced February 2024.

  29. arXiv:2402.17274  [pdf, ps, other]

    stat.ME

    Sequential Change-point Detection for Binomial Time Series with Exogenous Variables

    Authors: Yajun Liu, Beth Andrews

    Abstract: Sequential change-point detection for time series enables us to sequentially check the hypothesis that the model still holds as more and more data are observed. It's widely used in data monitoring in practice. Meanwhile, binomial time series, which depicts independent binary individual behaviors within a group when the individual behaviors are dependent on past observations of the whole group, is…

    Submitted 27 February, 2024; originally announced February 2024.

  30. arXiv:2402.17106  [pdf, other]

    stat.ML cs.CY cs.LG

    Achievable Fairness on Your Data With Utility Guarantees

    Authors: Muhammad Faaiz Taufiq, Jean-Francois Ton, Yang Liu

    Abstract: In machine learning fairness, training models that minimize disparity across different sensitive groups often leads to diminished accuracy, a phenomenon known as the fairness-accuracy trade-off. The severity of this trade-off inherently depends on dataset characteristics such as dataset imbalances or biases and therefore, using a uniform fairness requirement across diverse datasets remains questio…

    Submitted 30 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  31. arXiv:2402.10537  [pdf, other]

    stat.ME

    Quantifying Individual Risk for Binary Outcome: Bounds and Inference

    Authors: Peng Wu, Peng Ding, Zhi Geng, Yue Liu

    Abstract: Understanding treatment heterogeneity is crucial for reliable decision-making in treatment evaluation and selection. While the conditional average treatment effect (CATE) is commonly used to capture treatment heterogeneity induced by covariates and design individualized treatment policies, it remains an averaging metric within subpopulations. This limitation prevents it from unveiling individual-l…

    Submitted 16 February, 2024; originally announced February 2024.

  32. arXiv:2402.08018  [pdf, other]

    cs.LG cs.CV stat.ML

    Nearest Neighbour Score Estimators for Diffusion Generative Models

    Authors: Matthew Niedoba, Dylan Green, Saeid Naderiparizi, Vasileios Lioutas, Jonathan Wilder Lavington, Xiaoxuan Liang, Yunpeng Liu, Ke Zhang, Setareh Dabiri, Adam Ścibior, Berend Zwartsenberg, Frank Wood

    Abstract: Score function estimation is the cornerstone of both training and sampling from diffusion generative models. Despite this fact, the most commonly used estimators are either biased neural network approximations or high variance Monte Carlo estimators based on the conditional score. We introduce a novel nearest neighbour score function estimator which utilizes multiple samples from the training set…

    Submitted 16 July, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: 25 pages, 9 figures. To be published in ICML 2024

  33. arXiv:2402.06223  [pdf, other]

    cs.LG cs.CV stat.ML

    Revealing Multimodal Contrastive Representation Learning through Latent Partial Causal Models

    Authors: Yuhang Liu, Zhen Zhang, Dong Gong, Biwei Huang, Mingming Gong, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

    Abstract: Multimodal contrastive representation learning methods have proven successful across a range of domains, partly due to their ability to generate meaningful shared representations of complex phenomena. To enhance the depth of analysis and understanding of these acquired representations, we introduce a unified causal model specifically designed for multimodal data. By examining this model, we show t…

    Submitted 9 February, 2024; originally announced February 2024.

  34. Mobility Gaps between Low-Income and Not Low-Income Households: A Case Study in New York State

    Authors: Yuandong Liu, Majbah Uddin

    Abstract: Understanding the travel challenges faced by low-income residents has always been and continues to be one of the most important transportation equity topics. This study aims to explore the mobility gaps between low-income households (HHs) and not low-income HHs, and how the gaps vary within different socio-demographic population groups in New York State (NYS). The latest National Household Travel…

    Submitted 2 February, 2024; originally announced February 2024.

    Journal ref: In International Conference on Transportation and Development 2023 (pp. 663-673)

  35. arXiv:2402.02368  [pdf, other]

    cs.LG stat.ML

    Timer: Generative Pre-trained Transformers Are Large Time Series Models

    Authors: Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, Mingsheng Long

    Abstract: Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world data-scarce scenarios, which can be concealed due to the performance saturation with small models on current benchmarks. Meanwhile, large models have demonstrated great powers in these scenarios through large-scale pre-training. Continuous prog…

    Submitted 4 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  36. arXiv:2401.18017  [pdf, ps, other]

    stat.ML cs.LG

    Causal Discovery by Kernel Deviance Measures with Heterogeneous Transforms

    Authors: Tim Tse, Zhitang Chen, Shengyu Zhu, Yue Liu

    Abstract: The discovery of causal relationships in a set of random variables is a fundamental objective of science and has also recently been argued as being an essential component towards real machine intelligence. One class of causal discovery techniques is founded on the argument that there are inherent structural asymmetries between the causal and anti-causal direction which could be leveraged in…

    Submitted 31 January, 2024; originally announced January 2024.

  37. arXiv:2401.08173  [pdf, other]

    stat.ME

    Simultaneous Change Point Detection and Identification for High Dimensional Linear Models

    Authors: Bin Liu, Xinsheng Zhang, Yufeng Liu

    Abstract: In this article, we consider change point inference for high dimensional linear models. For change point detection, given any subgroup of variables, we propose a new method for testing the homogeneity of corresponding regression coefficients across the observations. Under some regularity conditions, the proposed new testing procedure controls the type I error asymptotically and is powerful against…

    Submitted 16 January, 2024; originally announced January 2024.

  38. arXiv:2401.02154  [pdf, other]

    cs.LG cs.AI cs.CR stat.ME

    Disentangle Estimation of Causal Effects from Cross-Silo Data

    Authors: Yuxuan Liu, Haozhao Wang, Shuang Wang, Zhiming He, Wenchao Xu, Jialiang Zhu, Fan Yang

    Abstract: Estimating causal effects among different events is of great importance to critical fields such as drug development. Nevertheless, the data features associated with events may be distributed across various silos and remain private within respective parties, impeding direct information exchange between them. This, in turn, can result in biased estimations of local causal effects, which rely on the…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  39. arXiv:2312.17065  [pdf, other]

    stat.ME cs.SE stat.AP

    CluBear: A Subsampling Package for Interactive Statistical Analysis with Massive Data on A Single Machine

    Authors: Ke Xu, Yingqiu Zhu, Yijing Liu, Hansheng Wang

    Abstract: This article introduces CluBear, a Python-based open-source package for interactive massive data analysis. The key feature of CluBear is that it enables users to conduct convenient and interactive statistical analysis of massive data with only a traditional single-computer system. Thus, CluBear provides a cost-effective solution when mining large-scale datasets. In addition, the CluBear package in…

    Submitted 28 December, 2023; originally announced December 2023.

  40. arXiv:2312.14333  [pdf, other]

    cs.MA cs.LG stat.ME

    Behaviour Modelling of Social Animals via Causal Structure Discovery and Graph Neural Networks

    Authors: Gaël Gendron, Yang Chen, Mitchell Rogers, Yiping Liu, Mihailo Azhar, Shahrokh Heidari, David Arturo Soriano Valdez, Kobe Knowles, Padriac O'Leary, Simon Eyre, Michael Witbrock, Gillian Dobbie, Jiamou Liu, Patrice Delmas

    Abstract: Better understanding the natural world is a crucial task with a wide range of applications. In environments with close proximity between humans and animals, such as zoos, it is essential to better understand the causes behind animal behaviour and what interventions are responsible for changes in their behaviours. This can help to predict unusual behaviours, mitigate detrimental effects and increas…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 9 pages, 7 figures, accepted as an extended abstract and poster at AAMAS 2024

    ACM Class: I.2.6; I.5.1; I.6.3; J.4

  41. arXiv:2312.12871  [pdf, other]

    cs.LG stat.ML

    Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches

    Authors: Yu Liu, Runzhe Wan, James McQueen, Doug Hains, Jinxiang Gu, Rui Song

    Abstract: The selection of the assumed effect size (AES) critically determines the duration of an experiment, and hence its accuracy and efficiency. Traditionally, experimenters determine AES based on domain knowledge. However, this method becomes impractical for online experimentation services managing numerous experiments, and a more automated approach is hence of great demand. We initiate the study of da…

    Submitted 17 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  42. arXiv:2312.11437  [pdf, ps, other]

    math.ST stat.ME

    Clustering Consistency of General Nonparametric Classification Methods in Cognitive Diagnosis

    Authors: Yanlong Liu, Gongjun Xu

    Abstract: Cognitive diagnosis models have been popularly used in fields such as education, psychology, and social sciences. While parametric likelihood estimation is a prevailing method for fitting cognitive diagnosis models, nonparametric methodologies are attracting increasing attention due to their ease of implementation and robustness, particularly when sample sizes are relatively small. However, existi…

    Submitted 18 December, 2023; originally announced December 2023.

  43. arXiv:2312.08075  [pdf, other]

    cs.LG stat.ML

    TERM Model: Tensor Ring Mixture Model for Density Estimation

    Authors: Ruituo Wu, Jiani Liu, Ce Zhu, Anh-Huy Phan, Ivan V. Oseledets, Yipeng Liu

    Abstract: Efficient probability density estimation is a core challenge in statistical machine learning. Tensor-based probabilistic graph methods address interpretability and stability concerns encountered in neural network approaches. However, a substantial number of potential tensor permutations can lead to a tensor network with the same structure but varying expressive capabilities. In this paper, we take…

    Submitted 13 December, 2023; originally announced December 2023.

  44. arXiv:2312.03951  [pdf, other]

    cs.LG stat.ML

    Understanding the Role of Optimization in Double Descent

    Authors: Chris Yuhao Liu, Jeffrey Flanigan

    Abstract: The phenomenon of model-wise double descent, where the test error peaks and then reduces as the model size increases, is an interesting topic that has attracted the attention of researchers due to the striking observed gap between theory and practice \citep{Belkin2018ReconcilingMM}. Additionally, while double descent has been observed in various tasks and architectures, the peak of double descent…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: NeurIPS Workshop 2023 Optimization for Machine Learning

  45. arXiv:2312.03218  [pdf, other]

    cs.LG cs.CC stat.ML

    Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization

    Authors: Yuanshi Liu, Hanzhen Zhao, Yang Xu, Pengyun Yue, Cong Fang

    Abstract: Gradient-based minimax optimal algorithms have greatly promoted the development of continuous optimization and machine learning. One seminal work due to Yurii Nesterov [Nes83a] established $\tilde{\mathcal{O}}(\sqrt{L/\mu})$ gradient complexity for minimizing an $L$-smooth $\mu$-strongly convex objective. However, an ideal algorithm would adapt to the explicit complexity of a particular objective func…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Optimization for Machine Learning

  46. arXiv:2312.01294  [pdf, other]

    cs.LG stat.ML

    Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series

    Authors: Ying Liu, Peng Cui, Wenbo Hu, Richang Hong

    Abstract: Multivariate time series are everywhere. Nevertheless, real-world time series data often exhibit numerous missing values, which motivates the time series imputation task. Although previous deep learning methods have been shown to be effective for time series imputation, they are shown to produce overconfident imputations, which might be an overlooked threat to the reliability of the intelligen…

    Submitted 3 December, 2023; originally announced December 2023.

  47. arXiv:2311.18506  [pdf, other]

    stat.ML cs.LG eess.SY math.ST

    Global Convergence of Online Identification for Mixed Linear Regression

    Authors: Yujing Liu, Zhixin Liu, Lei Guo

    Abstract: Mixed linear regression (MLR) is a powerful model for characterizing nonlinear relationships by utilizing a mixture of linear regression sub-models. The identification of MLR is a fundamental problem, where most of the existing results focus on offline algorithms, rely on independent and identically distributed (i.i.d.) data assumptions, and provide local convergence results only. This paper invest…

    Submitted 30 November, 2023; originally announced November 2023.

  48. arXiv:2311.17605  [pdf, other]

    stat.AP stat.ME

    Improving the Balance of Unobserved Covariates From Information Theory in Multi-Arm Randomization with Unequal Allocation Ratio

    Authors: Xingjian Ma, Yang Liu

    Abstract: Multi-arm randomization has increasingly widespread applications, and it is crucial to ensure that the distributions of important observed covariates, as well as the potential unobserved covariates, are similar and comparable among all the treatments. However, the theoretical properties of unobserved covariates imbalance in multi-arm randomization with unequal allocation ratio remain un…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 60 pages, 3 figures

  49. arXiv:2311.11163  [pdf, other]

    cs.SI stat.AP stat.CO

    Hate speech and hate crimes: a data-driven study of evolving discourse around marginalized groups

    Authors: Malvina Bozhidarova, Jonathn Chang, Aaishah Ale-rasool, Yuxiang Liu, Chongyao Ma, Andrea L. Bertozzi, P. Jeffrey Brantingham, Junyuan Lin, Sanjukta Krishnagopal

    Abstract: This study explores the dynamic relationship between online discourse, as observed in tweets, and physical hate crimes, focusing on marginalized groups. Leveraging natural language processing techniques, including keyword extraction and topic modeling, we analyze the evolution of online discourse after events affecting these groups. Examining sentiment and polarizing tweets, we establish correlati…

    Submitted 18 November, 2023; originally announced November 2023.

  50. arXiv:2311.10101  [pdf, other]

    cs.CR cs.DS cs.LG stat.ML stat.OT

    Gaussian Differential Privacy on Riemannian Manifolds

    Authors: Yangdi Jiang, Xiaotian Chang, Yi Liu, Lei Ding, Linglong Kong, Bei Jiang

    Abstract: We develop an advanced approach for extending Gaussian Differential Privacy (GDP) to general Riemannian manifolds. The concept of GDP stands out as a prominent privacy definition that strongly warrants extension to manifold settings, due to its central limit properties. By harnessing the power of the renowned Bishop-Gromov theorem in geometric analysis, we propose a Riemannian Gaussian distributio…

    Submitted 8 November, 2023; originally announced November 2023.