Search | arXiv e-print repository

Towards Certified Unlearning for Deep Neural Networks

Authors: Binchi Zhang, Yushun Dong, Tianhao Wang, Jundong Li

Abstract: In the field of machine unlearning, certified unlearning has been extensively studied in convex machine learning models due to its high efficiency and strong theoretical guarantees. However, its application to deep neural networks (DNNs), known for their highly nonconvex nature, still poses challenges. To bridge the gap between certified unlearning and DNNs, we propose several simple techniques to… ▽ More In the field of machine unlearning, certified unlearning has been extensively studied in convex machine learning models due to its high efficiency and strong theoretical guarantees. However, its application to deep neural networks (DNNs), known for their highly nonconvex nature, still poses challenges. To bridge the gap between certified unlearning and DNNs, we propose several simple techniques to extend certified unlearning methods to nonconvex objectives. To reduce the time complexity, we develop an efficient computation method by inverse Hessian approximation without compromising certification guarantees. In addition, we extend our discussion of certification to nonconvergence training and sequential unlearning, considering that real-world users can send unlearning requests at different time points. Extensive experiments on three real-world datasets demonstrate the efficacy of our method and the advantages of certified unlearning in DNNs. △ Less

Submitted 1 August, 2024; originally announced August 2024.

Comments: ICML 2024

arXiv:2407.19446 [pdf, ps, other]

Leave-One-Out Analysis for Nonconvex Robust Matrix Completion with General Thresholding Functions

Authors: Tianming Wang, Ke Wei

Abstract: We study the problem of robust matrix completion (RMC), where the partially observed entries of an underlying low-rank matrix is corrupted by sparse noise. Existing analysis of the non-convex methods for this problem either requires the explicit but empirically redundant regularization in the algorithm or requires sample splitting in the analysis. In this paper, we consider a simple yet efficient… ▽ More We study the problem of robust matrix completion (RMC), where the partially observed entries of an underlying low-rank matrix is corrupted by sparse noise. Existing analysis of the non-convex methods for this problem either requires the explicit but empirically redundant regularization in the algorithm or requires sample splitting in the analysis. In this paper, we consider a simple yet efficient nonconvex method which alternates between a projected gradient step for the low-rank part and a thresholding step for the sparse noise part. Inspired by leave-one out analysis for low rank matrix completion, it is established that the method can achieve linear convergence for a general class of thresholding functions, including for example soft-thresholding and SCAD. To the best of our knowledge, this is the first leave-one-out analysis on a nonconvex method for RMC. Additionally, when applying our result to low rank matrix completion, it improves the sampling complexity of existing result for the singular value projection method. △ Less

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.03619 [pdf, other]

Multivariate Representations of Univariate Marked Hawkes Processes

Authors: Louis Davis, Conor Kresin, Boris Baeumer, Ting Wang

Abstract: Univariate marked Hawkes processes are used to model a range of real-world phenomena including earthquake aftershock sequences, contagious disease spread, content diffusion on social media platforms, and order book dynamics. This paper illustrates a fundamental connection between univariate marked Hawkes processes and multivariate Hawkes processes. Exploiting this connection renders a framework th… ▽ More Univariate marked Hawkes processes are used to model a range of real-world phenomena including earthquake aftershock sequences, contagious disease spread, content diffusion on social media platforms, and order book dynamics. This paper illustrates a fundamental connection between univariate marked Hawkes processes and multivariate Hawkes processes. Exploiting this connection renders a framework that can be built upon for expressive and flexible inference on diverse data. Specifically, multivariate unmarked Hawkes representations are introduced as a tool to parameterize univariate marked Hawkes processes. We show that such multivariate representations can asymptotically approximate a large class of univariate marked Hawkes processes, are stationary given the approximated process is stationary, and that resultant conditional intensity parameters are identifiable. A simulation study demonstrates the efficacy of this approach, and provides heuristic bounds for error induced by the relatively larger parameter space of multivariate Hawkes processes. △ Less

Submitted 4 July, 2024; originally announced July 2024.

Comments: 26 pages, 3 figures, submitted to the Annals of Statistics

arXiv:2407.02754 [pdf, other]

Is Cross-Validation the Gold Standard to Evaluate Model Performance?

Authors: Garud Iyengar, Henry Lam, Tianyu Wang

Abstract: Cross-Validation (CV) is the default choice for evaluating the performance of machine learning models. Despite its wide usage, their statistical benefits have remained half-understood, especially in challenging nonparametric regimes. In this paper we fill in this gap and show that in fact, for a wide spectrum of models, CV does not statistically outperform the simple "plug-in" approach where one r… ▽ More Cross-Validation (CV) is the default choice for evaluating the performance of machine learning models. Despite its wide usage, their statistical benefits have remained half-understood, especially in challenging nonparametric regimes. In this paper we fill in this gap and show that in fact, for a wide spectrum of models, CV does not statistically outperform the simple "plug-in" approach where one reuses training data for testing evaluation. Specifically, in terms of both the asymptotic bias and coverage accuracy of the associated interval for out-of-sample evaluation, $K$-fold CV provably cannot outperform plug-in regardless of the rate at which the parametric or nonparametric models converge. Leave-one-out CV can have a smaller bias as compared to plug-in; however, this bias improvement is negligible compared to the variability of the evaluation, and in some important cases leave-one-out again does not outperform plug-in once this variability is taken into account. We obtain our theoretical comparisons via a novel higher-order Taylor analysis that allows us to derive necessary conditions for limit theorems of testing evaluations, which applies to model classes that are not amenable to previously known sufficient conditions. Our numerical results demonstrate that plug-in performs indeed no worse than CV across a wide range of examples. △ Less

Submitted 20 August, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.12843 [pdf, other]

Can Go AIs be adversarially robust?

Authors: Tom Tseng, Euan McLean, Kellin Pelrine, Tony T. Wang, Adam Gleave

Abstract: Prior work found that superhuman Go AIs like KataGo can be defeated by simple adversarial strategies. In this paper, we study if simple defenses can improve KataGo's worst-case performance. We test three natural defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that some of these defenses are able to protect… ▽ More Prior work found that superhuman Go AIs like KataGo can be defeated by simple adversarial strategies. In this paper, we study if simple defenses can improve KataGo's worst-case performance. We test three natural defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that some of these defenses are able to protect against previously discovered attacks. Unfortunately, we also find that none of these defenses are able to withstand adaptive attacks. In particular, we are able to train new adversaries that reliably defeat our defended agents by causing them to blunder in ways humans would not. Our results suggest that building robust AI systems is challenging even in narrow domains such as Go. For interactive examples of attacks and a link to our codebase, see https://goattack.far.ai. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 67 pages

arXiv:2406.11011 [pdf, other]

Data Shapley in One Training Run

Authors: Jiachen T. Wang, Prateek Mittal, Dawn Song, Ruoxi Jia

Abstract: Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts. However, existing approaches require re-training models on different data subsets, which is computationally intensive, foreclosing their application to large-scale models. Furthermore, they produce the same attribution score for any models produced by running the learning algorithm, m… ▽ More Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts. However, existing approaches require re-training models on different data subsets, which is computationally intensive, foreclosing their application to large-scale models. Furthermore, they produce the same attribution score for any models produced by running the learning algorithm, meaning they cannot perform targeted attribution towards a specific model obtained from a single run of the algorithm. This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest. In its most efficient implementation, our technique incurs negligible additional runtime compared to standard model training. This dramatic efficiency improvement makes it possible to perform data attribution for the foundation model pretraining stage for the first time. We present several case studies that offer fresh insights into pretraining data's contribution and discuss their implications for copyright in generative AI and pretraining data curation. △ Less

Submitted 29 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

arXiv:2405.17684 [pdf, other]

doi 10.5705/ss.202023.0107

ZIKQ: An innovative centile chart method for utilizing natural history data in rare disease clinical development

Authors: Tianying Wang, Wenfei Zhang, Ying Wei

Abstract: Utilizing natural history data as external control plays an important role in the clinical development of rare diseases, since placebo groups in double-blind randomization trials may not be available due to ethical reasons and low disease prevalence. This article proposed an innovative approach for utilizing natural history data to support rare disease clinical development by constructing referenc… ▽ More Utilizing natural history data as external control plays an important role in the clinical development of rare diseases, since placebo groups in double-blind randomization trials may not be available due to ethical reasons and low disease prevalence. This article proposed an innovative approach for utilizing natural history data to support rare disease clinical development by constructing reference centile charts. Due to the deterioration nature of certain rare diseases, the distributions of clinical endpoints can be age-dependent and have an absorbing state of zero, which can result in censored natural history data. Existing methods of reference centile charts can not be directly used in the censored natural history data. Therefore, we propose a new calibrated zero-inflated kernel quantile (ZIKQ) estimation to construct reference centile charts from censored natural history data. Using the application to Duchenne Muscular Dystrophy drug development, we demonstrate that the reference centile charts using the ZIKQ method can be implemented to evaluate treatment efficacy and facilitate a more targeted patient enrollment in rare disease clinical development. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17248 [pdf, other]

Transformer In-Context Learning for Categorical Data

Authors: Aaron T. Wang, Ricardo Henao, Lawrence Carin

Abstract: Recent research has sought to understand Transformers through the lens of in-context learning with functional data. We extend that line of work with the goal of moving closer to language models, considering categorical outcomes, nonlinear underlying models, and nonlinear attention. The contextual data are of the form $\textsf{C}=(x_1,c_1,\dots,x_N,c_{N})$ where each $c_i\in\{0,\dots,C-1\}$ is draw… ▽ More Recent research has sought to understand Transformers through the lens of in-context learning with functional data. We extend that line of work with the goal of moving closer to language models, considering categorical outcomes, nonlinear underlying models, and nonlinear attention. The contextual data are of the form $\textsf{C}=(x_1,c_1,\dots,x_N,c_{N})$ where each $c_i\in\{0,\dots,C-1\}$ is drawn from a categorical distribution that depends on covariates $x_i\in\mathbb{R}^d$. Contextual outcomes in the $m$th set of contextual data, $\textsf{C}_m$, are modeled in terms of latent function $f_m(x)\in\textsf{F}$, where $\textsf{F}$ is a functional class with $(C-1)$-dimensional vector output. The probability of observing class $c\in\{0,\dots,C-1\}$ is modeled in terms of the output components of $f_m(x)$ via the softmax. The Transformer parameters may be trained with $M$ contextual examples, $\{\textsf{C}_m\}_{m=1,M}$, and the trained model is then applied to new contextual data $\textsf{C}_{M+1}$ for new $f_{M+1}(x)\in\textsf{F}$. The goal is for the Transformer to constitute the probability of each category $c\in\{0,\dots,C-1\}$ for a new query $x_{N_{M+1}+1}$. We assume each component of $f_m(x)$ resides in a reproducing kernel Hilbert space (RKHS), specifying $\textsf{F}$. Analysis and an extensive set of experiments suggest that on its forward pass the Transformer (with attention defined by the RKHS kernel) implements a form of gradient descent of the underlying function, connected to the latent vector function associated with the softmax. We present what is believed to be the first real-world demonstration of this few-shot-learning methodology, using the ImageNet dataset. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.12437 [pdf]

Considerations for Single-Arm Trials to Support Accelerated Approval of Oncology Drugs

Authors: Feinan Lu, Tao Wang, Ying Lu, Jie Chen

Abstract: In the last two decades, single-arm trials (SATs) have been effectively used to study anticancer therapies in well-defined patient populations using durable response rates as an objective and interpretable clinical endpoints. With a growing trend of regulatory accelerated approval (AA) requiring randomized controlled trials (RCTs), some confusions have arisen about the roles of SATs in AA. This pa… ▽ More In the last two decades, single-arm trials (SATs) have been effectively used to study anticancer therapies in well-defined patient populations using durable response rates as an objective and interpretable clinical endpoints. With a growing trend of regulatory accelerated approval (AA) requiring randomized controlled trials (RCTs), some confusions have arisen about the roles of SATs in AA. This paper is intended to elucidate conditions under which an SAT may be considered reasonable for AA. Specifically, the paper describes (1) two necessary conditions for designing an SAT, (2) three sufficient conditions that help either optimize the study design or interpret the study results, (3) four conditions that demonstrate substantial evidence of clinical benefits of the drug, and (4) a plan of a confirmatory RCT to verify the clinical benefits. Some further considerations are discussed to help design a scientifically sound SAT and communicate with regulatory agencies. Conditions presented in this paper may serve as a set of references for sponsors using SATs for regulatory decision. △ Less

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.09841 [pdf, other]

Simultaneous Identification of Sparse Structures and Communities in Heterogeneous Graphical Models

Authors: Dapeng Shi, Tiandong Wang, Zhiliang Ying

Abstract: Exploring and detecting community structures hold significant importance in genetics, social sciences, neuroscience, and finance. Especially in graphical models, community detection can encourage the exploration of sets of variables with group-like properties. In this paper, within the framework of Gaussian graphical models, we introduce a novel decomposition of the underlying graphical structure… ▽ More Exploring and detecting community structures hold significant importance in genetics, social sciences, neuroscience, and finance. Especially in graphical models, community detection can encourage the exploration of sets of variables with group-like properties. In this paper, within the framework of Gaussian graphical models, we introduce a novel decomposition of the underlying graphical structure into a sparse part and low-rank diagonal blocks (non-overlapped communities). We illustrate the significance of this decomposition through two modeling perspectives and propose a three-stage estimation procedure with a fast and efficient algorithm for the identification of the sparse structure and communities. Also on the theoretical front, we establish conditions for local identifiability and extend the traditional irrepresentability condition to an adaptive form by constructing an effective norm, which ensures the consistency of model selection for the adaptive $\ell_1$ penalized estimator in the second stage. Moreover, we also provide the clustering error bound for the K-means procedure in the third stage. Extensive numerical experiments are conducted to demonstrate the superiority of the proposed method over existing approaches in estimating graph structures. Furthermore, we apply our method to the stock return data, revealing its capability to accurately identify non-overlapped community structures. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: 61 pages, 11 figures, 4 tables

arXiv:2405.05733 [pdf, other]

Batched Stochastic Bandit for Nondegenerate Functions

Authors: Yu Liu, Yunlu Shu, Tianyu Wang

Abstract: This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{\mathcal{O}} ( A_{+}^d \sqrt{T} )$. In addition, GN only needs… ▽ More This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{\mathcal{O}} ( A_{+}^d \sqrt{T} )$. In addition, GN only needs $\mathcal{O} (\log \log T)$ batches to achieve this regret. We also provide lower bound analysis for this problem. More specifically, we prove that over some (compact) doubling metric space of doubling dimension $d$: 1. For any policy $π$, there exists a problem instance on which $π$ admits a regret of order $Ω ( A_-^d \sqrt{T})$; 2. No policy can achieve a regret of order $ A_-^d \sqrt{T} $ over all problem instances, using less than $ Ω( \log \log T ) $ rounds of communications. Our lower bound analysis shows that the GN algorithm achieves near optimal regret with minimal number of batches. △ Less

Submitted 29 August, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: 34 pages, 14 colored figures

arXiv:2405.03875 [pdf, other]

Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits

Authors: Jiachen T. Wang, Tianji Yang, James Zou, Yongchan Kwon, Ruoxi Jia

Abstract: Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research. Data selection is considered a standard application of Data Shapley. However, its data selection performance has shown to be inconsistent across settings in the literature. This study aims to deepen our understanding of this phenomenon. We introduce a hypothesis te… ▽ More Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research. Data selection is considered a standard application of Data Shapley. However, its data selection performance has shown to be inconsistent across settings in the literature. This study aims to deepen our understanding of this phenomenon. We introduce a hypothesis testing framework and show that Data Shapley's performance can be no better than random selection without specific constraints on utility functions. We identify a class of utility functions, monotonically transformed modular functions, within which Data Shapley optimally selects data. Based on this insight, we propose a heuristic for predicting Data Shapley's effectiveness in data selection tasks. Our experiments corroborate these findings, adding new insights into when Data Shapley may or may not succeed. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2404.13964 [pdf, other]

An Economic Solution to Copyright Challenges of Generative AI

Authors: Jiachen T. Wang, Zhun Deng, Hiroaki Chiba-Okabe, Boaz Barak, Weijie J. Su

Abstract: Generative artificial intelligence (AI) systems are trained on large data corpora to generate new pieces of text, images, videos, and other media. There is growing concern that such systems may infringe on the copyright interests of training data contributors. To address the copyright challenges of generative AI, we propose a framework that compensates copyright owners proportionally to their cont… ▽ More Generative artificial intelligence (AI) systems are trained on large data corpora to generate new pieces of text, images, videos, and other media. There is growing concern that such systems may infringe on the copyright interests of training data contributors. To address the copyright challenges of generative AI, we propose a framework that compensates copyright owners proportionally to their contributions to the creation of AI-generated content. The metric for contributions is quantitatively determined by leveraging the probabilistic nature of modern generative AI models and using techniques from cooperative game theory in economics. This framework enables a platform where AI developers benefit from access to high-quality training data, thus improving model performance. Meanwhile, copyright owners receive fair compensation, driving the continued provision of relevant data for generative model training. Experiments demonstrate that our framework successfully identifies the most relevant data sources used in artwork generation, ensuring a fair and interpretable distribution of revenues among copyright owners. △ Less

Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.01478 [pdf, other]

A Multidimensional Fractional Hawkes Process for Multiple Earthquake Mainshock Aftershock Sequences

Authors: Louis Davis, Boris Baeumer, Ting Wang

Abstract: Most point process models for earthquakes currently in the literature assume the magnitude distribution is i.i.d. potentially hindering the ability of the model to describe the main features of data sets containing multiple earthquake mainshock aftershock sequences in succession. This study presents a novel multidimensional fractional Hawkes process model designed to capture magnitude dependent tr… ▽ More Most point process models for earthquakes currently in the literature assume the magnitude distribution is i.i.d. potentially hindering the ability of the model to describe the main features of data sets containing multiple earthquake mainshock aftershock sequences in succession. This study presents a novel multidimensional fractional Hawkes process model designed to capture magnitude dependent triggering behaviour by incorporating history dependence into the magnitude distribution. This is done by discretising the magnitude range into disjoint intervals and modelling events with magnitude in these ranges as the subprocesses of a mutually exciting Hawkes process using the Mittag-Leffler density as the kernel function. We demonstrate this model's use by applying it to two data sets, Japan and the Middle America Trench, both containing multiple mainshock aftershock sequences and compare it to the existing ETAS model by using information criteria, residual diagnostics and retrospective prediction performance. We find that for both data sets all metrics indicate that the multidimensional fractional Hawkes process performs favourably against the ETAS model. Furthermore, using the multidimensional fractional Hawkes process we are able to infer characteristics of the data sets that are consistent with results currently in the literature and that cannot be found by using the ETAS model. △ Less

Submitted 1 April, 2024; originally announced April 2024.

Comments: 37 pages, 10 tables, 3 figures

arXiv:2403.08699 [pdf, ps, other]

Implicit Regularization of Gradient Flow on One-Layer Softmax Attention

Authors: Heejune Sheen, Siyu Chen, Tianhao Wang, Harrison H. Zhou

Abstract: We study gradient flow on the exponential loss for a classification problem with a one-layer softmax attention model, where the key and query weight matrices are trained separately. Under a separability assumption on the data, we show that when gradient flow achieves the minimal loss value, it further implicitly minimizes the nuclear norm of the product of the key and query weight matrices. Such i… ▽ More We study gradient flow on the exponential loss for a classification problem with a one-layer softmax attention model, where the key and query weight matrices are trained separately. Under a separability assumption on the data, we show that when gradient flow achieves the minimal loss value, it further implicitly minimizes the nuclear norm of the product of the key and query weight matrices. Such implicit regularization can be described by a Support Vector Machine (SVM) problem with respect to the attention weights. This finding contrasts with prior results showing that the gradient descent induces an implicit regularization on the Frobenius norm on the product weight matrix when the key and query matrices are combined into a single weight matrix for training. For diagonal key and query matrices, our analysis builds upon the reparameterization technique and exploits approximate KKT conditions of the SVM associated with the classification data. Moreover, the results are extended to general weights configurations given proper alignment of the weight matrices' singular spaces with the data features at initialization. △ Less

Submitted 13 March, 2024; originally announced March 2024.

Comments: 34 pages

arXiv:2403.06783 [pdf]

A doubly robust estimator for the Mann Whitney Wilcoxon Rank Sum Test when applied for causal inference in observational studies

Authors: Ruohui Chen, Tuo Lin, Lin Liu, Jinyuan Liu, Ruifeng Chen, Jingjing Zou, Chenyu Liu, Loki Natarajan, Tang Wang, Xinlian Zhang, Xin Tu

Abstract: The Mann-Whitney-Wilcoxon rank sum test (MWWRST) is a widely used method for comparing two treatment groups in randomized control trials, particularly when dealing with highly skewed data. However, when applied to observational study data, the MWWRST often yields invalid results for causal inference. To address this limitation, Wu et al. (2014) introduced an approach that incorporates inverse prob… ▽ More The Mann-Whitney-Wilcoxon rank sum test (MWWRST) is a widely used method for comparing two treatment groups in randomized control trials, particularly when dealing with highly skewed data. However, when applied to observational study data, the MWWRST often yields invalid results for causal inference. To address this limitation, Wu et al. (2014) introduced an approach that incorporates inverse probability weighting (IPW) into this rank-based statistics to mitigate confounding effects. Subsequently, Mao (2018), Zhang et al. (2019), and Ai et al. (2020) extended this IPW estimator to develop doubly robust estimators. Nevertheless, each of these approaches has notable limitations. Mao's method imposes stringent assumptions that may not align with real-world study data. Zhang et al.'s (2019) estimators rely on bootstrap inference, which suffers from computational inefficiency and lacks known asymptotic properties. Meanwhile, Ai et al. (2020) primarily focus on testing the null hypothesis of equal distributions between two groups, which is a more stringent assumption that may not be well-suited to the primary practical application of MWWRST. In this paper, we aim to address these limitations by leveraging functional response models (FRM) to develop doubly robust estimators. We demonstrate the performance of our proposed approach using both simulated and real study data. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.03183 [pdf, other]

How Well Can Transformers Emulate In-context Newton's Method?

Authors: Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris Papailiopoulos, Jason D. Lee

Abstract: Transformer-based models have demonstrated remarkable in-context learning capabilities, prompting extensive research into its underlying mechanisms. Recent studies have suggested that Transformers can implement first-order optimization algorithms for in-context learning and even second order ones for the case of linear regression. In this work, we study whether Transformers can perform higher orde… ▽ More Transformer-based models have demonstrated remarkable in-context learning capabilities, prompting extensive research into its underlying mechanisms. Recent studies have suggested that Transformers can implement first-order optimization algorithms for in-context learning and even second order ones for the case of linear regression. In this work, we study whether Transformers can perform higher order optimization methods, beyond the case of linear regression. We establish that linear attention Transformers with ReLU layers can approximate second order optimization algorithms for the task of logistic regression and achieve $ε$ error with only a logarithmic to the error more layers. As a by-product we demonstrate the ability of even linear attention-only Transformers in implementing a single step of Newton's iteration for matrix inversion with merely two layers. These results suggest the ability of the Transformer architecture to implement complex algorithms, beyond gradient descent. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.00142 [pdf, other]

A Fractional Model for Earthquakes

Authors: Louis Davis, Boris Baeumer, Ting Wang

Abstract: This paper extends the existing fractional Hawkes process to better model mainshock-aftershock sequences of earthquakes. The fractional Hawkes process is a self-exciting point process model with temporal decay kernel being a Mittag-Leffler function. A maximum likelihood estimation scheme is developed and its consistency is checked. It is then compared to the ETAS model on three earthquake sequence… ▽ More This paper extends the existing fractional Hawkes process to better model mainshock-aftershock sequences of earthquakes. The fractional Hawkes process is a self-exciting point process model with temporal decay kernel being a Mittag-Leffler function. A maximum likelihood estimation scheme is developed and its consistency is checked. It is then compared to the ETAS model on three earthquake sequences in Southern California. The fractional Hawkes process performs favourably against the ETAS model. Additionally, two parameters in the fractional Hawkes process may have a fixed geophysical meaning dependent on the study zone and the stage of the seismic cycle the zone is in. △ Less

Submitted 29 February, 2024; originally announced March 2024.

Comments: 16 pages, 7 figure, submitted to the Journal of the Royal Statistical Society Series C

arXiv:2402.19442 [pdf, other]

Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality

Authors: Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang

Abstract: We study the dynamics of gradient flow for training a multi-head softmax attention model for in-context learning of multi-task linear regression. We establish the global convergence of gradient flow under suitable choices of initialization. In addition, we prove that an interesting "task allocation" phenomenon emerges during the gradient flow dynamics, where each attention head focuses on solving… ▽ More We study the dynamics of gradient flow for training a multi-head softmax attention model for in-context learning of multi-task linear regression. We establish the global convergence of gradient flow under suitable choices of initialization. In addition, we prove that an interesting "task allocation" phenomenon emerges during the gradient flow dynamics, where each attention head focuses on solving a single task of the multi-task model. Specifically, we prove that the gradient flow dynamics can be split into three phases -- a warm-up phase where the loss decreases rather slowly and the attention heads gradually build up their inclination towards individual tasks, an emergence phase where each head selects a single task and the loss rapidly decreases, and a convergence phase where the attention parameters converge to a limit. Furthermore, we prove the optimality of gradient flow in the sense that the limiting model learned by gradient flow is on par with the best possible multi-head softmax attention model up to a constant factor. Our analysis also delineates a strict separation in terms of the prediction accuracy of ICL between single-head and multi-head attention models. The key technique for our convergence analysis is to map the gradient flow dynamics in the parameter space to a set of ordinary differential equations in the spectral domain, where the relative magnitudes of the semi-singular values of the attention weights determines task allocation. To our best knowledge, our work provides the first convergence result for the multi-head softmax attention model. △ Less

Submitted 10 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: 141 pages, 7 figures

arXiv:2402.16661 [pdf, other]

Penalized Generative Variable Selection

Authors: Tong Wang, Jian Huang, Shuangge Ma

Abstract: Deep networks are increasingly applied to a wide variety of data, including data with high-dimensional predictors. In such analysis, variable selection can be needed along with estimation/model building. Many of the existing deep network studies that incorporate variable selection have been limited to methodological and numerical developments. In this study, we consider modeling/estimation using t… ▽ More Deep networks are increasingly applied to a wide variety of data, including data with high-dimensional predictors. In such analysis, variable selection can be needed along with estimation/model building. Many of the existing deep network studies that incorporate variable selection have been limited to methodological and numerical developments. In this study, we consider modeling/estimation using the conditional Wasserstein Generative Adversarial networks. Group Lasso penalization is applied for variable selection, which may improve model estimation/prediction, interpretability, stability, etc. Significantly advancing from the existing literature, the analysis of censored survival data is also considered. We establish the convergence rate for variable selection while considering the approximation error, and obtain a more efficient distribution estimation. Simulations and the analysis of real experimental data demonstrate satisfactory practical utility of the proposed analysis. △ Less

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.15620 [pdf, other]

Comparison of sectoral structures between China and Japan: A network perspective

Authors: Tao Wang, Shiying Xiao, Jun Yan

Abstract: Economic structure comparisons between China and Japan have long captivated development economists. To delve deeper into their sectoral differences from 1995 to 2018, we used the annual input-output tables (IOTs) of both nations to construct weighted and directed input-output networks (IONs). This facilitated deeper network analyses. Strength distributions underscored variations in inter-sector ec… ▽ More Economic structure comparisons between China and Japan have long captivated development economists. To delve deeper into their sectoral differences from 1995 to 2018, we used the annual input-output tables (IOTs) of both nations to construct weighted and directed input-output networks (IONs). This facilitated deeper network analyses. Strength distributions underscored variations in inter-sector economic interactions. Weighted, directed assortativity coefficients encapsulated the homophily among connecting sectors' features. By adjusting emphasis in PageRank centrality, key sectors were identified. Community detection revealed their clustering tendencies among the sectors. As anticipated, the analysis pinpointed manufacturing as China's central sector, while Japan favored services. Yet, at a finer level of the specific sectors, both nations exhibited varied structural evolutions. Contrastingly, sectoral communities in both China and Japan demonstrated commendable stability over the examined duration. △ Less

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.14090 [pdf, other]

Social Environment Design

Authors: Edwin Zhang, Sadie Zhao, Tonghan Wang, Safwan Hossain, Henry Gasztowtt, Stephan Zheng, David C. Parkes, Milind Tambe, Yiling Chen

Abstract: Artificial Intelligence (AI) holds promise as a technology that can be used to improve government and economic policy-making. This paper proposes a new research agenda towards this end by introducing Social Environment Design, a general framework for the use of AI for automated policy-making that connects with the Reinforcement Learning, EconCS, and Computational Social Choice communities. The fra… ▽ More Artificial Intelligence (AI) holds promise as a technology that can be used to improve government and economic policy-making. This paper proposes a new research agenda towards this end by introducing Social Environment Design, a general framework for the use of AI for automated policy-making that connects with the Reinforcement Learning, EconCS, and Computational Social Choice communities. The framework seeks to capture general economic environments, includes voting on policy objectives, and gives a direction for the systematic analysis of government and economic policy through AI simulation. We highlight key open problems for future research in AI-based policy-making. By solving these challenges, we hope to achieve various social welfare objectives, thereby promoting more ethical and responsible decision making. △ Less

Submitted 17 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: ICML 2024 Position Paper. Website at https://sed.eddie.win

arXiv:2402.13259 [pdf, other]

Fast Discrete-Event Simulation of Markovian Queueing Networks through Euler Approximation

Authors: L. Jeff Hong, Yingda Song, Tan Wang

Abstract: The efficient management of large-scale queueing networks is critical for a variety of sectors, including healthcare, logistics, and customer service, where system performance has profound implications for operational effectiveness and cost management. To address this key challenge, our paper introduces simulation techniques tailored for complex, large-scale Markovian queueing networks. We develop… ▽ More The efficient management of large-scale queueing networks is critical for a variety of sectors, including healthcare, logistics, and customer service, where system performance has profound implications for operational effectiveness and cost management. To address this key challenge, our paper introduces simulation techniques tailored for complex, large-scale Markovian queueing networks. We develop two simulation schemes based on Euler approximation, namely the backward and forward schemes. These schemes can accommodate time-varying dynamics and are optimized for efficient implementation using vectorization. Assuming a feedforward queueing network structure, we establish that the two schemes provide stochastic upper and lower bounds for the system state, while the approximation error remains bounded over the simulation horizon. With the recommended choice of time step, we show that our approximation schemes exhibit diminishing asymptotic relative error as the system scales up, while maintaining much lower computational complexity compared to traditional discrete-event simulation and achieving speedups up to tens of thousands times. This study highlights the substantial potential of Euler approximation in simulating large-scale discrete systems. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2402.09702 [pdf, other]

Sparse and Faithful Explanations Without Sparse Models

Authors: Yiyang Sun, Zhi Chen, Vittorio Orlandi, Tong Wang, Cynthia Rudin

Abstract: Even if a model is not globally sparse, it is possible for decisions made from that model to be accurately and faithfully described by a small number of features. For instance, an application for a large loan might be denied to someone because they have no credit history, which overwhelms any evidence towards their creditworthiness. In this work, we introduce the Sparse Explanation Value (SEV), a… ▽ More Even if a model is not globally sparse, it is possible for decisions made from that model to be accurately and faithfully described by a small number of features. For instance, an application for a large loan might be denied to someone because they have no credit history, which overwhelms any evidence towards their creditworthiness. In this work, we introduce the Sparse Explanation Value (SEV), a new way of measuring sparsity in machine learning models. In the loan denial example above, the SEV is 1 because only one factor is needed to explain why the loan was denied. SEV is a measure of decision sparsity rather than overall model sparsity, and we are able to show that many machine learning models -- even if they are not sparse -- actually have low decision sparsity, as measured by SEV. SEV is defined using movements over a hypercube, allowing SEV to be defined consistently over various model classes, with movement restrictions reflecting real-world constraints. We proposed the algorithms that reduce SEV without sacrificing accuracy, providing sparse and completely faithful explanations, even without globally sparse models. △ Less

Submitted 8 March, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: Accepted in AISTATS 2024

arXiv:2402.07806 [pdf, other]

A comparison of mixed-models for the analysis of non-linear longitudinal data: application to late-life cognitive trajectories

Authors: Maude Wagner, Donald R. Hedeker, Tianhao Wang, Graciela Muniz-Terrera, Ana W. Capuano

Abstract: Several mixed-effects models for longitudinal data have been proposed to accommodate the non-linearity of late-life cognitive trajectories and assess the putative influence of covariates on it. No prior research provides a side-by-side examination of these models to offer guidance on their proper application and interpretation. In this work, we examined five statistical approaches previously used… ▽ More Several mixed-effects models for longitudinal data have been proposed to accommodate the non-linearity of late-life cognitive trajectories and assess the putative influence of covariates on it. No prior research provides a side-by-side examination of these models to offer guidance on their proper application and interpretation. In this work, we examined five statistical approaches previously used to answer research questions related to non-linear changes in cognitive aging: the linear mixed model (LMM) with a quadratic term, LMM with splines, the functional mixed model, the piecewise linear mixed model, and the sigmoidal mixed model. We first theoretically describe the models. Next, using data from two prospective cohorts with annual cognitive testing, we compared the interpretation of the models by investigating associations of education on cognitive change before death. Lastly, we performed a simulation study to empirically evaluate the models and provide practical recommendations. Except for the LMM-quadratic, the fit of all models was generally adequate to capture non-linearity of cognitive change and models were relatively robust. Although spline-based models have no interpretable nonlinearity parameters, their convergence was easier to achieve, and they allow graphical interpretation. In contrast, piecewise and sigmoidal models, with interpretable non-linear parameters, may require more data to achieve convergence. △ Less

Submitted 6 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: 34 pages, 7 Figures, 1 Table

arXiv:2401.13094 [pdf, other]

On cross-validated estimation of skew normal model

Authors: Jian Zhang, Tong Wang

Abstract: Skew normal model suffers from inferential drawbacks, namely singular Fisher information in the vicinity of symmetry and diverging of maximum likelihood estimation. To address the above drawbacks, Azzalini and Arellano-Valle (2013) introduced maximum penalised likelihood estimation (MPLE) by subtracting a penalty function from the log-likelihood function with a pre-specified penalty coefficient. H… ▽ More Skew normal model suffers from inferential drawbacks, namely singular Fisher information in the vicinity of symmetry and diverging of maximum likelihood estimation. To address the above drawbacks, Azzalini and Arellano-Valle (2013) introduced maximum penalised likelihood estimation (MPLE) by subtracting a penalty function from the log-likelihood function with a pre-specified penalty coefficient. Here, we propose a cross-validated MPLE to improve its performance when the underlying model is close to symmetry. We develop a theory for MPLE, where an asymptotic rate for the cross-validated penalty coefficient is derived. We further show that the proposed cross-validated MPLE is asymptotically efficient under certain conditions. In simulation studies and a real data application, we demonstrate that the proposed estimator can outperform the conventional MPLE when the model is close to symmetry. △ Less

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.11103 [pdf, other]

Efficient Data Shapley for Weighted Nearest Neighbor Algorithms

Authors: Jiachen T. Wang, Prateek Mittal, Ruoxi Jia

Abstract: This work aims to address an open problem in data valuation literature concerning the efficient computation of Data Shapley for weighted $K$ nearest neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label KNN with discretized weights as the utility function, we reframe the computation of WKNN-Shapley into a counting problem and introduce a quadratic-time algorithm, presenting… ▽ More This work aims to address an open problem in data valuation literature concerning the efficient computation of Data Shapley for weighted $K$ nearest neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label KNN with discretized weights as the utility function, we reframe the computation of WKNN-Shapley into a counting problem and introduce a quadratic-time algorithm, presenting a notable improvement from $O(N^K)$, the best result from existing literature. We develop a deterministic approximation algorithm that further improves computational efficiency while maintaining the key fairness properties of the Shapley value. Through extensive experiments, we demonstrate WKNN-Shapley's computational efficiency and its superior performance in discerning data quality compared to its unweighted counterpart. △ Less

Submitted 19 January, 2024; originally announced January 2024.

Comments: AISTATS 2024 Oral

arXiv:2312.16260 [pdf, other]

Multinomial Link Models

Authors: Tianmeng Wang, Liping Tong, Jie Yang

Abstract: We propose a unified multinomial link model for analyzing categorical responses. It not only covers the existing multinomial logistic models and their extensions as special cases, but also includes new models that can incorporate the observations with NA or Unknown responses in the data analysis. We provide explicit formulae and detailed algorithms for finding the maximum likelihood estimates of t… ▽ More We propose a unified multinomial link model for analyzing categorical responses. It not only covers the existing multinomial logistic models and their extensions as special cases, but also includes new models that can incorporate the observations with NA or Unknown responses in the data analysis. We provide explicit formulae and detailed algorithms for finding the maximum likelihood estimates of the model parameters and computing the Fisher information matrix. Our algorithms solve the infeasibility issue of existing statistical software on estimating parameters of cumulative link models. The applications to real datasets show that the new models can fit the data significantly better, and the corresponding data analysis may correct the misleading conclusions due to missing responses. △ Less

Submitted 18 June, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

Comments: 39 pages, 5 figures

arXiv:2312.06204 [pdf, ps, other]

Multilayer Network Regression with Eigenvector Centrality and Community Structure

Authors: Zhuoye Han, Tiandong Wang

Abstract: In the analysis of complex networks, centrality measures and community structures are two important aspects. For multilayer networks, one crucial task is to integrate information across different layers, especially taking the dependence structure within and between layers into consideration. In this study, we introduce a novel two-stage regression model (CC-MNetR) that leverages the eigenvector ce… ▽ More In the analysis of complex networks, centrality measures and community structures are two important aspects. For multilayer networks, one crucial task is to integrate information across different layers, especially taking the dependence structure within and between layers into consideration. In this study, we introduce a novel two-stage regression model (CC-MNetR) that leverages the eigenvector centrality and network community structure of fourth-order tensor-like multilayer networks. In particular, we construct community-based centrality measures, which are then incorporated into the regression model. In addition, considering the noise of network data, we analyze the centrality measure with and without measurement errors respectively, and establish the consistent properties of the least squares estimates in the regression. Our proposed method is then applied to the World Input-Output Database (WIOD) dataset to explore how input-output network data between different countries and different industries affect the Gross Output of each industry. △ Less

Submitted 11 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.03561 [pdf]

Blueprinting the Future: Automatic Item Categorization using Hierarchical Zero-Shot and Few-Shot Classifiers

Authors: Ting Wang, Keith Stelter, Jenn Floyd, Thomas O'Neill, Nathaniel Hendrix, Andrew Bazemore, Kevin Rode, Warren Newton

Abstract: In testing industry, precise item categorization is pivotal to align exam questions with the designated content domains outlined in the assessment blueprint. Traditional methods either entail manual classification, which is laborious and error-prone, or utilize machine learning requiring extensive training data, often leading to model underfit or overfit issues. This study unveils a novel approach… ▽ More In testing industry, precise item categorization is pivotal to align exam questions with the designated content domains outlined in the assessment blueprint. Traditional methods either entail manual classification, which is laborious and error-prone, or utilize machine learning requiring extensive training data, often leading to model underfit or overfit issues. This study unveils a novel approach employing the zero-shot and few-shot Generative Pretrained Transformer (GPT) classifier for hierarchical item categorization, minimizing the necessity for training data, and instead, leveraging human-like language descriptions to define categories. Through a structured python dictionary, the hierarchical nature of examination blueprints is navigated seamlessly, allowing for a tiered classification of items across multiple levels. An initial simulation with artificial data demonstrates the efficacy of this method, achieving an average accuracy of 92.91% measured by the F1 score. This method was further applied to real exam items from the 2022 In-Training Examination (ITE) conducted by the American Board of Family Medicine (ABFM), reclassifying 200 items according to a newly formulated blueprint swiftly in 15 minutes, a task that traditionally could span several days among editors and physicians. This innovative approach not only drastically cuts down classification time but also ensures a consistent, principle-driven categorization, minimizing human biases and discrepancies. The ability to refine classifications by adjusting definitions adds to its robustness and sustainability. △ Less

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2311.02618 [pdf, other]

Regionalization of China's PM2.5 through Robust Spatio temporal Functional Clustering Method

Authors: Tingyin Wang, Xueqin Wang, Xiaobo Guo, Heping Zhang

Abstract: The patterns of particulate matter with diameters that are generally 2.5 micrometers and smaller (PM2.5) are heterogeneous in China nationwide but can be homogeneous region-wide. To reduce the adverse effects from PM2.5, policymakers need to develop location-specific regulations based on nationwide clustering analysis of PM2.5 concentrations. However, such an analysis is challenging because the da… ▽ More The patterns of particulate matter with diameters that are generally 2.5 micrometers and smaller (PM2.5) are heterogeneous in China nationwide but can be homogeneous region-wide. To reduce the adverse effects from PM2.5, policymakers need to develop location-specific regulations based on nationwide clustering analysis of PM2.5 concentrations. However, such an analysis is challenging because the data have complex structures and are usually noisy. In this study, we propose a robust clustering framework using a novel concept of depth, which can handle both magnitude and shape outliers effectively and incorporate spatial information. We apply this framework to a PM2.5 dataset and reveal ten regions in China that have distinct PM2.5 patterns, and the homogeneity within each cluster is also confirmed. The clusters have clearly visible boundaries and enable policymakers to develop local emission control policies and establish regional collaborative systems to control air pollution in China. △ Less

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2310.09999 [pdf, other]

Outlier Detection Using Generative Models with Theoretical Performance Guarantees

Authors: Jirong Yi, Jingchao Gao, Tianming Wang, Xiaodong Wu, Weiyu Xu

Abstract: This paper considers the problem of recovering signals modeled by generative models from linear measurements contaminated with sparse outliers. We propose an outlier detection approach for reconstructing the ground-truth signals modeled by generative models under sparse outliers. We establish theoretical recovery guarantees for reconstruction of signals using generative models in the presence of o… ▽ More This paper considers the problem of recovering signals modeled by generative models from linear measurements contaminated with sparse outliers. We propose an outlier detection approach for reconstructing the ground-truth signals modeled by generative models under sparse outliers. We establish theoretical recovery guarantees for reconstruction of signals using generative models in the presence of outliers, giving lower bounds on the number of correctable outliers. Our results are applicable to both linear generator neural networks and the nonlinear generator neural networks with an arbitrary number of layers. We propose an iterative alternating direction method of multipliers (ADMM) algorithm for solving the outlier detection problem via $\ell_1$ norm minimization, and a gradient descent algorithm for solving the outlier detection problem via squared $\ell_1$ norm minimization. We conduct extensive experiments using variational auto-encoder and deep convolutional generative adversarial networks, and the experimental results show that the signals can be successfully reconstructed under outliers using our approach. Our approach outperforms the traditional Lasso and $\ell_2$ minimization approach. △ Less

Submitted 15 October, 2023; originally announced October 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:1810.11335

arXiv:2310.06715 [pdf, other]

S4Sleep: Elucidating the design space of deep-learning-based sleep stage classification models

Authors: Tiezhi Wang, Nils Strodthoff

Abstract: Scoring sleep stages in polysomnography recordings is a time-consuming task plagued by significant inter-rater variability. Therefore, it stands to benefit from the application of machine learning algorithms. While many algorithms have been proposed for this purpose, certain critical architectural decisions have not received systematic exploration. In this study, we meticulously investigate these… ▽ More Scoring sleep stages in polysomnography recordings is a time-consuming task plagued by significant inter-rater variability. Therefore, it stands to benefit from the application of machine learning algorithms. While many algorithms have been proposed for this purpose, certain critical architectural decisions have not received systematic exploration. In this study, we meticulously investigate these design choices within the broad category of encoder-predictor architectures. We identify robust architectures applicable to both time series and spectrogram input representations. These architectures incorporate structured state space models as integral components and achieve statistically significant performance improvements compared to state-of-the-art approaches on the extensive Sleep Heart Health Study dataset. We anticipate that the architectural insights gained from this study along with the refined methodology for architecture search demonstrated herein will not only prove valuable for future research in sleep staging but also hold relevance for other time series annotation tasks. △ Less

Submitted 21 August, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: 33 pages, 3 figures, code available at https://github.com/AI4HealthUOL/s4sleep

arXiv:2309.16578 [pdf, other]

doi 10.1038/s43588-024-00605-8

Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning

Authors: He Zhang, Siyuan Liu, Jiacheng You, Chang Liu, Shuxin Zheng, Ziheng Lu, Tong Wang, Nanning Zheng, Bin Shao

Abstract: Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. Here we propose M-OFDFT, an OFDFT… ▽ More Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. Here we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep learning functional model. We build the essential non-locality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves a comparable accuracy with Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those seen in training, which unleashes the appealing scaling of OFDFT for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry. △ Less

Submitted 9 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: Published in Nature Computational Science, March 2024. Full paper with supplementary information

arXiv:2308.15709 [pdf, other]

Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation

Authors: Jiachen T. Wang, Yuqing Zhu, Yu-Xiang Wang, Ruoxi Jia, Prateek Mittal

Abstract: Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models, and is a critical aspect of data-centric ML research. However, data valuation faces significant yet frequently overlooked privacy challenges despite its importance. This paper studies these challenges with a focus on KNN-Shapley, one of the most practical data valuation methods nowad… ▽ More Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models, and is a critical aspect of data-centric ML research. However, data valuation faces significant yet frequently overlooked privacy challenges despite its importance. This paper studies these challenges with a focus on KNN-Shapley, one of the most practical data valuation methods nowadays. We first emphasize the inherent privacy risks of KNN-Shapley, and demonstrate the significant technical difficulties in adapting KNN-Shapley to accommodate differential privacy (DP). To overcome these challenges, we introduce TKNN-Shapley, a refined variant of KNN-Shapley that is privacy-friendly, allowing for straightforward modifications to incorporate DP guarantee (DP-TKNN-Shapley). We show that DP-TKNN-Shapley has several advantages and offers a superior privacy-utility tradeoff compared to naively privatized KNN-Shapley in discerning data quality. Moreover, even non-private TKNN-Shapley achieves comparable performance as KNN-Shapley. Overall, our findings suggest that TKNN-Shapley is a promising alternative to KNN-Shapley, particularly for real-world applications involving sensitive data. △ Less

Submitted 25 November, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: NeurIPS 2023 Spotlight

arXiv:2308.13298 [pdf, other]

Federated Linear Bandit Learning via Over-the-Air Computation

Authors: Jiali Wang, Yuning Jiang, Xin Liu, Ting Wang, Yuanming Shi

Abstract: In this paper, we investigate federated contextual linear bandit learning within a wireless system that comprises a server and multiple devices. Each device interacts with the environment, selects an action based on the received reward, and sends model updates to the server. The primary objective is to minimize cumulative regret across all devices within a finite time horizon. To reduce the commun… ▽ More In this paper, we investigate federated contextual linear bandit learning within a wireless system that comprises a server and multiple devices. Each device interacts with the environment, selects an action based on the received reward, and sends model updates to the server. The primary objective is to minimize cumulative regret across all devices within a finite time horizon. To reduce the communication overhead, devices communicate with the server via over-the-air computation (AirComp) over noisy fading channels, where the channel noise may distort the signals. In this context, we propose a customized federated linear bandits scheme, where each device transmits an analog signal, and the server receives a superposition of these signals distorted by channel noise. A rigorous mathematical analysis is conducted to determine the regret bound of the proposed scheme. Both theoretical analysis and numerical experiments demonstrate the competitive performance of our proposed scheme in terms of regret bounds in various settings. △ Less

Submitted 28 August, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.10113 [pdf, other]

Modeling Random Networks with Heterogeneous Reciprocity

Authors: Daniel Cirkovic, Tiandong Wang

Abstract: Reciprocity, or the tendency of individuals to mirror behavior, is a key measure that describes information exchange in a social network. Users in social networks tend to engage in different levels of reciprocal behavior. Differences in such behavior may indicate the existence of communities that reciprocate links at varying rates. In this paper, we develop methodology to model the diverse recipro… ▽ More Reciprocity, or the tendency of individuals to mirror behavior, is a key measure that describes information exchange in a social network. Users in social networks tend to engage in different levels of reciprocal behavior. Differences in such behavior may indicate the existence of communities that reciprocate links at varying rates. In this paper, we develop methodology to model the diverse reciprocal behavior in growing social networks. In particular, we present a preferential attachment model with heterogeneous reciprocity that imitates the attraction users have for popular users, plus the heterogeneous nature by which they reciprocate links. We compare Bayesian and frequentist model fitting techniques for large networks, as well as computationally efficient variational alternatives. Cases where the number of communities are known and unknown are both considered. We apply the presented methods to the analysis of a Facebook wallpost network where users have non-uniform reciprocal behavior patterns. The fitted model captures the heavy-tailed nature of the empirical degree distributions in the Facebook data and identifies multiple groups of users that differ in their tendency to reply to and receive responses to wallposts. △ Less

Submitted 19 August, 2023; originally announced August 2023.

arXiv:2306.15163 [pdf, other]

Wasserstein Generative Regression

Authors: Shanshan Song, Tong Wang, Guohao Shen, Yuanyuan Lin, Jian Huang

Abstract: In this paper, we propose a new and unified approach for nonparametric regression and conditional distribution learning. Our approach simultaneously estimates a regression function and a conditional generator using a generative learning framework, where a conditional generator is a function that can generate samples from a conditional distribution. The main idea is to estimate a conditional genera… ▽ More In this paper, we propose a new and unified approach for nonparametric regression and conditional distribution learning. Our approach simultaneously estimates a regression function and a conditional generator using a generative learning framework, where a conditional generator is a function that can generate samples from a conditional distribution. The main idea is to estimate a conditional generator that satisfies the constraint that it produces a good regression function estimator. We use deep neural networks to model the conditional generator. Our approach can handle problems with multivariate outcomes and covariates, and can be used to construct prediction intervals. We provide theoretical guarantees by deriving non-asymptotic error bounds and the distributional consistency of our approach under suitable assumptions. We also perform numerical experiments with simulated and real data to demonstrate the effectiveness and superiority of our approach over some existing approaches in various scenarios. △ Less

Submitted 26 June, 2023; originally announced June 2023.

Comments: 50 pages, including appendix. 5 figures and 6 tables in the main text. 1 figure and 7 tables in the appendix

MSC Class: 62G08; 68T07

arXiv:2306.10475 [pdf, other]

SpreadDetect: Detection of spreading change in a network over time

Authors: Hanqing Cai, Tengyao Wang

Abstract: Change-point analysis has been successfully applied to the detect changes in multivariate data streams over time. In many applications, when data are observed over a graph/network, change does not occur simultaneously but instead spread from an initial source coordinate to the neighbouring coordinates over time. We propose a new method, SpreadDetect, that estimates both the source coordinate and t… ▽ More Change-point analysis has been successfully applied to the detect changes in multivariate data streams over time. In many applications, when data are observed over a graph/network, change does not occur simultaneously but instead spread from an initial source coordinate to the neighbouring coordinates over time. We propose a new method, SpreadDetect, that estimates both the source coordinate and the initial timepoint of change in such a setting. We prove that under appropriate conditions, the SpreadDetect algorithm consistently estimates both the source coordinate and the timepoint of change and that the minimal signal size detectable by the algorithm is minimax optimal. The practical utility of the algorithm is demonstrated through numerical experiments and a COVID-19 real dataset. △ Less

Submitted 18 June, 2023; originally announced June 2023.

Comments: 26 pages,3 figures, 2 tables

arXiv:2306.08280 [pdf, other]

Differentially Private Wireless Federated Learning Using Orthogonal Sequences

Authors: Xizixiang Wei, Tianhao Wang, Ruiquan Huang, Cong Shen, Jing Yang, H. Vincent Poor

Abstract: We propose a privacy-preserving uplink over-the-air computation (AirComp) method, termed FLORAS, for single-input single-output (SISO) wireless federated learning (FL) systems. From the perspective of communication designs, FLORAS eliminates the requirement of channel state information at the transmitters (CSIT) by leveraging the properties of orthogonal sequences. From the privacy perspective, we… ▽ More We propose a privacy-preserving uplink over-the-air computation (AirComp) method, termed FLORAS, for single-input single-output (SISO) wireless federated learning (FL) systems. From the perspective of communication designs, FLORAS eliminates the requirement of channel state information at the transmitters (CSIT) by leveraging the properties of orthogonal sequences. From the privacy perspective, we prove that FLORAS offers both item-level and client-level differential privacy (DP) guarantees. Moreover, by properly adjusting the system parameters, FLORAS can flexibly achieve different DP levels at no additional cost. A new FL convergence bound is derived which, combined with the privacy guarantees, allows for a smooth tradeoff between the achieved convergence rate and differential privacy levels. Experimental results demonstrate the advantages of FLORAS compared with the baseline AirComp method, and validate that the analytical results can guide the design of privacy-preserving FL with different tradeoff requirements on the model convergence and privacy levels. △ Less

Submitted 21 November, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: 33 pages, 5 figures

arXiv:2305.18987 [pdf, other]

Robust mean change point testing in high-dimensional data with heavy tails

Authors: Mengchu Li, Yudong Chen, Tengyao Wang, Yi Yu

Abstract: We study a mean change point testing problem for high-dimensional data, with exponentially- or polynomially-decaying tails. In each case, depending on the $\ell_0$-norm of the mean change vector, we separately consider dense and sparse regimes. We characterise the boundary between the dense and sparse regimes under the above two tail conditions for the first time in the change point literature and… ▽ More We study a mean change point testing problem for high-dimensional data, with exponentially- or polynomially-decaying tails. In each case, depending on the $\ell_0$-norm of the mean change vector, we separately consider dense and sparse regimes. We characterise the boundary between the dense and sparse regimes under the above two tail conditions for the first time in the change point literature and propose novel testing procedures that attain optimal rates in each of the four regimes up to a poly-iterated logarithmic factor. By comparing with previous results under Gaussian assumptions, our results quantify the costs of heavy-tailedness on the fundamental difficulty of change point testing problems for high-dimensional data. To be specific, when the error vectors follow sub-Weibull distributions, a CUSUM-type statistic is shown to achieve a minimax testing rate up to $\sqrt{\log\log(8n)}$. When the error distributions have polynomially-decaying tails, admitting bounded $α$-th moments for some $α\geq 4$, we introduce a median-of-means-type test statistic that achieves a near-optimal testing rate in both dense and sparse regimes. In particular, in the sparse regime, we further propose a computationally-efficient test to achieve the exact optimality. Surprisingly, our investigation in the even more challenging case of $2 \leq α< 4$, unveils a new phenomenon that the minimax testing rate has no sparse regime, i.e.\ testing sparse changes is information-theoretically as hard as testing dense changes. This phenomenon implies a phase transition of the minimax testing rates at $α= 4$. △ Less

Submitted 17 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: 51 pages, 1 figure

arXiv:2305.17284 [pdf, other]

GC-Flow: A Graph-Based Flow Network for Effective Clustering

Authors: Tianchun Wang, Farzaneh Mirzazadeh, Xiang Zhang, Jie Chen

Abstract: Graph convolutional networks (GCNs) are \emph{discriminative models} that directly model the class posterior $p(y|\mathbf{x})$ for semi-supervised classification of graph data. While being effective, as a representation learning approach, the node representations extracted from a GCN often miss useful information for effective clustering, because the objectives are different. In this work, we desi… ▽ More Graph convolutional networks (GCNs) are \emph{discriminative models} that directly model the class posterior $p(y|\mathbf{x})$ for semi-supervised classification of graph data. While being effective, as a representation learning approach, the node representations extracted from a GCN often miss useful information for effective clustering, because the objectives are different. In this work, we design normalizing flows that replace GCN layers, leading to a \emph{generative model} that models both the class conditional likelihood $p(\mathbf{x}|y)$ and the class prior $p(y)$. The resulting neural network, GC-Flow, retains the graph convolution operations while being equipped with a Gaussian mixture representation space. It enjoys two benefits: it not only maintains the predictive power of GCN, but also produces well-separated clusters, due to the structuring of the representation space. We demonstrate these benefits on a variety of benchmark data sets. Moreover, we show that additional parameterization, such as that on the adjacency matrix used for graph convolutions, yields additional improvement in clustering. △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: ICML 2023. Code is available at https://github.com/xztcwang/GCFlow

arXiv:2305.11509 [pdf, other]

From Random Search to Bandit Learning in Metric Measure Spaces

Authors: Chuying Han, Yasong Feng, Tianyu Wang

Abstract: Random Search is one of the most widely-used method for Hyperparameter Optimization, and is critical to the success of deep learning models. Despite its astonishing performance, little non-heuristic theory has been developed to describe the underlying working mechanism. This paper gives a theoretical accounting of Random Search. We introduce the concept of \emph{scattering dimension} that describe… ▽ More Random Search is one of the most widely-used method for Hyperparameter Optimization, and is critical to the success of deep learning models. Despite its astonishing performance, little non-heuristic theory has been developed to describe the underlying working mechanism. This paper gives a theoretical accounting of Random Search. We introduce the concept of \emph{scattering dimension} that describes the landscape of the underlying function, and quantifies the performance of random search. We show that, when the environment is noise-free, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s} } \right) $, where $ d_s \ge 0 $ is the scattering dimension of the underlying function. When the observed function values are corrupted by bounded $iid$ noise, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s + 1} } \right) $. In addition, based on the principles of random search, we introduce an algorithm, called BLiN-MOS, for Lipschitz bandits in doubling metric spaces that are also endowed with a probability measure, and show that under mild conditions, BLiN-MOS achieves a regret rate of order $ \widetilde{\mathcal{O}} \left( T^{ \frac{d_z}{d_z + 1} } \right) $, where $d_z$ is the zooming dimension of the problem instance. △ Less

Submitted 12 February, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.06465 [pdf, other]

Occam Factor for Random Graphs: Erdös-Rényi, Independent Edge, and Rank-1 Stochastic Blockmodel

Authors: Tianyu Wang, Zachary M. Pisano, Carey E. Priebe

Abstract: We investigate the evidence/flexibility (i.e., "Occam") paradigm and demonstrate the theoretical and empirical consistency of Bayesian evidence for the task of determining an appropriate generative model for network data. This model selection framework involves determining a collection of candidate models, equipping each of these models' parameters with prior distributions derived via the encompas… ▽ More We investigate the evidence/flexibility (i.e., "Occam") paradigm and demonstrate the theoretical and empirical consistency of Bayesian evidence for the task of determining an appropriate generative model for network data. This model selection framework involves determining a collection of candidate models, equipping each of these models' parameters with prior distributions derived via the encompassing priors method, and computing or approximating each models' evidence. We demonstrate how such a criterion may be used to select the most suitable model among the Erdös-Rényi (ER) model, independent edge (IE) model, and rank-1 stochastic blockmodel (SBM). The Erdös-Rényi may be considered as being linearly nested within IE, a fact which permits exponential family results. The rank-1 SBM is not so ideal, so we propose a numerical method to approximate its evidence. We apply this paradigm to brain connectome data. Future work necessitates deriving and equipping additional candidate random graph models with appropriate priors so they may be included in the paradigm. △ Less

Submitted 7 May, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

arXiv:2305.00054 [pdf, other]

LAVA: Data Valuation without Pre-Specified Learning Algorithms

Authors: Hoang Anh Just, Feiyang Kang, Jiachen T. Wang, Yi Zeng, Myeongseob Ko, Ming Jin, Ruoxi Jia

Abstract: Traditionally, data valuation (DV) is posed as a problem of equitably splitting the validation performance of a learning algorithm among the training data. As a result, the calculated data values depend on many design choices of the underlying learning algorithm. However, this dependence is undesirable for many DV use cases, such as setting priorities over different data sources in a data acquisit… ▽ More Traditionally, data valuation (DV) is posed as a problem of equitably splitting the validation performance of a learning algorithm among the training data. As a result, the calculated data values depend on many design choices of the underlying learning algorithm. However, this dependence is undesirable for many DV use cases, such as setting priorities over different data sources in a data acquisition process and informing pricing mechanisms in a data marketplace. In these scenarios, data needs to be valued before the actual analysis and the choice of the learning algorithm is still undetermined then. Another side-effect of the dependence is that to assess the value of individual points, one needs to re-run the learning algorithm with and without a point, which incurs a large computation burden. This work leapfrogs over the current limits of data valuation methods by introducing a new framework that can value training data in a way that is oblivious to the downstream learning algorithm. Our main results are as follows. (1) We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets. We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions. (2) We develop a novel method to value individual data based on the sensitivity analysis of the class-wise Wasserstein distance. Importantly, these values can be directly obtained for free from the output of off-the-shelf optimization solvers when computing the distance. (3) We evaluate our new data valuation framework over various use cases related to detecting low-quality data and show that, surprisingly, the learning-agnostic feature of our framework enables a significant improvement over SOTA performance while being orders of magnitude faster. △ Less

Submitted 19 December, 2023; v1 submitted 28 April, 2023; originally announced May 2023.

Comments: ICLR 2023 Spotlight Latest Updated Version: 2023/12/19

arXiv:2304.09154 [pdf, other]

Sharp-SSL: Selective high-dimensional axis-aligned random projections for semi-supervised learning

Authors: Tengyao Wang, Edgar Dobriban, Milana Gataric, Richard J. Samworth

Abstract: We propose a new method for high-dimensional semi-supervised learning problems based on the careful aggregation of the results of a low-dimensional procedure applied to many axis-aligned random projections of the data. Our primary goal is to identify important variables for distinguishing between the classes; existing low-dimensional methods can then be applied for final class assignment. Motivate… ▽ More We propose a new method for high-dimensional semi-supervised learning problems based on the careful aggregation of the results of a low-dimensional procedure applied to many axis-aligned random projections of the data. Our primary goal is to identify important variables for distinguishing between the classes; existing low-dimensional methods can then be applied for final class assignment. Motivated by a generalized Rayleigh quotient, we score projections according to the traces of the estimated whitened between-class covariance matrices on the projected data. This enables us to assign an importance weight to each variable for a given projection, and to select our signal variables by aggregating these weights over high-scoring projections. Our theory shows that the resulting Sharp-SSL algorithm is able to recover the signal coordinates with high probability when we aggregate over sufficiently many random projections and when the base procedure estimates the whitened between-class covariance matrix sufficiently well. The Gaussian EM algorithm is a natural choice as a base procedure, and we provide a new analysis of its performance in semi-supervised settings that controls the parameter estimation error in terms of the proportion of labeled data in the sample. Numerical results on both simulated data and a real colon tumor dataset support the excellent empirical performance of the method. △ Less

Submitted 18 April, 2023; originally announced April 2023.

Comments: 49 pages, 4 figures

MSC Class: 62H30

arXiv:2304.04258 [pdf, ps, other]

A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms"

Authors: Jiachen T. Wang, Ruoxi Jia

Abstract: Data valuation is a growing research field that studies the influence of individual data points for machine learning (ML) models. Data Shapley, inspired by cooperative game theory and economics, is an effective method for data valuation. However, it is well-known that the Shapley value (SV) can be computationally expensive. Fortunately, Jia et al. (2019) showed that for K-Nearest Neighbors (KNN) m… ▽ More Data valuation is a growing research field that studies the influence of individual data points for machine learning (ML) models. Data Shapley, inspired by cooperative game theory and economics, is an effective method for data valuation. However, it is well-known that the Shapley value (SV) can be computationally expensive. Fortunately, Jia et al. (2019) showed that for K-Nearest Neighbors (KNN) models, the computation of Data Shapley is surprisingly simple and efficient. In this note, we revisit the work of Jia et al. (2019) and propose a more natural and interpretable utility function that better reflects the performance of KNN models. We derive the corresponding calculation procedure for the Data Shapley of KNN classifiers/regressors with the new utility functions. Our new approach, dubbed soft-label KNN-SV, achieves the same time complexity as the original method. We further provide an efficient approximation algorithm for soft-label KNN-SV based on locality sensitive hashing (LSH). Our experimental results demonstrate that Soft-label KNN-SV outperforms the original method on most datasets in the task of mislabeled data detection, making it a better baseline for future work on data valuation. △ Less

Submitted 25 November, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

Comments: Technical Note

arXiv:2302.14275 [pdf, other]

Self-normalized score-based tests to detect parameter heterogeneity for mixed models

Authors: Ting Wang, Edgar Merkle

Abstract: Score-based tests have been used to study parameter heterogeneity across many types of statistical models. This chapter describes a new self-normalization approach for score-based tests of mixed models, which addresses situations where there is dependence between scores. This differs from the traditional score-based tests, which require independence of scores. We first review traditional score-bas… ▽ More Score-based tests have been used to study parameter heterogeneity across many types of statistical models. This chapter describes a new self-normalization approach for score-based tests of mixed models, which addresses situations where there is dependence between scores. This differs from the traditional score-based tests, which require independence of scores. We first review traditional score-based tests and then propose a new, self-normalized statistic that is related to previous work by Shao and Zhang (2010) and Zhang, Shao, Hayhoe, and Wuebbles (2011). We then provide simulation studies that demonstrate how traditional score-based tests can fail when scores are dependent, and that also demonstrate the good performance of the self-normalized tests. Next, we illustrate how the statistics can be used with real data. Finally, we discuss the potential broad application of self-normalized, score-based tests in mixed models and other models with dependent observations. △ Less

Submitted 11 June, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.11431 [pdf, ps, other]

A Note on "Towards Efficient Data Valuation Based on the Shapley Value''

Authors: Jiachen T. Wang, Ruoxi Jia

Abstract: The Shapley value (SV) has emerged as a promising method for data valuation. However, computing or estimating the SV is often computationally expensive. To overcome this challenge, Jia et al. (2019) propose an advanced SV estimation algorithm called ``Group Testing-based SV estimator'' which achieves favorable asymptotic sample complexity. In this technical note, we present several improvements in… ▽ More The Shapley value (SV) has emerged as a promising method for data valuation. However, computing or estimating the SV is often computationally expensive. To overcome this challenge, Jia et al. (2019) propose an advanced SV estimation algorithm called ``Group Testing-based SV estimator'' which achieves favorable asymptotic sample complexity. In this technical note, we present several improvements in the analysis and design choices of this SV estimator. Moreover, we point out that the Group Testing-based SV estimator does not fully reuse the collected samples. Our analysis and insights contribute to a better understanding of the challenges in developing efficient SV estimation algorithms for data valuation. △ Less

Submitted 22 February, 2023; originally announced February 2023.

arXiv:2302.07348 [pdf, other]

Cliff-Learning

Authors: Tony T. Wang, Igor Zablotchi, Nir Shavit, Jonathan S. Rosenfeld

Abstract: We study the data-scaling of transfer learning from foundation models in the low-downstream-data regime. We observe an intriguing phenomenon which we call cliff-learning. Cliff-learning refers to regions of data-scaling laws where performance improves at a faster than power law rate (i.e. regions of concavity on a log-log scaling plot). We conduct an in-depth investigation of foundation-model clif… ▽ More We study the data-scaling of transfer learning from foundation models in the low-downstream-data regime. We observe an intriguing phenomenon which we call cliff-learning. Cliff-learning refers to regions of data-scaling laws where performance improves at a faster than power law rate (i.e. regions of concavity on a log-log scaling plot). We conduct an in-depth investigation of foundation-model cliff-learning and study toy models of the phenomenon. We observe that the degree of cliff-learning reflects the degree of compatibility between the priors of a learning algorithm and the task being learned. △ Less

Submitted 6 June, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: 16 pages; v2 updates: improved layout, added acknowledgements

Showing 1–50 of 210 results for author: Wang, T