
Showing 1–50 of 56 results for author: Su, W J

  1. arXiv:2406.11050  [pdf, other]

    cs.CL cs.AI

    A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners

    Authors: Bowen Jiang, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, Dan Roth

    Abstract: This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities or primarily depend on token bias. We go beyond evaluating LLMs on accuracy; rather, we aim to investigate their token bias in solving logical reasoning tasks. Specifically, we develop carefully controlled synthetic datasets, featuring conjunction fallacy and syll…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Codes are open-sourced at https://github.com/bowen-upenn/llm_token_bias

  2. arXiv:2406.05372  [pdf, ps, other]

    stat.ML cs.LG

    Bridging the Gap: Rademacher Complexity in Robust and Standard Generalization

    Authors: Jiancong Xiao, Ruoyu Sun, Qi Long, Weijie J. Su

    Abstract: Training Deep Neural Networks (DNNs) with adversarial examples often results in poor generalization to test-time adversarial data. This paper investigates this issue, known as adversarially robust generalization, through the lens of Rademacher complexity. Building upon the studies by Khim and Loh (2018); Yin et al. (2019), numerous works have been dedicated to this problem, yet achieving a satisfa…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: COLT 2024

  3. arXiv:2406.03341  [pdf, other]

    cs.LG cs.AI stat.AP stat.ME stat.ML

    Tackling GenAI Copyright Issues: Originality Estimation and Genericization

    Authors: Hiroaki Chiba-Okabe, Weijie J. Su

    Abstract: The rapid progress of generative AI technology has sparked significant copyright concerns, leading to numerous lawsuits filed against AI developers. While some studies explore methods to mitigate copyright risks by steering the outputs of generative models away from those resembling copyrighted data, little attention has been paid to the question of how much of a resemblance is undesirable; more o…

    Submitted 20 July, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures

  4. arXiv:2406.00252  [pdf, other]

    cs.AI cs.CL cs.CV cs.MA

    Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey

    Authors: Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Weijie J. Su, Camillo J. Taylor, Tanwi Mallick

    Abstract: Rationality is the quality of being guided by reason, characterized by logical thinking and decision-making that align with evidence and logical rules. This quality is essential for effective problem-solving, as it ensures that solutions are well-founded and systematically derived. Despite the advancements of large language models (LLMs) in generating human-like text with remarkable accuracy, they…

    Submitted 18 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  5. arXiv:2405.19524  [pdf, other]

    cs.CR cs.AI

    AI Risk Management Should Incorporate Both Safety and Security

    Authors: Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal

    Abstract: The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this pape…

    Submitted 29 May, 2024; originally announced May 2024.

  6. arXiv:2405.16455  [pdf, other]

    stat.ML cs.LG stat.ME

    On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization

    Authors: Jiancong Xiao, Ziniu Li, Xingyu Xie, Emily Getzen, Cong Fang, Qi Long, Weijie J. Su

    Abstract: Accurately aligning large language models (LLMs) with human preferences is crucial for informing fair, economically sound, and statistically efficient decision-making processes. However, we argue that reinforcement learning from human feedback (RLHF) -- the predominant approach for aligning LLMs with human preferences through a reward model -- suffers from an inherent algorithmic bias due to its K…

    Submitted 26 May, 2024; originally announced May 2024.

  7. arXiv:2405.08920  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    Neural Collapse Meets Differential Privacy: Curious Behaviors of NoisyGD with Near-perfect Representation Learning

    Authors: Chendi Wang, Yuqing Zhu, Weijie J. Su, Yu-Xiang Wang

    Abstract: A recent study by De et al. (2022) has reported that large-scale representation learning through pre-training on a public dataset significantly enhances differentially private (DP) learning in downstream tasks, despite the high dimensionality of the feature space. To theoretically explain this phenomenon, we consider the setting of a layer-peeled model in representation learning, which results in…

    Submitted 16 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: To appear in ICML 2024

  8. arXiv:2404.13964  [pdf, other]

    cs.LG econ.GN stat.ME

    An Economic Solution to Copyright Challenges of Generative AI

    Authors: Jiachen T. Wang, Zhun Deng, Hiroaki Chiba-Okabe, Boaz Barak, Weijie J. Su

    Abstract: Generative artificial intelligence (AI) systems are trained on large data corpora to generate new pieces of text, images, videos, and other media. There is growing concern that such systems may infringe on the copyright interests of training data contributors. To address the copyright challenges of generative AI, we propose a framework that compensates copyright owners proportionally to their cont…

    Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  9. arXiv:2404.01245  [pdf, other]

    math.ST cs.CL cs.CR cs.LG stat.ML

    A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules

    Authors: Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J. Su

    Abstract: Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provable detection of LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical effi…

    Submitted 1 April, 2024; originally announced April 2024.

  10. arXiv:2403.05006  [pdf, ps, other]

    cs.LG cs.AI stat.ME stat.ML

    Provable Multi-Party Reinforcement Learning with Diverse Human Feedback

    Authors: Huiying Zhong, Zhun Deng, Weijie J. Su, Zhiwei Steven Wu, Linjun Zhang

    Abstract: Reinforcement learning with human feedback (RLHF) is an emerging paradigm to align models with human preferences. Typically, RLHF aggregates preferences from multiple individuals who have diverse viewpoints that may conflict with each other. Our work initiates the theoretical study of multi-party RLHF that explicitly models the diverse preferences of multiple individuals. We show how trad…

    Submitted 7 March, 2024; originally announced March 2024.

  11. arXiv:2401.01623  [pdf, other]

    cs.AI cs.CL

    Can AI Be as Creative as Humans?

    Authors: Haonan Wang, James Zou, Michael Mozer, Anirudh Goyal, Alex Lamb, Linjun Zhang, Weijie J Su, Zhun Deng, Michael Qizhe Xie, Hannah Brown, Kenji Kawaguchi

    Abstract: Creativity serves as a cornerstone for societal progress and innovation. With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application. In this paper, we prove in theory that AI can be as creative as humans under the condition that it can properly fit the da…

    Submitted 25 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: The paper examines AI's creativity, introducing Relative and Statistical Creativity for theoretical and practical analysis, along with practical training guidelines. Project Page: ai-relative-creativity.github.io

  12. arXiv:2310.19973  [pdf, other]

    stat.ML cs.CR cs.LG math.ST stat.ME

    Unified Enhancement of Privacy Bounds for Mixture Mechanisms via $f$-Differential Privacy

    Authors: Chendi Wang, Buxin Su, Jiayuan Ye, Reza Shokri, Weijie J. Su

    Abstract: Differentially private (DP) machine learning algorithms incur many sources of randomness, such as random initialization, random batch subsampling, and shuffling. However, such randomness is difficult to take into account when proving differential privacy bounds because it induces mixture distributions for the algorithm's output that are difficult to analyze. This paper focuses on improving privacy…

    Submitted 1 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

  13. arXiv:2307.02792  [pdf, other]

    cs.CY cs.AI cs.CL

    What Should Data Science Education Do with Large Language Models?

    Authors: Xinming Tu, James Zou, Weijie J. Su, Linjun Zhang

    Abstract: The rapid advances of large language models (LLMs), such as ChatGPT, are revolutionizing data science and statistics. These state-of-the-art tools can streamline complex processes. As a result, it reshapes the role of data scientists. We argue that LLMs are transforming the responsibilities of data scientists, shifting their focus from hands-on coding, data-wrangling and conducting standard analys…

    Submitted 7 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

  14. arXiv:2306.05734  [pdf, other]

    cs.LG cs.CR cs.DS

    DP-HyPO: An Adaptive Private Hyperparameter Optimization Framework

    Authors: Hua Wang, Sheng Gao, Huanyu Zhang, Weijie J. Su, Milan Shen

    Abstract: Hyperparameter optimization, also known as hyperparameter tuning, is a widely recognized technique for improving model performance. Regrettably, when training private ML models, many practitioners often overlook the privacy risks associated with hyperparameter optimization, which could potentially expose sensitive information about the underlying dataset. Currently, the sole existing approach to a…

    Submitted 26 November, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  15. arXiv:2305.17608  [pdf, other]

    cs.LG cs.AI cs.CL math.OC stat.ML

    Reward Collapse in Aligning Large Language Models

    Authors: Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su

    Abstract: The extraordinary capabilities of large language models (LLMs) such as ChatGPT and GPT-4 are in part unleashed by aligning them with reward models that are trained on human preferences, which are often represented as rankings of responses to prompts. In this paper, we document the phenomenon of reward collapse, an empirical observation where the prevailing ranking-based approach results i…

    Submitted 27 May, 2023; originally announced May 2023.

  16. arXiv:2305.17490  [pdf, other]

    stat.ML cs.LG

    The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent

    Authors: Lei Wu, Weijie J. Su

    Abstract: In this paper, we study the implicit regularization of stochastic gradient descent (SGD) through the lens of dynamical stability (Wu et al., 2018). We start by revising existing stability analyses of SGD, showing how the Frobenius norm and trace of Hessian relate to different notions of stability. Notably, if a global minimum is linearly stable for SGD, then the trace of Hessian must be less…

    Submitted 1 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: ICML 2023 camera ready

  17. arXiv:2304.11160  [pdf, other]

    math.ST cs.GT cs.LG econ.TH stat.ME

    The Isotonic Mechanism for Exponential Family Estimation

    Authors: Yuling Yan, Weijie J. Su, Jianqing Fan

    Abstract: In 2023, the International Conference on Machine Learning (ICML) required authors with multiple submissions to rank their submissions based on perceived quality. In this paper, we aim to employ these author-specified rankings to enhance peer review in machine learning and artificial intelligence conferences by extending the Isotonic Mechanism to exponential family distributions. This mechanism gen…

    Submitted 2 October, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

  18. arXiv:2210.17020  [pdf, other]

    cs.LG cs.AI cs.CV cs.IT stat.ML

    A Law of Data Separation in Deep Learning

    Authors: Hangfeng He, Weijie J. Su

    Abstract: While deep learning has enabled significant advances in many areas of science, its black-box nature hinders architecture design for future artificial intelligence applications and interpretation for high-stakes decision makings. We addressed this issue by studying the fundamental question of how deep neural networks process data in the intermediate layers. Our finding is a simple and quantitative…

    Submitted 10 August, 2023; v1 submitted 30 October, 2022; originally announced October 2022.

    Comments: Accepted at PNAS

  19. arXiv:2209.14501  [pdf, other]

    quant-ph cs.DS cs.LG math.OC

    On Quantum Speedups for Nonconvex Optimization via Quantum Tunneling Walks

    Authors: Yizhou Liu, Weijie J. Su, Tongyang Li

    Abstract: Classical algorithms are often not effective for solving nonconvex optimization problems where local minima are separated by high barriers. In this paper, we explore possible quantum speedups for nonconvex optimization by leveraging the global effect of quantum tunneling. Specifically, we introduce a quantum algorithm termed the quantum tunneling walk (QTW) and apply it to nonconvex problems where…

    Submitted 22 May, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: 89 pages, 19 figures (full version)

    Journal ref: Quantum 7, 1030 (2023)

  20. arXiv:2206.08149  [pdf, ps, other]

    cs.LG cs.GT econ.TH math.ST stat.ME

    A Truthful Owner-Assisted Scoring Mechanism

    Authors: Weijie J. Su

    Abstract: Alice (owner) has knowledge of the underlying quality of her items measured in grades. Given the noisy grades provided by an independent party, can Bob (appraiser) obtain accurate estimates of the ground-truth grades of the items by asking Alice a question about the grades? We address this when the payoff to Alice is additive convex utility over all her items. We establish that if Alice has to tru…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: A (significantly) extended version of arXiv: 2110.14802

  21. arXiv:2206.04236  [pdf, other]

    cs.CR cs.DS cs.LG stat.ML

    Analytical Composition of Differential Privacy via the Edgeworth Accountant

    Authors: Hua Wang, Sheng Gao, Huanyu Zhang, Milan Shen, Weijie J. Su

    Abstract: Many modern machine learning algorithms are composed of simple private algorithms; thus, an increasingly important problem is to efficiently compute the overall privacy loss under composition. In this study, we introduce the Edgeworth Accountant, an analytical approach to composing differential privacy guarantees of private algorithms. The Edgeworth Accountant starts by losslessly tracking the pri…

    Submitted 8 June, 2022; originally announced June 2022.

  22. arXiv:2206.02792  [pdf, other]

    cs.LG cs.AI cs.CV cs.CY stat.ML

    FIFA: Making Fairness More Generalizable in Classifiers Trained on Imbalanced Data

    Authors: Zhun Deng, Jiayao Zhang, Linjun Zhang, Ting Ye, Yates Coley, Weijie J. Su, James Zou

    Abstract: Algorithmic fairness plays an important role in machine learning and imposing fairness constraints during learning is a common approach. However, many datasets are imbalanced in certain label classes (e.g. "healthy") and sensitive subgroups (e.g. "older patients"). Empirically, this imbalance leads to a lack of generalizability not only of classification, but also of fairness properties, especiall…

    Submitted 6 June, 2022; originally announced June 2022.

  23. arXiv:2202.00436  [pdf, other]

    cs.CL cs.AI cs.LG stat.AP

    ROCK: Causal Inference Principles for Reasoning about Commonsense Causality

    Authors: Jiayao Zhang, Hongming Zhang, Weijie J. Su, Dan Roth

    Abstract: Commonsense causality reasoning (CCR) aims at identifying plausible causes and effects in natural language descriptions that are deemed reasonable by an average person. Although being of great academic and practical interest, this problem is still shadowed by the lack of a well-posed theoretical framework; existing work usually relies on deep language models wholeheartedly, and is potentially susc…

    Submitted 17 June, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: To appear, ICML 2022

  24. arXiv:2112.09741  [pdf, other]

    cs.LG cond-mat.dis-nn cond-mat.stat-mech cs.CV stat.ML

    Neurashed: A Phenomenological Model for Imitating Deep Learning Training

    Authors: Weijie J. Su

    Abstract: To advance deep learning methodologies in the next decade, a theoretical framework for reasoning about modern neural networks is needed. While efforts are increasing toward demystifying why deep learning is so effective, a comprehensive picture remains lacking, suggesting that a better theory is possible. We argue that a future deep learning theory should inherit three characteristics: a \textit{h…

    Submitted 17 December, 2021; originally announced December 2021.

    Comments: 8 pages

  25. arXiv:2110.14802  [pdf, ps, other]

    cs.LG cs.GT math.OC stat.ME stat.ML

    You Are the Best Reviewer of Your Own Papers: An Owner-Assisted Scoring Mechanism

    Authors: Weijie J. Su

    Abstract: I consider a setting where reviewers offer very noisy scores for several items for the selection of high-quality ones (e.g., peer review of large conference proceedings), whereas the owner of these items knows the true underlying scores but prefers not to provide this information. To address this withholding of information, in this paper, I introduce the Isotonic Mechanism, a simple and efficient…

    Submitted 16 June, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: Corrected typos and added a reference

  26. arXiv:2110.05960  [pdf, other]

    cs.LG cs.CV stat.ML

    Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

    Authors: Jiayao Zhang, Hua Wang, Weijie J. Su

    Abstract: Understanding the training dynamics of deep learning models is perhaps a necessary step toward demystifying the effectiveness of these models. In particular, how do data from different classes gradually become separable in their feature spaces when training neural networks using stochastic gradient descent? In this study, we model the evolution of features during deep learning training using a set…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021

  27. arXiv:2110.02796  [pdf, other]

    cs.LG stat.ML

    An Unconstrained Layer-Peeled Perspective on Neural Collapse

    Authors: Wenlong Ji, Yiping Lu, Yiliang Zhang, Zhun Deng, Weijie J. Su

    Abstract: Neural collapse is a highly symmetric geometric pattern of neural networks that emerges during the terminal phase of training, with profound implications on the generalization performance and robustness of the trained networks. To understand how the last-layer features and classifiers exhibit this recently discovered implicit bias, in this paper, we introduce a surrogate model called the unconstra…

    Submitted 24 April, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at ICLR 2022

  28. arXiv:2105.14095  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    Weighted Training for Cross-Task Learning

    Authors: Shuxiao Chen, Koby Crammer, Hangfeng He, Dan Roth, Weijie J. Su

    Abstract: In this paper, we introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning based on minimizing a representation-based task distance between the source and target tasks. We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees. The effectiveness o…

    Submitted 1 March, 2022; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: Published as a conference paper at ICLR 2022

  29. arXiv:2105.13302  [pdf, other]

    math.ST cs.IT cs.LG eess.SP stat.ML

    Characterizing the SLOPE Trade-off: A Variational Perspective and the Donoho-Tanner Limit

    Authors: Zhiqi Bu, Jason Klusowski, Cynthia Rush, Weijie J. Su

    Abstract: Sorted l1 regularization has been incorporated into many methods for solving high-dimensional statistical estimation problems, including the SLOPE estimator in linear regression. In this paper, we study how this relatively new regularization technique improves variable selection by characterizing the optimal SLOPE trade-off between the false discovery proportion (FDP) and true positive proportion…

    Submitted 5 June, 2022; v1 submitted 27 May, 2021; originally announced May 2021.

    Journal ref: Annals of Statistics 2022

  30. arXiv:2105.08233  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Oneshot Differentially Private Top-k Selection

    Authors: Gang Qiao, Weijie J. Su, Li Zhang

    Abstract: Being able to efficiently and accurately select the top-$k$ elements with differential privacy is an integral component of various private data analysis tasks. In this paper, we present the oneshot Laplace mechanism, which generalizes the well-known Report Noisy Max mechanism to reporting noisy top-$k$ elements. We show that the oneshot Laplace mechanism with a noise level of…

    Submitted 23 June, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

    Comments: Accepted to ICML 2021

  31. arXiv:2104.01987  [pdf, ps, other]

    cs.CR cs.LG math.ST stat.ML

    Rejoinder: Gaussian Differential Privacy

    Authors: Jinshuo Dong, Aaron Roth, Weijie J. Su

    Abstract: In this rejoinder, we aim to address two broad issues that cover most comments made in the discussion. First, we discuss some theoretical aspects of our work and comment on how this work might impact the theoretical foundation of privacy-preserving data analysis. Taking a practical viewpoint, we next discuss how f-differential privacy (f-DP) and Gaussian differential privacy (GDP) can make a diffe…

    Submitted 25 June, 2021; v1 submitted 5 April, 2021; originally announced April 2021.

    Comments: Updated the references. Rejoinder to discussions on Gaussian Differential Privacy, read to the Royal Statistical Society in December 2020

  32. arXiv:2103.08721  [pdf, other]

    stat.ML cs.CR cs.IT cs.LG math.ST

    A Central Limit Theorem for Differentially Private Query Answering

    Authors: Jinshuo Dong, Weijie J. Su, Linjun Zhang

    Abstract: Perhaps the single most important use case for differential privacy is to privately answer numerical queries, which is usually achieved by adding noise to the answer vector. The central question, therefore, is to understand which noise distribution optimizes the privacy-accuracy trade-off, especially when the dimension of the answer vector is high. Accordingly, extensive literature has been dedica…

    Submitted 15 March, 2021; originally announced March 2021.

  33. arXiv:2103.01901  [pdf, ps, other]

    stat.ML cs.LG

    A Theorem of the Alternative for Personalized Federated Learning

    Authors: Shuxiao Chen, Qinqing Zheng, Qi Long, Weijie J. Su

    Abstract: A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often come from different but not entirely unrelated distributions, and personalization is, therefore, necessary to achieve optimal results from each individual's perspective. In this paper, we show how the excess risks of personalized federated learning with a smooth, stron…

    Submitted 2 March, 2021; originally announced March 2021.

    Comments: 50 pages (main manuscript: 25 pages, appendices: 25 pages)

  34. arXiv:2102.11158  [pdf, other]

    stat.ML cs.AI cs.CR cs.CV cs.LG

    Federated $f$-Differential Privacy

    Authors: Qinqing Zheng, Shuxiao Chen, Qi Long, Weijie J. Su

    Abstract: Federated learning (FL) is a training paradigm where the clients collaboratively learn models by repeatedly sharing information without compromising much on the privacy of their local sensitive data. In this paper, we introduce federated $f$-differential privacy, a new notion specifically tailored to the federated setting, based on the framework of Gaussian differential privacy. Federated $f$-diff…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: Accepted to AISTATS 2021

  35. arXiv:2101.12699  [pdf, other]

    cs.LG cs.CV math.OC stat.ML

    Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training

    Authors: Cong Fang, Hangfeng He, Qi Long, Weijie J. Su

    Abstract: In this paper, we introduce the Layer-Peeled Model, a nonconvex yet analytically tractable optimization program, in a quest to better understand deep neural networks that are trained for a sufficiently long time. As the name suggests, this new model is derived by isolating the topmost layer from the remainder of the neural network, followed by imposing certain constraints separately on th…

    Submitted 8 September, 2021; v1 submitted 29 January, 2021; originally announced January 2021.

    Comments: Accepted at Proceedings of the National Academy of Sciences (PNAS); Changed the title

  36. arXiv:2010.13988  [pdf, other]

    cs.LG cs.NE

    Toward Better Generalization Bounds with Locally Elastic Stability

    Authors: Zhun Deng, Hangfeng He, Weijie J. Su

    Abstract: Algorithmic stability is a key characteristic to ensure the generalization ability of a learning algorithm. Among different notions of stability, uniform stability is arguably the most popular one, which yields exponential generalization bounds. However, uniform stability only considers the worst-case loss change (or so-called sensitivity) by removing a single data point, which is distribut…

    Submitted 13 July, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: Published in ICML 2021

  37. arXiv:2010.11775  [pdf, other]

    cs.LG stat.ML

    Label-Aware Neural Tangent Kernel: Toward Better Generalization and Local Elasticity

    Authors: Shuxiao Chen, Hangfeng He, Weijie J. Su

    Abstract: As a popular approach to modeling the dynamics of training overparametrized neural networks (NNs), the neural tangent kernels (NTK) are known to fall behind real-world NNs in generalization ability. This performance gap is in part due to the label agnostic nature of the NTK, which renders the resulting kernel not as locally elastic as NNs~\citep{he2019local}. In this paper, we in…

    Submitted 29 October, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020 camera ready version, 32 pages, 2 figures, 3 tables

  38. arXiv:2010.11750  [pdf, other]

    stat.ML cs.LG

    Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers

    Authors: Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su

    Abstract: The problem of learning one task with samples from another task has received much interest recently. In this paper, we ask a fundamental question: when is combining data from two tasks better than learning one task alone? Intuitively, the transfer effect from one task to another task depends on dataset shifts such as sample sizes and covariance matrices. However, quantifying such a transfer effect…

    Submitted 10 August, 2023; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: 64 pages, 6 figures; We thoroughly revised the paper by adding new results and reorganizing the presentation

  39. arXiv:2010.10650  [pdf, other]

    cs.LG stat.ML

    Towards Understanding the Dynamics of the First-Order Adversaries

    Authors: Zhun Deng, Hangfeng He, Jiaoyang Huang, Weijie J. Su

    Abstract: An acknowledged weakness of neural networks is their vulnerability to adversarial perturbations to the inputs. To improve the robustness of these models, one of the most popular defense mechanisms is to alternatively maximize the loss over the constrained perturbations (or called adversaries) on the inputs using projected gradient ascent and minimize over weights. In this paper, we analyze the dyn…

    Submitted 20 October, 2020; originally announced October 2020.

  40. arXiv:2007.15346  [pdf, other]

    math.ST

    A Power Analysis for Model-X Knockoffs with $\ell_{p}$-Regularized Statistics

    Authors: Asaf Weinstein, Weijie J. Su, Małgorzata Bogdan, Rina F. Barber, Emmanuel J. Candès

    Abstract: Variable selection properties of procedures utilizing penalized-likelihood estimates is a central topic in the study of high dimensional linear regression problems. Existing literature emphasizes the quality of ranking of the variables by such procedures as reflected in the receiver operating characteristic curve or in prediction performance. Specifically, recent works have harnessed modern theory…

    Submitted 27 April, 2022; v1 submitted 30 July, 2020; originally announced July 2020.

  41. arXiv:2007.11078  [pdf, other]

    math.ST cs.IT

    The Complete Lasso Tradeoff Diagram

    Authors: Hua Wang, Yachong Yang, Zhiqi Bu, Weijie J. Su

    Abstract: A fundamental problem in the high-dimensional regression is to understand the tradeoff between type I and type II errors or, equivalently, false discovery rate (FDR) and power in variable selection. To address this important problem, we offer the first complete tradeoff diagram that distinguishes all pairs of FDR and power that can be asymptotically realized by the Lasso with some choice of its pe…

    Submitted 28 October, 2020; v1 submitted 21 July, 2020; originally announced July 2020.

    Comments: To appear in the 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

  42. arXiv:2007.00566  [pdf, other]

    math.ST cs.IT

    The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions

    Authors: Hua Wang, Yachong Yang, Weijie J. Su

    Abstract: In high-dimensional sparse regression, would increasing the signal-to-noise ratio while fixing the sparsity level always lead to better model selection? For high-dimensional sparse regression problems, surprisingly, in this paper we answer this question in the negative in the regime of linear sparsity for the Lasso method, relying on a new concept we term effect size heterogeneity. Roughly speakin…

    Submitted 8 March, 2022; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: To appear in IEEE Transactions on Information Theory

  43. arXiv:2004.06977  [pdf, ps, other]

    cs.LG math.AP math.OC stat.ML

    On Learning Rates and Schrödinger Operators

    Authors: Bin Shi, Weijie J. Su, Michael I. Jordan

    Abstract: The learning rate is perhaps the single most important parameter in the training of neural networks and, more broadly, in stochastic (nonconvex) optimization. Accordingly, there are numerous effective, but poorly understood, techniques for tuning the learning rate, including learning rate decay, which starts with a large initial learning rate that is gradually decreased. In this paper, we present…

    Submitted 15 April, 2020; originally announced April 2020.

    Comments: 49 pages, 21 figures

  44. arXiv:2003.04493  [pdf, other]

    stat.ML cs.AI cs.CR cs.LG stat.ME

    Sharp Composition Bounds for Gaussian Differential Privacy via Edgeworth Expansion

    Authors: Qinqing Zheng, Jinshuo Dong, Qi Long, Weijie J. Su

    Abstract: Datasets containing sensitive information are often sequentially analyzed by many algorithms. This raises a fundamental question in differential privacy regarding how the overall privacy bound degrades under composition. To address this question, we introduce a family of analytical and sharp privacy bounds under composition using the Edgeworth expansion in the framework of the recently proposed f-…

    Submitted 25 March, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

  45. arXiv:1911.11607  [pdf, other]

    cs.LG cs.CR stat.ML

    Deep Learning with Gaussian Differential Privacy

    Authors: Zhiqi Bu, Jinshuo Dong, Qi Long, Weijie J. Su

    Abstract: Deep learning models are often trained on datasets that contain sensitive information such as individuals' shopping transactions, personal contacts, and medical records. An increasingly important line of work therefore has sought to train neural networks subject to privacy constraints that are specified by differential privacy or its divergence-based relaxations. These privacy definitions, however…

    Submitted 22 July, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: To appear in Harvard Data Science Review

  46. arXiv:1910.06943  [pdf, other]

    cs.LG cs.CV stat.ML

    The Local Elasticity of Neural Networks

    Authors: Hangfeng He, Weijie J. Su

    Abstract: This paper presents a phenomenon in neural networks that we refer to as local elasticity. Roughly speaking, a classifier is said to be locally elastic if its prediction at a feature vector $\mathbf{x}'$ is not significantly perturbed, after the classifier is updated via stochastic gradient descent at a (labeled) feature vector $\mathbf{x}$ that is dissimilar to $\mathbf{x}'$ in a certain sen…

    Submitted 14 February, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: To appear in ICLR 2020

  47. arXiv:1905.02383  [pdf, other]

    cs.LG cs.CR cs.DS stat.ML

    Gaussian Differential Privacy

    Authors: Jinshuo Dong, Aaron Roth, Weijie J. Su

    Abstract: Differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy in the past decade. This privacy definition and its divergence-based relaxations, however, have several acknowledged weaknesses, either in handling composition of private algorithms or in analyzing important primitives like privacy amplification by subsampling. Inspired by the hypothesis test…

    Submitted 30 May, 2019; v1 submitted 7 May, 2019; originally announced May 2019.

    Comments: v2 revises introduction, adds discussion and fixes some inconsistencies. v3 fixes typos

  48. arXiv:1902.03694  [pdf, ps, other]

    math.OC cs.LG math.NA stat.ML

    Acceleration via Symplectic Discretization of High-Resolution Differential Equations

    Authors: Bin Shi, Simon S. Du, Weijie J. Su, Michael I. Jordan

    Abstract: We study first-order optimization methods obtained by discretizing ordinary differential equations (ODEs) corresponding to Nesterov's accelerated gradient methods (NAGs) and Polyak's heavy-ball method. We consider three discretization schemes: an explicit Euler scheme, an implicit Euler scheme, and a symplectic scheme. We show that the optimization algorithm generated by applying the symplectic sc…

    Submitted 4 November, 2019; v1 submitted 10 February, 2019; originally announced February 2019.

    Comments: Published in NeurIPS 2019

  49. arXiv:1812.08965  [pdf, ps, other]

    math.ST

    The FDR-Linking Theorem

    Authors: Weijie J. Su

    Abstract: This paper introduces the FDR-linking theorem, a novel technique for understanding non-asymptotic FDR control of the Benjamini–Hochberg (BH) procedure under arbitrary dependence of the $p$-values. This theorem offers a principled and flexible approach to linking all $p$-values and the null $p$-values from the FDR control perspective, suggesting a profound implication that, to a…

    Submitted 21 December, 2018; originally announced December 2018.

  50. arXiv:1810.08907  [pdf, ps, other]

    math.OC cs.LG math.CA math.NA stat.ML

    Understanding the Acceleration Phenomenon via High-Resolution Differential Equations

    Authors: Bin Shi, Simon S. Du, Michael I. Jordan, Weijie J. Su

    Abstract: Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms, namely Nesterov's accelerated gradient method for strongly convex functions (NAG-SC) and Polyak's heavy-ball method, we study an alternative limiting process that yields…

    Submitted 1 November, 2018; v1 submitted 21 October, 2018; originally announced October 2018.

    Comments: 82 pages, 11 figures