
Showing 1–50 of 122 results for author: Wu, L

Searching in archive stat.
  1. arXiv:2407.12332  [pdf, other]

    cs.LG stat.ML

    Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition

    Authors: Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland

    Abstract: We present a theoretical explanation of the "grokking" phenomenon, where a model generalizes long after overfitting, for the originally-studied problem of modular addition. First, we show that early in gradient descent, when the "kernel regime" approximately holds, no permutation-equivariant model can achieve small population error on modular addition unless it sees at least a constant fraction…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ICML 2024

  2. arXiv:2405.20763  [pdf, other]

    cs.LG math.OC stat.ML

    Improving Generalization and Convergence by Enhancing Implicit Regularization

    Authors: Mingze Wang, Haotian He, Jinbo Wang, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu

    Abstract: In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that I…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 35 pages

  3. arXiv:2404.06391  [pdf, other]

    cs.LG stat.ML

    Exploring Neural Network Landscapes: Star-Shaped and Geodesic Connectivity

    Authors: Zhanran Lin, Puheng Li, Lei Wu

    Abstract: One of the most intriguing findings in the structure of neural network landscape is the phenomenon of mode connectivity: For two typical global minima, there exists a path connecting them without barrier. This concept of mode connectivity has played a crucial role in understanding important phenomena in deep learning. In this paper, we conduct a fine-grained analysis of this connectivity phenome…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: The first two authors contributed equally

  4. arXiv:2404.00438  [pdf, other]

    cs.DC cs.AI cs.LG math.OC stat.ML

    Communication Efficient Distributed Training with Distributed Lion

    Authors: Bo Liu, Lemeng Wu, Lizhang Chen, Kaizhao Liang, Jiaxu Zhu, Chen Liang, Raghuraman Krishnamoorthi, Qiang Liu

    Abstract: The Lion optimizer has been a promising competitor with the AdamW for training large AI models, with advantages on memory, computation, and sample efficiency. In this paper, we introduce Distributed Lion, an innovative adaptation of Lion for distributed training environments. Leveraging the sign operator in Lion, our Distributed Lion only requires communicating binary or lower-precision vectors be…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 22 pages

  5. arXiv:2403.16995  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Language Rectified Flow: Advancing Diffusion Language Generation with Probabilistic Flows

    Authors: Shujian Zhang, Lemeng Wu, Chengyue Gong, Xingchao Liu

    Abstract: Recent works have demonstrated success in controlling sentence attributes (e.g., sentiment) and structure (e.g., syntactic structure) based on the diffusion language model. A key component that drives the impressive performance for generating high-quality samples from noise is iterative denoising for thousands of steps. While beneficial, the complexity of starting from the noise and the learnin…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024

  6. arXiv:2403.12729  [pdf, other]

    stat.ML cs.LG

    Posterior Uncertainty Quantification in Neural Networks using Data Augmentation

    Authors: Luhuan Wu, Sinead Williamson

    Abstract: In this paper, we approach the problem of uncertainty quantification in deep learning through a predictive framework, which captures uncertainty in model parameters by specifying our assumptions about the predictive distribution of unseen future data. Under this view, we show that deep ensembling (Lakshminarayanan et al., 2017) is a fundamentally mis-specified model class, since it assumes that fu…

    Submitted 18 March, 2024; originally announced March 2024.

  7. arXiv:2402.15718  [pdf, other]

    stat.ML cs.LG

    A Duality Analysis of Kernel Ridge Regression in the Noiseless Regime

    Authors: Jihao Long, Xiaojun Peng, Lei Wu

    Abstract: In this paper, we conduct a comprehensive analysis of generalization properties of Kernel Ridge Regression (KRR) in the noiseless regime, a scenario crucial to scientific computing, where data are often generated via computer simulations. We prove that KRR can attain the minimax optimal rate, which depends on both the eigenvalue decay of the associated kernel and the relative smoothness of target…

    Submitted 23 February, 2024; originally announced February 2024.

  8. arXiv:2402.07193  [pdf, other]

    cs.LG math.OC stat.ML

    Loss Symmetry and Noise Equilibrium of Stochastic Gradient Descent

    Authors: Liu Ziyin, Mingze Wang, Hongchao Li, Lei Wu

    Abstract: Symmetries exist abundantly in the loss function of neural networks. We characterize the learning dynamics of stochastic gradient descent (SGD) when exponential symmetries, a broad subclass of continuous symmetries, exist in the loss function. We establish that when gradient noises do not balance, SGD has the tendency to move the model parameters toward a point where noises from different directio…

    Submitted 3 June, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: preprint

  9. arXiv:2311.15221  [pdf, other]

    cs.IT cs.LG eess.SP math.OC math.ST stat.ML

    The Local Landscape of Phase Retrieval Under Limited Samples

    Authors: Kaizhao Liu, Zihao Wang, Lei Wu

    Abstract: In this paper, we provide a fine-grained analysis of the local landscape of phase retrieval under the regime with limited samples. Our aim is to ascertain the minimal sample size necessary to guarantee a benign local landscape surrounding global minima in high dimensions. Let $n$ and $d$ denote the sample size and input dimension, respectively. We first explore the local convexity and establish th…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: 41 pages

  10. arXiv:2310.00692  [pdf, other]

    cs.LG stat.ML

    A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent

    Authors: Mingze Wang, Lei Wu

    Abstract: In this paper, we provide a theoretical study of noise geometry for minibatch stochastic gradient descent (SGD), a phenomenon where noise aligns favorably with the geometry of local landscape. We propose two metrics, derived from analyzing how noise influences the loss and subspace projection dynamics, to quantify the alignment strength. We show that for (over-parameterized) linear models and two-…

    Submitted 1 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: 30 pages

  11. arXiv:2309.00756  [pdf, other]

    stat.AP math.OC

    Learning Risk Preferences in Markov Decision Processes: an Application to the Fourth Down Decision in the National Football League

    Authors: Nathan Sandholtz, Lucas Wu, Martin Puterman, Timothy C. Y. Chan

    Abstract: For decades, National Football League (NFL) coaches' observed fourth down decisions have been largely inconsistent with prescriptions based on statistical models. In this paper, we develop a framework to explain this discrepancy using an inverse optimization approach. We model the fourth down decision and the subsequent sequence of plays in a game as a Markov decision process (MDP), the dynamics o…

    Submitted 17 July, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: 23 pages, 12 figures

  12. arXiv:2307.04304  [pdf, other]

    stat.ME

    Enhancing Treatment Effect Estimation: A Model Robust Approach Integrating Randomized Experiments and External Controls using the Double Penalty Integration Estimator

    Authors: Yuwen Cheng, Lili Wu, Shu Yang

    Abstract: Randomized experiments (REs) are the cornerstone for treatment effect evaluation. However, due to practical considerations, REs may encounter difficulty recruiting sufficient patients. External controls (ECs) can supplement REs to boost estimation efficiency. Yet, there may be incomparability between ECs and concurrent controls (CCs), resulting in misleading treatment effect evaluation. We introdu…

    Submitted 9 July, 2023; originally announced July 2023.

  13. arXiv:2306.17775  [pdf, other]

    stat.ML cs.LG q-bio.BM

    Practical and Asymptotically Exact Conditional Sampling in Diffusion Models

    Authors: Luhuan Wu, Brian L. Trippe, Christian A. Naesseth, David M. Blei, John P. Cunningham

    Abstract: Diffusion models have been successful on a range of conditional generation tasks including molecular design and text-to-image generation. However, these achievements have primarily depended on task-specific conditional training or error-prone heuristic approximations. Ideally, a conditional generation method should provide exact samples for a broad range of conditional distributions without requir…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: Code: https://github.com/blt2114/twisted_diffusion_sampler

  14. arXiv:2306.02833  [pdf, ps, other]

    stat.ML cs.LG math.ST

    The $L^\infty$ Learnability of Reproducing Kernel Hilbert Spaces

    Authors: Hongrui Chen, Jihao Long, Lei Wu

    Abstract: In this work, we analyze the learnability of reproducing kernel Hilbert spaces (RKHS) under the $L^\infty$ norm, which is critical for understanding the performance of kernel methods and random feature models in safety- and security-critical applications. Specifically, we relate the $L^\infty$ learnability of an RKHS to the spectrum decay of the associated kernel and both lower bounds and upper boun…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: 20 pages

  15. arXiv:2305.19082  [pdf, other]

    stat.ML cs.LG math.NA

    Embedding Inequalities for Barron-type Spaces

    Authors: Lei Wu

    Abstract: An important problem in machine learning theory is to understand the approximation and generalization properties of two-layer neural networks in high dimensions. To this end, researchers have introduced the Barron space $\mathcal{B}_s(\Omega)$ and the spectral Barron space $\mathcal{F}_s(\Omega)$, where the index $s\in [0,\infty)$ indicates the smoothness of functions within these spaces and…

    Submitted 27 December, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: 11 pages

    MSC Class: 68T07

    Journal ref: Journal of Machine Learning, 2023

  16. arXiv:2305.17490  [pdf, other]

    stat.ML cs.LG

    The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent

    Authors: Lei Wu, Weijie J. Su

    Abstract: In this paper, we study the implicit regularization of stochastic gradient descent (SGD) through the lens of dynamical stability (Wu et al., 2018). We start by revising existing stability analyses of SGD, showing how the Frobenius norm and trace of Hessian relate to different notions of stability. Notably, if a global minimum is linearly stable for SGD, then the trace of Hessian must be less…

    Submitted 1 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: ICML 2023 camera ready

  17. arXiv:2305.08404  [pdf, other]

    cs.LG stat.ML

    Theoretical Analysis of Inductive Biases in Deep Convolutional Networks

    Authors: Zihao Wang, Lei Wu

    Abstract: In this paper, we provide a theoretical analysis of the inductive biases in convolutional neural networks (CNNs). We start by examining the universality of CNNs, i.e., the ability to approximate any continuous functions. We prove that a depth of $\mathcal{O}(\log d)$ suffices for deep CNNs to achieve this universality, where $d$ is the input dimension. Additionally, we establish that learning spar…

    Submitted 20 January, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: 57 pages

  18. arXiv:2305.05642  [pdf, ps, other]

    stat.ML cs.LG

    A duality framework for generalization analysis of random feature models and two-layer neural networks

    Authors: Hongrui Chen, Jihao Long, Lei Wu

    Abstract: We consider the problem of learning functions in the $\mathcal{F}_{p,\pi}$ and Barron spaces, which are natural function spaces that arise in the high-dimensional analysis of random feature models (RFMs) and two-layer neural networks. Through a duality analysis, we reveal that the approximation and estimation of these spaces can be considered equivalent in a certain sense. This enables us to focus o…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: 42 pages

  19. arXiv:2305.02499  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG stat.ML

    AutoML-GPT: Automatic Machine Learning with GPT

    Authors: Shujian Zhang, Chengyue Gong, Lemeng Wu, Xingchao Liu, Mingyuan Zhou

    Abstract: AI tasks encompass a wide range of domains and fields. While numerous AI models have been designed for specific tasks and applications, they often require considerable human efforts in finding the right model architecture, optimization algorithm, and hyperparameters. Recent advances in large language models (LLMs) like ChatGPT show remarkable capabilities in various aspects of reasoning, comprehen…

    Submitted 3 May, 2023; originally announced May 2023.

  20. Flexible Seamless 2-in-1 Design with Sample Size Adaptation

    Authors: Runjia Li, Liwen Wu, Rachael Liu, Jianchang Lin

    Abstract: 2-in-1 design (Chen et al. 2018) is becoming popular in oncology drug development, with the flexibility of using different endpoints at different decision times. Based on the observed interim data, sponsors choose either to seamlessly advance a small phase 2 trial to a full-scale confirmatory phase 3 trial with a pre-determined maximum sample size, or to remain in a phase 2 trial. This approach may…

    Submitted 21 December, 2022; originally announced December 2022.

  21. arXiv:2210.10768  [pdf, other]

    stat.ME cs.LG math.ST stat.ML

    Anytime-valid off-policy inference for contextual bandits

    Authors: Ian Waudby-Smith, Lili Wu, Aaditya Ramdas, Nikos Karampatziakis, Paul Mineiro

    Abstract: Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in healthcare and the tech industry. They involve online learning algorithms that adaptively learn policies over time to map observed contexts $X_t$ to actions $A_t$ in an attempt to maximize stochastic rewards $R_t$. This adaptivity raises interesting but hard statistical inference questions, especially counte…

    Submitted 17 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: 40 pages, 6 figures

  22. arXiv:2208.13065  [pdf, other]

    math.OC cs.LG eess.SY stat.AP

    Towards Improving Unit Commitment Economics: An Add-On Tailor for Renewable Energy and Reserve Predictions

    Authors: Xianbang Chen, Yikui Liu, Lei Wu

    Abstract: Generally, day-ahead unit commitment (UC) is conducted in a predict-then-optimize process: it starts by predicting the renewable energy source (RES) availability and system reserve requirements; given the predictions, the UC model is then optimized to determine the economic operation plans. In fact, predictions within the process are raw. In other words, if the predictions are further tailored to…

    Submitted 7 July, 2024; v1 submitted 27 August, 2022; originally announced August 2022.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  23. arXiv:2207.02628  [pdf, other]

    stat.ML cs.LG

    The alignment property of SGD noise and how it helps select flat minima: A stability analysis

    Authors: Lei Wu, Mingze Wang, Weijie Su

    Abstract: The phenomenon that stochastic gradient descent (SGD) favors flat minima has played a critical role in understanding the implicit regularization of SGD. In this paper, we provide an explanation of this striking phenomenon by relating the particular noise structure of SGD to its linear stability (Wu et al., 2018). Specifically, we consider training over-parameterized models with square loss.…

    Submitted 16 October, 2022; v1 submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted at NeurIPS 2022

  24. arXiv:2202.08064  [pdf, other]

    stat.ML cs.LG cs.NE math.OC

    Learning a Single Neuron for Non-monotonic Activation Functions

    Authors: Lei Wu

    Abstract: We study the problem of learning a single neuron $\mathbf{x}\mapsto \sigma(\mathbf{w}^T\mathbf{x})$ with gradient descent (GD). All the existing positive results are limited to the case where $\sigma$ is monotonic. However, it has recently been observed that non-monotonic activation functions outperform the traditional monotonic ones in many applications. To fill this gap, we establish learnability without assumi…

    Submitted 16 February, 2022; originally announced February 2022.

    Comments: AISTATS 2022

  25. arXiv:2202.01694  [pdf, other]

    cs.LG stat.ML

    Variational Nearest Neighbor Gaussian Process

    Authors: Luhuan Wu, Geoff Pleiss, John Cunningham

    Abstract: Variational approximations to Gaussian processes (GPs) typically use a small set of inducing points to form a low-rank approximation to the covariance matrix. In this work, we instead exploit a sparse approximation of the precision matrix. We propose variational nearest neighbor Gaussian process (VNNGP), which introduces a prior that only retains correlations within K nearest-neighboring observati…

    Submitted 7 July, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

  26. Incorporating Surrogate Information for Adaptive Subgroup Enrichment Design with Sample Size Re-estimation

    Authors: Liwen Wu, Qing Li, Mengya Liu, Jianchang Lin

    Abstract: Adaptive subgroup enrichment design is an efficient design framework that allows accelerated development for investigational treatments while also having flexibility in population selection within the course of the trial. The adaptive decision at the interim analysis is commonly made based on the conditional probability of trial success. However, one of the critical challenges for such adaptive de…

    Submitted 3 November, 2021; originally announced November 2021.

  27. arXiv:2110.09823  [pdf, other]

    cs.LG stat.AP stat.ME

    An Empirical Study: Extensive Deep Temporal Point Process

    Authors: Haitao Lin, Cheng Tan, Lirong Wu, Zhangyang Gao, Stan Z. Li

    Abstract: Temporal point process, as the stochastic process on a continuous time domain, is commonly used to model asynchronous event sequences featuring occurrence timestamps. Thanks to the strong expressivity of deep neural networks, they are emerging as a promising choice for capturing the patterns in asynchronous sequences, in the context of temporal point process. In this paper, we first review…

    Submitted 21 December, 2021; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: 22 pages, 8 figures

  28. arXiv:2110.08850  [pdf]

    physics.soc-ph cs.LG cs.SI q-bio.MN stat.ML

    Understanding the network formation pattern for better link prediction

    Authors: Jiating Yu, Ling-Yun Wu

    Abstract: As a classical problem in the field of complex networks, link prediction has attracted much attention from researchers, which is of great significance to help us understand the evolution and dynamic development mechanisms of networks. Although various network type-specific algorithms have been proposed to tackle the link prediction problem, most of them suppose that the network structure is domina…

    Submitted 17 October, 2021; originally announced October 2021.

    Comments: 21 pages, 3 figures, 18 tables, and 29 references

    Journal ref: Physica A: Statistical Mechanics and its Applications, 600 (2022) 127522

  29. arXiv:2108.08415  [pdf, other]

    stat.ME

    Transfer learning of individualized treatment rules from experimental to real-world data

    Authors: Lili Wu, Shu Yang

    Abstract: Individualized treatment effect lies at the heart of precision medicine. Interpretable individualized treatment rules (ITRs) are desirable for clinicians or policymakers due to their intuitive appeal and transparency. The gold-standard approach to estimating the ITRs is randomized experiments, where subjects are randomized to different treatment groups and the bias is minimized to the extent possi…

    Submitted 18 August, 2021; originally announced August 2021.

  30. arXiv:2108.04964  [pdf, other]

    stat.ML cs.LG

    A spectral-based analysis of the separation between two-layer neural networks and linear methods

    Authors: Lei Wu, Jihao Long

    Abstract: We propose a spectral-based approach to analyze how two-layer neural networks separate from linear methods in terms of approximating high-dimensional functions. We show that quantifying this separation can be reduced to estimating the Kolmogorov width of two-layer neural networks, and the latter can be further characterized by using the spectrum of an associated kernel. Different from previous wor…

    Submitted 23 February, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

    Comments: Accepted by Journal of Machine Learning Research

  31. Interim Monitoring in Sequential Multiple Assignment Randomized Trials

    Authors: Liwen Wu, Junyao Wang, Abdus S. Wahed

    Abstract: A sequential multiple assignment randomized trial (SMART) facilitates comparison of multiple adaptive treatment strategies (ATSs) simultaneously. Previous studies have established a framework to test the homogeneity of multiple ATSs by a global Wald test through inverse probability weighting. SMARTs are generally lengthier than classical clinical trials due to the sequential nature of treatment ra…

    Submitted 2 August, 2021; originally announced August 2021.

  32. arXiv:2107.11593  [pdf]

    econ.GN stat.AP

    Inferring Economic Condition Uncertainty from Electricity Big Data

    Authors: Haoqi Qian, Zhengyu Shi, Libo Wu

    Abstract: Inferring the uncertainty in economic conditions is significant for both decision makers and market players. In this paper, we propose a novel approach to measure the economic uncertainties by using the Hidden Markov Model (HMM). We construct a dimensionless index, Economic Condition Uncertainty (ECU) index, which ranges from zero to one and is comparable among sectors, regions and periods.…

    Submitted 30 May, 2023; v1 submitted 24 July, 2021; originally announced July 2021.

  33. arXiv:2103.12528  [pdf, other]

    cs.CL cs.AI stat.ML

    Multilingual Autoregressive Entity Linking

    Authors: Nicola De Cao, Ledell Wu, Kashyap Popat, Mikel Artetxe, Naman Goyal, Mikhail Plekhanov, Luke Zettlemoyer, Nicola Cancedda, Sebastian Riedel, Fabio Petroni

    Abstract: We present mGENRE, a sequence-to-sequence system for the Multilingual Entity Linking (MEL) problem -- the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token in an autoregressive fashion. The autoregressive formulation allows us to effectively cross-encode…

    Submitted 23 March, 2021; originally announced March 2021.

    Comments: 20 pages, 8 figures, and 11 tables

  34. arXiv:2103.12345  [pdf, other]

    stat.ML cs.LG q-fin.PM

    The Success of AdaBoost and Its Application in Portfolio Management

    Authors: Yijian Chuan, Chaoyi Zhao, Zhenrui He, Lan Wu

    Abstract: We develop a novel approach to explain why AdaBoost is a successful classifier. By introducing a measure of the influence of the noise points (ION) in the training data for the binary classification problem, we prove that there is a strong connection between the ION and the test error. We further identify that the ION of AdaBoost decreases as the iteration number or the complexity of the base lear…

    Submitted 23 March, 2021; originally announced March 2021.

  35. arXiv:2103.00393  [pdf, other]

    cs.LG stat.ML

    Hierarchical Inducing Point Gaussian Process for Inter-domain Observations

    Authors: Luhuan Wu, Andrew Miller, Lauren Anderson, Geoff Pleiss, David Blei, John Cunningham

    Abstract: We examine the general problem of inter-domain Gaussian Processes (GPs): problems where the GP realization and the noisy observations of that realization lie on different domains. When the mapping between those domains is linear, such as integration or differentiation, inference is still closed form. However, many of the scaling and approximation techniques that our community has developed do not…

    Submitted 24 June, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

  36. arXiv:2102.08606  [pdf, other]

    cs.LG stat.ML

    Centroid Transformers: Learning to Abstract with Attention

    Authors: Lemeng Wu, Xingchao Liu, Qiang Liu

    Abstract: Self-attention, as the key block of transformers, is a powerful mechanism for extracting features from the inputs. In essence, what self-attention does is to infer the pairwise relations between the elements of the inputs, and modify the inputs by propagating information between input pairs. As a result, it maps N inputs to N outputs and casts a quadratic $O(N^2)$ memory and time complexity. We prop…

    Submitted 8 March, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

  37. arXiv:2102.06695  [pdf, other]

    cs.LG stat.ML

    Bias-Free Scalable Gaussian Processes via Randomized Truncations

    Authors: Andres Potapczynski, Luhuan Wu, Dan Biderman, Geoff Pleiss, John P. Cunningham

    Abstract: Scalable Gaussian Process methods are computationally attractive, yet introduce modeling biases that require rigorous study. This paper analyzes two common techniques: early truncated conjugate gradients (CG) and random Fourier features (RFF). We find that both methods introduce a systematic bias on the learned hyperparameters: CG tends to underfit while RFF tends to overfit. We address these issu…

    Submitted 28 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

    Journal ref: 38th International Conference on Machine Learning (ICML 2021)

  38. arXiv:2012.00481  [pdf, other]

    cs.LG cs.AI stat.ML

    Consistent Representation Learning for High Dimensional Data Analysis

    Authors: Stan Z. Li, Lirong Wu, Zelin Zang

    Abstract: High dimensional data analysis for exploration and discovery includes three fundamental tasks: dimensionality reduction, clustering, and visualization. When the three associated tasks are done separately, as is often the case thus far, inconsistencies can occur among the tasks in terms of data geometry and others. This can lead to confusing or misleading data interpretation. In this paper, we prop…

    Submitted 1 December, 2020; originally announced December 2020.

  39. arXiv:2011.07518  [pdf, other]

    stat.ME

    A robust statistical method for Genome-wide association analysis of human copy number variation

    Authors: Han Wang, Changhu Wang, Linjie Wu, Ruibin Xi

    Abstract: Conducting genome-wide association studies (GWAS) at the copy number variation (CNV) level is a field in which few researchers are involved and little statistical progress has been achieved; traditional methods suffer from many problems, such as batch effects and heterogeneity across the genome, leading to low power or a high false discovery rate. We develop a new robust method to find disease-risking regions related to…

    Submitted 15 November, 2020; originally announced November 2020.

  40. arXiv:2010.15969  [pdf, other]

    cs.LG math.OC stat.ML

    Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough

    Authors: Mao Ye, Lemeng Wu, Qiang Liu

    Abstract: Despite the great success of deep learning, recent works show that large deep neural networks are often highly redundant and can be significantly reduced in size. However, the theoretical question of how much we can prune a neural network given a specified tolerance of accuracy drop is still open. This paper provides one answer to this question by proposing a greedy optimization based pruning meth…

    Submitted 29 October, 2020; originally announced October 2020.

    Journal ref: NeurIPS 2020

  41. arXiv:2010.14831  [pdf, other]

    cs.LG cs.CV cs.HC stat.ML

    Deep Manifold Transformation for Nonlinear Dimensionality Reduction

    Authors: Stan Z. Li, Zelin Zang, Lirong Wu

    Abstract: Manifold learning-based encoders have been playing important roles in nonlinear dimensionality reduction (NLDR) for data exploration. However, existing methods can often fail to preserve geometric, topological and/or distributional structures of data. In this paper, we propose a deep manifold learning framework, called deep manifold transformation (DMT) for unsupervised NLDR and embedding learning…

    Submitted 3 May, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

  42. arXiv:2009.10713  [pdf, other]

    cs.LG math.NA stat.ML

    Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't

    Authors: Weinan E, Chao Ma, Stephan Wojtowytsch, Lei Wu

    Abstract: The purpose of this article is to review the achievements made in the last few years towards the understanding of the reasons behind the success and subtleties of neural network-based machine learning. In the tradition of good old applied mathematics, we will not only give attention to rigorous mathematical results, but also the insight we have gained from careful numerical experiments as well as…

    Submitted 7 December, 2020; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: Review article. Feedback welcome

    MSC Class: 68T07 (primary); 26B40; 41A30; 35Q68

  43. arXiv:2009.09590  [pdf, other]

    cs.LG cs.AI stat.ML

    Generalized Clustering and Multi-Manifold Learning with Geometric Structure Preservation

    Authors: Lirong Wu, Zicheng Liu, Zelin Zang, Jun Xia, Siyuan Li, Stan Z. Li

    Abstract: Though manifold-based clustering has become a popular research topic, we observe that one important factor has been omitted by these works, namely that the defined clustering loss may corrupt the local and global structure of the latent space. In this paper, we propose a novel Generalized Clustering and Multi-manifold Learning (GCML) framework with geometric structure preservation for generalized…

    Submitted 8 October, 2021; v1 submitted 20 September, 2020; originally announced September 2020.

  44. arXiv:2009.06132  [pdf, ps, other]

    cs.LG stat.ML

    Complexity Measures for Neural Networks with General Activation Functions Using Path-based Norms

    Authors: Zhong Li, Chao Ma, Lei Wu

    Abstract: A simple approach is proposed to obtain complexity controls for neural networks with general activation functions. The approach is motivated by approximating the general activation functions with one-dimensional ReLU networks, which reduces the problem to the complexity controls of ReLU networks. Specifically, we consider two-layer networks and deep residual networks, for which path-based norms ar…

    Submitted 13 September, 2020; originally announced September 2020.

    Comments: 47 pages

  45. arXiv:2009.06125  [pdf, other]

    cs.LG stat.ML

    A Qualitative Study of the Dynamic Behavior for Adaptive Gradient Algorithms

    Authors: Chao Ma, Lei Wu, Weinan E

    Abstract: The dynamic behavior of RMSprop and Adam algorithms is studied through a combination of careful numerical experiments and theoretical explanations. Three types of qualitative features are observed in the training loss curve: fast initial convergence, oscillations, and large spikes in the late phase. The sign gradient descent (signGD) flow, which is the limit of Adam when taking the learning rate t…

    Submitted 29 September, 2021; v1 submitted 13 September, 2020; originally announced September 2020.

  46. arXiv:2008.05621  [pdf, other]

    cs.LG stat.ML

    The Slow Deterioration of the Generalization Error of the Random Feature Model

    Authors: Chao Ma, Lei Wu, Weinan E

    Abstract: The random feature model exhibits a kind of resonance behavior when the number of parameters is close to the training sample size. This behavior is characterized by the appearance of a large generalization gap, and is due to the occurrence of very small eigenvalues for the associated Gram matrix. In this paper, we examine the dynamic behavior of the gradient descent algorithm in this regime. We show…

    Submitted 12 August, 2020; originally announced August 2020.

  47. arXiv:2007.04649  [pdf, other]

    cs.LG stat.ML

    Learning to Reweight with Deep Interactions

    Authors: Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

    Abstract: Recently, the concept of teaching has been introduced into machine learning, in which a teacher model is used to guide the training of a student model (which will be used in real tasks) through data selection, loss function design, etc. Learning to reweight, which is a specific kind of teaching that reweights training data using a teacher model, has received much attention due to its simplicity and ef…

    Submitted 12 January, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Accepted to AAAI-2021

  48. arXiv:2007.04395  [pdf, other]

    cs.LG cs.AI stat.ML

    Multilevel Graph Matching Networks for Deep Graph Similarity Learning

    Authors: Xiang Ling, Lingfei Wu, Saizhuo Wang, Tengfei Ma, Fangli Xu, Alex X. Liu, Chunming Wu, Shouling Ji

    Abstract: While the celebrated graph neural networks yield effective representations for individual nodes of a graph, there has been relatively less success in extending to the task of graph similarity learning. Recent work on graph similarity learning has considered either global-level graph-graph interactions or low-level node-node interactions, while ignoring the rich cross-level interactions (e.g., be…

    Submitted 7 August, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS)

  49. arXiv:2006.14450  [pdf, other]

    cs.LG math.OC stat.ML

    The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models

    Authors: Chao Ma, Lei Wu, Weinan E

    Abstract: A numerical and phenomenological study of the gradient descent (GD) algorithm for training two-layer neural network models is carried out for different parameter regimes when the target function can be accurately approximated by a relatively small number of neurons. It is found that for Xavier-like initialization, there are two distinctive phases in the dynamic behavior of GD in the under-parametr…

    Submitted 25 June, 2020; originally announced June 2020.

    Comments: 23 pages

  50. Crop Yield Prediction Integrating Genotype and Weather Variables Using Deep Learning

    Authors: Johnathon Shook, Tryambak Gangopadhyay, Linjiang Wu, Baskar Ganapathysubramanian, Soumik Sarkar, Asheesh K. Singh

    Abstract: Accurate prediction of crop yield, supported by scientific and domain-relevant insights, can help improve agricultural breeding, provide monitoring across diverse climatic conditions and thereby protect against climatic challenges to crop production including erratic rainfall and temperature variations. We used historical performance records from Uniform Soybean Tests (UST) in North America spannin…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: 18 pages, 9 figures