Search | arXiv e-print repository

Convergence and Complexity Guarantee for Inexact First-order Riemannian Optimization Algorithms

Authors: Yuchen Li, Laura Balzano, Deanna Needell, Hanbaek Lyu

Abstract: We analyze inexact Riemannian gradient descent (RGD) where Riemannian gradients and retractions are inexactly (and cheaply) computed. Our focus is on understanding when inexact RGD converges and what is the complexity in the general nonconvex and constrained setting. We answer these questions in a general framework of tangential Block Majorization-Minimization (tBMM). We establish that tBMM conver… ▽ More We analyze inexact Riemannian gradient descent (RGD) where Riemannian gradients and retractions are inexactly (and cheaply) computed. Our focus is on understanding when inexact RGD converges and what is the complexity in the general nonconvex and constrained setting. We answer these questions in a general framework of tangential Block Majorization-Minimization (tBMM). We establish that tBMM converges to an $ε$-stationary point within $O(ε^{-2})$ iterations. Under a mild assumption, the results still hold when the subproblem is solved inexactly in each iteration provided the total optimality gap is bounded. Our general analysis applies to a wide range of classical algorithms with Riemannian constraints including inexact RGD and proximal gradient method on Stiefel manifolds. We numerically validate that tBMM shows improved performance over existing methods when applied to various problems, including nonnegative tensor decomposition with Riemannian constraints, regularized nonnegative matrix factorization, and low-rank matrix recovery problems. △ Less

Submitted 9 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: 23 pages, 5 figures. ICML 2024. Appendix revised

arXiv:2403.06903 [pdf, ps, other]

Benign overfitting in leaky ReLU networks with moderate input dimension

Authors: Kedar Karhadkar, Erin George, Michael Murray, Guido Montúfar, Deanna Needell

Abstract: The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data that can be decomposed into the sum of a common signal and a random noise component, that lie on subspaces orthogonal… ▽ More The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. We consider input data that can be decomposed into the sum of a common signal and a random noise component, that lie on subspaces orthogonal to one another. We characterize conditions on the signal to noise ratio (SNR) of the model parameters giving rise to benign versus non-benign (or harmful) overfitting: in particular, if the SNR is high then benign overfitting occurs, conversely if the SNR is low then harmful overfitting occurs. We attribute both benign and non-benign overfitting to an approximate margin maximization property and show that leaky ReLU networks trained on hinge loss with gradient descent (GD) satisfy this property. In contrast to prior work we do not require the training data to be nearly orthogonal. Notably, for input dimension $d$ and training sample size $n$, while results in prior work require $d = Ω(n^2 \log n)$, here we require only $d = Ω\left(n\right)$. △ Less

Submitted 9 July, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: 39 pages

arXiv:2403.01204 [pdf, ps, other]

Stochastic gradient descent for streaming linear and rectified linear systems with Massart noise

Authors: Halyun Jeong, Deanna Needell, Elizaveta Rebrova

Abstract: We propose SGD-exp, a stochastic gradient descent approach for linear and ReLU regressions under Massart noise (adversarial semi-random corruption model) for the fully streaming setting. We show novel nearly linear convergence guarantees of SGD-exp to the true parameter with up to $50\%$ Massart corruption rate, and with any corruption rate in the case of symmetric oblivious corruptions. This is t… ▽ More We propose SGD-exp, a stochastic gradient descent approach for linear and ReLU regressions under Massart noise (adversarial semi-random corruption model) for the fully streaming setting. We show novel nearly linear convergence guarantees of SGD-exp to the true parameter with up to $50\%$ Massart corruption rate, and with any corruption rate in the case of symmetric oblivious corruptions. This is the first convergence guarantee result for robust ReLU regression in the streaming setting, and it shows the improved convergence rate over previous robust methods for $L_1$ linear regression due to a choice of an exponentially decaying step size, known for its efficiency in practice. Our analysis is based on the drift analysis of a discrete stochastic process, which could also be interesting on its own. △ Less

Submitted 2 March, 2024; originally announced March 2024.

Comments: Submitted to a journal

MSC Class: 65F10; 60-XX

arXiv:2312.10330 [pdf, other]

Convergence and complexity of block majorization-minimization for constrained block-Riemannian optimization

Authors: Yuchen Li, Laura Balzano, Deanna Needell, Hanbaek Lyu

Abstract: Block majorization-minimization (BMM) is a simple iterative algorithm for nonconvex optimization that sequentially minimizes a majorizing surrogate of the objective function in each block coordinate while the other block coordinates are held fixed. We consider a family of BMM algorithms for minimizing smooth nonconvex objectives, where each parameter block is constrained within a subset of a Riema… ▽ More Block majorization-minimization (BMM) is a simple iterative algorithm for nonconvex optimization that sequentially minimizes a majorizing surrogate of the objective function in each block coordinate while the other block coordinates are held fixed. We consider a family of BMM algorithms for minimizing smooth nonconvex objectives, where each parameter block is constrained within a subset of a Riemannian manifold. We establish that this algorithm converges asymptotically to the set of stationary points, and attains an $ε$-stationary point within $\widetilde{O}(ε^{-2})$ iterations. In particular, the assumptions for our complexity results are completely Euclidean when the underlying manifold is a product of Euclidean or Stiefel manifolds, although our analysis makes explicit use of the Riemannian geometry. Our general analysis applies to a wide range of algorithms with Riemannian constraints: Riemannian MM, block projected gradient descent, optimistic likelihood estimation, geodesically constrained subspace tracking, robust PCA, and Riemannian CP-dictionary-learning. We experimentally validate that our algorithm converges faster than standard Euclidean algorithms applied to the Riemannian setting. △ Less

Submitted 6 August, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

Comments: 54 pages, 8 figures. Related work updated

arXiv:2307.04056 [pdf, other]

Manifold Filter-Combine Networks

Authors: Joyce Chew, Edward De Brouwer, Smita Krishnaswamy, Deanna Needell, Michael Perlmutter

Abstract: We introduce a class of manifold neural networks (MNNs) that we call Manifold Filter-Combine Networks (MFCNs), that aims to further our understanding of MNNs, analogous to how the aggregate-combine framework helps with the understanding of graph neural networks (GNNs). This class includes a wide variety of subclasses that can be thought of as the manifold analog of various popular GNNs. We then co… ▽ More We introduce a class of manifold neural networks (MNNs) that we call Manifold Filter-Combine Networks (MFCNs), that aims to further our understanding of MNNs, analogous to how the aggregate-combine framework helps with the understanding of graph neural networks (GNNs). This class includes a wide variety of subclasses that can be thought of as the manifold analog of various popular GNNs. We then consider a method, based on building a data-driven graph, for implementing such networks when one does not have global knowledge of the manifold, but merely has access to finitely many sample points. We provide sufficient conditions for the network to provably converge to its continuum limit as the number of sample points tends to infinity. Unlike previous work (which focused on specific graph constructions), our rate of convergence does not directly depend on the number of filters used. Moreover, it exhibits linear dependence on the depth of the network rather than the exponential dependence obtained previously. Additionally, we provide several examples of interesting subclasses of MFCNs and of the rates of convergence that are obtained under specific graph constructions. △ Less

Submitted 5 September, 2023; v1 submitted 8 July, 2023; originally announced July 2023.

arXiv:2306.04730 [pdf, other]

Stochastic Natural Thresholding Algorithms

Authors: Rachel Grotheer, Shuang Li, Anna Ma, Deanna Needell, Jing Qin

Abstract: Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes and disc… ▽ More Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes and discusses convergence guarantees for stochastic natural thresholding algorithms by extending the NT from the deterministic version with linear measurements to the stochastic version with a general objective function. We also conduct various numerical experiments on linear and nonlinear measurements to demonstrate the performance of StoNT. △ Less

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2304.10123 [pdf, other]

Linear Convergence of Reshuffling Kaczmarz Methods With Sparse Constraints

Authors: Halyun Jeong, Deanna Needell

Abstract: The Kaczmarz method (KZ) and its variants, which are types of stochastic gradient descent (SGD) methods, have been extensively studied due to their simplicity and efficiency in solving linear equation systems. The iterative thresholding (IHT) method has gained popularity in various research fields, including compressed sensing or sparse linear regression, machine learning with additional structure… ▽ More The Kaczmarz method (KZ) and its variants, which are types of stochastic gradient descent (SGD) methods, have been extensively studied due to their simplicity and efficiency in solving linear equation systems. The iterative thresholding (IHT) method has gained popularity in various research fields, including compressed sensing or sparse linear regression, machine learning with additional structure, and optimization with nonconvex constraints. Recently, a hybrid method called Kaczmarz-based IHT (KZIHT) has been proposed, combining the benefits of both approaches, but its theoretical guarantees are missing. In this paper, we provide the first theoretical convergence guarantees for KZIHT by showing that it converges linearly to the solution of a system with sparsity constraints up to optimal statistical bias when the reshuffling data sampling scheme is used. We also propose the Kaczmarz with periodic thresholding (KZPT) method, which generalizes KZIHT by applying the thresholding operation for every certain number of KZ iterations and by employing two different types of step sizes. We establish a linear convergence guarantee for KZPT for randomly subsampled bounded orthonormal systems (BOS) and mean-zero isotropic sub-Gaussian random matrices, which are most commonly used models in compressed sensing, dimension reduction, matrix sketching, and many inverse problems in neural networks. Our analysis shows that KZPT with an optimal thresholding period outperforms KZIHT. To support our theory, we include several numerical experiments. △ Less

Submitted 20 April, 2023; originally announced April 2023.

Comments: Submitted to a journal

MSC Class: 65F10; 65F22; 90C26

arXiv:2303.00058 [pdf, other]

Neural Nonnegative Matrix Factorization for Hierarchical Multilayer Topic Modeling

Authors: Tyler Will, Runyu Zhang, Eli Sadovnik, Mengdi Gao, Joshua Vendrow, Jamie Haddock, Denali Molitor, Deanna Needell

Abstract: We introduce a new method based on nonnegative matrix factorization, Neural NMF, for detecting latent hierarchical structure in data. Datasets with hierarchical structure arise in a wide variety of fields, such as document classification, image processing, and bioinformatics. Neural NMF recursively applies NMF in layers to discover overarching topics encompassing the lower-level features. We deriv… ▽ More We introduce a new method based on nonnegative matrix factorization, Neural NMF, for detecting latent hierarchical structure in data. Datasets with hierarchical structure arise in a wide variety of fields, such as document classification, image processing, and bioinformatics. Neural NMF recursively applies NMF in layers to discover overarching topics encompassing the lower-level features. We derive a backpropagation optimization scheme that allows us to frame hierarchical NMF as a neural network. We test Neural NMF on a synthetic hierarchical dataset, the 20 Newsgroups dataset, and the MyLymeData symptoms dataset. Numerical results demonstrate that Neural NMF outperforms other hierarchical NMF methods on these data sets and offers better learned hierarchical structure and interpretability of topics. △ Less

Submitted 28 February, 2023; originally announced March 2023.

arXiv:2212.12606 [pdf, other]

A Convergence Rate for Manifold Neural Networks

Authors: Joyce Chew, Deanna Needell, Michael Perlmutter

Abstract: High-dimensional data arises in numerous applications, and the rapidly developing field of geometric deep learning seeks to develop neural network architectures to analyze such data in non-Euclidean domains, such as graphs and manifolds. Recent work by Z. Wang, L. Ruiz, and A. Ribeiro has introduced a method for constructing manifold neural networks using the spectral decomposition of the Laplace… ▽ More High-dimensional data arises in numerous applications, and the rapidly developing field of geometric deep learning seeks to develop neural network architectures to analyze such data in non-Euclidean domains, such as graphs and manifolds. Recent work by Z. Wang, L. Ruiz, and A. Ribeiro has introduced a method for constructing manifold neural networks using the spectral decomposition of the Laplace Beltrami operator. Moreover, in this work, the authors provide a numerical scheme for implementing such neural networks when the manifold is unknown and one only has access to finitely many sample points. The authors show that this scheme, which relies upon building a data-driven graph, converges to the continuum limit as the number of sample points tends to infinity. Here, we build upon this result by establishing a rate of convergence that depends on the intrinsic dimension of the manifold but is independent of the ambient dimension. We also discuss how the rate of convergence depends on the depth of the network and the number of filters used in each layer. △ Less

Submitted 20 July, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

arXiv:2211.13496 [pdf, other]

Multi-scale Hybridized Topic Modeling: A Pipeline for Analyzing Unstructured Text Datasets via Topic Modeling

Authors: Keyi Cheng, Stefan Inzer, Adrian Leung, Xiaoxian Shen, Michael Perlmutter, Michael Lindstrom, Joyce Chew, Todd Presner, Deanna Needell

Abstract: We propose a multi-scale hybridized topic modeling method to find hidden topics from transcribed interviews more accurately and efficiently than traditional topic modeling methods. Our multi-scale hybridized topic modeling method (MSHTM) approaches data at different scales and performs topic modeling in a hierarchical way utilizing first a classical method, Nonnegative Matrix Factorization, and th… ▽ More We propose a multi-scale hybridized topic modeling method to find hidden topics from transcribed interviews more accurately and efficiently than traditional topic modeling methods. Our multi-scale hybridized topic modeling method (MSHTM) approaches data at different scales and performs topic modeling in a hierarchical way utilizing first a classical method, Nonnegative Matrix Factorization, and then a transformer-based method, BERTopic. It harnesses the strengths of both NMF and BERTopic. Our method can help researchers and the public better extract and interpret the interview information. Additionally, it provides insights for new indexing systems based on the topic level. We then deploy our method on real-world interview transcripts and find promising results. △ Less

Submitted 24 November, 2022; originally announced November 2022.

arXiv:2211.05749 [pdf, other]

Sketched Gaussian Model Linear Discriminant Analysis via the Randomized Kaczmarz Method

Authors: Jocelyn T. Chi, Deanna Needell

Abstract: We present sketched linear discriminant analysis, an iterative randomized approach to binary-class Gaussian model linear discriminant analysis (LDA) for very large data. We harness a least squares formulation and mobilize the stochastic gradient descent framework. Therefore, we obtain a randomized classifier with performance that is very comparable to that of full data LDA while requiring access t… ▽ More We present sketched linear discriminant analysis, an iterative randomized approach to binary-class Gaussian model linear discriminant analysis (LDA) for very large data. We harness a least squares formulation and mobilize the stochastic gradient descent framework. Therefore, we obtain a randomized classifier with performance that is very comparable to that of full data LDA while requiring access to only one row of the training data at a time. We present convergence guarantees for the sketched predictions on new data within a fixed number of iterations. These guarantees account for both the Gaussian modeling assumptions on the data and algorithmic randomness from the sketching procedure. Finally, we demonstrate performance with varying step-sizes and numbers of iterations. Our numerical experiments demonstrate that sketched LDA can offer a very viable alternative to full data LDA when the data may be too large for full data analysis. △ Less

Submitted 10 November, 2022; originally announced November 2022.

arXiv:2209.04968 [pdf, other]

Population-Based Hierarchical Non-negative Matrix Factorization for Survey Data

Authors: Xiaofu Ding, Xinyu Dong, Olivia McGough, Chenxin Shen, Annie Ulichney, Ruiyao Xu, William Swartworth, Jocelyn T. Chi, Deanna Needell

Abstract: Motivated by the problem of identifying potential hierarchical population structure on modern survey data containing a wide range of complex data types, we introduce population-based hierarchical non-negative matrix factorization (PHNMF). PHNMF is a variant of hierarchical non-negative matrix factorization based on feature similarity. As such, it enables an automatic and interpretable approach for… ▽ More Motivated by the problem of identifying potential hierarchical population structure on modern survey data containing a wide range of complex data types, we introduce population-based hierarchical non-negative matrix factorization (PHNMF). PHNMF is a variant of hierarchical non-negative matrix factorization based on feature similarity. As such, it enables an automatic and interpretable approach for identifying and understanding hierarchical structure in a data matrix constructed from a wide range of data types. Our numerical experiments on synthetic and real survey data demonstrate that PHNMF can recover latent hierarchical population structure in complex data with high accuracy. Moreover, the recovered subpopulation structure is meaningful and can be useful for improving downstream inference. △ Less

Submitted 11 September, 2022; originally announced September 2022.

arXiv:2208.08561 [pdf, other]

Geometric Scattering on Measure Spaces

Authors: Joyce Chew, Matthew Hirn, Smita Krishnaswamy, Deanna Needell, Michael Perlmutter, Holly Steach, Siddharth Viswanath, Hau-Tieng Wu

Abstract: The scattering transform is a multilayered, wavelet-based transform initially introduced as a model of convolutional neural networks (CNNs) that has played a foundational role in our understanding of these networks' stability and invariance properties. Subsequently, there has been widespread interest in extending the success of CNNs to data sets with non-Euclidean structure, such as graphs and man… ▽ More The scattering transform is a multilayered, wavelet-based transform initially introduced as a model of convolutional neural networks (CNNs) that has played a foundational role in our understanding of these networks' stability and invariance properties. Subsequently, there has been widespread interest in extending the success of CNNs to data sets with non-Euclidean structure, such as graphs and manifolds, leading to the emerging field of geometric deep learning. In order to improve our understanding of the architectures used in this new field, several papers have proposed generalizations of the scattering transform for non-Euclidean data structures such as undirected graphs and compact Riemannian manifolds without boundary. In this paper, we introduce a general, unified model for geometric scattering on measure spaces. Our proposed framework includes previous work on geometric scattering as special cases but also applies to more general settings such as directed graphs, signed graphs, and manifolds with boundary. We propose a new criterion that identifies to which groups a useful representation should be invariant and show that this criterion is sufficient to guarantee that the scattering transform has desirable stability and invariance properties. Additionally, we consider finite measure spaces that are obtained from randomly sampling an unknown manifold. We propose two methods for constructing a data-driven graph on which the associated graph scattering transform approximates the scattering transform on the underlying manifold. Moreover, we use a diffusion-maps based approach to prove quantitative estimates on the rate of convergence of one of these approximations as the number of sample points tends to infinity. Lastly, we showcase the utility of our method on spherical images, directed graphs, and on high-dimensional single-cell data. △ Less

Submitted 13 October, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

MSC Class: 68T07

arXiv:2206.10078 [pdf, other]

The Manifold Scattering Transform for High-Dimensional Point Cloud Data

Authors: Joyce Chew, Holly R. Steach, Siddharth Viswanath, Hau-Tieng Wu, Matthew Hirn, Deanna Needell, Smita Krishnaswamy, Michael Perlmutter

Abstract: The manifold scattering transform is a deep feature extractor for data defined on a Riemannian manifold. It is one of the first examples of extending convolutional neural network-like operators to general manifolds. The initial work on this model focused primarily on its theoretical stability and invariance properties but did not provide methods for its numerical implementation except in the case… ▽ More The manifold scattering transform is a deep feature extractor for data defined on a Riemannian manifold. It is one of the first examples of extending convolutional neural network-like operators to general manifolds. The initial work on this model focused primarily on its theoretical stability and invariance properties but did not provide methods for its numerical implementation except in the case of two-dimensional surfaces with predefined meshes. In this work, we present practical schemes, based on the theory of diffusion maps, for implementing the manifold scattering transform to datasets arising in naturalistic systems, such as single cell genetics, where the data is a high-dimensional point cloud modeled as lying on a low-dimensional manifold. We show that our methods are effective for signal classification and manifold classification tasks. △ Less

Submitted 21 January, 2024; v1 submitted 20 June, 2022; originally announced June 2022.

Comments: Accepted for publication in the TAG in DS Workshop at ICML. For subsequent theoretical guarantees, please see Section 6 of arXiv:2208.08561

MSC Class: 68T07 ACM Class: I.2.6

arXiv:2201.13324 [pdf, other]

Guided Semi-Supervised Non-negative Matrix Factorization on Legal Documents

Authors: Pengyu Li, Christine Tseng, Yaxuan Zheng, Joyce A. Chew, Longxiu Huang, Benjamin Jarman, Deanna Needell

Abstract: Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we… ▽ More Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method through its application to legal documents provided by the California Innocence Project, a nonprofit that works to free innocent convicted persons and reform the justice system. The results show that our proposed method improves both classification accuracy and topic coherence in comparison to past methods like Semi-Supervised Non-negative Matrix Factorization (SSNMF) and Guided Non-negative Matrix Factorization (Guided NMF). △ Less

Submitted 31 January, 2022; originally announced January 2022.

Comments: 14 pages, 4 figures

arXiv:2109.14820 [pdf, other]

A Generalized Hierarchical Nonnegative Tensor Decomposition

Authors: Joshua Vendrow, Jamie Haddock, Deanna Needell

Abstract: Nonnegative matrix factorization (NMF) has found many applications including topic modeling and document analysis. Hierarchical NMF (HNMF) variants are able to learn topics at various levels of granularity and illustrate their hierarchical relationship. Recently, nonnegative tensor factorization (NTF) methods have been applied in a similar fashion in order to handle data sets with complex, multi-m… ▽ More Nonnegative matrix factorization (NMF) has found many applications including topic modeling and document analysis. Hierarchical NMF (HNMF) variants are able to learn topics at various levels of granularity and illustrate their hierarchical relationship. Recently, nonnegative tensor factorization (NTF) methods have been applied in a similar fashion in order to handle data sets with complex, multi-modal structure. Hierarchical NTF (HNTF) methods have been proposed, however these methods do not naturally generalize their matrix-based counterparts. Here, we propose a new HNTF model which directly generalizes a HNMF model special case, and provide a supervised extension. We also provide a multiplicative updates training method for this model. Our experimental results show that this model more naturally illuminates the topic hierarchy than previous HNMF and HNTF methods. △ Less

Submitted 15 February, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: 6 pages, 2 figues, 3 tables

arXiv:2109.14079 [pdf, other]

Robust recovery of bandlimited graph signals via randomized dynamical sampling

Authors: Longxiu Huang, Deanna Needell, Sui Tang

Abstract: Heat diffusion processes have found wide applications in modelling dynamical systems over graphs. In this paper, we consider the recovery of a $k$-bandlimited graph signal that is an initial signal of a heat diffusion process from its space-time samples. We propose three random space-time sampling regimes, termed dynamical sampling techniques, that consist in selecting a small subset of space-time… ▽ More Heat diffusion processes have found wide applications in modelling dynamical systems over graphs. In this paper, we consider the recovery of a $k$-bandlimited graph signal that is an initial signal of a heat diffusion process from its space-time samples. We propose three random space-time sampling regimes, termed dynamical sampling techniques, that consist in selecting a small subset of space-time nodes at random according to some probability distribution. We show that the number of space-time samples required to ensure stable recovery for each regime depends on a parameter called the spectral graph weighted coherence, that depends on the interplay between the dynamics over the graphs and sampling probability distributions. In optimal scenarios, no more than $\mathcal{O}(k \log(k))$ space-time samples are sufficient to ensure accurate and stable recovery of all $k$-bandlimited signals. In any case, dynamical sampling typically requires much fewer spatial samples than the static case by leveraging the temporal information. Then, we propose a computationally efficient method to reconstruct $k$-bandlimited signals from their space-time samples. We prove that it yields accurate reconstructions and that it is also stable to noise. Finally, we test dynamical sampling techniques on a wide variety of graphs. The numerical results support our theoretical findings and demonstrate the efficiency. △ Less

Submitted 3 October, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: corrected mistakes in plotting. arXiv admin note: text overlap with arXiv:1511.05118 by other authors

MSC Class: 94A20; 94A12

arXiv:2105.09065 [pdf, other]

Statistical Learning for Best Practices in Tattoo Removal

Authors: Richard Yim, Jamie Haddock, Deanna Needell

Abstract: The causes behind complications in laser-assisted tattoo removal are currently not well understood, and in the literature relating to tattoo removal the emphasis on removal treatment is on removal technologies and tools, not best parameters involved in the treatment process. Additionally, the very challenge of determining best practices is difficult given the complexity of interactions between fac… ▽ More The causes behind complications in laser-assisted tattoo removal are currently not well understood, and in the literature relating to tattoo removal the emphasis on removal treatment is on removal technologies and tools, not best parameters involved in the treatment process. Additionally, the very challenge of determining best practices is difficult given the complexity of interactions between factors that may correlate to these complications. In this paper we apply a battery of classical statistical methods and techniques to identify features that may be closely correlated to causes of complication during the tattoo removal process, and report quantitative evidence for potential best practices. We develop elementary statistical descriptions of tattoo data collected by the largest gang rehabilitation and reentry organization in the world, Homeboy Industries; perform parametric and nonparametric tests of significance; and finally, produce a statistical model explaining treatment parameter interactions, as well as develop a ranking system for treatment parameters utilizing bootstrapping and gradient boosting. △ Less

Submitted 19 May, 2021; originally announced May 2021.

Comments: 15 pages, 2 figures, 9 tables

arXiv:2009.09087 [pdf, other]

Feature Selection on Lyme Disease Patient Survey Data

Authors: Joshua Vendrow, Jamie Haddock, Deanna Needell, Lorraine Johnson

Abstract: Lyme disease is a rapidly growing illness that remains poorly understood within the medical community. Critical questions about when and why patients respond to treatment or stay ill, what kinds of treatments are effective, and even how to properly diagnose the disease remain largely unanswered. We investigate these questions by applying machine learning techniques to a large scale Lyme disease pa… ▽ More Lyme disease is a rapidly growing illness that remains poorly understood within the medical community. Critical questions about when and why patients respond to treatment or stay ill, what kinds of treatments are effective, and even how to properly diagnose the disease remain largely unanswered. We investigate these questions by applying machine learning techniques to a large scale Lyme disease patient registry, MyLymeData, developed by the nonprofit LymeDisease.org. We apply various machine learning methods in order to measure the effect of individual features in predicting participants' answers to the Global Rating of Change (GROC) survey questions that assess the self-reported degree to which their condition improved, worsened, or remained unchanged following antibiotic treatment. We use basic linear regression, support vector machines, neural networks, entropy-based decision tree models, and $k$-nearest neighbors approaches. We first analyze the general performance of the model and then identify the most important features for predicting participant answers to GROC. After we identify the "key" features, we separate them from the dataset and demonstrate the effectiveness of these features at identifying GROC. In doing so, we highlight possible directions for future study both mathematically and clinically. △ Less

Submitted 24 August, 2020; originally announced September 2020.

Comments: 9 pages, 8 figures, 6 tables

arXiv:2009.09074 [pdf, other]

COVID-19 Literature Topic-Based Search via Hierarchical NMF

Authors: Rachel Grotheer, Yihuan Huang, Pengyu Li, Elizaveta Rebrova, Deanna Needell, Longxiu Huang, Alona Kryshchenko, Xia Li, Kyung Ha, Oleksandr Kryshchenko

Abstract: A dataset of COVID-19-related scientific literature is compiled, combining the articles from several online libraries and selecting those with open access and full text available. Then, hierarchical nonnegative matrix factorization is used to organize literature related to the novel coronavirus into a tree structure that allows researchers to search for relevant literature based on detected topics… ▽ More A dataset of COVID-19-related scientific literature is compiled, combining the articles from several online libraries and selecting those with open access and full text available. Then, hierarchical nonnegative matrix factorization is used to organize literature related to the novel coronavirus into a tree structure that allows researchers to search for relevant literature based on detected topics. We discover eight major latent topics and 52 granular subtopics in the body of literature, related to vaccines, genetic structure and modeling of the disease and patient studies, as well as related diseases and virology. In order that our tool may help current researchers, an interactive website is created that organizes available literature using this hierarchical structure. △ Less

Submitted 7 September, 2020; originally announced September 2020.

arXiv:2009.07612 [pdf, other]

Online nonnegative CP-dictionary learning for Markovian data

Authors: Hanbaek Lyu, Christopher Strohmeier, Deanna Needell

Abstract: Online Tensor Factorization (OTF) is a fundamental tool in learning low-dimensional interpretable features from streaming multi-modal data. While various algorithmic and theoretical aspects of OTF have been investigated recently, a general convergence guarantee to stationary points of the objective function without any incoherence or sparsity assumptions is still lacking even for the i.i.d. case.… ▽ More Online Tensor Factorization (OTF) is a fundamental tool in learning low-dimensional interpretable features from streaming multi-modal data. While various algorithmic and theoretical aspects of OTF have been investigated recently, a general convergence guarantee to stationary points of the objective function without any incoherence or sparsity assumptions is still lacking even for the i.i.d. case. In this work, we introduce a novel algorithm that learns a CANDECOMP/PARAFAC (CP) basis from a given stream of tensor-valued data under general constraints, including nonnegativity constraints that induce interpretability of the learned CP basis. We prove that our algorithm converges almost surely to the set of stationary points of the objective function under the hypothesis that the sequence of data tensors is generated by an underlying Markov chain. Our setting covers the classical i.i.d. case as well as a wide range of application contexts including data streams generated by independent or MCMC sampling. Our result closes a gap between OTF and Online Matrix Factorization in global convergence analysis \commHL{for CP-decompositions}. Experimentally, we show that our algorithm converges much faster than standard algorithms for nonnegative tensor factorization tasks on both synthetic and real-world data. Also, we demonstrate the utility of our algorithm on a diverse set of examples from image, video, and time-series data, illustrating how one may learn qualitatively different CP-dictionaries from the same tensor data by exploiting the tensor structure in multiple ways. △ Less

Submitted 2 April, 2022; v1 submitted 16 September, 2020; originally announced September 2020.

Comments: 41 pages, 5 figures

arXiv:2009.01279 [pdf, other]

Clustering of Nonnegative Data and an Application to Matrix Completion

Authors: C. Strohmeier, D. Needell

Abstract: In this paper, we propose a simple algorithm to cluster nonnegative data lying in disjoint subspaces. We analyze its performance in relation to a certain measure of correlation between said subspaces. We use our clustering algorithm to develop a matrix completion algorithm which can outperform standard matrix completion algorithms on data matrices satisfying certain natural conditions. In this paper, we propose a simple algorithm to cluster nonnegative data lying in disjoint subspaces. We analyze its performance in relation to a certain measure of correlation between said subspaces. We use our clustering algorithm to develop a matrix completion algorithm which can outperform standard matrix completion algorithms on data matrices satisfying certain natural conditions. △ Less

Submitted 2 September, 2020; originally announced September 2020.

arXiv:2007.15776 [pdf, other]

doi 10.3389/fams.2024.1284706

Random Vector Functional Link Networks for Function Approximation on Manifolds

Authors: Deanna Needell, Aaron A. Nelson, Rayan Saab, Palina Salanevich, Olov Schavemaker

Abstract: The learning speed of feed-forward neural networks is notoriously slow and has presented a bottleneck in deep learning applications for several decades. For instance, gradient-based learning algorithms, which are used extensively to train neural networks, tend to work slowly when all of the network parameters must be iteratively tuned. To counter this, both researchers and practitioners have tried… ▽ More The learning speed of feed-forward neural networks is notoriously slow and has presented a bottleneck in deep learning applications for several decades. For instance, gradient-based learning algorithms, which are used extensively to train neural networks, tend to work slowly when all of the network parameters must be iteratively tuned. To counter this, both researchers and practitioners have tried introducing randomness to reduce the learning requirement. Based on the original construction of Igelnik and Pao, single layer neural-networks with random input-to-hidden layer weights and biases have seen success in practice, but the necessary theoretical justification is lacking. In this paper, we begin to fill this theoretical gap. We provide a (corrected) rigorous proof that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with approximation error decaying asymptotically like $O(1/\sqrt{n})$ for the number $n$ of network nodes. We then extend this result to the non-asymptotic setting, proving that one can achieve any desired approximation error with high probability provided $n$ is sufficiently large. We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space, providing theoretical guarantees in both the asymptotic and non-asymptotic forms. Finally, we illustrate our results on manifolds with numerical experiments. △ Less

Submitted 26 August, 2024; v1 submitted 30 July, 2020; originally announced July 2020.

Comments: 37 pages, 1 figure

MSC Class: 62M45

Journal ref: Frontiers in Applied Mathematics and Statistics 10 (2024), 2297-4687

arXiv:2004.09112 [pdf, other]

COVID-19 Time-series Prediction by Joint Dictionary Learning and Online NMF

Authors: Hanbaek Lyu, Christopher Strohmeier, Georg Menz, Deanna Needell

Abstract: Predicting the spread and containment of COVID-19 is a challenge of utmost importance that the broader scientific community is currently facing. One of the main sources of difficulty is that a very limited amount of daily COVID-19 case data is available, and with few exceptions, the majority of countries are currently in the "exponential spread stage," and thus there is scarce information availabl… ▽ More Predicting the spread and containment of COVID-19 is a challenge of utmost importance that the broader scientific community is currently facing. One of the main sources of difficulty is that a very limited amount of daily COVID-19 case data is available, and with few exceptions, the majority of countries are currently in the "exponential spread stage," and thus there is scarce information available which would enable one to predict the phase transition between spread and containment. In this paper, we propose a novel approach to predicting the spread of COVID-19 based on dictionary learning and online nonnegative matrix factorization (online NMF). The key idea is to learn dictionary patterns of short evolution instances of the new daily cases in multiple countries at the same time, so that their latent correlation structures are captured in the dictionary patterns. We first learn such patterns by minibatch learning from the entire time-series and then further adapt them to the time-series by online NMF. As we progressively adapt and improve the learned dictionary patterns to the more recent observations, we also use them to make one-step predictions by the partial fitting. Lastly, by recursively applying the one-step predictions, we can extrapolate our predictions into the near future. Our prediction results can be directly attributed to the learned dictionary patterns due to their interpretability. △ Less

Submitted 20 April, 2020; originally announced April 2020.

Comments: 8 pages, 4 figures

arXiv:2001.00631 [pdf, other]

On Large-Scale Dynamic Topic Modeling with Nonnegative CP Tensor Decomposition

Authors: Miju Ahn, Nicole Eikmeier, Jamie Haddock, Lara Kassab, Alona Kryshchenko, Kathryn Leonard, Deanna Needell, R. W. M. A. Madushani, Elena Sizikova, Chuntian Wang

Abstract: There is currently an unprecedented demand for large-scale temporal data analysis due to the explosive growth of data. Dynamic topic modeling has been widely used in social and data sciences with the goal of learning latent topics that emerge, evolve, and fade over time. Previous work on dynamic topic modeling primarily employ the method of nonnegative matrix factorization (NMF), where slices of t… ▽ More There is currently an unprecedented demand for large-scale temporal data analysis due to the explosive growth of data. Dynamic topic modeling has been widely used in social and data sciences with the goal of learning latent topics that emerge, evolve, and fade over time. Previous work on dynamic topic modeling primarily employ the method of nonnegative matrix factorization (NMF), where slices of the data tensor are each factorized into the product of lower-dimensional nonnegative matrices. With this approach, however, information contained in the temporal dimension of the data is often neglected or underutilized. To overcome this issue, we propose instead adopting the method of nonnegative CANDECOMP/PARAPAC (CP) tensor decomposition (NNCPD), where the data tensor is directly decomposed into a minimal sum of outer products of nonnegative vectors, thereby preserving the temporal information. The viability of NNCPD is demonstrated through application to both synthetic and real data, where significantly improved results are obtained compared to those of typical NMF-based methods. The advantages of NNCPD over such approaches are studied and discussed. To the best of our knowledge, this is the first time that NNCPD has been utilized for the purpose of dynamic topic modeling, and our findings will be transformative for both applications and further developments. △ Less

Submitted 14 October, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

Comments: 23 pages, 29 figures, submitted to Women in Data Science and Mathematics (WiSDM) Workshop Proceedings, "Advances in Data Science", AWM-Springer series

arXiv:1912.08294 [pdf, other]

Lower Memory Oblivious (Tensor) Subspace Embeddings with Fewer Random Bits: Modewise Methods for Least Squares

Authors: M. A. Iwen, D. Needell, E. Rebrova, A. Zare

Abstract: In this paper new general modewise Johnson-Lindenstrauss (JL) subspace embeddings are proposed that are both considerably faster to generate and easier to store than traditional JL embeddings when working with extremely large vectors and/or tensors. Corresponding embedding results are then proven for two different types of low-dimensional (tensor) subspaces. The first of these new subspace embed… ▽ More In this paper new general modewise Johnson-Lindenstrauss (JL) subspace embeddings are proposed that are both considerably faster to generate and easier to store than traditional JL embeddings when working with extremely large vectors and/or tensors. Corresponding embedding results are then proven for two different types of low-dimensional (tensor) subspaces. The first of these new subspace embedding results produces improved space complexity bounds for embeddings of rank-$r$ tensors whose CP decompositions are contained in the span of a fixed (but unknown) set of $r$ rank-one basis tensors. In the traditional vector setting this first result yields new and very general near-optimal oblivious subspace embedding constructions that require fewer random bits to generate than standard JL embeddings when embedding subspaces of $\mathbb{C}^N$ spanned by basis vectors with special Kronecker structure. The second result proven herein provides new fast JL embeddings of arbitrary $r$-dimensional subspaces $\mathcal{S} \subset \mathbb{C}^N$ which also require fewer random bits (and so are easier to store - i.e., require less space) than standard fast JL embedding methods in order to achieve small $ε$-distortions. These new oblivious subspace embedding results work by $(i)$ effectively folding any given vector in $\mathcal{S}$ into a (not necessarily low-rank) tensor, and then $(ii)$ embedding the resulting tensor into $\mathbb{C}^m$ for $m \leq C r \log^c(N) / ε^2$. Applications related to compression and fast compressed least squares solution methods are also considered, including those used for fitting low-rank CP decompositions, and the proposed JL embedding results are shown to work well numerically in both settings. △ Less

Submitted 16 December, 2020; v1 submitted 17 December, 2019; originally announced December 2019.

arXiv:1912.00315 [pdf, other]

Topic-aware chatbot using Recurrent Neural Networks and Nonnegative Matrix Factorization

Authors: Yuchen Guo, Nicholas Hanoian, Zhexiao Lin, Nicholas Liskij, Hanbaek Lyu, Deanna Needell, Jiahao Qu, Henry Sojico, Yuliang Wang, Zhe Xiong, Zhenhong Zou

Abstract: We propose a novel model for a topic-aware chatbot by combining the traditional Recurrent Neural Network (RNN) encoder-decoder model with a topic attention layer based on Nonnegative Matrix Factorization (NMF). After learning topic vectors from an auxiliary text corpus via NMF, the decoder is trained so that it is more likely to sample response words from the most correlated topic vectors. One of… ▽ More We propose a novel model for a topic-aware chatbot by combining the traditional Recurrent Neural Network (RNN) encoder-decoder model with a topic attention layer based on Nonnegative Matrix Factorization (NMF). After learning topic vectors from an auxiliary text corpus via NMF, the decoder is trained so that it is more likely to sample response words from the most correlated topic vectors. One of the main advantages in our architecture is that the user can easily switch the NMF-learned topic vectors so that the chatbot obtains desired topic-awareness. We demonstrate our model by training on a single conversational data set which is then augmented with topic matrices learned from different auxiliary data sets. We show that our topic-aware chatbot not only outperforms the non-topic counterpart, but also that each topic-aware model qualitatively and contextually gives the most relevant answer depending on the topic of question. △ Less

Submitted 4 December, 2019; v1 submitted 30 November, 2019; originally announced December 2019.

Comments: 14 pages, 1 figure, 2 tables

arXiv:1911.01931 [pdf, other]

Online matrix factorization for Markovian data and applications to Network Dictionary Learning

Authors: Hanbaek Lyu, Deanna Needell, Laura Balzano

Abstract: Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approximate representation of complex data sets in terms of a reduced number of extracted features. Convergence guarantees for most of the OMF algorithms in the literature assume independence between data matrices, and the case of dependent data streams remains largely unexplored. In this paper, we… ▽ More Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approximate representation of complex data sets in terms of a reduced number of extracted features. Convergence guarantees for most of the OMF algorithms in the literature assume independence between data matrices, and the case of dependent data streams remains largely unexplored. In this paper, we show that a non-convex generalization of the well-known OMF algorithm for i.i.d. stream of data in \citep{mairal2010online} converges almost surely to the set of critical points of the expected loss function, even when the data matrices are functions of some underlying Markov chain satisfying a mild mixing condition. This allows one to extract features more efficiently from dependent data streams, as there is no need to subsample the data sequence to approximately satisfy the independence assumption. As the main application, by combining online non-negative matrix factorization and a recent MCMC algorithm for sampling motifs from networks, we propose a novel framework of Network Dictionary Learning, which extracts ``network dictionary patches' from a given network in an online manner that encodes main features of the network. We demonstrate this technique and its application to network denoising problems on real-world network data. △ Less

Submitted 7 November, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

Comments: 39 pages, 13 figures

Journal ref: Journal of Machine Learning Research 21 (2020)

arXiv:1908.08479 [pdf, other]

Iterative Hard Thresholding for Low CP-rank Tensor Models

Authors: Rachel Grotheer, Shuang Li, Anna Ma, Deanna Needell, Jing Qin

Abstract: Recovery of low-rank matrices from a small number of linear measurements is now well-known to be possible under various model assumptions on the measurements. Such results demonstrate robustness and are backed with provable theoretical guarantees. However, extensions to tensor recovery have only recently began to be studied and developed, despite an abundance of practical tensor applications. Rece… ▽ More Recovery of low-rank matrices from a small number of linear measurements is now well-known to be possible under various model assumptions on the measurements. Such results demonstrate robustness and are backed with provable theoretical guarantees. However, extensions to tensor recovery have only recently began to be studied and developed, despite an abundance of practical tensor applications. Recently, a tensor variant of the Iterative Hard Thresholding method was proposed and theoretical results were obtained that guarantee exact recovery of tensors with low Tucker rank. In this paper, we utilize the same tensor version of the Restricted Isometry Property (RIP) to extend these results for tensors with low CANDECOMP/PARAFAC (CP) rank. In doing so, we leverage recent results on efficient approximations of CP decompositions that remove the need for challenging assumptions in prior works. We complement our theoretical findings with empirical results that showcase the potential of the approach. △ Less

Submitted 22 August, 2019; originally announced August 2019.

arXiv:1907.11746 [pdf, other]

Bias of Homotopic Gradient Descent for the Hinge Loss

Authors: Denali Molitor, Deanna Needell, Rachel Ward

Abstract: Gradient descent is a simple and widely used optimization method for machine learning. For homogeneous linear classifiers applied to separable data, gradient descent has been shown to converge to the maximal margin (or equivalently, the minimal norm) solution for various smooth loss functions. The previous theory does not, however, apply to non-smooth functions such as the hinge loss which is wide… ▽ More Gradient descent is a simple and widely used optimization method for machine learning. For homogeneous linear classifiers applied to separable data, gradient descent has been shown to converge to the maximal margin (or equivalently, the minimal norm) solution for various smooth loss functions. The previous theory does not, however, apply to non-smooth functions such as the hinge loss which is widely used in practice. Here, we study the convergence of a homotopic variant of gradient descent applied to the hinge loss and provide explicit convergence rates to the max-margin solution for linearly separable data. △ Less

Submitted 26 July, 2019; originally announced July 2019.

arXiv:1905.13404 [pdf, other]

Data-driven Algorithm Selection and Parameter Tuning: Two Case studies in Optimization and Signal Processing

Authors: Jesus A. De Loera, Jamie Haddock, Anna Ma, Deanna Needell

Abstract: Machine learning algorithms typically rely on optimization subroutines and are well-known to provide very effective outcomes for many types of problems. Here, we flip the reliance and ask the reverse question: can machine learning algorithms lead to more effective outcomes for optimization problems? Our goal is to train machine learning methods to automatically improve the performance of optimizat… ▽ More Machine learning algorithms typically rely on optimization subroutines and are well-known to provide very effective outcomes for many types of problems. Here, we flip the reliance and ask the reverse question: can machine learning algorithms lead to more effective outcomes for optimization problems? Our goal is to train machine learning methods to automatically improve the performance of optimization and signal processing algorithms. As a proof of concept, we use our approach to improve two popular data processing subroutines in data science: stochastic gradient descent and greedy methods in compressed sensing. We provide experimental results that demonstrate the answer is ``yes'', machine learning algorithms do lead to more effective outcomes for optimization problems, and show the future potential for this research direction. △ Less

Submitted 26 July, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

arXiv:1904.08540 [pdf, other]

Matrix Completion With Selective Sampling

Authors: Christian Parkinson, Kevin Huynh, Deanna Needell

Abstract: Matrix completion is a classical problem in data science wherein one attempts to reconstruct a low-rank matrix while only observing some subset of the entries. Previous authors have phrased this problem as a nuclear norm minimization problem. Almost all previous work assumes no explicit structure of the matrix and uses uniform sampling to decide the observed entries. We suggest methods for selecti… ▽ More Matrix completion is a classical problem in data science wherein one attempts to reconstruct a low-rank matrix while only observing some subset of the entries. Previous authors have phrased this problem as a nuclear norm minimization problem. Almost all previous work assumes no explicit structure of the matrix and uses uniform sampling to decide the observed entries. We suggest methods for selective sampling in the case where we have some knowledge about the structure of the matrix and are allowed to design the observation set. △ Less

Submitted 17 April, 2019; originally announced April 2019.

Comments: 4 pages, 4 figures

arXiv:1809.03041 [pdf, other]

An iterative method for classification of binary data

Authors: Denali Molitor, Deanna Needell

Abstract: In today's data driven world, storing, processing, and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference methods for analyzing compressed data are necessary. Building on a recently designed simple framework for classification using binary data, we demonstrate that… ▽ More In today's data driven world, storing, processing, and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference methods for analyzing compressed data are necessary. Building on a recently designed simple framework for classification using binary data, we demonstrate that one can improve classification accuracy of this approach through iterative applications whose output serves as input to the next application. As a side consequence, we show that the original framework can be used as a data preprocessing step to improve the performance of other methods, such as support vector machines. For several simple settings, we showcase the ability to obtain theoretical guarantees for the accuracy of the iterative classification method. The simplicity of the underlying classification framework makes it amenable to theoretical analysis and studying this approach will hopefully serve as a step toward developing theory for more sophisticated deep learning technologies. △ Less

Submitted 9 September, 2018; originally announced September 2018.

MSC Class: 68T05; 68P30; 68U10

arXiv:1807.08825 [pdf, other]

Hierarchical Classification using Binary Data

Authors: Denali Molitor, Deanna Needell

Abstract: In classification problems, especially those that categorize data into a large number of classes, the classes often naturally follow a hierarchical structure. That is, some classes are likely to share similar structures and features. Those characteristics can be captured by considering a hierarchical relationship among the class labels. Here, we extend a recent simple classification approach on bi… ▽ More In classification problems, especially those that categorize data into a large number of classes, the classes often naturally follow a hierarchical structure. That is, some classes are likely to share similar structures and features. Those characteristics can be captured by considering a hierarchical relationship among the class labels. Here, we extend a recent simple classification approach on binary data in order to efficiently classify hierarchical data. In certain settings, specifically, when some classes are significantly easier to identify than others, we showcase computational and accuracy advantages. △ Less

Submitted 23 July, 2018; originally announced July 2018.

Comments: AAAI Magazine special Issue on Deep Models, Machine Learning and Artificial Intelligence Applications in National and International Security, June, 2018

arXiv:1805.12529 [pdf, other]

Analysis of Fast Structured Dictionary Learning

Authors: Saiprasad Ravishankar, Anna Ma, Deanna Needell

Abstract: Sparsity-based models and techniques have been exploited in many signal processing and imaging applications. Data-driven methods based on dictionary and sparsifying transform learning enable learning rich image features from data, and can outperform analytical models. In particular, alternating optimization algorithms have been popular for learning such models. In this work, we focus on alternatin… ▽ More Sparsity-based models and techniques have been exploited in many signal processing and imaging applications. Data-driven methods based on dictionary and sparsifying transform learning enable learning rich image features from data, and can outperform analytical models. In particular, alternating optimization algorithms have been popular for learning such models. In this work, we focus on alternating minimization for a specific structured unitary sparsifying operator learning problem, and provide a convergence analysis. While the algorithm converges to the critical points of the problem generally, our analysis establishes under mild assumptions, the local linear convergence of the algorithm to the underlying sparsifying model of the data. Analysis and numerical simulations show that our assumptions hold for standard probabilistic data models. In practice, the algorithm is robust to initialization. △ Less

Submitted 23 September, 2019; v1 submitted 31 May, 2018; originally announced May 2018.

Comments: This article has been accepted for publication in Information and Inference Published by Oxford University Press

arXiv:1801.09657 [pdf, other]

Matrix Completion for Structured Observations

Authors: Denali Molitor, Deanna Needell

Abstract: The need to predict or fill-in missing data, often referred to as matrix completion, is a common challenge in today's data-driven world. Previous strategies typically assume that no structural difference between observed and missing entries exists. Unfortunately, this assumption is woefully unrealistic in many applications. For example, in the classic Netflix challenge, in which one hopes to predi… ▽ More The need to predict or fill-in missing data, often referred to as matrix completion, is a common challenge in today's data-driven world. Previous strategies typically assume that no structural difference between observed and missing entries exists. Unfortunately, this assumption is woefully unrealistic in many applications. For example, in the classic Netflix challenge, in which one hopes to predict user-movie ratings for unseen films, the fact that the viewer has not watched a given movie may indicate a lack of interest in that movie, thus suggesting a lower rating than otherwise expected. We propose adjusting the standard nuclear norm minimization strategy for matrix completion to account for such structural differences between observed and unobserved entries by regularizing the values of the unobserved entries. We show that the proposed method outperforms nuclear norm minimization in certain settings. △ Less

Submitted 29 January, 2018; originally announced January 2018.

arXiv:1707.01945 [pdf, other]

Simple Classification using Binary Data

Authors: Deanna Needell, Rayan Saab, Tina Woolf

Abstract: Binary, or one-bit, representations of data arise naturally in many applications, and are appealing in both hardware implementations and algorithm design. In this work, we study the problem of data classification from binary data and propose a framework with low computation and resource costs. We illustrate the utility of the proposed approach through stylized and realistic numerical experiments,… ▽ More Binary, or one-bit, representations of data arise naturally in many applications, and are appealing in both hardware implementations and algorithm design. In this work, we study the problem of data classification from binary data and propose a framework with low computation and resource costs. We illustrate the utility of the proposed approach through stylized and realistic numerical experiments, and provide a theoretical analysis for a simple case. We hope that our framework and analysis will serve as a foundation for studying similar types of approaches. △ Less

Submitted 6 July, 2017; originally announced July 2017.

arXiv:1310.5715 [pdf, ps, other]

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

Authors: Deanna Needell, Nathan Srebro, Rachel Ward

Abstract: We obtain an improved finite-sample guarantee on the linear convergence of stochastic gradient descent for smooth and strongly convex objectives, improving from a quadratic dependence on the conditioning $(L/μ)^2$ (where $L$ is a bound on the smoothness and $μ$ on the strong convexity) to a linear dependence on $L/μ$. Furthermore, we show how reweighting the sampling distribution (i.e. importance… ▽ More We obtain an improved finite-sample guarantee on the linear convergence of stochastic gradient descent for smooth and strongly convex objectives, improving from a quadratic dependence on the conditioning $(L/μ)^2$ (where $L$ is a bound on the smoothness and $μ$ on the strong convexity) to a linear dependence on $L/μ$. Furthermore, we show how reweighting the sampling distribution (i.e. importance sampling) is necessary in order to further improve convergence, and obtain a linear dependence in the average smoothness, dominating previous results. We also discuss importance sampling for SGD more broadly and show how it can improve convergence also in other scenarios. Our results are based on a connection we make between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods. In particular, we recast the randomized Kaczmarz algorithm as an instance of SGD, and apply our results to prove its exponential convergence, but to the solution of a weighted least squares problem rather than the original least squares problem. We then present a modified Kaczmarz algorithm with partially biased sampling which does converge to the original least squares solution with the same exponential convergence rate. △ Less

Submitted 16 January, 2015; v1 submitted 21 October, 2013; originally announced October 2013.

Comments: 22 pages, 6 figures

MSC Class: 65B99; 52A99; 60G99; 62L20

arXiv:1211.3444 [pdf, other]

Spectral Clustering: An empirical study of Approximation Algorithms and its Application to the Attrition Problem

Authors: B. Cung, T. Jin, J. Ramirez, A. Thompson, C. Boutsidis, D. Needell

Abstract: Clustering is the problem of separating a set of objects into groups (called clusters) so that objects within the same cluster are more similar to each other than to those in different clusters. Spectral clustering is a now well-known method for clustering which utilizes the spectrum of the data similarity matrix to perform this separation. Since the method relies on solving an eigenvector problem… ▽ More Clustering is the problem of separating a set of objects into groups (called clusters) so that objects within the same cluster are more similar to each other than to those in different clusters. Spectral clustering is a now well-known method for clustering which utilizes the spectrum of the data similarity matrix to perform this separation. Since the method relies on solving an eigenvector problem, it is computationally expensive for large datasets. To overcome this constraint, approximation methods have been developed which aim to reduce running time while maintaining accurate classification. In this article, we summarize and experimentally evaluate several approximation methods for spectral clustering. From an applications standpoint, we employ spectral clustering to solve the so-called attrition problem, where one aims to identify from a set of employees those who are likely to voluntarily leave the company from those who are not. Our study sheds light on the empirical performance of existing approximate spectral clustering methods and shows the applicability of these methods in an important business optimization related problem. △ Less

Submitted 14 November, 2012; originally announced November 2012.

Showing 1–39 of 39 results for author: Needell, D