Search | arXiv e-print repository

Estimating Value at Risk and Expected Shortfall: A Brief Review and Some New Developments

Authors: Kanon Kamronnaher, Andrew Bellucco, Whitney K. Huang, Colin M. Gallagher

Abstract: Value-at-risk (VaR) and expected shortfall (ES) are two commonly utilized metrics for quantifying financial risk. In this study, we review the widely employed Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models. These models are explored with diverse distributional assumptions on innovation, including parametric, non-parametric, and `semi-parametric' that incorporates a parame… ▽ More Value-at-risk (VaR) and expected shortfall (ES) are two commonly utilized metrics for quantifying financial risk. In this study, we review the widely employed Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models. These models are explored with diverse distributional assumptions on innovation, including parametric, non-parametric, and `semi-parametric' that incorporates a parametric tail distribution based on extreme value theory. Additionally, we introduce a non-parametric local linear quantile autoregression (LLQAR) with kernel weights depending on the distance between the current loss and past losses, and decreasing in the time lag. To evaluate the performance of different methods for VaR and ES estimation, we employ a multi-criteria approach. This involves mean squared error assessment using simulated data, backtesting on both simulated data and US stocks, and application of the ESBootstrap test. The LLQAR method, which does not necessarily require stationarity assumptions, seems to perform better for simulated non-stationary data as well as real-world data, for estimating VaR and ES. △ Less

Submitted 10 May, 2024; originally announced May 2024.

Comments: 54 pages, 21 figures

MSC Class: 62-08

arXiv:2306.02857 [pdf, other]

Topological Data Analysis Assisted Automated Sleep Stage Scoring Using Airflow Signals

Authors: Yu-Min Chung, Whitney K. Huang, Hau-Tieng Wu

Abstract: Objective: Breathing pattern variability (BPV), as a universal physiological feature, encodes rich health information. We aim to show that, a high-quality automatic sleep stage scoring based on a proper quantification of BPV extracting from the single airflow signal can be achieved. Methods: Topological data analysis (TDA) is applied to characterize BPV from the intrinsically nonstationary airfl… ▽ More Objective: Breathing pattern variability (BPV), as a universal physiological feature, encodes rich health information. We aim to show that, a high-quality automatic sleep stage scoring based on a proper quantification of BPV extracting from the single airflow signal can be achieved. Methods: Topological data analysis (TDA) is applied to characterize BPV from the intrinsically nonstationary airflow signal, where the extracted features are used to train an automatic sleep stage scoring model using the XGBoost learner. The noise and artifacts commonly present in the airflow signal are recycled to enhance the performance of the trained system. The state-of-the-art approach is implemented for a comparison. Results: When applied to 30 whole night polysomnogram signals with standard annotations, the leave-one-subject-out cross-validation shows that the proposed features (overall accuracy 78.8\%$\pm$8.7\% and Cohen's kappa 0.56$\pm 0.15$) outperforms those considered in the state-of-the-art work (overall accuracy 75.0\%$\pm$9.6\% and Cohen's kappa 0.50$\pm 0.15$) when applied to automatically score wake, rapid eyeball movement (REM) and non-REM (NREM). The TDA features are shown to contain complementary information to the traditional features commonly used in the literature via examining the feature importance. The respiratory quality index is found to be essential in the trained system. Conclusion: The proposed TDA-assisted automatic annotation system can accurately distinguish wake, REM and NREM from the airflow signal. Significance: Since only one single airflow channel is needed and BPV is universal, the result suggests that the TDA-assisted signal processing has potential to be applied to other biomedical signals and homecare problems other than the sleep stage annotation. △ Less

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2305.14535 [pdf, other]

Uncertainty Quantification over Graph with Conformalized Graph Neural Networks

Authors: Kexin Huang, Ying Jin, Emmanuel Candès, Jure Leskovec

Abstract: Graph Neural Networks (GNNs) are powerful machine learning prediction models on graph-structured data. However, GNNs lack rigorous uncertainty estimates, limiting their reliable deployment in settings where the cost of errors is significant. We propose conformalized GNN (CF-GNN), extending conformal prediction (CP) to graph-based models for guaranteed uncertainty estimates. Given an entity in the… ▽ More Graph Neural Networks (GNNs) are powerful machine learning prediction models on graph-structured data. However, GNNs lack rigorous uncertainty estimates, limiting their reliable deployment in settings where the cost of errors is significant. We propose conformalized GNN (CF-GNN), extending conformal prediction (CP) to graph-based models for guaranteed uncertainty estimates. Given an entity in the graph, CF-GNN produces a prediction set/interval that provably contains the true label with pre-defined coverage probability (e.g. 90%). We establish a permutation invariance condition that enables the validity of CP on graph data and provide an exact characterization of the test-time coverage. Moreover, besides valid coverage, it is crucial to reduce the prediction set size/interval length for practical use. We observe a key connection between non-conformity scores and network structures, which motivates us to develop a topology-aware output correction model that learns to update the prediction and produces more efficient prediction sets/intervals. Extensive experiments show that CF-GNN achieves any pre-defined target marginal coverage while significantly reducing the prediction set/interval size by up to 74% over the baselines. It also empirically achieves satisfactory conditional coverage over various raw and network features. △ Less

Submitted 30 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Published at NeurIPS 2023

arXiv:2305.14067 [pdf, other]

DIVA: A Dirichlet Process Mixtures Based Incremental Deep Clustering Algorithm via Variational Auto-Encoder

Authors: Zhenshan Bing, Yuan Meng, Yuqi Yun, Hang Su, Xiaojie Su, Kai Huang, Alois Knoll

Abstract: Generative model-based deep clustering frameworks excel in classifying complex data, but are limited in handling dynamic and complex features because they require prior knowledge of the number of clusters. In this paper, we propose a nonparametric deep clustering framework that employs an infinite mixture of Gaussians as a prior. Our framework utilizes a memoized online variational inference metho… ▽ More Generative model-based deep clustering frameworks excel in classifying complex data, but are limited in handling dynamic and complex features because they require prior knowledge of the number of clusters. In this paper, we propose a nonparametric deep clustering framework that employs an infinite mixture of Gaussians as a prior. Our framework utilizes a memoized online variational inference method that enables the "birth" and "merge" moves of clusters, allowing our framework to cluster data in a "dynamic-adaptive" manner, without requiring prior knowledge of the number of features. We name the framework as DIVA, a Dirichlet Process-based Incremental deep clustering framework via Variational Auto-Encoder. Our framework, which outperforms state-of-the-art baselines, exhibits superior performance in classifying complex data with dynamically changing features, particularly in the case of incremental features. We released our source code implementation at: https://github.com/Ghiara/diva △ Less

Submitted 24 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: static datasets comparision updated

arXiv:2305.04086 [pdf, other]

Efficient Learning for Selecting Top-m Context-Dependent Designs

Authors: Gongbo Zhang, Sihua Chen, Kuihua Huang, Yijie Peng

Abstract: We consider a simulation optimization problem for a context-dependent decision-making, which aims to determine the top-m designs for all contexts. Under a Bayesian framework, we formulate the optimal dynamic sampling decision as a stochastic dynamic programming problem, and develop a sequential sampling policy to efficiently learn the performance of each design under each context. The asymptotical… ▽ More We consider a simulation optimization problem for a context-dependent decision-making, which aims to determine the top-m designs for all contexts. Under a Bayesian framework, we formulate the optimal dynamic sampling decision as a stochastic dynamic programming problem, and develop a sequential sampling policy to efficiently learn the performance of each design under each context. The asymptotically optimal sampling ratios are derived to attain the optimal large deviations rate of the worst-case of probability of false selection. The proposed sampling policy is proved to be consistent and its asymptotic sampling ratios are asymptotically optimal. Numerical experiments demonstrate that the proposed method improves the efficiency for selection of top-m context-dependent designs. △ Less

Submitted 9 June, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

arXiv:2304.05223 [pdf, other]

Inhomogeneous graph trend filtering via a l2,0 cardinality penalty

Authors: Xiaoqing Huang, Andersen Ang, Kun Huang, Jie Zhang, Yijie Wang

Abstract: We study estimation of piecewise smooth signals over a graph. We propose a $\ell_{2,0}$-norm penalized Graph Trend Filtering (GTF) model to estimate piecewise smooth graph signals that exhibit inhomogeneous levels of smoothness across the nodes. We prove that the proposed GTF model is simultaneously a k-means clustering on the signal over the nodes and a minimum graph cut on the edges of the graph… ▽ More We study estimation of piecewise smooth signals over a graph. We propose a $\ell_{2,0}$-norm penalized Graph Trend Filtering (GTF) model to estimate piecewise smooth graph signals that exhibit inhomogeneous levels of smoothness across the nodes. We prove that the proposed GTF model is simultaneously a k-means clustering on the signal over the nodes and a minimum graph cut on the edges of the graph, where the clustering and the cut share the same assignment matrix. We propose two methods to solve the proposed GTF model: a spectral decomposition method and a method based on simulated annealing. In the experiment on synthetic and real-world datasets, we show that the proposed GTF model has a better performances compared with existing approaches on the tasks of denoising, support recovery and semi-supervised classification. We also show that the proposed GTF model can be solved more efficiently than existing models for the dataset with a large edge set. △ Less

Submitted 4 June, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: 13 pages, 3 figures, 4 tables

MSC Class: 65F50; 68U01; 68R01 ACM Class: G.1.6; G.1.10

arXiv:2304.05187 [pdf, other]

Automatic Gradient Descent: Deep Learning without Hyperparameters

Authors: Jeremy Bernstein, Chris Mingard, Kevin Huang, Navid Azizan, Yisong Yue

Abstract: The architecture of a deep neural network is defined explicitly in terms of the number of layers, the width of each layer and the general network topology. Existing optimisation frameworks neglect this information in favour of implicit architectural information (e.g. second-order methods) or architecture-agnostic distance functions (e.g. mirror descent). Meanwhile, the most popular optimiser in pr… ▽ More The architecture of a deep neural network is defined explicitly in terms of the number of layers, the width of each layer and the general network topology. Existing optimisation frameworks neglect this information in favour of implicit architectural information (e.g. second-order methods) or architecture-agnostic distance functions (e.g. mirror descent). Meanwhile, the most popular optimiser in practice, Adam, is based on heuristics. This paper builds a new framework for deriving optimisation algorithms that explicitly leverage neural architecture. The theory extends mirror descent to non-convex composite objective functions: the idea is to transform a Bregman divergence to account for the non-linear structure of neural architecture. Working through the details for deep fully-connected networks yields automatic gradient descent: a first-order optimiser without any hyperparameters. Automatic gradient descent trains both fully-connected and convolutional networks out-of-the-box and at ImageNet scale. A PyTorch implementation is available at https://github.com/jxbz/agd and also in Appendix B. Overall, the paper supplies a rigorous theoretical foundation for a next-generation of architecture-dependent optimisers that work automatically and without hyperparameters. △ Less

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2302.07194 [pdf, other]

Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data

Authors: Minshuo Chen, Kaixuan Huang, Tuo Zhao, Mengdi Wang

Abstract: Diffusion models achieve state-of-the-art performance in various generation tasks. However, their theoretical foundations fall far behind. This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace. Our result provides sample complexity bounds for distribution estimation using diffusion mo… ▽ More Diffusion models achieve state-of-the-art performance in various generation tasks. However, their theoretical foundations fall far behind. This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace. Our result provides sample complexity bounds for distribution estimation using diffusion models. We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated. Furthermore, the generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution. The convergence rate depends on the subspace dimension, indicating that diffusion models can circumvent the curse of data ambient dimensionality. △ Less

Submitted 14 February, 2023; originally announced February 2023.

Comments: 52 pages, 4 figures

arXiv:2302.05881 [pdf, other]

doi 10.1016/j.patcog.2024.110678

A generalizable framework for low-rank tensor completion with numerical priors

Authors: Shiran Yuan, Kaizhu Huang

Abstract: Low-Rank Tensor Completion, a method which exploits the inherent structure of tensors, has been studied extensively as an effective approach to tensor completion. Whilst such methods attained great success, none have systematically considered exploiting the numerical priors of tensor elements. Ignoring numerical priors causes loss of important information regarding the data, and therefore prevents… ▽ More Low-Rank Tensor Completion, a method which exploits the inherent structure of tensors, has been studied extensively as an effective approach to tensor completion. Whilst such methods attained great success, none have systematically considered exploiting the numerical priors of tensor elements. Ignoring numerical priors causes loss of important information regarding the data, and therefore prevents the algorithms from reaching optimal accuracy. Despite the existence of some individual works which consider ad hoc numerical priors for specific tasks, no generalizable frameworks for incorporating numerical priors have appeared. We present the Generalized CP Decomposition Tensor Completion (GCDTC) framework, the first generalizable framework for low-rank tensor completion that takes numerical priors of the data into account. We test GCDTC by further proposing the Smooth Poisson Tensor Completion (SPTC) algorithm, an instantiation of the GCDTC framework, whose performance exceeds current state-of-the-arts by considerable margins in the task of non-negative tensor completion, exemplifying GCDTC's effectiveness. Our code is open-source. △ Less

Submitted 18 June, 2024; v1 submitted 12 February, 2023; originally announced February 2023.

Comments: Accepted to Pattern Recognition

arXiv:2302.05686 [pdf, other]

A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing

Authors: Kevin H. Huang, Xing Liu, Andrew B. Duncan, Axel Gandy

Abstract: We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate… ▽ More We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and asymmetric distribution. Our bounds are valid for any finite $n$ and $d$, independent of individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with $d$ and the bandwidth. △ Less

Submitted 2 July, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

Comments: COLT camera-ready version

arXiv:2301.12677 [pdf, other]

Distributed Stochastic Optimization under a General Variance Condition

Authors: Kun Huang, Xiao Li, Shi Pu

Abstract: Distributed stochastic optimization has drawn great attention recently due to its effectiveness in solving large-scale machine learning problems. Though numerous algorithms have been proposed and successfully applied to general practical problems, their theoretical guarantees mainly rely on certain boundedness conditions on the stochastic gradients, varying from uniform boundedness to the relaxed… ▽ More Distributed stochastic optimization has drawn great attention recently due to its effectiveness in solving large-scale machine learning problems. Though numerous algorithms have been proposed and successfully applied to general practical problems, their theoretical guarantees mainly rely on certain boundedness conditions on the stochastic gradients, varying from uniform boundedness to the relaxed growth condition. In addition, how to characterize the data heterogeneity among the agents and its impacts on the algorithmic performance remains challenging. In light of such motivations, we revisit the classical Federated Averaging (FedAvg) algorithm (McMahan et al., 2017) as well as the more recent SCAFFOLD method (Karimireddy et al., 2020) for solving the distributed stochastic optimization problem and establish the convergence results under only a mild variance condition on the stochastic gradients for smooth nonconvex objective functions. Almost sure convergence to a stationary point is also established under the condition. Moreover, we discuss a more informative measurement for data heterogeneity as well as its implications. △ Less

Submitted 13 December, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: 16 pages, 2 figure

arXiv:2210.14843 [pdf, other]

TuneUp: A Simple Improved Training Strategy for Graph Neural Networks

Authors: Weihua Hu, Kaidi Cao, Kexin Huang, Edward W Huang, Karthik Subbian, Kenji Kawaguchi, Jure Leskovec

Abstract: Despite recent advances in Graph Neural Networks (GNNs), their training strategies remain largely under-explored. The conventional training strategy learns over all nodes in the original graph(s) equally, which can be sub-optimal as certain nodes are often more difficult to learn than others. Here we present TuneUp, a simple curriculum-based training strategy for improving the predictive performan… ▽ More Despite recent advances in Graph Neural Networks (GNNs), their training strategies remain largely under-explored. The conventional training strategy learns over all nodes in the original graph(s) equally, which can be sub-optimal as certain nodes are often more difficult to learn than others. Here we present TuneUp, a simple curriculum-based training strategy for improving the predictive performance of GNNs. TuneUp trains a GNN in two stages. In the first stage, TuneUp applies conventional training to obtain a strong base GNN. The base GNN tends to perform well on head nodes (nodes with large degrees) but less so on tail nodes (nodes with small degrees). Therefore, the second stage of TuneUp focuses on improving prediction on the difficult tail nodes by further training the base GNN on synthetically generated tail node data. We theoretically analyze TuneUp and show it provably improves generalization performance on tail nodes. TuneUp is simple to implement and applicable to a broad range of GNN architectures and prediction tasks. Extensive evaluation of TuneUp on five diverse GNN architectures, three types of prediction tasks, and both transductive and inductive settings shows that TuneUp significantly improves the performance of the base GNN on tail nodes, while often even improving the performance on head nodes. Altogether, TuneUp produces up to 57.6% and 92.2% relative predictive performance improvement in the transductive and the challenging inductive settings, respectively. △ Less

Submitted 26 August, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

arXiv:2206.14846 [pdf, other]

Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization

Authors: Kaixuan Huang, Yu Wu, Xuezhou Zhang, Shenyinying Tu, Qingyun Wu, Mengdi Wang, Huazheng Wang

Abstract: Online influence maximization aims to maximize the influence spread of a content in a social network with unknown network model by selecting a few seed nodes. Recent studies followed a non-adaptive setting, where the seed nodes are selected before the start of the diffusion process and network parameters are updated when the diffusion stops. We consider an adaptive version of content-dependent onl… ▽ More Online influence maximization aims to maximize the influence spread of a content in a social network with unknown network model by selecting a few seed nodes. Recent studies followed a non-adaptive setting, where the seed nodes are selected before the start of the diffusion process and network parameters are updated when the diffusion stops. We consider an adaptive version of content-dependent online influence maximization problem where the seed nodes are sequentially activated based on real-time feedback. In this paper, we formulate the problem as an infinite-horizon discounted MDP under a linear diffusion process and present a model-based reinforcement learning solution. Our algorithm maintains a network model estimate and selects seed users adaptively, exploring the social network while improving the optimal policy optimistically. We establish $\widetilde O(\sqrt{T})$ regret bound for our algorithm. Empirical evaluations on synthetic network demonstrate the efficiency of our algorithm. △ Less

Submitted 29 June, 2022; originally announced June 2022.

arXiv:2204.01960 [pdf, other]

FaceSigns: Semi-Fragile Neural Watermarks for Media Authentication and Countering Deepfakes

Authors: Paarth Neekhara, Shehzeen Hussain, Xinqiao Zhang, Ke Huang, Julian McAuley, Farinaz Koushanfar

Abstract: Deepfakes and manipulated media are becoming a prominent threat due to the recent advances in realistic image and video synthesis techniques. There have been several attempts at combating Deepfakes using machine learning classifiers. However, such classifiers do not generalize well to black-box image synthesis techniques and have been shown to be vulnerable to adversarial examples. To address thes… ▽ More Deepfakes and manipulated media are becoming a prominent threat due to the recent advances in realistic image and video synthesis techniques. There have been several attempts at combating Deepfakes using machine learning classifiers. However, such classifiers do not generalize well to black-box image synthesis techniques and have been shown to be vulnerable to adversarial examples. To address these challenges, we introduce a deep learning based semi-fragile watermarking technique that allows media authentication by verifying an invisible secret message embedded in the image pixels. Instead of identifying and detecting fake media using visual artifacts, we propose to proactively embed a semi-fragile watermark into a real image so that we can prove its authenticity when needed. Our watermarking framework is designed to be fragile to facial manipulations or tampering while being robust to benign image-processing operations such as image compression, scaling, saturation, contrast adjustments etc. This allows images shared over the internet to retain the verifiable watermark as long as face-swapping or any other Deepfake modification technique is not applied. We demonstrate that FaceSigns can embed a 128 bit secret as an imperceptible image watermark that can be recovered with a high bit recovery accuracy at several compression levels, while being non-recoverable when unseen Deepfake manipulations are applied. For a set of unseen benign and Deepfake manipulations studied in our work, FaceSigns can reliably detect manipulated content with an AUC score of 0.996 which is significantly higher than prior image watermarking and steganography techniques. △ Less

Submitted 4 April, 2022; originally announced April 2022.

Comments: 13 pages, 8 figures

arXiv:2202.09134 [pdf, other]

Data Augmentation in the Underparameterized and Overparameterized Regimes

Authors: Kevin Han Huang, Peter Orbanz, Morgane Austern

Abstract: We provide results that exactly quantify how data augmentation affects the variance and limiting distribution of estimates, and analyze several specific models in detail. The results confirm some observations made in machine learning practice, but also lead to unexpected findings: Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction ris… ▽ More We provide results that exactly quantify how data augmentation affects the variance and limiting distribution of estimates, and analyze several specific models in detail. The results confirm some observations made in machine learning practice, but also lead to unexpected findings: Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk. It can act as a regularizer, but fails to do so in certain high-dimensional problems, and it may shift the double-descent peak of an empirical risk. Overall, the analysis shows that several properties data augmentation has been attributed with are not either true or false, but rather depend on a combination of factors -- notably the data distribution, the properties of the estimator, and the interplay of sample size, number of augmentations, and dimension. Our main theoretical tool is a limit theorem for functions of randomly transformed, high-dimensional random vectors. The proof draws on work in probability on noise stability of functions of many variables. △ Less

Submitted 28 September, 2023; v1 submitted 18 February, 2022; originally announced February 2022.

Comments: Changed title and added an analysis on the effect of augmentations on the double-descent risk curve of a high-dimensional ridgeless estimator

arXiv:2202.00071 [pdf, other]

JULIA: Joint Multi-linear and Nonlinear Identification for Tensor Completion

Authors: Cheng Qian, Kejun Huang, Lucas Glass, Rakshith S. Srinivasa, Jimeng Sun

Abstract: Tensor completion aims at imputing missing entries from a partially observed tensor. Existing tensor completion methods often assume either multi-linear or nonlinear relationships between latent components. However, real-world tensors have much more complex patterns where both multi-linear and nonlinear relationships may coexist. In such cases, the existing methods are insufficient to describe t… ▽ More Tensor completion aims at imputing missing entries from a partially observed tensor. Existing tensor completion methods often assume either multi-linear or nonlinear relationships between latent components. However, real-world tensors have much more complex patterns where both multi-linear and nonlinear relationships may coexist. In such cases, the existing methods are insufficient to describe the data structure. This paper proposes a Joint mUlti-linear and nonLinear IdentificAtion (JULIA) framework for large-scale tensor completion. JULIA unifies the multi-linear and nonlinear tensor completion models with several advantages over the existing methods: 1) Flexible model selection, i.e., it fits a tensor by assigning its values as a combination of multi-linear and nonlinear components; 2) Compatible with existing nonlinear tensor completion methods; 3) Efficient training based on a well-designed alternating optimization approach. Experiments on six real large-scale tensors demonstrate that JULIA outperforms many existing tensor completion algorithms. Furthermore, JULIA can improve the performance of a class of nonlinear tensor completion methods. The results show that in some large-scale tensor completion scenarios, baseline methods with JULIA are able to obtain up to 55% lower root mean-squared-error and save 67% computational complexity. △ Less

Submitted 31 January, 2022; originally announced February 2022.

arXiv:2112.07746 [pdf, other]

CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning

Authors: Kevin Huang, Sahin Lale, Ugo Rosolia, Yuanyuan Shi, Anima Anandkumar

Abstract: Current state-of-the-art model-based reinforcement learning algorithms use trajectory sampling methods, such as the Cross-Entropy Method (CEM), for planning in continuous control settings. These zeroth-order optimizers require sampling a large number of trajectory rollouts to select an optimal action, which scales poorly for large prediction horizons or high dimensional action spaces. First-order… ▽ More Current state-of-the-art model-based reinforcement learning algorithms use trajectory sampling methods, such as the Cross-Entropy Method (CEM), for planning in continuous control settings. These zeroth-order optimizers require sampling a large number of trajectory rollouts to select an optimal action, which scales poorly for large prediction horizons or high dimensional action spaces. First-order methods that use the gradients of the rewards with respect to the actions as an update can mitigate this issue, but suffer from local optima due to the non-convex optimization landscape. To overcome these issues and achieve the best of both worlds, we propose a novel planner, Cross-Entropy Method with Gradient Descent (CEM-GD), that combines first-order methods with CEM. At the beginning of execution, CEM-GD uses CEM to sample a significant amount of trajectory rollouts to explore the optimization landscape and avoid poor local minima. It then uses the top trajectories as initialization for gradient descent and applies gradient updates to each of these trajectories to find the optimal action sequence. At each subsequent time step, however, CEM-GD samples much fewer trajectories from CEM before applying gradient updates. We show that as the dimensionality of the planning problem increases, CEM-GD maintains desirable performance with a constant small number of samples by using the gradient information, while avoiding local optima using initially well-sampled trajectories. Furthermore, CEM-GD achieves better performance than CEM on a variety of continuous control benchmarks in MuJoCo with 100x fewer samples per time step, resulting in around 25% less computation time and 10% less memory usage. The implementation of CEM-GD is available at $\href{https://github.com/KevinHuang8/CEM-GD}{\text{https://github.com/KevinHuang8/CEM-GD}}$. △ Less

Submitted 14 December, 2021; originally announced December 2021.

arXiv:2107.06466 [pdf, other]

Going Beyond Linear RL: Sample Efficient Neural Function Approximation

Authors: Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

Abstract: Deep Reinforcement Learning (RL) powered by neural net approximation of the Q function has had enormous empirical success. While the theory of RL has traditionally focused on linear function approximation (or eluder dimension) approaches, little is known about nonlinear RL with neural net approximations of the Q functions. This is the focus of this work, where we study function approximation with… ▽ More Deep Reinforcement Learning (RL) powered by neural net approximation of the Q function has had enormous empirical success. While the theory of RL has traditionally focused on linear function approximation (or eluder dimension) approaches, little is known about nonlinear RL with neural net approximations of the Q functions. This is the focus of this work, where we study function approximation with two-layer neural networks (considering both ReLU and polynomial activation functions). Our first result is a computationally and statistically efficient algorithm in the generative model setting under completeness for two-layer neural networks. Our second result considers this setting but under only realizability of the neural net function class. Here, assuming deterministic dynamics, the sample complexity scales linearly in the algebraic dimension. In all cases, our results significantly improve upon what can be attained with linear (or eluder dimension) methods. △ Less

Submitted 25 December, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

arXiv:2107.04518 [pdf, ps, other]

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization

Authors: Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

Abstract: Bandit problems with linear or concave reward have been extensively studied, but relatively few works have studied bandits with non-concave reward. This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problems and two-layer neural network with polynomial activation bandit problem. For the… ▽ More Bandit problems with linear or concave reward have been extensively studied, but relatively few works have studied bandits with non-concave reward. This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problems and two-layer neural network with polynomial activation bandit problem. For the low-rank generalized linear bandit problem, we provide a minimax-optimal algorithm in the dimension, refuting both conjectures in [LMT21, JWWN19]. Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality and attains optimal rates in several structured polynomial settings (in the dimension). We further demonstrate the applicability of our algorithms in RL in the generative model setting, resulting in improved sample complexity over prior approaches. Finally, we show that the standard optimistic algorithms (e.g., UCB) are sub-optimal by dimension factors. In the neural net setting (with polynomial activation functions) with noiseless reward, we provide a bandit algorithm with sample complexity equal to the intrinsic algebraic dimension. Again, we show that optimistic approaches have worse sample complexity, polynomial in the extrinsic dimension (which could be exponentially worse in the polynomial degree). △ Less

Submitted 9 July, 2021; originally announced July 2021.

arXiv:2107.02377 [pdf, ps, other]

A Short Note on the Relationship of Information Gain and Eluder Dimension

Authors: Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei

Abstract: Eluder dimension and information gain are two widely used methods of complexity measures in bandit and reinforcement learning. Eluder dimension was originally proposed as a general complexity measure of function classes, but the common examples of where it is known to be small are function spaces (vector spaces). In these cases, the primary tool to upper bound the eluder dimension is the elliptic… ▽ More Eluder dimension and information gain are two widely used methods of complexity measures in bandit and reinforcement learning. Eluder dimension was originally proposed as a general complexity measure of function classes, but the common examples of where it is known to be small are function spaces (vector spaces). In these cases, the primary tool to upper bound the eluder dimension is the elliptic potential lemma. Interestingly, the elliptic potential lemma also features prominently in the analysis of linear bandits/reinforcement learning and their nonparametric generalization, the information gain. We show that this is not a coincidence -- eluder dimension and information gain are equivalent in a precise sense for reproducing kernel Hilbert spaces. △ Less

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2104.13790 [pdf, other]

doi 10.1109/TNNLS.2022.3143554

FastAdaBelief: Improving Convergence Rate for Belief-based Adaptive Optimizers by Exploiting Strong Convexity

Authors: Yangfan Zhou, Kaizhu Huang, Cheng Cheng, Xuguang Wang, Amir Hussain, Xin Liu

Abstract: AdaBelief, one of the current best optimizers, demonstrates superior generalization ability compared to the popular Adam algorithm by viewing the exponential moving average of observed gradients. AdaBelief is theoretically appealing in that it has a data-dependent $O(\sqrt{T})$ regret bound when objective functions are convex, where $T$ is a time horizon. It remains however an open problem whether… ▽ More AdaBelief, one of the current best optimizers, demonstrates superior generalization ability compared to the popular Adam algorithm by viewing the exponential moving average of observed gradients. AdaBelief is theoretically appealing in that it has a data-dependent $O(\sqrt{T})$ regret bound when objective functions are convex, where $T$ is a time horizon. It remains however an open problem whether the convergence rate can be further improved without sacrificing its generalization ability. %on how to exploit strong convexity to further improve the convergence rate of AdaBelief. To this end, we make a first attempt in this work and design a novel optimization algorithm called FastAdaBelief that aims to exploit its strong convexity in order to achieve an even faster convergence rate. In particular, by adjusting the step size that better considers strong convexity and prevents fluctuation, our proposed FastAdaBelief demonstrates excellent generalization ability as well as superior convergence. As an important theoretical contribution, we prove that FastAdaBelief attains a data-dependant $O(\log T)$ regret bound, which is substantially lower than AdaBelief. On the empirical side, we validate our theoretical analysis with extensive experiments in both scenarios of strong and non-strong convexity on three popular baseline models. Experimental results are very encouraging: FastAdaBelief converges the quickest in comparison to all mainstream algorithms while maintaining an excellent generalization ability, in cases of both strong or non-strong convexity. FastAdaBelief is thus posited as a new benchmark model for the research community. △ Less

Submitted 25 May, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

arXiv:2009.11098 [pdf, other]

Modeling short-ranged dependence in block extrema with application to polar temperature data

Authors: Brook T. Russell, Whitney K. Huang

Abstract: The block maxima approach is an important method in univariate extreme value analysis. While assuming that block maxima are independent results in straightforward analysis, the resulting inferences maybe invalid when a series of block maxima exhibits dependence. We propose a model, based on a first-order Markov assumption, that incorporates dependence between successive block maxima through the us… ▽ More The block maxima approach is an important method in univariate extreme value analysis. While assuming that block maxima are independent results in straightforward analysis, the resulting inferences maybe invalid when a series of block maxima exhibits dependence. We propose a model, based on a first-order Markov assumption, that incorporates dependence between successive block maxima through the use of a bivariate logistic dependence structure while maintaining generalized extreme value (GEV) marginal distributions. Modeling dependence in this manner allows us to better estimate extreme quantiles when block maxima exhibit short-ranged dependence. We demonstrate via a simulation study that our first-order Markov GEV model performs well when successive block maxima are dependent, while still being reasonably robust when maxima are independent. We apply our method to two polar annual minimum air temperature data sets that exhibit short-ranged dependence structures, and find that the proposed model yields modified estimates of high quantiles. △ Less

Submitted 23 September, 2020; originally announced September 2020.

Comments: 40 pages, 8 figures, and 9 tables

MSC Class: 62G32

arXiv:2008.11721 [pdf, other]

How Useful Are the Machine-Generated Interpretations to General Users? A Human Evaluation on Guessing the Incorrectly Predicted Labels

Authors: Hua Shen, Ting-Hao Kenneth Huang

Abstract: Explaining to users why automated systems make certain mistakes is important and challenging. Researchers have proposed ways to automatically produce interpretations for deep neural network models. However, it is unclear how useful these interpretations are in helping users figure out why they are getting an error. If an interpretation effectively explains to users how the underlying deep neural n… ▽ More Explaining to users why automated systems make certain mistakes is important and challenging. Researchers have proposed ways to automatically produce interpretations for deep neural network models. However, it is unclear how useful these interpretations are in helping users figure out why they are getting an error. If an interpretation effectively explains to users how the underlying deep neural network model works, people who were presented with the interpretation should be better at predicting the model's outputs than those who were not. This paper presents an investigation on whether or not showing machine-generated visual interpretations helps users understand the incorrectly predicted labels produced by image classifiers. We showed the images and the correct labels to 150 online crowd workers and asked them to select the incorrectly predicted labels with or without showing them the machine-generated visual interpretations. The results demonstrated that displaying the visual interpretations did not increase, but rather decreased, the average guessing accuracy by roughly 10%. △ Less

Submitted 27 August, 2020; v1 submitted 26 August, 2020; originally announced August 2020.

Comments: Accepted by The 8th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2020) https://github.com/huashen218/GuessWrongLabel

arXiv:2008.04473 [pdf, other]

Airflow recovery from thoracic and abdominal movements using Synchrosqueezing Transform and Locally Stationary Gaussian Process Regression

Authors: Whitney K. Huang, Yu-Min Chung, Yu-Bo Wang, Jeff E. Mandel, Hau-Tieng Wu

Abstract: Airflow signal encodes rich information about respiratory system. While the gold standard for measuring airflow is to use a spirometer with an occlusive seal, this is not practical for ambulatory monitoring of patients. Advances in sensor technology have made measurement of motion of the thorax and abdomen feasible with small inexpensive devices, but estimation of airflow from these time series is… ▽ More Airflow signal encodes rich information about respiratory system. While the gold standard for measuring airflow is to use a spirometer with an occlusive seal, this is not practical for ambulatory monitoring of patients. Advances in sensor technology have made measurement of motion of the thorax and abdomen feasible with small inexpensive devices, but estimation of airflow from these time series is challenging. We propose to use the nonlinear-type time-frequency analysis tool, synchrosqueezing transform, to properly represent the thoracic and abdominal movement signals as the features, which are used to recover the airflow by the locally stationary Gaussian process. We show that, using a dataset that contains respiratory signals under normal sleep conditions, an accurate prediction can be achieved by fitting the proposed model in the feature space both in the intra- and inter-subject setups. We also apply our method to a more challenging case, where subjects under general anesthesia underwent transitions from pressure support to unassisted ventilation to further demonstrate the utility of the proposed method. △ Less

Submitted 10 August, 2020; originally announced August 2020.

arXiv:2008.03776 [pdf, other]

Low-Rank Reorganization via Proportional Hazards Non-negative Matrix Factorization Unveils Survival Associated Gene Clusters

Authors: Zhi Huang, Paul Salama, Wei Shao, Jie Zhang, Kun Huang

Abstract: One of the central goals in precision health is the understanding and interpretation of high-dimensional biological data to identify genes and markers associated with disease initiation, development, and outcomes. Though significant effort has been committed to harness gene expression data for multiple analyses while accounting for time-to-event modeling by including survival times, many tradition… ▽ More One of the central goals in precision health is the understanding and interpretation of high-dimensional biological data to identify genes and markers associated with disease initiation, development, and outcomes. Though significant effort has been committed to harness gene expression data for multiple analyses while accounting for time-to-event modeling by including survival times, many traditional analyses have focused separately on non-negative matrix factorization (NMF) of the gene expression data matrix and survival regression with Cox proportional hazards model. In this work, Cox proportional hazards regression is integrated with NMF by imposing survival constraints. This is accomplished by jointly optimizing the Frobenius norm and partial log likelihood for events such as death or relapse. Simulation results on synthetic data demonstrated the superiority of the proposed method, when compared to other algorithms, in finding survival associated gene clusters. In addition, using human cancer gene expression data, the proposed technique can unravel critical clusters of cancer genes. The discovered gene clusters reflect rich biological implications and can help identify survival-related biomarkers. Towards the goal of precision health and cancer treatments, the proposed algorithm can help understand and interpret high-dimensional heterogeneous genomics data with accurate identification of survival-associated gene clusters. △ Less

Submitted 17 September, 2020; v1 submitted 9 August, 2020; originally announced August 2020.

arXiv:2006.08720 [pdf, other]

Estimating Concurrent Climate Extremes: A Conditional Approach

Authors: Whitney K. Huang, Adam H. Monahan, Francis W. Zwiers

Abstract: Simultaneous concurrence of extreme values across multiple climate variables can result in large societal and environmental impacts. Therefore, there is growing interest in understanding these concurrent extremes. In many applications, not only the frequency but also the magnitude of concurrent extremes are of interest. One way to approach this problem is to study the distribution of one climate v… ▽ More Simultaneous concurrence of extreme values across multiple climate variables can result in large societal and environmental impacts. Therefore, there is growing interest in understanding these concurrent extremes. In many applications, not only the frequency but also the magnitude of concurrent extremes are of interest. One way to approach this problem is to study the distribution of one climate variable given that another is extreme. In this work we develop a statistical framework for estimating bivariate concurrent extremes via a conditional approach, where univariate extreme value modeling is combined with dependence modeling of the conditional tail distribution using techniques from quantile regression and extreme value analysis to quantify concurrent extremes. We focus on the distribution of daily wind speed conditioned on daily precipitation taking its seasonal maximum. The Canadian Regional Climate Model large ensemble is used to assess the performance of the proposed framework both via a simulation study with specified dependence structure and via an analysis of the climate model-simulated dependence structure. △ Less

Submitted 12 March, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: 39 pages, 18 figures, 1 table

MSC Class: 62G32; 62H10; 62P12

arXiv:2006.07889 [pdf, other]

Graph Meta Learning via Local Subgraphs

Authors: Kexin Huang, Marinka Zitnik

Abstract: Prevailing methods for graphs require abundant label and edge information for learning. When data for a new task are scarce, meta-learning can learn from prior experiences and form much-needed inductive biases for fast adaption to new tasks. Here, we introduce G-Meta, a novel meta-learning algorithm for graphs. G-Meta uses local subgraphs to transfer subgraph-specific information and learn transfe… ▽ More Prevailing methods for graphs require abundant label and edge information for learning. When data for a new task are scarce, meta-learning can learn from prior experiences and form much-needed inductive biases for fast adaption to new tasks. Here, we introduce G-Meta, a novel meta-learning algorithm for graphs. G-Meta uses local subgraphs to transfer subgraph-specific information and learn transferable knowledge faster via meta gradients. G-Meta learns how to quickly adapt to a new task using only a handful of nodes or edges in the new task and does so by learning from data points in other graphs or related, albeit disjoint label sets. G-Meta is theoretically justified as we show that the evidence for a prediction can be found in the local subgraph surrounding the target node or edge. Experiments on seven datasets and nine baseline methods show that G-Meta outperforms existing methods by up to 16.3%. Unlike previous methods, G-Meta successfully learns in challenging, few-shot learning settings that require generalization to completely new graphs and never-before-seen labels. Finally, G-Meta scales to large graphs, which we demonstrate on a new Tree-of-Life dataset comprising of 1,840 graphs, a two-orders of magnitude increase in the number of graphs used in prior work. △ Less

Submitted 8 January, 2021; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: NeurIPS 2020

arXiv:2005.00718 [pdf, other]

Large-scale Uncertainty Estimation and Its Application in Revenue Forecast of SMEs

Authors: Zebang Zhang, Kui Zhao, Kai Huang, Quanhui Jia, Yanming Fang, Quan Yu

Abstract: The economic and banking importance of the small and medium enterprise (SME) sector is well recognized in contemporary society. Business credit loans are very important for the operation of SMEs, and the revenue is a key indicator of credit limit management. Therefore, it is very beneficial to construct a reliable revenue forecasting model. If the uncertainty of an enterprise's revenue forecasting… ▽ More The economic and banking importance of the small and medium enterprise (SME) sector is well recognized in contemporary society. Business credit loans are very important for the operation of SMEs, and the revenue is a key indicator of credit limit management. Therefore, it is very beneficial to construct a reliable revenue forecasting model. If the uncertainty of an enterprise's revenue forecasting can be estimated, a more proper credit limit can be granted. Natural gradient boosting approach, which estimates the uncertainty of prediction by a multi-parameter boosting algorithm based on the natural gradient. However, its original implementation is not easy to scale into big data scenarios, and computationally expensive compared to state-of-the-art tree-based models (such as XGBoost). In this paper, we propose a Scalable Natural Gradient Boosting Machines that is simple to implement, readily parallelizable, interpretable and yields high-quality predictive uncertainty estimates. According to the characteristics of revenue distribution, we derive an uncertainty quantification function. We demonstrate that our method can distinguish between samples that are accurate and inaccurate on revenue forecasting of SMEs. What's more, interpretability can be naturally obtained from the model, satisfying the financial needs. △ Less

Submitted 2 May, 2020; originally announced May 2020.

arXiv:2004.13344 [pdf, ps, other]

Robust Generative Adversarial Network

Authors: Shufei Zhang, Zhuang Qian, Kaizhu Huang, Jimin Xiao, Yuan He

Abstract: Generative adversarial networks (GANs) are powerful generative models, but usually suffer from instability and generalization problem which may lead to poor generations. Most existing works focus on stabilizing the training of the discriminator while ignoring the generalization properties. In this work, we aim to improve the generalization capability of GANs by promoting the local robustness withi… ▽ More Generative adversarial networks (GANs) are powerful generative models, but usually suffer from instability and generalization problem which may lead to poor generations. Most existing works focus on stabilizing the training of the discriminator while ignoring the generalization properties. In this work, we aim to improve the generalization capability of GANs by promoting the local robustness within the small neighborhood of the training samples. We also prove that the robustness in small neighborhood of training sets can lead to better generalization. Particularly, we design a robust optimization framework where the generator and discriminator compete with each other in a \textit{worst-case} setting within a small Wasserstein ball. The generator tries to map \textit{the worst input distribution} (rather than a Gaussian distribution used in most GANs) to the real data distribution, while the discriminator attempts to distinguish the real and fake distribution \textit{with the worst perturbation}. We have proved that our robust method can obtain a tighter generalization upper bound than traditional GANs under mild assumptions, ensuring a theoretical superiority of RGAN over GANs. A series of experiments on CIFAR-10, STL-10 and CelebA datasets indicate that our proposed robust framework can improve on five baseline GAN models substantially and consistently. △ Less

Submitted 28 April, 2020; originally announced April 2020.

Comments: This paper has been submitted to ICLR in Sep 25. 2019

arXiv:2004.08919 [pdf, other]

doi 10.1093/bioinformatics/btaa1005

DeepPurpose: a Deep Learning Library for Drug-Target Interaction Prediction

Authors: Kexin Huang, Tianfan Fu, Lucas Glass, Marinka Zitnik, Cao Xiao, Jimeng Sun

Abstract: Accurate prediction of drug-target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models for show promising performance for DTI prediction. However, these models can be difficult to use for both computer scientists entering the biomedical field and bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use deep learning lib… ▽ More Accurate prediction of drug-target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models for show promising performance for DTI prediction. However, these models can be difficult to use for both computer scientists entering the biomedical field and bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use deep learning library for DTI prediction. DeepPurpose supports training of customized DTI prediction models by implementing 15 compound and protein encoders and over 50 neural architectures, along with providing many other useful features. We demonstrate state-of-the-art performance of DeepPurpose on several benchmark datasets. △ Less

Submitted 9 December, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

Comments: Published in Bioinformatics (2020)

arXiv:2002.08675 [pdf, other]

Unsupervised Domain Adaptation via Discriminative Manifold Embedding and Alignment

Authors: You-Wei Luo, Chuan-Xian Ren, Pengfei Ge, Ke-Kun Huang, Yu-Feng Yu

Abstract: Unsupervised domain adaptation is effective in leveraging the rich information from the source domain to the unsupervised target domain. Though deep learning and adversarial strategy make an important breakthrough in the adaptability of features, there are two issues to be further explored. First, the hard-assigned pseudo labels on the target domain are risky to the intrinsic data structure. Secon… ▽ More Unsupervised domain adaptation is effective in leveraging the rich information from the source domain to the unsupervised target domain. Though deep learning and adversarial strategy make an important breakthrough in the adaptability of features, there are two issues to be further explored. First, the hard-assigned pseudo labels on the target domain are risky to the intrinsic data structure. Second, the batch-wise training manner in deep learning limits the description of the global structure. In this paper, a Riemannian manifold learning framework is proposed to achieve transferability and discriminability consistently. As to the first problem, this method establishes a probabilistic discriminant criterion on the target domain via soft labels. Further, this criterion is extended to a global approximation scheme for the second issue; such approximation is also memory-saving. The manifold metric alignment is exploited to be compatible with the embedding space. A theoretical error bound is derived to facilitate the alignment. Extensive experiments have been conducted to investigate the proposal and results of the comparison study manifest the superiority of consistent manifold learning framework. △ Less

Submitted 28 February, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

Comments: Accepted to AAAI 2020. Code available: \<https://github.com/LavieLuo/DRMEA>

arXiv:2002.06262 [pdf, other]

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? -- A Neural Tangent Kernel Perspective

Authors: Kaixuan Huang, Yuqing Wang, Molei Tao, Tuo Zhao

Abstract: Deep residual networks (ResNets) have demonstrated better generalization performance than deep feedforward networks (FFNets). However, the theory behind such a phenomenon is still largely unknown. This paper studies this fundamental problem in deep learning from a so-called "neural tangent kernel" perspective. Specifically, we first show that under proper conditions, as the width goes to infinity,… ▽ More Deep residual networks (ResNets) have demonstrated better generalization performance than deep feedforward networks (FFNets). However, the theory behind such a phenomenon is still largely unknown. This paper studies this fundamental problem in deep learning from a so-called "neural tangent kernel" perspective. Specifically, we first show that under proper conditions, as the width goes to infinity, training deep ResNets can be viewed as learning reproducing kernel functions with some kernel function. We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity. In contrast, the class of functions induced by the kernel of ResNets does not exhibit such degeneracy. Our discovery partially justifies the advantages of deep ResNets over deep FFNets in generalization abilities. Numerical results are provided to support our claim. △ Less

Submitted 22 December, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

Comments: Accepted in NeurIPS 2020

arXiv:1912.05122 [pdf, other]

Towards Better Forecasting by Fusing Near and Distant Future Visions

Authors: Jiezhu Cheng, Kaizhu Huang, Zibin Zheng

Abstract: Multivariate time series forecasting is an important yet challenging problem in machine learning. Most existing approaches only forecast the series value of one future moment, ignoring the interactions between predictions of future moments with different temporal distance. Such a deficiency probably prevents the model from getting enough information about the future, thus limiting the forecasting… ▽ More Multivariate time series forecasting is an important yet challenging problem in machine learning. Most existing approaches only forecast the series value of one future moment, ignoring the interactions between predictions of future moments with different temporal distance. Such a deficiency probably prevents the model from getting enough information about the future, thus limiting the forecasting accuracy. To address this problem, we propose Multi-Level Construal Neural Network (MLCNN), a novel multi-task deep learning framework. Inspired by the Construal Level Theory of psychology, this model aims to improve the predictive performance by fusing forecasting information (i.e., future visions) of different future time. We first use the Convolution Neural Network to extract multi-level abstract representations of the raw data for near and distant future predictions. We then model the interplay between multiple predictive tasks and fuse their future visions through a modified Encoder-Decoder architecture. Finally, we combine traditional Autoregression model with the neural network to solve the scale insensitive problem. Experiments on three real-world datasets show that our method achieves statistically significant improvements compared to the most state-of-the-art baseline methods, with average 4.59% reduction on RMSE metric and average 6.87% reduction on MAE metric. △ Less

Submitted 11 December, 2019; originally announced December 2019.

Comments: Accepted by AAAI 2020

arXiv:1911.08723 [pdf, other]

Deep Minimax Probability Machine

Authors: Lirong He, Ziyi Guo, Kaizhu Huang, Zenglin Xu

Abstract: Deep neural networks enjoy a powerful representation and have proven effective in a number of applications. However, recent advances show that deep neural networks are vulnerable to adversarial attacks incurred by the so-called adversarial examples. Although the adversarial example is only slightly different from the input sample, the neural network classifies it as the wrong class. In order to al… ▽ More Deep neural networks enjoy a powerful representation and have proven effective in a number of applications. However, recent advances show that deep neural networks are vulnerable to adversarial attacks incurred by the so-called adversarial examples. Although the adversarial example is only slightly different from the input sample, the neural network classifies it as the wrong class. In order to alleviate this problem, we propose the Deep Minimax Probability Machine (DeepMPM), which applies MPM to deep neural networks in an end-to-end fashion. In a worst-case scenario, MPM tries to minimize an upper bound of misclassification probabilities, considering the global information (i.e., mean and covariance information of each class). DeepMPM can be more robust since it learns the worst-case bound on the probability of misclassification of future data. Experiments on two real-world datasets can achieve comparable classification performance with CNN, while can be more robust on adversarial attacks. △ Less

Submitted 20 November, 2019; originally announced November 2019.

arXiv:1911.06479

On Model Robustness Against Adversarial Examples

Authors: Shufei Zhang, Kaizhu Huang, Zenglin Xu

Abstract: We study the model robustness against adversarial examples, referred to as small perturbed input data that may however fool many state-of-the-art deep learning models. Unlike previous research, we establish a novel theory addressing the robustness issue from the perspective of stability of the loss function in the small neighborhood of natural examples. We propose to exploit an energy function to… ▽ More We study the model robustness against adversarial examples, referred to as small perturbed input data that may however fool many state-of-the-art deep learning models. Unlike previous research, we establish a novel theory addressing the robustness issue from the perspective of stability of the loss function in the small neighborhood of natural examples. We propose to exploit an energy function to describe the stability and prove that reducing such energy guarantees the robustness against adversarial examples. We also show that the traditional training methods including adversarial training with the $l_2$ norm constraint (AT) and Virtual Adversarial Training (VAT) tend to minimize the lower bound of our proposed energy function. We make an analysis showing that minimization of such lower bound can however lead to insufficient robustness within the neighborhood around the input sample. Furthermore, we design a more rational method with the energy regularization which proves to achieve better robustness than previous methods. Through a series of experiments, we demonstrate the superiority of our model on both supervised tasks and semi-supervised tasks. In particular, our proposed adversarial framework achieves the best performance compared with previous adversarial training methods on benchmark datasets MNIST, CIFAR-10, and SVHN. Importantly, they demonstrate much better robustness against adversarial examples than all the other comparison methods. △ Less

Submitted 10 June, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

Comments: some theoretical bounds need to be revised

arXiv:1911.06446 [pdf, other]

CASTER: Predicting Drug Interactions with Chemical Substructure Representation

Authors: Kexin Huang, Cao Xiao, Trong Nghia Hoang, Lucas M. Glass, Jimeng Sun

Abstract: Adverse drug-drug interactions (DDIs) remain a leading cause of morbidity and mortality. Identifying potential DDIs during the drug design process is critical for patients and society. Although several computational models have been proposed for DDI prediction, there are still limitations: (1) specialized design of drug representation for DDI predictions is lacking; (2) predictions are based on li… ▽ More Adverse drug-drug interactions (DDIs) remain a leading cause of morbidity and mortality. Identifying potential DDIs during the drug design process is critical for patients and society. Although several computational models have been proposed for DDI prediction, there are still limitations: (1) specialized design of drug representation for DDI predictions is lacking; (2) predictions are based on limited labelled data and do not generalize well to unseen drugs or DDIs; and (3) models are characterized by a large number of parameters, thus are hard to interpret. In this work, we develop a ChemicAl SubstrucTurE Representation (CASTER) framework that predicts DDIs given chemical structures of drugs.CASTER aims to mitigate these limitations via (1) a sequential pattern mining module rooted in the DDI mechanism to efficiently characterize functional sub-structures of drugs; (2) an auto-encoding module that leverages both labelled and unlabelled chemical structure data to improve predictive accuracy and generalizability; and (3) a dictionary learning module that explains the prediction via a small set of coefficients which measure the relevance of each input sub-structures to the DDI outcome. We evaluated CASTER on two real-world DDI datasets and showed that it performed better than state-of-the-art baselines and provided interpretable predictions. △ Less

Submitted 19 November, 2019; v1 submitted 14 November, 2019; originally announced November 2019.

Comments: Accepted by AAAI 2020

arXiv:1909.12325 [pdf, other]

Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

Authors: Shahana Ibrahim, Xiao Fu, Nikos Kargas, Kejun Huang

Abstract: The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels via integrating noisy, non-expert labeling from annotators. The classic Dawid-Skene estimator and its accompanying expectation maximization (EM) algorithm have been widely used, but the theoretical properties are not fully understood. Tensor met… ▽ More The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels via integrating noisy, non-expert labeling from annotators. The classic Dawid-Skene estimator and its accompanying expectation maximization (EM) algorithm have been widely used, but the theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but the sample complexity is a hurdle for applying such approaches---since the tensor methods hinge on the availability of third-order statistics that are hard to reliably estimate given limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity. We show that the approach can identify the Dawid-Skene model under realistic conditions. We propose an algebraic algorithm reminiscent of convex geometry-based structured matrix factorization to solve the model identification problem efficiently, and an identifiability-enhanced algorithm for handling more challenging and critical scenarios. Experiments show that the proposed algorithms outperform the state-of-art algorithms under a variety of scenarios. △ Less

Submitted 26 September, 2019; originally announced September 2019.

Comments: 28 pages, 5 figures, to appear in 33rd NeurIPS conference, Vancouver, Canada

arXiv:1907.04450 [pdf, ps, other]

SNAP: Finding Approximate Second-Order Stationary Solutions Efficiently for Non-convex Linearly Constrained Problems

Authors: Songtao Lu, Meisam Razaviyayn, Bo Yang, Kejun Huang, Mingyi Hong

Abstract: This paper proposes low-complexity algorithms for finding approximate second-order stationary points (SOSPs) of problems with smooth non-convex objective and linear constraints. While finding (approximate) SOSPs is computationally intractable, we first show that generic instances of the problem can be solved efficiently. More specifically, for a generic problem instance, certain strict complementa… ▽ More This paper proposes low-complexity algorithms for finding approximate second-order stationary points (SOSPs) of problems with smooth non-convex objective and linear constraints. While finding (approximate) SOSPs is computationally intractable, we first show that generic instances of the problem can be solved efficiently. More specifically, for a generic problem instance, certain strict complementarity (SC) condition holds for all Karush-Kuhn-Tucker (KKT) solutions (with probability one). The SC condition is then used to establish an equivalence relationship between two different notions of SOSPs, one of which is computationally easy to verify. Based on this particular notion of SOSP, we design an algorithm named the Successive Negative-curvature grAdient Projection (SNAP), which successively performs either conventional gradient projection or some negative curvature based projection steps to find SOSPs. SNAP and its first-order extension SNAP$^+$, require $\mathcal{O}(1/ε^{2.5})$ iterations to compute an $(ε, \sqrtε)$-SOSP, and their per-iteration computational complexities are polynomial in the number of constraints and problem dimension. To our knowledge, this is the first time that first-order algorithms with polynomial per-iteration complexity and global sublinear rate have been designed to find SOSPs of the important class of non-convex problems with linear constraints. △ Less

Submitted 9 July, 2019; originally announced July 2019.

arXiv:1907.02189 [pdf, other]

On the Convergence of FedAvg on Non-IID Data

Authors: Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang

Abstract: Federated learning enables a large amount of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (\texttt{FedAvg}) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realis… ▽ More Federated learning enables a large amount of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (\texttt{FedAvg}) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. Importantly, our bound demonstrates a trade-off between communication-efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging schemes; low device participation rate can be achieved without severely slowing down the learning. Our results indicate that heterogeneity of data slows down the convergence, which matches empirical observations. Furthermore, we provide a necessary condition for \texttt{FedAvg} on non-iid data: the learning rate $η$ must decay, even if full-gradient is used; otherwise, the solution will be $Ω(η)$ away from the optimal. △ Less

Submitted 25 June, 2020; v1 submitted 3 July, 2019; originally announced July 2019.

Comments: 2020 International Conference on Learning Representations

arXiv:1901.08169 [pdf, other]

New Exploratory Tools for Extremal Dependence: Chi Networks and Annual Extremal Networks

Authors: Whitney K. Huang, Daniel S. Cooley, Imme Ebert-Uphoff, Chen Chen, Snigdhansu Chatterjee

Abstract: Understanding dependence structure among extreme values plays an important role in risk assessment in environmental studies. In this work we propose the $χ$ network and the annual extremal network for exploring the extremal dependence structure of environmental processes. A $χ$ network is constructed by connecting pairs whose estimated upper tail dependence coefficient, $\hat χ$, exceeds a prescri… ▽ More Understanding dependence structure among extreme values plays an important role in risk assessment in environmental studies. In this work we propose the $χ$ network and the annual extremal network for exploring the extremal dependence structure of environmental processes. A $χ$ network is constructed by connecting pairs whose estimated upper tail dependence coefficient, $\hat χ$, exceeds a prescribed threshold. We develop an initial $χ$ network estimator and we use a spatial block bootstrap to assess both the bias and variance of our estimator. We then develop a method to correct the bias of the initial estimator by incorporating the spatial structure in $χ$. In addition to the $χ$ network, which assesses spatial extremal dependence over an extended period of time, we further introduce an annual extremal network to explore the year-to-year temporal variation of extremal connections. We illustrate the $χ$ and the annual extremal networks by analyzing the hurricane season maximum precipitation at the US Gulf Coast and surrounding area. Analysis suggests there exists long distance extremal dependence for precipitation extremes in the study region and the strength of the extremal dependence may depend on some regional scale meteorological conditions, for example, sea surface temperature. △ Less

Submitted 23 January, 2019; originally announced January 2019.

Comments: 26 pages, 7 figures, 1 table

MSC Class: 62

arXiv:1901.01860 [pdf, other]

JECL: Joint Embedding and Cluster Learning for Image-Text Pairs

Authors: Sean T. Yang, Kuan-Hao Huang, Bill Howe

Abstract: We propose JECL, a method for clustering image-caption pairs by training parallel encoders with regularized clustering and alignment objectives, simultaneously learning both representations and cluster assignments. These image-caption pairs arise frequently in high-value applications where structured training data is expensive to produce, but free-text descriptions are common. JECL trains by minim… ▽ More We propose JECL, a method for clustering image-caption pairs by training parallel encoders with regularized clustering and alignment objectives, simultaneously learning both representations and cluster assignments. These image-caption pairs arise frequently in high-value applications where structured training data is expensive to produce, but free-text descriptions are common. JECL trains by minimizing the Kullback-Leibler divergence between the distribution of the images and text to that of a combined joint target distribution and optimizing the Jensen-Shannon divergence between the soft cluster assignments of the images and text. Regularizers are also applied to JECL to prevent trivial solutions. Experiments show that JECL outperforms both single-view and multi-view methods on large benchmark image-caption datasets, and is remarkably robust to missing captions and varying data sizes. △ Less

Submitted 16 October, 2020; v1 submitted 4 January, 2019; originally announced January 2019.

Comments: ICPR2020

arXiv:1901.01568 [pdf, other]

doi 10.1109/TSP.2020.2989551

Learning Nonlinear Mixtures: Identifiability and Algorithm

Authors: Bo Yang, Xiao Fu, Nicholas D. Sidiropoulos, Kejun Huang

Abstract: Linear mixture models have proven very useful in a plethora of applications, e.g., topic modeling, clustering, and source separation. As a critical aspect of the linear mixture models, identifiability of the model parameters is well-studied, under frameworks such as independent component analysis and constrained matrix factorization. Nevertheless, when the linear mixtures are distorted by an unkno… ▽ More Linear mixture models have proven very useful in a plethora of applications, e.g., topic modeling, clustering, and source separation. As a critical aspect of the linear mixture models, identifiability of the model parameters is well-studied, under frameworks such as independent component analysis and constrained matrix factorization. Nevertheless, when the linear mixtures are distorted by an unknown nonlinear functions -- which is well-motivated and more realistic in many cases -- the identifiability issues are much less studied. This work proposes an identification criterion for a nonlinear mixture model that is well grounded in many real-world applications, and offers identifiability guarantees. A practical implementation based on a judiciously designed neural network is proposed to realize the criterion, and an effective learning algorithm is proposed. Numerical results on synthetic and real-data corroborate effectiveness of the proposed method. △ Less

Submitted 6 January, 2019; originally announced January 2019.

Comments: 15 pages

arXiv:1901.00032 [pdf, other]

Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks

Authors: Edward Kim, Zach Jensen, Alexander van Grootel, Kevin Huang, Matthew Staib, Sheshera Mysore, Haw-Shiuan Chang, Emma Strubell, Andrew McCallum, Stefanie Jegelka, Elsa Olivetti

Abstract: Leveraging new data sources is a key step in accelerating the pace of materials design and discovery. To complement the strides in synthesis planning driven by historical, experimental, and computed data, we present an automated method for connecting scientific literature to synthesis insights. Starting from natural language text, we apply word embeddings from language models, which are fed into a… ▽ More Leveraging new data sources is a key step in accelerating the pace of materials design and discovery. To complement the strides in synthesis planning driven by historical, experimental, and computed data, we present an automated method for connecting scientific literature to synthesis insights. Starting from natural language text, we apply word embeddings from language models, which are fed into a named entity recognition model, upon which a conditional variational autoencoder is trained to generate syntheses for arbitrary materials. We show the potential of this technique by predicting precursors for two perovskite materials, using only training data published over a decade prior to their first reported syntheses. We demonstrate that the model learns representations of materials corresponding to synthesis-related properties, and that the model's behavior complements existing thermodynamic knowledge. Finally, we apply the model to perform synthesizability screening for proposed novel perovskite compounds. △ Less

Submitted 17 February, 2019; v1 submitted 31 December, 2018; originally announced January 2019.

Comments: Added new funding support to the acknowledgments section in this version

arXiv:1812.01164 [pdf, other]

doi 10.1109/JETCAS.2019.2910864

Pre-Defined Sparse Neural Networks with Hardware Acceleration

Authors: Sourya Dey, Kuan-Wen Huang, Peter A. Beerel, Keith M. Chugg

Abstract: Neural networks have proven to be extremely powerful tools for modern artificial intelligence applications, but computational and storage complexity remain limiting factors. This paper presents two compatible contributions towards reducing the time, energy, computational, and storage complexities associated with multilayer perceptrons. Pre-defined sparsity is proposed to reduce the complexity duri… ▽ More Neural networks have proven to be extremely powerful tools for modern artificial intelligence applications, but computational and storage complexity remain limiting factors. This paper presents two compatible contributions towards reducing the time, energy, computational, and storage complexities associated with multilayer perceptrons. Pre-defined sparsity is proposed to reduce the complexity during both training and inference, regardless of the implementation platform. Our results show that storage and computational complexity can be reduced by factors greater than 5X without significant performance loss. The second contribution is an architecture for hardware acceleration that is compatible with pre-defined sparsity. This architecture supports both training and inference modes and is flexible in the sense that it is not tied to a specific number of neurons. For example, this flexibility implies that various sized neural networks can be supported on various sized Field Programmable Gate Array (FPGA)s. △ Less

Submitted 3 December, 2018; originally announced December 2018.

Comments: This work has been submitted to the IEEE Journal on Emerging and Selected Topics in Circuits and Systems for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:1809.09143 [pdf, other]

EpiRL: A Reinforcement Learning Agent to Facilitate Epistasis Detection

Authors: Kexin Huang, Rodrigo Nogueira

Abstract: Epistasis (gene-gene interaction) is crucial to predicting genetic disease. Our work tackles the computational challenges faced by previous works in epistasis detection by modeling it as a one-step Markov Decision Process where the state is genome data, the actions are the interacted genes, and the reward is an interaction measurement for the selected actions. A reinforcement learning agent using… ▽ More Epistasis (gene-gene interaction) is crucial to predicting genetic disease. Our work tackles the computational challenges faced by previous works in epistasis detection by modeling it as a one-step Markov Decision Process where the state is genome data, the actions are the interacted genes, and the reward is an interaction measurement for the selected actions. A reinforcement learning agent using policy gradient method then learns to discover a set of highly interacted genes. △ Less

Submitted 24 September, 2018; originally announced September 2018.

arXiv:1808.02229 [pdf, other]

Grassmannian Learning: Embedding Geometry Awareness in Shallow and Deep Learning

Authors: Jiayao Zhang, Guangxu Zhu, Robert W. Heath Jr., Kaibin Huang

Abstract: Modern machine learning algorithms have been adopted in a range of signal-processing applications spanning computer vision, natural language processing, and artificial intelligence. Many relevant problems involve subspace-structured features, orthogonality constrained or low-rank constrained objective functions, or subspace distances. These mathematical characteristics are expressed naturally usin… ▽ More Modern machine learning algorithms have been adopted in a range of signal-processing applications spanning computer vision, natural language processing, and artificial intelligence. Many relevant problems involve subspace-structured features, orthogonality constrained or low-rank constrained objective functions, or subspace distances. These mathematical characteristics are expressed naturally using the Grassmann manifold. Unfortunately, this fact is not yet explored in many traditional learning algorithms. In the last few years, there have been growing interests in studying Grassmann manifold to tackle new learning problems. Such attempts have been reassured by substantial performance improvements in both classic learning and learning using deep neural networks. We term the former as shallow and the latter deep Grassmannian learning. The aim of this paper is to introduce the emerging area of Grassmannian learning by surveying common mathematical problems and primary solution approaches, and overviewing various applications. We hope to inspire practitioners in different fields to adopt the powerful tool of Grassmannian learning in their research. △ Less

Submitted 12 August, 2018; v1 submitted 7 August, 2018; originally announced August 2018.

Comments: Submitted to IEEE Signal Processing Magazine

arXiv:1807.05832 [pdf, ps, other]

Manifold Adversarial Learning

Authors: Shufei Zhang, Kaizhu Huang, Jianke Zhu, Yang Liu

Abstract: Recently proposed adversarial training methods show the robustness to both adversarial and original examples and achieve state-of-the-art results in supervised and semi-supervised learning. All the existing adversarial training methods consider only how the worst perturbed examples (i.e., adversarial examples) could affect the model output. Despite their success, we argue that such setting may be… ▽ More Recently proposed adversarial training methods show the robustness to both adversarial and original examples and achieve state-of-the-art results in supervised and semi-supervised learning. All the existing adversarial training methods consider only how the worst perturbed examples (i.e., adversarial examples) could affect the model output. Despite their success, we argue that such setting may be in lack of generalization, since the output space (or label space) is apparently less informative.In this paper, we propose a novel method, called Manifold Adversarial Training (MAT). MAT manages to build an adversarial framework based on how the worst perturbation could affect the distributional manifold rather than the output space. Particularly, a latent data space with the Gaussian Mixture Model (GMM) will be first derived.On one hand, MAT tries to perturb the input samples in the way that would rough the distributional manifold the worst. On the other hand, the deep learning model is trained trying to promote in the latent space the manifold smoothness, measured by the variation of Gaussian mixtures (given the local perturbation around the data point). Importantly, since the latent space is more informative than the output space, the proposed MAT can learn better a robust and compact data representation, leading to further performance improvement. The proposed MAT is important in that it can be considered as a superset of one recently-proposed discriminative feature learning approach called center loss. We conducted a series of experiments in both supervised and semi-supervised learning on three benchmark data sets, showing that the proposed MAT can achieve remarkable performance, much better than those of the state-of-the-art adversarial approaches. We also present a series of visualization which could generate further understanding or explanation on adversarial examples. △ Less

Submitted 14 November, 2019; v1 submitted 16 July, 2018; originally announced July 2018.

Comments: 11 pages, 26 figures

arXiv:1805.08311 [pdf, other]

AgileNet: Lightweight Dictionary-based Few-shot Learning

Authors: Mohammad Ghasemzadeh, Fang Lin, Bita Darvish Rouhani, Farinaz Koushanfar, Ke Huang

Abstract: The success of deep learning models is heavily tied to the use of massive amount of labeled data and excessively long training time. With the emergence of intelligent edge applications that use these models, the critical challenge is to obtain the same inference capability on a resource-constrained device while providing adaptability to cope with the dynamic changes in the data. We propose AgileNe… ▽ More The success of deep learning models is heavily tied to the use of massive amount of labeled data and excessively long training time. With the emergence of intelligent edge applications that use these models, the critical challenge is to obtain the same inference capability on a resource-constrained device while providing adaptability to cope with the dynamic changes in the data. We propose AgileNet, a novel lightweight dictionary-based few-shot learning methodology which provides reduced complexity deep neural network for efficient execution at the edge while enabling low-cost updates to capture the dynamics of the new data. Evaluations of state-of-the-art few-shot learning benchmarks demonstrate the superior accuracy of AgileNet compared to prior arts. Additionally, AgileNet is the first few-shot learning approach that prevents model updates by eliminating the knowledge obtained from the primary training. This property is ensured through the dictionaries learned by our novel end-to-end structured decomposition, which also reduces the memory footprint and computation complexity to match the edge device constraints. △ Less

Submitted 21 May, 2018; originally announced May 2018.

Comments: 10 Pages

arXiv:1803.01257 [pdf, other]

doi 10.1109/MSP.2018.2877582

Nonnegative Matrix Factorization for Signal and Data Analytics: Identifiability, Algorithms, and Applications

Authors: Xiao Fu, Kejun Huang, Nicholas D. Sidiropoulos, Wing-Kin Ma

Abstract: Nonnegative matrix factorization (NMF) has become a workhorse for signal and data analytics, triggered by its model parsimony and interpretability. Perhaps a bit surprisingly, the understanding to its model identifiability---the major reason behind the interpretability in many applications such as topic mining and hyperspectral imaging---had been rather limited until recent years. Beginning from t… ▽ More Nonnegative matrix factorization (NMF) has become a workhorse for signal and data analytics, triggered by its model parsimony and interpretability. Perhaps a bit surprisingly, the understanding to its model identifiability---the major reason behind the interpretability in many applications such as topic mining and hyperspectral imaging---had been rather limited until recent years. Beginning from the 2010s, the identifiability research of NMF has progressed considerably: Many interesting and important results have been discovered by the signal processing (SP) and machine learning (ML) communities. NMF identifiability has a great impact on many aspects in practice, such as ill-posed formulation avoidance and performance-guaranteed algorithm design. On the other hand, there is no tutorial paper that introduces NMF from an identifiability viewpoint. In this paper, we aim at filling this gap by offering a comprehensive and deep tutorial on model identifiability of NMF as well as the connections to algorithms and applications. This tutorial will help researchers and graduate students grasp the essence and insights of NMF, thereby avoiding typical `pitfalls' that are often times due to unidentifiable NMF formulations. This paper will also help practitioners pick/design suitable factorization tools for their own problems. △ Less

Submitted 16 November, 2018; v1 submitted 3 March, 2018; originally announced March 2018.

Comments: accepted version, IEEE Signal Processing Magazine; supplementary materials added. Some minor revisions implemented

arXiv:1802.09387 [pdf, other]

Estimating Precipitation Extremes using Log-Histospline

Authors: Whitney K. Huang, Douglas W. Nychka, Hao Zhang

Abstract: One of the commonly used approaches to modeling extremes is the peaks-over-threshold (POT) method. The POT method models exceedances over a threshold that is sufficiently high or low so that the exceedance has approximately a generalized Pareto distribution (GPD). This method requires the selection of a threshold that might affect the estimates. Here we propose an alternative method, the Log-Histo… ▽ More One of the commonly used approaches to modeling extremes is the peaks-over-threshold (POT) method. The POT method models exceedances over a threshold that is sufficiently high or low so that the exceedance has approximately a generalized Pareto distribution (GPD). This method requires the selection of a threshold that might affect the estimates. Here we propose an alternative method, the Log-Histospline (LHSpline), to explore modeling the tail behavior and the remainder of the density in one step using the full range of the data. LHSpline applies a smoothing spline model to a finely binned histogram of the log transformed data to estimate its log density. By construction, a LHSpline estimation is constrained to have polynomial tail behavior, a feature commonly observed in daily rainfall observations. We illustrate the LHSpline method by analyzing precipitation data collected in Houston, Texas. △ Less

Submitted 4 October, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

Comments: 32 pages, 13 figures, 2 table

MSC Class: 62

Showing 1–50 of 69 results for author: Huang, K