Showing 1–26 of 26 results for author: Pan, X

Searching in archive stat.
  1. arXiv:2407.06120  [pdf, other]

    cs.LG stat.ML

    Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning

    Authors: Yijun Dong, Hoang Phan, Xiang Pan, Qi Lei

    Abstract: We revisit data selection in a modern context of finetuning from a fundamental perspective. Extending the classical wisdom of variance minimization in low dimensions to high-dimensional finetuning, our generalization analysis unveils the importance of additionally reducing bias induced by low-rank approximation. Inspired by the variance-bias tradeoff in high dimensions from the theory, we introduc…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2403.11125  [pdf]

    stat.ML cs.LG math.PR

    Machine learning-based system reliability analysis with Gaussian Process Regression

    Authors: Lisang Zhou, Ziqian Luo, Xueting Pan

    Abstract: Machine learning-based reliability analysis methods have shown great advancements for their computational efficiency and accuracy. Recently, many efficient learning strategies have been proposed to enhance the computational performance. However, few of them explore the theoretical optimal learning strategy. In this article, we propose several theorems that facilitate such exploration. Specifical…

    Submitted 20 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  3. arXiv:2403.06424  [pdf, other]

    stat.ML cs.CV cs.LG

    Bridging Domains with Approximately Shared Features

    Authors: Ziliang Samuel Zhong, Xiang Pan, Qi Lei

    Abstract: Multi-source domain adaptation aims to reduce performance degradation when applying machine learning models to unseen domains. A fundamental challenge is devising the optimal strategy for feature selection. Existing literature is somewhat paradoxical: some advocate for learning invariant features from source domains, while others favor more diverse features. To address the challenge, we propose a…

    Submitted 11 March, 2024; originally announced March 2024.

  4. arXiv:2207.08556  [pdf, other]

    cs.CR stat.ML

    A Certifiable Security Patch for Object Tracking in Self-Driving Systems via Historical Deviation Modeling

    Authors: Xudong Pan, Qifan Xiao, Mi Zhang, Min Yang

    Abstract: Self-driving cars (SDC) commonly implement the perception pipeline to detect the surrounding obstacles and track their moving trajectories, which lays the ground for the subsequent driving decision making process. Although the security of obstacle detection in SDC is intensively studied, not until very recently did attackers start to exploit the vulnerability of the tracking module. Compared with…

    Submitted 18 July, 2022; originally announced July 2022.

  5. arXiv:2206.14371  [pdf, other]

    stat.ML cs.AI cs.CR cs.LG

    Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model

    Authors: Xudong Pan, Yifan Yan, Shengyao Zhang, Mi Zhang, Min Yang

    Abstract: In this paper, we present a novel insider attack called Matryoshka, which employs an irrelevant scheduled-to-publish DNN model as a carrier model for covert transmission of multiple secret models which memorize the functionality of private ML data stored in local data centers. Instead of treating the parameters of the carrier model as bit strings and applying conventional steganography, we devise…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: A preprint work

  6. arXiv:2205.02432  [pdf, other]

    stat.ME stat.CO

    A Unified Algorithm for Penalized Convolution Smoothed Quantile Regression

    Authors: Rebeka Man, Xiaoou Pan, Kean Ming Tan, Wen-Xin Zhou

    Abstract: Penalized quantile regression (QR) is widely used for studying the relationship between a response variable and a set of predictors under data heterogeneity in high-dimensional settings. Compared to penalized least squares, scalable algorithms for fitting penalized QR are lacking due to the non-differentiable piecewise linear loss function. To overcome the lack of smoothness, a recently proposed c…

    Submitted 5 May, 2022; originally announced May 2022.

  7. arXiv:2104.09368  [pdf, other]

    econ.EM econ.GN stat.ML

    Deep Reinforcement Learning in a Monetary Model

    Authors: Mingli Chen, Andreas Joseph, Michael Kumhof, Xinlei Pan, Xuan Zhou

    Abstract: We propose using deep reinforcement learning to solve dynamic stochastic general equilibrium models. Agents are represented by deep artificial neural networks and learn to solve their dynamic optimisation problem by interacting with the model environment, of which they have no a priori knowledge. Deep reinforcement learning offers a flexible yet principled way to model bounded rationality within t…

    Submitted 5 January, 2023; v1 submitted 19 April, 2021; originally announced April 2021.

  8. arXiv:2012.05187  [pdf, other]

    math.ST stat.ME

    Smoothed Quantile Regression with Large-Scale Inference

    Authors: Xuming He, Xiaoou Pan, Kean Ming Tan, Wen-Xin Zhou

    Abstract: Quantile regression is a powerful tool for learning the relationship between a response variable and a multivariate predictor while exploring heterogeneous effects. In this paper, we consider statistical inference for quantile regression with large-scale data in the "increasing dimension" regime. We provide a comprehensive and in-depth analysis of a convolution-type smoothing approach that achieve…

    Submitted 17 May, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

    Comments: An R package conquer for fitting smoothed quantile regression is available in CRAN, https://cran.r-project.org/web/packages/conquer/index.html

  9. arXiv:2010.13356  [pdf, other]

    cs.CR cs.LG stat.ML

    Exploring the Security Boundary of Data Reconstruction via Neuron Exclusivity Analysis

    Authors: Xudong Pan, Mi Zhang, Yifan Yan, Jiaming Zhu, Min Yang

    Abstract: Among existing privacy attacks on the gradient of neural networks, the data reconstruction attack, which reverse engineers the training batch from the gradient, poses a severe threat to the private training data. Despite its empirical success on large architectures and small training batches, unstable reconstruction accuracy is also observed when a smaller architecture or a larger batch is unde…

    Submitted 22 December, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: Accepted by USENIX Security'22; A preprint version

  10. arXiv:2005.10831  [pdf]

    q-bio.QM cs.LG stat.ML

    Repurpose Open Data to Discover Therapeutics for COVID-19 using Deep Learning

    Authors: Xiangxiang Zeng, Xiang Song, Tengfei Ma, Xiaoqin Pan, Yadi Zhou, Yuan Hou, Zheng Zhang, George Karypis, Feixiong Cheng

    Abstract: There have been more than 850,000 confirmed cases and over 48,000 deaths from the human coronavirus disease 2019 (COVID-19) pandemic, caused by novel severe acute respiratory syndrome coronavirus (SARS-CoV-2), in the United States alone. However, there are currently no proven effective medications against COVID-19. Drug repurposing offers a promising way for the development of prevention and treat…

    Submitted 21 May, 2020; originally announced May 2020.

    MSC Class: I.2.1

  11. arXiv:2004.12909  [pdf, other]

    cs.LG stat.ML

    Evolutionary Stochastic Policy Distillation

    Authors: Hao Sun, Xinyu Pan, Bo Dai, Dahua Lin, Bolei Zhou

    Abstract: Solving the Goal-Conditioned Reward Sparse (GCRS) task is a challenging reinforcement learning problem due to the sparsity of reward signals. In this work, we propose a new formulation of GCRS tasks from the perspective of the drifted random walk on the state space, and design a novel method called Evolutionary Stochastic Policy Distillation (ESPD) to solve them based on the insight of reducing th…

    Submitted 30 April, 2020; v1 submitted 27 April, 2020; originally announced April 2020.

  12. arXiv:1909.12220  [pdf, other]

    cs.CV cs.LG stat.ML

    Implicit Semantic Data Augmentation for Deep Networks

    Authors: Yulin Wang, Xuran Pan, Shiji Song, Hong Zhang, Cheng Wu, Gao Huang

    Abstract: In this paper, we propose a novel implicit semantic data augmentation (ISDA) approach to complement traditional augmentation techniques like flipping, translation or rotation. Our work is motivated by the intriguing property that deep networks are surprisingly good at linearizing features, such that certain directions in the deep feature space correspond to meaningful semantic transformations, e.g…

    Submitted 24 April, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: Accepted by NeurIPS 2019

  13. arXiv:1909.03403  [pdf, other]

    cs.CV cs.LG stat.ML

    Open Compound Domain Adaptation

    Authors: Ziwei Liu, Zhongqi Miao, Xingang Pan, Xiaohang Zhan, Dahua Lin, Stella X. Yu, Boqing Gong

    Abstract: A typical domain adaptation approach is to adapt models trained on the annotated data in a source domain (e.g., sunny weather) for achieving high performance on the test data in a target domain (e.g., rainy weather). Whether the target contains a single homogeneous domain or multiple heterogeneous domains, existing works always assume that there exist clear distinctions between the domains, which…

    Submitted 29 March, 2020; v1 submitted 8 September, 2019; originally announced September 2019.

    Comments: To appear in CVPR 2020 as an oral presentation. Code, datasets and models are available at: https://liuziwei7.github.io/projects/CompoundDomain.html

  14. arXiv:1907.09470  [pdf, other]

    cs.LG cs.AI cs.CR stat.ML

    Characterizing Attacks on Deep Reinforcement Learning

    Authors: Xinlei Pan, Chaowei Xiao, Warren He, Shuang Yang, Jian Peng, Mingjie Sun, Jinfeng Yi, Zijiang Yang, Mingyan Liu, Bo Li, Dawn Song

    Abstract: Recent studies show that Deep Reinforcement Learning (DRL) models are vulnerable to adversarial attacks, which attack DRL models by adding small perturbations to the observations. However, some attacks assume full availability of the victim model, and some require a huge amount of computation, making them less feasible for real world applications. In this work, we make further explorations of the…

    Submitted 16 February, 2022; v1 submitted 21 July, 2019; originally announced July 2019.

    Comments: AAMAS 2022, 13 pages, 6 figures

  15. arXiv:1907.04027  [pdf, other]

    math.ST stat.ML

    Iteratively Reweighted $\ell_1$-Penalized Robust Regression

    Authors: Xiaoou Pan, Qiang Sun, Wen-Xin Zhou

    Abstract: This paper investigates tradeoffs among optimization errors, statistical rates of convergence and the effect of heavy-tailed errors for high-dimensional robust regression with nonconvex regularization. When the additive errors in linear models have only bounded second moment, we show that iteratively reweighted $\ell_1$-penalized adaptive Huber regression estimator satisfies exponential deviation…

    Submitted 29 December, 2020; v1 submitted 9 July, 2019; originally announced July 2019.

    Comments: 62 pages

  16. arXiv:1904.11082  [pdf, other]

    cs.LG stat.ML

    How You Act Tells a Lot: Privacy-Leakage Attack on Deep Reinforcement Learning

    Authors: Xinlei Pan, Weiyao Wang, Xiaoshuai Zhang, Bo Li, Jinfeng Yi, Dawn Song

    Abstract: Machine learning has been widely applied to various applications, some of which involve training with privacy-sensitive data. A modest number of data breaches have been studied, including credit card information in natural language data and identities from face dataset. However, most of these studies focus on supervised learning models. As deep reinforcement learning (DRL) has been deployed in a n…

    Submitted 24 April, 2019; originally announced April 2019.

    Comments: The first three authors contributed equally. Accepted by AAMAS 2019

  17. arXiv:1806.07001  [pdf, ps, other]

    stat.ML cs.LG

    Theoretical Analysis of Image-to-Image Translation with Adversarial Learning

    Authors: Xudong Pan, Mi Zhang, Daizong Ding

    Abstract: Recently, a unified model for image-to-image translation tasks within adversarial learning framework has aroused widespread research interest among computer vision practitioners. Their reported empirical success however lacks solid theoretical interpretations for its inherent mechanism. In this paper, we reformulate their model from a brand-new geometrical perspective and have eventually reached a f…

    Submitted 18 June, 2018; originally announced June 2018.

    Comments: will appear in ICML2018

  18. arXiv:1806.04245  [pdf, other]

    cs.LG stat.ML

    Learning to Speed Up Structured Output Prediction

    Authors: Xingyuan Pan, Vivek Srikumar

    Abstract: Predicting structured outputs can be computationally onerous due to the combinatorially large output spaces. In this paper, we focus on reducing the prediction time of a trained black-box structured classifier without losing accuracy. To do so, we train a speedup classifier that learns to mimic a black-box classifier under the learning-to-search approach. As the structured classifier predicts more…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: International Conference on Machine Learning, Stockholm, Sweden, 2018

  19. arXiv:1712.02270  [pdf, ps, other]

    q-bio.GN cs.LG stat.ML

    Attention based convolutional neural network for predicting RNA-protein binding sites

    Authors: Xiaoyong Pan, Junchi Yan

    Abstract: RNA-binding proteins (RBPs) play crucial roles in many biological processes, e.g. gene regulation. Computational identification of RBP binding sites on RNAs is urgently needed. In particular, RBPs bind to RNAs by recognizing sequence motifs. Thus, quickly locating those motifs on RNA sequences is crucial and time-efficient for determining whether the RNAs interact with the RBPs or not. In this study…

    Submitted 6 December, 2017; originally announced December 2017.

    Journal ref: NIPS 2017 Computational Biology Workshop

  20. arXiv:1706.04572  [pdf, other]

    stat.ML cs.CV cs.LG

    Deep Learning Methods for Efficient Large Scale Video Labeling

    Authors: Miha Skalic, Marcin Pekalski, Xingguo E. Pan

    Abstract: We present a solution to the "Google Cloud and YouTube-8M Video Understanding Challenge" that ranked 5th. The proposed model is an ensemble of three model families, two frame level and one video level. Training was performed on an augmented dataset, with cross-validation.

    Submitted 14 June, 2017; originally announced June 2017.

    Comments: 7 pages, 5 tables, 1 figure

  21. arXiv:1610.06848  [pdf, other]

    cs.LG stat.ML

    An Efficient Minibatch Acceptance Test for Metropolis-Hastings

    Authors: Daniel Seita, Xinlei Pan, Haoyu Chen, John Canny

    Abstract: We present a novel Metropolis-Hastings method for large datasets that uses small expected-size minibatches of data. Previous work on reducing the cost of Metropolis-Hastings tests yields variable data consumed per sample, with only constant factor reductions versus using the full dataset for each sample. Here we present a method that can be tuned to provide arbitrarily small batch sizes, by adjusti…

    Submitted 9 July, 2017; v1 submitted 18 October, 2016; originally announced October 2016.

    Comments: Final version for UAI 2017

  22. arXiv:1605.09721  [pdf, other]

    stat.ML cs.DC cs.DS cs.LG math.OC

    CYCLADES: Conflict-free Asynchronous Machine Learning

    Authors: Xinghao Pan, Maximilian Lam, Stephen Tu, Dimitris Papailiopoulos, Ce Zhang, Michael I. Jordan, Kannan Ramchandran, Chris Re, Benjamin Recht

    Abstract: We present CYCLADES, a general framework for parallelizing stochastic optimization algorithms in a shared memory setting. CYCLADES is asynchronous during shared model updates, and requires no memory locking mechanisms, similar to HOGWILD!-type algorithms. Unlike HOGWILD!, CYCLADES introduces no conflicts during the parallel execution, and offers a black-box analysis for provable speedups across a…

    Submitted 31 May, 2016; originally announced May 2016.

  23. arXiv:1507.06970  [pdf, ps, other]

    stat.ML cs.DC cs.DS cs.LG math.OC

    Perturbed Iterate Analysis for Asynchronous Stochastic Optimization

    Authors: Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan

    Abstract: We introduce and analyze stochastic optimization methods where the input to each gradient update is perturbed by bounded noise. We show that this framework forms the basis of a unified approach to analyze asynchronous implementations of stochastic optimization algorithms. In this framework, asynchronous stochastic optimization algorithms can be thought of as serial methods operating on noisy inputs…

    Submitted 25 March, 2016; v1 submitted 24 July, 2015; originally announced July 2015.

    Comments: 30 pages

    MSC Class: 65K10; 65Y05; 68W10; 68W20

  24. arXiv:1507.05086  [pdf, other]

    cs.DC cs.DS stat.ML

    Parallel Correlation Clustering on Big Graphs

    Authors: Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan

    Abstract: Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices, and obtains a 3-approximation ratio. Unfortunately, KwikCluster in practice requires a large number of clustering rounds, a potential bottleneck for large gra…

    Submitted 20 July, 2015; v1 submitted 17 July, 2015; originally announced July 2015.

  25. arXiv:1310.5426  [pdf, other]

    cs.LG cs.DC stat.ML

    MLI: An API for Distributed Machine Learning

    Authors: Evan R. Sparks, Ameet Talwalkar, Virginia Smith, Jey Kottalam, Xinghao Pan, Joseph Gonzalez, Michael J. Franklin, Michael I. Jordan, Tim Kraska

    Abstract: MLI is an Application Programming Interface designed to address the challenges of building Machine Learning algorithms in a distributed setting based on data-centric computing. Its primary goal is to simplify the development of high-performance, scalable, distributed algorithms. Our initial results show that, relative to existing systems, this interface can be used to build distributed implement…

    Submitted 25 October, 2013; v1 submitted 21 October, 2013; originally announced October 2013.

  26. arXiv:1308.1624  [pdf]

    stat.AP

    Poisson-type Multivariate Transfer Function Model Reveals Short-term Effects of Ambient Air Pollutants on Hospital Emergency room Visits for Cerebro-cardiovascular Diseases

    Authors: Menghui Li, Dasheng Luo, Chenghua Cao, Xiaochuan Pan, Qixin Wang

    Abstract: Laboratory experiments have shown that cardiovascular diseases are positively correlated to the concentration of ambient air pollutants, such as SO2, NO2, PM10, etc. It has also been repeatedly reported in many countries that increased concentration of ambient air pollutants leads to a rise in hospital emergency room visits for these diseases. These studies mainly adopt either regression analysis o…

    Submitted 7 August, 2013; originally announced August 2013.