
Showing 1–50 of 107 results for author: Zhao, S

Searching in archive stat.
  1. arXiv:2406.16605  [pdf, other]

    cs.CL cs.AI cs.LG stat.ME

    CLEAR: Can Language Models Really Understand Causal Graphs?

    Authors: Sirui Chen, Mengying Xu, Kun Wang, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Chaochao Lu

    Abstract: Causal reasoning is a cornerstone of how humans interpret the world. To model and reason about causality, causal graphs offer a concise yet effective solution. Given the impressive advancements in language models, a crucial question arises: can they really understand causal graphs? To this end, we pioneer an investigation into language models' understanding of causal graphs. Specifically, we devel…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.13876  [pdf, other]

    stat.ME

    An Empirical Bayes Jackknife Regression Framework for Covariance Matrix Estimation

    Authors: Huqin Xin, Sihai Dave Zhao

    Abstract: Covariance matrix estimation, a classical statistical topic, poses significant challenges when the sample size is comparable to or smaller than the number of features. In this paper, we frame covariance matrix estimation as a compound decision problem and apply an optimal decision rule to estimate covariance parameters. To approximate this rule, we introduce an algorithm that integrates jackknife…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages, 3 figures

    MSC Class: 62C25

  3. arXiv:2405.13785  [pdf, other]

    cs.LG cs.AI math.PR stat.ML

    Efficient Two-Stage Gaussian Process Regression Via Automatic Kernel Search and Subsampling

    Authors: Shifan Zhao, Jiaying Lu, Ji Yang, Edmond Chow, Yuanzhe Xi

    Abstract: Gaussian Process Regression (GPR) is widely used in statistics and machine learning for prediction tasks requiring uncertainty measures. Its efficacy depends on the appropriate specification of the mean function, covariance kernel function, and associated hyperparameters. Severe misspecifications can lead to inaccurate results and problematic consequences, especially in safety-critical application…

    Submitted 22 May, 2024; originally announced May 2024.

    ACM Class: G.3; J.3

  4. arXiv:2405.03879  [pdf, other]

    stat.ML cs.LG q-bio.GN stat.AP

    Scalable Amortized GPLVMs for Single Cell Transcriptomics Data

    Authors: Sarah Zhao, Aditya Ravuri, Vidhi Lalchand, Neil D. Lawrence

    Abstract: Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data. Gaussian Process Latent Variable Models (GPLVMs) offer an interpretable dimensionality reduction method, but current scalable models lack effectiveness in clustering cell types. We introduce an improved model, the amortized stochastic variational Bayesian GPLVM (BGPLVM), tailored for single-cell RNA-seq with sp…

    Submitted 6 May, 2024; originally announced May 2024.

  5. arXiv:2404.17546  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo

    Authors: Stephen Zhao, Rob Brekelmans, Alireza Makhzani, Roger Grosse

    Abstract: Numerous capability and safety techniques of Large Language Models (LLMs), including RLHF, automated red-teaming, prompt engineering, and infilling, can be cast as sampling from an unnormalized target distribution defined by a given reward or potential function over the full sequence. In this work, we leverage the rich toolkit of Sequential Monte Carlo (SMC) for these probabilistic inference probl…

    Submitted 26 April, 2024; originally announced April 2024.
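The framing in this abstract, sampling from an unnormalized target built from a reward or potential over full sequences, can be illustrated with the simplest baseline: importance sampling with resampling over whole sequences. This is a minimal sketch of that framing only, not the paper's twisted SMC; the binary "language model", the reward (number of 1-tokens), and `beta` are hypothetical stand-ins.

```python
import math
import random


def importance_resample(num_particles, seq_len, beta, seed=0):
    """Draw whole binary sequences from a base model, weight each one by
    exp(beta * reward), and resample in proportion to the weights.

    Unnormalized target: base(s) * exp(beta * reward(s)), where the
    hypothetical reward(s) is the number of 1-tokens in s.
    """
    rng = random.Random(seed)
    # Proposal = hypothetical base "language model": i.i.d. uniform tokens in {0, 1}.
    particles = [[rng.randint(0, 1) for _ in range(seq_len)]
                 for _ in range(num_particles)]
    # Since proposal == base, the log-weight is just beta * reward.
    log_w = [beta * sum(s) for s in particles]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]  # shift by max for stability
    # Multinomial resampling: an unweighted, approximate sample from the target.
    return rng.choices(particles, weights=weights, k=num_particles)


samples = importance_resample(num_particles=20000, seq_len=5, beta=1.0)
frac_ones = sum(sum(s) for s in samples) / (20000 * 5)
print(f"fraction of 1-tokens after resampling: {frac_ones:.3f}")
```

Because this toy target factorizes over tokens, each token should be 1 with probability sigmoid(beta), about 0.73 for beta = 1, which gives a quick sanity check. For long sequences the weights of whole-sequence importance sampling degenerate quickly, which is the usual motivation for sequential schemes with intermediate targets.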

  6. arXiv:2402.14090  [pdf, other]

    cs.AI econ.GN stat.ML

    Social Environment Design

    Authors: Edwin Zhang, Sadie Zhao, Tonghan Wang, Safwan Hossain, Henry Gasztowtt, Stephan Zheng, David C. Parkes, Milind Tambe, Yiling Chen

    Abstract: Artificial Intelligence (AI) holds promise as a technology that can be used to improve government and economic policy-making. This paper proposes a new research agenda towards this end by introducing Social Environment Design, a general framework for the use of AI for automated policy-making that connects with the Reinforcement Learning, EconCS, and Computational Social Choice communities. The fra…

    Submitted 17 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: ICML 2024 Position Paper. Website at https://sed.eddie.win

  7. arXiv:2402.07210  [pdf, other]

    math.DS econ.GN physics.soc-ph stat.AP

    Fukushima Nuclear Wastewater Discharge: An Evolutionary Game Theory Approach to International and Domestic Interaction and Strategic Decision-Making

    Authors: Mingyang Li, Han Pengsihua, Songqing Zhao, Zejun Wang, Limin Yang, Weian Liu

    Abstract: On August 24, 2023, Japan controversially decided to discharge nuclear wastewater from the Fukushima Daiichi Nuclear Power Plant into the ocean, sparking intense domestic and global debates. This study uses evolutionary game theory to analyze the strategic dynamics between Japan, other countries, and the Japan Fisheries Association. By incorporating economic, legal, international aid, and environm…

    Submitted 11 February, 2024; originally announced February 2024.

  8. arXiv:2311.02273  [pdf, ps, other]

    stat.ME

    A Sequential Learning Procedure with Applications to Online Sales Examination

    Authors: Jun Hu, Yan Zhuang, Shunan Zhao

    Abstract: In this paper, we consider the problem of estimating parameters in a linear regression model. We propose a sequential learning procedure to determine the sample size for achieving a given small estimation risk, under the widely used Gauss-Markov setup with independent normal errors. The procedure is proven to enjoy the second-order efficiency and risk-efficiency properties, which are validated thr…

    Submitted 3 November, 2023; originally announced November 2023.

    MSC Class: 62L12; 62L05; 62L10
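The abstract does not state the stopping rule, so the following is only a generic plug-in sketch of choosing the sample size sequentially. Under the Gauss-Markov setup the risk of least squares is $E\|\hat\beta - \beta\|^2 = \sigma^2\,\mathrm{tr}((X^\top X)^{-1})$, so one can keep sampling until an estimate of that quantity drops below a target `w`. The one-predictor model, `w`, the pilot size `n0`, and the data generator are hypothetical; this is not the authors' procedure and carries none of its efficiency guarantees.

```python
import random


def ols_with_risk(xs, ys):
    """Fit y = b0 + b1 * x by OLS and return (b0, b1, estimated risk), where
    risk = sigma^2 * tr((X'X)^{-1}) with sigma^2 replaced by the residual variance."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sigma2_hat = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    # Diagonal of (X'X)^{-1} for X = [1, x]: (1/n + x_bar^2/sxx) and 1/sxx.
    trace = 1 / n + x_bar ** 2 / sxx + 1 / sxx
    return b0, b1, sigma2_hat * trace


def sequential_fit(w, n0=10, seed=0, b0_true=1.0, b1_true=2.0, sigma=1.0):
    """Hypothetical rule: add one observation at a time until estimated risk <= w."""
    rng = random.Random(seed)
    xs, ys = [], []

    def draw():
        x = rng.uniform(0, 1)
        xs.append(x)
        ys.append(b0_true + b1_true * x + rng.gauss(0, sigma))

    for _ in range(n0):  # pilot sample
        draw()
    while True:
        b0, b1, risk = ols_with_risk(xs, ys)
        if risk <= w:
            return len(xs), b0, b1, risk
        draw()


n, b0, b1, risk = sequential_fit(w=0.05)
print(f"stopped at n={n}, b0={b0:.2f}, b1={b1:.2f}, estimated risk={risk:.4f}")
```

The sample size is random here: it depends on the noise level estimated along the way, which is the point of a sequential rule when $\sigma^2$ is unknown in advance.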

  9. arXiv:2310.07187  [pdf, other]

    stat.ML cs.LG

    Kernel Cox partially linear regression: building predictive models for cancer patients' survival

    Authors: Yaohua Rong, Sihai Dave Zhao, Xia Zheng, Yi Li

    Abstract: Wide heterogeneity exists in cancer patients' survival, ranging from a few months to several decades. To accurately predict clinical outcomes, it is vital to build an accurate predictive model that relates patients' molecular profiles with patients' survival. With complex relationships between survival and high-dimensional molecular predictors, it is challenging to conduct non-parametric modeling…

    Submitted 11 October, 2023; originally announced October 2023.

  10. arXiv:2308.12666  [pdf, other]

    cs.LG stat.ML

    Geodesic Mode Connectivity

    Authors: Charlie Tan, Theodore Long, Sarah Zhao, Rudolf Laine

    Abstract: Mode connectivity is a phenomenon where trained models are connected by a path of low loss. We reframe this in the context of Information Geometry, where neural networks are studied as spaces of parameterized distributions with curved geometry. We hypothesize that shortest paths in these spaces, known as geodesics, correspond to mode-connecting paths in the loss landscape. We propose an algorithm…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Published as a TinyPaper at ICLR 2023

  11. arXiv:2306.15380  [pdf, ps, other]

    stat.ME stat.OT

    Multivariate Rank-Based Analysis of Multiple Endpoints in Clinical Trials: A Global Test Approach

    Authors: Kexuan Li, Lingli Yang, Shaofei Zhao, Susie Sinks, Luan Lin, Peng Sun

    Abstract: Clinical trials often involve the assessment of multiple endpoints to comprehensively evaluate the efficacy and safety of interventions. In this work, we consider a global nonparametric testing procedure based on multivariate rank for the analysis of multiple endpoints in clinical trials. Unlike other existing approaches that rely on pairwise comparisons for each individual endpoint, the proposed m…

    Submitted 27 June, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  12. arXiv:2306.07239  [pdf, ps, other]

    stat.ME

    Nonparametric empirical Bayes biomarker imputation and estimation

    Authors: Alton Barbehenn, Sihai Dave Zhao

    Abstract: Biomarkers are often measured in bulk to diagnose patients, monitor patient conditions, and research novel drug pathways. The measurement of these biomarkers often suffers from detection limits that result in missing and untrustworthy measurements. Frequently, missing biomarkers are imputed so that downstream analysis can be conducted with modern statistical methods that cannot normally handle da…

    Submitted 12 June, 2023; originally announced June 2023.

  13. arXiv:2306.01757  [pdf, ps, other]

    stat.AP eess.SY math.DS

    State estimation for one-dimensional agro-hydrological processes with model mismatch

    Authors: Zhuangyu Liu, Jinfeng Liu, Shunyi Zhao, Xiaoli Luan, Fei Liu

    Abstract: The importance of accurate soil moisture data for the development of modern closed-loop irrigation systems cannot be overstated. Due to the diversity of soil, it is difficult to obtain an accurate model for an agro-hydrological system. In this study, soil moisture estimation in 1D agro-hydrological systems with model mismatch is the focus. To address the problem of model mismatch, a nonlinear state-s…

    Submitted 24 May, 2023; originally announced June 2023.

  14. arXiv:2210.01383  [pdf, other]

    stat.ML cs.AI cs.LG

    Generalizing Bayesian Optimization with Decision-theoretic Entropies

    Authors: Willie Neiswanger, Lantao Yu, Shengjia Zhao, Chenlin Meng, Stefano Ermon

    Abstract: Bayesian optimization (BO) is a popular method for efficiently inferring optima of an expensive black-box function via a sequence of queries. Existing information-theoretic BO procedures aim to make queries that most reduce the uncertainty about optima, where the uncertainty is captured by Shannon entropy. However, an optimal measure of uncertainty would, ideally, factor in how we intend to use th…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: Appears in Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  15. arXiv:2207.05471  [pdf, other]

    stat.ML cs.LG

    Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets

    Authors: Yingsong Huang, Bing Bai, Shengwei Zhao, Kun Bai, Fei Wang

    Abstract: Learning against label noise is a vital topic to guarantee a reliable performance for deep neural networks. Recent research usually refers to dynamic noise modeling with model output probabilities and loss values, and then separates clean and noisy samples. These methods have gained notable success. However, unlike cherry-picked data, existing approaches often cannot perform well when facing imbal…

    Submitted 12 July, 2022; originally announced July 2022.

  16. arXiv:2206.11468  [pdf, other]

    cs.LG stat.ML

    Modular Conformal Calibration

    Authors: Charles Marx, Shengjia Zhao, Willie Neiswanger, Stefano Ermon

    Abstract: Uncertainty estimates must be calibrated (i.e., accurate) and sharp (i.e., informative) in order to be useful. This has motivated a variety of methods for recalibration, which use held-out data to turn an uncalibrated model into a calibrated model. However, the applicability of existing methods is limited due to their assumption that the original model is also a probabilistic model. We introduce a…

    Submitted 5 July, 2022; v1 submitted 22 June, 2022; originally announced June 2022.
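The recalibration setting described here, using held-out data to turn a model without calibrated uncertainty into a calibrated one, can be illustrated with standard split conformal prediction, which needs only a point predictor. This is a well-known baseline shown for orientation, not the paper's Modular Conformal Calibration; the predictor, the data-generating process, and `alpha` are hypothetical.

```python
import math
import random


def conformal_quantile(residuals, alpha):
    """Finite-sample conformal quantile of absolute residuals on held-out data."""
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    return math.inf if k > n else scores[k - 1]


def model(x):
    """Hypothetical point predictor that reports no uncertainty."""
    return 2.0 * x


rng = random.Random(0)


def draw(n):
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return xs, [2.0 * x + rng.gauss(0, 1) for x in xs]


alpha = 0.1
cal_x, cal_y = draw(1000)  # held-out calibration split
q = conformal_quantile([y - model(x) for x, y in zip(cal_x, cal_y)], alpha)

test_x, test_y = draw(2000)
covered = sum(model(x) - q <= y <= model(x) + q for x, y in zip(test_x, test_y))
coverage = covered / len(test_x)
print(f"half-width {q:.2f}, empirical coverage {coverage:.3f}")
```

When calibration and test points are exchangeable, intervals built this way contain the truth with probability at least 1 - alpha, marginally over draws, so the printed coverage should land near 0.9.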

  17. arXiv:2205.12202  [pdf, other]

    stat.ME

    From differential abundance to mtGWAS: accurate and scalable methodology for metabolomics data with non-ignorable missing observations and latent factors

    Authors: Shangshu Zhao, Kedir Turi, Tina Hartert, Carole Ober, Klaus Bonnelykke, Bo Chawes, Hans Bisgaard, Chris McKennan

    Abstract: Metabolomics is the high-throughput study of small molecule metabolites. Besides offering novel biological insights, these data contain unique statistical challenges, the most glaring of which is the many non-ignorable missing metabolite observations. To address this issue, nearly all analysis pipelines first impute missing observations, and subsequently perform analyses with methods designed for…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: 19 pages of main text; 89 pages with supplement; 3 figures and 2 tables

  18. arXiv:2202.07652  [pdf, other]

    cs.LG cs.AI stat.ML

    Predicting on the Edge: Identifying Where a Larger Model Does Better

    Authors: Taman Narayan, Heinrich Jiang, Sen Zhao, Sanjiv Kumar

    Abstract: Much effort has been devoted to making large and more accurate models, but relatively little has been put into understanding which examples are benefiting from the added complexity. In this paper, we demonstrate and analyze the surprisingly tight link between a model's predictive uncertainty on individual examples and the likelihood that larger models will improve prediction on them. Through exten…

    Submitted 15 February, 2022; originally announced February 2022.

  19. arXiv:2202.01940  [pdf, other]

    stat.ML cs.LG

    Distribution Embedding Networks for Generalization from a Diverse Set of Classification Tasks

    Authors: Lang Liu, Mahdi Milani Fard, Sen Zhao

    Abstract: We propose Distribution Embedding Networks (DEN) for classification with small data. In the same spirit of meta-learning, DEN learns from a diverse set of training tasks with the goal to generalize to unseen target tasks. Unlike existing approaches which require the inputs of training and target tasks to have the same dimension with possibly similar distributions, DEN allows training and target ta…

    Submitted 31 December, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: This paper is accepted at TMLR https://openreview.net/forum?id=F2rG2CXsgO

  20. arXiv:2202.01277  [pdf, other]

    stat.ML cs.LG

    Global Optimization Networks

    Authors: Sen Zhao, Erez Louidor, Olexander Mangylov, Maya Gupta

    Abstract: We consider the problem of estimating a good maximizer of a black-box function given noisy examples. To solve such problems, we propose to fit a new type of function which we call a global optimization network (GON), defined as any composition of an invertible function and a unimodal function, whose unique global maximizer can be inferred in $\mathcal{O}(D)$ time. In this paper, we show how to con…

    Submitted 2 February, 2022; originally announced February 2022.
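The key property in this abstract can be checked on a toy case: if $f(x) = g(h(x))$ with $h$ invertible and $g$ unimodal with known maximizer $z^*$, then the maximizer of $f$ is $h^{-1}(z^*)$. When $h$ acts coordinate-wise, inverting it takes one one-dimensional solve per coordinate, i.e. work linear in $D$. The specific $h$ ($t \mapsto t^3 + t$ per coordinate) and $g$ (negative squared distance to a center) are hypothetical choices; the abstract does not say how the paper parameterizes either part.

```python
def h(x):
    """Hypothetical invertible map: t -> t**3 + t applied to each coordinate."""
    return [t ** 3 + t for t in x]


def h_inverse_1d(z, iters=100):
    """Invert t -> t**3 + t by bisection; the map is strictly increasing."""
    lo, hi = -1.0, 1.0
    while lo ** 3 + lo > z:  # widen the bracket downward
        lo *= 2
    while hi ** 3 + hi < z:  # widen the bracket upward
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid ** 3 + mid < z:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def g(z, center):
    """Hypothetical unimodal function with unique maximizer z* = center."""
    return -sum((zi - ci) ** 2 for zi, ci in zip(z, center))


def f(x, center):
    return g(h(x), center)


def argmax_f(center):
    # x* = h^{-1}(z*): one 1-D inversion per coordinate, so O(D) solves.
    return [h_inverse_1d(c) for c in center]


center = [2.0, -10.0, 0.5]
x_star = argmax_f(center)
print(x_star)
```

Since $1^3 + 1 = 2$ and $(-2)^3 - 2 = -10$, the first two coordinates of `x_star` should come out as (approximately) 1 and -2; no search over $f$ itself is needed.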

  21. arXiv:2201.11936  [pdf, other]

    cs.IR cs.LG stat.ML

    Consistent Collaborative Filtering via Tensor Decomposition

    Authors: Shiwen Zhao, Charles Crissman, Guillermo R Sapiro

    Abstract: Collaborative filtering is the de facto standard for analyzing users' activities and building recommendation systems for items. In this work we develop Sliced Anti-symmetric Decomposition (SAD), a new model for collaborative filtering based on implicit feedback. In contrast to traditional techniques where a latent representation of users (user vectors) and items (item vectors) are estimated, SAD i…

    Submitted 10 July, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

  22. arXiv:2111.07878  [pdf, other]

    stat.ME

    An Approach of Bayesian Variable Selection for Ultrahigh Dimensional Multivariate Regression

    Authors: Xiaotian Dai, Guifang Fu, Randall Reese, Shaofei Zhao, Zuofeng Shang

    Abstract: In many applications, scientists are particularly interested in detecting which of the predictors are truly associated with a multivariate response. It is more accurate to model multiple responses as one vector rather than separating each component one by one. This is particularly true for complex traits having multiple correlated components. A Bayesian multivariate variable selection (BMVS) approach…

    Submitted 15 November, 2021; originally announced November 2021.

  23. arXiv:2111.05708  [pdf]

    cs.LG stat.ML

    STNN-DDI: A Substructure-aware Tensor Neural Network to Predict Drug-Drug Interactions

    Authors: Hui Yu, ShiYu Zhao, JianYu Shi

    Abstract: Motivation: Computational prediction of multiple-type drug-drug interaction (DDI) helps reduce unexpected side effects in poly-drug treatments. Although existing computational approaches achieve inspiring results, they ignore that the action of a drug is mainly caused by its chemical substructures. In addition, their interpretability is still weak. Results: In this paper, by supposing that the int…

    Submitted 5 December, 2021; v1 submitted 10 November, 2021; originally announced November 2021.

  24. Distribution-free and Model-free Multivariate Feature Screening via Multivariate Rank Distance Correlation

    Authors: Shaofei Zhao, Guifang Fu

    Abstract: Feature screening approaches are effective in selecting active features from data with ultrahigh dimensionality and increasing complexity; however, the majority of existing feature screening approaches are either restricted to a univariate response or rely on some distribution or model assumptions. In this article, we propose a novel sure independence screening approach based on the multivariate r…

    Submitted 5 May, 2023; v1 submitted 6 October, 2021; originally announced October 2021.

    Journal ref: Journal of Multivariate Analysis 192 (2022): 105081

  25. arXiv:2107.05719  [pdf, other]

    stat.ML cs.LG stat.ME

    Calibrating Predictions to Decisions: A Novel Approach to Multi-Class Calibration

    Authors: Shengjia Zhao, Michael P. Kim, Roshni Sahoo, Tengyu Ma, Stefano Ermon

    Abstract: When facing uncertainty, decision-makers want predictions they can trust. A machine learning provider can convey confidence to decision-makers by guaranteeing their predictions are distribution calibrated -- amongst the inputs that receive a predicted class probabilities vector $q$, the actual distribution over classes is $q$. For multi-class prediction problems, however, achieving distribution ca…

    Submitted 12 July, 2021; originally announced July 2021.
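The definition of distribution calibration quoted in this abstract can be checked directly when predicted vectors repeat exactly: group inputs by their predicted vector $q$ and compare $q$ with the empirical class frequencies in that group. The toy predictions and labels below are hypothetical; with continuous-valued predictions almost every $q$ is distinct, so a practical check along these lines would need some form of binning.

```python
from collections import defaultdict


def distribution_calibration_gaps(pred_vectors, labels, num_classes):
    """For each distinct predicted vector q, return the largest absolute gap
    between q and the empirical class distribution of inputs that received q."""
    groups = defaultdict(list)
    for q, y in zip(pred_vectors, labels):
        groups[tuple(q)].append(y)
    gaps = {}
    for q, ys in groups.items():
        freq = [ys.count(c) / len(ys) for c in range(num_classes)]
        gaps[q] = max(abs(fc - qc) for fc, qc in zip(freq, q))
    return gaps


preds = [(0.5, 0.5, 0.0)] * 4 + [(0.2, 0.2, 0.6)] * 5
labels = [0, 1, 0, 1] + [2, 2, 2, 0, 0]
gaps = distribution_calibration_gaps(preds, labels, num_classes=3)
print(gaps)
```

Here the first vector is matched exactly (gap 0), while the inputs that received (0.2, 0.2, 0.6) have class frequencies (0.4, 0, 0.6), a gap of 0.2.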

  26. arXiv:2104.08970  [pdf, other]

    stat.ME

    Linear shrinkage for predicting responses in large-scale multivariate linear regression

    Authors: Yihe Wang, Sihai Dave Zhao

    Abstract: We propose a new prediction method for multivariate linear regression problems where the number of features is less than the sample size but the number of outcomes is extremely large. Many popular procedures, such as penalized regression procedures, require parameter tuning that is computationally untenable in such large-scale problems. We take a different approach, motivated by ideas from simulta…

    Submitted 18 April, 2021; originally announced April 2021.

  27. arXiv:2104.08157  [pdf, other]

    cs.LG stat.ME

    Capturing patterns of variation unique to a specific dataset

    Authors: Robin Tu, Alexander H. Foss, Sihai D. Zhao

    Abstract: Capturing patterns of variation present in a dataset is important in exploratory data analysis and unsupervised learning. Contrastive dimension reduction methods, such as contrastive principal component analysis (cPCA), find patterns unique to a target dataset of interest by contrasting with a carefully chosen background dataset representing unwanted or uninteresting variation. However, such metho…

    Submitted 16 April, 2021; originally announced April 2021.

  28. arXiv:2011.07476  [pdf, other]

    stat.ML cs.GT cs.LG math.PR stat.AP

    Right Decisions from Wrong Predictions: A Mechanism Design Alternative to Individual Calibration

    Authors: Shengjia Zhao, Stefano Ermon

    Abstract: Decision makers often need to rely on imperfect probabilistic forecasts. While average performance metrics are typically available, it is difficult to assess the quality of individual forecasts and the corresponding utilities. To convey confidence about individual predictions to decision-makers, we propose a compensation mechanism ensuring that the forecasted utility matches the actually accrued u…

    Submitted 2 March, 2021; v1 submitted 15 November, 2020; originally announced November 2020.

  29. arXiv:2010.04966  [pdf, other]

    cs.LG stat.ML

    Effective Data-aware Covariance Estimator from Compressed Data

    Authors: Xixian Chen, Haiqin Yang, Shenglin Zhao, Michael R. Lyu, Irwin King

    Abstract: Estimating the covariance matrix from massive high-dimensional and distributed data is significant for various real-world applications. In this paper, we propose a data-aware weighted sampling based covariance matrix estimator, namely DACE, which can provide an unbiased covariance matrix estimation and attain more accurate estimation under the same compression ratio. Moreover, we extend our proposed D…

    Submitted 10 October, 2020; originally announced October 2020.

    Comments: 12 pages, 5 figures

    Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2019

  30. arXiv:2010.04948  [pdf, other]

    cs.LG stat.ML

    Making Online Sketching Hashing Even Faster

    Authors: Xixian Chen, Haiqin Yang, Shenglin Zhao, Michael R. Lyu, Irwin King

    Abstract: Data-dependent hashing methods have demonstrated good performance in various machine learning applications to learn a low-dimensional representation from the original data. However, they still suffer from several obstacles: First, most existing hashing methods are trained in a batch mode, yielding inefficiency for training streaming data. Second, the computational cost and the memory consumptio…

    Submitted 10 October, 2020; originally announced October 2020.

    Comments: 12 pages, 5 figures

    Journal ref: IEEE Transactions on Knowledge and Data Engineering, 2019

  31. arXiv:2008.09643  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Privacy Preserving Recalibration under Domain Shift

    Authors: Rachel Luo, Shengjia Zhao, Jiaming Song, Jonathan Kuck, Stefano Ermon, Silvio Savarese

    Abstract: Classifiers deployed in high-stakes real-world applications must output calibrated confidence scores, i.e. their predicted probabilities should reflect empirical frequencies. Recalibration algorithms can greatly improve a model's probability estimates; however, existing algorithms are not applicable in real-world situations where the test data follows a different distribution from the training dat…

    Submitted 21 August, 2020; originally announced August 2020.

  32. Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training

    Authors: Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li

    Abstract: Stochastic gradient descent (SGD) and its variants have been the dominant optimization methods in machine learning. Compared to SGD with small-batch training, SGD with large-batch training can better utilize the computational power of current multi-core systems such as graphics processing units (GPUs) and can reduce the number of communication rounds in distributed training settings. Thus, SGD w…

    Submitted 14 April, 2024; v1 submitted 28 July, 2020; originally announced July 2020.

  33. arXiv:2007.12355  [pdf, other]

    cs.LG stat.ML

    Dynamic Knowledge Distillation for Black-box Hypothesis Transfer Learning

    Authors: Yiqin Yu, Xu Min, Shiwan Zhao, Jing Mei, Fei Wang, Dongsheng Li, Kenney Ng, Shaochun Li

    Abstract: In real world applications like healthcare, it is usually difficult to build a machine learning prediction model that works universally well across different institutions. At the same time, the available model is often proprietary, i.e., neither the model parameters nor the data set used for model training is accessible. In consequence, leveraging the knowledge hidden in the available model (aka. t…

    Submitted 6 August, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

    Comments: 7 pages, 2 figures

  34. arXiv:2007.02832  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning

    Authors: Silviu Pitis, Harris Chan, Stephen Zhao, Bradly Stadie, Jimmy Ba

    Abstract: What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks? When the desired (test time) goal distribution is too distant to offer a useful learning signal, we argue that the agent should not pursue unobtainable goals. Instead, it should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution. We propose to op…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: 12 pages (+12 appendix). Published as a conference paper at ICML 2020. Code available at https://github.com/spitis/mrl

  35. arXiv:2006.13484  [pdf, other]

    cs.LG cs.CL cs.DC stat.ML

    Accelerated Large Batch Optimization of BERT Pretraining in 54 minutes

    Authors: Shuai Zheng, Haibin Lin, Sheng Zha, Mu Li

    Abstract: BERT has recently attracted a lot of attention in natural language understanding (NLU) and achieved state-of-the-art results in various NLU tasks. However, its success requires large deep neural networks and huge amounts of data, which result in long training times and impede development progress. Using stochastic gradient methods with large mini-batch has been advocated as an efficient tool to redu…

    Submitted 18 September, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: Technical Report (not under review at any venue)

  36. arXiv:2006.10288  [pdf, other]

    stat.ML cs.LG

    Individual Calibration with Randomized Forecasting

    Authors: Shengjia Zhao, Tengyu Ma, Stefano Ermon

    Abstract: Machine learning applications often require calibrated predictions, e.g. a 90% credible interval should contain the true outcome 90% of the time. However, typical definitions of calibration only require this to hold on average, and offer no guarantees on predictions made on individual samples. Thus, predictions can be systematically over- or underconfident on certain subgroups, leading to issue…

    Submitted 9 September, 2020; v1 submitted 18 June, 2020; originally announced June 2020.
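The gap this abstract describes between average and individual guarantees shows up with a few lines of arithmetic: intervals can reach 90% coverage overall while covering one subgroup only 80% of the time. The intervals, outcomes, and group labels below are made up for illustration; nothing here implements the paper's randomized forecasting.

```python
def coverage(intervals, outcomes):
    """Fraction of outcomes that fall inside their predicted interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, outcomes))
    return hits / len(outcomes)


def coverage_by_group(intervals, outcomes, groups):
    """Coverage computed separately within each subgroup."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = coverage([intervals[i] for i in idx],
                             [outcomes[i] for i in idx])
    return result


intervals = [(0.0, 1.0)] * 20                  # the same nominal 90% interval
outcomes = [0.5] * 10 + [0.5] * 8 + [1.5] * 2  # group B misses twice
groups = ["A"] * 10 + ["B"] * 10

print(coverage(intervals, outcomes))                   # 18 / 20 = 0.9
print(coverage_by_group(intervals, outcomes, groups))  # A: 1.0, B: 0.8
```

Any check that only averages over all predictions reports the nominal 90%, even though group B is systematically overconfident.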

  37. arXiv:2006.10287  [pdf, other]

    stat.ML cs.LG

    A Framework for Sample Efficient Interval Estimation with Control Variates

    Authors: Shengjia Zhao, Christopher Yeh, Stefano Ermon

    Abstract: We consider the problem of estimating confidence intervals for the mean of a random variable, where the goal is to produce the smallest possible interval for a given number of samples. While minimax optimal algorithms are known for this problem in the general case, improved performance is possible under additional assumptions. In particular, we design an estimation algorithm to take advantage of s…

    Submitted 18 June, 2020; originally announced June 2020.
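As background for the control variates in the title, here is the classical control-variate point estimator of a mean: if $g(X)$ has known mean $E[g]$, estimate $E[f(X)]$ by $\bar f - \hat\beta(\bar g - E[g])$ with $\hat\beta = \widehat{\mathrm{Cov}}(f, g)/\widehat{\mathrm{Var}}(g)$. The paper's interval-estimation framework is not reproduced; the target $f(x) = e^x$ with $X \sim U(0, 1)$ and the control $g(x) = x$ are hypothetical choices.

```python
import math
import random
import statistics


def mean(v):
    return sum(v) / len(v)


def control_variate_estimate(f_vals, g_vals, g_mean):
    """Estimate E[f] using g, whose true mean g_mean is known, as a control."""
    f_bar, g_bar = mean(f_vals), mean(g_vals)
    cov = sum((a - f_bar) * (b - g_bar) for a, b in zip(f_vals, g_vals))
    var = sum((b - g_bar) ** 2 for b in g_vals)
    beta = cov / var  # the (n - 1) normalizations cancel in the ratio
    return f_bar - beta * (g_bar - g_mean)


rng = random.Random(0)
plain, cv = [], []
for _ in range(100):                    # 100 independent repetitions
    xs = [rng.random() for _ in range(500)]
    f_vals = [math.exp(x) for x in xs]  # target: E[e^X] = e - 1
    plain.append(mean(f_vals))
    cv.append(control_variate_estimate(f_vals, xs, g_mean=0.5))

print(f"true value      {math.e - 1:.4f}")
print(f"plain MC        mean {mean(plain):.4f}, sd {statistics.stdev(plain):.4f}")
print(f"control variate mean {mean(cv):.4f}, sd {statistics.stdev(cv):.4f}")
```

Because $e^x$ is nearly linear on [0, 1], the control absorbs most of the variance, so the spread of the control-variate estimates should be several times smaller than that of plain Monte Carlo; a smaller spread is what allows a narrower interval for the same number of samples.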

  38. arXiv:2005.04549  [pdf, other]

    stat.ME math.ST

    A Compound Decision Approach to Covariance Matrix Estimation

    Authors: Huiqin Xin, Sihai Dave Zhao

    Abstract: Covariance matrix estimation is a fundamental statistical task in many applications, but the sample covariance matrix is sub-optimal when the sample size is comparable to or less than the number of features. Such high-dimensional settings are common in modern genomics, where covariance matrix estimation is frequently employed as a method for inferring gene networks. To achieve estimation accuracy…

    Submitted 2 June, 2022; v1 submitted 9 May, 2020; originally announced May 2020.

    Comments: 20 pages, 4 figures. Biometrics (2022)

    MSC Class: 62C12 (Primary) 62C25 (Secondary)

  39. arXiv:2003.00638  [pdf, other]

    cs.LG stat.ML

    Permutation Invariant Graph Generation via Score-Based Generative Modeling

    Authors: Chenhao Niu, Yang Song, Jiaming Song, Shengjia Zhao, Aditya Grover, Stefano Ermon

    Abstract: Learning generative models for graph-structured data is challenging because graphs are discrete, combinatorial, and the underlying data distribution is invariant to the ordering of nodes. However, most of the existing generative models for graphs are not invariant to the chosen ordering, which might lead to an undesirable bias in the learned distribution. To address this difficulty, we propose a p…

    Submitted 1 March, 2020; originally announced March 2020.

    Comments: 14 pages, AISTATS 2020

  40. arXiv:2002.12486  [pdf]

    math.OC cs.LG stat.ML

    Distributionally Robust Chance Constrained Programming with Generative Adversarial Networks (GANs)

    Authors: Shipu Zhao, Fengqi You

    Abstract: This paper presents a novel deep learning based data-driven optimization method. A novel generative adversarial network (GAN) based data-driven distributionally robust chance constrained programming framework is proposed. GAN is applied to fully extract distributional information from historical data in a nonparametric and unsupervised way without a priori approximation or assumption. Since GAN ut…

    Submitted 27 February, 2020; originally announced February 2020.

    Journal ref: AIChE Journal, Volume 66, Issue 6, June 2020, e16963

  41. arXiv:2002.12169  [pdf, other]

    cs.LG cs.CV stat.ML

    Multi-source Domain Adaptation in the Deep Learning Era: A Systematic Survey

    Authors: Sicheng Zhao, Bo Li, Colorado Reed, Pengfei Xu, Kurt Keutzer

    Abstract: In many practical applications, it is often difficult and expensive to obtain enough large-scale labeled data to train deep neural networks to their full capability. Therefore, transferring the learned knowledge from a separate, labeled source domain to an unlabeled or sparsely labeled target domain becomes an appealing alternative. However, direct transfer often results in significant performance…

    Submitted 26 February, 2020; originally announced February 2020.

  42. arXiv:2002.11601  [pdf, other]

    stat.ML cs.LG

    Stagewise Enlargement of Batch Size for SGD-based Learning

    Authors: Shen-Yi Zhao, Yin-Peng Xie, Wu-Jun Li

    Abstract: Existing research shows that the batch size can seriously affect the performance of stochastic gradient descent (SGD) based learning, including training speed and generalization ability. A larger batch size typically results in fewer parameter updates. In distributed training, a larger batch size also results in less frequent communication. However, a larger batch size can make a generalization gap…

    Submitted 26 February, 2020; v1 submitted 26 February, 2020; originally announced February 2020.
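The statement that a larger batch size means fewer parameter updates is simple arithmetic: one pass over $N$ examples with batch size $B$ takes $\lceil N/B \rceil$ updates. The helper below counts updates for a stagewise schedule; the dataset size and the doubling schedule are hypothetical and not taken from the paper.

```python
import math


def total_updates(num_examples, stages):
    """Count parameter updates for a stagewise batch-size schedule.

    stages: list of (batch_size, epochs) pairs. Each epoch makes
    ceil(num_examples / batch_size) updates, counting a final partial batch.
    """
    return sum(epochs * math.ceil(num_examples / b) for b, epochs in stages)


fixed = total_updates(50_000, [(128, 30)])
stagewise = total_updates(50_000, [(128, 10), (256, 10), (512, 10)])
print(fixed, stagewise)  # 11730 6850
```

Both schedules make 30 passes over the data, but the stagewise one performs roughly 42% fewer updates, and in distributed training correspondingly fewer communication rounds.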

  43. arXiv:2002.10689  [pdf, other]

    cs.LG stat.ML

    A Theory of Usable Information Under Computational Constraints

    Authors: Yilun Xu, Shengjia Zhao, Jiaming Song, Russell Stewart, Stefano Ermon

    Abstract: We propose a new framework for reasoning about information in complex systems. Our foundation is based on a variational extension of Shannon's information theory that takes into account the modeling power and computational constraints of the observer. The resulting predictive $\mathcal{V}$-information encompasses mutual information and other notions of informativeness such as the coefficien…

    Submitted 25 February, 2020; originally announced February 2020.

    Comments: ICLR 2020 (Talk)

  44. arXiv:2002.05033  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Active Learning for Sound Event Detection

    Authors: Shuyang Zhao, Toni Heittola, Tuomas Virtanen

    Abstract: This paper proposes an active learning system for sound event detection (SED). It aims at maximizing the accuracy of a learned SED model with limited annotation effort. The proposed system analyzes an initially unlabeled audio dataset, from which it selects sound segments for manual annotation. The candidate segments are generated based on a proposed change point detection approach, and the select…

    Submitted 9 September, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

  45. arXiv:2001.04601  [pdf, other]

    stat.ML cs.LG

    For2For: Learning to forecast from forecasts

    Authors: Shi Zhao, Ying Feng

    Abstract: This paper presents a time series forecasting framework which combines standard forecasting methods and a machine learning model. The inputs to the machine learning model are not lagged values or regular time series features, but instead forecasts produced by standard methods. The machine learning model can be either a convolutional neural network model or a recurrent neural network model. The int…

    Submitted 13 January, 2020; originally announced January 2020.

  46. arXiv:1912.12121  [pdf, other]

    cs.CV cs.LG stat.ML

    Approximating Human Judgment of Generated Image Quality

    Authors: Y. Alex Kolchinski, Sharon Zhou, Shengjia Zhao, Mitchell Gordon, Stefano Ermon

    Abstract: Generative models have made immense progress in recent years, particularly in their ability to generate high quality images. However, that quality has been difficult to evaluate rigorously, with evaluation dominated by heuristic approaches that do not correlate well with human judgment, such as the Inception Score and Fréchet Inception Distance. Real human labels have also been used in evaluation,…

    Submitted 30 November, 2019; originally announced December 2019.

    Comments: To appear in the Shared Visual Representations in Human and Machine Intelligence workshop at NeurIPS 2019. The first two authors contributed equally to the manuscript.

  47. arXiv:1912.04977  [pdf, other]

    cs.LG cs.CR stat.ML

    Advances and Open Problems in Federated Learning

    Authors: Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, et al. (34 additional authors not shown)

    Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs re…

    Submitted 8 March, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Published in Foundations and Trends in Machine Learning Vol 4 Issue 1. See: https://www.nowpublishers.com/article/Details/MAL-083

  48. arXiv:1912.04838  [pdf, other]

    cs.CV cs.LG stat.ML

    Scalability in Perception for Autonomous Driving: Waymo Open Dataset

    Authors: Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Sheng Zhao, Shuyang Cheng, Yu Zhang, Jonathon Shlens, Zhifeng Chen, Dragomir Anguelov

    Abstract: The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help a…

    Submitted 12 May, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: CVPR 2020

  49. arXiv:1911.11554  [pdf, other]

    cs.LG cs.CV stat.ML

    Multi-source Distilling Domain Adaptation

    Authors: Sicheng Zhao, Guangzhi Wang, Shanghang Zhang, Yang Gu, Yaxian Li, Zhichao Song, Pengfei Xu, Runbo Hu, Hua Chai, Kurt Keutzer

    Abstract: Deep neural networks suffer from performance decay when there is domain shift between the labeled source domain and unlabeled target domain, which motivates the research on domain adaptation (DA). Conventional DA methods usually assume that the labeled data is sampled from a single source distribution. However, in practice, labeled data may be collected from multiple sources, while naive applicati…

    Submitted 7 February, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

    Comments: Accepted by AAAI 2020

  50. arXiv:1910.12457  [pdf, ps, other]

    stat.ME

    Estimation and inference for the indirect effect in high-dimensional linear mediation models

    Authors: Ruixuan Rachel Zhou, Liewei Wang, Sihai Dave Zhao

    Abstract: Mediation analysis is difficult when the number of potential mediators is larger than the sample size. In this paper we propose new inference procedures for the indirect effect in the presence of high-dimensional mediators for linear mediation models. We develop methods for both incomplete mediation, where a direct effect may exist, as well as complete mediation, where the direct effect is known t…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: To appear in Biometrika