Showing 1–50 of 83 results for author: Liang, P

  1. arXiv:2404.04475  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

    Authors: Yann Dubois, Balázs Galambosi, Percy Liang, Tatsunori B. Hashimoto

    Abstract: LLM-based auto-annotators have become a key component of the LLM development process due to their cost-effectiveness and scalability compared to human-based evaluation. However, these auto-annotators can introduce complex biases that are hard to remove. Even simple, known confounders such as preference for longer outputs remain in existing automated evaluation metrics. We propose a simple regressi… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  2. arXiv:2306.08620  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Anticipatory Music Transformer

    Authors: John Thickstun, David Hall, Chris Donahue, Percy Liang

    Abstract: We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by pro… ▽ More

    Submitted 25 July, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: TMLR accepted version

  3. arXiv:2306.04539  [pdf, other]

    cs.LG cs.CL cs.CV cs.IT stat.ML

    Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications

    Authors: Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alex Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: In many machine learning systems that jointly learn from multiple modalities, a core research question is to understand the nature of multimodal interactions: how modalities combine to provide new task-relevant information that was not present in either alone. We study this challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data and naturally co-occurri… ▽ More

    Submitted 13 June, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: ICLR 2024, Code available at: https://github.com/pliang279/PID

  4. arXiv:2306.04049  [pdf, other]

    cs.LG cs.DS stat.ML

    One-sided Matrix Completion from Two Observations Per Row

    Authors: Steven Cao, Percy Liang, Gregory Valiant

    Abstract: Given only a few observed entries from a low-rank matrix $X$, matrix completion is the problem of imputing the missing entries, and it formalizes a wide range of real-world settings that involve estimating missing data. However, when there are too few observed entries to complete the matrix, what other aspects of the underlying matrix can be reliably recovered? We study one such problem setting, t… ▽ More

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  5. arXiv:2305.06884  [pdf, ps, other]

    stat.ME cs.AI cs.LG math.ST stat.AP stat.ML

    Risk-limiting Financial Audits via Weighted Sampling without Replacement

    Authors: Shubhanshu Shekhar, Ziyu Xu, Zachary C. Lipton, Pierre J. Liang, Aaditya Ramdas

    Abstract: We introduce the notion of risk-limiting financial auditing (RLFA): given $N$ transactions, the goal is to estimate the total misstated monetary fraction ($m^*$) to a given accuracy $ε$, with confidence $1-δ$. We do this by constructing new confidence sequences (CSs) for the weighted average of $N$ unknown values, based on samples drawn without replacement according to a (randomized) weighted sa… ▽ More

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: 23 pages, 8 figures, to appear in the Proceedings of Uncertainty in Artificial Intelligence (UAI) 2023

  6. arXiv:2302.03068  [pdf, other]

    cs.LG cs.AI stat.ML

    Evaluating Self-Supervised Learning via Risk Decomposition

    Authors: Yann Dubois, Tatsunori Hashimoto, Percy Liang

    Abstract: Self-supervised learning (SSL) pipelines differ in many design choices such as the architecture, augmentations, or pretraining data. Yet SSL is typically evaluated using a single metric: linear probing on ImageNet. This does not provide much insight into why or when a model is better, nor how to improve it. To address this, we propose an SSL risk decomposition, which generalizes the classical supe… ▽ More

    Submitted 8 January, 2024; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Oral at ICML 2023

  7. arXiv:2210.04714  [pdf, other]

    cs.CL cs.LG stat.ML

    Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis

    Authors: Yuxin Xiao, Paul Pu Liang, Umang Bhatt, Willie Neiswanger, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Pre-trained language models (PLMs) have gained increasing popularity due to their compelling prediction performance in diverse natural language processing (NLP) tasks. When formulating a PLM-based prediction pipeline for NLP tasks, it is also crucial for the pipeline to minimize the calibration error, especially in safety-critical applications. That is, the pipeline should reliably indicate when w… ▽ More

    Submitted 14 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted by EMNLP 2022 (Findings)

  8. arXiv:2209.06235  [pdf, other]

    cs.LG stat.ML

    Improving Self-Supervised Learning by Characterizing Idealized Representations

    Authors: Yann Dubois, Tatsunori Hashimoto, Stefano Ermon, Percy Liang

    Abstract: Despite the empirical successes of self-supervised learning (SSL) methods, it is unclear what characteristics of their representations lead to high downstream accuracies. In this work, we characterize properties that SSL representations should ideally satisfy. Specifically, we prove necessary and sufficient conditions such that for any task invariant to given data augmentations, desired probes (e.… ▽ More

    Submitted 12 December, 2022; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: Accepted at NeurIPS 2022

  9. arXiv:2207.08977  [pdf, other]

    cs.LG stat.ML

    Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift

    Authors: Ananya Kumar, Tengyu Ma, Percy Liang, Aditi Raghunathan

    Abstract: We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy: a robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via ERM. In this paper, we find that ID-calibrated ensembles -- where we s… ▽ More

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Accepted to UAI 2022

  10. arXiv:2207.07635  [pdf, other]

    cs.CV cs.LG stat.ML

    Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning

    Authors: Shibani Santurkar, Yann Dubois, Rohan Taori, Percy Liang, Tatsunori Hashimoto

    Abstract: The development of CLIP [Radford et al., 2021] has sparked a debate on whether language supervision can result in vision models with more transferable representations than traditional image-only methods. Our work studies this question through a carefully controlled comparison of two approaches in terms of their ability to learn representations that generalize to downstream classification tasks. We… ▽ More

    Submitted 15 July, 2022; originally announced July 2022.

  11. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  12. arXiv:2112.05090  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Extending the WILDS Benchmark for Unsupervised Adaptation

    Authors: Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Levine, Chelsea Finn, Percy Liang

    Abstract: Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well. However, existing distribu… ▽ More

    Submitted 23 April, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

  13. arXiv:2107.09044  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Just Train Twice: Improving Group Robustness without Training Group Information

    Authors: Evan Zheran Liu, Behzad Haghgoo, Annie S. Chen, Aditi Raghunathan, Pang Wei Koh, Shiori Sagawa, Percy Liang, Chelsea Finn

    Abstract: Standard training via empirical risk minimization (ERM) can produce models that achieve high accuracy on average but low accuracy on certain groups, especially in the presence of spurious correlations between the input and label. Prior approaches that achieve high worst-group accuracy, like group distributionally robust optimization (group DRO) require expensive group annotations for each training… ▽ More

    Submitted 27 September, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: International Conference on Machine Learning (ICML), 2021

  14. arXiv:2107.04649  [pdf, other]

    cs.LG stat.ML

    Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization

    Authors: John Miller, Rohan Taori, Aditi Raghunathan, Shiori Sagawa, Pang Wei Koh, Vaishaal Shankar, Percy Liang, Yair Carmon, Ludwig Schmidt

    Abstract: For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribut… ▽ More

    Submitted 7 October, 2021; v1 submitted 9 July, 2021; originally announced July 2021.

  15. arXiv:2012.04550  [pdf, other]

    cs.LG stat.ML

    In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness

    Authors: Sang Michael Xie, Ananya Kumar, Robbie Jones, Fereshte Khani, Tengyu Ma, Percy Liang

    Abstract: Consider a prediction setting with few in-distribution labeled examples and many unlabeled examples both in- and out-of-distribution (OOD). The goal is to learn a model which performs well both in-distribution and OOD. In these settings, auxiliary information is often cheaply available for every input. How should we best leverage this auxiliary information for the prediction task? Empirically acro… ▽ More

    Submitted 7 April, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: ICLR 2021

  16. arXiv:2012.04104  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately

    Authors: Fereshte Khani, Percy Liang

    Abstract: The presence of spurious features interferes with the goal of obtaining robust models that perform well across many groups within the population. A natural remedy is to remove spurious features from the model. However, in this work we show that removal of spurious features can decrease accuracy due to the inductive biases of overparameterized models. We completely characterize how the removal of s… ▽ More

    Submitted 7 December, 2020; originally announced December 2020.

  17. arXiv:2012.02359  [pdf, other]

    cs.LG cs.CY stat.AP

    Multimodal Privacy-preserving Mood Prediction from Mobile Data: A Preliminary Study

    Authors: Terrance Liu, Paul Pu Liang, Michal Muszynski, Ryo Ishii, David Brent, Randy Auerbach, Nicholas Allen, Louis-Philippe Morency

    Abstract: Mental health conditions remain under-diagnosed even in countries with common access to advanced medical care. The ability to accurately and efficiently predict mood from easily collectible data has several important implications towards the early detection and intervention of mental health disorders. One promising data source to help monitor human behavior is from daily smartphone usage. However,… ▽ More

    Submitted 3 December, 2020; originally announced December 2020.

  18. arXiv:2010.14134  [pdf, other]

    cs.LG stat.ML

    Selective Classification Can Magnify Disparities Across Groups

    Authors: Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang

    Abstract: Selective classification, in which models can abstain on uncertain predictions, is a natural approach to improving accuracy in settings where errors are costly but abstentions are manageable. In this paper, we find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities between various groups within a population, especially in… ▽ More

    Submitted 14 April, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: Published at the International Conference on Learning Representations (ICLR) 2021

  19. arXiv:2010.12648  [pdf, other]

    cs.LG stat.ML

    An Investigation of how Label Smoothing Affects Generalization

    Authors: Blair Chen, Liu Ziyin, Zihao Wang, Paul Pu Liang

    Abstract: It has been hypothesized that label smoothing can reduce overfitting and improve generalization, and current empirical evidence seems to corroborate these effects. However, there is a lack of mathematical understanding of when and why such empirical improvements occur. In this paper, as a step towards understanding why label smoothing is effective, we propose a theoretical framework to show how la… ▽ More

    Submitted 23 October, 2020; originally announced October 2020.

  20. arXiv:2008.02790  [pdf, other]

    cs.LG cs.AI stat.ML

    Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices

    Authors: Evan Zheran Liu, Aditi Raghunathan, Percy Liang, Chelsea Finn

    Abstract: The goal of meta-reinforcement learning (meta-RL) is to build agents that can quickly learn new tasks by leveraging prior experience on related tasks. Learning a new task often requires both exploring to gather task-relevant information and exploiting this information to solve the task. In principle, optimal exploration and exploitation can be learned end-to-end by simply maximizing task performan… ▽ More

    Submitted 11 November, 2021; v1 submitted 6 August, 2020; originally announced August 2020.

    Comments: International Conference on Machine Learning (ICML), 2021

  21. arXiv:2007.06661  [pdf, other]

    cs.LG stat.ML

    Robustness to Spurious Correlations via Human Annotations

    Authors: Megha Srivastava, Tatsunori Hashimoto, Percy Liang

    Abstract: The reliability of machine learning systems critically assumes that the associations between features and labels remain similar between training and test distributions. However, unmeasured variables, such as confounders, break this assumption---useful correlations between features and labels at training time can become useless or even harmful at test time. For example, high obesity is generally pr… ▽ More

    Submitted 13 August, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: ICML 2020 final version, 16 pages, 9 figures

  22. arXiv:2007.05896  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Abstract Models for Strategic Exploration and Fast Reward Transfer

    Authors: Evan Zheran Liu, Ramtin Keramati, Sudarshan Seshadri, Kelvin Guu, Panupong Pasupat, Emma Brunskill, Percy Liang

    Abstract: Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions. However, learning an accurate Markov Decision Process (MDP) over high-dimensional states (e.g., raw pixels) is extremely challenging because it requires function approximation, which… ▽ More

    Submitted 11 July, 2020; originally announced July 2020.

  23. arXiv:2007.04612  [pdf, other]

    cs.LG stat.ML

    Concept Bottleneck Models

    Authors: Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang

    Abstract: We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis? State-of-the-art models today do not typically support the manipulation of concepts like "the existence of bone spurs", as they are trained end-to-end to go directly from raw input (e.g., pixels) to output (e.g., arthri… ▽ More

    Submitted 28 December, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Edited for clarity from the ICML 2020 version

  24. arXiv:2006.16205  [pdf, other]

    cs.LG stat.ML

    Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

    Authors: Sang Michael Xie, Tengyu Ma, Percy Liang

    Abstract: We focus on prediction problems with structured outputs that are subject to output validity constraints, e.g. pseudocode-to-code translation where the code must compile. While labeled input-output pairs are expensive to obtain, "unlabeled" outputs, i.e. outputs without corresponding inputs, are freely available (e.g. code on GitHub) and provide information about output validity. We can capture the… ▽ More

    Submitted 24 October, 2023; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: ICML 2021 Long talk

  25. arXiv:2005.10636  [pdf, other]

    cs.SE cs.CL cs.LG cs.PL stat.ML

    Graph-based, Self-Supervised Program Repair from Diagnostic Feedback

    Authors: Michihiro Yasunaga, Percy Liang

    Abstract: We consider the problem of learning to repair programs from diagnostic feedback (e.g., compiler error messages). Program repair is challenging for two reasons: First, it requires reasoning and tracking symbols across source code and diagnostic feedback. Second, labeled datasets available for program repair are relatively small. In this work, we propose novel solutions to these two challenges. Firs… ▽ More

    Submitted 30 June, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: ICML 2020. Code & data available at https://github.com/michiyasunaga/DrRepair

  26. arXiv:2005.04345  [pdf, other]

    cs.LG cs.CV stat.ML

    An Investigation of Why Overparameterization Exacerbates Spurious Correlations

    Authors: Shiori Sagawa, Aditi Raghunathan, Pang Wei Koh, Percy Liang

    Abstract: We study why overparameterization -- increasing model size well beyond the point of zero training error -- can hurt test error on minority groups despite improving average test error when there are spurious correlations in the data. Through simulations and experiments on two image datasets, we identify two key properties of the training data that drive this behavior: the proportions of majority ve… ▽ More

    Submitted 26 August, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

  27. arXiv:2005.01932  [pdf, other]

    cs.CL cs.LG stat.ML

    ExpBERT: Representation Engineering with Natural Language Explanations

    Authors: Shikhar Murty, Pang Wei Koh, Percy Liang

    Abstract: Suppose we want to specify the inductive bias that married couples typically go on honeymoons for the task of extracting pairs of spouses from text. In this paper, we allow model developers to specify these types of inductive biases as natural language explanations. We use BERT fine-tuned on MultiNLI to "interpret" these explanations with respect to the input sentence, producing explanation-guid… ▽ More

    Submitted 4 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  28. arXiv:2003.08197  [pdf, other]

    cs.LG cs.CL stat.ML

    Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies

    Authors: Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed

    Abstract: Learning continuous representations of discrete objects such as text, users, movies, and URLs lies at the heart of many applications including language and user modeling. When using discrete objects as input to neural networks, we often ignore the underlying structures (e.g., natural groupings and similarities) and embed the objects independently into individual vectors. As a result, existing meth… ▽ More

    Submitted 11 March, 2021; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: ICLR 2021, code can be found at http://github.com/pliang279/sparse_discrete

  29. arXiv:2002.11361  [pdf, other]

    cs.LG stat.ML

    Understanding Self-Training for Gradual Domain Adaptation

    Authors: Ananya Kumar, Tengyu Ma, Percy Liang

    Abstract: Machine learning systems must adapt to data distributions that evolve over time, in applications ranging from sensor networks and self-driving car perception modules to brain-machine interfaces. We consider gradual domain adaptation, where the goal is to adapt an initial classifier trained on a source domain given only unlabeled data that shifts gradually in distribution towards a target domain. W… ▽ More

    Submitted 26 February, 2020; originally announced February 2020.

  30. arXiv:2002.10716  [pdf, other]

    cs.LG stat.ML

    Understanding and Mitigating the Tradeoff Between Robustness and Accuracy

    Authors: Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John Duchi, Percy Liang

    Abstract: Adversarial training augments the training set with perturbations to improve the robust error (over worst-case perturbations), but it often leads to an increase in the standard error (on unperturbed test inputs). Previous explanations for this tradeoff rely on the assumption that no predictor in the hypothesis class has low standard and robust error. In this work, we precisely characterize the eff… ▽ More

    Submitted 6 July, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: Appearing at International Conference on Machine Learning (ICML) 2020

  31. arXiv:2002.06541  [pdf, other]

    cs.LG cs.IT stat.ML

    Learning Not to Learn in the Presence of Noisy Labels

    Authors: Liu Ziyin, Blair Chen, Ru Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda

    Abstract: Learning in the presence of label noise is a challenging yet important task: it is crucial to design models that are robust in the presence of mislabeled datasets. In this paper, we discover that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption. We show that training with this loss function encourages the model to… ▽ More

    Submitted 16 February, 2020; originally announced February 2020.

  32. arXiv:2001.01523  [pdf, other]

    cs.LG cs.DC stat.ML

    Think Locally, Act Globally: Federated Learning with Local and Global Representations

    Authors: Paul Pu Liang, Terrance Liu, Liu Ziyin, Nicholas B. Allen, Randy P. Auerbach, David Brent, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Federated learning is a method of training models on private data distributed over multiple devices. To keep device data private, the global model is trained by only communicating parameters and updates which poses scalability challenges for large models. To this end, we propose a new federated learning algorithm that jointly learns compact local representations on each device and a global model a… ▽ More

    Submitted 14 July, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

    Comments: NeurIPS 2019 Workshop on Federated Learning distinguished student paper award. Code: https://github.com/pliang279/LG-FedAvg

  33. arXiv:1911.09876  [pdf, other]

    cs.LG stat.ML

    Feature Noise Induces Loss Discrepancy Across Groups

    Authors: Fereshte Khani, Percy Liang

    Abstract: The performance of standard learning procedures has been observed to differ widely across groups. Recent studies usually attribute this loss discrepancy to an information deficiency for one group (e.g., one group has less data). In this work, we point to a more subtle source of loss discrepancy---feature noise. Our main result is that even when there is no information deficiency specific to one gr… ▽ More

    Submitted 5 November, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

    Comments: ICML 2020

  34. arXiv:1911.09826  [pdf, other]

    cs.LG cs.CL stat.ML

    Factorized Multimodal Transformer for Multimodal Sequential Learning

    Authors: Amir Zadeh, Chengfeng Mao, Kelly Shi, Yiwei Zhang, Paul Pu Liang, Soujanya Poria, Louis-Philippe Morency

    Abstract: The complex world around us is inherently multimodal and sequential (continuous). Information is scattered across different modalities and requires multiple continuous sensors to be captured. As machine learning leaps towards better generalization to the real world, multimodal sequential learning becomes a fundamental research area. Arguably, modeling arbitrarily distributed spatio-temporal dynamics w… ▽ More

    Submitted 21 November, 2019; originally announced November 2019.

  35. arXiv:1911.08731  [pdf, other]

    cs.LG stat.ML

    Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

    Authors: Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang

    Abstract: Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find… ▽ More

    Submitted 2 April, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

  36. arXiv:1909.10155  [pdf, other]

    cs.LG stat.ML

    Verified Uncertainty Calibration

    Authors: Ananya Kumar, Percy Liang, Tengyu Ma

    Abstract: Applications such as weather forecasting and personalized medicine demand models that output calibrated probability estimates---those representative of the true likelihood of a prediction. Most models are not calibrated out of the box but are recalibrated by post-processing model outputs. We find in this work that popular recalibration methods like Platt scaling and temperature scaling are (i) les… ▽ More

    Submitted 31 January, 2020; v1 submitted 23 September, 2019; originally announced September 2019.

    Comments: Accepted as a spotlight to NeurIPS 2019, updated to include experiments for ECE

  37. arXiv:1909.02060  [pdf, other]

    cs.CL cs.LG stat.ML

    Distributionally Robust Language Modeling

    Authors: Yonatan Oren, Shiori Sagawa, Tatsunori B. Hashimoto, Percy Liang

    Abstract: Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews). In this paper, we first show that training on text outside the test distribution can degrade test performance when using standard maximum likelihood (MLE) training. To remedy this without the k… ▽ More

    Submitted 4 September, 2019; originally announced September 2019.

    Comments: Camera ready version for EMNLP

  38. arXiv:1907.01011  [pdf, other]

    cs.LG cs.CL stat.ML

    Learning Representations from Imperfect Time Series Data via Tensor Rank Regularization

    Authors: Paul Pu Liang, Zhun Liu, Yao-Hung Hubert Tsai, Qibin Zhao, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: There has been an increased interest in multimodal language processing including multimodal dialog, question answering, sentiment analysis, and speech recognition. However, naturally occurring multimodal data is often imperfect as a result of imperfect modalities, missing entries or noise corruption. To address these concerns, we present a regularization method based on tensor rank minimization. O… ▽ More

    Submitted 1 July, 2019; originally announced July 2019.

  39. arXiv:1907.00208  [pdf, other]

    cs.LG stat.ML

    Deep Gamblers: Learning to Abstain with Portfolio Theory

    Authors: Liu Ziyin, Zhikang Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda

    Abstract: We deal with the selective classification problem (supervised-learning problem with a rejection option), where we want to achieve the best performance at a certain level of coverage of the data. We transform the original $m$-class classification problem to $(m+1)$-class where the $(m+1)$-th class represents the model abstaining from making a prediction due to disconfidence. Inspired by po… ▽ More

    Submitted 1 October, 2019; v1 submitted 29 June, 2019; originally announced July 2019.

    Comments: Camera-Ready version for NeurIPS2019. Link to our code updated

  40. arXiv:1906.11829  [pdf, other]

    cs.LG stat.ML

    Selection via Proxy: Efficient Data Selection for Deep Learning

    Authors: Cody Coleman, Christopher Yeh, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia

    Abstract: Data selection methods, such as active learning and core-set selection, are useful tools for machine learning on large datasets. However, they can be prohibitively expensive to apply in deep learning because they depend on feature representations that need to be learned. In this work, we show that we can greatly improve the computational efficiency by using a small proxy model to perform data sele… ▽ More

    Submitted 26 October, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: ICLR 2020

  41. arXiv:1906.06032  [pdf, other]

    cs.LG stat.ML

    Adversarial Training Can Hurt Generalization

    Authors: Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C. Duchi, Percy Liang

    Abstract: While adversarial training can improve robust accuracy (against an adversary), it sometimes hurts standard accuracy (when there is no adversary). Previous work has studied this tradeoff between standard and robust accuracy, but only in the setting where no predictor performs well on both objectives in the infinite data limit. In this paper, we show that even when the optimal predictor with infinit…

    Submitted 26 August, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

  42. arXiv:1906.04908  [pdf, other]

    cs.LG cs.CL cs.PL stat.ML

    SPoC: Search-based Pseudocode to Code

    Authors: Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, Percy Liang

    Abstract: We consider the task of mapping pseudocode to long programs that are functionally correct. Given test cases as a mechanism to validate programs, we search over the space of possible translations of the pseudocode to find a program that passes the validation. However, without proper credit assignment to localize the sources of program failures, it is difficult to guide search toward more promising…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: Under submission to NeurIPS 2019

  43. arXiv:1906.03518  [pdf, other]

    cs.LG stat.ML

    Maximum Weighted Loss Discrepancy

    Authors: Fereshte Khani, Aditi Raghunathan, Percy Liang

    Abstract: Though machine learning algorithms excel at minimizing the average loss over a population, this might lead to large discrepancies between the losses across groups within the population. To capture this inequality, we introduce and study a notion we call maximum weighted loss discrepancy (MWLD), the maximum (weighted) difference between the loss of a group and the loss of the population. We relate…

    Submitted 8 June, 2019; originally announced June 2019.

    Comments: ICLR 2019 Workshop. Safe Machine Learning: Specification, Robustness, and Assurance

  44. arXiv:1906.02125  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS stat.ML

    Strong and Simple Baselines for Multimodal Utterance Embeddings

    Authors: Paul Pu Liang, Yao Chong Lim, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Human language is a rich multimodal signal consisting of spoken words, facial expressions, body gestures, and vocal intonations. Learning representations for these spoken utterances is a complex research problem due to the presence of multiple heterogeneous sources of information. Recent advances in multimodal learning have followed the general trend of building more complex models that utilize va…

    Submitted 28 February, 2020; v1 submitted 14 May, 2019; originally announced June 2019.

    Comments: NAACL 2019 oral presentation

  45. arXiv:1905.13736  [pdf, other]

    stat.ML cs.CV cs.LG

    Unlabeled Data Improves Adversarial Robustness

    Authors: Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi

    Abstract: We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning. Theoretically, we revisit the simple Gaussian model of Schmidt et al. that shows a sample complexity gap between standard and robust classification. We prove that unlabeled data bridges this gap: a simple semisupervised learning procedure (self-training) achieves high…

    Submitted 13 January, 2022; v1 submitted 31 May, 2019; originally announced May 2019.

    Comments: Corrected some math typos in the proof of Lemma 1

  46. arXiv:1905.13289  [pdf, other]

    cs.LG stat.ML

    On the Accuracy of Influence Functions for Measuring Group Effects

    Authors: Pang Wei Koh, Kai-Siang Ang, Hubert H. K. Teo, Percy Liang

    Abstract: Influence functions estimate the effect of removing a training point on a model without the need to retrain. They are based on a first-order Taylor approximation that is guaranteed to be accurate for sufficiently small changes to the model, and so are commonly used to study the effect of individual points in large datasets. However, we often want to study the effects of large groups of training po…

    Submitted 21 November, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

  47. arXiv:1905.12265  [pdf, other]

    cs.LG stat.ML

    Strategies for Pre-training Graph Neural Networks

    Authors: Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, Jure Leskovec

    Abstract: Many applications of machine learning require a model to make accurate predictions on test examples that are distributionally different from training ones, while task-specific labels are scarce during training. An effective approach to this challenge is to pre-train a model on related tasks where data is abundant, and then fine-tune it on a downstream task of interest. While pre-training has been…

    Submitted 18 February, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Accepted as a spotlight to ICLR 2020

  48. arXiv:1904.02792  [pdf, other]

    cs.CL cs.AI stat.ML

    Unifying Human and Statistical Evaluation for Natural Language Generation

    Authors: Tatsunori B. Hashimoto, Hugh Zhang, Percy Liang

    Abstract: How can we measure whether a natural language generation system produces both high quality and diverse outputs? Human evaluation captures quality but not diversity, as it does not catch models that simply plagiarize from the training set. On the other hand, statistical evaluation (i.e., perplexity) captures diversity but not quality, as models that occasionally emit low quality samples would be in…

    Submitted 4 April, 2019; originally announced April 2019.

    Comments: NAACL Camera Ready Submission

  49. arXiv:1903.10586  [pdf, other]

    cs.LG cs.CR stat.ML

    Defending against Whitebox Adversarial Attacks via Randomized Discretization

    Authors: Yuchen Zhang, Percy Liang

    Abstract: Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers. In this paper, we propose and analyze a simple and computationally efficient defense strategy: inject random Gaussian noise, discretize each pixel, and then feed the result into any pre-trained classifier. Theoretically, we show that our randomized discretization strategy reduces the KL divergence b…

    Submitted 25 March, 2019; originally announced March 2019.

    Comments: In proceedings of the 22nd International Conference on Artificial Intelligence and Statistics

  50. arXiv:1903.00840  [pdf, other]

    cs.LG cs.AI stat.ML

    Variational Auto-Decoder: A Method for Neural Generative Modeling from Incomplete Data

    Authors: Amir Zadeh, Yao-Chong Lim, Paul Pu Liang, Louis-Philippe Morency

    Abstract: Learning a generative model from partial data (data with missingness) is a challenging area of machine learning research. We study a specific implementation of the Auto-Encoding Variational Bayes (AEVB) algorithm, named in this paper as a Variational Auto-Decoder (VAD). VAD is a generic framework which uses Variational Bayes and Markov Chain Monte Carlo (MCMC) methods to learn a generative model f…

    Submitted 3 January, 2021; v1 submitted 3 March, 2019; originally announced March 2019.

    Comments: Link to code and data available from https://github.com/A2Zadeh/Variational-Autodecoder