Showing 1–28 of 28 results for author: Hashimoto, T

  1. arXiv:2405.10938  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Observational Scaling Laws and the Predictability of Language Model Performance

    Authors: Yangjun Ruan, Chris J. Maddison, Tatsunori Hashimoto

    Abstract: Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of training models across many different scales has limited their use. We propose an alternative, observational approach that bypasses model training and instead builds scaling laws from ~80 publicly a…

    Submitted 2 July, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  2. arXiv:2404.04475  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

    Authors: Yann Dubois, Balázs Galambosi, Percy Liang, Tatsunori B. Hashimoto

    Abstract: LLM-based auto-annotators have become a key component of the LLM development process due to their cost-effectiveness and scalability compared to human-based evaluation. However, these auto-annotators can introduce complex biases that are hard to remove. Even simple, known confounders such as preference for longer outputs remain in existing automated evaluation metrics. We propose a simple regressi…

    Submitted 5 April, 2024; originally announced April 2024.

  3. arXiv:2404.00474  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Linguistic Calibration of Long-Form Generations

    Authors: Neil Band, Xuechen Li, Tengyu Ma, Tatsunori Hashimoto

    Abstract: Language models (LMs) may lead their users to make suboptimal downstream decisions when they confidently hallucinate. This issue can be mitigated by having the LM verbally convey the probability that its claims are correct, but existing models cannot produce long-form text with calibrated confidence statements. Through the lens of decision-making, we define linguistic calibration for long-form gen…

    Submitted 4 June, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: ICML 2024. Code available at https://github.com/tatsu-lab/linguistic_calibration

  4. arXiv:2310.18413  [pdf, other]

    cs.LG cs.AI stat.ML

    On the Fairness ROAD: Robust Optimization for Adversarial Debiasing

    Authors: Vincent Grari, Thibault Laugel, Tatsunori Hashimoto, Sylvain Lamprier, Marcin Detyniecki

    Abstract: In the field of algorithmic fairness, significant attention has been put on group fairness criteria, such as Demographic Parity and Equalized Odds. Nevertheless, these objectives, measured as global averages, have raised concerns about persistent local disparities between sensitive groups. In this work, we address the problem of local fairness, which ensures that the predictor is unbiased not only…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 23 pages, 10 figures

  5. arXiv:2302.03068  [pdf, other]

    cs.LG cs.AI stat.ML

    Evaluating Self-Supervised Learning via Risk Decomposition

    Authors: Yann Dubois, Tatsunori Hashimoto, Percy Liang

    Abstract: Self-supervised learning (SSL) pipelines differ in many design choices such as the architecture, augmentations, or pretraining data. Yet SSL is typically evaluated using a single metric: linear probing on ImageNet. This does not provide much insight into why or when a model is better, nor how to improve it. To address this, we propose an SSL risk decomposition, which generalizes the classical supe…

    Submitted 8 January, 2024; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Oral at ICML 2023

  6. arXiv:2209.06235  [pdf, other]

    cs.LG stat.ML

    Improving Self-Supervised Learning by Characterizing Idealized Representations

    Authors: Yann Dubois, Tatsunori Hashimoto, Stefano Ermon, Percy Liang

    Abstract: Despite the empirical successes of self-supervised learning (SSL) methods, it is unclear what characteristics of their representations lead to high downstream accuracies. In this work, we characterize properties that SSL representations should ideally satisfy. Specifically, we prove necessary and sufficient conditions such that for any task invariant to given data augmentations, desired probes (e.…

    Submitted 12 December, 2022; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: Accepted at NeurIPS 2022

  7. arXiv:2209.03942  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    Data Feedback Loops: Model-driven Amplification of Dataset Biases

    Authors: Rohan Taori, Tatsunori B. Hashimoto

    Abstract: Datasets scraped from the internet have been critical to the successes of large-scale machine learning. Yet, this very success puts the utility of future internet-derived datasets at potential risk, as model outputs begin to replace human annotations as a source of supervision. In this work, we first formalize a system where interactions with one model are recorded as history and scraped as trai…

    Submitted 8 September, 2022; originally announced September 2022.

  8. arXiv:2207.07635  [pdf, other]

    cs.CV cs.LG stat.ML

    Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning

    Authors: Shibani Santurkar, Yann Dubois, Rohan Taori, Percy Liang, Tatsunori Hashimoto

    Abstract: The development of CLIP [Radford et al., 2021] has sparked a debate on whether language supervision can result in vision models with more transferable representations than traditional image-only methods. Our work studies this question through a carefully controlled comparison of two approaches in terms of their ability to learn representations that generalize to downstream classification tasks. We…

    Submitted 15 July, 2022; originally announced July 2022.

  9. arXiv:2207.00160  [pdf, other]

    cs.LG cs.CR stat.ML

    When Does Differentially Private Learning Not Suffer in High Dimensions?

    Authors: Xuechen Li, Daogao Liu, Tatsunori Hashimoto, Huseyin A. Inan, Janardhan Kulkarni, Yin Tat Lee, Abhradeep Guha Thakurta

    Abstract: Large pretrained models can be privately fine-tuned to achieve performance approaching that of non-private models. A common theme in these results is the surprising observation that high-dimensional models can achieve favorable privacy-utility trade-offs. This seemingly contradicts known results on the model-size dependence of differentially private convex learning and raises the following researc…

    Submitted 26 October, 2022; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: 26 pages; v3 includes additional experiments and clarification

  10. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  11. arXiv:2205.13094  [pdf, other]

    cs.LG cs.AI math.ST stat.ML

    Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification

    Authors: Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto

    Abstract: While a broad range of techniques have been proposed to tackle distribution shift, the simple baseline of training on an $\textit{undersampled}$ balanced dataset often achieves close to state-of-the-art accuracy across several popular benchmarks. This is rather surprising, since undersampling algorithms discard excess majority group data. To understand this phenomenon, we ask if learning is fundam…

    Submitted 19 June, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

  12. arXiv:2112.12986  [pdf, other]

    cs.LG stat.ML

    Is Importance Weighting Incompatible with Interpolating Classifiers?

    Authors: Ke Alexander Wang, Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto

    Abstract: Importance weighting is a classic technique to handle distribution shifts. However, prior work has presented strong empirical and theoretical evidence demonstrating that importance weights can have little to no effect on overparameterized neural networks. Is importance weighting truly incompatible with the training of overparameterized neural networks? Our paper answers this in the negative. We sh…

    Submitted 4 March, 2022; v1 submitted 24 December, 2021; originally announced December 2021.

    Comments: International Conference on Learning Representations (ICLR), 2022

  13. arXiv:2112.05090  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Extending the WILDS Benchmark for Unsupervised Adaptation

    Authors: Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Levine, Chelsea Finn, Percy Liang

    Abstract: Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data and can often be obtained from distributions beyond the source distribution as well. However, existing distribu…

    Submitted 23 April, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

  14. arXiv:2107.12525  [pdf, ps, other]

    math.ST cs.DB cs.LG stat.ML

    Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates

    Authors: Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, Yi Sun, Matei Zaharia

    Abstract: Given a dataset $\mathcal{D}$, we are interested in computing the mean of a subset of $\mathcal{D}$ which matches a predicate. ABae leverages stratified sampling and proxy models to efficiently compute this statistic given a sampling budget $N$. In this document, we theoretically analyze ABae and show that the MSE of the estimate decays at rate $O(N_1^{-1} + N_2^{-1} + N_1^{1/2}N_2^{-3/2})$, where…

    Submitted 28 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

  15. arXiv:2007.13982  [pdf, other]

    cs.LG stat.ML

    Distributionally Robust Losses for Latent Covariate Mixtures

    Authors: John Duchi, Tatsunori Hashimoto, Hongseok Namkoong

    Abstract: While modern large-scale datasets often consist of heterogeneous subpopulations -- for example, multiple demographic groups or multiple text corpora -- the standard practice of minimizing average loss fails to guarantee uniformly low losses across all subpopulations. We propose a convex procedure that controls the worst-case performance over all subpopulations of a given size. Our procedure comes…

    Submitted 10 August, 2022; v1 submitted 28 July, 2020; originally announced July 2020.

    Comments: First released in 2019 on a personal website; published in Operations Research in 2022

  16. arXiv:2007.06661  [pdf, other]

    cs.LG stat.ML

    Robustness to Spurious Correlations via Human Annotations

    Authors: Megha Srivastava, Tatsunori Hashimoto, Percy Liang

    Abstract: The reliability of machine learning systems critically assumes that the associations between features and labels remain similar between training and test distributions. However, unmeasured variables, such as confounders, break this assumption---useful correlations between features and labels at training time can become useless or even harmful at test time. For example, high obesity is generally pr…

    Submitted 13 August, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: ICML 2020 final version, 16 pages, 9 figures

  17. arXiv:1911.08731  [pdf, other]

    cs.LG stat.ML

    Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

    Authors: Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang

    Abstract: Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find…

    Submitted 2 April, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

  18. arXiv:1909.02060  [pdf, other]

    cs.CL cs.LG stat.ML

    Distributionally Robust Language Modeling

    Authors: Yonatan Oren, Shiori Sagawa, Tatsunori B. Hashimoto, Percy Liang

    Abstract: Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews). In this paper, we first show that training on text outside the test distribution can degrade test performance when using standard maximum likelihood (MLE) training. To remedy this without the k…

    Submitted 4 September, 2019; originally announced September 2019.

    Comments: Camera ready version for EMNLP

  19. arXiv:1904.02792  [pdf, other]

    cs.CL cs.AI stat.ML

    Unifying Human and Statistical Evaluation for Natural Language Generation

    Authors: Tatsunori B. Hashimoto, Hugh Zhang, Percy Liang

    Abstract: How can we measure whether a natural language generation system produces both high quality and diverse outputs? Human evaluation captures quality but not diversity, as it does not catch models that simply plagiarize from the training set. On the other hand, statistical evaluation (i.e., perplexity) captures diversity but not quality, as models that occasionally emit low quality samples would be in…

    Submitted 4 April, 2019; originally announced April 2019.

    Comments: NAACL Camera Ready Submission

  20. arXiv:1812.01194  [pdf, other]

    stat.ML cs.LG

    A Retrieve-and-Edit Framework for Predicting Structured Outputs

    Authors: Tatsunori B. Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang

    Abstract: For the task of generating complex outputs such as source code, editing existing outputs can be easier than generating complex outputs from scratch. With this motivation, we propose an approach that first retrieves a training example based on the input (e.g., natural language description) and then edits it to the desired output (e.g., code). Our contribution is a computationally efficient method f…

    Submitted 3 December, 2018; originally announced December 2018.

    Comments: To appear, NeurIPS 2018

  21. arXiv:1807.04709  [pdf, other]

    cs.LG stat.ML

    Inferring Multidimensional Rates of Aging from Cross-Sectional Data

    Authors: Emma Pierson, Pang Wei Koh, Tatsunori Hashimoto, Daphne Koller, Jure Leskovec, Nicholas Eriksson, Percy Liang

    Abstract: Modeling how individuals evolve over time is a fundamental problem in the natural and social sciences. However, existing datasets are often cross-sectional with each individual observed only once, making it impossible to apply traditional time-series methods. Motivated by the study of human aging, we present an interpretable latent-variable model that learns temporal dynamics from cross-sectional…

    Submitted 5 March, 2019; v1 submitted 12 July, 2018; originally announced July 2018.

    Comments: Accepted at AISTATS 2019

  22. arXiv:1806.08010  [pdf, other]

    stat.ML cs.LG

    Fairness Without Demographics in Repeated Loss Minimization

    Authors: Tatsunori B. Hashimoto, Megha Srivastava, Hongseok Namkoong, Percy Liang

    Abstract: Machine learning models (e.g., speech recognizers) are usually trained to minimize average loss, which results in representation disparity---minority groups (e.g., non-native speakers) contribute less to the training objective and thus tend to suffer higher loss. Worse, as model accuracy affects user retention, a minority group can shrink over time. In this paper, we first show that the status quo…

    Submitted 30 July, 2018; v1 submitted 20 June, 2018; originally announced June 2018.

    Comments: Final version for ICML2018, corrects typos

  23. arXiv:1804.03761  [pdf, other]

    stat.ML cs.LG

    Derivative free optimization via repeated classification

    Authors: Tatsunori B. Hashimoto, Steve Yadlowsky, John C. Duchi

    Abstract: We develop an algorithm for minimizing a function using $n$ batched function value measurements at each of $T$ rounds by using classifiers to identify a function's sublevel set. We show that sufficiently accurate classifiers can achieve linear convergence rates, and show that the convergence rate is tied to the difficulty of active learning sublevel sets. Further, we show that the bootstrap is a c…

    Submitted 10 April, 2018; originally announced April 2018.

    Comments: At AISTATS2018

  24. arXiv:1711.02226  [pdf, other]

    stat.ML

    Unsupervised Transformation Learning via Convex Relaxations

    Authors: Tatsunori B. Hashimoto, John C. Duchi, Percy Liang

    Abstract: Our goal is to extract meaningful transformations from raw images, such as varying the thickness of lines in handwriting or the lighting in a portrait. We propose an unsupervised approach to learn such transformations by attempting to reconstruct an image from a linear combination of transformations of its nearest neighbors. On handwritten digits and celebrity portraits, we show that even with lin…

    Submitted 6 November, 2017; originally announced November 2017.

    Comments: To appear at NIPS 2017

  25. arXiv:1709.08878  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE stat.ML

    Generating Sentences by Editing Prototypes

    Authors: Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang

    Abstract: We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to hu…

    Submitted 7 September, 2018; v1 submitted 26 September, 2017; originally announced September 2017.

    Comments: 14 pages, Transactions of the Association for Computational Linguistics (TACL), 2018

  26. arXiv:1511.00573  [pdf, other]

    stat.ML cs.AI cs.SI

    From random walks to distances on unweighted graphs

    Authors: Tatsunori B. Hashimoto, Yi Sun, Tommi S. Jaakkola

    Abstract: Large unweighted directed graphs are commonly used to capture relations between entities. A fundamental problem in the analysis of such networks is to properly define the similarity or dissimilarity between any two vertices. Despite the significance of this problem, statistical characterization of the proposed metrics has been limited. We introduce and develop a class of techniques for analyzing r…

    Submitted 2 November, 2015; originally announced November 2015.

    Comments: To appear in NIPS 2015

  27. arXiv:1509.05808  [pdf, other]

    cs.CL cs.LG stat.ML

    Word, graph and manifold embedding from Markov processes

    Authors: Tatsunori B. Hashimoto, David Alvarez-Melis, Tommi S. Jaakkola

    Abstract: Continuous vector representations of words and objects appear to carry surprisingly rich semantic content. In this paper, we advance both the conceptual and theoretical understanding of word embeddings in three ways. First, we ground embeddings in semantic spaces studied in cognitive-psychometric literature and introduce new evaluation tasks. Second, in contrast to prior work, we take metric recov…

    Submitted 18 September, 2015; originally announced September 2015.

  28. arXiv:1411.5720  [pdf, other]

    stat.ML cs.SI math.ST stat.ME

    Metric recovery from directed unweighted graphs

    Authors: Tatsunori B. Hashimoto, Yi Sun, Tommi S. Jaakkola

    Abstract: We analyze directed, unweighted graphs obtained from $x_i\in \mathbb{R}^d$ by connecting vertex $i$ to $j$ iff $|x_i - x_j| < ε(x_i)$. Examples of such graphs include $k$-nearest neighbor graphs, where $ε(x_i)$ varies from point to point, and, arguably, many real world graphs such as co-purchasing graphs. We ask whether we can recover the underlying Euclidean metric $ε(x_i)$ and the associated den…

    Submitted 20 November, 2014; originally announced November 2014.

    Comments: Poster at NIPS workshop on networks. Submitted to AISTATS 2015