Showing 1–50 of 60 results for author: Ré, C

Searching in archive stat.
  1. arXiv:2204.07596  [pdf, other]

    stat.ML cs.LG

    Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

    Authors: Mayee F. Chen, Daniel Y. Fu, Avanika Narayan, Michael Zhang, Zhao Song, Kayvon Fatahalian, Christopher Ré

    Abstract: An ideal learned representation should display transferability and robustness. Supervised contrastive learning (SupCon) is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse -- when all points in a class map to the same representation. Recent work suggests that "spreading out" these representations improves them,…

    Submitted 13 July, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: ICML 2022 Camera Ready

  2. arXiv:2203.13270  [pdf, other]

    stat.ML cs.LG

    Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision

    Authors: Mayee F. Chen, Daniel Y. Fu, Dyah Adila, Michael Zhang, Frederic Sala, Kayvon Fatahalian, Christopher Ré

    Abstract: Foundation models offer an exciting new paradigm for constructing models with out-of-the-box embeddings and a few labeled examples. However, it is not clear how to best apply foundation models without labeled data. A potential approach is to fuse foundation models with weak supervision frameworks, which use weak label sources -- pre-trained models, heuristics, crowd-workers -- to construct pseudol…

    Submitted 1 August, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: UAI 2022 Camera Ready

  3. arXiv:2108.06896  [pdf]

    cs.LG stat.ME

    Challenges for cognitive decoding using deep learning methods

    Authors: Armin W. Thomas, Christopher Ré, Russell A. Poldrack

    Abstract: In cognitive decoding, researchers aim to characterize a brain region's representations by identifying the cognitive states (e.g., accepting/rejecting a gamble) that can be identified from the region's activity. Deep learning (DL) methods are highly promising for cognitive decoding, with their unmatched ability to learn versatile representations of complex data. Yet, their widespread application i…

    Submitted 16 August, 2021; originally announced August 2021.

  4. arXiv:2103.15798  [pdf, other]

    cs.LG cs.AI cs.CV math.NA stat.ML

    Rethinking Neural Operations for Diverse Tasks

    Authors: Nicholas Roberts, Mikhail Khodak, Tri Dao, Liam Li, Christopher Ré, Ameet Talwalkar

    Abstract: An important goal of AutoML is to automate-away the design of neural networks on new tasks in under-explored domains. Motivated by this goal, we study the problem of enabling users to discover the right neural operations given data from their specific domain. We introduce a search space of operations called XD-Operations that mimic the inductive bias of standard multi-channel convolutions while be…

    Submitted 4 November, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: NeurIPS 2021

  5. arXiv:2103.02761  [pdf, other]

    cs.LG stat.ML

    Comparing the Value of Labeled and Unlabeled Data in Method-of-Moments Latent Variable Estimation

    Authors: Mayee F. Chen, Benjamin Cohen-Wang, Stephen Mussmann, Frederic Sala, Christopher Ré

    Abstract: Labeling data for modern machine learning is expensive and time-consuming. Latent variable models can be used to infer labels from weaker, easier-to-acquire sources operating on unlabeled data. Such models can also be trained using labeled data, presenting a key question: should a user invest in few labeled or many unlabeled points? We answer this via a framework centered on model misspecification…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: To appear in AISTATS 2021

  6. arXiv:2012.14966  [pdf, other]

    cs.LG stat.ML

    Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps

    Authors: Tri Dao, Nimit S. Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra, Christopher Ré

    Abstract: Modern neural network architectures use structured linear transformations, such as low-rank matrices, sparse matrices, permutations, and the Fourier transform, to improve inference speed and reduce memory usage compared to general linear maps. However, choosing which of the myriad structured transformations to use (and its associated parameterization) is a laborious task that requires trading off…

    Submitted 5 January, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: International Conference on Learning Representations (ICLR) 2020 spotlight

  7. arXiv:2010.11750  [pdf, other]

    stat.ML cs.LG

    Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers

    Authors: Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su

    Abstract: The problem of learning one task with samples from another task has received much interest recently. In this paper, we ask a fundamental question: when is combining data from two tasks better than learning one task alone? Intuitively, the transfer effect from one task to another task depends on dataset shifts such as sample sizes and covariance matrices. However, quantifying such a transfer effect…

    Submitted 10 August, 2023; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: 64 pages, 6 figures; We thoroughly revised the paper by adding new results and reorganizing the presentation

  8. arXiv:2010.00402  [pdf, other]

    cs.DS cs.LG stat.ML

    From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

    Authors: Ines Chami, Albert Gu, Vaggos Chatziafratis, Christopher Ré

    Abstract: Similarity-based Hierarchical Clustering (HC) is a classical unsupervised machine learning algorithm that has traditionally been solved with heuristic algorithms like Average-Linkage. Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function measuring the quality of a given tree. In this work, we provide the first continuous relaxation of Dasgupta's di…

    Submitted 1 October, 2020; originally announced October 2020.

  9. arXiv:2008.09983  [pdf, other]

    cs.LG cs.DB stat.ML

    Leveraging Organizational Resources to Adapt Models to New Data Modalities

    Authors: Sahaana Suri, Raghuveer Chanda, Neslihan Bulut, Pradyumna Narayana, Yemao Zeng, Peter Bailis, Sugato Basu, Girija Narlikar, Christopher Re, Abishek Sethi

    Abstract: As applications in large organizations evolve, the machine learning (ML) models that power them must adapt the same predictive tasks to newly arising data modalities (e.g., a new video content launch in a social media application requires existing text or image models to extend to video). To solve this problem, organizations typically create ML pipelines from scratch. However, this fails to utiliz…

    Submitted 23 August, 2020; originally announced August 2020.

    Journal ref: PVLDB,13(12): 3396-3410, 2020

  10. arXiv:2008.07669  [pdf, other]

    cs.LG stat.ML

    HiPPO: Recurrent Memory with Optimal Polynomial Projections

    Authors: Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Re

    Abstract: A central problem in learning from sequential data is representing cumulative history in an incremental fashion as more data is processed. We introduce a general framework (HiPPO) for the online compression of continuous signals and discrete time series by projection onto polynomial bases. Given a measure that specifies the importance of each time step in the past, HiPPO produces an optimal soluti…

    Submitted 22 October, 2020; v1 submitted 17 August, 2020; originally announced August 2020.

  11. arXiv:2008.06775  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Model Patching: Closing the Subgroup Performance Gap with Data Augmentation

    Authors: Karan Goel, Albert Gu, Yixuan Li, Christopher Ré

    Abstract: Classifiers in machine learning are often brittle when deployed. Particularly concerning are models with inconsistent performance on specific subgroups of a class, e.g., exhibiting disparities in skin cancer classification in the presence or absence of a spurious bandage. To mitigate these performance differences, we introduce model patching, a two-stage framework for improving robustness that enc…

    Submitted 15 August, 2020; originally announced August 2020.

  12. arXiv:2006.15168  [pdf, other]

    stat.ML cs.LG

    Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings

    Authors: Mayee F. Chen, Daniel Y. Fu, Frederic Sala, Sen Wu, Ravi Teja Mullapudi, Fait Poms, Kayvon Fatahalian, Christopher Ré

    Abstract: Our goal is to enable machine learning systems to be trained interactively. This requires models that perform well and train quickly, without large amounts of hand-labeled data. We take a step forward in this direction by borrowing from weak supervision (WS), wherein models can be trained with noisy sources of signal instead of hand-labeled data. But WS relies on training downstream deep networks…

    Submitted 26 June, 2020; originally announced June 2020.

  13. arXiv:2005.03675  [pdf, other]

    cs.LG cs.NE cs.SI stat.ML

    Machine Learning on Graphs: A Model and Comprehensive Taxonomy

    Authors: Ines Chami, Sami Abu-El-Haija, Bryan Perozzi, Christopher Ré, Kevin Murphy

    Abstract: There has been a surge of recent interest in learning representations for graph-structured data. Graph representation learning methods have generally fallen into three main categories, based on the availability of labeled data. The first, network embedding (such as shallow graph embedding or graph auto-encoders), focuses on learning unsupervised representations of relational structure. The second,…

    Submitted 11 April, 2022; v1 submitted 7 May, 2020; originally announced May 2020.

  14. arXiv:2005.00695  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    On the Generalization Effects of Linear Transformations in Data Augmentation

    Authors: Sen Wu, Hongyang R. Zhang, Gregory Valiant, Christopher Ré

    Abstract: Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. Yet, there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that transfor…

    Submitted 26 July, 2023; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: 22 pages. Appeared in ICML 2020

  15. arXiv:2005.00545  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Low-Dimensional Hyperbolic Knowledge Graph Embeddings

    Authors: Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, Christopher Ré

    Abstract: Knowledge graph (KG) embeddings learn low-dimensional representations of entities and relations to predict missing facts. KGs often exhibit hierarchical and logical patterns which must be preserved in the embedding space. For hierarchical data, hyperbolic embedding methods have shown promise for high-fidelity and parsimonious representations. However, existing hyperbolic embedding methods do not a…

    Submitted 1 May, 2020; originally announced May 2020.

  16. arXiv:2004.05316  [pdf, other]

    cs.LG stat.ML

    Ivy: Instrumental Variable Synthesis for Causal Inference

    Authors: Zhaobin Kuang, Frederic Sala, Nimit Sohoni, Sen Wu, Aldo Córdova-Palomera, Jared Dunnmon, James Priest, Christopher Ré

    Abstract: A popular way to estimate the causal effect of a variable x on y from observational data is to use an instrumental variable (IV): a third variable z that affects y only through x. The more strongly z is associated with x, the more reliable the estimate is, but such strong IVs are difficult to find. Instead, practitioners combine more commonly available IV candidates---which are not necessarily str…

    Submitted 11 April, 2020; originally announced April 2020.

  17. arXiv:2003.07977  [pdf, other]

    eess.IV cs.LG stat.ML

    Assessing Robustness to Noise: Low-Cost Head CT Triage

    Authors: Sarah M. Hooper, Jared A. Dunnmon, Matthew P. Lungren, Sanjiv Sam Gambhir, Christopher Ré, Adam S. Wang, Bhavik N. Patel

    Abstract: Automated medical image classification with convolutional neural networks (CNNs) has great potential to impact healthcare, particularly in resource-constrained healthcare systems where fewer trained radiologists are available. However, little is known about how well a trained CNN can perform on images with the increased noise levels, different acquisition protocols, or additional artifacts that ma…

    Submitted 28 March, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

    Comments: AI for Affordable Healthcare Workshop at ICLR 2020. First two authors have equal contribution; last two authors have equal contribution. Revision made to manuscript header according to workshop guidelines on 3/28/20

  18. arXiv:2003.04983  [pdf, other]

    cs.CL cs.LG stat.ML

    Understanding the Downstream Instability of Word Embeddings

    Authors: Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Ré

    Abstract: Many industrial machine learning (ML) systems require frequent retraining to keep up-to-date with constantly changing data. This retraining exacerbates a large challenge facing ML systems today: model training is unstable, i.e., small changes in training data can cause significant changes in the model's predictions. In this paper, we work on developing a deeper understanding of this instability, w…

    Submitted 28 February, 2020; originally announced March 2020.

    Comments: In Proceedings of the 3rd MLSys Conference, 2020

  19. arXiv:2002.11955  [pdf, other]

    stat.ML cs.LG

    Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

    Authors: Daniel Y. Fu, Mayee F. Chen, Frederic Sala, Sarah M. Hooper, Kayvon Fatahalian, Christopher Ré

    Abstract: Weak supervision is a popular method for building machine learning models without relying on ground truth annotations. Instead, it generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics, crowd workers). Existing approaches use latent variable estimation to model the noisy sources, but these methods can be computationally expensive,…

    Submitted 15 July, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

  20. arXiv:1910.12933  [pdf, other]

    cs.LG stat.ML

    Hyperbolic Graph Convolutional Neural Networks

    Authors: Ines Chami, Rex Ying, Christopher Ré, Jure Leskovec

    Abstract: Graph convolutional neural networks (GCNs) embed nodes in a graph into Euclidean space, which has been shown to incur a large distortion when embedding real-world graphs with scale-free or hierarchical structure. Hyperbolic geometry offers an exciting alternative, as it enables embeddings with much smaller distortion. However, extending GCNs to hyperbolic geometry presents several unique challenge…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: Published at Conference NeurIPS 2019. First 2 authors have equal contribution

  21. arXiv:1910.09505  [pdf, other]

    stat.ML cs.CV cs.LG

    Multi-Resolution Weak Supervision for Sequential Data

    Authors: Frederic Sala, Paroma Varma, Jason Fries, Daniel Y. Fu, Shiori Sagawa, Saelig Khattar, Ashwini Ramamoorthy, Ke Xiao, Kayvon Fatahalian, James Priest, Christopher Ré

    Abstract: Since manually labeling training data is slow and expensive, recent industrial and scientific research efforts have turned to weaker or noisier forms of supervision sources. However, existing weak supervision approaches fail to model multi-resolution sources for sequential data, like video, that can assign labels to individual elements or collections of elements in a sequence. A key challenge in w…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019 (Conference on Neural Information Processing Systems)

  22. arXiv:1910.05124  [pdf, other]

    cs.DC cs.LG stat.ML

    PipeMare: Asynchronous Pipeline Parallel DNN Training

    Authors: Bowen Yang, Jian Zhang, Jonathan Li, Christopher Ré, Christopher R. Aberger, Christopher De Sa

    Abstract: Pipeline parallelism (PP) when training neural networks enables larger models to be partitioned spatially, leading to both lower network communication and overall higher hardware utilization. Unfortunately, to preserve the statistical efficiency of sequential training, existing PP techniques sacrifice hardware efficiency by decreasing pipeline utilization or incurring extra memory costs. In this p…

    Submitted 8 February, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

  23. arXiv:1909.12475  [pdf, other]

    cs.LG stat.ML

    Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging

    Authors: Luke Oakden-Rayner, Jared Dunnmon, Gustavo Carneiro, Christopher Ré

    Abstract: Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing. For example, overall performance of a cancer detection model may be high, but the model still consistently misses a rare but aggressive cancer subtype. We refer to this problem as hidden stratification, and observe that it re…

    Submitted 15 November, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

  24. arXiv:1909.06349  [pdf, other]

    cs.LG cs.AI stat.ML

    Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices

    Authors: Vincent S. Chen, Sen Wu, Zhenzhen Weng, Alexander Ratner, Christopher Ré

    Abstract: In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and "question" sentences might be important to a dialogue agent's language understanding for product purposes. While machine learning models can achieve high quality performance on coarse-grained metrics like F1-score…

    Submitted 29 February, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

    Comments: NeurIPS 2019

  25. arXiv:1909.01264  [pdf, other]

    cs.LG stat.ML

    On the Downstream Performance of Compressed Word Embeddings

    Authors: Avner May, Jian Zhang, Tri Dao, Christopher Ré

    Abstract: Compressing word embeddings is important for deploying NLP models in memory-constrained settings. However, understanding what makes compressed embeddings perform well on downstream tasks is challenging---existing measures of compression quality often fail to distinguish between embeddings that perform well and those that do not. We thus propose the eigenspace overlap score as a new measure. We rel…

    Submitted 14 January, 2020; v1 submitted 3 September, 2019; originally announced September 2019.

    Comments: NeurIPS 2019 spotlight (Conference on Neural Information Processing Systems)

  26. arXiv:1904.10631  [pdf, other]

    cs.LG stat.ML

    Low-Memory Neural Network Training: A Technical Report

    Authors: Nimit S. Sohoni, Christopher R. Aberger, Megan Leszczynski, Jian Zhang, Christopher Ré

    Abstract: Memory is increasingly often the bottleneck when training neural network models. Despite this, techniques to lower the overall memory requirements of training have been less widely studied compared to the extensive literature on reducing the memory requirements of inference. In this paper we study a fundamental question: How much memory is actually needed to train a neural network? To answer this…

    Submitted 8 April, 2022; v1 submitted 23 April, 2019; originally announced April 2019.

    Comments: Version notes: Copyedits and citation fixes

  27. arXiv:1904.03257  [pdf, ps, other]

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  28. arXiv:1903.11101  [pdf, other]

    cs.LG eess.IV stat.ML

    Cross-Modal Data Programming Enables Rapid Medical Machine Learning

    Authors: Jared Dunnmon, Alexander Ratner, Nishith Khandwala, Khaled Saab, Matthew Markert, Hersh Sagreiya, Roger Goldman, Christopher Lee-Messer, Matthew Lungren, Daniel Rubin, Christopher Ré

    Abstract: Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports associated with imaging studies. We propose cross-modal data programming, which generalizes this intuitive strategy in a theoretically-grounded way that enables si…

    Submitted 26 March, 2019; originally announced March 2019.

  29. arXiv:1903.05895  [pdf, other]

    cs.LG stat.ML

    Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations

    Authors: Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, Christopher Ré

    Abstract: Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and…

    Submitted 28 December, 2020; v1 submitted 14 March, 2019; originally announced March 2019.

    Comments: International Conference on Machine Learning (ICML) 2019

  30. arXiv:1903.05844  [pdf, other]

    stat.ML cs.LG

    Learning Dependency Structures for Weak Supervision Models

    Authors: Paroma Varma, Frederic Sala, Ann He, Alexander Ratner, Christopher Ré

    Abstract: Labeling training data is a key bottleneck in the modern machine learning pipeline. Recent weak supervision approaches combine labels from multiple noisy sources by estimating their accuracies without access to ground truth labels; however, estimating the dependencies among these sources is a critical challenge. We focus on a robust PCA-based algorithm for learning these dependency structures, est…

    Submitted 14 March, 2019; originally announced March 2019.

  31. arXiv:1812.00417  [pdf, other]

    cs.LG stat.ML

    Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale

    Authors: Stephen H. Bach, Daniel Rodriguez, Yintao Liu, Chong Luo, Haidong Shao, Cassandra Xia, Souvik Sen, Alexander Ratner, Braden Hancock, Houman Alborzi, Rahul Kuchhal, Christopher Ré, Rob Malkin

    Abstract: Labeling training data is one of the most costly bottlenecks in developing machine learning-based applications. We present a first-of-its-kind study showing how existing knowledge resources from across an organization can be used as weak supervision in order to bring development time and cost down by an order of magnitude, and introduce Snorkel DryBell, a new weak supervision management system for…

    Submitted 3 June, 2019; v1 submitted 2 December, 2018; originally announced December 2018.

    Journal ref: Proceedings of the International Conference on Management of Data (SIGMOD), 2019

  32. arXiv:1811.00155  [pdf, other]

    cs.LG cs.AI stat.ML

    Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation

    Authors: Jian Zhang, Avner May, Tri Dao, Christopher Ré

    Abstract: We investigate how to train kernel approximation methods that generalize well under a memory budget. Building on recent theoretical work, we define a measure of kernel approximation error which we find to be more predictive of the empirical generalization performance of kernel approximation methods than conventional metrics. An important consequence of this definition is that a kernel approximatio…

    Submitted 20 March, 2019; v1 submitted 31 October, 2018; originally announced November 2018.

    Comments: International Conference on Artificial Intelligence and Statistics (AISTATS) 2019

  33. arXiv:1810.02840  [pdf, other]

    stat.ML cs.LG

    Training Complex Models with Multi-Task Weak Supervision

    Authors: Alexander Ratner, Braden Hancock, Jared Dunnmon, Frederic Sala, Shreyash Pandey, Christopher Ré

    Abstract: As machine learning models continue to increase in complexity, collecting large hand-labeled training sets has become one of the biggest roadblocks in practice. Instead, weaker forms of supervision that provide noisier but cheaper labels are often used. However, these weak supervision sources have diverse and unknown accuracies, may output correlated labels, and may label different tasks or apply…

    Submitted 7 December, 2018; v1 submitted 5 October, 2018; originally announced October 2018.

  34. arXiv:1810.02309  [pdf, other]

    cs.LG stat.ML

    Learning Compressed Transforms with Low Displacement Rank

    Authors: Anna T. Thomas, Albert Gu, Tri Dao, Atri Rudra, Christopher Ré

    Abstract: The low displacement rank (LDR) framework for structured matrices represents a matrix through two displacement operators and a low-rank residual. Existing use of LDR matrices in deep learning has applied fixed displacement operators encoding forms of shift invariance akin to convolutions. We introduce a class of LDR matrices with more general displacement operators, and explicitly learn over both…

    Submitted 1 January, 2019; v1 submitted 4 October, 2018; originally announced October 2018.

    Comments: NeurIPS 2018. Code available at https://github.com/HazyResearch/structured-nets

  35. arXiv:1806.01427  [pdf, other]

    cs.LG stat.ML

    Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark

    Authors: Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Chris Re, Matei Zaharia

    Abstract: Researchers have proposed hardware, software, and algorithmic optimizations to improve the computational performance of deep learning. While some of these optimizations perform the same operations faster (e.g., increasing GPU clock speed), many others modify the semantics of the training procedure (e.g., reduced precision), and can impact the final model's accuracy on unseen data. Due to a lack of…

    Submitted 1 December, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

  36. arXiv:1804.03329  [pdf, other]

    cs.LG stat.ML

    Representation Tradeoffs for Hyperbolic Embeddings

    Authors: Christopher De Sa, Albert Gu, Christopher Ré, Frederic Sala

    Abstract: Hyperbolic embeddings offer excellent quality with few dimensions when embedding hierarchical data structures like synonym or type hierarchies. Given a tree, we give a combinatorial construction that embeds the tree in hyperbolic space with arbitrarily low distortion without using optimization. On WordNet, our combinatorial embedding obtains a mean-average-precision of 0.989 with only two dimensio…

    Submitted 24 April, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

  37. arXiv:1803.06084  [pdf, other]

    cs.LG stat.ML

    A Kernel Theory of Modern Data Augmentation

    Authors: Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré

    Abstract: Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding data augmentation. We approach this from two directions: First, we provide a general model of augmentation as a Markov process, and show that kernels appear natural…

    Submitted 20 March, 2019; v1 submitted 16 March, 2018; originally announced March 2018.

  38. arXiv:1803.03383  [pdf, other]

    cs.LG stat.ML

    High-Accuracy Low-Precision Training

    Authors: Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré

    Abstract: Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it. Still, it has been used primarily for inference - not training. Previous low-precision training algorithms suffered from a fundamental tradeoff: as the number of bits of precision is lowered, quantization noise is added to the model, w…

    Submitted 8 March, 2018; originally announced March 2018.

  39. Snorkel: Rapid Training Data Creation with Weak Supervision

    Authors: Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré

    Abstract: Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs w…

    Submitted 28 November, 2017; originally announced November 2017.

    Journal ref: Proceedings of the VLDB Endowment, 11(3), 269-282, 2017

  40. arXiv:1709.02477  [pdf, other]

    cs.LG cs.AI stat.ML

    Inferring Generative Model Structure with Static Analysis

    Authors: Paroma Varma, Bryan He, Payal Bajaj, Imon Banerjee, Nishith Khandwala, Daniel L. Rubin, Christopher Ré

    Abstract: Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The structure of these models affects training label quality, but is difficult to learn without any ground truth labels. We instead rely on these weak supervision sources h…

    Submitted 7 September, 2017; originally announced September 2017.

    Comments: NIPS 2017

  41. arXiv:1709.01643  [pdf, other]

    stat.ML cs.CV cs.LG

    Learning to Compose Domain-Specific Transformations for Data Augmentation

    Authors: Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared Dunnmon, Christopher Ré

    Abstract: Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels. While it is often easy for domain experts to specify individual transformations, constructing and tuning the more sophisticated compositions typically needed to achieve state-of-the-art results is a time-consuming manual task in p…

    Submitted 30 September, 2017; v1 submitted 5 September, 2017; originally announced September 2017.

    Comments: To appear at Neural Information Processing Systems (NIPS) 2017

    Journal ref: Advances in Neural Information Processing Systems 30, 2017, 3236--3246

  42. arXiv:1707.02670  [pdf, other]

    math.OC cs.DS cs.LG math.NA stat.ML

    Accelerated Stochastic Power Iteration

    Authors: Christopher De Sa, Bryan He, Ioannis Mitliagkas, Christopher Ré, Peng Xu

    Abstract: Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires $\mathcal{O}(1/\Delta)$ full-data passes to recover the principal component of a matrix with eigen-gap $\Delta$. Lanczos, a significantly more complex method, achieves an accelerated rate of $\mathcal{O}(1/\sqrt{\Delta})$ passes. Modern applications, however, motivate m…

    Submitted 9 July, 2017; originally announced July 2017.

    Comments: 37 pages, 5 figures

  43. arXiv:1705.07538  [pdf, other]

    cs.LG cs.DB stat.ML

    Infrastructure for Usable Machine Learning: The Stanford DAWN Project

    Authors: Peter Bailis, Kunle Olukotun, Christopher Ré, Matei Zaharia

    Abstract: Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations. This expense comes not from a need for new and improved statistical models but instead from a lack of systems and tools for supporting end-to-end machine learning application developmen…

    Submitted 8 June, 2017; v1 submitted 21 May, 2017; originally announced May 2017.

  44. arXiv:1705.04790  [pdf, other]

    stat.ML

    ShortFuse: Biomedical Time Series Representations in the Presence of Structured Information

    Authors: Madalina Fiterau, Suvrat Bhooshan, Jason Fries, Charles Bournhonesque, Jennifer Hicks, Eni Halilaj, Christopher Ré, Scott Delp

    Abstract: In healthcare applications, temporal variables that encode movement, health status and longitudinal patient evolution are often accompanied by rich structured information such as demographics, diagnostics and medical exam data. However, current methods do not jointly optimize over structured covariates and time series in the feature extraction process. We present ShortFuse, a method that boosts th…

    Submitted 15 May, 2017; v1 submitted 13 May, 2017; originally announced May 2017.

    Comments: Manuscript under review for the Machine Learning in Healthcare Conference, 2017 (www.mucmd.org). 15 pages, 4 figures, 3 tables

  45. arXiv:1703.00854  [pdf, other]

    cs.LG stat.ML

    Learning the Structure of Generative Models without Labeled Data

    Authors: Stephen H. Bach, Bryan He, Alexander Ratner, Christopher Ré

    Abstract: Curating labeled training data has become the primary bottleneck in machine learning. Recent frameworks address this bottleneck with generative models to synthesize labels at scale from weak supervision sources. The generative model's dependency structure directly affects the quality of the estimated labels, but selecting a structure automatically without any labeled data is a distinct challenge.…

    Submitted 9 September, 2017; v1 submitted 2 March, 2017; originally announced March 2017.

    Journal ref: Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017

  46. arXiv:1610.08123  [pdf, other]

    cs.LG stat.ML

    Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

    Authors: Paroma Varma, Bryan He, Dan Iter, Peng Xu, Rose Yu, Christopher De Sa, Christopher Ré

    Abstract: A challenge in training discriminative models like neural networks is obtaining enough labeled training data. Recent approaches use generative models to combine weak supervision sources, like user-defined heuristics or knowledge bases, to label training data. Prior work has explored learning accuracies for these sources even without ground truth labels, but they assume that a single accuracy param…

    Submitted 28 September, 2017; v1 submitted 25 October, 2016; originally announced October 2016.

    Comments: 4 figures; 18 pages

  47. arXiv:1607.00559  [pdf, ps, other]

    math.OC stat.ML

    Sub-sampled Newton Methods with Non-uniform Sampling

    Authors: Peng Xu, Jiyan Yang, Farbod Roosta-Khorasani, Christopher Ré, Michael W. Mahoney

    Abstract: We consider the problem of finding the minimizer of a convex function $F: \mathbb R^d \rightarrow \mathbb R$ of the form $F(w) := \sum_{i=1}^n f_i(w) + R(w)$ where a low-rank factorization of $\nabla^2 f_i(w)$ is readily available. We consider the regime where $n \gg d$. As second-order methods prove to be effective in finding the minimizer to a high-precision, in this work, we propose randomized…

    Submitted 5 July, 2016; v1 submitted 2 July, 2016; originally announced July 2016.

    Comments: minor fix on v1

  48. arXiv:1606.07365  [pdf, other]

    stat.ML cs.LG

    Parallel SGD: When does averaging help?

    Authors: Jian Zhang, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

    Abstract: Consider a number of workers running SGD independently on the same pool of data and averaging the models every once in a while -- a common but not well understood practice. We study model averaging as a variance-reducing mechanism and describe two ways in which the frequency of averaging affects convergence. For convex objectives, we show the benefit of frequent averaging depends on the gradient v…

    Submitted 23 June, 2016; originally announced June 2016.

  49. arXiv:1606.03432  [pdf, other]

    cs.LG cs.AI stat.ML

    Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much

    Authors: Bryan He, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré

    Abstract: Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured tha…

    Submitted 10 June, 2016; originally announced June 2016.

  50. arXiv:1605.09774  [pdf, other]

    stat.ML cs.DC cs.LG math.OC

    Asynchrony begets Momentum, with an Application to Deep Learning

    Authors: Ioannis Mitliagkas, Ce Zhang, Stefan Hadjis, Christopher Ré

    Abstract: Asynchronous methods are widely used in deep learning, but have limited theoretical justification when applied to non-convex problems. We show that running stochastic gradient descent (SGD) in an asynchronous manner can be viewed as adding a momentum-like term to the SGD iteration. Our result does not assume convexity of the objective function, so it is applicable to deep learning systems. We obse…

    Submitted 25 November, 2016; v1 submitted 31 May, 2016; originally announced May 2016.

    Comments: Full version of a paper published in Annual Allerton Conference on Communication, Control, and Computing (Allerton) 2016