
Showing 1–42 of 42 results for author: Stern, M

  1. arXiv:2408.00707  [pdf]

    cs.CV cs.CE cs.LG

    Synthetic dual image generation for reduction of labeling efforts in semantic segmentation of micrographs with a customized metric function

    Authors: Matias Oscar Volman Stern, Dominic Hohs, Andreas Jansche, Timo Bernthaler, Gerhard Schneider

    Abstract: Training of semantic segmentation models for material analysis requires micrographs and their corresponding masks. It is quite unlikely that perfect masks will be drawn, especially at the edges of objects, and sometimes the amount of data that can be obtained is small, since only a few samples are available. These aspects make it very problematic to train a robust model. We demonstrate a workflow…

    Submitted 1 August, 2024; originally announced August 2024.

  2. arXiv:2407.14701  [pdf, other]

    cs.CL

    Contextual modulation of language comprehension in a dynamic neural model of lexical meaning

    Authors: Michael C. Stern, Maria M. Piñango

    Abstract: We propose and computationally implement a dynamic neural model of lexical meaning, and experimentally test its behavioral predictions. We demonstrate the architecture and behavior of the model using as a test case the English lexical item 'have', focusing on its polysemous use. In the model, 'have' maps to a semantic space defined by two continuous conceptual dimensions, connectedness and control…

    Submitted 19 July, 2024; originally announced July 2024.

  3. arXiv:2404.09174  [pdf, other]

    cs.HC

    Investigating the impact of virtual element misalignment in collaborative Augmented Reality experiences

    Authors: Francesco Vona, Sina Hinzmann, Michael Stern, Tanja Kojić, Navid Ashrafi, David Grieshammer, Jan-Niklas Voigt-Antons

    Abstract: The collaboration in co-located shared environments has sparked an increased interest in immersive technologies, including Augmented Reality (AR). Since research in this field has primarily focused on individual user experiences in AR, the collaborative aspects within shared AR spaces remain less explored, and fewer studies can provide guidelines for designing this type of experience. This article…

    Submitted 25 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: Paper accepted for publication/presentation at QoMEX 2024

  4. arXiv:2403.16331  [pdf, other]

    cs.SD cs.LG eess.AS

    Modeling Analog Dynamic Range Compressors using Deep Learning and State-space Models

    Authors: Hanzhi Yin, Gang Cheng, Christian J. Steinmetz, Ruibin Yuan, Richard M. Stern, Roger B. Dannenberg

    Abstract: We describe a novel approach for developing realistic digital models of dynamic range compressors for digital audio production by analyzing their analog prototypes. While realistic digital dynamic compressors are potentially useful for many applications, the design process is challenging because the compressors operate nonlinearly over long time scales. Our approach is based on the structured stat…

    Submitted 24 March, 2024; originally announced March 2024.

  5. arXiv:2311.15404  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Applying statistical learning theory to deep learning

    Authors: Cédric Gerbelot, Avetik Karagulyan, Stefani Karp, Kavya Ravichandran, Menachem Stern, Nathan Srebro

    Abstract: Although statistical learning theory provides a robust framework to understand supervised learning, many theoretical aspects of deep learning remain unclear, in particular how different architectures may lead to inductive bias when trained using gradient based methods. The goal of these lectures is to provide an overview of some of the main questions that arise when attempting to understand deep l…

    Submitted 25 March, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: 66 pages, 20 figures

  6. arXiv:2311.00537  [pdf, other]

    cond-mat.soft cs.ET cs.LG

    Machine Learning Without a Processor: Emergent Learning in a Nonlinear Electronic Metamaterial

    Authors: Sam Dillavou, Benjamin D Beyer, Menachem Stern, Andrea J Liu, Marc Z Miskin, Douglas J Durian

    Abstract: Standard deep learning algorithms require differentiating large nonlinear networks, a process that is slow and power-hungry. Electronic learning metamaterials offer potentially fast, efficient, and fault-tolerant hardware for analog machine learning, but existing implementations are linear, severely limiting their capabilities. These systems differ significantly from artificial neural networks as…

    Submitted 5 April, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: 11 pages 8 figures

    Journal ref: Proc. Nat. Acad. Sci. 121 (28), e2319718121 (2024)

  7. arXiv:2309.14460  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD eess.SP

    Online Active Learning For Sound Event Detection

    Authors: Mark Lindsey, Ankit Shah, Francis Kubala, Richard M. Stern

    Abstract: Data collection and annotation is a laborious, time-consuming prerequisite for supervised machine learning tasks. Online Active Learning (OAL) is a paradigm that addresses this issue by simultaneously minimizing the amount of annotation required to train a classifier and adapting to changes in the data over the duration of the data collection process. Prior work has indicated that fluctuating clas…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024. Publication will belong to IEEE

  8. arXiv:2211.15254  [pdf, other]

    eess.AS cs.SD

    Learnable Front Ends Based on Temporal Modulation for Music Tagging

    Authors: Yinghao Ma, Richard M. Stern

    Abstract: While end-to-end systems are becoming popular in auditory signal processing including automatic music tagging, models using raw audio as input need a large amount of data and computational resources without domain knowledge. Inspired by the fact that temporal modulation is regarded as an essential component in auditory perception, we introduce the Temporal Modulation Neural Network (TMNN) that co…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  9. arXiv:2209.12278  [pdf]

    cs.CL

    Neural inhibition during speech planning contributes to contrastive hyperarticulation

    Authors: Michael C. Stern, Jason A. Shaw

    Abstract: Previous work has demonstrated that words are hyperarticulated on dimensions of speech that differentiate them from a minimal pair competitor. This phenomenon has been termed contrastive hyperarticulation (CH). We present a dynamic neural field (DNF) model of voice onset time (VOT) planning that derives CH from an inhibitory influence of the minimal pair competitor during planning. We test some pr…

    Submitted 14 March, 2023; v1 submitted 25 September, 2022; originally announced September 2022.

  10. Quantitative probing: Validating causal models using quantitative domain knowledge

    Authors: Daniel Grünbaum, Maike L. Stern, Elmar W. Lang

    Abstract: We present quantitative probing as a model-agnostic framework for validating causal models in the presence of quantitative domain knowledge. The method is constructed as an analogue of the train/test split in correlation-based machine learning and as an enhancement of current causal validation strategies that are consistent with the logic of scientific discovery. The effectiveness of the method is…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: submitted to the Journal of Causal Inference

    MSC Class: 62D20

    Journal ref: Journal of Causal Inference, vol. 11, no. 1, 2023

  11. arXiv:2201.04626  [pdf, other]

    cond-mat.soft cs.LG cs.NE

    Desynchronous Learning in a Physics-Driven Learning Network

    Authors: Jacob F Wycoff, Sam Dillavou, Menachem Stern, Andrea J Liu, Douglas J Durian

    Abstract: In a neuron network, synapses update individually using local information, allowing for entirely decentralized learning. In contrast, elements in an artificial neural network (ANN) are typically updated simultaneously using a central processor. Here we investigate the feasibility and effect of desynchronous learning in a recently introduced decentralized, physics-driven learning network. We show t…

    Submitted 1 December, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: 6 pages 4 figures

  12. arXiv:2111.15486  [pdf, other]

    physics.optics cs.LG physics.data-an

    Playing Ping Pong with Light: Directional Emission of White Light

    Authors: Heribert Wankerl, Christopher Wiesmann, Laura Kreiner, Rainer Butendeich, Alexander Luce, Sandra Sobczyk, Maike Lorena Stern, Elmar Wolfgang Lang

    Abstract: Over the last decades, light-emitting diodes (LED) have replaced common light bulbs in almost every application, from flashlights in smartphones to automotive headlights. Illuminating nightly streets requires LEDs to emit a light spectrum that is perceived as pure white by the human eye. The power associated with such a white light spectrum is not only distributed over the contributing wavelengths…

    Submitted 30 November, 2021; originally announced November 2021.

    Comments: Under review for publication

  13. arXiv:2109.14788  [pdf]

    cs.CL

    Tipping the Scales: A Corpus-Based Reconstruction of Adjective Scales in the McGill Pain Questionnaire

    Authors: Miriam Stern

    Abstract: Modern medical diagnosis relies on precise pain assessment tools in translating clinical information from patient to physician. The McGill Pain Questionnaire (MPQ) is a clinical pain assessment technique that utilizes 78 adjectives of different intensities in 20 different categories to quantify a patient's pain. The questionnaire's efficacy depends on a predictable pattern of adjective use by pati…

    Submitted 5 October, 2021; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: 16 pages, 1 figure

  14. arXiv:2103.14144  [pdf, other]

    cs.GT cs.CR econ.TH math.OC

    Dynamic Posted-Price Mechanisms for the Blockchain Transaction Fee Market

    Authors: Matheus V. X. Ferreira, Daniel J. Moroz, David C. Parkes, Mitchell Stern

    Abstract: In recent years, prominent blockchain systems such as Bitcoin and Ethereum have experienced explosive growth in transaction volume, leading to frequent surges in demand for limited block space and causing transaction fees to fluctuate by orders of magnitude. Existing systems sell space using first-price auctions; however, users find it difficult to estimate how much they need to bid in order to ge…

    Submitted 16 November, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

    Journal ref: AFT '21: Proceedings of the 3rd ACM Conference on Advances in Financial Technologies, 2021, 86-99

  15. arXiv:2010.10648  [pdf, other]

    cs.CL cs.CV cs.LG

    Towards End-to-End In-Image Neural Machine Translation

    Authors: Elman Mansimov, Mitchell Stern, Mia Chen, Orhan Firat, Jakob Uszkoreit, Puneet Jain

    Abstract: In this paper, we offer a preliminary investigation into the task of in-image machine translation: transforming an image containing text in one language into an image containing the same text in another language. We propose an end-to-end neural model for this task inspired by recent approaches to neural machine translation, and demonstrate promising initial results based purely on pixel-level supe…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: Accepted as an oral presentation at EMNLP, NLP Beyond Text workshop, 2020

  16. arXiv:2010.05769  [pdf, other]

    cs.LG cs.AI physics.optics

    Parameterized Reinforcement Learning for Optical System Optimization

    Authors: Heribert Wankerl, Maike L. Stern, Ali Mahdavi, Christoph Eichler, Elmar W. Lang

    Abstract: Designing a multi-layer optical system with designated optical characteristics is an inverse design problem in which the resulting design is determined by several discrete and continuous parameters. In particular, we consider three design parameters to describe a multi-layer stack: Each layer's dielectric material and thickness as well as the total number of layers. Such a combination of both, dis…

    Submitted 25 November, 2020; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Presented as a poster at the workshop on machine learning for engineering modeling, simulation and design @ NeurIPS 2020

    Journal ref: J. Phys. D: Appl. Phys. 54 305104 (2021)

  17. arXiv:2009.02832  [pdf]

    eess.AS cs.SD

    Non causal deep learning based dereverberation

    Authors: Jorge Wuth, Richard M. Stern, Nestor Becerra Yoma

    Abstract: In this paper we demonstrate the effectiveness of non-causal context for mitigating the effects of reverberation in deep-learning-based automatic speech recognition (ASR) systems. First, the value of non-causal context using a non-causal FIR filter is shown by comparing the contributions of previous vs. future information. Second, MLP- and LSTM-based dereverberation networks were trained to confir…

    Submitted 6 September, 2020; originally announced September 2020.

    Comments: 33 pages

  18. arXiv:2006.02956  [pdf, other]

    cs.CR stat.OT

    A Fair, Traceable, Auditable and Participatory Randomization Tool for Legal Systems

    Authors: Marcos Vinicius M. Silva, Marcos Antonio Simplicio Jr., Roberto Augusto Castellanos Pfeiffer, Julio Michael Stern

    Abstract: Many real-world scenarios require the random selection of one or more individuals from a pool of eligible candidates. One example of especial social relevance refers to the legal system, in which the jurors and judges are commonly picked according to some probability distribution aiming to avoid biased decisions. In this scenario, ensuring auditability of the random drawing procedure is imperative…

    Submitted 4 June, 2020; originally announced June 2020.

    MSC Class: 64-04 (Primary); 62D99 (Secondary)

  19. arXiv:2005.13459  [pdf, other]

    cs.CE

    Otimizacao e Processos Estocasticos Aplicados a Economia e Financas

    Authors: Julio Michael Stern, Carlos Alberto de Braganca Pereira, Celma de Oliveira Ribeiro, Cibele Dunder, Fabio Nakano, Marcelo Lauretto

    Abstract: Optimization and Stochastic Processes Applied to Economy and Finance -- is the name of this book translated to English; It has been used at the IME-USP - The Institute of Mathematics and Statistics of the University of Sao Paulo, since 1993. Contents: Ch.1: Linear Programming; Ch.2: Non-Linear Programming; Ch.3: Quadratic Programming; Ch.4: Markowitz Model; Ch.5: Dynamic Programming; Ch.6: LQG E…

    Submitted 25 May, 2020; originally announced May 2020.

    Comments: in Portuguese

  20. arXiv:2005.05927  [pdf, other]

    cs.CL cs.PL

    Semantic Scaffolds for Pseudocode-to-Code Generation

    Authors: Ruiqi Zhong, Mitchell Stern, Dan Klein

    Abstract: We propose a method for program generation based on semantic scaffolds, lightweight structures representing the high-level semantic and syntactic composition of a program. By first searching over plausible scaffolds then using these as constraints for a beam search over programs, we achieve better coverage of the search space when compared with existing techniques. We apply our hierarchical search…

    Submitted 12 May, 2020; originally announced May 2020.

  21. arXiv:2004.15015  [pdf, other]

    cs.CL cs.CR cs.LG

    Imitation Attacks and Defenses for Black-box Machine Translation Systems

    Authors: Eric Wallace, Mitchell Stern, Dawn Song

    Abstract: Adversaries may look to steal or attack black-box NLP systems, either for financial gain or to exploit model errors. One setting of particular interest is machine translation (MT), where models have high commercial value and errors can be costly. We investigate possible exploits of black-box MT systems and explore a preliminary defense against such threats. We first show that MT systems can be sto…

    Submitted 3 January, 2021; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: EMNLP 2020

  22. arXiv:2003.00594  [pdf, other]

    cs.CV cs.LG

    Rethinking Fully Convolutional Networks for the Analysis of Photoluminescence Wafer Images

    Authors: Maike Lorena Stern, Hans Lindberg, Klaus Meyer-Wegener

    Abstract: The manufacturing of light-emitting diodes is a complex semiconductor-manufacturing process, interspersed with different measurements. Among the employed measurements, photoluminescence imaging has several advantages, namely being a non-destructive, fast and thus cost-effective measurement. On a photoluminescence measurement image of an LED wafer, every pixel corresponds to an LED chip's brightnes…

    Submitted 1 March, 2020; originally announced March 2020.

  23. arXiv:2001.05540  [pdf, other]

    cs.LG cs.CL stat.ML

    Insertion-Deletion Transformer

    Authors: Laura Ruis, Mitchell Stern, Julia Proskurnia, William Chan

    Abstract: We propose the Insertion-Deletion Transformer, a novel transformer-based neural architecture and training method for sequence generation. The model consists of two phases that are executed iteratively, 1) an insertion phase and 2) a deletion phase. The insertion phase parameterizes a distribution of insertions on the current output hypothesis, while the deletion phase parameterizes a distribution…

    Submitted 15 January, 2020; originally announced January 2020.

    Comments: Accepted as an Extended Abstract at the Workshop of Neural Generation and Translation (WNGT 2019) at EMNLP 2019

  24. arXiv:1911.02067  [pdf, ps, other]

    q-fin.PM cs.LG stat.ML

    Robo-advising: Learning Investors' Risk Preferences via Portfolio Choices

    Authors: Humoud Alsabah, Agostino Capponi, Octavio Ruiz Lacedelli, Matt Stern

    Abstract: We introduce a reinforcement learning framework for retail robo-advising. The robo-advisor does not know the investor's risk preference, but learns it over time by observing her portfolio choices in different market environments. We develop an exploration-exploitation algorithm which trades off costly solicitations of portfolio choices by the investor with autonomous trading decisions based on sta…

    Submitted 16 November, 2019; v1 submitted 5 November, 2019; originally announced November 2019.

    MSC Class: 68T01

    ACM Class: I.2.6

  25. arXiv:1910.13437  [pdf, ps, other]

    cs.CL cs.LG

    An Empirical Study of Generation Order for Machine Translation

    Authors: William Chan, Mitchell Stern, Jamie Kiros, Jakob Uszkoreit

    Abstract: In this work, we present an empirical study of generation order for machine translation. Building on recent advances in insertion-based modeling, we first introduce a soft order-reward framework that enables us to train models to follow arbitrary oracle generation policies. We then make use of this framework to explore a large variety of generation orders, including uninformed orders, location-bas…

    Submitted 29 October, 2019; originally announced October 2019.

  26. Fully Convolutional Networks for Chip-wise Defect Detection Employing Photoluminescence Images

    Authors: Maike Lorena Stern, Martin Schellenberger

    Abstract: Efficient quality control is inevitable in the manufacturing of light-emitting diodes (LEDs). Because defective LED chips may be traced back to different causes, a time and cost-intensive electrical and optical contact measurement is employed. Fast photoluminescence measurements, on the other hand, are commonly used to detect wafer separation damages but also hold the potential to enable an effici…

    Submitted 6 October, 2019; originally announced October 2019.

    Comments: 14 pages, 12 figures

    ACM Class: J.2; I.2.10

    Journal ref: J Intell Manuf (2020) 1-14

  27. arXiv:1906.07299  [pdf]

    eess.AS cs.SD

    On combining features for single-channel robust speech recognition in reverberant environments

    Authors: José Novoa, Josué Fredes, Jorge Wuth, Fernando Huenupán, Richard M. Stern, Nestor Becerra Yoma

    Abstract: This paper addresses the combination of complementary parallel speech recognition systems to reduce the error rate of speech recognition systems operating in real highly-reverberant environments. First, the testing environment consists of recordings of speech in a calibrated real room with reverberation times from 0.47 to 1.77 seconds and speaker-to-microphone distances of 0.16 to 2.56 meters. We…

    Submitted 17 June, 2019; originally announced June 2019.

  28. arXiv:1906.01604  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    KERMIT: Generative Insertion-Based Modeling for Sequences

    Authors: William Chan, Nikita Kitaev, Kelvin Guu, Mitchell Stern, Jakob Uszkoreit

    Abstract: We present KERMIT, a simple insertion-based approach to generative modeling for sequences and sequence pairs. KERMIT models the joint distribution and its decompositions (i.e., marginals and conditionals) using a single neural network and, unlike much prior work, does not rely on a prespecified factorization of the data distribution. During training, one can feed KERMIT paired data $(x, y)$ to lea…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: William Chan, Nikita Kitaev, Kelvin Guu, and Mitchell Stern contributed equally

  29. Auditable Blockchain Randomization Tool

    Authors: Olivia Saa, Julio Michael Stern

    Abstract: Randomization is an integral part of well-designed statistical trials, and is also a required procedure in legal systems, see Marcondes et al. (2019). This paper presents an easy to implement randomization protocol that assures, in a formal mathematical setting, a statistically sound, computationally efficient, cryptographically secure, traceable and auditable randomization procedure that is also r…

    Submitted 20 April, 2019; originally announced April 2019.

    Comments: 7 pages

    Journal ref: MaxEnt 2019 - Proceedings of the 39th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Garching, Germany, 30 June - 5 July 2019

  30. arXiv:1902.03249  [pdf, other]

    cs.CL cs.LG stat.ML

    Insertion Transformer: Flexible Sequence Generation via Insertion Operations

    Authors: Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit

    Abstract: We present the Insertion Transformer, an iterative, partially autoregressive model for sequence generation based on insertion operations. Unlike typical autoregressive models which rely on a fixed, often left-to-right ordering of the output, our approach accommodates arbitrary orderings by allowing for tokens to be inserted anywhere in the sequence during decoding. This flexibility confers a numbe…

    Submitted 8 February, 2019; originally announced February 2019.

  31. arXiv:1811.03115  [pdf, other]

    cs.LG cs.CL stat.ML

    Blockwise Parallel Decoding for Deep Autoregressive Models

    Authors: Mitchell Stern, Noam Shazeer, Jakob Uszkoreit

    Abstract: Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. While common architecture classes such as recurrent, convolutional, and self-attention networks make different trade-offs between the amount of computation needed per layer and the length of the critical path at training time, generation still remains an inherent…

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: NIPS 2018

  32. arXiv:1805.07569  [pdf, other]

    cs.NE cs.LG q-bio.NC stat.ML

    Reliable counting of weakly labeled concepts by a single spiking neuron model

    Authors: Hannes Rapp, Martin Paul Nawrot, Merav Stern

    Abstract: Making an informed, correct and quick decision can be life-saving. It's crucial for animals during an escape behaviour or for autonomous cars during driving. The decision can be complex and may involve an assessment of the amount of threats present and the nature of each threat. Thus, we should expect early sensory processing to supply classification information fast and accurately, even before re…

    Submitted 16 November, 2018; v1 submitted 19 May, 2018; originally announced May 2018.

  33. arXiv:1804.07853  [pdf, other]

    cs.CL

    What's Going On in Neural Constituency Parsers? An Analysis

    Authors: David Gaddy, Mitchell Stern, Dan Klein

    Abstract: A number of differences have emerged between modern and classic approaches to constituency parsing in recent years, with structural components like grammars and feature-rich lexicons becoming less central while recurrent neural network representations rise in popularity. The goal of this work is to analyze the extent to which information provided directly by the model structure in classical system…

    Submitted 20 April, 2018; originally announced April 2018.

    Comments: NAACL 2018

  34. arXiv:1804.04235  [pdf, other]

    cs.LG cs.AI stat.ML

    Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

    Authors: Noam Shazeer, Mitchell Stern

    Abstract: In several recently proposed stochastic optimization methods (e.g. RMSProp, Adam, Adadelta), parameter updates are scaled by the inverse square roots of exponential moving averages of squared past gradients. Maintaining these per-parameter second-moment estimators requires memory equal to the number of parameters. For the case of neural network weight matrices, we propose maintaining only the per-…

    Submitted 11 April, 2018; originally announced April 2018.

  35. arXiv:1711.02838  [pdf, other]

    cs.LG math.OC stat.ML

    Stochastic Cubic Regularization for Fast Nonconvex Optimization

    Authors: Nilesh Tripuraneni, Mitchell Stern, Chi Jin, Jeffrey Regier, Michael I. Jordan

    Abstract: This paper proposes a stochastic variant of a classic algorithm---the cubic-regularized Newton method [Nesterov and Polyak 2006]. The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\tilde{\mathcal{O}}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations. The latter can be computed…

    Submitted 5 December, 2017; v1 submitted 8 November, 2017; originally announced November 2017.

    Comments: The first two authors contributed equally

  36. arXiv:1707.08976  [pdf, ps, other]

    cs.CL

    Effective Inference for Generative Neural Parsing

    Authors: Mitchell Stern, Daniel Fried, Dan Klein

    Abstract: Generative neural models have recently achieved state-of-the-art results for constituency parsing. However, without a feasible search procedure, their use has so far been limited to reranking the output of external parsers in which decoding is more tractable. We describe an alternative to the conventional action-level beam search used for discriminative neural models that enables us to decode dire…

    Submitted 27 July, 2017; originally announced July 2017.

    Comments: EMNLP 2017

  37. arXiv:1707.03058  [pdf, ps, other]

    cs.CL

    Improving Neural Parsing by Disentangling Model Combination and Reranking Effects

    Authors: Daniel Fried, Mitchell Stern, Dan Klein

    Abstract: Recent work has proposed several generative neural models for constituency parsing that achieve state-of-the-art results. Since direct search in these generative models is difficult, they have primarily been used to rescore candidate outputs from base parsers in which decoding is more straightforward. We first present an algorithm for direct search in these generative models. We then demonstrate t…

    Submitted 10 July, 2017; originally announced July 2017.

    Comments: ACL 2017. The first two authors contributed equally

  38. arXiv:1707.01164  [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    Kernel Feature Selection via Conditional Covariance Minimization

    Authors: Jianbo Chen, Mitchell Stern, Martin J. Wainwright, Michael I. Jordan

    Abstract: We propose a method for feature selection that employs kernel-based measures of independence to find a subset of covariates that is maximally predictive of the response. Building on past work in kernel dimension reduction, we show how to perform feature selection via a constrained optimization problem involving the trace of the conditional covariance operator. We prove various consistency results…

    Submitted 20 October, 2018; v1 submitted 4 July, 2017; originally announced July 2017.

    Comments: The first two authors contributed equally

  39. arXiv:1705.09580  [pdf, other]

    stat.ML cs.AI

    Risk-Sensitive Cooperative Games for Human-Machine Systems

    Authors: Agostino Capponi, Reza Ghanadan, Matt Stern

    Abstract: Autonomous systems can substantially enhance a human's efficiency and effectiveness in complex environments. Machines, however, are often unable to observe the preferences of the humans that they serve. Despite the fact that the human's and machine's objectives are aligned, asymmetric information, along with heterogeneous sensitivities to risk by the human and machine, make their joint optimizatio…

    Submitted 26 May, 2017; originally announced May 2017.

    Comments: 15 pages, 10 figures

    MSC Class: 97R40

  40. arXiv:1705.08292  [pdf, other]

    stat.ML cs.LG

    The Marginal Value of Adaptive Gradient Methods in Machine Learning

    Authors: Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht

    Abstract: Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We show that for simple overparameterized problems, adaptive methods often find drastically different solutions than gradient descent (GD) or stochastic gradient desc…

    Submitted 21 May, 2018; v1 submitted 23 May, 2017; originally announced May 2017.

  41. arXiv:1705.03919  [pdf, ps, other]

    cs.CL

    A Minimal Span-Based Neural Constituency Parser

    Authors: Mitchell Stern, Jacob Andreas, Dan Klein

    Abstract: In this work, we present a minimal neural model for constituency parsing based on independent scoring of labels and spans. We show that this model is not only compatible with classical dynamic programming techniques, but also admits a novel greedy top-down inference algorithm based on recursive partitioning of the input. We demonstrate empirically that both prediction schemes are competitive with…

    Submitted 10 May, 2017; originally announced May 2017.

    Comments: To appear in ACL 2017

  42. arXiv:1704.07535  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Abstract Syntax Networks for Code Generation and Semantic Parsing

    Authors: Maxim Rabinovich, Mitchell Stern, Dan Klein

    Abstract: Tasks like code generation and semantic parsing require mapping unstructured (or partially structured) inputs to well-formed, executable outputs. We introduce abstract syntax networks, a modeling framework for these problems. The outputs are represented as abstract syntax trees (ASTs) and constructed by a decoder with a dynamically-determined modular structure paralleling the structure of the outp…

    Submitted 25 April, 2017; originally announced April 2017.

    Comments: ACL 2017. MR and MS contributed equally