Showing 1–50 of 97 results for author: Dyer, C

Searching in archive cs.
  1. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2301.09412  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC

    Deep Learning Mental Health Dialogue System

    Authors: Lennart Brocki, George C. Dyer, Anna Gładka, Neo Christopher Chung

    Abstract: Mental health counseling remains a major challenge in modern society due to cost, stigma, fear, and unavailability. We posit that generative artificial intelligence (AI) models designed for mental health counseling could help improve outcomes by lowering barriers to access. To this end, we have developed a deep learning (DL) dialogue system called Serena. The system consists of a core generative m…

    Submitted 23 January, 2023; originally announced January 2023.

    Journal ref: 6th International Workshop on Dialog Systems (IWDS); 10th IEEE International Conference on Big Data and Smart Computing (2022 BigComp)

  3. arXiv:2211.15089  [pdf, other]

    cs.CL cs.LG

    Continuous diffusion for categorical data

    Authors: Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H. Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan, Curtis Hawthorne, Rémi Leblond, Will Grathwohl, Jonas Adler

    Abstract: Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous natur…

    Submitted 15 December, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: 26 pages, 8 figures; corrections and additional information about hyperparameters

  4. arXiv:2207.08583  [pdf, other]

    cs.CL

    MAD for Robust Reinforcement Learning in Machine Translation

    Authors: Domenic Donato, Lei Yu, Wang Ling, Chris Dyer

    Abstract: We introduce a new distributed policy gradient algorithm and show that it outperforms existing reward-aware training procedures such as REINFORCE, minimum risk training (MRT) and proximal policy optimization (PPO) in terms of training stability and generalization performance when optimizing machine translation models. Our algorithm, which we call MAD (on account of using the mean absolute deviatio…

    Submitted 18 July, 2022; originally announced July 2022.

  5. Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

    Authors: Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer

    Abstract: We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentenc…

    Submitted 6 December, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: 17 pages, 5 figures, 2 tables and 1 algorithm. To appear in TACL, to be presented at EMNLP 2022

  6. arXiv:2202.11444  [pdf, other]

    cs.CL cs.AI cs.LG

    Enabling arbitrary translation objectives with Adaptive Tree Search

    Authors: Wang Ling, Wojciech Stokowiec, Domenic Donato, Laurent Sartran, Lei Yu, Austin Matthews, Chris Dyer

    Abstract: We introduce an adaptive tree search algorithm, that can find high-scoring outputs under translation models that make no assumptions about the form or structure of the search objective. This algorithm -- a deterministic variant of Monte Carlo tree search -- enables the exploration of new kinds of models that are unencumbered by constraints imposed to make decoding tractable, such as autoregressivi…

    Submitted 23 February, 2022; originally announced February 2022.

    Comments: 17 pages, 3 figures

  7. arXiv:2112.11446  [pdf, other]

    cs.CL cs.AI

    Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    Authors: Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, et al. (55 additional authors not shown)

    Abstract: Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gop…

    Submitted 21 January, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: 120 pages

  8. arXiv:2106.05346  [pdf, other]

    cs.CL cs.AI cs.IR

    End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

    Authors: Devendra Singh Sachan, Siva Reddy, William Hamilton, Chris Dyer, Dani Yogatama

    Abstract: We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectat…

    Submitted 4 December, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 camera-ready version

  9. Diverse Pretrained Context Encodings Improve Document Translation

    Authors: Domenic Donato, Lei Yu, Chris Dyer

    Abstract: We propose a new architecture for adapting a sentence-level sequence-to-sequence transformer by incorporating multiple pretrained document context signals and assess the impact on translation performance of (1) different pretraining approaches for generating these signals, (2) the quantity of parallel data for which document context is available, and (3) conditioning on source, target, or source a…

    Submitted 7 June, 2021; originally announced June 2021.

    Journal ref: ACL 2021 (1299-1311)

  10. arXiv:2106.02736  [pdf, other]

    cs.LG cs.CL

    Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis--Hastings

    Authors: Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick

    Abstract: While recent work has shown that scores from models trained by the ubiquitous masked language modeling (MLM) objective effectively discriminate probable from improbable sequences, it is still an open question if these MLMs specify a principled probability distribution over the space of possible sequences. In this paper, we interpret MLMs as energy-based sequence models and propose two energy param…

    Submitted 15 March, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: ICLR 2022 - camera ready

  11. arXiv:2005.13482  [pdf, other]

    cs.CL

    Syntactic Structure Distillation Pretraining For Bidirectional Encoders

    Authors: Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

    Abstract: Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence. Given this success, it remains an open question whether scalable learners like BERT can become fully proficient in the syntax of natural language by virtue of data scale alone, or whether they s…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: 17 pages, 6 tables, 2 figures. AK and LK contributed equally

  12. arXiv:2005.03684  [pdf, other]

    cs.CL cs.CV

    Learning to Segment Actions from Observation and Narration

    Authors: Daniel Fried, Jean-Baptiste Alayrac, Phil Blunsom, Chris Dyer, Stephen Clark, Aida Nematzadeh

    Abstract: We apply a generative segmental model of task structure, guided by narration, to action segmentation in video. We focus on unsupervised and weakly-supervised settings where no action labels are known during training. Despite its simplicity, our model performs competitively with previous work on a dataset of naturalistic instructional videos. Our model allows us to vary the sources of supervision u…

    Submitted 11 August, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  13. arXiv:2005.01646  [pdf, other]

    cs.LG cs.CL

    A Probabilistic Generative Model for Typographical Analysis of Early Modern Printing

    Authors: Kartik Goyal, Chris Dyer, Christopher Warren, Max G'Sell, Taylor Berg-Kirkpatrick

    Abstract: We propose a deep and interpretable probabilistic generative model to analyze glyph shapes in printed Early Modern documents. We focus on clustering extracted glyph images into underlying templates in the presence of multiple confounding sources of variance. Our approach introduces a neural editor model that first generates well-understood printing phenomena like spatial perturbations from templat…

    Submitted 4 May, 2020; originally announced May 2020.

    Comments: To appear at ACL 2020

  14. arXiv:2001.11128  [pdf, other]

    cs.CL cs.LG eess.AS

    Learning Robust and Multilingual Speech Representations

    Authors: Kazuya Kawakami, Luyu Wang, Chris Dyer, Phil Blunsom, Aaron van den Oord

    Abstract: Unsupervised speech representation learning has shown remarkable success at finding representations that correlate with phonetic structures and improve downstream speech recognition performance. However, most research has been focused on evaluating the representations in terms of their ability to improve the performance of speech recognition systems on read English (e.g. Wall Street Journal and Li…

    Submitted 29 January, 2020; originally announced January 2020.

  15. arXiv:2001.08279   

    cs.CL cs.AI cs.LG

    Transition-Based Dependency Parsing using Perceptron Learner

    Authors: Rahul Radhakrishnan Iyer, Miguel Ballesteros, Chris Dyer, Robert Frederking

    Abstract: Syntactic parsing using dependency structures has become a standard technique in natural language processing with many different parsing models, in particular data-driven models that can be trained on syntactically annotated corpora. In this paper, we tackle transition-based dependency parsing using a Perceptron Learner. Our proposed model, which adds more relevant features to the Perceptron Learn…

    Submitted 28 January, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

    Comments: This was part of an assignment at my graduate course at LTI. This does not offer any major novelties

  16. arXiv:1910.00553  [pdf, other]

    cs.CL cs.LG

    Better Document-Level Machine Translation with Bayes' Rule

    Authors: Lei Yu, Laurent Sartran, Wojciech Stokowiec, Wang Ling, Lingpeng Kong, Phil Blunsom, Chris Dyer

    Abstract: We show that Bayes' rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents---a compelling benefit as parallel documents are not always available. In our formulation, the posterior probability of a candidate translation is the product of the unconditional (prior) probability of the candidate output doc…

    Submitted 2 July, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted by TACL

  17. arXiv:1909.09428  [pdf, other]

    cs.CL cs.LG

    A Critical Analysis of Biased Parsers in Unsupervised Parsing

    Authors: Chris Dyer, Gábor Melis, Phil Blunsom

    Abstract: A series of recent papers has used a parsing algorithm due to Shen et al. (2018) to recover phrase-structure trees based on proxies for "syntactic depth." These proxy depths are obtained from the representations learned by recurrent language models augmented with mechanisms that encourage the (unsupervised) discovery of hierarchical structure latent in natural language sentences. Using the same pa…

    Submitted 20 September, 2019; originally announced September 2019.

  18. arXiv:1909.01492  [pdf, other]

    cs.CL cs.CR cs.LG stat.ML

    Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation

    Authors: Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli

    Abstract: Neural networks are part of many contemporary NLP systems, yet their empirical successes come at the price of vulnerability to adversarial attacks. Previous work has used adversarial training and data augmentation to partially mitigate such brittleness, but these are unlikely to find worst-case adversaries due to the complexity of the search space arising from discrete text perturbations. In this…

    Submitted 20 December, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019

  19. arXiv:1908.11047  [pdf, other]

    cs.CL

    Shallow Syntax in Deep Water

    Authors: Swabha Swayamdipta, Matthew Peters, Brendan Roof, Chris Dyer, Noah A. Smith

    Abstract: Shallow syntax provides an approximation of phrase-syntactic structure of sentences; it can be produced with high accuracy, and is computationally cheap to obtain. We investigate the role of shallow syntax-aware representations for NLP tasks using two techniques. First, we enhance the ELMo architecture to allow pretraining on predicted shallow syntactic parses, instead of just raw text, so that co…

    Submitted 29 August, 2019; originally announced August 2019.

  20. arXiv:1906.10225  [pdf, other]

    cs.CL stat.ML

    Compound Probabilistic Context-Free Grammars for Grammar Induction

    Authors: Yoon Kim, Chris Dyer, Alexander M. Rush

    Abstract: We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar. In contrast to traditional formulations which learn a single stochastic grammar, our grammar's rule probabilities are modulated by a per-sentence continuous latent variable, which induces marginal dependencies beyond the traditional context-free assum…

    Submitted 29 March, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

    Comments: ACL 2019

  21. arXiv:1906.06438  [pdf, other]

    cs.CL cs.LG

    Scalable Syntax-Aware Language Models Using Knowledge Distillation

    Authors: Adhiguna Kuncoro, Chris Dyer, Laura Rimell, Stephen Clark, Phil Blunsom

    Abstract: Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models. However, their computational complexity renders scaling difficult, and it remains an open question whether structural biases are still necessary when sequential models have access to ever larger amounts of tra…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: ACL 2019

  22. arXiv:1904.06834  [pdf, other]

    cs.LG cs.CL stat.ML

    An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search

    Authors: Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick

    Abstract: Globally normalized neural sequence models are considered superior to their locally normalized equivalents because they may ameliorate the effects of label bias. However, when considering high-capacity neural parametrizations that condition on the whole input sequence, both model classes are theoretically equivalent in terms of the distributions they are capable of representing. Thus, the practica…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: Long paper at NAACL 2019

  23. arXiv:1904.03746  [pdf, other]

    cs.CL stat.ML

    Unsupervised Recurrent Neural Network Grammars

    Authors: Yoon Kim, Alexander M. Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis

    Abstract: Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNG…

    Submitted 4 August, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

    Comments: NAACL 2019

  24. arXiv:1901.11373  [pdf, other]

    cs.LG cs.CL stat.ML

    Learning and Evaluating General Linguistic Intelligence

    Authors: Dani Yogatama, Cyprien de Masson d'Autume, Jerome Connor, Tomas Kocisky, Mike Chrzanowski, Lingpeng Kong, Angeliki Lazaridou, Wang Ling, Lei Yu, Chris Dyer, Phil Blunsom

    Abstract: We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation to evaluate them against these criteria through a series of ex…

    Submitted 31 January, 2019; originally announced January 2019.

  25. arXiv:1811.10475  [pdf, other]

    cs.CL cs.AI cs.LG

    Sentence Encoding with Tree-constrained Relation Networks

    Authors: Lei Yu, Cyprien de Masson d'Autume, Chris Dyer, Phil Blunsom, Lingpeng Kong, Wang Ling

    Abstract: The meaning of a sentence is a function of the relations that hold between its words. We instantiate this relational view of semantics in a series of neural models based on variants of relation networks (RNs) which represent a set of objects (for us, words forming a sentence) in terms of representations of pairs of objects. We propose two extensions to the basic RN model for natural language. Firs…

    Submitted 26 November, 2018; originally announced November 2018.

    Comments: 12 pages

  26. arXiv:1811.09353  [pdf, other]

    cs.CL

    Learning to Discover, Ground and Use Words with Segmental Neural Language Models

    Authors: Kazuya Kawakami, Chris Dyer, Phil Blunsom

    Abstract: We propose a segmental neural language model that combines the generalization power of neural networks with the ability to discover word-like units that are latent in unsegmented character sequences. In contrast to previous segmentation models that treat word segmentation as an isolated task, our model unifies word discovery, learning how words fit together to form sentences, and, by conditioning…

    Submitted 18 June, 2019; v1 submitted 22 November, 2018; originally announced November 2018.

  27. arXiv:1808.10485  [pdf, other]

    cs.CL

    Syntactic Scaffolds for Semantic Structures

    Authors: Swabha Swayamdipta, Sam Thomson, Kenton Lee, Luke Zettlemoyer, Chris Dyer, Noah A. Smith

    Abstract: We introduce the syntactic scaffold, an approach to incorporating syntactic information into semantic tasks. Syntactic scaffolds avoid expensive syntactic processing at runtime, only making use of a treebank during training, through a multitask objective. We improve over strong baselines on PropBank semantics, frame semantics, and coreference resolution, achieving competitive performance on all th…

    Submitted 30 August, 2018; originally announced August 2018.

    Comments: Accepted at EMNLP 2018

  28. arXiv:1808.00508  [pdf, other]

    cs.NE

    Neural Arithmetic Logic Units

    Authors: Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, Phil Blunsom

    Abstract: Neural networks can learn to represent and manipulate numerical information, but they seldom generalize well outside of the range of numerical values encountered during training. To encourage more systematic numerical extrapolation, we propose an architecture that represents numerical quantities as linear activations which are manipulated using primitive arithmetic operators, controlled by learned…

    Submitted 1 August, 2018; originally announced August 2018.

  29. arXiv:1806.04127  [pdf, other]

    cs.CL

    Finding Syntax in Human Encephalography with Beam Search

    Authors: John Hale, Chris Dyer, Adhiguna Kuncoro, Jonathan R. Brennan

    Abstract: Recurrent neural network grammars (RNNGs) are generative models of (tree,string) pairs that rely on neural networks to evaluate derivational choices. Parsing with them using beam search yields a variety of incremental complexity metrics such as word surprisal and parser action count. When used as regressors against human electrophysiological responses to naturalistic text, they derive two amplitud…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: ACL2018

  30. arXiv:1806.01261  [pdf, other]

    cs.LG cs.AI stat.ML

    Relational inductive biases, deep learning, and graph networks

    Authors: Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, et al. (2 additional authors not shown)

    Abstract: Artificial intelligence (AI) has undergone a renaissance recently, making major progress in key domains such as vision, language, control, and decision-making. This has been due, in part, to cheap data and cheap compute resources, which have fit the natural strengths of deep learning. However, many defining characteristics of human intelligence, which developed under much different pressures, rema…

    Submitted 17 October, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

  31. arXiv:1805.11749  [pdf, other]

    cs.CL

    Unsupervised Text Style Transfer using Language Models as Discriminators

    Authors: Zichao Yang, Zhiting Hu, Chris Dyer, Eric P. Xing, Taylor Berg-Kirkpatrick

    Abstract: Binary classifiers are often employed as discriminators in GAN-based unsupervised style transfer systems to ensure that transferred sentences are similar to sentences in the target domain. One difficulty with this approach is that the error signal provided by the discriminator can be unstable and is sometimes insufficient to train the generator to produce fluent language. In this paper, we propose…

    Submitted 29 January, 2019; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: NeurIPS camera ready

  32. arXiv:1805.09208  [pdf, other]

    stat.ML cs.CL cs.LG

    Pushing the bounds of dropout

    Authors: Gábor Melis, Charles Blundell, Tomáš Kočiský, Karl Moritz Hermann, Chris Dyer, Phil Blunsom

    Abstract: We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that…

    Submitted 27 September, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

  33. arXiv:1803.10049  [pdf, other]

    cs.LG stat.ML

    Fast Parametric Learning with Activation Memorization

    Authors: Jack W Rae, Chris Dyer, Peter Dayan, Timothy P Lillicrap

    Abstract: Neural networks trained with backpropagation often struggle to identify classes that have been observed a small number of times. In applications where most class labels are rare, such as language modelling, this can become a performance bottleneck. One potential remedy is to augment the network with a fast-learning non-parametric model which stores recent activations and class labels into an exter…

    Submitted 27 March, 2018; originally announced March 2018.

  34. arXiv:1803.03453  [pdf, other]

    cs.NE

    The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities

    Authors: Joel Lehman, Jeff Clune, Dusan Misevic, Christoph Adami, Lee Altenberg, Julie Beaulieu, Peter J. Bentley, Samuel Bernard, Guillaume Beslon, David M. Bryson, Patryk Chrabaszcz, Nick Cheney, Antoine Cully, Stephane Doncieux, Fred C. Dyer, Kai Olav Ellefsen, Robert Feldt, Stephan Fischer, Stephanie Forrest, Antoine Frénoy, Christian Gagné, Leni Le Goff, Laura M. Grabowski, Babak Hodjat, Frank Hutter, et al. (28 additional authors not shown)

    Abstract: Biological evolution provides a creative fount of complex and subtle adaptations, often surprising the scientists who discover them. However, because evolution is an algorithmic process that transcends the substrate in which it occurs, evolution's creativity is not limited to nature. Indeed, many researchers in the field of digital evolution have observed their evolving algorithms and organisms su…

    Submitted 21 November, 2019; v1 submitted 9 March, 2018; originally announced March 2018.

  35. arXiv:1803.03324  [pdf, other]

    cs.LG stat.ML

    Learning Deep Generative Models of Graphs

    Authors: Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, Peter Battaglia

    Abstract: Graphs are fundamental data structures which concisely capture the relational structure in many important real-world domains, such as knowledge graphs, physical and social interactions, language, and chemistry. Here we introduce a powerful new approach for learning generative models over graphs, which can capture both their structure and attributes. Our approach uses graph neural networks to expre…

    Submitted 8 March, 2018; originally announced March 2018.

    Comments: 21 pages

  36. arXiv:1801.10293  [pdf, other]

    cs.CL

    Paraphrase-Supervised Models of Compositionality

    Authors: Avneesh Saluja, Chris Dyer, Jean-David Ruvini

    Abstract: Compositional vector space models of meaning promise new solutions to stubborn language understanding problems. This paper makes two contributions toward this end: (i) it uses automatically-extracted paraphrase examples as a source of supervision for training compositional models, replacing previous work which relied on manual annotations used for the same purpose, and (ii) develops a context-awar…

    Submitted 30 January, 2018; originally announced January 2018.

    Comments: This paper was originally submitted for review at NAACL 2015 and ACL 2015. This version maintains the original author affiliation "as-is" (as of when the work was done)

  37. arXiv:1712.07040  [pdf, other]

    cs.CL cs.AI cs.NE

    The NarrativeQA Reading Comprehension Challenge

    Authors: Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette

    Abstract: Reading comprehension (RC)---in contrast to information retrieval---requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecti…

    Submitted 19 December, 2017; originally announced December 2017.

  38. End-to-End Neural Segmental Models for Speech Recognition

    Authors: Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals

    Abstract: Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time. Neural segmental models are segmental models that use neural network-based weight functions. Neural segmental models have achieved competitive results for speech recognition, and their end-to-end training has bee…

    Submitted 15 August, 2017; v1 submitted 1 August, 2017; originally announced August 2017.

  39. arXiv:1708.00111  [pdf, other]

    cs.LG cs.CL cs.NE

    A Continuous Relaxation of Beam Search for End-to-end Training of Neural Sequence Models

    Authors: Kartik Goyal, Graham Neubig, Chris Dyer, Taylor Berg-Kirkpatrick

    Abstract: Beam search is a desirable choice of test-time decoding algorithm for neural sequence models because it potentially avoids search errors made by simpler greedy methods. However, typical cross entropy training procedures for these models do not directly consider the behaviour of the final decoding method. As a result, for cross-entropy trained models, beam decoding can sometimes yield reduced test…

    Submitted 6 October, 2017; v1 submitted 31 July, 2017; originally announced August 2017.

    Comments: Updated for clarity and notational consistency

    ACM Class: I.2.7; I.2.6

  40. arXiv:1707.05589  [pdf, other]

    cs.CL

    On the State of the Art of Evaluation in Neural Language Models

    Authors: Gábor Melis, Chris Dyer, Phil Blunsom

    Abstract: Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods w…

    Submitted 20 November, 2017; v1 submitted 18 July, 2017; originally announced July 2017.

  41. arXiv:1706.09528  [pdf, other]

    cs.CL

    Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a Syntactic Scaffold

    Authors: Swabha Swayamdipta, Sam Thomson, Chris Dyer, Noah A. Smith

    Abstract: We present a new, efficient frame-semantic parser that labels semantic arguments to FrameNet predicates. Built using an extension to the segmental RNN that emphasizes recall, our basic system achieves competitive performance without any calls to a syntactic parser. We then introduce a method that uses phrase-syntactic annotations from the Penn Treebank during training only, through a multitask obj…

    Submitted 28 June, 2017; originally announced June 2017.

  42. arXiv:1706.02596  [pdf, other]

    cs.CL cs.AI cs.NE

    Dynamic Integration of Background Knowledge in Neural NLU Systems

    Authors: Dirk Weissenborn, Tomáš Kočiský, Chris Dyer

    Abstract: Common-sense and background knowledge is required to understand natural language, but in most neural natural language understanding (NLU) systems, this knowledge must be acquired from training corpora during learning, and then it is static at test time. We introduce a new architecture for the dynamic integration of explicit background knowledge in NLU models. A general-purpose reading module reads…

    Submitted 21 August, 2018; v1 submitted 8 June, 2017; originally announced June 2017.

  43. arXiv:1705.07860  [pdf, other]

    cs.LG cs.CL stat.ML

    On-the-fly Operation Batching in Dynamic Computation Graphs

    Authors: Graham Neubig, Yoav Goldberg, Chris Dyer

    Abstract: Dynamic neural network toolkits such as PyTorch, DyNet, and Chainer offer more flexibility for implementing models that cope with data of varying dimensions and structure, relative to toolkits that operate on statically declared computations (e.g., TensorFlow, CNTK, and Theano). However, existing toolkits - both static and dynamic - require that the developer organize the computations into the bat…

    Submitted 22 May, 2017; originally announced May 2017.

  44. arXiv:1705.04146  [pdf, other]

    cs.AI cs.CL cs.LG

    Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems

    Authors: Wang Ling, Dani Yogatama, Chris Dyer, Phil Blunsom

    Abstract: Solving algebraic word problems requires executing a series of arithmetic operations---a program---to obtain a final answer. However, since programs can be arbitrarily complicated, inducing them directly from question-answer pairs is a formidable challenge. To make this task more feasible, we solve these problems by generating answer rationales, sequences of natural language and human-readable mat…

    Submitted 23 October, 2017; v1 submitted 11 May, 2017; originally announced May 2017.

  45. arXiv:1705.02925  [pdf, other]

    cs.CL

    Ontology-Aware Token Embeddings for Prepositional Phrase Attachment

    Authors: Pradeep Dasigi, Waleed Ammar, Chris Dyer, Eduard Hovy

    Abstract: Type-level word embeddings use the same set of parameters to represent all instances of a word regardless of its context, ignoring the inherent lexical ambiguity in language. Instead, we embed semantic concepts (or synsets) as defined in WordNet and represent a word token in a particular context by estimating a distribution over relevant semantic concepts. We use the new, context-sensitive embeddi…

    Submitted 8 May, 2017; originally announced May 2017.

    Comments: ACL 2017

  46. arXiv:1704.06986  [pdf, ps, other]

    cs.CL

    Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling

    Authors: Kazuya Kawakami, Chris Dyer, Phil Blunsom

    Abstract: Fixed-vocabulary language models fail to account for one of the most characteristic statistical facts of natural language: the frequent creation and reuse of new word types. Although character-level language models offer a partial solution in that they can create word types not attested in the training corpus, they do not capture the "bursty" distribution of such words. In this paper, we augment a…

    Submitted 23 April, 2017; originally announced April 2017.

    Comments: ACL 2017

  47. arXiv:1704.06970  [pdf, other]

    cs.CL cs.LG cs.NE

    Differentiable Scheduled Sampling for Credit Assignment

    Authors: Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick

    Abstract: We demonstrate that a continuous relaxation of the argmax operation can be used to create a differentiable approximation to greedy decoding for sequence-to-sequence (seq2seq) models. By incorporating this approximation into the scheduled sampling training procedure (Bengio et al., 2015)--a well-known technique for correcting exposure bias--we introduce a new training objective that is continuous a…

    Submitted 23 April, 2017; originally announced April 2017.

    Comments: Accepted at ACL2017 (http://bit.ly/2oj1muX)

    ACM Class: I.2.7; I.2.6

  48. arXiv:1703.01898  [pdf, other]

    stat.ML cs.CL cs.LG

    Generative and Discriminative Text Classification with Recurrent Neural Networks

    Authors: Dani Yogatama, Chris Dyer, Wang Ling, Phil Blunsom

    Abstract: We empirically characterize the performance of discriminative and generative LSTM models for text classification. We find that although RNN-based generative models are more powerful than their bag-of-words ancestors (e.g., they account for conditional dependencies across words in a document), they have higher asymptotic error rates than discriminatively trained RNN models. However we also find tha…

    Submitted 25 May, 2017; v1 submitted 6 March, 2017; originally announced March 2017.

  49. arXiv:1702.06378  [pdf, other]

    cs.CL

    Multitask Learning with CTC and Segmental CRF for Speech Recognition

    Authors: Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith

    Abstract: Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both models define a transcription probability by marginalizing decisions about latent segmentation alternatives to derive a sequence probability: the former uses a globally normalized joint model of segment labels…

    Submitted 5 June, 2017; v1 submitted 21 February, 2017; originally announced February 2017.

    Comments: 5 pages, 2 figures, camera ready version at Interspeech 2017

  50. arXiv:1701.03980  [pdf, other]

    stat.ML cs.CL cs.MS

    DyNet: The Dynamic Neural Network Toolkit

    Authors: Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

    Abstract: We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its deriva…

    Submitted 14 January, 2017; originally announced January 2017.

    Comments: 33 pages