Showing 1–25 of 25 results for author: Yamins, D L K

Searching in archive cs.

  1. arXiv:2406.10447

    cs.CV

    The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences

    Authors: Bria Long, Violet Xiang, Stefan Stojanov, Robert Z. Sparks, Zi Yin, Grace E. Keene, Alvin W. M. Tan, Steven Y. Feng, Chengxu Zhuang, Virginia A. Marchman, Daniel L. K. Yamins, Michael C. Frank

    Abstract: Human children far exceed modern machine learning algorithms in their sample efficiency, achieving high performance in key domains with much less data than current models. This "data gap" is a key challenge both for building intelligent artificial systems and for understanding human development. Egocentric video capturing children's experience -- their "training data" -- is a key ingredient fo…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures, 4 tables and SI. Submitted to NeurIPS Datasets and Benchmarks

  2. arXiv:2312.06721

    cs.CV

    Understanding Physical Dynamics with Counterfactual World Modeling

    Authors: Rahul Venkatesh, Honglin Chen, Kevin Feigelis, Daniel M. Bear, Khaled Jedoui, Klemen Kotar, Felix Binder, Wanhee Lee, Sherry Liu, Kevin A. Smith, Judith E. Fan, Daniel L. K. Yamins

    Abstract: The ability to understand physical dynamics is critical for agents to act in the world. Here, we use Counterfactual World Modeling (CWM) to extract vision structures for dynamics understanding. CWM uses a temporally-factored masking policy for masked prediction of video data without annotations. This policy enables highly effective "counterfactual prompting" of the predictor, allowing a spectrum o…

    Submitted 22 July, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: ECCV 2024. Project page at: https://neuroailab.github.io/cwm-physics/

  3. arXiv:2311.00750

    cs.CV cs.AI cs.LG

    Are These the Same Apple? Comparing Images Based on Object Intrinsics

    Authors: Klemen Kotar, Stephen Tian, Hong-Xing Yu, Daniel L. K. Yamins, Jiajun Wu

    Abstract: The human visual system can effortlessly recognize an object under different extrinsic factors such as lighting, object poses, and background, yet current computer vision systems often struggle with these variations. An important step to understanding and improving artificial vision systems is to measure image similarity purely based on intrinsic object properties that define object identity. This…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: First two authors contributed equally. Accepted at NeurIPS Datasets and Benchmarks Track 2023

  4. arXiv:2306.01828

    cs.CV cs.AI

    Unifying (Machine) Vision via Counterfactual World Modeling

    Authors: Daniel M. Bear, Kevin Feigelis, Honglin Chen, Wanhee Lee, Rahul Venkatesh, Klemen Kotar, Alex Durango, Daniel L. K. Yamins

    Abstract: Leading approaches in machine vision employ different architectures for different tasks, trained on costly task-specific labeled datasets. This complexity has held back progress in areas, such as robotics, where robust task-general perception remains a bottleneck. In contrast, "foundation models" of natural language have shown how large pre-trained neural networks can provide zero-shot solutions t…

    Submitted 2 June, 2023; originally announced June 2023.

    ACM Class: I.2.10; I.4.8

  5. arXiv:2305.13452

    cs.AI cs.LG

    Measuring and Modeling Physical Intrinsic Motivation

    Authors: Julio Martinez, Felix Binder, Haoliang Wang, Nick Haber, Judith Fan, Daniel L. K. Yamins

    Abstract: Humans are interactive agents driven to seek out situations with interesting physical dynamics. Here we formalize the functional form of physical intrinsic motivation. We first collect ratings of how interesting humans find a variety of physics scenarios. We then model human interestingness responses by implementing various hypotheses of intrinsic motivation including models that rely on simple sc…

    Submitted 7 August, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 6 pages, 5 figures, accepted to CogSci 2023 with full paper publication in the proceedings

  6. arXiv:2305.13396

    cs.LG cs.AI

    Developmental Curiosity and Social Interaction in Virtual Agents

    Authors: Chris Doyle, Sarah Shader, Michelle Lau, Megumi Sano, Daniel L. K. Yamins, Nick Haber

    Abstract: Infants explore their complex physical and social environment in an organized way. To gain insight into what intrinsic motivations may help structure this exploration, we create a virtual infant agent and place it in a developmentally-inspired 3D environment with no external rewards. The environment has a virtual caregiver agent with the capability to interact contingently with the infant agent in…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: 6 pages, 5 figures, 2 tables; accepted to CogSci 2023 with full paper publication in the proceedings

  7. arXiv:2205.08515

    cs.CV cs.AI

    Unsupervised Segmentation in Real-World Images via Spelke Object Inference

    Authors: Honglin Chen, Rahul Venkatesh, Yoni Friedman, Jiajun Wu, Joshua B. Tenenbaum, Daniel L. K. Yamins, Daniel M. Bear

    Abstract: Self-supervised, category-agnostic segmentation of real-world images is a challenging open problem in computer vision. Here, we show how to learn static grouping priors from motion self-supervision by building on the cognitive science concept of a Spelke Object: a set of physical stuff that moves together. We introduce the Excitatory-Inhibitory Segment Extraction Network (EISEN), which learns to e…

    Submitted 25 July, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: 25 pages, 10 figures

    ACM Class: I.2.10; I.4.8

  8. arXiv:2107.09133

    cs.LG cond-mat.stat-mech q-bio.NC stat.ML

    The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion

    Authors: Daniel Kunin, Javier Sagastuy-Brena, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins

    Abstract: In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). As observed previously, long after performance has converged, networks continue to move through parameter space by a process of anomalous diffusion in which distance travelled grows as a power law in the number of gradient updates with a nontrivial exponent. We reveal an intricate…

    Submitted 28 December, 2023; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: 78 pages, 9 figures, Neural Computation 2024

    Journal ref: Neural Computation (2024) 36 (1) 151-174

  9. arXiv:2106.08261

    cs.AI cs.CV

    Physion: Evaluating Physical Prediction from Vision in Humans and Machines

    Authors: Daniel M. Bear, Elias Wang, Damian Mrowca, Felix J. Binder, Hsiao-Yu Fish Tung, R. T. Pramod, Cameron Holdaway, Sirui Tao, Kevin Smith, Fan-Yun Sun, Li Fei-Fei, Nancy Kanwisher, Joshua B. Tenenbaum, Daniel L. K. Yamins, Judith E. Fan

    Abstract: While current vision algorithms excel at many challenging tasks, it is unclear how well they understand the physical dynamics of real-world environments. Here we introduce Physion, a dataset and benchmark for rigorously evaluating the ability to predict how physical scenarios will evolve over time. Our dataset features realistic simulations of a wide range of physical phenomena, including rigid an…

    Submitted 20 June, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: 28 pages

    ACM Class: I.2.10; I.4.8; I.5

  10. arXiv:2103.14025

    cs.CV cs.AI cs.LG cs.RO

    The ThreeDWorld Transport Challenge: A Visually Guided Task-and-Motion Planning Benchmark for Physically Realistic Embodied AI

    Authors: Chuang Gan, Siyuan Zhou, Jeremy Schwartz, Seth Alter, Abhishek Bhandwaldar, Dan Gutfreund, Daniel L. K. Yamins, James J DiCarlo, Josh McDermott, Antonio Torralba, Joshua B. Tenenbaum

    Abstract: We introduce a visually-guided and physics-driven task-and-motion planning benchmark, which we call the ThreeDWorld Transport Challenge. In this challenge, an embodied agent equipped with two 9-DOF articulated arms is spawned randomly in a simulated physical home environment. The agent is required to find a small set of objects scattered around the house, pick them up, and transport them to a desi…

    Submitted 25 March, 2021; originally announced March 2021.

    Comments: Project page: http://tdw-transport.csail.mit.edu/

  11. arXiv:2012.04728

    cs.LG cond-mat.dis-nn cond-mat.stat-mech q-bio.NC stat.ML

    Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

    Authors: Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka

    Abstract: Understanding the dynamics of neural network parameters during training is one of the key challenges in building a theoretical foundation for deep learning. A central obstacle is that the motion of a network in high-dimensional parameter space undergoes discrete finite steps along complex stochastic gradients derived from real-world datasets. We circumvent this obstacle through a unifying theoreti…

    Submitted 29 March, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: 30 pages, 17 figures, ICLR 2021

  12. arXiv:2010.11765

    q-bio.NC cs.LG stat.ML

    Identifying Learning Rules From Neural Network Observables

    Authors: Aran Nayebi, Sanjana Srivastava, Surya Ganguli, Daniel L. K. Yamins

    Abstract: The brain modifies its synaptic strengths during learning in order to better adapt to its environment. However, the underlying plasticity rules that govern learning are unknown. Many proposals have been suggested, including Hebbian mechanisms, explicit error backpropagation, and a variety of alternatives. It is an open question as to what specific experimental measurements would need to be made to…

    Submitted 8 December, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020 Camera Ready Version, 21 pages including supplementary information, 13 figures

  13. arXiv:2007.04954

    cs.CV cs.GR cs.LG cs.RO

    ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation

    Authors: Chuang Gan, Jeremy Schwartz, Seth Alter, Damian Mrowca, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund, David Cox, Antonio Torralba, James J. DiCarlo, Joshua B. Tenenbaum, Josh H. McDermott, Daniel L. K. Yamins

    Abstract: We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments. Unique properties include: real-time near-photo-realistic image rendering; a library of objects and environments, and routines for their customization; generative procedu…

    Submitted 28 December, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Oral Presentation at NeurIPS 21 Datasets and Benchmarks Track. Project page: http://www.threedworld.org

  14. arXiv:2006.12373

    cs.CV cs.LG

    Learning Physical Graph Representations from Visual Scenes

    Authors: Daniel M. Bear, Chaofei Fan, Damian Mrowca, Yunzhu Li, Seth Alter, Aran Nayebi, Jeremy Schwartz, Li Fei-Fei, Jiajun Wu, Joshua B. Tenenbaum, Daniel L. K. Yamins

    Abstract: Convolutional Neural Networks (CNNs) have proved exceptional at learning representations for visual object categorization. However, CNNs do not explicitly encode objects, parts, and their physical properties, which has limited CNNs' success on tasks that require structured understanding of visual scenes. To overcome these limitations, we introduce the idea of Physical Scene Graphs (PSGs), which re…

    Submitted 24 June, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: 23 pages; corrected affiliations and acknowledgments

    ACM Class: I.4.8; I.2.6

  15. arXiv:2006.05467

    cs.LG cond-mat.dis-nn cs.CV q-bio.NC stat.ML

    Pruning neural networks without any data by iteratively conserving synaptic flow

    Authors: Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli

    Abstract: Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we ide…

    Submitted 18 November, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020, 18 pages, 10 figures

    Journal ref: Advances in Neural Information Processing Systems 2020

  16. arXiv:2004.13664

    cs.LG cs.CV stat.ML

    Visual Grounding of Learned Physical Models

    Authors: Yunzhu Li, Toru Lin, Kexin Yi, Daniel M. Bear, Daniel L. K. Yamins, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba

    Abstract: Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions. The abilities to perform physical reasoning and to adapt to new environments, while intrinsic to humans, remain challenging to state-of-the-art computational models. In this work, we present a neural model that simultaneously reasons about physics and m…

    Submitted 29 June, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

    Comments: The second and the third authors contributed equally to this paper, and are listed in alphabetical order. Project Page: http://visual-physics-grounding.csail.mit.edu/

  17. arXiv:2003.01513

    q-bio.NC cs.LG cs.NE stat.ML

    Two Routes to Scalable Credit Assignment without Weight Symmetry

    Authors: Daniel Kunin, Aran Nayebi, Javier Sagastuy-Brena, Surya Ganguli, Jonathan M. Bloom, Daniel L. K. Yamins

    Abstract: The neural plausibility of backpropagation has long been disputed, primarily for its use of non-local weight transport: the biologically dubious requirement that one neuron instantaneously measure the synaptic weights of another. Until recently, attempts to create local learning rules that avoid weight transport have typically failed in the large-scale learning scenarios where backpropagation s…

    Submitted 24 June, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

    Comments: ICML 2020 Camera Ready Version, 19 pages including supplementary information, 10 figures

  18. arXiv:1909.06161

    cs.CV cs.LG cs.NE eess.IV q-bio.NC

    Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs

    Authors: Jonas Kubilius, Martin Schrimpf, Kohitij Kar, Ha Hong, Najib J. Majaj, Rishi Rajalingham, Elias B. Issa, Pouya Bashivan, Jonathan Prescott-Roy, Kailyn Schmidt, Aran Nayebi, Daniel Bear, Daniel L. K. Yamins, James J. DiCarlo

    Abstract: Deep convolutional artificial neural networks (ANNs) are the leading class of candidate models of the mechanisms of visual processing in the primate ventral stream. While initially inspired by brain anatomy, over the past years, these ANNs have evolved from a simple eight-layer architecture in AlexNet to extremely deep and branching architectures, demonstrating increasingly better object categoriz…

    Submitted 28 October, 2019; v1 submitted 13 September, 2019; originally announced September 2019.

    Comments: NeurIPS 2019 (Oral). Code available at https://github.com/dicarlolab/neurips2019

  19. arXiv:1807.00053

    q-bio.NC cs.AI cs.CV cs.LG cs.NE

    Task-Driven Convolutional Recurrent Models of the Visual System

    Authors: Aran Nayebi, Daniel Bear, Jonas Kubilius, Kohitij Kar, Surya Ganguli, David Sussillo, James J. DiCarlo, Daniel L. K. Yamins

    Abstract: Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. Further, they are quantitatively accurate models of temporally-averaged responses of neurons in the primate brain's visual system. However, biological visual systems have two ubiquitous architectural features not shared with typical CNNs: local recurrence within cortic…

    Submitted 26 October, 2018; v1 submitted 20 June, 2018; originally announced July 2018.

    Comments: NIPS 2018 Camera Ready Version, 16 pages including supplementary information, 6 figures

  20. arXiv:1806.08047

    cs.AI cs.CV cs.LG cs.NE

    Flexible Neural Representation for Physics Prediction

    Authors: Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Joshua B. Tenenbaum, Daniel L. K. Yamins

    Abstract: Humans have a remarkable capacity to understand the physical dynamics of objects in their environment, flexibly capturing complex structures and interactions at multiple levels of detail. Inspired by this ability, we propose a hierarchical particle-based object representation that covers a wide variety of types of three-dimensional objects, including both arbitrary rigid geometrical shapes and def…

    Submitted 27 October, 2018; v1 submitted 20 June, 2018; originally announced June 2018.

    Comments: 23 pages, 20 figures

  21. arXiv:1802.07461

    cs.LG cs.AI cs.CV stat.ML

    Emergence of Structured Behaviors from Curiosity-Based Intrinsic Motivation

    Authors: Nick Haber, Damian Mrowca, Li Fei-Fei, Daniel L. K. Yamins

    Abstract: Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to replicate some of these abilities with a neural network that implements curiosity-driven intrinsic motivation. Using a simple but ecologically naturalistic simulated environment in which the agent can move and interact with…

    Submitted 21 February, 2018; originally announced February 2018.

    Comments: 6 pages, 5 figures

    MSC Class: 68

  22. arXiv:1802.07442

    cs.LG cs.AI cs.CV stat.ML

    Learning to Play with Intrinsically-Motivated Self-Aware Agents

    Authors: Nick Haber, Damian Mrowca, Li Fei-Fei, Daniel L. K. Yamins

    Abstract: Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to mathematically formalize these abilities using a neural network that implements curiosity-driven intrinsic motivation. Using a simple but ecologically naturalistic simulated environment in which an agent can move and intera…

    Submitted 30 October, 2018; v1 submitted 21 February, 2018; originally announced February 2018.

    Comments: In NIPS 2018. 10 pages, 5 figures

    MSC Class: 68

  23. arXiv:1711.07425

    cs.LG cs.AI q-bio.NC stat.ML

    Modular Continual Learning in a Unified Visual Environment

    Authors: Kevin T. Feigelis, Blue Sheffer, Daniel L. K. Yamins

    Abstract: A core aspect of human intelligence is the ability to learn new tasks quickly and switch between them flexibly. Here, we describe a modular continual reinforcement learning paradigm inspired by these abilities. We first introduce a visual interaction environment that allows many types of tasks to be unified in a single framework. We then describe a reward map prediction scheme that learns new task…

    Submitted 11 December, 2017; v1 submitted 20 November, 2017; originally announced November 2017.

  24. arXiv:1706.07147

    cs.LG cs.AI q-bio.NC stat.ML

    A Useful Motif for Flexible Task Learning in an Embodied Two-Dimensional Visual Environment

    Authors: Kevin T. Feigelis, Daniel L. K. Yamins

    Abstract: Animals (especially humans) have an amazing ability to learn new tasks quickly, and switch between them flexibly. How brains support this ability is largely unknown, both neuroscientifically and algorithmically. One reasonable supposition is that modules drawing on an underlying general-purpose sensory representation are dynamically allocated on a per-task basis. Recent results from neuroscience a…

    Submitted 21 June, 2017; originally announced June 2017.

  25. Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition

    Authors: Charles F. Cadieu, Ha Hong, Daniel L. K. Yamins, Nicolas Pinto, Diego Ardila, Ethan A. Solomon, Najib J. Majaj, James J. DiCarlo

    Abstract: The primate visual system achieves remarkable visual object recognition performance even in brief presentations and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This remarkable performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have…

    Submitted 12 June, 2014; originally announced June 2014.

    Comments: 35 pages, 12 figures, extends and expands upon arXiv:1301.3530