A Convolutional Neural Network Interpretable Framework for Human Ventral Visual Pathway Representation

Authors

  • Mufan Xue, Advanced Research Institute of Multidisciplinary Sciences, Beijing Institute of Technology, Beijing 100081, China
  • Xinyu Wu, Advanced Research Institute of Multidisciplinary Sciences, Beijing Institute of Technology, Beijing 100081, China
  • Jinlong Li, Advanced Research Institute of Multidisciplinary Sciences, Beijing Institute of Technology, Beijing 100081, China
  • Xuesong Li, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  • Guoyuan Yang, Advanced Research Institute of Multidisciplinary Sciences, Beijing Institute of Technology, Beijing 100081, China; School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China

DOI:

https://doi.org/10.1609/aaai.v38i6.28461

Keywords:

CV: Medical and Biological Imaging, CV: Applications, CV: Interpretability, Explainability, and Transparency, CV: Representation Learning for Vision

Abstract

Recently, convolutional neural networks (CNNs) have become the best quantitative encoding models for capturing neural activity and the hierarchical structure of the ventral visual pathway. However, the weak interpretability of these black-box models hinders their ability to reveal the mechanisms of visual representational encoding. Here, we propose a convolutional neural network interpretable framework (CNN-IF) that provides a transparent, interpretable encoding model for the ventral visual pathway. First, we adapt the feature-weighted receptive field framework to train two high-performing ventral visual pathway encoding models on large-scale functional magnetic resonance imaging (fMRI) data, using both goal-driven and data-driven approaches. We find that layer-wise network predictions align with the functional hierarchy of the ventral visual pathway. Then, we map feature units to voxel units in the brain and successfully quantify the alignment between voxel responses and visual concepts. Finally, we conduct Network Dissection along the ventral visual pathway, including the fusiform face area (FFA), and discover variations related to the visual concept of 'person'. Our results demonstrate that the CNN-IF provides a new perspective for understanding the encoding mechanisms of the human ventral visual pathway, and that combining an ante-hoc interpretable structure with post-hoc interpretable approaches can achieve fine-grained voxel-wise correspondence between model and brain. The source code is available at: https://github.com/BIT-YangLab/CNN-IF.
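
For readers who want a concrete picture of the encoding-model step, the sketch below fits a layer-wise voxel encoding model and scores it on held-out images. It is a minimal illustration, not the authors' implementation: the data are random placeholders, the layer names are hypothetical, and plain ridge regression stands in for the full feature-weighted receptive field fit, which additionally learns a spatial pooling field per voxel.

```python
# Minimal sketch of a layer-wise voxel encoding model in the spirit of the
# feature-weighted receptive field approach mentioned in the abstract.
# NOT the authors' implementation: shapes, layer names, and the use of ridge
# regression are illustrative assumptions.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: CNN activations per image per layer, and fMRI responses.
n_images, n_voxels = 1000, 50
layer_features = {
    "conv3": rng.standard_normal((n_images, 256)),  # mid-level features
    "conv5": rng.standard_normal((n_images, 512)),  # high-level features
}
voxel_responses = rng.standard_normal((n_images, n_voxels))  # voxel betas

# Fit one ridge model per layer and score each voxel on held-out images;
# comparing layer-wise scores across regions is what exposes the hierarchy.
for layer, X in layer_features.items():
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, voxel_responses, test_size=0.2, random_state=0
    )
    model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # Voxel-wise Pearson r between predicted and measured responses.
    r = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1] for v in range(n_voxels)]
    print(f"{layer}: median voxel r = {np.median(r):.3f}")
```

On real data, plotting which network layer best predicts each region of interest (V1 through FFA) is the standard way to test the layer-to-hierarchy alignment the abstract describes.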

Published

2024-03-24

How to Cite

Xue, M., Wu, X., Li, J., Li, X., & Yang, G. (2024). A Convolutional Neural Network Interpretable Framework for Human Ventral Visual Pathway Representation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 6413-6421. https://doi.org/10.1609/aaai.v38i6.28461

Section

AAAI Technical Track on Computer Vision V