Brain-optimized inference improves reconstructions of fMRI brain activity

Reese Kneeland (rek@umn.edu)

University of Minnesota Department of Computer Science

Ghislain St-Yves (gstyves@umn.edu)

University of Minnesota Department of Neuroscience

Jordyn Ojeda (ojeda040@umn.edu)

University of Minnesota Department of Computer Science

Thomas Naselaris (nase0005@umn.edu)

University of Minnesota Department of Neuroscience

ArXiv preprint

Source code

Introduction

The exploration of decoding and reconstructing images from human brain activity has captivated scholars for years, presenting a window into how our brains interpret visual stimuli. With the advent of sophisticated artificial intelligence (AI) technologies and the release of large neuroimaging datasets, there has been a notable enhancement in our ability to produce reconstructions with high fidelity. Our research contributes a new technique to this space: an additional refinement stage we call "brain-optimized inference" (BOI). Our method significantly enhances the quality of image reconstructions derived from functional magnetic resonance imaging (fMRI) data by optimizing against the target brain activity directly. We do this by mapping each candidate reconstruction to a predicted pattern of fMRI brain activity using a brain encoding model and comparing that prediction with the measured activity. Iteratively refining each reconstruction in this way during inference makes it reflect more accurately the fine details captured in the corresponding brain activity. Our procedure ensures that the reconstructions are not merely visually precise, but also more faithful to the neural representation of the stimulus. This advancement holds the potential to deepen our understanding of the complex relationship between visual perception and brain function, marking a significant step forward in the field of cognitive neuroscience.

Methods

Our study utilized the Natural Scenes Dataset (NSD), comprising thousands of brain scans, to train and test our decoding and encoding models. The core of our method involves an iterative refinement process. Starting with initial reconstructions generated by a base decoding method, we employ a diffusion model to produce a distribution of reconstructed images. These images are then evaluated by how well they predict the original brain activity, using a brain-optimized encoding model. The images most aligned to the pattern of brain activity act as direct guidance for the next iteration, gradually narrowing the distribution of reconstructions down to images that best match the brain activity. This process is meticulously designed to balance the influence of high-level semantic and low-level structural information, ensuring the reconstructions align closely with the original visual stimuli.
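The refinement loop described above can be illustrated with a small toy sketch. This is not the implementation from the paper: the linear `encode` function stands in for the brain-optimized encoding model, the Gaussian `sample` function stands in for the diffusion model, "images" are plain vectors, and the names `boi_search`, `pearson`, the linear noise-decay schedule, and the rule of keeping the current guide unless a candidate scores higher are all assumptions made for illustration.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two 1-D activity patterns."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def boi_search(initial, target, encode, sample, n_iters=6, n_samples=16, rng_seed=0):
    """Toy brain-optimized search (hypothetical sketch, not the paper's code).

    initial : starting reconstruction from a base decoder
    target  : measured brain activity pattern
    encode  : maps an image to predicted brain activity
    sample  : draws a candidate image around a guide image
    """
    rng = np.random.default_rng(rng_seed)
    guide = initial
    guide_score = pearson(encode(guide), target)
    history = [guide_score]
    for i in range(n_iters):
        # Assumed schedule: sample less noisily as iterations proceed,
        # so the candidate distribution narrows around the guide.
        strength = 1.0 - i / n_iters
        candidates = [sample(guide, strength, rng) for _ in range(n_samples)]
        scores = [pearson(encode(c), target) for c in candidates]
        best = int(np.argmax(scores))
        # Assumed rule: only replace the guide when a candidate's predicted
        # activity matches the measured activity better.
        if scores[best] > guide_score:
            guide, guide_score = candidates[best], scores[best]
        history.append(guide_score)
    return guide, history


# Toy stand-ins: 32-dimensional "images" and 64 "voxels".
rng = np.random.default_rng(42)
W = rng.standard_normal((64, 32))             # stand-in linear encoding model
true_image = rng.standard_normal(32)
target = W @ true_image + 0.1 * rng.standard_normal(64)
initial = true_image + 2.0 * rng.standard_normal(32)


def encode(img):
    return W @ img


def sample(guide, strength, rng):
    return guide + strength * rng.standard_normal(guide.shape)


recon, history = boi_search(initial, target, encode, sample)
print("alignment per iteration:", np.round(history, 3))
```

Because the guide is only replaced by a better-scoring candidate, the alignment score in `history` never decreases in this toy version; the full method additionally balances high-level semantic and low-level structural guidance, which this sketch omits.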

Pipeline diagram for our BOI algorithm. The base decoder (blue box) produces the CLIP vector c and an initial latent vector z to seed the rest of the pipeline, in which we iteratively align an image distribution to the measured brain activity. The numbered and lettered components of each stage are described in the full paper.

Results

Our approach outperformed existing decoding methods, including the MindEye model used as the base decoder for our protocol, across several metrics, including assessments by human raters and various image feature metrics. Notably, the quality of reconstructions improved systematically with each iteration of our brain-optimized inference process, demonstrating a significant alignment with the original brain activity. This alignment varied across different areas of the visual cortex, revealing insights into the diversity of visual representations in the brain. High-level visual areas tended to align more quickly with the reconstructions, suggesting differences in how various parts of the visual cortex process visual information. Our method opens new avenues for fine-tuning image reconstruction methods and produces state-of-the-art results by improving upon the reconstruction fidelity of the MindEye base model. We offer this method as a novel neuroimaging analysis tool for gaining a deeper understanding of visual processing in the brain.

Comparative assessment of reconstruction methods for subject 1. The first column is the ground truth image, indicated by the red border, while the second column is the reconstruction from our brain-optimized inference stage on top of the MindEye decoding method. The remaining columns represent results from previous decoding methods, including MindEye (Scotti et al. 2023), Brain Diffuser (Ozcelik and VanRullen 2023), and the "+Decoded Text" method (Takagi and Nishimoto 2023).

Iterative analysis of high-quality reconstructions generated by our brain-optimized inference stage on top of the MindEye decoding method. The first row is the ground truth image, indicated by the red border, the second row is the reconstruction provided by the MindEye reconstruction method, and the remaining rows are the reconstructions produced at iterative stages of the BOI process. The last row is an image from the last output distribution of our MindEye + BOI method. These particular samples converged at or before iteration 7.

Conclusion

Our brain-optimized inference algorithm presents an advancement in the field of image reconstruction from fMRI data. By prioritizing the congruence between reconstructed images and corresponding brain activity, our methodology introduces a novel inference procedure that uses the brain's neural representation of the stimuli as guidance in the reconstruction process. Our results demonstrate that this method produces significant gains in image reconstruction accuracy and lays the groundwork for further investigation into the complex mechanisms of the visual cortex. We believe aligning reconstructions with brain activity will be a critical tool in the future evolution of decoding techniques, allowing for enhanced reconstruction accuracy and a deeper understanding of brain function through direct, interpretable neural guidance.