Brain-optimized inference improves reconstructions of fMRI brain activity
Reese Kneeland (rek@umn.edu)
University of Minnesota Department of Computer Science
Ghislain St-Yves (gstyves@umn.edu)
University of Minnesota Department of Neuroscience
Jordyn Ojeda (ojeda040@umn.edu)
University of Minnesota Department of Computer Science
Thomas Naselaris (nase0005@umn.edu)
University of Minnesota Department of Neuroscience
Introduction
Decoding and reconstructing images from human brain activity is a long-standing research goal that offers a window into how the brain interprets visual stimuli. Advances in artificial intelligence (AI) and the release of large neuroimaging datasets have markedly improved the fidelity of such reconstructions. We contribute a refinement technique we call "brain-optimized inference" (BOI), which significantly improves the quality of image reconstructions derived from functional magnetic resonance imaging (fMRI) data by optimizing against the target brain activity directly. To do so, we use a brain encoding model to map each candidate reconstruction to a predicted pattern of fMRI activity, and we compare that prediction with the measured activity. Iteratively refining each reconstruction during inference in this way yields images that more accurately reflect the fine details captured in the corresponding brain activity. The resulting reconstructions are therefore not only visually accurate but also more consistent with the measured neural response. This approach has the potential to deepen our understanding of the relationship between visual perception and brain function.
Methods
We used the Natural Scenes Dataset (NSD), which comprises thousands of brain scans, to train and test our decoding and encoding models. The core of our method is an iterative refinement process. Starting from an initial reconstruction generated by a base decoding method, we use a diffusion model to produce a distribution of candidate images. A brain-optimized encoding model then scores each candidate by how well its predicted activity matches the original brain activity. The candidates best aligned with the pattern of brain activity serve as direct guidance for the next iteration, gradually narrowing the distribution toward images that best match the brain activity. The process is designed to balance the influence of high-level semantic and low-level structural information so that the reconstructions align closely with the original visual stimuli.
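The refinement loop described above can be sketched in a few lines. This is a minimal illustration under strong assumptions, not our implementation: `sample_candidates` is a hypothetical stand-in for the diffusion model (it simply adds Gaussian noise around the current guidance image), `encode` is a hypothetical linear stand-in for the brain-optimized encoding model, and images are treated as flat feature vectors. Keeping the current guidance image when no candidate scores higher, and shrinking the noise by a fixed factor each round, are simplifications chosen for this sketch.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D activity patterns."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def sample_candidates(guide, n, strength, rng):
    # Hypothetical stand-in for a diffusion model: draw n variations
    # of the guidance image, with spread controlled by `strength`.
    return guide + strength * rng.standard_normal((n,) + guide.shape)

def encode(images, W):
    # Hypothetical stand-in for the encoding model: a linear map from
    # image features to predicted voxel activity.
    return images @ W

def brain_optimized_inference(initial, target, W, n_iter=6, n_samples=32, seed=0):
    rng = np.random.default_rng(seed)
    guide = initial.copy()
    best = pearson(encode(guide[None], W)[0], target)
    history = [best]          # alignment score after each iteration
    strength = 1.0
    for _ in range(n_iter):
        candidates = sample_candidates(guide, n_samples, strength, rng)
        predicted = encode(candidates, W)
        scores = np.array([pearson(p, target) for p in predicted])
        k = int(scores.argmax())
        # The best-aligned candidate guides the next round (assumption:
        # the current guide is kept if no candidate improves on it).
        if scores[k] > best:
            guide, best = candidates[k], float(scores[k])
        history.append(best)
        strength *= 0.7       # narrow the candidate distribution
    return guide, history

# Toy usage with synthetic data (all values are illustrative).
rng = np.random.default_rng(1)
W = rng.standard_normal((64, 200))              # features -> voxels
true_image = rng.standard_normal(64)
target = true_image @ W + 0.1 * rng.standard_normal(200)
initial = true_image + 2.0 * rng.standard_normal(64)
refined, history = brain_optimized_inference(initial, target, W)
print([round(h, 3) for h in history])
```

In this toy version the alignment score can only stay level or rise from one iteration to the next, because a candidate replaces the guidance image only when it scores higher.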
Results
Our approach outperformed existing decoding methods, including the MindEye model that serves as the base decoder for our protocol, on assessments by human raters and on a range of image feature metrics. Reconstruction quality improved systematically with each iteration of brain-optimized inference, demonstrating significant alignment with the original brain activity. This alignment varied across areas of the visual cortex, revealing the diversity of visual representations in the brain. High-level visual areas tended to align with the reconstructions more quickly, suggesting differences in how different parts of the visual cortex process visual information. By improving on the reconstruction fidelity of the MindEye base model, our method produces state-of-the-art results and opens new avenues for fine-tuning image reconstruction methods. We also offer it as a neuroimaging analysis tool for studying visual processing in the brain.
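One way to examine how alignment differs across visual areas is to compute, for each region of interest (ROI) and each iteration, the correlation between the predicted and measured activity of the voxels in that region. The sketch below assumes the predictions for every iteration are stacked in one array and that each voxel carries an ROI label; the function name and the labels "early" and "higher" are hypothetical and used only for illustration.

```python
import numpy as np

def roi_alignment(pred_by_iter, target, roi_labels):
    """Pearson r between predicted and measured activity per ROI.

    pred_by_iter: array (n_iterations, n_voxels) of predicted activity
    target:       array (n_voxels,) of measured activity
    roi_labels:   array (n_voxels,) assigning each voxel to an ROI
    Returns a dict mapping ROI name -> list of r values, one per iteration.
    """
    result = {}
    for roi in np.unique(roi_labels):
        mask = roi_labels == roi
        result[str(roi)] = [
            float(np.corrcoef(pred[mask], target[mask])[0, 1])
            for pred in pred_by_iter
        ]
    return result

# Toy usage with synthetic data: predictions get less noisy each iteration.
rng = np.random.default_rng(0)
target = rng.standard_normal(300)
roi_labels = np.array(["early"] * 150 + ["higher"] * 150)
noise_levels = [2.0, 1.0, 0.5, 0.25]
pred_by_iter = np.stack(
    [target + s * rng.standard_normal(300) for s in noise_levels]
)
curves = roi_alignment(pred_by_iter, target, roi_labels)
for roi, values in curves.items():
    print(roi, [round(v, 3) for v in values])
```

Plotting these per-ROI curves against the iteration number would show which regions reach high alignment in fewer iterations.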
Conclusion
Brain-optimized inference is a new inference procedure for image reconstruction from fMRI data. By prioritizing congruence between reconstructed images and the corresponding brain activity, it uses the brain's own neural representation of the stimulus as guidance during reconstruction. Our results show that the method produces significant gains in reconstruction accuracy and lays the groundwork for further investigation of the visual cortex. We believe that aligning reconstructions with brain activity will be a critical tool in the future development of decoding techniques, improving reconstruction accuracy and supporting a deeper understanding of brain function through direct, interpretable neural guidance.