Posted by bioRxiv Subject Collection: Neuroscience ([info]syn_bx_neuro)
@ 2025-04-08 06:45:00


Decoding the Moving Mind: Multi-Subject fMRI-to-Video Retrieval with MLLM Semantic Grounding
Decoding dynamic visual information from brain activity remains challenging due to inter-subject neural heterogeneity, limited per-subject data availability, and the substantial temporal-resolution gap between fMRI signals (0.5 Hz) and video dynamics (30 Hz). Current approaches struggle to address these temporal mismatches, show limited capacity to integrate subject-specific neural patterns with shared representational frameworks, and lack the semantic granularity needed to align neural responses with visual content. To bridge these gaps, we propose a framework built on three innovations: (1) a Dynamic Temporal Alignment module that resolves temporal mismatches via exponentially weighted multi-frame fusion with adaptive decay coefficients; (2) a Brain Mixture-of-Experts architecture that combines subject-specific extractors with shared expert layers through parameter-efficient tri-modal contrastive learning; and (3) a Multi-perspective Semantic Hyper-Anchoring module that resolves cross-subject attention bias via multi-dimensional semantic decomposition, leveraging multimodal LLMs for fine-grained video semantic extraction so the model can match individual attention patterns, since different subjects naturally focus on distinct aspects of the same visual stimulus. This module boosts Top-10/Top-100 retrieval by 17.7%/6.6%. Experiments on two video-fMRI datasets demonstrate state-of-the-art performance, with 39%/30% improvements in Top-10/Top-100 accuracy over single-subject baselines and 27% gains over multi-subject models. The framework exhibits remarkable few-shot adaptability, retaining 97% of its performance when using only 10% of the training data for new subjects. Visualization analysis confirms that this generalization stems from effective disentanglement of subject-specific and shared neural representations.
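
The temporal-alignment idea is concrete enough to sketch. The snippet below is a minimal illustration, not the authors' implementation: it collapses the ~60 video-frame embeddings that fall inside one fMRI sample (30 Hz video vs. 0.5 Hz fMRI) into a single vector using exponentially decaying weights, with the decay made learnable as a stand-in for the paper's "adaptive decay coefficients". All names, shapes, and the newest-frame-weighted-highest orientation are assumptions.

    # Hypothetical sketch of exponentially weighted multi-frame fusion.
    # Illustrative only; names, shapes, and the learnable decay are
    # assumptions, not the paper's actual code.
    import torch
    import torch.nn as nn

    class ExpWeightedFrameFusion(nn.Module):
        """Collapse the ~60 video-frame embeddings that fall inside one
        fMRI sample (30 Hz video vs. 0.5 Hz fMRI) into a single vector."""

        def __init__(self, init_decay: float = 0.1):
            super().__init__()
            # Learnable decay coefficient, standing in for the paper's
            # "adaptive decay coefficients".
            self.log_decay = nn.Parameter(torch.tensor(init_decay).log())

        def forward(self, frame_emb: torch.Tensor) -> torch.Tensor:
            # frame_emb: (batch, n_frames, dim), ordered oldest -> newest.
            n = frame_emb.size(1)
            decay = self.log_decay.exp()
            # Weight w_i = exp(-decay * distance from the newest frame).
            dist = torch.arange(n - 1, -1, -1,
                                device=frame_emb.device, dtype=frame_emb.dtype)
            w = torch.exp(-decay * dist)
            w = w / w.sum()  # normalize to a convex combination
            return (w.unsqueeze(-1) * frame_emb).sum(dim=1)  # (batch, dim)

    # Usage: fuse 60 frames (2 s of 30 Hz video) per 0.5 Hz fMRI sample.
    fusion = ExpWeightedFrameFusion()
    frames = torch.randn(8, 60, 512)
    fused = fusion(frames)  # -> torch.Size([8, 512])

Normalizing the weights to a convex combination keeps the fused embedding on the same scale regardless of how many frames fall within a single fMRI sampling interval.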

