3D markerless tracking of speech movements with submillimeter accuracy
Speech movements are highly complex and require precise spatial and temporal coordination of the oral articulators to support intelligible communication. These properties also make speech movements challenging to measure: doing so often requires numerous physical sensors placed around the mouth and face, which certain populations, such as young children, do not tolerate well. Recent progress in machine learning-based markerless facial landmark tracking has demonstrated the potential to track the lips without physical sensors, but whether such technology can provide submillimeter precision and accuracy in 3D remains unknown. It is also unclear whether such technology can be applied to track speech movements in young children. Here, we developed a novel approach that integrates Shape Preserving Facial Landmarks with Graph Attention Networks (SPIGA), a facial landmark detector, with CoTracker, a transformer-based neural network model that jointly tracks dense points across a video sequence. We validated this approach by assessing its tracking precision and accuracy. The combined SPIGA and CoTracker approach was more precise (≈0.15 mm standard deviation) than SPIGA alone (≈0.35 mm). In addition, its 3D tracking performance was comparable to electromagnetic articulography (≈0.29 mm RMSE against simultaneously recorded articulograph data). Importantly, the approach performed similarly well in adults and young children (i.e., 3- and 4-year-olds). Because our framework is built upon open-source pretrained models, it promotes accessibility and open science while saving computing resources.
Furthermore, because this framework improves precision and accuracy by combining a landmark detection model (SPIGA) with a tracker model (CoTracker), it serves as a proof of concept for enhancing the performance of a wide variety of commonly used markerless tracking applications in biology and neuroscience.
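
The abstract reports precision as a standard deviation and accuracy as an RMSE against articulograph data, but it does not spell out how either quantity was computed. As a minimal sketch, assuming precision is the per-axis sample standard deviation of a nominally stationary landmark, and accuracy is the root-mean-square 3D Euclidean distance between time-aligned trajectories expressed in the same coordinate frame, the two quantities could be computed as follows. The function names `precision_sd` and `rmse_3d` are hypothetical and are not taken from the authors' code.

```python
import math
import statistics


def precision_sd(samples_mm):
    """Per-axis sample standard deviation (mm) of one landmark.

    Assumption: `samples_mm` holds (x, y, z) positions of a landmark that
    should be stationary, so any spread reflects tracking jitter.
    """
    if len(samples_mm) < 2:
        raise ValueError("need at least two samples to estimate a standard deviation")
    xs, ys, zs = zip(*samples_mm)
    return tuple(statistics.stdev(axis) for axis in (xs, ys, zs))


def rmse_3d(tracked_mm, reference_mm):
    """Root-mean-square 3D Euclidean error (mm) between two trajectories.

    Assumption: both trajectories are already time-aligned, frame by frame,
    and expressed in the same coordinate system.
    """
    if len(tracked_mm) != len(reference_mm) or not tracked_mm:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_distances = [
        sum((t - r) ** 2 for t, r in zip(tracked_point, reference_point))
        for tracked_point, reference_point in zip(tracked_mm, reference_mm)
    ]
    return math.sqrt(sum(squared_distances) / len(squared_distances))


# Illustrative values only; these are not data from the study.
markerless = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, 0.1)]
articulograph = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
error_mm = rmse_3d(markerless, articulograph)
jitter_mm = precision_sd([(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)])
```

In practice, markerless and articulograph recordings would first need temporal resampling and a rigid alignment into a common frame; those steps are outside this sketch.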