1. Self-attention mechanism to process sparse deformable point cloud data.
Labeling a cloud of sparse 3D points resulting from a marker-based motion capture (mocap) session, which contains outliers and missing data, is a highly ambiguous task. Our solution exploits a transformer architecture to capture local and global contextual information using self-attention. SOMA consumes mocap point clouds directly and outputs a distribution over marker labels.
In the image to the left, the cube marks the marker of interest, and color intensity depicts the attention value averaged across the frames of 50 randomly selected sequences. Each column shows a different marker. The first layer (top) exhibits wider attention than the deepest layer (bottom).
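To make the mechanism concrete, the following is a minimal numpy sketch of one self-attention layer over an unordered set of points, producing a per-point distribution over labels. All sizes, the random projection weights, and the extra "ghost" class are hypothetical stand-ins for learned parameters; this is an illustration of the attention computation, not SOMA's actual network.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_points, d_model, n_labels = 7, 16, 5        # hypothetical sizes

# Random matrices standing in for learned projection weights.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W_out = rng.normal(size=(d_model, n_labels + 1))  # +1 class for ghost points

X = rng.normal(size=(n_points, d_model))      # embedded 3D points (unordered set)

# Scaled dot-product self-attention: every point attends to every other point,
# mixing local and global context without assuming any point ordering.
Q, K, V = X @ Wq, X @ Wk, X @ Wv
attn = softmax(Q @ K.T / np.sqrt(d_model))    # (n_points, n_points), rows sum to 1
H = attn @ V                                  # context-mixed point features

# Per-point categorical distribution over marker labels (plus ghost).
label_probs = softmax(H @ W_out)
```

Because attention operates on all pairs of points symmetrically, permuting the input points simply permutes the output rows, which suits sparse, unordered mocap point clouds.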
2. Training on synthetic data.
Mocap data is influenced by many sources of variation and noise: subject body shape, motion, marker layout and the exact placement of the markers on the body, occlusions, ghost points, mocap hardware intrinsics, and more. To learn a robust model, we exploit AMASS [Mahmood et al., ICCV'19] and place virtual markers on these bodies following the desired marker layout. Our novel synthetic mocap generation pipeline generalizes to real mocap datasets.
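The noise sources above can be simulated on clean virtual markers. The sketch below is a hypothetical augmentation routine, assuming markers as an (n, 3) array: it applies random occlusions, small placement jitter, spurious ghost points, and a final shuffle that destroys any label-revealing ordering. The function name, parameters, and default values are illustrative, not SOMA's actual pipeline.

```python
import numpy as np

def corrupt_markers(markers, p_occlude=0.2, n_ghost_max=3, jitter_std=0.005,
                    rng=np.random.default_rng(1)):
    """Simulate mocap noise on clean virtual markers of shape (n, 3)."""
    keep = rng.random(len(markers)) > p_occlude              # random occlusions
    pts = markers[keep] + rng.normal(scale=jitter_std,       # placement jitter
                                     size=(keep.sum(), 3))
    labels = np.flatnonzero(keep)                            # surviving marker ids
    n_ghost = rng.integers(0, n_ghost_max + 1)               # spurious detections
    ghosts = rng.uniform(-1.0, 1.0, size=(n_ghost, 3))
    pts = np.vstack([pts, ghosts])
    labels = np.concatenate([labels, np.full(n_ghost, -1)])  # -1 marks ghost points
    order = rng.permutation(len(pts))                        # destroy point ordering
    return pts[order], labels[order]

clean = np.random.default_rng(2).uniform(-1.0, 1.0, size=(53, 3))  # e.g. 53-marker layout
pts, labels = corrupt_markers(clean)
```

Training against such corrupted clouds, with the true label (or ghost) as the target for each surviving point, is what lets a model learn robustness to occlusions and outliers before ever seeing real mocap data.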
SOMA offers a robust marker-based mocap auto-solving solution that works with archival data, different mocap technologies, poor data quality, and varying subjects and motions.