UC Berkeley | UC Berkeley | UC Berkeley | Facebook AI Research
We propose a novel learned deep prior of body motion for 3D hand shape synthesis and estimation in the domain of conversational gestures. Our model builds upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings. We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone. Trained with 3D pose estimations obtained from a large-scale dataset of internet videos, our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as input. We demonstrate the efficacy of our method on hand gesture synthesis from body motion input, and as a strong body prior for single-view image-based 3D hand pose estimation. Our method outperforms previous state-of-the-art approaches and generalizes beyond the monologue-based training data to multi-person conversations.
We demonstrate the utility of leveraging surprisingly strong correlations between a speaker's body and hand poses.
While current state-of-the-art image-based methods often fail on obstructed views of the hands, our body-motion prior provides an additional cue for hand pose estimation, helping to overcome challenges caused by fundamental depth ambiguity, frequent self-occlusion, and severe motion blur. Furthermore, we consider the temporal aspect of the input, allowing our method to produce smoother, more realistic hand sequences.
To learn the novel deep body prior in a data-driven way, we formulate a predictive task: given the body poses of a speaker, the goal is to predict their corresponding hand poses. Our method is trained on in-the-wild 3D motion data.
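To make the formulation concrete, the following is a minimal sketch of such a body-to-hands predictor in PyTorch. It is not the released implementation: the feature dimensions (arm pose as 6D joint rotations, hand pose as per-joint axis-angle parameters), window length, and network sizes are illustrative assumptions. A temporal convolutional encoder consumes a window of the speaker's arm motion and a per-frame head regresses the corresponding hand pose parameters.

```python
# Minimal sketch of a body-to-hands predictor (illustrative; not the paper's released code).
# Assumed representation: per-frame arm pose as 6 joints x 6D rotation = 36 dims,
# per-frame hand pose as 2 hands x 15 joints x 3 axis-angle dims = 90 dims.
import torch
import torch.nn as nn

ARM_DIM = 36    # assumed arm-pose feature size per frame
HAND_DIM = 90   # assumed hand-pose parameter size per frame


class Body2HandsSketch(nn.Module):
    """Temporal conv encoder over body motion with a per-frame hand pose decoder."""

    def __init__(self, hidden: int = 256):
        super().__init__()
        # 1D convolutions over time capture short-range body-motion context.
        self.encoder = nn.Sequential(
            nn.Conv1d(ARM_DIM, hidden, kernel_size=5, padding=2),
            nn.LeakyReLU(0.2),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.LeakyReLU(0.2),
        )
        # Per-frame regression head producing hand pose parameters.
        self.decoder = nn.Conv1d(hidden, HAND_DIM, kernel_size=1)

    def forward(self, arm_seq: torch.Tensor) -> torch.Tensor:
        # arm_seq: (batch, frames, ARM_DIM) -> returns (batch, frames, HAND_DIM)
        x = arm_seq.transpose(1, 2)            # Conv1d expects (batch, channels, frames)
        hands = self.decoder(self.encoder(x))
        return hands.transpose(1, 2)


if __name__ == "__main__":
    model = Body2HandsSketch()
    arm_motion = torch.randn(2, 64, ARM_DIM)   # a 64-frame window of body motion
    hand_pred = model(arm_motion)
    print(hand_pred.shape)                     # torch.Size([2, 64, 90])
```

Training such a predictor on body and hand poses extracted from in-the-wild video would amount to a standard regression setup (e.g., an L2 loss between predicted and estimated hand poses per frame), with the temporal window supplying the motion context that makes the predicted hand sequences smooth.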
@article{ng2021body2hands,
  title={Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics},
  author={Ng, Evonne and Ginosar, Shiry and Darrell, Trevor and Joo, Hanbyul},
  journal={CVPR},
  year={2021}
}