Pose Relation Transformer: Refine Occlusions for Human Pose Estimation

Image credit: Unsplash

Abstract

Accurately estimating the human pose is an essential task for many applications in robotics. However, existing pose estimation methods suffer from poor performance when occlusion occurs. Recent advances in NLP have been very successful in predicting the missing words conditioned on visible words. We draw upon the sentence completion analogy in NLP to guide our model to address occlusions in the pose estimation problem. We propose a novel approach that can mitigate the effect of occlusions motivated by the sentence completion task of NLP. In an analogous manner, we designed our model to reconstruct occluded joints given the visible joints utilizing joint correlations by capturing the implicit joint connectivity through the attention mechanism. In this ork, we propose a POse Relation Transformer (PORT) that captures the global context of the pose using self-attention and a local context by aggregating adjacent joint features. To supervise PORT in learning joint correlations, we guide PORT to reconstruct randomly masked joints, which we call Masked Joint Modeling (MJM). PORT trained with MJM adds to existing keypoint detection methods and successfully refines occlusions. Notably, PORT is a model-agnostic plug-and-play module for pose refinement under occlusion that can be plugged into any keypoint detector with substantially low computational costs. We conducted extensive experiments to demonstrate the advantage of PORT mitigating the occlusion on the hand and body pose PORT improves the pose estimation accuracy of existing human pose estimation methods by up to 16% with only 5% of additional parameters.w

Publication
In proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2023

In proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2023

Seunggeun Chi
Seunggeun Chi
Ph.D. student | Research Assistant

My research interests include distributed robotics, mobile computing and programmable matter.