Accurate 3D Pose Estimation From a Single Depth Image
 HP Labs, Palo Alto
 Bosch Research
 ETH Zürich
Figure 1. Example estimation results from a pose tracking algorithm ((a) and (c)) and from our method ((b) and (d)), on depth images captured by a Kinect.
We present a novel system to estimate body pose configuration from a single depth map. It combines both pose detection and pose refinement. The input depth map is matched against a set of pre-captured motion exemplars to produce an initial body configuration estimate, together with a semantic labeling of the input point cloud. The initial estimate is then refined by directly fitting the body configuration to the observation (i.e., the input depth). Beyond the new system architecture, our contributions include a modified point-cloud smoothing technique that copes with very noisy input depth maps, and an efficient, view-independent point-cloud alignment and pose-search algorithm. Experiments on a public dataset show that our approach achieves significantly higher accuracy than previous state-of-the-art methods.
Given a point cloud, we first remove irrelevant objects based on distance information, using two fixed distance thresholds that bracket the distance range of interest throughout our tests. A modified surface reconstruction algorithm is then applied to remove noise. The cleaned point cloud is transformed into a canonical coordinate frame to remove viewpoint dependency, and a similar pose is identified in our motion database. A refined pose configuration is then estimated through non-rigid registration between the input and the depth map rendered for the corresponding pose. We rely on database exemplars and a shape completion method to handle large occlusions, i.e., missing body parts. Finally, a failure detection and recovery mechanism exploits temporal information to handle occasional failures in the previous steps.
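The distance-based filtering at the start of the pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: the pinhole intrinsics, threshold values, and function names are assumptions chosen for the example, with depths in millimeters.

```python
import numpy as np

def depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Back-project a depth map (H x W, in mm) into an (N, 3) point cloud
    with a simple pinhole model. Intrinsics here are illustrative defaults,
    roughly in the range of a Kinect-class sensor."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def filter_by_distance(points, z_near=500.0, z_far=3000.0):
    """Keep only points whose depth lies inside the fixed range of interest,
    discarding background clutter and invalid (zero-depth) pixels."""
    z = points[:, 2]
    mask = (z >= z_near) & (z <= z_far)
    return points[mask]

# Toy 2x2 depth map: only the 1500 mm pixel falls in the range of interest.
depth = np.array([[400.0, 1500.0],
                  [5000.0, 0.0]])
cloud = depth_to_points(depth)
subject = filter_by_distance(cloud)
```

The subsequent stages (denoising, canonical alignment, database search, non-rigid registration) would then operate on the filtered `subject` points.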
1. Quantitative comparison with the HC+EP method on a publicly available dataset
Overall mean error: 38 mm (ours) vs. 100 mm (HC+EP)
2. Qualitative comparison with OpenNI
Accurate 3D Pose Estimation from a Single Depth Image (pdf, video, poster)
Mao Ye, Xianwang Wang, Ruigang Yang, Liu Ren, Marc Pollefeys
International Conference on Computer Vision, 2011
Clarification: In Table 1 of this paper, the numbers for our method and for the HC+EP method were actually obtained on different datasets. Our method is tested on the publicly available dataset, while the HC+EP method is tested on their synthetic data, which has less noise but a greater variety of poses. The comparison in Table 1 of the paper is therefore not entirely appropriate.
V. Ganapathi, C. Plagemann, D. Koller, and S. Thrun. Real time motion capture using a single time-of-flight camera. CVPR 2010.
PrimeSense. OpenNI. http://www.openni.org/
J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time human pose recognition in parts from a single depth image. CVPR 2011.
This work is supported in part by the University of Kentucky Research Foundation, US National Science Foundation awards IIS-0448185, CPA-0811647, and MRI-0923131, Microsoft's ETH-EPFL Innovation Cluster for Embedded Software (ICES), as well as the EC's FP7 European Research Council grant 4DVIDEO (no. 210806).