Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Visual Computing

Sparse Tracking of Deformable Objects

Sparse keypoint correspondences are needed for many Computer Vision applications like 3D reconstruction, image-based rendering, tracking, augmented reality or object detection. The task is particularly challenging, if the input data contains dynamic content. The most common approach to address this problem is to employ a keypoint detector and descriptor like SIFT. However, depending on factors like baseline, texturization, scene depth and accuracy of nearest neighbor calculation, the distribution and reliability of the results vary significantly (second illustration in the image below). One typical approach to improve results is to recover the underlying global model, e.g. the fundamental matrix, by applying a RANSAC method on the initial matching output and using it to remove outliers. However, this method can only capture global rigid motion and has high sensitivity to noise (third illustration in the image below).


Non-Rigid Dynamic Input Images; Basic SIFT Matching (11621 matches); Fundamental Matrix Filtering (10282 matches); Our Method (15960 matches)


Therefore, our approach tackles the problem of sparse tracking of dynamic image content by applying a local model. Basic descriptor matching is used to identify match candidates and a selection of initial correspondences. For the points associated this initialization, a dynamic Delaunay triangulation introduces a natural neighborhood and affine transformations between the triangles. Based on affine similarities to its neighborhood in the current selection, each match (candidate) can be evaluated. An iterative optimization procedure maximizes the number of selected correspondences while retaining local affinity between the neighbors. For all changes on the selection, an extended version of the dynamic Delaunay triangulation enables fast and localized recalculations of the affinities. The optimization results in a decreased number of incorrect correspondences and a significant increase in the total number of matches (fourth illustration in the image above). While not being as fast and simple as outlier rejection using epipolar geometry, the runtime of our approach is still by far governed by the keypoint extraction and basic matching process. The only assumption we make is local smoothness of the scene content between the images.


Our iterative optimization ran on the images above. The mesh is calculated for the first image and and shown for the second one. The left mesh represents the initial selection, the center one the end of filtering and the right one the final solution.





Johannes Furch and Peter Eisert
"An Iterative Method for Improving Feature Matches", Third Joint 3DIM/3DPVT Conference (3DV), Seattle, USA, 2013. [PDF] [VIDEO]

Johannes Furch and Peter Eisert
"Robust Key Point Matching for Dynamic Scenes", 9th European Conference on Visual Media Production (CVMP), London, UK, 2012. [PDF]