Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Visual Computing

Non-Rigid Structure-from-Motion

Reconstructing the 3D shape of a deformable object from a monocular image sequence is a challenging problem, because multiple shape configurations can produce the same image projection. We address the problem of reconstructing volumetric non-rigid 3D geometries under full perspective projection by employing a 3D template model of the object in rest pose. Volumetric non-rigid reconstruction is even more challenging than the reconstruction of planar-like surfaces, because only the front part of the object surface is visible in the image, while the back part and the interior have to be inferred without direct image information. As the object deforms in front of a single camera, the non-rigid shape is reconstructed sequentially: for each frame, the camera parameters and the deformation with respect to the template model are estimated in an optimization framework.
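
The projection ambiguity mentioned above can be illustrated with a minimal NumPy sketch. The intrinsic matrix `K`, the point coordinates, and the helper `project` are illustrative assumptions and are not taken from our implementation. Each point of a first shape is moved along its own viewing ray, which yields a different 3D shape with an identical perspective projection.

```python
import numpy as np

def project(points, K):
    """Perspective projection of Nx3 camera-space points with intrinsics K."""
    homog = points @ K.T
    return homog[:, :2] / homog[:, 2:3]

# Assumed pinhole intrinsics (focal length 800 px, principal point at 320, 240)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Three points of a hypothetical shape, given in camera coordinates
shape_a = np.array([[0.1, 0.2, 2.0],
                    [-0.3, 0.1, 2.5],
                    [0.0, -0.2, 3.0]])

# Slide every point along its viewing ray by an individual positive factor
depth_scale = np.array([[1.5], [0.8], [2.0]])
shape_b = shape_a * depth_scale

pixels_a = project(shape_a, K)
pixels_b = project(shape_b, K)

# The two shapes differ in 3D but are indistinguishable in the image
print(np.allclose(pixels_a, pixels_b))
print(np.allclose(shape_a, shape_b))
```

A single image therefore constrains each visible point only up to its depth, which is why a prior such as the rest-pose template is needed.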

Volumetric Deformable Structure from Motion

Non-rigid structure from motion (NRSfM) plays an important role in computer vision applications such as human-computer interaction, motion capture, and tracking. Without knowledge of a shape deformation model, this task is severely under-constrained, because multiple shape configurations can project to the same image location. Our approach builds on volumetric template-based approaches and can be divided into two components:

  1. Template computation of rest pose
  2. Estimation of camera and shape deformation


First, we compute a 3D template model of the rest pose from a multi-view image set with traditional rigid reconstruction techniques. The template serves as a geometric and topological prior for the NRSfM task: the template model is modified to satisfy the constraints imposed by each new input image, which depicts the object in a deformed state. The energy function to be minimized comprises two main terms: one accounts for the data fitting, the other controls the smoothness of the deformation. The data fitting term enforces that specific 3D surface points project to the correct image locations and penalizes volume configurations that project outside the object silhouette. The deformation is regularized in three ways, by taking into account temporal smoothness, surface smoothness, and volume preservation.
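
The structure of such an energy can be sketched as follows. This is a simplified illustration under stated assumptions, not our actual formulation: the quadratic form of every term, the weights, the tetrahedral mesh representation, the silhouette distance map `sil_dist` (zero inside the silhouette, positive outside), and all function and parameter names are hypothetical.

```python
import numpy as np

def project(points, K):
    """Perspective projection of Nx3 camera-space points with intrinsics K."""
    homog = points @ K.T
    return homog[:, :2] / homog[:, 2:3]

def tet_volumes(verts, tets):
    """Signed volume of every tetrahedron (tets holds 4 vertex indices per row)."""
    a, b, c, d = (verts[tets[:, i]] for i in range(4))
    return np.einsum("ij,ij->i", np.cross(b - a, c - a), d - a) / 6.0

def energy(verts, prev_verts, template, tets, edges, K,
           obs_idx, obs_px, sil_dist, weights=(1.0, 1.0, 0.1, 0.1, 10.0)):
    w_pt, w_sil, w_time, w_surf, w_vol = weights  # assumed weights
    px = project(verts, K)

    # Data term 1: tracked surface points should reproject onto their observations
    e_pt = np.sum((px[obs_idx] - obs_px) ** 2)

    # Data term 2: penalise vertices that project outside the silhouette
    height, width = sil_dist.shape
    cols = np.clip(np.rint(px[:, 0]).astype(int), 0, width - 1)
    rows = np.clip(np.rint(px[:, 1]).astype(int), 0, height - 1)
    e_sil = np.sum(sil_dist[rows, cols] ** 2)

    # Regulariser 1: temporal smoothness with respect to the previous frame
    e_time = np.sum((verts - prev_verts) ** 2)

    # Regulariser 2: neighbouring vertices should be displaced similarly
    disp = verts - template
    e_surf = np.sum((disp[edges[:, 0]] - disp[edges[:, 1]]) ** 2)

    # Regulariser 3: each tetrahedron should keep its rest-pose volume
    e_vol = np.sum((tet_volumes(verts, tets) - tet_volumes(template, tets)) ** 2)

    return (w_pt * e_pt + w_sil * e_sil + w_time * e_time
            + w_surf * e_surf + w_vol * e_vol)

# Toy usage: a single tetrahedron as template, observed without deformation
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
template = np.array([[0.0, 0.0, 3.0], [1.0, 0.0, 3.0],
                     [0.0, 1.0, 3.0], [0.0, 0.0, 4.0]])
tets = np.array([[0, 1, 2, 3]])
edges = np.array([[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]])
obs_idx = np.array([0, 1])
obs_px = project(template, K)[obs_idx]
sil_dist = np.zeros((100, 100))  # everything counts as inside the silhouette

deformed = template.copy()
deformed[3] += np.array([0.2, 0.0, 0.3])

e_rest = energy(template, template, template, tets, edges, K, obs_idx, obs_px, sil_dist)
e_def = energy(deformed, template, template, tets, edges, K, obs_idx, obs_px, sil_dist)
print(e_rest, e_def)
```

In this toy setup the undeformed template has zero energy, while moving a vertex that is not covered by any observation is penalized only by the three regularizers. This is the role they play for the unseen back side and interior of the object.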

A template model for the jointed doll (first column), generated from a rigid multi-view sequence, is modified to be consistent with the monocular image sequence (second column). At each time instant, the camera parameters are estimated from rigid correspondences in the background (here: the book).
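
One common way to recover a camera pose from a rigid planar background object such as the book is to decompose a plane-to-image homography. The sketch below shows this standard technique; it is not necessarily the procedure used in our pipeline. It assumes known intrinsics `K`, noise-free correspondences, and background points lying in the plane Z = 0. All names and values are illustrative.

```python
import numpy as np

def homography_dlt(plane_xy, pixels):
    """Estimate the 3x3 homography from plane points (X, Y) to pixels (>= 4 pairs)."""
    rows = []
    for (X, Y), (u, v) in zip(plane_xy, pixels):
        rows.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y, -u])
        rows.append([0, 0, 0, X, Y, 1, -v * X, -v * Y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)  # null vector of the stacked constraints

def pose_from_plane(plane_xy, pixels, K):
    """Recover rotation R and translation t of the plane Z = 0 relative to the camera."""
    H = homography_dlt(plane_xy, pixels)
    B = np.linalg.inv(K) @ H              # proportional to [r1 r2 t]
    B = B / np.linalg.norm(B[:, 0])       # fix the scale so that r1 has unit length
    if B[2, 2] < 0:                       # the plane must lie in front of the camera
        B = -B
    r1, r2, t = B[:, 0], B[:, 1], B[:, 2]
    R_approx = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R_approx)    # project onto the nearest rotation matrix
    return U @ Vt, t

# Synthetic check with an assumed camera and a 0.2 m x 0.3 m book cover
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
ax, ay = 0.3, -0.2
Rx = np.array([[1, 0, 0], [0, np.cos(ax), -np.sin(ax)], [0, np.sin(ax), np.cos(ax)]])
Ry = np.array([[np.cos(ay), 0, np.sin(ay)], [0, 1, 0], [-np.sin(ay), 0, np.cos(ay)]])
R_true = Ry @ Rx
t_true = np.array([0.05, -0.02, 1.0])

plane_xy = np.array([[0.0, 0.0], [0.2, 0.0], [0.2, 0.3], [0.0, 0.3], [0.1, 0.15]])
plane_3d = np.column_stack([plane_xy, np.zeros(len(plane_xy))])
cam_pts = plane_3d @ R_true.T + t_true
homog = cam_pts @ K.T
pixels = homog[:, :2] / homog[:, 2:3]

R_est, t_est = pose_from_plane(plane_xy, pixels, K)
print(np.allclose(R_est, R_true, atol=1e-6), np.allclose(t_est, t_true, atol=1e-6))
```

With real, noisy correspondences, the point coordinates would typically be normalized before the DLT step and the resulting pose refined by minimizing the reprojection error.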



These videos show an evaluation of our NRSfM approach on the galloping horse data set by Robert Sumner.



P. Fechteler, L. Kausch, A. Hilsmann, P. Eisert
Animatable 3D Model Generation from 2D Monocular Visual Data, Proc. IEEE International Conference on Image Processing (ICIP), Athens, Greece, October 2018.

L. Kausch, A. Hilsmann, P. Eisert
Template-Based 3D Non-Rigid Shape Estimation from Monocular Image Sequences, Proc. Vision, Modeling, and Visualization (VMV), Bonn, Germany, September 2017.