Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Visual Computing

Warp-based Motion Compensation for Endoscopic Kymography

The opening and closing of the vocal folds at high frequencies is a major source of sound in human speech. Video kymography is a technique for visualizing the motion of the vocal folds for medical diagnosis: the vibrating folds are filmed at a very high frame rate with an endoscopic camera pointed into the larynx. The kymogram used for diagnosis is essentially a time-slice image, i.e. an X-t cut through the X-Y-t image cube of the endoscopic video. The quality and diagnostic interpretability of a kymogram deteriorate significantly if the camera moves relative to the scene, as this motion interferes with the vibratory motion of the vocal folds in the kymogram. Scene-to-camera motion caused by the patient or the operator of the endoscope is hard to avoid in medical practice. In this work, we propose an approach to stabilizing endoscopic video for kymography.
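To make the time-slice idea concrete, the following minimal sketch extracts an X-t cut from a video stored as a NumPy array. The function name `kymogram`, the `(T, H, W)` array layout, and the synthetic test video are assumptions made for illustration; they are not taken from the original work.

```python
import numpy as np

def kymogram(video, row):
    """Return the X-t cut of a video at a fixed image row.

    video: array of shape (T, H, W) or (T, H, W, C), frames first (assumed layout).
    row:   index of the scanline (Y position) to follow over time.
    The result has shape (T, W) or (T, W, C): one scanline per frame, stacked over time.
    """
    video = np.asarray(video)
    if video.ndim not in (3, 4):
        raise ValueError("expected a (T, H, W) or (T, H, W, C) array")
    if not 0 <= row < video.shape[1]:
        raise IndexError("row lies outside the image")
    return video[:, row]

# Synthetic stand-in for an endoscopic sequence: 6 frames of 4 x 5 pixels.
T, H, W = 6, 4, 5
video = np.arange(T * H * W).reshape(T, H, W)
kymo = kymogram(video, row=2)
```

Each row of `kymo` is the same scanline taken from a later frame, so any camera motion between frames shifts the scanline relative to the anatomy and distorts the resulting pattern.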

This motion compensation problem is challenging and differs from motion compensation of handheld video in several respects. Firstly, because of the short distance between camera and scene, the camera motion to be eliminated may be significantly larger than typical camera shake. Secondly, not only the camera and the vocal folds move: the entire scene may be highly nonrigid, for example when the aryepiglottic fold and the cuneiform cartilage move as the patient takes a breath. Therefore, a 3D camera estimation approach is not possible throughout the entire endoscopic sequence. Finally, the image quality of the input material can be challenging. Depending on the endoscopic system, the algorithm has to cope with high noise levels, large areas of saturated highlights, interlacing artifacts, depth-of-field blur, false colors, etc. We therefore propose an algorithm that deviates from the typical feature-based approaches to motion compensation, but is nevertheless parallelizable and real-time capable even on the CPU. Our method uses an image-based inverse mesh warping approach that can be stated as an optimization problem and solved efficiently in a robust Gauss-Newton framework. The inverse warping yields a piecewise affine deformation field between two successive frames. From this motion field, a rigid image transformation can be computed to compensate for the camera motion.
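The text does not specify how the rigid transformation is derived from the motion field. As one possible illustration, the sketch below fits a 2D rotation and translation to a sampled motion field with the standard least-squares Procrustes (Kabsch) solution. The function name `fit_rigid_2d`, the sampling of the field at discrete points, and the synthetic data are assumptions; the robust weighting of the original Gauss-Newton framework is not reproduced here.

```python
import numpy as np

def fit_rigid_2d(points, flow):
    """Least-squares rigid fit (rotation R, translation t) to a 2D motion field.

    points: (N, 2) positions in the first frame, e.g. mesh vertices (assumed sampling).
    flow:   (N, 2) displacement of each point to the next frame.
    Minimizes sum ||R p + t - (p + flow)||^2 over rotations R and translations t.
    """
    src = np.asarray(points, dtype=float)
    dst = src + np.asarray(flow, dtype=float)
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    # 2x2 cross-covariance of the centered point sets.
    cov = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(cov)
    # Force a proper rotation (determinant +1), excluding reflections.
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, sign]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Synthetic motion field generated by a known rigid camera motion.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 100.0, size=(50, 2))
theta = 0.05
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([3.0, -2.0])
flow = pts @ R_true.T + t_true - pts

R_est, t_est = fit_rigid_2d(pts, flow)
# Compensation applies the inverse transform: x = R^T (x' - t).
stabilized = (pts + flow - t_est) @ R_est
```

On real data the motion field also contains the nonrigid vocal fold motion, so a plain least-squares fit like this one would likely need down-weighting of outliers before the estimated transform reflects camera motion only.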


Left: Vocal fold kymograms from two endoscopic sequences. Left column: no motion compensation. Center column: Deshaker compensation. Right column: proposed method. The wide openings occur where the patient takes a breath during the recording. Right: Structure computation results for two sequences. Artificial lighting was added to emphasize the 3D shape.





D. Schneider, A. Hilsmann, P. Eisert,
Warp-based Motion Compensation for Endoscopic Kymography, Proc. Eurographics, short paper, Llandudno, UK, pp. 48-49, Apr. 2011. [PDF]

D. Schneider, A. Hilsmann, P. Eisert,
Deshaking Endoscopic Video for Kymography, Siggraph 2011 poster, Vancouver, Canada, p. 83, Aug. 2011. [PDF]