Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Visual Computing


The research of the group comprises the following areas:


Our latest results can also be found on our YouTube channel.


Seeing, Modeling, and Animating Humans


Neural Face Modelling and Animation

We present a practical framework for the automatic creation of animatable human face models from calibrated multi-view data. Using deep neural networks, we combine classical computer graphics models with image-based animation techniques. From captured multi-view video footage, we learn a compact latent representation of facial expressions by training a variational auto-encoder on textured mesh sequences.
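As a rough illustration of the variational auto-encoder mentioned above, the following minimal sketch shows the generic VAE ingredients: the reparameterization trick and the KL regularizer towards a standard normal prior. It assumes that an encoder (not shown) maps a flattened textured-mesh frame to `mu` and `log_var` and that a decoder (not shown) produces `x_rec`. The function names, the squared-error reconstruction term and the `beta` weight are illustrative choices, not the group's actual network or loss.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def vae_loss(x, x_rec, mu, log_var, beta=1.0):
    """Squared reconstruction error plus a beta-weighted KL term (hypothetical weighting)."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    return reconstruction + beta * kl_to_standard_normal(mu, log_var)


# Usage: a 2-D latent code for one (hypothetical) encoded frame.
z = reparameterize([0.1, -0.3], [-1.0, -2.0], rng=random.Random(0))
print(len(z), kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # prints "2 0.0"
```

Sampling through `reparameterize` keeps the randomness in `eps`, so gradients can flow through `mu` and `log_var` when the same computation is written in an autodiff framework.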


Interactive Volumetric Video

We present a pipeline for creating high-quality, animatable and alterable volumetric video content that exploits captured high-quality real-world data as much as possible, since such data contains natural deformations and characteristics. Its key features are the supplementation of captured data with semantics and animation properties and the use of geometry- and video-based animation methods that permit direct animation. We suggest a three-tiered solution: modeling low-resolution features (e.g., coarse movements) with geometry, overlaying video-based textures to capture subtle movements and fine features, and synthesizing traditionally neglected features (e.g., eyes) with an autoencoder-based approach.


Texture Based Facial Animation/Re-Targeting

Generating photorealistic facial animations is still a challenging task in computer graphics. Image-based methods achieve a high level of realism but do not offer the same animation flexibility as computer graphics models. On the other hand, working with computer graphics models requires experienced animators as well as high-quality rigged 3D models to achieve the desired visual quality. We developed methods that combine high-quality textures with approximate geometric models to achieve the visual quality of image-based methods while retaining the flexibility of computer graphics models, which allow for rigid transforms and deformation.


Hybrid Facial Capture and Animation

High-quality capture, tracking and animation of human faces is a challenging task. We are developing new models that do not follow the 'one size fits all' paradigm. Instead, we use linear and non-linear models to describe facial geometry (identity) and deformation (expression). Additionally, image-based rendering techniques are used to model complex regions such as the eyes, mouth and lips.


Modeling and Capturing 3D Shape and Motion of Humans

We investigate compact 3D representations to model captured human characters. Our model is based on a combination of Linear Blend Skinning and Dual Unit Quaternion Skinning. We also investigate an optimization framework that automatically computes all components of the model from example data, without any user interaction or complex parameter specification.
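Linear Blend Skinning, one of the two components named above, deforms each vertex by a weighted sum of bone transforms, v' = Σ_j w_j (R_j v + t_j). The sketch below shows only this standard LBS formula in plain Python. The bone rotations, translations and weights are hypothetical inputs (in the framework described above they would come from the optimization over example data), and the Dual Quaternion Skinning part, which blends rigid transforms as dual quaternions to reduce the volume-collapse ("candy-wrapper") artifacts of LBS, is not shown.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def linear_blend_skinning(vertex, weights, rotations, translations):
    """Return sum_j w_j * (R_j @ vertex + t_j); weights are assumed to sum to 1."""
    out = [0.0, 0.0, 0.0]
    for w, R, t in zip(weights, rotations, translations):
        rotated = mat_vec(R, vertex)
        for i in range(3):
            out[i] += w * (rotated[i] + t[i])
    return out


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT_Z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  # 90 degrees about the z-axis

# A vertex influenced equally by a static bone and a bone rotated about z.
print(linear_blend_skinning([1.0, 0.0, 0.0], [0.5, 0.5],
                            [IDENTITY, ROT_Z_90], [[0, 0, 0], [0, 0, 0]]))
# prints [0.5, 0.5, 0.0]
```

The blended vertex has length of about 0.71 instead of 1, a small instance of the shrinking effect that dual quaternion blending is designed to avoid.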


Reconstruction and Rendering of the Human Head

Our algorithm for reconstructing the human head in high detail from calibrated images is passive, i.e., it uses neither projected patterns (as structured-light approaches do) nor photometric normals. We aim to recover not only fine details of the face but also the appearance and some geometric detail of the subject’s hair, thereby capturing the complete human head. This is challenging due to the intricate geometry of hair and its complex interaction with light.


Scenes, Structure and Motion


Deep 6 DoF Object Detection and Tracking

Displaying local assistance information in augmented reality (AR) systems to support assembly tasks requires precise object detection and registration. We present an optimized deep neural network that estimates the 6-DoF poses of multiple different objects in a single pass. A pixel-wise object segmentation in the middle of the network provides the input that controls a feature recognition module. Our pipeline combines global pose estimation with a precise local real-time registration algorithm, and only synthetic images are used for training.


Deep 3D Trajectories of Vehicles

We develop advanced deep-learning-based approaches for estimating the 3D pose and trajectories of vehicles for vehicle behavior prediction. Our research addresses several challenges, e.g., 3D estimation from monocular images with scarce training data under different viewing angles. In particular, one goal is to monitor traffic through a network of cameras communicating with a central node, which analyzes the trajectories of the traffic participants in order to predict dangerous situations and warn the individuals involved about the imminent danger.


Non-Rigid Structure-from-Motion

Reconstructing the 3D shape of a deformable object from a monocular image sequence is a challenging task. We address the problem of reconstructing volumetric non-rigid 3D geometries under full perspective projection by employing a 3D template model of the object in rest pose.
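Under full perspective projection, each vertex of the deformed template must project onto its observed image position. The following sketch shows a pinhole projection and the resulting squared reprojection error that a template-based non-rigid reconstruction could minimize over the vertex positions. The intrinsics `f`, `cx`, `cy` and the assumption of known 2D correspondences are illustrative and not taken from the method described above.

```python
def project(point, f, cx, cy):
    """Pinhole projection of a camera-space 3D point (X, Y, Z) with Z > 0."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (f * X / Z + cx, f * Y / Z + cy)


def reprojection_error(vertices, observations, f, cx, cy):
    """Sum of squared pixel distances between projected vertices and 2D observations."""
    total = 0.0
    for vertex, (u, v) in zip(vertices, observations):
        pu, pv = project(vertex, f, cx, cy)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


template = [(0.0, 0.0, 5.0), (1.0, 2.0, 2.0)]
print(project(template[1], f=100.0, cx=0.0, cy=0.0))  # prints (50.0, 100.0)
```

The division by `Z` is what distinguishes full perspective from the affine camera models often used in non-rigid structure-from-motion; it makes the error non-linear in the vertex depths.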


Reflection Analysis for Face Morphing Attack Detection

A facial morph is a synthetically created face image that resembles two different individuals so closely that even facial identification systems cannot distinguish between the person in the synthetic image and either of the two individuals. We study how generating such a morph affects the physical correctness of the illumination in order to reveal this kind of fraud. Morphed face images often contain implausible highlights because the original images differ in face geometry and lighting: the shape and position of the highlights do not fit the geometry of the face and/or the illumination settings.
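The physical argument above can be illustrated with a per-point consistency test: a specular highlight is only plausible where the light direction, mirrored about the surface normal, points (almost) towards the camera. The sketch assumes a single distant point light, a known surface normal (for example from a fitted face model) and a known view direction; the threshold `max_angle_deg` is hypothetical, and this is not the published detector.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(a):
    n = math.sqrt(_dot(a, a))
    return [x / n for x in a]


def reflect(light_dir, normal):
    """Mirror direction r = 2 (n . l) n - l for unit vectors l (towards the light) and n."""
    d = _dot(normal, light_dir)
    return [2.0 * d * n - l for n, l in zip(normal, light_dir)]


def highlight_is_plausible(normal, light_dir, view_dir, max_angle_deg=10.0):
    """True if the mirrored light direction deviates from the view direction
    (surface towards camera) by at most max_angle_deg degrees."""
    r = reflect(_normalize(light_dir), _normalize(normal))
    cos_angle = max(-1.0, min(1.0, _dot(r, _normalize(view_dir))))
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg


# Light from the upper right, camera at the upper left of a flat patch.
print(highlight_is_plausible([0, 0, 1], [1, 0, 1], [-1, 0, 1]))  # prints True
```

In a morph, a highlight taken over from one source image could sit at a location where this test fails for the blended geometry, which is the kind of inconsistency the approach above exploits.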


Joint Tracking and Deblurring

Video tracking is an important task in many automated or semi-automated applications, such as cinematic post-production, surveillance or traffic monitoring. Most established video tracking methods fail or produce inaccurate estimates when the video contains motion blur, as they assume that the object always appears sharp. We developed a novel motion-blur-aware tracking method that estimates both the continuous motion of a rigid 3-D object with known geometry in a monocular video and the sharp object texture.
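The forward model behind blur-aware tracking can be illustrated in one dimension: a blurred observation is the average of the sharp texture over all positions the object passes through during the exposure. A blur-aware tracker estimates the motion and the sharp texture so that such a model reproduces the observed frames. The sketch assumes 1D signals, linear motion within one exposure and a discrete approximation of the exposure integral; it is not the 3D model-based method described above.

```python
import math


def motion_blur_1d(sharp, shift, samples=16):
    """Approximate b(x) = (1/T) * integral_0^T s(x - m(t)) dt for linear motion
    m(t) = shift * t / T by averaging `samples` shifted copies of the sharp signal
    (linear interpolation, clamped borders)."""
    n = len(sharp)
    blurred = []
    for x in range(n):
        acc = 0.0
        for k in range(samples):
            t = k / (samples - 1) if samples > 1 else 0.0
            pos = min(max(x - shift * t, 0.0), n - 1.0)
            i0 = int(math.floor(pos))
            i1 = min(i0 + 1, n - 1)
            a = pos - i0
            acc += (1.0 - a) * sharp[i0] + a * sharp[i1]
        blurred.append(acc / samples)
    return blurred


# A sharp edge moving 2 pixels during the exposure becomes a ramp.
print(motion_blur_1d([0, 0, 0, 1, 1, 1], shift=2, samples=3))
# prints values close to [0.0, 0.0, 0.0, 0.333, 0.667, 1.0]
```

A tracker that assumes sharp frames would try to match this ramp against a sharp edge and misplace it; modeling the blur explicitly lets motion and texture be estimated jointly.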


Sparse Tracking of Deformable Objects

Finding reliable and well-distributed keypoint correspondences between images of non-static scenes is an important task in computer vision. We present an iterative algorithm that improves a descriptor-based matching result by enforcing local smoothness. The optimization reduces the number of incorrect correspondences and significantly increases the total number of matches.
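The local smoothness idea can be sketched as a single outlier-rejection pass: on a smoothly deforming surface, a match should move roughly like its neighbors. The sketch compares each displacement with the median displacement of its `k` nearest matches; `k`, `max_dev` and the median criterion are hypothetical choices, and the iterative part of the algorithm above, which also adds new matches, is not shown.

```python
import math
import statistics


def filter_by_local_smoothness(matches, k=4, max_dev=3.0):
    """Keep matches whose displacement agrees with the median displacement of their
    k nearest neighboring matches (neighborhood measured in the first image).

    matches: list of ((x1, y1), (x2, y2)) keypoint correspondences.
    """
    kept = []
    for i, (p, q) in enumerate(matches):
        disp = (q[0] - p[0], q[1] - p[1])
        neighbors = sorted(
            (math.dist(p, other_p), j)
            for j, (other_p, _) in enumerate(matches) if j != i
        )[:k]
        if not neighbors:
            kept.append((p, q))
            continue
        dxs = [matches[j][1][0] - matches[j][0][0] for _, j in neighbors]
        dys = [matches[j][1][1] - matches[j][0][1] for _, j in neighbors]
        median_disp = (statistics.median(dxs), statistics.median(dys))
        if math.dist(disp, median_disp) <= max_dev:
            kept.append((p, q))
    return kept


# Nine matches translating by (5, 0) plus one inconsistent match.
inliers = [((x, y), (x + 5, y)) for x in (0, 10, 20) for y in (0, 10, 20)]
outlier = ((15, 15), (-5, 25))
print(len(filter_by_local_smoothness(inliers + [outlier])))  # prints 9
```

The median keeps the neighborhood estimate stable even when an outlier is among the neighbors, which is why the inliers next to the bad match survive.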


High-Resolution 3D Reconstruction

We developed a binocular stereo method that is optimized for reconstructing surface detail and exploits the high image resolutions of current digital cameras. Our method occupies a middle ground between stereo algorithms focused on depth layering of cluttered scenes and multi-view “object reconstruction” approaches, which require a higher view count. It is based on global non-linear optimization of continuous scene depth rather than discrete pixel disparities. We use a mesh-based data term for large images and a smoothness term with robust error norms to allow detailed surface geometry.
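A robust smoothness term penalizes small depth differences quadratically but grows only linearly across large discontinuities, so depth edges are not smeared out. As one common example of a robust error norm (the text does not specify which norms are used), the sketch below evaluates a Huber-penalized smoothness energy on a regular depth grid. The mesh-based data term is omitted, and `delta` is a hypothetical parameter.

```python
def huber(r, delta=1.0):
    """Huber penalty: quadratic for |r| <= delta, linear beyond (robust to depth edges)."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def smoothness_energy(depth, delta=1.0):
    """Sum of Huber penalties on horizontal and vertical depth differences of a 2D grid."""
    h, w = len(depth), len(depth[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                energy += huber(depth[y][x + 1] - depth[y][x], delta)
            if y + 1 < h:
                energy += huber(depth[y + 1][x] - depth[y][x], delta)
    return energy


# A depth jump of 10 costs 9.5 instead of 50 under a quadratic penalty.
print(smoothness_energy([[0.0, 10.0]]))  # prints 9.5
```

Because the penalty is continuous in the depth values, it fits an optimization over continuous scene depth rather than over discrete disparity labels.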


Shape from Texture

We present a shape-from-texture (SFT) formulation that is equivalent to a single-plane, multiple-view pose estimation problem under perspective projection. As in the classical SFT setting, we assume that the texture consists of one or more repeating texture elements, called texels, and that these texels are small enough to be modeled as planar patches. In contrast to the classical setting, we do not assume that a fronto-parallel view of the texture element is known a priori. Instead, we formulate the SFT problem akin to a Structure-from-Motion (SFM) problem, given n views of the same planar texture patch.


Computational Video


Video-Based Blood Flow Analysis

The extraction of heart rate and other vital parameters from video recordings of a person has attracted much attention in recent years. In our research, we examine the time differences between distinct spatial regions using remote photoplethysmography (rPPG) in order to extract the blood flow path through human skin tissue in the neck and face. The resulting blood flow path visualization corresponds to the physiologically defined path in the human body.


Pose-Space Image-based Rendering

Achieving photorealism by physically simulating material properties and illumination is still computationally demanding and extremely difficult. Instead of relying on physical simulation, we follow a different approach to photorealistic animation of complex objects, which we call Pose-Space Image-Based Rendering (PS-IBR). Our approach uses images as appearance examples to guide complex animation processes, thereby combining the photorealism of images with the ability to animate or modify an object.
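The idea of guiding animation with image examples indexed by pose can be illustrated by pose-space interpolation: appearance examples whose poses are close to the requested pose contribute most to the rendered result. The sketch uses normalized Gaussian kernel weights and per-pixel blending of pre-aligned images. This weighting scheme, the pose vectors and `sigma` are assumptions for illustration; PS-IBR is not claimed to work exactly this way.

```python
import math


def pose_space_blend(query_pose, example_poses, example_images, sigma=1.0):
    """Blend example images with normalized Gaussian weights of their pose distance
    to the query pose. Images are flat lists of pixel values of equal length."""
    weights = [
        math.exp(-sum((q - p) ** 2 for q, p in zip(query_pose, pose)) / (2.0 * sigma ** 2))
        for pose in example_poses
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query pose is too far from all examples for this sigma")
    weights = [w / total for w in weights]
    return [sum(w * img[i] for w, img in zip(weights, example_images))
            for i in range(len(example_images[0]))]


# Two appearance examples at joint angles 0 and 2 (hypothetical units).
print(pose_space_blend([1.0], [[0.0], [2.0]], [[0.0, 0.0], [10.0, 20.0]]))  # prints [5.0, 10.0]
```

Normalizing the weights makes the result reproduce an example almost exactly when the query pose coincides with it and blend smoothly in between.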


Surface Tracking and Interaction in Texture Space

We present a novel approach for assessing and interacting with surface tracking algorithms for video manipulation in post-production. As tracking inaccuracies are unavoidable, we enable the user to provide small hints to the algorithms instead of correcting erroneous results afterwards. Based on 2D mesh-warp-based optical flow estimation, we visualize results and provide tools for user feedback in a consistent reference system, texture space. In this space, accurate tracking results appear static, while errors are easily spotted as apparent intensity changes, which makes user interaction for improving tracking results more intuitive.


Learning and Inference


Anomaly Detection and Analysis

We develop deep-learning-based methods for the automatic detection and analysis of anomalies in images with a limited amount of training data. For example, we detect and localize different types of damage in large structures from images taken by unmanned vehicles. Challenges include the unclear definition of what constitutes damage and of its exact extent, the difficulty of obtaining and labeling high-quality data, the resulting lack of abundant training data, and the great variability in the appearance of the targets to be detected, including the underrepresentation of some damage types.


Detection of Face Morphing Attacks

Facial recognition systems can easily be tricked into authenticating two different individuals with the same tampered reference image. We develop methods for the fully automatic generation of such tampered face images (face morphs) as well as methods to detect face morphs. Our face morph detection methods are based on semantic image content, such as highlights in the eyes or the shape and appearance of facial features.


Biomedical Image Analysis


Multispectral Tissue Analysis

We address the automatic differentiation of human tissue using multispectral imaging, which has promising potential for automatic visualization during surgery. We develop and investigate several hyperspectral camera setups to monitor the different optical behavior of tissue types in vivo. The aim of this work is to collect and analyze these tissue behaviors for each setup in order to open up new optical possibilities during surgery.


Warp-based Motion Compensation for Endoscopic Kymography

Endoscopic video kymography is a method for visualizing the motion of the plica vocalis (vocal folds) for medical diagnosis. The diagnostic interpretability of a kymogram deteriorates if camera motion interferes with vocal fold motion, which is hard to avoid in practice. We propose an algorithm for compensating strong camera motion for video kymography based on an image-based inverse warping scheme that can be stated as an optimization problem.
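A kymogram is formed by taking the same image line from every video frame and stacking these lines over time, so that vocal fold oscillation appears as a pattern along the time axis. Camera motion moves the anatomy across that fixed line, which is why compensation has to happen before sampling. The sketch shows where compensation enters; it uses hypothetical integer horizontal `shifts` as a crude stand-in, whereas the method described above solves an image-based inverse warping problem.

```python
def kymogram(frames, row, shifts=None):
    """Stack one scan line (`row`) of every frame into a time x width image.

    frames: list of 2D images (lists of rows). shifts: optional per-frame integer
    horizontal offsets, a crude translational stand-in for motion compensation.
    """
    result = []
    for t, frame in enumerate(frames):
        line = frame[row]
        if shifts is not None:
            s, w = shifts[t], len(line)
            # Sample the line at x + s (clamped) to undo a rightward shift of s pixels.
            line = [line[min(max(x + s, 0), w - 1)] for x in range(w)]
        result.append(list(line))
    return result


frames = [
    [[0, 1, 2, 3, 4], [9] * 5],
    [[0, 0, 1, 2, 3], [9] * 5],  # content shifted right by 1 (camera motion)
    [[0, 0, 0, 1, 2], [9] * 5],  # shifted right by 2
]
print(kymogram(frames, row=0, shifts=[0, 1, 2]))
# prints [[0, 1, 2, 3, 4], [0, 1, 2, 3, 3], [0, 1, 2, 2, 2]]
```

After compensation, the first three columns are constant over time, so any remaining temporal variation in the kymogram would stem from the observed structure itself rather than from the camera.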


Augmented Reality


Tracking for Projector Camera Systems

We enable distortion-free projection with a fixed metric size for moving projector-camera systems by dynamically tracking the position and orientation of the projection plane solely from the distortion of the projection itself, independent of the presented content. This is achieved by extending an optical-flow-based model to the geometry of a projector-camera unit and by using adaptive edge images to achieve high invariance to illumination changes.


Virtual Clothing

We developed a virtual mirror prototype for clothes that uses a dynamic texture overlay method to change the color and a printed logo on a shirt while the user stands in front of the system wearing a prototype shirt with a line pattern. As when looking into a mirror while trying on clothes, the same impression is created, but for virtually textured garments. The mirror is replaced by a large display that shows the mirrored camera image, for example, of the upper part of a person's body.


Virtual Shoes

We have developed a Virtual Mirror for the real-time visualization of customized sports shoes. Similar to looking into a mirror when trying on new shoes in a shop, we create the same impression but for virtual shoes that the customer can design individually. We replace the real mirror by a large display that shows the mirrored input of a camera capturing the legs and shoes of a person. Real-time 3-D tracking of both feet and replacing the real shoes with computer graphics models gives the impression of actually wearing the virtual shoes.


Previous Research Projects


Near Regular Texture Analysis for Image-Based Texture Overlay

Image-based texture overlay or retexturing is the process of augmenting a surface in an image or a video sequence with a new, synthetic texture. We have developed a new method for image-based retexturing that preserves both texture distortion caused by projecting the surface into the image plane and the shading and reflection properties.
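Preserving shading during retexturing can be illustrated with a multiplicative model in which each output pixel is the new texture value scaled by a shading factor. The sketch assumes that the geometric warp of the new texture and a per-pixel shading map have already been estimated (both hypothetical inputs here) and ignores reflections and color; it shows only the final composition step, not the group's estimation method.

```python
def retexture(original, new_texture, shading, mask):
    """Replace masked pixels by the (already warped) new texture modulated by the
    shading map, keeping the original pixel elsewhere. All inputs are 2D lists of
    equal size; pixel values are gray levels in [0, 1]."""
    return [
        [t * s if m else o for o, t, s, m in zip(o_row, t_row, s_row, m_row)]
        for o_row, t_row, s_row, m_row in zip(original, new_texture, shading, mask)
    ]


original = [[0.2, 0.8], [0.8, 0.2]]
new_texture = [[1.0, 1.0], [0.5, 0.5]]
shading = [[0.5, 0.5], [1.0, 0.8]]
mask = [[True, False], [True, True]]
print(retexture(original, new_texture, shading, mask))  # prints [[0.5, 0.8], [0.5, 0.4]]
```

Because shading multiplies the new texture instead of being replaced by it, folds and shadows of the original surface remain visible in the augmented result.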


Near Regular Texture Synthesis

We developed a method to synthesize near-regular textures with a constrained random sampling approach. We treat the texture as regular and analyze the global regular structure of the input sample texture to estimate two translation vectors that define the size and shape of a texture tile. In a subsequent synthesis step, this structure is exploited to guide or constrain a random sampling process, so that random samples of the input are introduced into the output while preserving the regular structure. This retains the stochastic nature of the irregularities in the output while preserving the regular pattern of the input texture.
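Estimating the translation vectors of a near-regular texture amounts to finding the shifts under which the texture best matches itself. In one dimension this can be sketched with the autocorrelation of a pixel row: the smallest lag with maximal autocorrelation gives the tile width. The 1D simplification and the use of plain autocorrelation are assumptions for illustration; the method above estimates two 2D translation vectors from the input sample.

```python
def estimate_period(signal, min_lag=1, max_lag=None):
    """Return the smallest lag at which the mean-removed autocorrelation is maximal."""
    n = len(signal)
    mean = sum(signal) / n
    c = [v - mean for v in signal]
    if max_lag is None:
        max_lag = n // 2

    def autocorr(lag):
        # Normalize by the overlap length so that long lags are not penalized.
        return sum(c[i] * c[i + lag] for i in range(n - lag)) / (n - lag)

    scores = {lag: autocorr(lag) for lag in range(min_lag, max_lag + 1)}
    best = max(scores.values())
    # Multiples of the period score (almost) equally; pick the smallest one.
    return min(lag for lag, s in scores.items() if s >= best - 1e-9 * abs(best))


row = [3, 1, 0, 1, 2] * 8  # a 1D "texture" with a tile width of 5 pixels
print(estimate_period(row))  # prints 5
```

For a near-regular texture the peaks are lower and broader than in this perfectly periodic example, which is why a tolerance, rather than exact equality, decides between candidate lags.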


Joint Estimation of Deformation and Shading for Dynamic Texture Overlay

We developed a dynamic texture overlay method to augment non-rigid surfaces in single-view video. We are particularly interested in retexturing non-rigid surfaces whose deformations are difficult to describe, such as moving cloth. Our approach to augmenting a piece of cloth in a real video sequence is completely image-based and does not require any 3-dimensional reconstruction of the cloth surface, as we are interested in convincing visualization rather than accurate reconstruction.


Efficient Video Streaming for Highly Interactive 3D Applications

Remote visualization of interactive 3D applications with high quality and low delay is a long-standing goal. One approach to enabling the ubiquitous use of 3D graphics applications, even on computationally weak end devices, is to execute the application on a server and transmit the audio-visual output as a video stream to the client. In contrast to video broadcast, interactive applications like computer games require extremely low end-to-end transmission delay. In this work, we use an enhanced H.264 video codec for efficient, low-delay video streaming. The 3D render context is used to speed up encoding.


Semi-Automated Segmentation of Human Head Portraits

In this project, a system for the semi-automated segmentation of frontal human head portraits from arbitrary, unknown backgrounds has been developed. The first, fully automated processing stage computes an initial segmentation by combining facial feature information with color models derived from the image and a learned parametric head shape model. Precise corrections can then be applied by the user with minimal effort in an interactive refinement step.


Image-based Rendering of Faces

In this work, image interpolation from previous views of a video sequence is exploited to realistically render human heads, including hair. A 3D model-based head tracker analyzes human motion, and video frames showing different head poses are subsequently stored in a database. New views are rendered by interpolating from frames in the database that show a similar pose. Remaining errors are corrected by warping with an approximate 3D model. The same model is used to warp the eye and mouth regions of the current camera frame in order to show facial expressions.


3D Reconstruction from Turntable Sequences

A method for improving geometric accuracy in shape-from-silhouette frameworks is presented. For the particular case of turntable scenarios, an optimization scheme has been developed that minimizes silhouette deviations, which correspond to shape errors. Experiments have shown that the silhouette error can be reduced by a factor of more than 10 even after an already quite accurate camera calibration step. The quality of an additional texture mapping can also be drastically improved, making the proposed scheme applicable as a preprocessing step in many different 3-D multimedia applications.
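The silhouette deviation minimized in the turntable setting can be measured, for example, as the number of pixels in which the silhouette of the projected model and the observed silhouette disagree. The sketch computes this XOR area summed over all turntable views. The pixel-count measure is an assumption for illustration, and the optimizer that adjusts shape or camera parameters to reduce it is not shown.

```python
def silhouette_error(projected, observed):
    """Number of pixels where two binary masks disagree (area of their XOR)."""
    return sum(
        1
        for p_row, o_row in zip(projected, observed)
        for p, o in zip(p_row, o_row)
        if bool(p) != bool(o)
    )


def total_silhouette_error(projected_views, observed_views):
    """Silhouette deviation summed over all turntable views."""
    return sum(silhouette_error(p, o) for p, o in zip(projected_views, observed_views))


observed = [[0, 1, 1, 0], [0, 1, 1, 0]]
projected = [[0, 1, 1, 1], [0, 1, 1, 0]]  # shape error: one extra pixel
print(total_silhouette_error([projected], [observed]))  # prints 1
```

A pixel count like this is piecewise constant in the parameters, so a practical optimizer would typically rely on a smoother surrogate (for example a distance-transform based measure); this too is a design assumption, not part of the described method.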


Facial Expression Analysis from Monocular Video

A 3D model-based approach for the estimation of facial expressions from monocular video is presented. The deformable motion in the face is estimated using the optical flow constraint in a hierarchical analysis-by-synthesis framework. A generic face model parameterized by facial animation parameters according to the MPEG-4 standard constrains the motion in the video sequences. Additionally, illumination is estimated to robustly deal with illumination changes. The estimated expression parameters can be applied to other models to allow for expression cloning or face morphing.
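The optical flow constraint states that, for small motion, the spatial and temporal image derivatives satisfy I_x u + I_y v + I_t = 0. In the framework described above, the motion is parameterized by MPEG-4 facial animation parameters; the sketch below shows only the simplest instance, a single 2D translation (u, v) solved in the least-squares sense over a window, as in Lucas-Kanade. Precomputed derivatives are assumed as input.

```python
def flow_from_gradients(Ix, Iy, It):
    """Least-squares (u, v) satisfying Ix*u + Iy*v + It = 0 over all pixels of a window."""
    a = sum(x * x for x in Ix)
    b = sum(x * y for x, y in zip(Ix, Iy))
    c = sum(y * y for y in Iy)
    p = -sum(x * t for x, t in zip(Ix, It))
    q = -sum(y * t for y, t in zip(Iy, It))
    det = a * c - b * b
    if abs(det) < 1e-12:
        raise ValueError("degenerate gradients (aperture problem)")
    # Solve [[a, b], [b, c]] @ [u, v] = [p, q] with Cramer's rule.
    return (c * p - b * q) / det, (a * q - b * p) / det


Ix = [1.0, 0.0, 1.0, 2.0]
Iy = [0.0, 1.0, 1.0, -1.0]
It = [-1.0, 2.0, 1.0, -4.0]  # generated from the motion (u, v) = (1, -2)
print(flow_from_gradients(Ix, Iy, It))  # prints (1.0, -2.0)
```

With a parametric face model, the two unknowns (u, v) would be replaced by the model parameters, and each pixel's flow would be expressed through the model's linearized motion, giving a larger but analogous least-squares system.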