Humboldt-Universität zu Berlin - Faculty of Mathematics and Natural Sciences - Department of Computer Science

Rehearsal talk for the doctoral defence: Yanhua Zhao

"Gesture Recognition based on mm-Wave Radar Technology"

The talk will be held in English at RUD 25, Humboldt-Kabinett, room 3.116. An abstract is appended at the end of this email.
Online participation (with 'best effort' quality) is possible via the following link:



Human-computer interaction has become part of our daily lives in recent years. We are used to
contact-based human-computer interaction, such as using a mouse and keyboard. If we can
dispense with this kind of contact-based medium, however, we gain more freedom and
convenience. Radar stands out as a very promising sensor because of its small size, low power
consumption, and affordability. Compared to other sensors, such as cameras and LIDAR, radar
works in a variety of environments and is not affected by lighting conditions. Most
importantly, it does not put the user's privacy at risk. Among the many types of radar, FMCW
radar is used for gesture recognition because it can observe multiple targets and measure
their range, velocity and angle.
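As a rough illustration of how an FMCW radar yields range and velocity, the following Python
sketch simulates the beat signal of a single ideal point target and recovers both quantities
with a two-dimensional FFT (a range-Doppler map). The carrier frequency, bandwidth, chirp
timing and all function names are assumptions chosen for the example and are not taken from
the thesis; angle estimation, noise and multiple antennas are left out.

```python
import numpy as np

C = 3e8  # speed of light in m/s (rounded for the example)

def simulate_point_target(r, v, fc, bandwidth, chirp_time, num_chirps, num_samples):
    """Ideal complex beat signal of one target at range r (m) and radial velocity v (m/s)."""
    wavelength = C / fc
    n = np.arange(num_samples)            # fast-time sample index within a chirp
    m = np.arange(num_chirps)[:, None]    # slow-time chirp index
    # Beat frequency 2*S*r/C with slope S = bandwidth/chirp_time, in cycles per sample.
    beat_cycles_per_sample = 2 * bandwidth * r / (C * num_samples)
    # Doppler phase progression from chirp to chirp, in cycles per chirp.
    doppler_cycles_per_chirp = 2 * v * chirp_time / wavelength
    return np.exp(2j * np.pi * (beat_cycles_per_sample * n + doppler_cycles_per_chirp * m))

def range_doppler_map(frame):
    """Magnitude of the 2-D FFT; rows are Doppler bins (zero centred), columns are range bins."""
    range_fft = np.fft.fft(frame, axis=1)
    doppler_fft = np.fft.fftshift(np.fft.fft(range_fft, axis=0), axes=0)
    return np.abs(doppler_fft)

def estimate_range_velocity(rd_map, fc, bandwidth, chirp_time):
    """Convert the strongest cell of the map into range (m) and velocity (m/s)."""
    num_chirps, _ = rd_map.shape
    d, k = np.unravel_index(np.argmax(rd_map), rd_map.shape)
    range_res = C / (2 * bandwidth)
    velocity_res = (C / fc) / (2 * num_chirps * chirp_time)
    return k * range_res, (d - num_chirps // 2) * velocity_res

# Assumed example parameters: 60 GHz carrier, 4 GHz sweep, 100 microsecond chirps.
params = dict(fc=60e9, bandwidth=4e9, chirp_time=1e-4)
frame = simulate_point_target(r=0.3, v=1.5625, num_chirps=32, num_samples=64, **params)
est_range, est_velocity = estimate_range_velocity(range_doppler_map(frame), **params)
print(est_range, est_velocity)
```

The target values in the usage lines are picked so that they fall exactly on FFT bins; a real
hand produces many scattering points that spread across neighbouring cells of the map.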
In this work, we first propose a novel data preprocessing approach that represents the
features of hand gestures as heat maps. A compressed sensing-based background modelling
approach is introduced to remove static background and noise from the radar data. Range,
velocity and angle features extracted from the radar data are combined into one heat map. The
machine learning model achieves an average recognition accuracy of 99.47% on the test set,
which is 5.58% higher than the accuracy obtained without background modelling.
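To make the idea of background removal concrete, the sketch below subtracts a per-pixel
temporal mean from a stack of heat maps. This is a deliberately simple stand-in and not the
compressed sensing-based background model described above; the function name and the toy
data are assumptions made for the example.

```python
import numpy as np

def remove_static_background(frames):
    """Subtract the per-pixel temporal mean from a (num_frames, height, width) stack.

    Simple mean-subtraction stand-in, NOT the compressed sensing-based method.
    Negative residuals are clipped to zero so that only added energy remains.
    """
    frames = np.asarray(frames, dtype=float)
    background = frames.mean(axis=0, keepdims=True)
    return np.clip(frames - background, 0.0, None)

# Toy data: constant clutter of 5.0 everywhere plus a reflection that moves each frame.
frames = np.full((4, 3, 3), 5.0)
positions = [(0, 0), (0, 2), (2, 0), (2, 2)]
for t, (row, col) in enumerate(positions):
    frames[t, row, col] += 10.0

cleaned = remove_static_background(frames)
print(cleaned[0])
```

Pixels that never change are driven to zero, while the moving reflection survives in the
frame where it appears.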
Machine learning models require large amounts of data to perform well, but collecting radar
data is tedious and time-consuming. Therefore, we propose two approaches to generating
synthetic datasets for a heat-map-based gesture recognition system. In the first approach, we
combine animation with a radar simulator. We start by constructing the hand animation in
Blender and applying appropriate constraints to the joints. Then the trajectory of the
animated movement is exported to the radar simulator to generate the raw radar data. Finally,
the feature maps are extracted from the raw data. In the second approach, a generative
adversarial network is employed. In contrast to the animation-based synthesis method, the
generative adversarial network-based method does not require the construction of hand
animations. Instead, it learns directly from a small amount of real data to produce a
synthetic dataset with high diversity. Experimental results indicate that both proposed
synthesis methods can effectively augment real datasets. The machine learning model can
achieve a very high recognition rate on real datasets when trained only on synthetic
datasets.
When considering the implementation of a gesture recognition system at the hardware level,
the limited hardware resources motivate us to design a low-complexity feature extraction
flow. The features extracted by the low-complexity algorithm are one-dimensional feature
vectors. Compared to a dataset in image format, the resulting dataset is smaller and the
complex convolutional network can be omitted. Experimental results show that using fewer
features does not reduce the recognition accuracy of the model on the test set. In addition,
the animation-based synthetic dataset generation method is simplified to match this
low-complexity algorithm. When the model is trained only on the synthetic dataset, it
achieves a recognition rate of 89.13% on the real dataset.
A gesture recognition system based on a single radar can achieve high recognition rates, but
its stability needs to be improved. Therefore, we extend the single-radar scenario to a
multi-radar scenario, in which the gesture features observed by all radars are fused. The
machine learning model achieves a higher recognition rate on the fused dataset than on the
single-radar dataset. If one of the radars fails or collects poor-quality data, the model
still recognises the gesture correctly from an incomplete feature map. This demonstrates the
stability and superiority of the multi-radar scenario.
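One possible way to picture fusion with a missing sensor is sketched below: the feature
vectors of all radars are concatenated, and the slot of a failed radar is filled with zeros
so that the fused input keeps a fixed length. The abstract does not specify the fusion
scheme, so the concatenation, the zero filling and the function name are assumptions made
purely for illustration.

```python
import numpy as np

def fuse_features(per_radar_features, feature_len):
    """Concatenate per-radar feature vectors into one fixed-length vector.

    per_radar_features: list with one 1-D array per radar, or None for a radar
    that delivered no usable data (hypothetical convention for this example).
    """
    slots = []
    for index, features in enumerate(per_radar_features):
        if features is None:
            # Failed radar: keep its slot, filled with zeros (incomplete feature map).
            slots.append(np.zeros(feature_len))
            continue
        features = np.asarray(features, dtype=float)
        if features.shape != (feature_len,):
            raise ValueError(f"radar {index}: expected {feature_len} features")
        slots.append(features)
    return np.concatenate(slots)

# Three radars with three features each; the second radar has failed.
fused = fuse_features([np.ones(3), None, 2 * np.ones(3)], feature_len=3)
print(fused)
```

Keeping the fused length constant means the same classifier input layer can be used whether
or not every radar is available.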