Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Informatik

Doctoral defense talk: Arik Ermshaus

"Process Discovery from Time Series"

  • When: 14.01.2026, 15:00 to 16:30
  • Where: Rudower Chaussee 25, 12489 Berlin, Humboldt-Kabinett - 3.116, online

The doctoral defense talk will be held with the option of online participation upon request. Those interested should contact Prof. Leser by e-mail (ulf.leser@hu-berlin.de).

Abstract

Many biological, industrial, and physical processes, such as human activities, manufacturing, or earthquakes, are continually monitored by sensors and smart devices. This leads to an ever-growing amount of high-resolution, unannotated time series data. These recordings encode recognizable properties of latent states and transitions that can be modelled as abstract processes. Domain experts use process information for analysis, while downstream analytics, such as human activity recognition or IoT condition monitoring, rely on it as a preprocessing step for classification or anomaly detection. Process discovery from time series is commonly approached as an unsupervised machine learning task that divides observations into variable-sized, non-overlapping segments and labels each with a state identifier. Although the task is important, its autonomous and accurate application to large-scale data sets remains limited.

This talk addresses three shortcomings. First, current algorithms require experts to set domain-specific hyper-parameters. I present an accurate, hyper-parameter-free time series segmentation algorithm that automatically learns its parameters from the data at hand. Second, many accurate process discovery techniques do not scale to large or streaming data sets. I introduce a fast, streaming time series segmentation algorithm that efficiently reuses and updates core data structures. Third, after segmentation, states must be detected and labelled to derive abstract processes. I propose a hyper-parameter-free time series state detection algorithm that leverages the predictive power of supervised techniques in an unsupervised setting through self-supervision.

Overall, these methods enable a higher degree of automation, improved scalability, and higher accuracy in time series process discovery. Open-source Python implementations and visualizations support reproducibility and replicability.