Humboldt-Universität zu Berlin - Faculty of Mathematics and Natural Sciences - Databases and Information Systems


Foundations of Data Analysis Workflow Validation (FONDA A1)

Jul 2020 - Jun 2024
With Prof. Nicole Schweikardt, Humboldt-Universität zu Berlin

As part of the CRC 1404 FONDA: Subproject A1


In many scientific disciplines, vast amounts of data are generated and analysed. Scientific discovery is then based on complex data analysis workflows (DAWs), which are series of discrete analysis programs arranged in (often non-linear) pipelines. An important aspect in the design and operation of these DAWs is the systematic detection and avoidance of misguided executions. In this project, as part of the CRC 1404 FONDA, we will approach this problem as a query discovery problem: given a set of execution traces of a DAW or a family of DAWs, find a set of concise queries over the log stream that separate runs that succeeded from those that failed. Query discovery, in contrast to statistical methods for failure prediction, has the advantage that queries can be understood more easily by the DAW developer, which makes it possible to adapt DAWs to avoid problematic situations.



Data Analysis Workflows for Interactive Scientific Exploration (FONDA A6)

Jul 2020 - Jun 2024
With Prof. Birte Kehr, Leibniz Institute for Immunotherapy, University of Regensburg

As part of the CRC 1404 FONDA: Subproject A6


Data analysis workflows (DAWs) for scientific discoveries are often exploratory. Furthermore, the process of specifying a DAW is itself exploratory, involving the repeated adaptation of the current DAW specification based on results of previous executions or on refined requirements. In this project, we will investigate means to systematically support the exploratory process of DAW specification by developing a specification model for exploratory DAWs, its mapping to distributed DAW infrastructures, and abstractions for interactive exploratory DAWs that connect exploration spaces with states of DAW executions. We focus on DAWs for genome analysis, which are often long and complex and whose development involves numerous design choices and time-consuming trial-and-error phases.



Process Conformance under incomplete Information (ProCI)

Feb 2019 - Jan 2023
With Prof. Lars Grunske, Humboldt-Universität zu Berlin


Process-aware information systems coordinate the execution of a set of elementary actions to reach a business goal, where actions may be as fine-grained as function calls or as coarse-grained as complex business transactions. The behaviour of such systems is commonly described by process models. However, once data is recorded during runtime, the question of conformance emerges: how do the modelled behaviour of a system and its recorded behaviour relate to each other? With an increasing volume of event data, post-mortem conformance checking based on a complete history of a process’ execution is often not a viable option. The ProCI project, therefore, sets out to provide the foundations for conformance checking that breaks with the omnipresent assumption of comprehensive access to event data and enables reasoning under incomplete information. Specifically, models and algorithms are developed for sampled and online conformance checking. Given the exponential time complexity of common conformance checking techniques, the former strives for drastic improvements of the runtime by considering only a fraction of large volumes of event data. The latter targets the question of space efficiency when conformance checking is realised over streams of events rather than static logs.



More to come...