Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Informatik

PhD Talk by Sarah Kleest-Meißner

*Exploring the Complexity of Event Query Discovery*

Eine Zoom-Einladung finden Sie hier. (nur mit HU/Informatik-Account)

Exploring the Complexity of Event Query Discovery

Abstract:

We propose multi-dimensional subsequence-queries with wildcards and gap-size constraints (mswg-queries, for short) as a expressive query model for sequence data. These queries consist of a query string s over an alphabet of variables and types, as well as a global window size and a number of local gap-size constraints. They are evaluated over a trace, i.e., a sequence of types, by replacing variables by single types, while satisfying the window and the gap-size constraints.

Based thereon, we define the task of discovering a mswg-query that describes best the traces from a given sample S, we provide an algorithm solving this task, and investigate its complexity. The latter identifies the subroutine for solving the matching problem (i.e., deciding whether a given query q matches in a given trace t) as bottle-neck. Our results show that the matching problem reduces to the discovery algorithm. Therefore, we analyse the complexity of the matching problem and identify tractable subclasses.

We discuss further extensions of mswg-queries for the one-dimensional setting, in fact, subsequence-queries with generalised gap-size constraints and disjunctive subsequence-queries.

The formal results are complemented by a description of our prototypical implementation of query discovery. Results from
evaluation experiments with synthetic data provide insights on sample characteristics that influence the running time, while experiments with real-world data indicate general feasibility of our approach.