Direkt zum InhaltDirekt zur SucheDirekt zur Navigation
▼ Zielgruppen ▼

Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Informatik

Disputation Stefan Sprenger

Wann 15.02.2019 ab 14:15 (Europe/Berlin / UTC100) iCal
Wo Rudower Chaussee 25, Humboldt-Kabinett

Am Freitag, den 15.2.2019, verteidigt ab 14.00 c.t. Herr Stefan Sprenger im HUK seine Dissertation zum Thema "Efficient Processing of Range Queries in Main Memory".

Alle Interessierten sind herzlich eingeladen.


Over the last two decades, the hardware landscape has fundamentally changed, shaping the architecture of modern database systems. First, main memory becomes a popular choice for primary data storage, since current server machines can feature up to several terabytes of main memory. Second, modern CPUs get tremendously parallel providing both data- and task-level parallelism. They support SIMD instructions, which can simultaneously process multiple data elements. Moreover, stand-alone CPUs host an ever increasing number of cores, which can be considered as independent processing units.

Recent many-core CPUs install up to 72 cores on one chip. Database systems employ index structures as means to accelerate search queries. Over the last years, the research community has proposed many different in-memory approaches that optimize cache misses instead of disk I/O, as opposed to disk-based systems, and make use of the grown parallel capabilities of modern CPUs. However, these techniques mainly focus on single-key lookups, but neglect equally important range queries. Range queries are an ubiquitous operator in data management commonly used in numerous domains, such as genomic analysis, sensor networks, or online analytical processing.

The main goal of this dissertation is thus to improve the capabilities of main-memory database systems with regard to executing range queries. To this end, we first propose a cache-optimized, updateable main-memory index structure, the cache-sensitive skip list, which targets the execution of range queries on single database columns. Second, we study the performance of multidimensional range queries on modern hardware, where data are stored in main memory and processors support SIMD instructions and multithreading. We re-evaluate a previous rule of thumb suggesting that, on disk-based systems, scans outperform index structures for selectivities of approximately 15-20% or more. To increase the practical relevance of our analysis, we also contribute a novel benchmark consisting of several realistic multidimensional range queries applied to realworld genomic data. Third, based on the outcomes of our experimental analysis, we devise a novel, fast and space-efficient, main-memory based index structure, the BBTree, which supports multidimensional range and point queries and provides a parallel search operator that leverages the multi-threading capabilities of modern CPUs.