Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Informatik

Dissertation: Mr Tobias Rawald

  • When: 01.12.2017, from 16:00 to 17:00
  • Where: Rudower Chaussee 25, Humboldt-Kabinett

As part of his doctoral examination procedure, Mr Tobias Rawald will defend his dissertation
on the topic "Scalable and Efficient Analysis of Large High-Dimensional Data Sets in the Context of Recurrence Analysis".

All interested parties are cordially invited.



Scalable and Efficient Analysis of Large High-Dimensional Data Sets in the Context of Recurrence Analysis

Recurrence analysis is a method from nonlinear time series analysis for investigating the recurrent behaviour of a system, e.g., the Earth’s climate system. Among other things, it comprises a technique for quantitatively assessing the contents of binary similarity matrices. This recurrence quantification analysis (RQA) relies on the identification of line structures within such recurrence matrices and provides a set of scalar measures. Existing computing approaches to RQA are either incapable of processing recurrence matrices exceeding a certain size or suffer from long runtimes for time series containing hundreds of thousands of data points.

This thesis introduces scalable recurrence analysis (SRA), an alternative computing approach that subdivides a recurrence matrix into multiple sub-matrices. Each sub-matrix is processed individually, in a massively parallel manner, by a single compute device; this is implemented exemplarily using the OpenCL framework. SRA further enables the parallel processing of multiple sub-matrices using a set of compute devices. It is shown that this approach delivers drastic performance improvements over state-of-the-art recurrence analysis software by exploiting the computing capabilities of many-core hardware architectures, in particular graphics cards. This reduces the runtime for analysing time series exceeding one million data points from hours or days to minutes.

The use of OpenCL makes it possible to execute identical SRA implementations on a variety of hardware platforms with different architectural properties. One major challenge is that an implementation may expose varying performance characteristics across different compute devices. An extensive evaluation analyses the impact of applying concepts from database technology, such as memory storage layouts, to the recurrence analysis processing pipeline. It is investigated how different realisations of these concepts, e.g., a row-store vs. a column-store layout, affect the performance of the computations on different types of compute devices. This includes not only the runtime behaviour but also additional performance counters, such as the amount of data fetched from compute device memory.

Finally, an approach based on automatic performance tuning is introduced that autonomously selects well-performing RQA implementations for a given analytical scenario on specific computing hardware. The corresponding evaluation compares the performance of a set of greedy selection strategies while analysing a real-world time series from climate impact research. Among other results, it is demonstrated that the customised auto-tuning approach drastically increases the efficiency of the processing by adapting the implementation selection.
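To make the abstract's core objects concrete, the following is a minimal sketch of a recurrence matrix and two standard RQA measures (recurrence rate and determinism) for a one-dimensional time series. It ignores phase-space embedding and is not the thesis's implementation; the function names and the toy sine signal are illustrative choices.

```python
import numpy as np

def recurrence_matrix(x, radius):
    """Binary recurrence matrix: R[i, j] = 1 iff |x[i] - x[j]| <= radius."""
    d = np.abs(x[:, None] - x[None, :])
    return (d <= radius).astype(np.uint8)

def recurrence_rate(R):
    """Fraction of recurrent points in the matrix."""
    return R.mean()

def determinism(R, l_min=2):
    """Fraction of recurrent points lying on diagonal lines of length >= l_min
    (the 'line structures' that RQA identifies)."""
    n = R.shape[0]
    line_points = 0
    for k in range(-(n - 1), n):           # every diagonal of the matrix
        run = 0
        for v in list(np.diagonal(R, offset=k)) + [0]:   # 0 closes the last run
            if v:
                run += 1
            else:
                if run >= l_min:
                    line_points += run
                run = 0
    total = R.sum()
    return line_points / total if total else 0.0

x = np.sin(np.linspace(0, 8 * np.pi, 200))   # periodic toy signal
R = recurrence_matrix(x, radius=0.1)
print(recurrence_rate(R), determinism(R))
```

A periodic signal yields long diagonal lines, hence a high determinism value; for uncorrelated noise the same radius produces mostly isolated recurrent points.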
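The subdivision idea behind SRA can be illustrated as follows: each sub-matrix of the recurrence matrix depends only on two slices of the time series, so tiles can be computed independently, e.g. on different compute devices. This NumPy sketch is an assumption-laden stand-in for the thesis's OpenCL implementation; the tile layout and function names are invented for illustration.

```python
import numpy as np

def recurrence_tile(x, rows, cols, radius):
    """One sub-matrix of the recurrence matrix for the given index ranges.
    Only two slices of the time series are needed, so tiles are independent."""
    a = x[rows[0]:rows[1]]
    b = x[cols[0]:cols[1]]
    return (np.abs(a[:, None] - b[None, :]) <= radius).astype(np.uint8)

# Assemble the full matrix from 2x2 tiles to check the decomposition.
x = np.random.default_rng(0).normal(size=100)
h = len(x) // 2
tiles = [[recurrence_tile(x, (r, r + h), (c, c + h), 0.5)
          for c in (0, h)] for r in (0, h)]
R_tiled = np.block(tiles)
R_full = (np.abs(x[:, None] - x[None, :]) <= 0.5).astype(np.uint8)
assert np.array_equal(R_tiled, R_full)
```

Because each tile is a self-contained computation, a scheduler can hand tiles to whichever device is free; RQA line counts only need extra care where diagonal lines cross tile borders, which this sketch does not address.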
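The row-store vs. column-store distinction mentioned above can be sketched in NumPy's terms, where the same logical matrix can be stored row-contiguously (C order) or column-contiguously (Fortran order). This is only an analogy to the thesis's device-memory layouts, not its actual code.

```python
import numpy as np

# The same 2-D data in two physical layouts:
R = np.random.default_rng(1).integers(0, 2, size=(512, 512)).astype(np.uint8)
row_store = np.ascontiguousarray(R)   # rows contiguous in memory (C order)
col_store = np.asfortranarray(R)      # columns contiguous (Fortran order)

assert row_store.flags["C_CONTIGUOUS"]
assert col_store.flags["F_CONTIGUOUS"]
# Scanning a row of row_store touches consecutive addresses; scanning a row
# of col_store strides through memory. The results are identical, but the
# memory-access pattern, and thus performance on a given device, differs.
assert np.array_equal(row_store.sum(axis=1), col_store.sum(axis=1))
```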
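The auto-tuning idea, selecting a well-performing implementation for the hardware at hand, can be sketched with a naive greedy tuner that times interchangeable candidates and keeps the fastest. The candidate functions here are toy stand-ins; the thesis selects among OpenCL RQA kernels, not Python callables.

```python
import time
import numpy as np

def auto_tune(candidates, workload, trials=3):
    """Time every candidate on the given workload and return the fastest.
    A sketch of greedy implementation selection, not the thesis's tuner."""
    timings = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(trials):
            start = time.perf_counter()
            fn(workload)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return min(timings, key=timings.get), timings

# Two interchangeable implementations of the same computation.
candidates = {
    "python_loop": lambda xs: sum(v * v for v in xs),
    "numpy_dot": lambda xs: float(np.dot(xs, xs)),
}
winner, timings = auto_tune(candidates, np.arange(100_000, dtype=np.float64))
print(winner)
```

Which candidate wins depends on the machine, which is precisely the point: the selection adapts to the hardware rather than being fixed at development time.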