Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Informatik

Verteidigung Bachelorarbeit: Jakob Brand

defense of Jakob Brand regarding his Bachelor thesis on Implementation of SFA in the MESSI framework
  • Wann 19.08.2022 von 10:00 bis 11:00
  • Wo Zoom: https://hu-berlin.zoom.us/j/65837825144?pwd=UUhsL015N2YxT0NMQnhDVUVKQnBJQT09
  • Name des Kontakts
  • iCal

Abstract:

Similarity search is a common task for time series analysis. As the amount of data gathered by applications tends to increase, index structures which are able to search huge datasets are needed. The MESSI index for similarity search by Peng et al. enables fast exploration of huge datasets by exploiting SIMD operations, multi-threading and large-scale in-memory processing. The index structure uses the iSAX symbolic representation for dimensionality reduction, which is based on time-domain compression with fixed intervals.
For this thesis, the symbolic representation of MESSI for 1-nearest neighbour queries is adapted to use SFA, which relies on frequency-domain compression and adaptive breakpoints. This representation is implemented in a way that preserves the multi-threading and in-memory processing capabilities of MESSI.
For an experimental analysis, several real and synthetic datasets with average sizes of 100 GB are used and a new dataset of seismological data is created. After determining the optimal parameters for SFA and MESSI, the initial SAX approach as baseline is compared to the new SFA approach in terms of execution wall time and other search metrics. The evaluation shows that SFA takes more time for index creation because an additional preprocessing phase is required. For query answering, SFA outperforms the baseline on all datasets in execution wall time with improvements of up to 72%.