
Humboldt-Universität zu Berlin - Faculty of Mathematics and Natural Sciences - Knowledge Management in Bioinformatics


Research Group Knowledge Management in Bioinformatics

New Developments in Databases and Bioinformatics

Prof. Ulf Leser

  • When/where? See the list of talks

The members of the research group use this seminar as a forum for discussion and exchange. Students and guests are cordially invited.

The following talks are currently scheduled:

Date & Place | Topic | Speaker
Friday, 23.10.2020, 10:00 c.t., Online | Benchmarking State-Of-The-Art Time Series Motif Discovery Algorithms (Master Thesis) | Rafael Moczalla
Friday, 23.10.2020, 11:00 c.t., Online | Interpreting Decisions of Deep Neural Networks to Identify Binding Preferences of Transcription Factors | Proft, Sebastian
Wednesday, 28.10.2020, 11:00 c.t., Online | Extracting Pathways from Text by Generating Graphs | Leon Weber
Friday, 13.11.2020, 11:00 c.t., Online | Exploring Classification Score Profiles for Change Point Detection | Arik Ermshaus


Benchmarking State-Of-The-Art Time Series Motif Discovery Algorithms (Rafael Moczalla)

Motif discovery, i.e. the search for very similar, recurring patterns in data, has become important in the analysis of large datasets in recent years, for example for recognizing unknown DNA sequences that can be assigned to a biological function or for recognizing specific brain states in EEG data. Many motif discovery algorithms have been established, but comparing them requires data with ground-truth annotated motifs, and generating such data is not a trivial task. In this presentation we propose a generator that produces data with ground-truth annotated motifs and present a benchmark consisting of such data. Finally, we use this benchmark to perform an objective comparison of the runtimes and accuracies of the state-of-the-art motif discovery algorithms MK, SCRIMP, Scan MK, Cluster MK, Set Finder, GrammarViz, EMMA and Learn Motifs.
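To make the task concrete, the following is a minimal sketch of motif pair discovery on synthetic data with a known answer. It is a naive brute-force baseline, not one of the algorithms compared in the talk, and `plant_motif` is a hypothetical stand-in for a generator, not the one proposed by the speaker. The random-walk background, the sine-shaped pattern, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def znorm(window):
    """Z-normalize a window; constant windows map to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def plant_motif(length, pattern, positions, noise=0.1, seed=0):
    """Hypothetical generator: random-walk background with `pattern`
    planted at `positions`. Returns the series and the ground-truth starts."""
    rng = random.Random(seed)
    series, level = [], 0.0
    for _ in range(length):
        level += rng.gauss(0, 1)
        series.append(level)
    for start in positions:
        for k, value in enumerate(pattern):
            series[start + k] = value + rng.gauss(0, noise)
    return series, sorted(positions)


def motif_pair(series, m):
    """Brute force, O(n^2 * m): closest pair of non-overlapping length-m
    windows under z-normalized Euclidean distance."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, None, None)
    for i in range(len(windows)):
        for j in range(i + m, len(windows)):  # j >= i + m skips trivial matches
            d = math.dist(windows[i], windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best


# Assumed pattern: one sine period of length 20, planted twice.
pattern = [10 * math.sin(2 * math.pi * k / 20) for k in range(20)]
series, truth = plant_motif(200, pattern, positions=[30, 140])
distance, i, j = motif_pair(series, m=20)
print("ground truth:", truth, "found:", (i, j), "distance:", round(distance, 3))
```

Because the planted positions are returned alongside the series, the accuracy of any motif discovery algorithm can be scored against them, and its runtime compared with this quadratic baseline.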

Interpreting Decisions of Deep Neural Networks to Identify Binding Preferences of Transcription Factors (Proft, Sebastian)

Experiments that can identify transcription factor binding sites are costly, time-intensive, and tissue-specific. A way to identify these binding sites without running a sequencing experiment for every single combination of cell type and transcription factor is therefore much sought after. Several machine learning methods try to tackle this problem, but most rely on extensive feature selection. Artificial neural networks are one method that can use sequencing data directly. They can employ a method known as convolution to learn the short DNA segments that contain these binding sites. These convolutional neural networks have been shown to classify sequences containing transcription factor binding sites successfully, but how they do so is still not well understood. Extracting the information from these “black boxes” is difficult, as usually only the output is observed and used to determine their performance. In this work, we train convolutional neural networks on increasingly difficult datasets and apply methods such as maximum activation, input optimization, and layer-wise relevance propagation (LRP). These methods have already been applied to neural networks used in computer vision and allow us to understand better which parts of the input are most important for the networks' decision-making process. We will apply them to DNA sequences to extract the DNA segments that correspond to the relevant transcription factor binding sites.
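As a rough illustration of how a convolutional filter responds to a short DNA segment, the sketch below one-hot encodes a sequence, slides a single filter over it, and reports the window with the maximum activation. The filter weights are set by hand rather than learned, and the motif "GATTACA", the example sequence, and all function names are assumptions made for this example; it does not reproduce the networks or the interpretation methods used in the talk.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional indicator vectors."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def conv1d(encoded, kernel):
    """Valid 1D cross-correlation of a one-hot sequence with one filter."""
    width = len(kernel)
    return [
        sum(encoded[pos + k][c] * kernel[k][c]
            for k in range(width) for c in range(4))
        for pos in range(len(encoded) - width + 1)
    ]


def max_activation(seq, kernel):
    """Return (start, segment, score) of the window that activates the filter most."""
    scores = conv1d(one_hot(seq), kernel)
    start = max(range(len(scores)), key=scores.__getitem__)
    return start, seq[start:start + len(kernel)], scores[start]


# Hand-set filter (an assumption, not a trained one): +1 for a matching base
# of the arbitrary motif "GATTACA", -1 for any other base.
kernel = [[1.0 if b == base else -1.0 for b in BASES] for base in "GATTACA"]

start, segment, score = max_activation("CCGTAGATTACATTGC", kernel)
print(start, segment, score)  # prints: 5 GATTACA 7.0
```

In a trained network the filter weights come from optimization, and reading off the maximally activating input windows is one simple way to see which segments a filter has learned to detect.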

Extracting Pathways from Text by Generating Graphs (Leon Weber)

Biological pathways consist of multiple biochemical reactions that frequently interact in a complex manner. However, existing techniques for extracting pathway information from the literature either reduce this complexity by modelling all pathways as binary interactions between participants, or require gold standard corpora richly annotated with complex event structures, which are scarce. We present a novel approach to pathway extraction, based on generative graph models, that requires only weakly labeled text and can still capture the full complexity of biochemical pathways.
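The difference between binary interactions and a full reaction structure can be illustrated with a small data-structure sketch. It encodes one well-known reaction, the hexokinase-catalyzed phosphorylation of glucose, once as a role-preserving graph and once flattened into participant pairs. The class and function names are hypothetical, and the sketch shows only the two representations, not the generative graph model presented in the talk.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Reaction:
    name: str
    substrates: frozenset
    products: frozenset
    catalysts: frozenset = frozenset()

    def participants(self):
        return self.substrates | self.products | self.catalysts


def to_binary_interactions(reactions):
    """Flatten reactions into unordered participant pairs, discarding all roles."""
    pairs = set()
    for r in reactions:
        pairs.update(frozenset(p) for p in combinations(sorted(r.participants()), 2))
    return pairs


def to_bipartite_edges(reactions):
    """Keep roles: labeled edges between entity nodes and reaction nodes."""
    edges = set()
    for r in reactions:
        edges.update((s, r.name, "substrate") for s in r.substrates)
        edges.update((r.name, p, "product") for p in r.products)
        edges.update((c, r.name, "catalyst") for c in r.catalysts)
    return edges


step = Reaction(
    name="glucose phosphorylation",
    substrates=frozenset({"glucose", "ATP"}),
    products=frozenset({"glucose-6-phosphate", "ADP"}),
    catalysts=frozenset({"hexokinase"}),
)

pairs = to_binary_interactions([step])
edges = to_bipartite_edges([step])
print(len(pairs), "binary interactions,", len(edges), "role-labeled edges")
```

The pairwise view yields ten undifferentiated links for a single reaction, while the graph with explicit reaction nodes needs five edges and still records who is consumed, produced, or acting as catalyst.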

Exploring Classification Score Profiles for Change Point Detection (Arik Ermshaus)

In recent years, the amount of unlabelled sensor data has grown significantly, owing to increasing computational power and the omnipresence of sensors, for example in smart devices. The literature, however, offers a wide selection of time series classification algorithms, which require labelled datasets for training. In this talk, we explore how supervised learning can assist in solving unsupervised time series problems. We propose a novel self-supervised methodology that identifies self-similar time series regions by attaching labels to the regions left and right of hypothetical split points and evaluating the resulting binary classification problems to create a classification score profile. This profile illustrates to what degree a time series can be split into self-similar regions at each split point. We explore classification score profiles for single change point detection, assess our framework on a benchmark dataset and compare it to rival methods.
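The idea of a classification score profile can be sketched in a few lines. For every hypothetical split point, the points to the left and right receive different labels, a binary classifier is scored on that labelling, and the split with the best score is reported as the change point. The abstract does not name a classifier or a score, so the nearest-centroid classifier, the balanced accuracy, the in-sample evaluation, the `margin` parameter, and the toy series below are all assumptions made for brevity; a realistic setup would use a stronger classifier on subsequences and evaluate on held-out data.

```python
def balanced_accuracy(left, right):
    """Score a nearest-centroid classifier that separates `left` from `right`.

    Evaluated in-sample for brevity (an assumption of this sketch)."""
    c_left = sum(left) / len(left)
    c_right = sum(right) / len(right)
    hits_left = sum(abs(x - c_left) <= abs(x - c_right) for x in left)
    hits_right = sum(abs(x - c_right) < abs(x - c_left) for x in right)
    return 0.5 * (hits_left / len(left) + hits_right / len(right))


def score_profile(series, margin=5):
    """Map each hypothetical split point to its classification score."""
    return {
        split: balanced_accuracy(series[:split], series[split:])
        for split in range(margin, len(series) - margin + 1)
    }


def change_point(series, margin=5):
    """Return the split point with the highest classification score."""
    profile = score_profile(series, margin)
    return max(profile, key=profile.get)


# Toy series with a single mean shift at index 30.
series = [float(k % 2) for k in range(30)] + [5.0 + k % 2 for k in range(30)]
profile = score_profile(series)
print(change_point(series), profile[30], profile[20])  # prints: 30 1.0 0.875
```

Balanced accuracy is used here instead of plain accuracy so that splits close to the borders, where one class dominates, do not score well trivially.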

Contact: Patrick Schäfer; patrick.schaefer(at)hu-berlin.de