
Humboldt-Universität zu Berlin - Faculty of Mathematics and Natural Sciences - Knowledge Management in Bioinformatics

Research Seminar

Knowledge Management in Bioinformatics Research Group

New Developments in Databases and Bioinformatics

Prof. Ulf Leser

  • When/where? See the list of talks

The members of the research group use this seminar as a forum for discussion and exchange. Students and guests are cordially invited.

The following talks are scheduled so far:


Date & Location | Topic | Speaker
Friday, 18.10.2019, 10:00 c.t., RUD 25, 4.410 | Neural Biomedical Named Entity Normalization | Christopher Schiefer
Friday, 29.11.2019, 11:00 c.t., RUD 25, 4.410 | A framework for subword-level dictionary-based time series classification | Leonard Clauß
Friday, 07.02.2020, 10:00 c.t., RUD 25, 4.410 | The Flair Framework and Research Challenges in Natural Language Processing | Alan Akbik

Abstracts

Neural Biomedical Named Entity Normalization (Christopher Schiefer)

The rapid growth of the biomedical literature makes methods for automatic information extraction from vast amounts of unstructured text increasingly necessary. Identifying biomedical entities in text documents automatically is essential for tasks such as searching for specific entities, providing background information on documents, and linking similar documents. The process of linking a text mention to a specific entity identifier is called named entity normalization (NEN) or entity disambiguation. Previous approaches in the biomedical domain have been based on sets of manual rules, large dictionaries, and pre-defined features that are supposed to capture the knowledge of experts. In other domains, however, deep learning approaches have significantly outperformed these traditional approaches. We present a novel approach that leverages word embeddings as well as the latest improvements in biomedical Named Entity Recognition (NER) for the normalization of genes, and we compare it to state-of-the-art baseline approaches.
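To make the normalization task concrete, the following is a minimal sketch of the traditional dictionary-based style of baseline mentioned above, not the neural approach presented in the talk. The synonym table, the identifiers (`GENE:0001`, `GENE:0002`), and the function names are hypothetical and chosen only for illustration.

```python
import re

# Toy synonym dictionary (assumption for illustration only):
# normalized surface form -> placeholder entity identifier.
GENE_DICT = {
    "tp53": "GENE:0001",
    "p53": "GENE:0001",
    "tumor protein p53": "GENE:0001",
    "brca1": "GENE:0002",
}


def normalize_key(mention):
    """Lowercase the mention and collapse punctuation and whitespace runs."""
    return re.sub(r"[^a-z0-9]+", " ", mention.lower()).strip()


def link(mention, dictionary=GENE_DICT):
    """Return the identifier for a mention, or None if it is not in the dictionary."""
    return dictionary.get(normalize_key(mention))


if __name__ == "__main__":
    for mention in ["TP53", "Tumor-protein  p53", "p-53", "XYZ"]:
        print(mention, "->", link(mention))
```

An exact lookup of this kind fails on spelling variants that are missing from the dictionary (here, "p-53"), which is one reason learned representations such as word embeddings are attractive for this task.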

A framework for subword-level dictionary-based time series classification (Leonard Clauß)

The problem of time series classification appears in many important applications, for example detecting myocardial infarctions. The current state-of-the-art classifier WEASEL uses a dictionary-based approach. Specifically, the algorithm predicts the class of a time series by sliding windows of different sizes over it, discretizing each subsequence to a word and classifying based on the number of occurrences of these words. However, it does not consider subwords, i.e., subsequences of these words.
In this work, we evaluated different methods to select discriminative subwords: counting character n-grams, finding long consecutive subsequences using Byte Pair Encoding, and finding subwords with gaps using Apriori. To analyze their impact on classification, we integrated them into the WEASEL pipeline and ran them on the UCR archive with 85 benchmark datasets. Our results show that, compared to WEASEL, classification accuracy does not differ significantly for any approach. Due to the increased number of features, both runtime and memory consumption increase significantly. Thus, the examined methods do not provide a benefit for dictionary-based time series classification.
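The dictionary-based idea described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the WEASEL implementation: the discretization here uses equal-width binning of segment means (an assumption made for brevity), and all function names, window sizes, and parameters are hypothetical.

```python
from collections import Counter


def discretize(window, alphabet="abcd", segments=4, lo=-1.0, hi=1.0):
    """Map a window to a word: one letter per segment, chosen by the segment mean.

    Assumes len(window) >= segments and values roughly within [lo, hi].
    """
    seg_len = len(window) // segments
    letters = []
    for s in range(segments):
        seg = window[s * seg_len:(s + 1) * seg_len]
        mean = sum(seg) / len(seg)
        # Clamp the mean into [lo, hi) and pick an equal-width bin.
        pos = (min(max(mean, lo), hi - 1e-9) - lo) / (hi - lo)
        letters.append(alphabet[int(pos * len(alphabet))])
    return "".join(letters)


def bag_of_words(series, window_sizes=(8, 16)):
    """Slide windows of several sizes over the series and count the resulting words."""
    counts = Counter()
    for w in window_sizes:
        for start in range(len(series) - w + 1):
            counts[(w, discretize(series[start:start + w]))] += 1
    return counts


def char_ngrams(bag, n=2):
    """Derive subword features: count character n-grams inside each word."""
    sub = Counter()
    for (w, word), count in bag.items():
        for i in range(len(word) - n + 1):
            sub[(w, word[i:i + n])] += count
    return sub


if __name__ == "__main__":
    series = [0.0] * 8 + [0.9] * 8
    bag = bag_of_words(series)
    print(bag.most_common(3))
    print(char_ngrams(bag).most_common(3))
```

The word counts, optionally extended by the subword counts, would then serve as the feature vector for a standard classifier. The sketch also shows why subwords enlarge the feature space: every word contributes several additional n-gram features.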

The Flair Framework and Research Challenges in Natural Language Processing (Alan Akbik)

Flair NLP is a widely used open-source framework for experimenting with different word embeddings in downstream NLP tasks such as named entity recognition (NER), text classification, and similarity learning. In this talk, I give an overview of the framework with a focus on current research challenges. In particular, I present the idea of NLP models that never stop learning: such models can acquire new knowledge even at prediction time (i.e., after the training phase is completed) and so continuously improve. Applying this approach to NER, I show how we reach new state-of-the-art results across a range of evaluation tasks. I will also give a live demo of such a model in action to illustrate how it continues to learn during prediction. Time permitting, we will also discuss some general research ideas and open questions for future work.

Contact: Patrick Schäfer; patrick.schaefer(at)hu-berlin.de