Humboldt-Universität zu Berlin - Faculty of Mathematics and Natural Sciences - Knowledge Management in Bioinformatics


Knowledge Management in Bioinformatics Group

New Developments in Databases and Bioinformatics

Prof. Ulf Leser

  • When/where? See the list of talks.

Members of the group use this seminar as a forum for discussion and exchange. Students and guests are cordially invited.

The following talks are scheduled so far:


Date & Location | Topic | Speaker
Tuesday, 10.05.22, 9-10 am (Humboldt Kabinett) | Clinical classification of common cancers by means of deconvolution-based machine learning | Melanie Fattohi
Friday, 20.05.22, 10 am (online) | Location-Aware Workflow Scheduling for Black-Box Tasks | Fabian Lehmann
Friday, 28.05.22, 2 pm s.t. (online) | Neural Normalization of Biomedical Named Entities | Christopher Schiefer


Clinical classification of common cancers by means of deconvolution-based machine learning (Melanie Fattohi)

Clinical classification of cancers for precision oncology is hindered by the diverse cell-type composition of tumors, which strongly influences a tumor's characteristics. Accurate prediction of prognosis and treatment outcome requires methods that support clinical decision-making. Computational methods such as machine learning (ML) and transcriptomic deconvolution, which estimates the proportions of cell types in a bulk tissue sample from its gene expression profile, have proven valuable for elucidating the cellular heterogeneity of tumors.
In this thesis, we built a computational framework that integrates deconvolution-based ML to support the clinical classification of cancers. We further included functions for assessing both the statistical and the clinical significance of deconvolution-based ML results. After establishing that deconvolution can predict the cell-type proportions of cancerous tissue, we applied our deconvolution-based ML framework to bulk RNA-sequencing (RNA-seq) data of pancreatic ductal adenocarcinomas (PDACs) and diffuse large B-cell lymphomas (DLBCLs), two cancer types whose tumor heterogeneity in humans is still largely unresolved.
Using our deconvolution-based ML framework, we classified human PDACs into distinct clusters based on the predicted proportions of metaplastic cell types; these clusters were statistically associated with clinically relevant characteristics. The PDAC clusters may be related to different cells of origin and may provide insight into the progression of human PDACs. Moreover, we built an independent classifier that predicts the treatment response of DLBCLs from the proportions of senescence-associated phenotypes; the predicted groups showed distinct survival. Deconvolution-based ML thus points to a potential transcriptomic link between cellular senescence and treatment response in human DLBCLs.
The thesis demonstrates the potential of a deconvolution-based ML framework to help resolve tumor heterogeneity and to support the clinical classification of cancers. The approach may also serve as a data augmentation strategy for cancers where data acquisition is hampered by biotechnological difficulties or by the rarity of the cancer.
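The abstract does not say which deconvolution method the framework uses. As a minimal sketch of the general idea only, one common formulation models a bulk expression profile as a non-negative mixture of cell-type reference profiles (a "signature matrix" S, genes x cell types), bulk ≈ S · w with w ≥ 0, solves this non-negative least-squares (NNLS) problem, and normalizes w to proportions. The solver below uses plain cyclic coordinate descent; the function names, the toy signature matrix and the proportions are hypothetical and not taken from the thesis.

```python
from typing import List

Matrix = List[List[float]]


def nnls_cd(S: Matrix, b: List[float], n_iter: int = 500) -> List[float]:
    """Minimize ||S w - b||^2 subject to w >= 0 via cyclic coordinate descent.

    S: genes x cell types signature matrix (hypothetical), b: bulk expression per gene.
    """
    n_genes, n_types = len(S), len(S[0])
    w = [0.0] * n_types
    # Squared column norms S_j^T S_j, reused in every coordinate update
    col_sq = [sum(S[g][j] ** 2 for g in range(n_genes)) for j in range(n_types)]
    r = [-x for x in b]  # residual S w - b, with w = 0 initially
    for _ in range(n_iter):
        for j in range(n_types):
            if col_sq[j] == 0.0:
                continue  # a cell type with an all-zero signature cannot be estimated
            grad = sum(S[g][j] * r[g] for g in range(n_genes))
            # Exact minimizer along coordinate j, clipped at 0 to keep w non-negative
            new_wj = max(0.0, w[j] - grad / col_sq[j])
            delta = new_wj - w[j]
            if delta:
                for g in range(n_genes):
                    r[g] += S[g][j] * delta  # keep the residual in sync with w
                w[j] = new_wj
    return w


def deconvolve(S: Matrix, b: List[float], n_iter: int = 500) -> List[float]:
    """Estimate cell-type proportions by normalizing the NNLS weights to sum to 1."""
    w = nnls_cd(S, b, n_iter)
    total = sum(w)
    return [x / total for x in w] if total > 0 else w


# Hypothetical toy signature: 4 marker genes x 3 cell types
S = [[10.0, 1.0, 0.0],
     [2.0, 8.0, 1.0],
     [0.0, 1.0, 9.0],
     [3.0, 3.0, 3.0]]
true_props = [0.5, 0.3, 0.2]
# Simulated noise-free bulk sample mixed from the three cell types
bulk = [sum(S[g][j] * true_props[j] for j in range(3)) for g in range(4)]
props = deconvolve(S, bulk)
print([round(p, 3) for p in props])
```

Because the toy bulk sample is an exact mixture and S has full column rank, the recovered proportions match the mixing weights; real RNA-seq data is noisy, and published tools differ in the regression they use and in how they build the signature matrix.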

Location-Aware Workflow Scheduling for Black-Box Tasks (Fabian Lehmann)

Scientific workflows often process hundreds of gigabytes of data stored in distributed file systems.
Such file systems hide the files' locations from the workflow execution engine and the scheduler.
Consequently, tasks cannot be scheduled close to their data, which reduces caching opportunities and causes unnecessary network I/O.
However, the file system of a shared cluster can hardly be changed or replaced.
This paper proposes a non-invasive way to decouple workflows from distributed storage by delegating responsibility for data placement to the workflow scheduler.
We propose a new location-aware scheduling algorithm that minimizes network usage by combining task placement and data placement.
We implemented a prototype of our approach in the workflow execution engine Nextflow, running on the resource manager Kubernetes, and evaluated it with different workflows from the Earth observation and bioinformatics domains.
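The abstract does not describe the algorithm itself. The sketch below illustrates, under stated assumptions, what "combining task and data placement" can mean: a simple greedy heuristic (assumed here, not the algorithm from the paper) places each ready task on the node with free capacity that already holds most of its input bytes, then records that those inputs now also reside on that node. The `Task` class, the `schedule` function, and all node names, file names and sizes are hypothetical; the sketch does not use any Nextflow or Kubernetes API.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Task:
    name: str
    inputs: Dict[str, int]  # input file name -> size in bytes
    cpus: int = 1


def bytes_to_fetch(task: Task, node: str, locations: Dict[str, Set[str]]) -> int:
    """Input bytes that are not stored on `node` and would cross the network."""
    return sum(size for f, size in task.inputs.items()
               if node not in locations.get(f, set()))


def schedule(tasks: List[Task], free_cpus: Dict[str, int],
             locations: Dict[str, Set[str]]) -> Dict[str, str]:
    """Greedy location-aware placement (hypothetical heuristic, not the paper's algorithm)."""
    free = dict(free_cpus)
    locs = {f: set(nodes) for f, nodes in locations.items()}  # work on a copy
    plan: Dict[str, str] = {}
    # Tasks with the most input data go first: they gain most from locality.
    for task in sorted(tasks, key=lambda t: sum(t.inputs.values()), reverse=True):
        candidates = [n for n, c in free.items() if c >= task.cpus]
        if not candidates:
            continue  # no free slot in this round; the task waits
        # Cheapest node = fewest bytes to transfer; the node name breaks ties.
        best = min(candidates, key=lambda n: (bytes_to_fetch(task, n, locs), n))
        plan[task.name] = best
        free[best] -= task.cpus
        # Data placement: the task's inputs are now also available on `best`.
        for f in task.inputs:
            locs.setdefault(f, set()).add(best)
    return plan


locations = {"a.fastq": {"node1"}, "b.fastq": {"node2"}, "ref.fa": {"node1", "node2"}}
tasks = [
    Task("align_a", {"a.fastq": 4_000, "ref.fa": 1_000}),
    Task("align_b", {"b.fastq": 3_000, "ref.fa": 1_000}),
]
plan = schedule(tasks, {"node1": 2, "node2": 2}, locations)
print(plan)
```

Each task lands on the node that already stores its large input, so neither FASTQ file crosses the network. A real scheduler would additionally have to handle tasks whose inputs are produced at runtime, replicas across many nodes, and changing cluster capacity.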


Contact: Patrick Schäfer; patrick.schaefer(at)