Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Wissensmanagement in der Bioinformatik

Finished Projects

  • simpatix: Similarity Search for Richly Annotated Structured Patient Cases, DFG, 2016 - 2019
  • Graduate School SOAMED: Service-Oriented Architectures in Medicine, DFG, 2010-2019
  • MAPTor-NET: MAPK-mTOR network model driven individualized therapies of pancreatic neuro-endocrine tumors (pNETs), BMBF, 2015-2018
  • Research unit Stratosphere: Cloud-enabled declarative information management, DFG, 2010-2017
  • BioPatent: and of Biomedical Patents, BMWi, 2015-2017
  • T-Sys: Systems biology of T helper cell, BMBF, 2013-2016
  • OncoPath: Oncogenic signaling and metabolic networks in solid cancer, BMBF, 2013-2016
  • BioBankCloud: Attacking the Biobank bottleneck, EU FP7, 2012-2016
  • Prositu: Genotoxic stress-induced signaling pathways in tumor cells, BMBF, 2012-2015
  • EU-Med: Assessing phenotypic impact of human mutation profiles, BMWi, 2012-2015
  • Virtual Liver: Modeling human liver physiology, morphology, and function, BMBF, 2010-2015
  • Graduate School METRIK: Model-Based Development of Self-Organizing Networks for Disaster Management, DFG, 2006-2015
  • BioGraph: Querying and analyzing biological networks, internal, 2003-2015
  • CellFinder: A Cell Data Repository, DFG, 2010-2012
  • Transregio 54: Growth and Survival, Plasticity and Cellular Interactivity of Lymphatic Malignancies, DFG, 2009-2012
  • ColoNET: A Systems Biology Approach for Integrating Molecular Diagnostics, BMBF, 2009-2012
  • Ali Baba: PubMed as a graph, internal, 2002-2012
  • Aladin: Almost Hands-Off Data Integration for the Life Sciences, within BCB, 2005-2013
  • Columba: An integrated database of protein structures, sequences, and annotations, within BCB, 2002-2007
  • BCB: Berliner Centre for Genome-Based Bioinformatics, BMBF, 2002-2006
  • Interdisciplinary Network for Bioinformatics and Linguistics, Berlin, 2003-2007
  • DDD: Deutsch.Diachron.Digital, internal, 2004-2006





simpatix: Similarity Search for Richly Annotated Structured Patient Cases


Homepage: simpatix

Funding: DFG, to Dr. Starlinger

Period: 2016 - 2019

Project partner: Charité Berlin

Process-enhanced similarity search has great potential to improve knowledge discovery and decision support in a number of disciplines, especially in clinical medicine. Currently, this potential cannot be exploited because two things are lacking: algorithms for creating annotated process representations from unstructured content, and methods for effectively comparing such annotated processes. In the simpatix project, we focus on the medical domain, where the central concept is a patient’s case, recorded in an (electronic) health record (EHR). Consisting of mostly unstructured or semi-structured data, such as clinical notes from examinations and treatments, tabularized data from quantitative tests (such as blood screenings), or discharge summaries, each case encodes a process describing the individual patient’s disease history. The project's main objectives are to a) develop methods for constructing structured, process-oriented case representations from large data sets that include unstructured documents; b) research algorithms for process-enhanced similarity search over richly annotated case collections; and c) design and implement a generic repository that stores process-enhanced case collections and allows scalable, effective similarity search.

Graduate School GRK 1651 SOAMED - Service-Oriented Architectures in Medicine



Funding: Deutsche Forschungsgemeinschaft

Period: 2010 - 2019 (now in its second funding period)

Project partner: Technische Universität Berlin, Hasso-Plattner-Institut Potsdam, Fraunhofer FIRST Berlin

Service orientation is a promising architectural concept for coupling encapsulated software components ("services") quickly and cost-efficiently, and for adapting them to new requirements. At the same time, informatics is a key technology for the innovative organization of health care systems and of medical technology. In comparison with other organizational and embedded systems, the processes involved are more versatile, and the reliability and correctness requirements are higher. Medical processes are usually loosely coupled. Their integration is as difficult as it is important. Theoretical and methodological foundations of both the design process and the structure of service-oriented systems might substantially improve today’s information technology. In this situation, this Graduate School starts out with the idea of underpinning the currently pragmatically focused service-oriented approach with theoretical foundations by integrating established as well as emerging software engineering procedures. This approach aims at a decisive improvement of concepts, methods, and tool support for service-oriented system construction.

MAPTor-NET: MAPK-mTOR network model driven individualized therapies of pancreatic neuro-endocrine tumors



Funding: BMBF

Period: 2015 - 2018

Project partner: Charité Berlin, Universität Oldenburg

Pancreatic NETs (pNETs) form the most prominent subgroup of rare neuroendocrine tumors (NETs), with distinct prognostic classes and thus diverse therapeutic regimens. Available pNET treatments include somatostatin analogs, systemic chemotherapy, and novel molecular drugs targeting receptor tyrosine kinases (Sunitinib) or the mTOR pathway (Everolimus). However, tumor heterogeneity results in an unpredictable response to therapy, and only a limited number of patients benefit from either treatment. To date, no method for the diagnostic stratification of patients exists. The MAPTor-NET consortium proposes a focused systems medicine approach that uses clinical and pathological data together with mutation/expression profiles to individually preselect patients prior to therapy. The approach combines top-down modeling of the core pathways altered in pNET with a bottom-up approach that gathers and integrates individual molecular data.

Research Unit Stratosphere: Cloud-Enabled Declarative Information Management



Funding: Deutsche Forschungsgemeinschaft

Period: 2010 - 2013 (first funding period), 2014-2017 (second funding period)

Project partner: Technische Universität Berlin, Hasso-Plattner-Institut Potsdam

The Collaborative Research Unit “Stratosphere” aims at advancing the state of the art in data processing on parallel architectures. Stratosphere explores the power of massively parallel computing for complex information management applications. We will develop a novel, database-inspired approach to analyze, aggregate, and query very large collections of either textual or (semi-)structured data on a virtualized, massively parallel architecture.
Stratosphere will conduct research in the areas of massively parallel data processing engines, a programming model for parallel data programming, robust optimization of declarative data flow programs, continuous re-optimization and adaptation of the execution, data cleansing, and text . The unit will validate its work through a benchmark of the overall system performance and by demonstrators in the areas of climate research, the biosciences and linked open data.

T-Sys: Systems biology of T helper cell, Immunomodulation - not just immunosuppression



Funding: BMBF

Period: 2013 - 2015

Project partner: DRFZ Berlin, Charité Berlin, MDC Berlin, DKFZ Heidelberg, BIMSB Berlin, MicroDiscovery AG

Our aim is to identify hubs, autoregulatory loops, and limiting processes in Th cell activation and cell fate decision to discover potential sensitive targets for effective and specific manipulation of Th cells. We will concentrate not only on immunosuppression but also on immunomodulation of Th cells. The ultimate goal of such interference will be to balance the homeostasis of the immune system. To this end we will extend our activities by concerted involvement of i) screening studies, ii) broad manipulation experiments, iii) mouse models for certain diseases, and iv) investigation of patient samples.

OncoPath: Understanding oncogenic signaling and metabolic networks in solid cancer



Funding: BMBF

Period: 2013 - 2015

Project partner: Charité Berlin, MDC Berlin, MPI-MG Berlin, IMB Mainz

The major objective of the OncoPath consortium is to dissect and model the impact of driver genetic alterations on the biological properties of tumorigenic epithelial cells and their clinical behaviour at the systems level. The ultimate goal is to bridge the gap in the current understanding of the mechanisms controlling the response of individual tumor cell populations to targeted therapies. Using an integrated and iterative approach that combines genome, transcriptome, proteome and metabolome analysis, functional assays in vitro and in vivo, and mathematical modelling, we will study colon cancer as a paradigmatic malignant disease.

BioBankCloud: Attacking the Biobank Bottleneck



Funding: EU

Period: 2012 - 2015

Project partner: KTH Stockholm, Karolinska Institute Stockholm, University of Lisbon, Charité Berlin

As of 2012, a quantum shift is happening in the area of human genomics. A huge wave of big data is approaching, driven by the decreasing cost of sequencing genomic data, which has been halving every 4 months since 2004. Biobanks, which are used to store and catalogue human biological material, are not prepared to handle this wave of data. There is a biobank bottleneck: a lack of platform support for the storage, analysis and interconnection of the coming massive amounts of human genomic data.

Prositu: Genotoxic stress-induced signalling pathways in tumor cells


Funding: BMBF

Period: 2012 - 2015

Project partner: MDC Berlin, Charité Berlin

The aim of this project is the development of a quantitative model and the identification of biomarkers for the description and detection of the activation state of genotoxic stress-induced signalling pathways in tumor cells, using a combination of mathematical modelling and targeted proteomics.

Virtual Liver: Modeling human liver physiology, morphology, and function


Homepage: Virtual Liver Network

Funding: BMBF

Period: 2010 - 2015

Project partner: 70 partners all over Germany

The Virtual Liver will be a dynamic model that represents, rather than fully replicates, human liver physiology, morphology, and function, integrating quantitative data from all levels of organization. Our part in the project is concerned with curator support for building the Virtual Liver Knowledge Base.

Graduate School GRK 1324 METRIK: Model-Based Development of Self-Organizing Networks for Disaster Management


Homepage: METRIK

Funding: Deutsche Forschungsgemeinschaft

Period: 2006-2010, 2010 - 2015

Project partner: See this list

Recent progress in basic research has led to visions of how to use new self-organizing networks for advanced information systems. These networks function without central administration: all nodes are able to adapt themselves to new environments autonomously and independently. The addition of new nodes or the failure of individual nodes does not significantly impact the network’s ability to function properly. Information systems and underlying technologies for self-organizing networks, in the context of a specific application domain, are the central topic of research for this graduate school. The research focuses on the important technologies needed at each individual node of a self-organizing network. Research challenges within this graduate school include finding a path through a network with the help of new routing protocols and forwarding techniques, replication of decentralized data, automated deployment and update of software components at runtime, and workload balancing among terminal devices with limited resources. Furthermore, non-functional aspects such as reliability, latency and robustness will be studied.

BioGraph: Querying and Analyzing Biological Networks



Funding: Bundesministerium für Bildung und Forschung (BCB)

Period: 2005 -

Project partner:

Graphs are playing an increasingly important role in many areas of biology. Examples are metabolic networks, networks of gene regulation, graphs formed of protein-protein interactions and complexes, and cascades in signal transduction. Due to improved experimental techniques, the graphs under study have grown considerably, with many networks today reaching tens of thousands of nodes. In our project, we develop algorithms and systems for efficiently handling graphs of such sizes. In particular, we study graph-based query languages, indexing, cluster-based analysis and visualization.

CellFinder: A Cell Data Repository



Funding: Deutsche Forschungsgemeinschaft

Period: 2010 - 2012

Project partner: Charité Berlin

CellFinder will develop an advanced information system for managing diverse data on stem cell lines. Our part in this project is the adaptation and development of methods for extracting relevant information from biomedical publications.

Transregio 54: Growth and Survival, Plasticity and Cellular Interactivity of Lymphatic Malignancies, project Z3


Homepage: TRR-54

Funding: Deutsche Forschungsgemeinschaft

Period: 2009 - 2012

Project partner: See this list

A short description of the overall goals of the Transregio can be found here (in German)

The goal of subproject Z3 is to develop and maintain a comprehensive data management and analysis platform for the Transregio. The platform will be based on a central database for storing the experimental data produced in the other subprojects. All data will undergo a standardized preprocessing stage and will be reviewed with respect to defined quality criteria. The database will also link the experimental data with information that is integrated from external data sources. It will be accessible through an intuitive web interface that also includes customizable functionality for integrated data analysis and visualization. Thus, the project will help to build up a high-quality, comprehensive data set offering an optimal basis for subsequent studies.

ColoNET: A Systems Biology Approach for Integrating Molecular Diagnostics


Funding: Bundesministerium für Bildung und Forschung

Period: 2009 - 2012

Project partner: Charité Berlin, Institute for Theoretical Biology, HUB, MDC Berlin, DKFZ Heidelberg, Universität Potsdam, Universität Saarbrücken, Universität Halle, MicroDiscovery GmbH

Our project will develop software for ranking potential biomarkers with respect to their diagnostic power for colorectal carcinoma. The ranking will be based on information extracted from scientific articles, models of the important pathways as created by other projects within ColoNET, experimental results, and data integrated from public sources. All information will pass through rigorous quality control and will be weighted based on the strength of its supporting evidence. Furthermore, the project will provide support for searching and analyzing the relevant literature using text mining methods.

Ali Baba: PubMed as a graph


Homepage: AliBaba (Java Web Start Application)

Funding: Bundesministerium für Bildung und Forschung (BCB), Max Planck Society

Period: 2002 - 2013

Project partner:

Extracting interactions between biological objects from text has become a major research topic in text mining in recent years. Our group focuses on the collection of interaction networks, both to provide quick overviews of specified subparts of domains and to build complete interaction graphs that can be queried afterwards. Our current emphasis is on mining protein-protein interactions from scientific publications.

Aladin: Almost Hands-Off Data Integration for the Life Sciences



Funding: Bundesministerium für Bildung und Forschung (Leser), Deutsche Forschungsgemeinschaft (Naumann)

Period: 2005 - 2013

Project partner: Felix Naumann, Hasso-Plattner Institute Potsdam

Like Columba, Aladin aims at integrating databases in the life sciences. In contrast to Columba, however, Aladin's challenge is to integrate the data sources automatically. The fundamental idea is to work in a data-centric rather than a schema-centric manner; the schema-centric approach, besides its known disadvantages, is especially unsuitable for life science databases. Another major point for Aladin is to use domain-specific knowledge for integration strategies, e.g. common properties and structures of life science databases.

BCB: Berliner Centre for Genome-Based Bioinformatics



Funding: Bundesministerium für Bildung und Forschung

Period: 2002 - 2006

Project partner: Charité, Max-Planck-Institute for Molecular Genetics, Freie Universität Berlin, Technische Fachhochschule Berlin, Max-Delbrück-Zentrum, Zuse Institut Berlin

The promise of bioinformatics is to bridge the gap between genome research and medicine. It is the goal of the Berlin Center for Genome Based Bioinformatics (BCB) to realize this vision. The cooperating groups of BCB will attack the major problems in the informational synthesis of genome data:

  • genomic annotation and knowledge management,
  • prediction of structure and function of gene products,
  • cellular and disease modelling.

Scientific efforts will be closely integrated with and complemented by educational efforts, specifically promoting short-term education of bioinformatics specialists through a 1.5-year curriculum at Technische Fachhochschule Berlin (TFH) and an MSc course in bioinformatics at Freie Universität Berlin (FUB) in close collaboration with Humboldt-Universität zu Berlin (HUB).

Columba: An integrated database of protein structures, sequences, and annotations



Funding: Bundesministerium für Bildung und Forschung

Period: 2004 - 2007

Project partner: Kristian Rother, Robert Preissner (Charité); Prof. Freytag (HUB Informatik), Thomas Steinke (ZIB), Ina Koch (TFH Berlin)

Researchers interested in the analysis of protein structures often require not only the actual structure, but also textual annotation that is spread over different data sources.

In the project Columba we integrate many resources on proteins. Columba is centered around protein structures obtained from the Protein Data Bank (PDB). We add as many annotations as possible to the structures by describing their properties. These annotations include folding classification from SCOP and CATH, secondary structures calculated with DSSP, enzyme annotation from the ENZYME database, participation in metabolic pathways from KEGG, taxonomic classification from the NCBI Taxonomy, and function characterisation from Gene Ontology.

Interdisciplinary Network for Bioinformatics and Linguistics



Funding: Berliner Senat

Period: 2003 - 2007

Project partner: Prof. Lüdeling, Prof. Donhauser (Institut für Deutsche Sprache, HUB)

The project studies two areas where linguistics and bioinformatics try to solve common problems. First, evolutionary biology tries to uncover the ancestral relationships between species. Similarly, historical linguistics is interested in the relationships of languages and dialects. Both use phylogenetic methods for this analysis. Second, biological databases annotate their objects with controlled vocabularies, ontologies, and thesauri to denote gene function and structure. In a similar fashion, corpora in linguistic research are annotated with multiple and possibly structured layers of knowledge describing the morphology, syntax, and semantics of words and sentences. Both rely on efficient methods for dealing with large amounts of complex and structured annotation. In the project, we exploit and further develop synergies between these independent, yet highly related branches of research.

DDD: Deutsch.Diachron.Digital




Period: 2004 - 2006

Project partner: Prof. Lüdeling, Prof. Donhauser (Institut für Deutsche Sprache, HUB), plus 16 further partners throughout Germany

The project Deutsch.Diachron.Digital is a Germany-wide, interdisciplinary initiative for the development of a digital reference corpus of German, ranging from the very first manuscripts of predecessors of the German language to the present day. Within this consortium, our group is responsible for the development of the central corpus database.