Software and Downloads
ChemSpot (link)
ChemSpot is a named entity recognition tool for identifying mentions of chemicals in natural language texts, including trivial names, drugs, abbreviations, molecular formulas and IUPAC entities. Since the different classes of relevant entities have rather different naming characteristics, ChemSpot uses a hybrid approach combining a Conditional Random Field with a dictionary. It achieves an F1 measure of 68.1% on the SCAI corpus, outperforming the only other freely available chemical named entity recognition tool, OSCAR4, by 10.8 percentage points.
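The hybrid idea and the F1 measure mentioned above can be illustrated with a small sketch. It is not ChemSpot's code: the function names, the character-offset span format, and the "keep the longer span on overlap" rule are assumptions chosen for illustration only.

```python
def merge_mentions(crf_spans, dict_spans):
    """Combine spans from two taggers (assumed rule: on overlap keep the longer span).

    Spans are (start, end) character offsets with an exclusive end.
    """
    merged = []
    candidates = sorted(set(crf_spans) | set(dict_spans),
                        key=lambda s: (s[0], -(s[1] - s[0])))
    for span in candidates:
        if merged and span[0] < merged[-1][1]:  # overlaps the last kept span
            if span[1] - span[0] > merged[-1][1] - merged[-1][0]:
                merged[-1] = span
            continue
        merged.append(span)
    return merged


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over exactly matching spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tagger outputs and gold annotations for one sentence.
crf = [(0, 10)]
dictionary = [(4, 10), (15, 22)]
gold = [(0, 10), (15, 22), (30, 35)]
mentions = merge_mentions(crf, dictionary)
score = f1_score(mentions, gold)
```

Here both predicted spans are correct but one gold mention is missed, so precision is 1.0, recall is 2/3, and F1 is 0.8.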
CellFinder resources (link)
Resources derived from the text mining experiments for the CellFinder project, which aims to establish a central stem cell data repository by utilizing and interlinking existing public databases covering defined areas of human pluripotent stem cell research.
Mariana Neves, Alexander Damaschun, Andreas Kurtz, Ulf Leser. Annotating and evaluating text for stem cell research. Third Workshop on Building and Evaluation Resources for Biomedical Text Mining (BioTxtM 2012) at Language Resources and Evaluation (LREC) 2012. (to appear)
GeneView (link)
GeneView is a web-based retrieval system for annotated biomedical texts. The system has indexed all PubMed abstracts plus the "data mining" subset of PMC (~200,000 full texts). All texts are tagged for occurrences of gene names (using GNAT) and mutations (using MutationFinder). Papers can be searched using the usual keyword search options, and results can additionally be ranked by abstract content in terms of annotated entities.
GeneView is developed in the course of the ColoNet-Project, funded by the BMBF.
Thomas, P., Starlinger, J., Jacob, C., Solt, I., Hakenberg, J. and Leser, U.
GeneView: Gene-Centric Ranking of Biomedical Text
Proceedings of BioCreative III, Bethesda, USA, 2010. pp 137-142.
LymphomExplorer (link)
The LymphomExplorer is a web-based solution for integrated data management and analysis within large biomedical projects. It is developed in the course of the TRR-54 and funded by the Deutsche Forschungsgemeinschaft. It stores various types of processed microarray data (human, mouse, Affymetrix, Exon Chips, Agilent, etc.) and provides easy-to-use methods for quality control, clustering, and functional analysis of selected datasets.
Access is restricted to members of the TRR-54.
Ali Baba (link)
Ali Baba is a tool for extracting the most important information from all results of a PubMed search. Given a PubMed query, it collects all matching articles (using NCBI's eUtils), identifies all genes, diseases, drugs, species etc. plus their relationships to each other in any of the articles, and displays these as a graph of interlinked biomedical concepts. It runs as a Java Web Start application.
Plake, C., Schiemann, T., Pankalla, M., Hakenberg, J., and Leser, U.
Ali Baba: PubMed as a graph
Bioinformatics, 22(19):2444-2445, 2006
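The pipeline described for Ali Baba (query PubMed via eUtils, then link the concepts found in the articles) can be sketched roughly as follows. This is an illustrative sketch, not Ali Baba's implementation: the function names are hypothetical, the entity lists are made-up input, and plain co-occurrence counting stands in for the tool's actual relationship extraction. Only the URL is built; no request is sent.

```python
from collections import Counter
from itertools import combinations
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, retmax=20):
    """Build an ESearch URL that would return PubMed IDs matching the query."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def cooccurrence_graph(entities_per_article):
    """Count how often two entities appear in the same article.

    Returns a Counter mapping (entity_a, entity_b) edges to their weight.
    """
    edges = Counter()
    for entities in entities_per_article:
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


url = build_esearch_url("BRCA1 breast cancer", retmax=5)

# Hypothetical entity annotations for two retrieved articles.
graph = cooccurrence_graph([
    ["BRCA1", "breast cancer"],
    ["BRCA1", "breast cancer", "tamoxifen"],
])
```

The resulting edge weights could then drive a graph display, with heavier edges for concept pairs that share more articles.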
BC-VisCon (link)
The BC-VisCon website provides an intuitive and flexible tool for viewing and analyzing aggregations and comparisons of the annotations made by the gene mention tagging servers connected to the BioCreAtIve-MetaServer.
Starlinger, J., Leitner, F., Valencia, A. and Leser, U.
SOA-based Integration of Text Mining Services
ICWS, Los Angeles, US, 2009
PETER (link)
PETER is a library for DNA similarity search and joins. It indexes a large set of long DNA strings in a prefix tree and allows for similarity searching using Hamming or edit distance. Furthermore, two PETER indexes can be compared efficiently to perform a similarity join. It can be used as a standalone tool, as a C++ library in other software, or as an extension for Oracle, thus allowing fast similarity operations with SQL.
Rheinländer, A., Knobloch, M., Hochmuth, N. and Leser, U.
Prefix Tree Indexing for Similarity Search and Similarity Join on Genomic Data
SSDBM, Heidelberg, Germany, 2010.
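The idea behind similarity search on a prefix tree can be shown with a minimal sketch for the Hamming-distance case. It does not reflect PETER's C++ API or its internal optimizations; all names are hypothetical, and the tree is a plain uncompressed trie.

```python
class TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}     # next character -> child node
        self.terminal = False  # True if an indexed string ends here


def build_trie(strings):
    """Index all strings in a prefix tree."""
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True
    return root


def hamming_search(root, query, max_mismatches):
    """Return (string, mismatches) for indexed strings of the query's length
    that differ from it in at most max_mismatches positions."""
    results = []
    stack = [(root, 0, 0, "")]  # node, depth, mismatches so far, prefix
    while stack:
        node, depth, mismatches, prefix = stack.pop()
        if depth == len(query):
            if node.terminal:
                results.append((prefix, mismatches))
            continue
        for ch, child in node.children.items():
            cost = mismatches + (ch != query[depth])
            if cost <= max_mismatches:  # prune subtrees over the threshold
                stack.append((child, depth + 1, cost, prefix + ch))
    return sorted(results)


index = build_trie(["ACGT", "ACGA", "TCGA", "GGGG"])
hits = hamming_search(index, "ACGT", max_mismatches=1)
```

Because strings sharing a prefix share a path in the tree, a subtree is abandoned as soon as its prefix exceeds the mismatch threshold, which is what makes the index cheaper than comparing the query against every string.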
Online appendix for 2010 PPI kernel benchmark (link)
A web page containing an online appendix to our 2010 PPI kernel benchmark paper. Contains source code and documentation.
Tikk, D., Thomas, P., Palaga, P., Hakenberg, J. and Leser, U.
A comprehensive benchmark of kernel methods to extract protein-protein interactions from literature
PLoS Computational Biology 6(7)
DARQ (link)
DARQ is a query engine for federated SPARQL queries. It provides transparent query access to multiple, distributed SPARQL endpoints as if querying a single RDF graph. DARQ enables the applications to see a single query interface, leaving the details of federation to the query engine.
DARQ was developed in the course of the Graduate School METRIK, funded by the Deutsche Forschungsgemeinschaft. Development stopped in 2008.
Quilitz, B. and Leser, U. (2008).
Querying Distributed RDF Data Sources with SPARQL
European Semantic Web Conference (ESWC), Tenerife, Spain (2008).
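One ingredient of query federation, deciding which endpoint can answer which part of a query, can be sketched as below. This is a simplified illustration under stated assumptions, not DARQ's code: the endpoint URLs, the capability table keyed by predicate, and the function name are all hypothetical.

```python
def select_sources(triple_patterns, capabilities):
    """Map each triple pattern to the endpoints that declare its predicate.

    triple_patterns: iterable of (subject, predicate, object) tuples.
    capabilities: dict mapping an endpoint URL to the set of predicates it serves.
    """
    plan = {}
    for pattern in triple_patterns:
        predicate = pattern[1]
        plan[pattern] = sorted(
            endpoint for endpoint, predicates in capabilities.items()
            if predicate in predicates
        )
    return plan


# Hypothetical endpoints and the predicates they can answer.
capabilities = {
    "http://example.org/people/sparql": {"foaf:name", "foaf:mbox"},
    "http://example.org/papers/sparql": {"dc:creator", "dc:title"},
}
query_patterns = [
    ("?person", "foaf:name", "?name"),
    ("?paper", "dc:creator", "?person"),
]
plan = select_sources(query_patterns, capabilities)
```

A federated engine would then send each sub-query to its selected endpoints and join the partial results on shared variables such as `?person`, so that the application only ever sees one query interface.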
PiPa (link)
PiPa is a tool to set up and maintain a local database for information on protein-protein interactions and biological pathways. It manages downloads and integration from several public databases, including Reactome, BioCyc, BioGRID, GAD, MINT, IntAct, InHo, HPRD, PID, OMIM, MIPS, and UniProt. PiPa is not an application running on integrated data, but a tool to quickly create your own in-house integrated database.
Arzt, S., Starlinger, J., Arnold, O., Kröger, S., Jaeger, S., and Leser, U. (2011).
PiPa: Custom Integration of Protein Interactions and Pathways
GI-Jahrestagung 2011, Workshop "Daten In den Lebenswissenschaften".