
What Makes a Gene Name? Named Entity Recognition in the Biomedical Literature

Ulf Leser* and Jörg Hakenberg

Humboldt-Universität zu Berlin, Computer Science Dept., Knowledge Management in Bioinformatics, Rudower Chaussee 25, 12489 Berlin, Germany.
* Corresponding author: leser(a)


The recognition of biomedical concepts in natural text (named entity recognition, NER) is a key technology for the automatic or semi-automatic analysis of textual resources. Precise NER tools are a prerequisite for many text-based applications, such as information retrieval, information extraction and document classification. In recent years, the problem has received considerable attention in the bioinformatics community, and experience has shown that NER in the life sciences is a difficult problem. Several systems and algorithms have been devised and implemented. This paper describes the problems and resources in NER research, sketches the principal algorithms underlying most systems, and surveys the current state of the art in the field.


Keywords: text mining; knowledge management; information extraction; machine learning; named entity recognition

Published in
Briefings in Bioinformatics, Vol. 6, No. 4, pp. 357-369, December 2005.
DOI: 10.1093/bib/6.4.357

    author  = {Ulf Leser and J{\"o}rg Hakenberg},
    title   = {What Makes a Gene Name? Named Entity Recognition in the Biomedical Literature},
    journal = {Briefings in Bioinformatics},
    year    = {2005},
    month   = {December},
    volume  = {6},
    number  = {4},
    pages   = {357--369}