
Learning Patterns for Information Extraction from Free Text

Learning Patterns for Information Extraction from Free Text

Conrad Plake¹, Jörg Hakenberg¹*, and Ulf Leser¹

¹ Humboldt-Universität zu Berlin, Department of Computer Science, Knowledge Management Group
* Corresponding author. Current affiliation: Knowledge Management in Bioinformatics, Dept. of Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, 12489 Berlin, Germany. Phone: +49.30.2093.3903, e-mail: hakenberg(a)


We describe a general approach to information extraction from free text and propose methods for learning syntax patterns automatically from annotated corpora. We evaluate the approach on the extraction of protein-protein interactions from scientific texts. In this evaluation, automatically learned patterns outperform techniques based on hand-crafted patterns.
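The abstract does not specify how patterns are represented or learned. Purely as an illustration, the following Python sketch assumes a much simpler scheme than a full syntax-pattern learner: protein mentions are already tagged and replaced by a placeholder `<P>`, a "pattern" is just the lower-cased word sequence between two proteins, and a pattern is kept if it occurs in at least `min_support` annotated interacting pairs with a precision of at least `min_precision` over all pairs it covers. The names `learn_patterns` and `extract`, the `<P>` placeholder, the corpus format, and both thresholds are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical corpus format: each item is (tokens, interacting), where tokens
# is a list of words with protein mentions already replaced by "<P>"
# (named-entity recognition is assumed to have happened upstream), and
# interacting is the set of annotated (i, j) token-index pairs with i < j.

PROTEIN = "<P>"  # hypothetical placeholder for a tagged protein name


def between(tokens, i, j):
    """Return the lower-cased tokens strictly between positions i < j."""
    return tuple(t.lower() for t in tokens[i + 1:j])


def protein_pairs(tokens):
    """Yield all index pairs (i, j), i < j, of protein placeholders."""
    positions = [k for k, t in enumerate(tokens) if t == PROTEIN]
    return combinations(positions, 2)


def learn_patterns(corpus, min_support=2, min_precision=0.5):
    """Keep infix patterns seen often in positive pairs and rarely in negative ones."""
    positive, total = Counter(), Counter()
    for tokens, interacting in corpus:
        for i, j in protein_pairs(tokens):
            pattern = between(tokens, i, j)
            total[pattern] += 1
            if (i, j) in interacting:
                positive[pattern] += 1
    return {
        p for p, n in positive.items()
        if n >= min_support and n / total[p] >= min_precision
    }


def extract(tokens, patterns):
    """Return the protein index pairs whose infix matches a learned pattern."""
    return [(i, j) for i, j in protein_pairs(tokens)
            if between(tokens, i, j) in patterns]


if __name__ == "__main__":
    corpus = [
        ("<P> interacts with <P> .".split(), {(0, 3)}),
        ("<P> interacts with <P> in yeast .".split(), {(0, 3)}),
        ("<P> and <P> were purified .".split(), set()),
    ]
    patterns = learn_patterns(corpus)
    print(patterns)
    print(extract("Results show <P> interacts with <P> strongly".split(), patterns))
```

Scoring each candidate pattern against both positive and negative pairs, rather than keeping every infix seen in some positive example, is one simple way to trade recall for precision. Patterns that generalize beyond exact word sequences (for example over part-of-speech tags or with gaps) would be needed to cover realistic sentences.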

Published in
Workshop of the Knowledge Discovery Working Group (Workshop des Arbeitskreises Knowledge Discovery, AKKD), Karlsruhe, Germany, March 1, 2005.
[Extended Abstract]

  author = {Conrad Plake and J\"org Hakenberg and Ulf Leser},
  title = {Learning Patterns for Information Extraction from Free Text},
  booktitle = {Proc AKKD 2005},
  address = {Karlsruhe, Germany},
  month = {March},
  year = 2005