Getting started with PETER as a user-defined index in Oracle DB

PETER is a prefix-tree-based indexing algorithm that supports approximate search and approximate joins. It combines an efficient implementation of compressed prefix trees with advanced pruning techniques (length filtering, frequency filtering, q-gram filtering). PETER is written in C++ and can be used as a:

UNIX command line tool
user-defined index in Oracle DB
shared library for individual programs
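
The three pruning techniques named above can be illustrated with a minimal sketch. It uses the textbook forms of these filters for edit distance and is not taken from PETER's C++ source; the function names and the default q = 3 are assumptions chosen for illustration. Each filter is a cheap necessary condition: if it returns False, the two strings cannot be within edit distance k, so the exact distance computation can be skipped.

```python
from collections import Counter


def length_filter(a, b, k):
    # Each edit operation changes the length by at most 1.
    return abs(len(a) - len(b)) <= k


def frequency_filter(a, b, k):
    # One edit operation changes the character counts by at most 2 in total
    # (a substitution lowers one count and raises another).
    ca, cb = Counter(a), Counter(b)
    diff = sum(abs(ca[c] - cb[c]) for c in set(ca) | set(cb))
    return diff <= 2 * k


def qgram_filter(a, b, k, q=3):
    # Count filter: strings within edit distance k share at least
    # max(len(a), len(b)) - q + 1 - k * q q-grams (counted as multisets).
    ga = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    gb = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    shared = sum((ga & gb).values())
    return shared >= max(len(a), len(b)) - q + 1 - k * q


def passes_filters(a, b, k, q=3):
    # Hypothetical helper: a pair survives only if no filter rules it out.
    return (length_filter(a, b, k)
            and frequency_filter(a, b, k)
            and qgram_filter(a, b, k, q))
```

A pair that passes all filters is only a candidate; it still has to be verified with the exact distance function.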

PETER supports Hamming distance and edit distance as similarity measures. The tool has been evaluated on various collections of Expressed Sequence Tags (ESTs) from dbEST with up to 5,000,000 entries of lengths up to 3,500 characters. PETER is faster by orders of magnitude than agrep and nrgrep for search queries, and than user-defined functions for similarity joins inside Oracle DB. For a detailed evaluation, see:

Rheinländer, A., Knobloch, M., Hochmuth, N. and Leser, U.: Prefix Tree Indexing for Similarity Search and Similarity Join on Genomic Data. Int. Conf. on Statistical and Scientific Databases, Heidelberg, Germany, 2010.
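
For readers unfamiliar with the two similarity measures mentioned above, the following sketch shows their standard definitions. It is a plain reference implementation for illustration, not PETER's optimized code, and the function names are assumptions.

```python
def hamming(a, b):
    # Number of positions at which two equal-length strings differ.
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    # Minimum number of insertions, deletions and substitutions that turn
    # a into b, computed row by row with dynamic programming.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]
```

Hamming distance only compares strings of the same length, whereas edit distance also accounts for inserted and deleted characters.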

PETER was developed for indexing Expressed Sequence Tags and currently only deals with strings consisting of the letters A, C, G and T. We are working on lifting this restriction.
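
To make the idea of approximate search over a prefix tree concrete, here is a minimal sketch under stated assumptions: it uses an uncompressed trie, edit distance only, and recursion suited to short strings, whereas PETER uses compressed prefix trees and additional filters. All class and function names are hypothetical. The sketch rejects strings outside the A, C, G, T alphabet, mirroring the restriction described above.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # maps a character to the child node
        self.terminal = False   # True if an indexed string ends here


def build_trie(strings):
    root = TrieNode()
    for s in strings:
        if set(s) - set("ACGT"):
            raise ValueError("only the letters A, C, G, T are supported")
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True
    return root


def search(root, query, k):
    # Return (string, distance) pairs for all indexed strings whose edit
    # distance to query is at most k.
    results = []

    def visit(node, prefix, row):
        # row[j] is the edit distance between prefix and query[:j].
        if node.terminal and row[-1] <= k:
            results.append((prefix, row[-1]))
        for ch, child in node.children.items():
            new_row = [row[0] + 1]
            for j, qch in enumerate(query, 1):
                new_row.append(min(new_row[j - 1] + 1,
                                   row[j] + 1,
                                   row[j - 1] + (qch != ch)))
            # Prune: if every entry exceeds k, no string below can match.
            if min(new_row) <= k:
                visit(child, prefix + ch, new_row)

    visit(root, "", list(range(len(query) + 1)))
    return sorted(results)
```

Because strings with a common prefix share the same path in the tree, the distance computation for that prefix is done once and a whole subtree is discarded as soon as the threshold is exceeded.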

Installing PETER

On the Unix command line

In Oracle DB

Downloads

Source code version 0.3 (2010-04-25)
EST flat files taken from NCBI dbEST and prepared for indexing with PETER

For any questions or comments, please contact Astrid Rheinländer (rheinlae (youknowwhat) informatik.hu-berlin.de).


Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Wissensmanagement in der Bioinformatik
