Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Wissensmanagement in der Bioinformatik

Getting started with PETER on the Unix command line

  • Download the source code and unpack it

  • Download the example EST sets or prepare your own. Your ESTs should be stored in a plain text file with one EST per line, using only the characters A, C, G, and T.
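
If you prepare your own EST sets, it can help to check the file format before indexing. Below is a minimal Python sketch that is not part of PETER (the names `invalid_est_lines` and `check_est_file` are hypothetical). It lists every line that is empty or contains anything other than the upper-case characters A, C, G, and T. It deliberately keeps a Windows carriage return so that such line endings are reported as well. Treating them as invalid is an assumption based on the format description above.

```python
import re
from typing import Iterable, List, Tuple

# A valid EST line: one or more of the upper-case characters A, C, G, T only.
EST_LINE = re.compile(r"[ACGT]+")

def invalid_est_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, content) for every line that is not a valid EST."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # Strip only the newline: a Windows carriage return is kept and
        # reported (an assumption about what PETER accepts).
        content = line.rstrip("\n")
        if not EST_LINE.fullmatch(content):
            bad.append((number, content))
    return bad

def check_est_file(path: str) -> List[Tuple[int, str]]:
    """Check a file on disk; an empty list means every line passed."""
    # newline="" returns line endings untranslated, so CRLF stays visible.
    with open(path, encoding="ascii", errors="replace", newline="") as handle:
        return invalid_est_lines(handle)
```

Run it on a file before copying it into your data directory. Lower-case bases, N characters, spaces, and blank lines all show up in the result together with their line numbers.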

  • Compile PETER
    1. Set DATA_SUBDIR in globals.hpp (line 27) to the path of your local data directory, i.e., the directory that contains your EST files.

    2. Compile PETER with "make all".

  • Create an index with the following command:
    ./index.out OF F<filename>
    Note that the file <filename> must be located in the directory defined by DATA_SUBDIR.
    Example: ./index.out OF FdbEST_part1.dat
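
When you index several files, a small script saves retyping commands. The sketch below uses hypothetical Python helpers that are not part of PETER. It assumes that index.out was built in the directory passed as `peter_dir` and that every file is already in DATA_SUBDIR. As in the command above, the option letter F is attached directly to the file name without a space.

```python
import subprocess
from typing import Iterable, List

def index_command(filename: str) -> List[str]:
    """Argument list for `./index.out OF F<filename>`.

    The option letter F is concatenated with the file name (no space);
    the file itself must already be in DATA_SUBDIR.
    """
    return ["./index.out", "OF", "F" + filename]

def index_all(filenames: Iterable[str], peter_dir: str = ".") -> None:
    """Index each file in turn, stopping at the first non-zero exit status.

    Assumption: index.out was compiled in peter_dir; with cwd set, the
    relative path ./index.out is resolved against peter_dir on POSIX.
    """
    for filename in filenames:
        subprocess.run(index_command(filename), cwd=peter_dir, check=True)
```

For example, `index_all(["dbEST_part1.dat", "my_ests.dat"], peter_dir="/path/to/peter")` indexes both files in turn. Here `my_ests.dat` and the path are placeholders for your own data.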

  • Run similarity searches
    1. Search for a single pattern with Hamming distance in an indexed EST string set:
      ./appxsearch.out Ok k<threshold> T<indexed_file> p<pattern>
      Example for k=2:
      ./appxsearch.out Ok k2 TdbEST_part1.dat pACTGGGGGAAAAATTTTTTGGGGGGCCCC

    2. Search for a single pattern with edit distance in an indexed EST string set:
      ./appxsearch.out Oe k<threshold> T<indexed_file> p<pattern>
      Example for k=2:
      ./appxsearch.out Oe k2 TdbEST_part1.dat pACTGGGGGAAAAATTTTTTGGGGGGCCC

    3. Search for multiple non-indexed patterns with Hamming distance in an indexed EST string set:
      ./appxsearch.out OK k<threshold> T<indexed_file> P<pattern-file>
      The patterns should be stored in a plain text file with one pattern per line. Example for k=2:
      ./appxsearch.out OK k2 TdbEST_part1.dat PdbEst_part2.dat

    4. Search for multiple non-indexed patterns with edit distance in an indexed EST string set:
      ./appxsearch.out OE k<threshold> T<indexed_file> P<pattern-file>
      Example for k=2:
      ./appxsearch.out OE k2 TdbEST_part1.dat PdbEst_part2.dat
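
The following naive Python reference compares every pattern with every EST. You can use it to spot-check search results on small inputs. It is a sketch under stated assumptions and not PETER's index-based algorithm. It assumes that both distances are computed between whole strings. It also assumes that Hamming distance only matches strings of equal length, and that edit distance means Levenshtein distance with unit costs for insertions, deletions, and substitutions. The function names are hypothetical.

```python
from typing import Dict, Iterable, List

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    previous = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1,              # delete x
                               current[j - 1] + 1,           # insert y
                               previous[j - 1] + (x != y)))  # substitute/match
        previous = current
    return previous[-1]

def naive_search(ests: List[str], patterns: Iterable[str], k: int,
                 metric: str = "hamming") -> Dict[str, List[int]]:
    """Map each pattern to the indices of all ESTs within distance k.

    Assumption: whole strings are compared (no substring matching), and under
    Hamming distance ESTs of a different length than the pattern never match.
    """
    if metric not in ("hamming", "edit"):
        raise ValueError("metric must be 'hamming' or 'edit'")
    result: Dict[str, List[int]] = {}
    for pattern in patterns:
        hits = []
        for index, est in enumerate(ests):
            if metric == "hamming":
                if len(est) != len(pattern):
                    continue  # Hamming distance is undefined here
                distance = hamming(est, pattern)
            else:
                distance = edit_distance(est, pattern)
            if distance <= k:
                hits.append(index)
        result[pattern] = hits
    return result
```

Passing a list of patterns mirrors the pattern-file searches. For instance, `naive_search(ests, ["ACGT"], 1, "edit")` also reports ESTs that are one base shorter or longer than the pattern. The Hamming variant never matches such ESTs.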

  • Run similarity joins
    Make sure that both EST files are indexed and stored in your DATA_SUBDIR.
    1. Join two EST indices using Hamming distance:
      ./kmjoin.out OJ A<indexedFileA> B<indexedFileB> k<threshold>
      Example for k=1:
      ./kmjoin.out OJ AdbEST_part1_native_stripped.dat BdbEST_2a.dat k1

    2. Join two EST indices using edit distance:
      ./elajoin.out OJ A<indexedFileA> B<indexedFileB> k<threshold>
      Example for k=1:
      ./elajoin.out OJ AdbEST_part1_native_stripped.dat BdbEST_2a.dat k1
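
Conceptually, a join with threshold k returns every pair (a, b), with a from file A and b from file B, whose distance is at most k. The nested-loop sketch below spells that out in Python. It compares all |A| × |B| pairs and is therefore only practical for small samples. The names are hypothetical. The default metric is assumed to be whole-string Hamming distance, with no match for strings of unequal length.

```python
from typing import Callable, List, Optional, Tuple

def hamming(a: str, b: str) -> Optional[int]:
    """Mismatch count for equal-length strings; None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))

def naive_join(set_a: List[str], set_b: List[str], k: int,
               distance: Callable[[str, str], Optional[int]] = hamming
               ) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with distance(set_a[i], set_b[j]) <= k.

    Nested-loop join over all len(set_a) * len(set_b) pairs; a distance of
    None (undefined) never matches.
    """
    pairs = []
    for i, a in enumerate(set_a):
        for j, b in enumerate(set_b):
            d = distance(a, b)
            if d is not None and d <= k:
                pairs.append((i, j))
    return pairs
```

To mirror elajoin.out, pass an edit-distance function as `distance` instead. Any callable that returns an int, or None for "undefined", works.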

  • Optimize an index for depth-first (DFS) traversal
    You do not need to run the optimizer on a newly created index, because new indexes are optimized automatically.
    ./index.out OI <indexedFile>
    Example:
    ./index.out OI dbEST_part1_native_stripped.dat


  • For any questions or comments, please contact Astrid Rheinländer (rheinlae (youknowwhat) informatik.hu-berlin.de).