Getting started with PETER on the Unix command line

Download the source code and unpack it

Download the example EST sets or assemble your own. Your ESTs should be stored in a plain text file with one EST per line, and each EST should contain only the characters A, C, G, and T.
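
The file format described above can be checked before indexing. The following Python sketch is not part of PETER; the function name find_invalid_lines is hypothetical, and it assumes that only upper-case A, C, G, T are accepted and that empty lines count as invalid.

```python
VALID_BASES = set("ACGT")  # assumption: upper-case characters only


def find_invalid_lines(lines):
    """Return (line_number, text) pairs for lines that are not plain A/C/G/T strings."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        est = raw.rstrip("\r\n")
        # An empty line or any character outside A, C, G, T is reported.
        if not est or not set(est) <= VALID_BASES:
            problems.append((number, est))
    return problems


sample = ["ACGTACGT\n", "ACGNNACG\n", "acgt\n", "TTTTGGGG\n"]
print(find_invalid_lines(sample))  # [(2, 'ACGNNACG'), (3, 'acgt')]
```

To check a real file, pass an open file object, for example find_invalid_lines(open(path)), where path points to your EST file.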

Compile PETER

In globals.hpp (line 27), set "DATA_SUBDIR" to your local data directory, i.e. the filesystem directory that contains your EST files

Compile PETER with "make all"

Create an index with the following command:
./index.out OF F<filename>
Note that the file to be indexed (dbEST_part1.dat in the example) must be located in the folder defined by DATA_SUBDIR.
Example: ./index.out OF FdbEST_part1.dat
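
If you script the indexing step, note that the file name is appended directly to the F prefix without a space. A minimal Python sketch of this, assuming the binary is called from the build directory; the helper names index_command and data_file_exists are hypothetical and not part of PETER:

```python
import os


def index_command(filename, binary="./index.out"):
    """Build the argument list for indexing; the file name follows 'F' without a space."""
    return [binary, "OF", "F" + filename]


def data_file_exists(data_subdir, filename):
    """Check that the file is present in the directory configured as DATA_SUBDIR."""
    return os.path.isfile(os.path.join(data_subdir, filename))


print(" ".join(index_command("dbEST_part1.dat")))  # ./index.out OF FdbEST_part1.dat
# To actually run it: subprocess.run(index_command("dbEST_part1.dat"), check=True)
```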

Run similarity searches
1. Search for a single pattern with Hamming distance in an indexed EST string set:
  ./appxsearch.out Ok k<threshold> T<indexed_file> p<pattern>
  Example for k=2: ./appxsearch.out Ok k2 TdbEST_part1.dat pACTGGGGGAAAAATTTTTTGGGGGGCCCC
2. Search for a single pattern with Edit distance in an indexed string set:
  ./appxsearch.out Oe k<threshold> T<indexed_file> p<pattern>
  Example for k=2:
  ./appxsearch.out Oe k2 TdbEST_part1.dat pACTGGGGGAAAAATTTTTTGGGGGGCCC
3. Search for multiple, non-indexed patterns with Hamming distance in an indexed EST string set:
  ./appxsearch.out OK k<threshold> T<indexed_file> P<pattern-file>
  The patterns should be contained in a plain text file with one pattern per line. Example for k=2:
  ./appxsearch.out OK k2 TdbEST_part1.dat PdbEst_part2.dat
4. Search for multiple, non-indexed patterns with Edit distance in an indexed string set:
  ./appxsearch.out OE k<threshold> T<indexed_file> P<pattern-file>
  Example for k=2:
  ./appxsearch.out OE k2 TdbEST_part1.dat PdbEst_part2.dat
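
For readers unfamiliar with the two distance measures used by the search modes above, the following Python sketch shows their textbook definitions. It is an illustration only, not PETER's implementation. It assumes that Hamming distance is defined only for strings of equal length and that Edit distance means Levenshtein distance with unit costs.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(hamming_distance("ACTGGG", "ACTGCG"))  # 1
print(edit_distance("ACTGGG", "ACTGG"))      # 1
```

Under these definitions, a string matches a pattern for threshold k if its distance to the pattern is at most k.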

Run similarity joins
Make sure that both EST files are indexed and stored in your DATA_SUBDIR.

Join two EST indices on Hamming distance
./kmjoin.out OJ A<indexedFileA> B<indexedFileB> k<threshold>
Example for k=1:
./kmjoin.out OJ AdbEST_part1_native_stripped.dat BdbEST_2a.dat k1

Join two EST indices on Edit distance
./elajoin.out OJ A<indexedFileA> B<indexedFileB> k<threshold>
Example for k=1:
./elajoin.out OJ AdbEST_part1_native_stripped.dat BdbEST_2a.dat k1
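
Conceptually, a similarity join reports all pairs of strings, one from each set, whose distance is at most k. The Python sketch below shows this with a naive nested loop for Hamming distance. It is for illustration only and does not reflect how PETER computes joins on its indices. The names hamming_within and naive_join are hypothetical, and treating strings of different lengths as non-matching is an assumption.

```python
def hamming_within(a, b, k):
    """True if a and b have equal length and differ in at most k positions."""
    if len(a) != len(b):
        return False  # assumption: unequal lengths never match under Hamming distance
    return sum(x != y for x, y in zip(a, b)) <= k


def naive_join(set_a, set_b, k, within=hamming_within):
    """Compare every string in set_a with every string in set_b (quadratic time)."""
    return [(a, b) for a in set_a for b in set_b if within(a, b, k)]


set_a = ["ACGT", "TTTT"]
set_b = ["ACGA", "TTTT", "GGGG"]
print(naive_join(set_a, set_b, 1))  # [('ACGT', 'ACGA'), ('TTTT', 'TTTT')]
```

The nested loop compares every pair and is only practical for very small inputs; it is useful mainly for spot-checking join results on a handful of strings.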

Optimize index for DFS traversal
Note that you don't need to run the optimizer when creating a new index as the index is optimized automatically.
./index.out OI <indexedFile>
Example:
./index.out OI dbEST_part1_native_stripped.dat

For any questions or comments, please contact Astrid Rheinländer (rheinlae (youknowwhat) informatik.hu-berlin.de).

Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Wissensmanagement in der Bioinformatik