Wissensmanagement in der Bioinformatik
Fortgeschrittene algorithmische Bioinformatik
Seminar im Sommersemester 2005
Professor Ulf Leser
Themen- und Literaturliste
Die angebotenen Themen sollen von den Teilnehmern anhand einschlägiger Literatur erarbeitet werden. Im Folgenden findet sich eine Auswahl an Referenzen. Die Basis für jeden Vortrag bildet zumeist ein Fachartikel oder Buchkapitel, zur Vertiefung sind jeweils weitere Referenzen vorgeschlagen. Die Abgrenzung des Themas erfolgt zusammen mit dem Betreuer. Die meisten der erwähnten Fachartikel lassen sich sehr schnell auf dem üblichen Weg finden, viele auch bei PubMed. Ein Großteil der Bücher ist in der Bibliothek vorrätig. Nicht alle Artikel sind jedoch frei verfügbar - Kopien gibt es in diesem Falle beim Betreuer, entsprechend bei Büchern und Buchkapiteln.
[Seq.Assembl.] - [Shift-AND/Karp-Rabin] - [Suffixarrays] - [LCP Problem] - [Enh. Suffixarrays] - [Grosse Suffixbäume] - [Seeddesign] - [SST] - [WholeGenomeAlign I] - [WholeGenomeAlign II] - [HMM] - [Profile HMM] - [MSA: DIAlign] - [MSA: T-Coffee] - [Oligodesign] - [Unique Probes] - [Allgemeine Literatur] - zurück zum Seminar
Sequenzassembly - Problem und Algorithmen
Literatur
- E.W. Myers, S.G. Sutton, A.L. Delcher, I.M. Dew, D.P. Fasulo, et al.
- A whole-genome assembly of Drosophila. Science,
287(5461):2196-204, 2000.
PubMed
- D. Huson
- Vorlesungsskript zum Celera Assembler, Kapitel "Genome Assembly", siehe unten
- Huson, Reinert, Myers
- The greedy path-merging algorithm for sequence assembly, Recomb 2001
- Myers
- "Whole Genome DNA Sequencing," IEEE Computational Engineering and Science 3, 1 (1999), 33-43
Shift-AND und Karp-Rabin
Literatur
- D. Gusfield, Kapitel 4.1, 4.2, 4.4
- Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997. ISBN 0521585198.
- Smyth, Kapitel 7.3 (Karp-Rabin) und 7.4 (Shift-And)
- „Computing Patterns in Strings“, Addison Wesley
- Baeza-Yates, Gonnet
- „A new approach to text searching“, Comm. of the ACM, 35, 1974
- Karp, Rabin
- Efficient randomized pattern matching algorithms. IBM; Journal on Research and Development, 31, 1987
Konstruktion von Suffixarrays in linearer Zeit
Im Vortrag sollte auch der Algorithmus von Myers erklärt werden mit O(n*log(n)), siehe Kap. 5.1 des Skripts von HeunLiteratur
- J. Kärkkäinen and P. Sanders
- Simple linear work suffix array construction. Proc. 30th International Colloquium on Automata, Languages and Programming, Vol.2719 of Lecture Notes in Computer Science, p. 943-955, Springer-Verlag, Berlin, 2003.
- Volker Heun
- Vorlesungsskript, Kapitel 5.1 und 5.3.
- E. Ohlebusch
- Algorithmen zur Sequenzanalyse, Vorlesungskript Uni Ulm, SoSe 2004. Kapitel 5.5.
Das LCP Problem in Suffixbäumen
Literatur
- D. Gusfield
- Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997. ISBN 0521585198. Kapitel 8: "Constant-Time Lowest Common Ancestor Retrieval"
Enhanced Suffixarrays
Literatur
- M.I. Abouelhoda, S. Kurtz, E. Ohlebusch
- The Enhanced Suffix Array and Its Applications to Genome Analysis. Proceedings of the 2nd Workshop on Algorithms in Bioinformatics (WABI'02), Lecture Notes in Computer Science, 2452:449-463, 2002.
Konstruktion sehr grosser Suffixbäume
Literatur
- Tata, S., Hankins, R. A. and Patel, J. M.
- Practical Suffix Tree Construction. Proceedings of VLDB
2004.
Paper - Hunt, E., Atkinson, M. and Irving, R. W.
- "A Database Index to Large Biological Sequences". 27th Conference on Very Large Database Systems, Roma, Italy. pp 139-148. (2001)
- Bedathur, S. J. and Haritsa, J. R.
- "Engineering a Fast Online Persistent Suffix Tree Construction". 20th Conference on Data Engineering, Boston, MA. pp 720-731. (2004)
Seeddesign im Alignment
Literatur
- Ma, B., Tromp, J. and Li, M.
- "PatternHunter: Faster and more sensitive homology search." Bioinformatics 18(3): 440-445. (2002)
- J. Buhler, U. Keich, Y. Sun
- Designing seeds for similarity search in genomic DNA. Proc Research in Computational Molecular Biology (RECOMB), 67-75, 2003.
SST: Advanced Similarity Search
Literatur
- E. Giladi, M.G. Walker, J.Z. Wang, W. Volkmuth.
- SST: an algorithm for finding near-exact sequence matches in time
proportional to the logarithm of the database size of the database
size. Bioinformatics, 18(6):873-7, 2002.
PubMed
Whole Genome Alignment mit Suffixbäumen
Literatur
- Delcher, A. L., Phillippy, A., Carlton, J. and Salzberg, S. L.
- "Fast algorithms for large-scale genome alignment and comparison." Nucleic Acids Res 30(11): 2478-83. 2002
- Kurtz, S., Phillippy, A., Delcher, A. L., Smoot, M., Shumway, M., Antonescu, C. and Salzberg, S. L.
- "Versatile and open software for comparing large genomes." Genome Biol 5(2): R12. (2004)
- Chain, P., Kurtz, S., Ohlebusch, E. and Slezak, T.
- "An applications-focused review of comparative genomics tools: capabilities, limitations and future challenges." Brief Bioinform 4(2): 105-23. (2003)
Whole Genome Alignment mit Frequency Vectors
Literatur
- T. Kahveci, V. Ljosa, AK. Singh.
- Speeding up whole-genome alignment by indexing frequency vectors.
Bioinformatics, 20(13):2122-34, 2004.
PubMed - Kahveci, T. and Singh, A. K.
- "An Efficient Index Structure for String Databases". 27th Conference on Very Large Database Systems, Roma, Italy. pp 351-260, 2001
Alignments mit HMMs
Literatur
- Durbin, Eddy, Krogh, Mitchison
- Biological sequence analysis - Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998. Kapitel 4: "Pairwise alignment using HMMs"
Profile HMM
Literatur
- Durbin, Eddy, Krogh, Mitchison
- Biological sequence analysis - Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998. Kapitel 5: "Profile HMMs for sequence families"
Multiple Sequence Alignment: DIAlign
Literatur
- A.R. Subramanian, J. Weyer-Menkhoff, M. Kaufmann and B. Morgenstern
- DIALIGN-T: An improved algorithm for segment-based multiple
sequence alignment. BMC Bioinformatics, 6:66, 2005.
doi:10.1186/1471-2105-6-66
Synopsis PubMed - B. Morgenstern, K. Frech, A. Dress, T. Werner
- DIALIGN: Finding local similarities by multiple sequence alignment.
Bioinformatics 14(3):290-294, 1998.
PubMed - B. Morgenstern
- DIALIGN 2: improvement of the segment-to-segment approach to
multiple sequence alignment. Bioinformatics 15(3):211-218,
1999.
PubMed - B. Morgenstern
- DIALIGN: Multiple DNA and Protein Sequence Alignment at
BiBiServ. Nucleic Acids Res. 32(Web Server issue):W33-W36,
2004.
PubMed
Webseiten und Server
Multiple Sequence Alignment: T-Coffee
Literatur
- C. Notredame, D.G. Higgins, J. Heringa
- T-Coffee: A novel method for fast and accurate multiple sequence
alignment. J. Mol. Biol., 302:205-217, 2000.
PubMed - C. Notredame, L. Holme, D.G. Higgins
- COFFEE: A New Objective Function For Multiple Sequence
Alignmnent. Bioinformatics 14(5):407-422, 1998.
PubMed - O. O'Sullivan, K. Suhre, C. Abergel, D.G. Higgins, C. Notredame
- 3DCoffee: Combining Protein Sequences and Structures within
Multiple Sequence Alignments. Journal of Molecular Biology
340:385-395, 2004.
PubMed - O. Poirot, K. Suhre, C. Abergel, E. O'Toole, and C. Notredame
- 3DCoffee@igs: a web server for combining sequences and
structures into a multiple sequence alignment. Nucleic Acids
Res., 32(Web Server issue):W37-40, 2004.
PubMed - C. Grasso, C. Lee
- Combining partial order alignment and progressive multiple
sequence alignment increases alignment speed and scalability to very
large alignment problems. Bioinformatics, 20(10):1546-56,
2004.
PubMed
Progressive Partial Order Alignment (POA). Vergleicht pPOA, DIALIGN, TCOFFEE et al.)
Webseiten und Server
Oligarray - Designproblem
Literatur
- A.B. Kahng, I.I. Mandoiu, P.A. Pevzner, S. Reda, A.Z. Zelikovsky
- Scalable heuristics for design of DNA probe arrays. J Comput
Biol., 11(2-3):429-47, 2004.
PubMed - A.B. Kahng, I.I. Mandoiu, P.A. Pevzner, S. Reda, A.Z. Zelikovsky
- Engineering a Scalable Placement Heuristic for DNA Probe Arrays. Proc Research in Computational Molecular Biology (RECOMB), 148-156, 2003.
- Hui-Hsien Chou, An-Ping Hsia, Denise L. Mooney, and Patrick S. Schnable
- Picky: oligo microarray design for large genomes.
Bioinformatics, 20(17):2893-2902, 2004.
PubMed - Nancie Reymond, Hubert Charles, Laurent Duret, Federica Calevro, Guillaume Beslon, and Jean-Michel Fayard
- ROSO: optimizing oligonucleotide probes for microarrays.
Bioinformatics, 20(2):271-273, 2004.
PubMed - J.-M. Rouillard, M. Zuker, and E. Gulari
- OligoArray 2.0: Thermodynamicaly improved oligonucleotide
design for microarrays. Nucleic Acids Res., 31(12):3057-3062,
2003.
PubMed - J.-M. Rouillard, C.J. Herbert, and M. Zuker
- OligoArray: Genome-scale oligonucleotide design for
microarrays. Bioinformatics, 18(3):486-487, 2002.
PubMed
Unique Probes aus EST Datenbanken
Literatur
- J. Zheng, T.J. Close, T. Jiang, S. Lonardi
- Efficient selection of unique and popular oligos for large EST
databases. Bioinformatics, 20(13):2101-12, 2004.
PubMed
Allgemeine Literatur
Kurze Übersichten und Einführungen in die Materie finden sich in vielen Vorlesungsskripten und guten Standardwerken. Im Folgenden präsentieren wir eine Auswahl.
- D. Gusfield
- Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997. ISBN 0521585198.
- R. Durbin, S. Eddy, A. Krogh, G. Mitchison
- Biological sequence analysis - Probabilistic Models of Proteins
and Nucleic Acids, Cambridge University Press, 1998.
- Hidden Markov Models: Kapitel 4.
- Profile HMMs: Kapitel 5. - D. Huson
- Vorlesungsskript
Algorithms in Bioinformatics II, Sommersemester 2004, Uni
Tübingen.
- Sequenzassembly: Kapitel "Genome Assembly". - E. Ohlebusch
- Vorlesungsskript Algorithmen
zur Sequenzanalyse, Sommersemester 2004, Uni Ulm.
- Konstruktion von Suffixarrays: Kapitel 5.5. - E. Ohlebusch
- Vorlesungsskript Algorithmen der Bioinformatik, Wintersemester 2004/05, Uni Ulm.
- H. Kestler, E. Ohlebusch, R. Schuler
- Vorlesungsskript Einführung in die Bioinformatik, Wintersemester 2004/05, Uni Ulm.
- M. Vingron et al.
- Algorithmische
Bioinformatik, Wintersemester 2002/03, MPI Molekulare Genetik / FH
Berlin.
- Sequenzassembly: K. Reinert, MSA, Kapitel 2. - U. Leser
- Vorlesungsfolien Algorithmische Bioinformatik, Wintersemester 2004/05, HU Berlin.
[Nach oben] - zurück zum Seminar
Stand: 21.3.2005, JH