Framework for automated phylogenetic analysis of molecular sequence databases

Shen, Lishuang

Date

2002-12

Authors

Shen, Lishuang

Publisher

Texas Tech University

Abstract

The large volume of biological sequence data held in sequence databases is a powerful resource for molecular phylogenetics. However, storing and analyzing these data by hand is tedious, especially when tens of thousands of sequences must be analyzed. To address this problem, I developed the automatic framework for general molecular phylogenetics analysis (AGMPA) system, which automates the routine work of molecular phylogenetics. Perl scripts glue together the component programs and parse their analysis outputs. The system also implements databases for information storage, retrieval, and presentation. These databases integrate the different types of data used in molecular phylogenetics and support several kinds of query. The system provides a graphical user interface (GUI) for all of its functions and for calling the bioinformatics programs BLAST, FASTA, CLUSTAL, PHYLIP, and TREE-PUZZLE. It is implemented in Perl/Tk to ensure cross-platform compatibility. The system was tested with 52,499 nucleotide sequences and 1,165 protein sequences from the genus Gossypium and with 36,495 protein sequences from the family Poaceae. Phylogenetic analysis results from these two test datasets are presented.
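
The "glue and parse" step mentioned in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the Perl used by AGMPA, and it is not the thesis's code. It assumes the similarity search was saved in the standard 12-column tab-separated BLAST layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The names `Hit`, `parse_tabular_hits`, `filter_hits`, and the sample sequence identifiers are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One similarity-search hit (hypothetical record type for illustration)."""
    query: str
    subject: str
    identity: float  # percent identity
    evalue: float


def parse_tabular_hits(text: str) -> List[Hit]:
    """Parse 12-column tab-separated BLAST-style output into Hit records.

    Assumption: columns follow the standard tabular order, with percent
    identity in column 3 and E-value in column 11.
    """
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows instead of failing the whole run
        hits.append(Hit(fields[0], fields[1], float(fields[2]), float(fields[10])))
    return hits


def filter_hits(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep only hits at or below an E-value cutoff (the cutoff is an assumption)."""
    return [h for h in hits if h.evalue <= max_evalue]


# Hypothetical sample output; identifiers and scores are made up for illustration.
SAMPLE = (
    "# example tabular output\n"
    "seqA\tseqB\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-100\t370\n"
    "seqA\tseqC\t71.20\t120\t30\t2\t5\t124\t10\t129\t0.5\t40\n"
)

if __name__ == "__main__":
    for hit in filter_hits(parse_tabular_hits(SAMPLE)):
        print(hit.query, hit.subject, hit.identity, hit.evalue)
```

In a pipeline of this kind, the filtered hits would typically be passed on to the alignment and tree-building programs. How AGMPA actually selects and forwards sequences is not stated in the abstract.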