Using a complex model of sequence evolution to evaluate and improve phylogenetic methods

Date

2001-12

Authors

Holder, Mark Travis

Abstract

The performance of phylogenetic methods was evaluated by testing their success in recovering the true tree from computer-simulated data. Data were generated on a variety of tree shapes under a complex model of sequence evolution based on the work of Aaron Halpern and William Bruno. Parameters of the model were estimated by maximum likelihood techniques applied to a phylogeny of 1,610 sequences of mammalian cytochrome b genes. These simulations represent a rigorous test of the robustness of phylogenetic methods because several of the simplifying assumptions made by inference methods are violated. Maximum likelihood methods assuming the general time-reversible model of sequence evolution with rate heterogeneity proved to be quite robust, outperforming all other methods on small trees. Distance methods were significantly worse, even when implementing the same model of sequence evolution. On larger trees only distance methods and parsimony techniques were studied. In virtually all cases parsimony outperformed distance-based approaches. The use of simple distance corrections improved performance for four-taxon trees and ultrametric sixteen-taxon trees but decreased the performance of the neighbor joining method on a 228-taxon tree. Neighbor joining performed as well as or better than searches under the minimum evolution criterion in all cases. In general, the results of these simulations agree with the conclusions of previous studies: phylogenetic methods perform well over a wide range of tree shapes, highly accurate phylogenies for large numbers of taxa can be obtained from moderate sequence lengths, and model-based distance corrections are much less robust than maximum likelihood implementations of the same model. Maximum likelihood under a common model of sequence evolution was found to be inconsistent on difficult tree shapes when the data were generated under the model described here. Analysis of the spectra of the generating model and the model of inference suggests slight alterations to the general time-reversible model that may improve its performance.
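
As an illustration of what a "simple distance correction" does, the sketch below converts the observed proportion of differing sites between two aligned sequences into a Jukes-Cantor corrected distance. The abstract does not state which corrections were tested, so the choice of Jukes-Cantor here is an assumption made only for illustration, and the function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p).

    Illustrative assumption: the source does not say this is the
    correction that was evaluated.
    """
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

The corrected distance is always at least as large as the raw proportion of differences, because it accounts for multiple substitutions at the same site. A matrix of such pairwise distances is the usual input to distance-based tree methods such as neighbor joining.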

Description

text

Keywords

Phylogeny, Biology--Classification, Nucleotide sequence, Proteins--Analysis
