Large-scale analysis of phylogenetic search behavior

Date

2009-05-15

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Phylogenetic analysis is used in all branches of biology by inferring evolutionary trees. Applications include designing more effective drugs, tracing the transmission of deadly viruses, and guiding conservation and biodiversity efforts. Most analyses rely on effective heuristics for obtaining accurate trees. However, relatively little work has been done to analyze quantitatively the behavior of phylogenetic heuristics in tree space. This is important, because a better understanding of local search behavior can facilitate the design of better heuristics, which ultimately leads to more accurate depictions of the true evolutionary relationships. In order to access and analyze the tree search space, we implement an effec- tive local search heuristic. Having an effective heuristic that can open the space is important, since no search heuristic in this field can effectively provide data collec- tion control. So we have implemented and estimated a search heuristic, Simple Local Search or SLS, that works reasonably well in the space. Our investigations led to several interesting observations about the behavior of a search heuristic and the tree search space. We studied the correlation of tree features of search path trees, where tree features refer to the parsimony score, the Robinson- Foulds distance and the homoplasy measure. Most importantly from the results, parsimony score was highly correlated with Robinson-Foulds distance only in trees that lie on the search path to a local optimum. We also note that the scores of neighborhoods along search paths improve together, as a local search progresses. Correlations of tree features of search path trees are particularly useful in char- acterizing and controlling a search path. This paper proposes one possible stopping criterion to maximize the tree search results while minimizing computational time tested on three biological datasets using the correlation between the parsimony score and the RF distance value of search path trees. Also, the observation that scores of a neighborhood on a search path improve together gives us a significant amount of flexibility in selecting the next pivot of a search without losing performance. Eventually, our long-term goal is developing an effective search heuristic that can deal with large scale tree space in reasonable time. Improved knowledge about the tree search space and the search heuristic can provide a reasonable starting point toward the goal.

Description

Citation