Estimating phylogenetic trees from discrete morphological data

Wright, April Marie

Date

2015-05

Authors

Wright, April Marie

Abstract

Morphological characters have a long history of use in the estimation of phylogenetic trees. Datasets consisting of morphological characters are most often analyzed using the maximum parsimony criterion, which seeks to minimize the amount of character change across a phylogenetic tree. When combined with molecular data, morphological characters are often analyzed using model-based methods, such as maximum likelihood or, more commonly, Bayesian estimation. The efficacy of likelihood and Bayesian methods that use the Mk model, a common model for estimating topology from discrete morphological characters, is poorly explored. In Chapter One, I explore the efficacy of Bayesian estimation of phylogeny, using the Mk model, under conditions that are commonly encountered in paleontological studies. Using simulated data, I describe the relative performance of parsimony and the Mk model under a range of realistic conditions that include common scenarios of missing data and rate heterogeneity. I further examine the use of the Mk model in Chapter Two. Like any model, the Mk model makes a number of assumptions. One is that transitions between character states are symmetric (i.e., there is an equal probability of changing from state 0 to state 1 and from state 1 to state 0). Many characters, including alleged Dollo characters and extremely labile characters, may not fit this assumption. I tested methods for relaxing this assumption in a Bayesian context. Using empirical datasets, I performed model fitting to demonstrate cases in which modeling asymmetric transitions among character states is preferred. I used simulated datasets to demonstrate that choosing the best-fit model of transition symmetry can improve model fit and phylogenetic estimation. In my final chapter, I looked at the use of partitions to model datasets more appropriately. Common in molecular studies, partitioning breaks up the dataset into pieces that evolve according to similar mechanisms. These pieces, called partitions, are then modeled separately. This practice has not been widely adopted in morphological studies. I extended the PartitionFinder software, which is used in molecular studies to score candidate partitioning schemes and identify the one that best models the dataset. I used empirical datasets to demonstrate the effects of partitioning datasets on model likelihoods and on the phylogenetic trees estimated from those datasets.
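
The symmetry assumption described above can be illustrated with a minimal sketch. It is not code from the dissertation. It assumes the standard closed-form transition probabilities for a k-state Mk model with a single rate, and contrasts them with a two-state model in which the 0-to-1 and 1-to-0 rates may differ. The function names and the parameters `alpha`, `q01`, `q10`, and `t` are illustrative choices, not names used in the source.

```python
import math


def mk_transition_probs(k, alpha, t):
    """Symmetric Mk model: every change between distinct states has rate alpha.

    Returns (p_same, p_diff): the probability of ending in the starting state,
    and the probability of ending in one particular different state, after
    a branch of length t.
    """
    decay = math.exp(-k * alpha * t)
    p_same = 1.0 / k + (k - 1.0) / k * decay
    p_diff = 1.0 / k - decay / k
    return p_same, p_diff


def asymmetric_two_state_probs(q01, q10, t):
    """Two-state model with separate rates for 0->1 (q01) and 1->0 (q10).

    Returns (p01, p10): the probabilities of a net change 0->1 and 1->0
    after a branch of length t.
    """
    total = q01 + q10
    decay = math.exp(-total * t)
    p01 = q01 / total * (1.0 - decay)
    p10 = q10 / total * (1.0 - decay)
    return p01, p10


if __name__ == "__main__":
    # Symmetric case: gains and losses are equally probable.
    print(mk_transition_probs(k=2, alpha=1.0, t=0.1))
    # Asymmetric case (hypothetical rates): losses are ten times faster than gains.
    print(asymmetric_two_state_probs(q01=0.2, q10=2.0, t=0.1))
```

When `q01` equals `q10`, the asymmetric function reduces to the two-state Mk model, which is why the symmetric model can be treated as a special case when comparing model fit.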