Investigating the behaviors and limitations of phylogenetic models of protein-coding sequence evolution

Spielman, Stephanie Jill

dc.contributor.advisor	Wilke, C. (Claus)	en
dc.contributor.committeeMember	Bull, James	en
dc.contributor.committeeMember	Barrick, Jeffrey	en
dc.contributor.committeeMember	Hillis, David	en
dc.contributor.committeeMember	Hofmann, Hans	en
dc.creator	Spielman, Stephanie Jill	en
dc.creator.orcid	0000-0002-9090-4788	en
dc.date.accessioned	2016-06-30T18:49:04Z
dc.date.accessioned	2018-01-22T22:30:11Z
dc.date.available	2016-06-30T18:49:04Z
dc.date.available	2018-01-22T22:30:11Z
dc.date.issued	2016-05	en
dc.date.submitted	May 2016
dc.date.updated	2016-06-30T18:49:04Z
dc.description.abstract	Probabilistic models which infer the strength and direction of natural selection from protein-coding sequences are among the most widely-used tools in comparative sequence analysis. A variety of phylogenetic models of coding-sequence evolution have been developed. However, these models have been produced independently from one another. As a consequence, it has been entirely unknown whether inferences from different models reveal similar or incompatible information about the evolutionary process. In this dissertation, I derive and study the mathematical relationship between two probabilistic models of protein-coding sequence evolution: dN/dS-based models, which estimate evolutionary rates, and mutation–selection models, which estimate site-specific amino-acid fitnesses. I demonstrate how this relationship reveals the behavioral properties, limitations, and applicabilities of different inference frameworks, which leads to concrete recommendations for how these models should best be employed in evolutionary sequence analysis. In Chapter 2, I develop a flexible and extendable software, implemented as a module in the Python programming language, for simulating sequences along phylogenies according to standard evolutionary models. This software platform provides an independent and user-friendly platform for testing model behavior, or indeed developing novel evolutionary models, thus enabling robust comparisons of modeling frameworks. In Chapter 3, I derive a mathematical relationship between dN/dS and amino-acid fitness values, and I show that mutation–selection models fully encompass information encoded in dN/dS models, provided that sequences are evolving under purifying selection. I further use this relationship to show that certain commonly-used dN/dS-based models are strongly and systematically biased. I additionally show that standard metrics used for model selection in phylogenetics (e.g. Akaike Information Criterion) may be positively misleading and indicate strong support for incorrect models. Finally, in Chapter 4, I apply the mathematical relationship developed in Chapter 3 to study the accuracy of two competing mutation–selection inference implementations, whose relative merits have been heavily debated in the literature. My approach demonstrates that mutation–selection inference platforms that treat amino-acid fitnesses as fixed-effect variables precisely estimate site-specific evolutionary constraints. By contrast, inference platforms that treat fitnesses as random-effect variables systematically underestimate the strength of natural selection across sites. Taken together, the work presented in this dissertation yields novel insights into how these popular evolutionary models can best be applied to sequence data, how their results should be interpreted, and finally how future model development should be conducted in order to yield robust and reliable inference methods.	en
dc.description.department	Ecology, Evolution and Behavior	en
dc.format.mimetype	application/pdf	en
dc.identifier	doi:10.15781/T2GH9B88K	en
dc.identifier.uri	http://hdl.handle.net/2152/38770	en
dc.language.iso	en	en
dc.subject	Molecular evolution	en
dc.subject	Protein evolution	en
dc.subject	Phylogenetic models	en
dc.title	Investigating the behaviors and limitations of phylogenetic models of protein-coding sequence evolution	en
dc.type	Thesis	en
dc.type.material	text	en

Collections

University of Texas at Austin