Browsing by Subject "Phylogenetics"
Item Evolution of microbial populations with spatial and environmental structure (2010-05) Miller, Eric Louis; Meyers, Lauren Ancel; Bennett, Philip C.; Bull, James J.; Hawkes, Christine V.; Hillis, David M.
Rarely are natural conditions constant, but generally biologists study microbes in artificially constant environments in the laboratory. I relaxed these assumptions of constant environments through time and space as I investigated how microbial populations evolve. First, I examined how bacteriophage evolved in the presence of permissive and nonpermissive hosts. I found that bacteriophage evolved discrimination in mixed environments as well as in one of two environments with homogeneous, permissive hosts. This showed the asymmetry of host-shifting in viruses as well as the possibility of large, and somewhat unpredictable, pleiotropic effects. Secondly, I reconstructed ancestral environmental conditions for soil bacteria groups using phylogenetics and environmental variables of extant species’ habitats. These generalizations suggested characteristic phenotypes for several phylogenetic groups, including uncultured Acidobacteria. Lastly, I collected genetic sequences and global collection information for 65 bacterial genera across the domain. In examining the relationship between genetic distance, environmental conditions, and geography, I observed positive relationships specifically between genetic distance and geography or genetic distance and environmental conditions for bacteria from land sites but not from water sites. Phylogenetic classifications or phenotypes of the genera could not predict these correlations.
In all of these projects, variations in the environment created evolutionary signals that hinted at past environments of microbial populations.

Item Evolution of the genus Sicydium (Gobiidae: Sicydiinae) Chabarria, Ryan Earl

Item Family of Hidden Markov Models and its applications to phylogenetics and metagenomics (2014-08) Nguyen, Nam-phuong Duc; Warnow, Tandy, 1955-
A Profile Hidden Markov Model (HMM) is a statistical model for representing a multiple sequence alignment (MSA). Profile HMMs are important tools for sequence homology detection and have been used in a wide range of bioinformatics applications including protein structure prediction, remote homology detection, and sequence alignment. Profile HMM methods result in accurate alignments on datasets with evolutionarily similar sequences; however, I will show that on datasets with evolutionarily divergent sequences, the accuracy of HMM-based methods degrades. My dissertation presents a new statistical model for representing an MSA by using a set of HMMs. The family of HMM (fHMM) approach uses multiple HMMs instead of a single HMM to represent an MSA. I present a new algorithm for sequence alignment using the fHMM technique. I show that using the fHMM technique for sequence alignment results in more accurate alignments than the single HMM approach. As sequence alignment is a fundamental step in many bioinformatics pipelines, improvements to sequence alignment result in improvements across many different fields. I show the applicability of fHMM to three specific problems: phylogenetic placement, taxonomic profiling and identification, and MSA estimation. In phylogenetic placement, the problem addressed is how to insert a query sequence into an existing tree. In taxonomic identification and profiling, the problems addressed are how to taxonomically classify a query sequence, and how to estimate a taxonomic profile on a set of sequences.
Finally, both profile HMM and fHMM require a backbone MSA as input in order to align the query sequences. In MSA estimation, the problem addressed is how to estimate a "de novo" MSA without the use of an existing backbone alignment. For each problem, I present a software pipeline that implements the fHMM specifically for that domain: SEPP for phylogenetic placement, TIPP for taxonomic profiling and identification, and UPP for MSA estimation. I show that SEPP has improved accuracy compared to the single HMM approach. I also show that SEPP results in more accurate phylogenetic placements compared to existing placement methods, and SEPP is more computationally efficient, both in peak memory usage and running time. I show that TIPP more accurately classifies novel sequences compared to the single HMM approach, and TIPP estimates more accurate taxonomic profiles than leading methods on simulated metagenomic datasets. I show how UPP can estimate "de novo" alignments using fHMM. I present results that show UPP is more accurate and efficient than existing alignment methods, and estimates accurate alignments and trees on datasets containing both full-length and fragmentary sequences. Finally, I show that UPP can estimate a very accurate alignment on a dataset with 1,000,000 sequences in less than 2 days without the need of a supercomputer.

Item The fluviageny, a method for analyzing temporal river fragmentation using phylogenetics (2015-05) Gordon, Andrew Lloyd; Howison, James; Arctur, David K
Phylogenetic trees have historically been used to determine evolutionary relatedness between organisms. In the past few decades, as we've developed increasingly powerful computational algorithms and toolsets for performing analyses using phylogenetic methods, the use of these trees has expanded into other areas, including biodiversity informatics and geoinformatics.
This report proposes using phylogenetic methods to create "fluviagenies" - trees that represent the effects of river fragmentation over time caused by damming. Faculty at the Center for Research in Water Resources at the University of Texas worked to develop tools and documentation for automating the creation of river segment codes (a.k.a. "fluvcodes") based on spatiotemporal data. Python was used to generate fluviageny trees from lists of these codes. The resulting trees can be exported into the appropriate data format for use with various phylogenetics programs. The Fishes of Texas Database (fishesoftexas.org), a comprehensive geospatial database of Texas fish occurrences aggregated and normalized from 42 museum collections around the world, was employed to create an example of how this tool might be used to analyze and hypothesize changes in fish populations as a consequence of river fragmentation. Additionally, this paper serves to theorize and analyze past and future potential uses for phylogenetic trees in various other fields of informatics.

Item Improved methods for phylogenetics (2009-12) Nelesen, Serita Marie; Hunt, Warren A., 1958-; Warnow, Tandy, 1955-; Boyer, Robert S.; Hillis, David M.; Linder, Craig R.
Phylogenetics is the study of evolutionary relationships. It is a scientific endeavour to discover history, and it is not easy. Massive amounts of data together with computationally difficult optimization problems mean that heuristics are prevalent, and ever better techniques are sought. New approaches are valuable if they are more accurate, but are considered even more so if they are faster than pre-existing methods. Improvements to existing algorithms, whether in terms of space requirements or faster running times, are also worthwhile. This dissertation explores three new techniques, each of which is valuable according to the previous definitions.
The first contribution is TASPI, a system for storing collections of phylogenetic trees, and performing post-tree analyses. TASPI stores collections of trees more compactly than the previous method, and this compact structure lends itself to post-tree analyses. This results in the ability to compute strict and majority consensus trees faster than common alternatives. As an added benefit, TASPI is written in ACL2, which allows properties of the algorithms and data structures to be formally verified. The second contribution is an improved method to generate phylogenetic trees. A common methodology involves two steps: first estimating a Multiple Sequence Alignment (MSA), and then estimating a tree using that MSA. This method changes the way in which the MSA is estimated, and this leads to improved accuracy of the resultant trees. In some cases, the time required is also reduced. The third contribution is BLuTGEN, a method by which a phylogenetic tree is estimated from sequence data, but without ever generating an MSA for the full dataset. BLuTGEN is as accurate as one of the best published tree estimation techniques (SATé), but takes a novel approach which allows it to be applied to much larger datasets.

Item Insights into relationships among rodent lineages based on mitochondrial genome sequence data (Texas A&M University, 2006-04-12) Frabotta, Laurence John
This dissertation has two major sections. In Chapter II, complete mitochondrial (mtDNA) genome sequences were used to construct a hypothesis for affinities of most major lineages of rodents that arose quickly in the Eocene and were well established by the end of the Oligocene. Determining the relationships among extant members of such old lineages can be difficult. Two traditional schemes on subordinal classification of rodents have persisted for over a century, dividing rodents into either two or three suborders, with relationships among families or superfamilies remaining problematic.
The mtDNA sequences for four new rodent taxa (Aplodontia, Cratogeomys, Erethizon, and Hystrix), along with previously published Euarchontoglires taxa, were analyzed under parsimony, likelihood, and Bayesian criteria. Likelihood and Bayesian analyses of the protein-coding genes converged on a single topology that weakly supported rodent monophyly and was significantly better than the parsimony trees. Analysis of the tRNAs failed to recover a monophyletic Rodentia and did not reach convergence on a stationary distribution after fifty million generations. Most relationships hypothesized in the likelihood topology have support from previous data. Mt tRNAs have been largely ignored with respect to molecular evolution or phylogenetic utility. In Chapter III, the mt tRNAs from 141 mammals were used to refine secondary structure models and examine their molecular evolution. Both H- and L-encoded tRNAs are AT-rich with different %G and GC-skew and a difference in skew between H- and L-strand stems. Proportion of W-C pairs is higher in the H-strand and GU/UG pairs are higher in the L-strand, suggesting increased mismatch compensation in L-strand tRNAs. Among rodents, the number of variable stem base-pairs was nearly 75% of that observed across all mammals combined. Compensatory base changes were present only at divergences of 4% or greater. Neither loop reduction nor an accumulation of deleterious mutations, both suggestive of mutational meltdown (Muller's ratchet), was observed. Mutations associated with human pathologies are correlated only with the coding strand, with H-strand tRNAs being linked to substantially more of these mutations.

Item Molecular systematics of Meconopsis Vig. (Papaveraceae): taxonomy, polyploidy evolution, and historical biogeography from a phylogenetic insight (2013-12) Xiao, Wei, active 2013; Simpson, Beryl Brintnall
Known as the Himalayan poppies or the blue poppies, Meconopsis is a genus with approximately 50 species distributed through the high altitude of the Himalaya and the Hengduan Mountains (SW China). This dissertation is a study of the systematics of Meconopsis primarily using molecular phylogenetic methods. DNA sequences of chloroplast matK, ndhF, trnL-trnF, rbcL, and nuclear ITS were collected to reconstruct the phylogenies of the genus. Results showed that traditional Meconopsis is a polyphyletic group and revealed extensive mismatches between the nuclear ITS tree and the chloroplast tree. Based on the phylogenies, the taxonomy of Meconopsis was revised, making Meconopsis monophyletic. Four new sections (sect. Meconopsis, sect. Aculeatae, sect. Primulinae, and sect. Grandes) were proposed as well as a species complex (M. horridula). The chloroplast phylogeny and a likelihood method (chromEvol) were applied to ancestral chromosome number estimation to reconstruct the polyploidy evolution history of the genus. The analysis recovered an ancient triploid ancestor shared by sect. Primulinae and sect. Grandes. A low-copy nuclear gene (GAPDH) network of Meconopsis was further reconstructed, which indicated that the ancient triploid ancestor was formed by hybridization. A hypothesis of reticulate history of Meconopsis was also proposed based on the GAPDH network. Using a reconstructed rbcL phylogeny of Ranunculales, the stem group of Meconopsis was estimated at ca. 22 Mya by molecular dating, which coincided with the time of Asian interior desertification and the onset of Asian monsoon. These climatic changes could possibly have been the impetus for the split between Meconopsis and its sister clade. Ancestral area reconstruction was further conducted using likelihood-based methods.
The result indicated that Meconopsis originated in the Himalaya, most likely in the west Himalaya, followed by migration to the Hengduan Mountains.

Item Novel scalable approaches for multiple sequence alignment and phylogenomic reconstruction (2015-08) Mir arabbaygi, Siavash; Pingali, Keshav; Warnow, Tandy, 1955-; Hillis, David; Gosh, Joydeep; Berger, Bonnie; Mooney, Ray
The amount of biological sequence data is increasing rapidly, a promising development that would transform biology if we can develop methods that can analyze large-scale data efficiently and accurately. A fundamental question in evolutionary biology is building the tree of life: a reconstruction of relationships between organisms in evolutionary time. Reconstructing phylogenetic trees from molecular data is an optimization problem that involves many steps. In this dissertation, we argue that to answer long-standing phylogenetic questions with large-scale data, several challenges need to be addressed in various steps of the pipeline. One challenge is aligning large numbers of sequences so that evolutionarily related positions in all sequences are put in the same column. Constructing alignments is necessary for phylogenetic reconstruction, but also for many other types of evolutionary analyses. In response to this challenge, we introduce PASTA, a scalable and accurate algorithm that can align datasets with up to a million sequences. A second challenge is related to the interesting fact that various parts of the genome can have different evolutionary histories. Reconstructing a species tree from genome-scale data needs to account for these differences. A main approach for species tree reconstruction is to first reconstruct a set of "gene trees" from different parts of the genome, and to then summarize these gene trees into a single species tree.
We argue that this approach can suffer from two challenges: reconstruction of individual gene trees from limited data can be plagued by estimation error, which translates to errors in the species tree, and methods that summarize gene trees are not scalable or accurate enough under some conditions. To address the first challenge, we introduce statistical binning, a method that re-estimates gene trees by grouping them into bins. We show that binning improves gene tree accuracy, and consequently the species tree accuracy. To address the second challenge, we introduce ASTRAL, a new summary method that can run on a thousand genes and a thousand species in a day and has outstanding accuracy. We show that the development of these methods has enabled biological analyses that were otherwise not possible.

Item A re-evaluation of crinoid morphology and proposed relationship of crown groups, with insights from biogeography (2011-08) Womack, Kyle Richard; Sprinkle, James, 1943-; Molineux, Ann; Rowe, Timothy
Crinoids are the most primitive living members of the Phylum Echinodermata. Though still present in reduced numbers today, crinoids were the dominant echinoderms from the Ordovician to the Permian. The crinoid body plan consists of three major regions: the column, the calyx, and the arms. Each region serves important functions in crinoids. The column raises the rest of the body into the water column for more efficient feeding. The calyx contains the visceral mass and mouth. Arms extend out from the top of the calyx to trap microorganisms and suspended organic particles in the water column. A re-evaluation of these functional units is undertaken to understand the importance of various structures and to obtain discrete characters for use in a cladistic analysis. The relationship of crinoid crown groups has been an active area of research for the past couple of decades.
With each proposed phylogenetic relationship, a new interpretation of thecal plate homology has been proposed. Here each study is re-examined in the light of new data. A review of functional morphology indicates a dual-reference system to be the most supported interpretation of plate homology. The two reference points in this system are the stem-cup and the cup-arm junctions, at the bottom and top of the calyx, respectively. The difference between a two-circlet and three-circlet crinoid is the presence or absence of the middle (basal) circlet. A new cladistic analysis is presented, with the topology of trees obtained giving support for the retention of Paleozoic crinoid stem and crown groups. Crinoids appear abruptly in the fossil record. Questions pertaining to origins and ancestral stock abound. A biogeography study is employed to look at the distribution of crinoids from the Early to Middle Ordovician. Locality information, combined with an understanding of the movement of major plates, paleoclimate data, an understanding of larval distribution, and a review of similar studies carried out on different taxa, gives insight into possible radiation and dispersal patterns of crinoids from the first half of the Ordovician.

Item Studies of phylogenetic relationships and evolution of functional traits in diatoms (2014-05) Nakov, Teofil; Theriot, Edward C. (Edward Claiborne), 1953-
The research presented here deals with inferring phylogenetic trees and their use to study the evolution of functional traits in diatoms (Heterokontophyta: Bacillariophyceae). Two chapters are concerned with the phylogeny of a mainly freshwater group, the Cymbellales, with a convoluted taxonomic history and classification. I generated a multi-gene dataset to test the monophyly of the Cymbellales and reconstruct the relationships within the order.
The molecular data were equivocal with respect to the monophyly of the Cymbellales, especially when taking into account some problematic taxa like Cocconeis and Rhoicosphenia. Aside from the problem with their monophyly, my work shows that the current genus- and family-level classification of the Cymbellales is unnatural, arguing for the need of nearly wholesale re-classification of the group. The two following chapters make use of phylogenetic trees to model the evolution of functional traits. I explored the evolution of cell size across the salinity gradient, finding that the opposing selective forces exerted by marine and fresh waters select for different optimal cell sizes -- larger in the oceans and smaller in lakes and rivers. Thereafter, I modelled the evolutionary histories of habitat preference (planktonic-benthic) and growth form (solitary-colonial) across the diatoms. These traits exhibit markedly different evolutionary histories. Habitat preference evolves slowly, is conserved at the level of large clades, and its evolution is generally uniform across the tree. Growth form, on the other hand, has a more dynamic evolutionary history with frequent transitions between the solitary and colonial growth forms and rates of evolution that vary through time. I hope that these empirical studies represent an incremental advancement to the understanding of the evolution of diatom species and functional diversity.

Item The evolution of nuclear microsatellite DNA markers and their flanking regions using reciprocal comparisons within the African mole-rats (Rodentia: Bathyergidae) (Texas A&M University, 2006-10-30) Ingram, Colleen Marie
Microsatellites are repetitive DNA characterized by tandem repeats of short motifs (2-5 bp). High mutation rates make them ideal for population level studies. Microsatellite allele genesis is generally attributed to strand slippage, and it is assumed that alleles are caused only by changes in repeat number.
Most analyses are limited to alleles (electromorphs) scored by mobility only, and models of evolution rarely account for homoplasy in allele length. Additionally, insertion/deletion events (indels) in the flanking region or interruptions in the repeat can obfuscate the accuracy of genotyping. Many investigators use microsatellites, designed for a focal species, to screen for genetic variation in non-focal species. Comparative studies have shown different mutation rates of microsatellites in different species, and even in different individuals. Recent studies have used reciprocal comparisons to assess the level of polymorphism of microsatellites between pairs of taxa. In this study, I investigated the evolution of microsatellites within a phylogenetic context, using comparisons within the rodent family Bathyergidae. Bathyergidae represents a monophyletic group endemic to sub-Saharan Africa and relationships are well supported by morphological and molecular data. Using mitochondrial and nuclear DNA, a robust phylogeny was generated for the Bathyergidae. From my results, I proposed the new genus, Coetomys. I designed species-specific genotyping and microsatellite flanking sequence (MFS) primers for each genus. Sequencing of the MFS provided direct evidence of the evolutionary dynamics of the repeat motifs and their flanking sequence, including rampant electromorphic homoplasy, null alleles, and indels. This adds to the growing body of evidence regarding problems with genotype scores from fragment analysis. A number of the loci isolated were linked with repetitive elements (LTRs and SINEs), characterized as robust phylogenetic characters. Results suggest that cryptic variation in microsatellite loci is not trivial and should be assessed in all studies. The phylogenetic utility of the nucleotide variation of the MFS was compared to the well-resolved relationships of this family based on the 12S/TTR phylogeny.
Variation observed in MFS generated robust phylogenies, congruent with results from 12S/TTR. Finally, a number of the indels within the MFS provided a suite of suitable phylogenetic characters.

Item The influence of body size and sexual dimorphism on speciation within Anura (2016-08) Gullett, Taylor Cameron; Cannatella, David C.; Hillis, David
Many adaptive radiations demonstrate clear relationships between morphological variation and diversity, hinting that trait plasticity leads to increased potential to diversify. This study will examine this pattern within frogs (Anura). Body size is the focal morphological feature of this study due to its ease of collection and close relationship with the niche of an organism. Unlike most large-scale studies, this one takes into account both male and female body size and the extent of sexual size dimorphism (SSD). This allows us to determine not only whether body size relates to diversification rate but also whether body size evolution in one sex is more indicative of changes in diversification rate than the other and what impact SSD has on diversity. The results show that rates of male and female body size evolution as well as extent of sexual size dimorphism were all significantly positively correlated with speciation rate. The relationship between body size and speciation supports the idea that morphological plasticity and enhanced diversification go hand-in-hand. The rate of body size evolution in each sex had a similar relationship with speciation, indicating neither sex is more important for diversification. The positive correlation with sexual size dimorphism suggests that selection for extreme variation promotes diversity. Overall, rates of phenotypic evolution and speciation were closely linked across all of Anura.