Estimating population histories using single-nucleotide polymorphisms sampled throughout genomes






Genomic data make it possible to track complex population histories of divergence and gene flow. We used 47,506 single-nucleotide polymorphisms (SNPs) to investigate cattle population history. Cattle are descendants of two independently domesticated lineages, taurine and indicine, that diverged 200,000 or more years ago. We found that New World cattle breeds, as well as many related breeds of cattle in southern Europe, exhibit ancestry from both the taurine and indicine lineages. Although European cattle are largely descended from the taurine lineage, gene flow from African cattle (partially of indicine origin) contributed substantial genomic components to both southern European cattle breeds and their New World descendants. We extended these analyses to compare the timing of admixture in several breeds of taurine-indicine hybrid origin. We developed a metric, scaled block size (SBS), that uses the unrecombined block size of introgressed regions of chromosomes to differentiate between recent and ancient admixture. By comparing test individuals to standards with known recent hybrid ancestry, we were able to differentiate individuals of recent hybrid origin from other admixed individuals using the SBS metric. We genotyped SNP loci using the bovine 50K SNP panel. The selection of sites to include in SNP analyses can influence inferences from the data, especially when particular populations are used to select the array of polymorphic sites. To test the impact of this bias on the inference of population genetic parameters, we used empirical and simulated data representing the three major continental groups of cattle: European, African, and Indian. We compared the inference of population histories for simulated data sets across different ascertainment conditions using F_ST and principal components analysis (PCA). Ascertainment bias that results in an over-representation of within-group polymorphism decreases estimates of F_ST between groups.
Geographically biased selection of polymorphic SNPs changes the weighting of principal component axes and can bias inferences about proportions of admixture and population histories using PCA. By combining empirical and simulated data, we were able to both test methods for inferring population histories from genomic SNP data and apply these methods to practical problems.
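
The effect of ascertainment on F_ST described above can be illustrated with a minimal sketch. It assumes a simple heterozygosity-based estimator, F_ST = (H_T - H_S) / H_T, computed as a ratio of sums across sites for two equally weighted groups; this is not necessarily the estimator used in the study. The allele frequencies, the function name `fst`, and the minor-allele-frequency threshold are hypothetical values chosen only for illustration.

```python
def fst(freqs_a, freqs_b):
    """Heterozygosity-based F_ST = (H_T - H_S) / H_T, summed over sites.

    freqs_a, freqs_b: per-site frequencies of one allele in two groups,
    which are weighted equally. Illustrative estimator only.
    """
    h_s_sum = 0.0  # mean within-group expected heterozygosity, summed over sites
    h_t_sum = 0.0  # total expected heterozygosity, summed over sites
    for a, b in zip(freqs_a, freqs_b):
        h_s_sum += (2 * a * (1 - a) + 2 * b * (1 - b)) / 2
        p_bar = (a + b) / 2
        h_t_sum += 2 * p_bar * (1 - p_bar)
    return (h_t_sum - h_s_sum) / h_t_sum


# Hypothetical allele frequencies at six sites in two groups (made-up data).
group_a = [0.0, 1.0, 0.5, 0.4, 0.05, 0.3]
group_b = [1.0, 0.0, 0.5, 0.6, 0.9, 0.3]

# Unbiased case: every site is used.
fst_all = fst(group_a, group_b)

# Ascertained case: keep only sites that are clearly polymorphic within
# group A (minor allele frequency >= 0.1, an assumed threshold). This
# over-represents within-group polymorphism and drops fixed differences.
MAF_MIN = 0.1
kept = [i for i, a in enumerate(group_a) if min(a, 1 - a) >= MAF_MIN]
fst_ascertained = fst([group_a[i] for i in kept], [group_b[i] for i in kept])

print(f"F_ST, all sites:        {fst_all:.3f}")
print(f"F_ST, ascertained in A: {fst_ascertained:.3f}")
```

In this toy data set the sites discarded by the filter are the ones that carry most of the between-group differentiation, so the ascertained estimate is lower than the estimate from all sites, which is the direction of bias the abstract reports.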