Accounting for long-range correlations in genome-wide simulations of large cohorts
Authors:
Dominic Nelson aff001; Jerome Kelleher aff002; Aaron P. Ragsdale aff001; Claudia Moreau aff003; Gil McVean aff002; Simon Gravel aff001
Authors' affiliations:
aff001: McGill University and Genome Québec Innovation Centre, McGill University, Montréal, Québec, Canada
aff002: Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
aff003: Centre Intersectoriel en Santé Durable, Université du Québec à Chicoutimi, Saguenay, Québec, Canada
Published in:
Accounting for long-range correlations in genome-wide simulations of large cohorts. PLoS Genet 16(5): e1008619. doi:10.1371/journal.pgen.1008619
Category:
Research Article
doi: https://doi.org/10.1371/journal.pgen.1008619
Abstract
Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely used msprime can now produce genome-wide simulations for millions of individuals. However, such software relies on classic coalescent theory, which assumes that the sample size is small and that the simulated region is short. Here we show that coalescent simulations of long genomic regions exhibit large biases in identity-by-descent (IBD), long-range linkage disequilibrium (LD), and ancestry patterns, particularly when the sample size is large. We present a Wright-Fisher extension to msprime and show that it produces more realistic distributions of IBD, LD, and ancestry proportions, while also addressing more subtle biases of the coalescent. Further, this extension is more computationally efficient than state-of-the-art coalescent simulations when simulating long regions, including whole genomes. For shorter regions, efficiency can be maintained with a hybrid model that simulates the recent past under the Wright-Fisher model and the distant past under the coalescent.
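The contrast between the two models, and the idea of a hybrid simulation, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not msprime's implementation and not the authors' method: it assumes a haploid Wright-Fisher population of `n_copies` gene copies (2N for N diploids), a single non-recombining locus, and hypothetical helper names (`wf_generation`, `expected_lineages_wf`, `expected_lineages_coalescent`, `hybrid_tmrca`). In one Wright-Fisher generation every lineage picks a parent copy at random, so several lineages can merge at once. The coalescent instead allows at most one pairwise merger, at rate C(k, 2)/`n_copies` per generation, which is accurate only when the number of lineages k is much smaller than `n_copies`. The hybrid function runs discrete Wright-Fisher generations for a fixed number of generations and then switches to the coalescent.

```python
import random


def wf_generation(k, n_copies, rng):
    """One discrete Wright-Fisher generation (toy, haploid): each of k lineages
    picks a parent copy uniformly among n_copies; lineages that pick the same
    parent merge, so multiple and simultaneous mergers are possible."""
    return len({rng.randrange(n_copies) for _ in range(k)})


def expected_lineages_wf(k, n_copies):
    """Exact expected number of distinct parents after one WF generation."""
    return n_copies * (1.0 - (1.0 - 1.0 / n_copies) ** k)


def expected_lineages_coalescent(k, n_copies):
    """Coalescent approximation: at most one pairwise merger per generation,
    occurring with probability C(k, 2) / n_copies."""
    return k - k * (k - 1) / 2 / n_copies


def hybrid_tmrca(k, n_copies, wf_generations, rng):
    """Time to the most recent common ancestor (in generations) under a toy
    hybrid model: discrete WF for the first wf_generations generations, then
    the continuous-time Kingman coalescent for the remaining lineages."""
    t = 0.0
    # Recent past: step generation by generation under Wright-Fisher.
    while k > 1 and t < wf_generations:
        k = wf_generation(k, n_copies, rng)
        t += 1
    # Distant past: exponential waiting times between pairwise mergers.
    while k > 1:
        rate = k * (k - 1) / 2 / n_copies  # coalescence rate per generation
        t += rng.expovariate(rate)
        k -= 1
    return t


if __name__ == "__main__":
    n_copies = 2 * 1000  # hypothetical: 1,000 diploid individuals
    for k in (10, 2000):
        print(k, expected_lineages_wf(k, n_copies),
              expected_lineages_coalescent(k, n_copies))
    print(hybrid_tmrca(100, n_copies, 50, random.Random(1)))
```

With 10 lineages among 2,000 copies the two expectations agree to within 10^-4 lineages; with 2,000 lineages the coalescent approximation underestimates the number of surviving lineages by roughly 260. Because this toy omits recombination, it says nothing directly about IBD or LD; it only shows why the coalescent's pairwise-merger assumption degrades as the sample size approaches the population size.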
Keywords:
DNA recombination – Effective population size – Genetic polymorphism – Genome evolution – Linkage disequilibrium – Population genetics – Population size – Simulation and modeling