Quantifying the effects of computational filter criteria on the accurate identification of de novo mutations at varying levels of sequencing coverage

Abstract

The rate of spontaneous (de novo) germline mutation is a key parameter in evolutionary biology, impacting genetic diversity and contributing to the evolution of populations and species. Mutation rates themselves evolve over time but the mechanisms underlying the mutation rate variation observed across the Tree of Life remain largely to be elucidated. In recent years, whole genome sequencing has enabled the estimation of mutation rates for several organisms. However, due to a lack of community standards, many previous studies differ both empirically – most notably, in the depth of sequencing used to reliably identify de novo mutations – and computationally – utilizing different computational pipelines to detect germline mutations as well as different analysis strategies to mitigate technical artifacts – rendering comparisons between studies challenging. Using a pedigree of Western chimpanzees as an illustrative example, we here quantify the effects of commonly utilized quality metrics to reliably identify de novo mutations at different levels of sequencing coverage. We demonstrate that datasets with a mean depth of ≤ 30X are ill-suited for the detection of de novo mutations due to high false positive rates that can only be partially mitigated by computational filter criteria. In contrast, higher coverage datasets enable a comprehensive identification of de novo mutations at low false positive rates, with minimal benefits beyond a sequencing coverage of 60X, suggesting that future work should favor breadth (by sequencing additional individuals) over depth. Importantly, the simulation and analysis framework described here provides conceptual guidelines that will allow researchers to take study design and species-specific resources into account when determining computational filtering strategies for their organism of interest.
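The trade-off described in the abstract, removing false positives while retaining genuine de novo mutations, can be illustrated with a minimal sketch. It assumes simulated data in which the true de novo sites are known. The field names, the threshold values, and the `evaluate` helper are hypothetical choices for illustration only; they are not the filter criteria or the pipeline used in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate de novo call in the offspring (hypothetical fields)."""
    site: str               # e.g. "chr1:100"
    child_depth: int        # read depth in the offspring
    child_gq: int           # genotype quality in the offspring
    allele_balance: float   # fraction of offspring reads carrying the alternate allele
    parent_alt_reads: int   # alternate-allele reads seen in either parent


def passes_filters(c, min_depth=20, min_gq=60, ab_range=(0.3, 0.7), max_parent_alt=0):
    """Apply illustrative hard filters; all thresholds are assumptions."""
    return (
        c.child_depth >= min_depth
        and c.child_gq >= min_gq
        and ab_range[0] <= c.allele_balance <= ab_range[1]
        and c.parent_alt_reads <= max_parent_alt
    )


def evaluate(candidates, true_sites, **thresholds):
    """Compare filtered calls against the known (simulated) de novo sites."""
    kept = {c.site for c in candidates if passes_filters(c, **thresholds)}
    true_positives = len(kept & true_sites)
    false_positives = len(kept - true_sites)
    return {
        "kept": len(kept),
        "true_positives": true_positives,
        "false_positives": false_positives,
        # Fraction of retained calls that are not genuine de novo mutations.
        "false_discovery_rate": false_positives / len(kept) if kept else 0.0,
        # Fraction of genuine de novo mutations that survive filtering.
        "sensitivity": true_positives / len(true_sites) if true_sites else 0.0,
    }


# Toy example: two genuine sites, two artifacts.
candidates = [
    Candidate("chr1:100", 45, 99, 0.48, 0),  # genuine, well supported
    Candidate("chr1:200", 12, 40, 0.20, 0),  # artifact, poorly supported
    Candidate("chr2:300", 50, 99, 0.52, 0),  # artifact that looks convincing
    Candidate("chr2:400", 18, 80, 0.45, 0),  # genuine, but low depth
]
true_sites = {"chr1:100", "chr2:400"}

summary = evaluate(candidates, true_sites)
print(summary)
```

Rerunning `evaluate` with different thresholds (for example a lower `min_depth`) shows how relaxing a filter can recover genuine sites at the cost of admitting more artifacts, which is the kind of trade-off the study quantifies across coverage levels.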

Fig. 1: Computational filter criteria and thresholds that mitigated the largest number of false positives while retaining the majority of genuine de novo mutations at varying levels of sequencing coverage across replicate runs.
Fig. 2: Misclassification rates.

References

  • Acuna-Hidalgo R, Veltman JA, Hoischen A (2016) New insights into the generation and role of de novo mutations in health and disease. Genome Biol 17(1):241

  • Agrawal AF, Whitlock MC (2012) Mutation load: the fitness of individuals in populations where deleterious alleles are abundant. Annu Rev Ecol Evol Syst 43:115–135

  • Auton A, Fledel-Alon A, Pfeifer S, Venn O, Ségurel L, Street T et al. (2012) A fine-scale chimpanzee genetic map from population sequencing. Science 336(6078):193–198

  • Baer CF, Miyamoto MM, Denver DR (2007) Mutation rate variation in multicellular eukaryotes: causes and consequences. Nat Rev Genet 8(8):619–631

  • Bergeron LA, Besenbacher S, Bakker J, Zheng J, Li P, Pacheco G et al. (2021) The germline mutational process in rhesus macaque and its implications for phylogenetic dating. GigaScience 10(5):giab029

  • Bergeron LA, Besenbacher S, Turner T, Versoza CJ, Wang RJ, Price AL et al. (2022) The Mutationathon highlights the importance of reaching standardization in estimates of pedigree-based germline mutation rates. Elife 11:e73577

  • Bergeron LA, Besenbacher S, Zheng J, Li P, Bertelsen MF, Quintard B et al. (2023) Evolution of the germline mutation rate across vertebrates. Nature 615(7951):285–291

  • Besenbacher S, Hvilsom C, Marques-Bonet T, Mailund T, Schierup MH (2019) Direct estimation of mutations in great apes reconciles phylogenetic dating. Nat Ecol Evol 3(2):286–292

  • Besenbacher S, Liu S, Izarzugaza JM, Grove J, Belling K, Bork-Jensen J et al. (2015) Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios. Nat Commun 6:5969

  • Brand CM, White FJ, Rogers AR, Webster TH (2022) Estimating bonobo (Pan paniscus) and chimpanzee (Pan troglodytes) evolutionary history from nucleotide site patterns. Proc Natl Acad Sci USA 119(17):e2200858119

  • Browning SR, Browning BL (2007) Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 81(5):1084–1097

  • Campbell CD, Chong JX, Malig M, Ko A, Dumont BL, Han L et al. (2012) Estimating the human mutation rate using autozygosity in a founder population. Nat Genet 44(11):1277–1281

  • Campbell CR, Tiley GP, Poelstra JW, Hunnicutt KE, Larsen PA, Lee HJ et al. (2021) Pedigree-based and phylogenetic methods support surprising patterns of mutation rate and spectrum in the gray mouse lemur. Heredity 127(2):233–244

  • Conrad DF, Keebler JE, DePristo MA, Lindsay SJ, Zhang Y, Casals F et al. (2011) Variation in genome-wide mutation rates within and between human families. Nat Genet 43(7):712–714

  • Fan S, Hansen ME, Lo Y, Tishkoff SA (2016) Going global by adapting local: a review of recent human adaptation. Science 354(6308):54–59

  • Feng C, Pettersson M, Lamichhaney S, Rubin CJ, Rafati N, Casini M et al. (2017) Moderate nucleotide diversity in the Atlantic herring is associated with a low mutation rate. Elife 6:e23907

  • Francioli LC, Polak PP, Koren A, Menelaou A, Chun S, Renkens I et al. (2015) Genome-wide patterns and properties of de novo mutations in humans. Nat Genet 47(7):822–826

  • Garrison E, Kronenberg ZN, Dawson ET, Pedersen BS, Prins P (2022) A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar. PLoS Comput Biol 18(5):e1009123

  • Goldmann JM, Wong WS, Pinelli M, Farrah T, Bodian D, Stittrich AB et al. (2016) Parent-of-origin-specific signatures of de novo mutations. Nat Genet 48(8):935–939

  • Harris RB, Irwin K, Jones MR, Laurent S, Barrett RDH, Nachman MW et al. (2020) The population genetics of crypsis in vertebrates: recent insights from mice, hares, and lizards. Heredity 124(1):1–14

  • Holtgrewe M (2010) Mason: a read simulator for second generation sequencing data. Dissertation, Freie Universität Berlin.

  • Hwang DG, Green P (2004) Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc Natl Acad Sci USA 101(39):13994–14001

  • Jennewein DM, Lee J, Kurtz C, Dizon W, Shaeffer I, Chapman A et al. (2023) The Sol Supercomputer at Arizona State University. Practice and experience in advanced research computing, 296–301.

  • Jiang YH, Yuen RK, Jin X, Wang M, Chen N, Wu X et al. (2013) Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. Am J Hum Genet 93(2):249–263

  • Jónsson H, Sulem P, Kehr B, Kristmundsdottir S, Zink F, Hjartarson E et al. (2017) Parental influence on human germline de novo mutations in 1548 trios from Iceland. Nature 549(7673):519–522

  • Keightley PD, Ness RW, Halligan DL, Haddrill PR (2014) Estimation of the spontaneous mutation rate per nucleotide site in a Drosophila melanogaster full-sib family. Genetics 196(1):313–320

  • Keightley PD, Pinharanda A, Ness RW, Simpson F, Dasmahapatra KK, Mallet J et al. (2015) Estimation of the spontaneous mutation rate in Heliconius melpomene. Mol Biol Evol 32(1):239–243

  • Kessler MD, Loesch DP, Perry JA, Heard-Costa NL, Taliun D, Cade BE et al. (2020) De novo mutations across 1,465 diverse genomes reveal mutational insights and reductions in the Amish founder population. Proc Natl Acad Sci USA 117(5):2560–2569

  • Koch E, Schweizer RM, Schweizer TM, Stahler DR, Smith DW, Wayne RK et al. (2019) De novo mutation rate estimation in wolves of known pedigree. Mol Biol Evol 36(11):2536–2547

  • Kong A, Frigge ML, Masson G, Besenbacher S, Sulem P, Magnusson G et al. (2012) Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488(7412):471–475

  • Kronenberg ZN, Fiddes IT, Gordon D, Murali S, Cantsilieris S, Meyerson OS et al. (2018) High-resolution comparative analysis of great ape genomes. Science 360(6393):eaar6343

  • Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics 25(14):1754–1760

  • Lindsay SJ, Rahbari R, Kaplanis J, Keane T, Hurles ME (2019) Similarities and differences in patterns of germline mutation between mice and humans. Nat Commun 10(1):4053

  • Liu H, Jia Y, Sun X, Tian D, Hurst LD, Yang S (2017) Direct determination of the mutation rate in the bumblebee reveals evidence for weak recombination-associated mutation and an approximate rate constancy in insects. Mol Biol Evol 34(1):119–130

  • Maretty L, Jensen JM, Petersen B, Sibbesen JA, Liu S, Villesen P et al. (2017) Sequencing and de novo assembly of 150 genomes from Denmark as a population reference. Nature 548(7665):87–91

  • Martin HC, Batty EM, Hussin J, Westall P, Daish T, Kolomyjec S et al. (2018) Insights into platypus population structure and history from whole-genome sequencing. Mol Biol Evol 35(5):1238–1252

  • Michaelson JJ, Shi Y, Gujral M, Zheng H, Malhotra D, Jin X et al. (2012) Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 151(7):1431–1442

  • Milhaven M, Pfeifer SP (2023) Performance comparison of six popular short-read simulators. Heredity 130(2):55–63

  • Milholland B, Dong X, Zhang L, Hao X, Suh Y, Vijg J (2017) Differences between germline and somatic mutation rates in humans and mice. Nat Commun 8:15183

  • Pfeifer SP (2021) Studying mutation rate evolution in primates – the effects of computational pipeline and parameter choices. GigaScience 10(10):giab069

  • Pfeifer SP (2017a) Direct estimate of the spontaneous germ line mutation rate in African green monkeys. Evolution 71(12):2858–2870

  • Pfeifer SP (2017b) From next-generation resequencing reads to a high quality variant data set. Heredity 118(2):111–124

  • Pfeifer SP (2020) Spontaneous mutation rates. In Ho SYW (ed) The Molecular Evolutionary Clock. Theory and Practice. Springer Nature, pp. 35–44

  • Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B et al. (2013) Great ape genetic diversity and population history. Nature 499(7459):471–475

  • Rahbari R, Wuster A, Lindsay SJ, Hardwick RJ, Alexandrov LB, Turki SA et al. (2016) Timing, rates and spectra of human germline mutation. Nat Genet 48(2):126–133

  • Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, Shannon PT et al. (2010) Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328(5978):636–639

  • Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G et al. (2011) Integrative genomics viewer. Nat Biotechnol 29(1):24–26

  • Sasani TA, Pedersen BS, Gao Z, Baird L, Przeworski M, Jorde LB et al. (2019) Large, three-generation human families reveal post-zygotic mosaicism and variability in germline mutation accumulation. Elife 8:e46922

  • Smeds L, Qvarnström A, Ellegren H (2016) Direct estimate of the rate of germline mutation in a bird. Genome Res 26(9):1211–1218

  • Tatsumoto S, Go Y, Fukuta K, Noguchi H, Hayakawa T, Tomonaga M et al. (2017) Direct estimation of de novo mutation rates in a chimpanzee parent-offspring trio by ultra-deep whole genome sequencing. Sci Rep 7(1):13561

  • Thomas GWC, Wang RJ, Puri A, Harris RA, Raveendran M, Hughes DST et al. (2018) Reproductive longevity predicts mutation rates in primates. Curr Biol 28(19):3193–3197

  • Turner TN, Coe BP, Dickel DE, Hoekzema K, Nelson BJ, Zody MC et al. (2017) Genomic patterns of de novo mutation in simplex autism. Cell 171(3):710–722

  • van der Auwera GA, O’Connor BD (2020) Genomics in the cloud: using Docker, GATK, and WDL in Terra (1st Edition). O’Reilly Media.

  • Venn O, Turner I, Mathieson I, de Groot N, Bontrop R, McVean G (2014) Strong male bias drives germline mutation in chimpanzees. Science 344(6189):1272–1275

  • Versoza CJ, Ehmke E, Jensen JD, Pfeifer SP (2024) Characterizing the Rates and Patterns of De Novo Germline Mutations in the Aye-Aye (Daubentonia madagascariensis). Mol Biol Evol 42(3):msaf034. https://doi.org/10.1093/molbev/msaf034

  • Wang RJ, Thomas GWC, Raveendran M, Harris RA, Doddapaneni H, Muzny DM et al. (2020) Paternal age in rhesus macaques is positively associated with germline mutation accumulation but not with measures of offspring sociability. Genome Res 30(6):826–834

  • Wang RJ, Peña-Garcia Y, Bibby MG, Raveendran M, Harris RA, Jansen HT et al. (2022a) Examining the effects of hibernation on germline mutation rates in grizzly bears. Genome Biol Evol 14(10):evac148

  • Wang RJ, Raveendran M, Harris RA, Murphy WJ, Lyons LA, Rogers J et al. (2022b) De novo mutations in domestic cat are consistent with an effect of reproductive longevity on both the rate and spectrum of mutations. Mol Biol Evol 39(7):msac127

  • Wong WS, Solomon BD, Bodian DL, Kothiyal P, Eley G, Huddleston KC et al. (2016) New observations on maternal age effect on germline de novo mutations. Nat Commun 7:10486

  • Wu FL, Strand AI, Cox LA, Ober C, Wall JD, Moorjani P et al. (2020) A comparison of humans and baboons suggests germline mutation rates do not track cell divisions. PLoS Biol 18(8):e3000838

  • Yang C, Zhou Y, Marcus S, Formenti G, Bergeron LA, Song Z et al. (2021) Evolutionary and biomedical insights from a marmoset diploid genome assembly. Nature 594(7862):227–233

  • Yuen RK, Thiruvahindrapuram B, Merico D, Walker S, Tammimies K, Hoang N et al. (2015) Whole-genome sequencing of quartet families with autism spectrum disorder. Nat Med 21(2):185–191

Acknowledgements

The authors would like to thank Colin Brand for sharing their previously generated variant catalogue of Western chimpanzees and members of the Pfeifer Lab for their help with the visual curation of IGV screenshots. Computations were performed on the Sol Supercomputer at Arizona State University (Jennewein et al. 2023).

Funding

This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM151008 to SPP. MM, AG, and CJV were supported by the National Science Foundation CAREER Award DEB-2045343 to SPP. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the National Science Foundation.

Author information

Contributions

SPP conceived and designed the study. MM, AG, and CJV conducted read simulations and analyzed the data. MM and SPP wrote the manuscript with input from all authors. SPP obtained research funding.

Corresponding author

Correspondence to Susanne P. Pfeifer.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Associate editor: Louise Johnson.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Milhaven, M., Garg, A., Versoza, C.J. et al. Quantifying the effects of computational filter criteria on the accurate identification of de novo mutations at varying levels of sequencing coverage. Heredity (2025). https://doi.org/10.1038/s41437-025-00754-0

  • DOI: https://doi.org/10.1038/s41437-025-00754-0
