Abstract
The rate of spontaneous (de novo) germline mutation is a key parameter in evolutionary biology, impacting genetic diversity and contributing to the evolution of populations and species. Mutation rates themselves evolve over time but the mechanisms underlying the mutation rate variation observed across the Tree of Life remain largely to be elucidated. In recent years, whole genome sequencing has enabled the estimation of mutation rates for several organisms. However, due to a lack of community standards, many previous studies differ both empirically – most notably, in the depth of sequencing used to reliably identify de novo mutations – and computationally – utilizing different computational pipelines to detect germline mutations as well as different analysis strategies to mitigate technical artifacts – rendering comparisons between studies challenging. Using a pedigree of Western chimpanzees as an illustrative example, we here quantify the effects of commonly utilized quality metrics to reliably identify de novo mutations at different levels of sequencing coverage. We demonstrate that datasets with a mean depth of ≤ 30X are ill-suited for the detection of de novo mutations due to high false positive rates that can only be partially mitigated by computational filter criteria. In contrast, higher coverage datasets enable a comprehensive identification of de novo mutations at low false positive rates, with minimal benefits beyond a sequencing coverage of 60X, suggesting that future work should favor breadth (by sequencing additional individuals) over depth. Importantly, the simulation and analysis framework described here provides conceptual guidelines that will allow researchers to take study design and species-specific resources into account when determining computational filtering strategies for their organism of interest.
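To make the notion of "computational filter criteria" concrete, the following is a minimal sketch of the kind of trio-based quality filters often applied to candidate de novo mutations: a depth window in every individual, no alternate-allele reads in either parent, and an allele balance near 0.5 in the offspring. The `Call` record, the function name, and all threshold values are hypothetical and chosen purely for illustration; they are not the filters or cut-offs used in this study.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Per-individual read counts at one site (hypothetical record type)."""
    depth: int      # total reads covering the site
    alt_reads: int  # reads supporting the non-reference allele


def is_candidate_dnm(child, mother, father,
                     min_depth=15, max_depth=120,
                     min_ab=0.3, max_ab=0.7):
    """Return True if a site passes illustrative de novo mutation filters.

    All thresholds are assumptions for demonstration only.
    """
    # Every member of the trio needs adequate but not excessive coverage.
    for call in (child, mother, father):
        if not (min_depth <= call.depth <= max_depth):
            return False
    # Neither parent may carry any read supporting the alternate allele.
    if mother.alt_reads > 0 or father.alt_reads > 0:
        return False
    # A true heterozygous site in the child should show an allele
    # balance (fraction of alternate reads) close to 0.5.
    allele_balance = child.alt_reads / child.depth
    return min_ab <= allele_balance <= max_ab
```

With these assumed defaults, a site with 19 of 40 alternate reads in the child and none in either well-covered parent is retained, whereas a single alternate read in one parent, or a child depth below the minimum, removes the site from the candidate set.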
References
Acuna-Hidalgo R, Veltman JA, Hoischen A (2016) New insights into the generation and role of de novo mutations in health and disease. Genome Biol 17(1):241
Agrawal AF, Whitlock MC (2012) Mutation load: the fitness of individuals in populations where deleterious alleles are abundant. Annu Rev Ecol Evol Syst 43:115–135
Auton A, Fledel-Alon A, Pfeifer S, Venn O, Ségurel L, Street T et al. (2012) A fine-scale chimpanzee genetic map from population sequencing. Science 336(6078):193–198
Baer CF, Miyamoto MM, Denver DR (2007) Mutation rate variation in multicellular eukaryotes: causes and consequences. Nat Rev Genet 8(8):619–631
Bergeron LA, Besenbacher S, Bakker J, Zheng J, Li P, Pacheco G et al. (2021) The germline mutational process in rhesus macaque and its implications for phylogenetic dating. GigaScience 10(5):giab029
Bergeron LA, Besenbacher S, Turner T, Versoza CJ, Wang RJ, Price AL et al. (2022) The Mutationathon highlights the importance of reaching standardization in estimates of pedigree-based germline mutation rates. Elife 11:e73577
Bergeron LA, Besenbacher S, Zheng J, Li P, Bertelsen MF, Quintard B et al. (2023) Evolution of the germline mutation rate across vertebrates. Nature 615(7951):285–291
Besenbacher S, Hvilsom C, Marques-Bonet T, Mailund T, Schierup MH (2019) Direct estimation of mutations in great apes reconciles phylogenetic dating. Nat Ecol Evol 3(2):286–292
Besenbacher S, Liu S, Izarzugaza JM, Grove J, Belling K, Bork-Jensen J et al. (2015) Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios. Nat Commun 6:5969
Brand CM, White FJ, Rogers AR, Webster TH (2022) Estimating bonobo (Pan paniscus) and chimpanzee (Pan troglodytes) evolutionary history from nucleotide site patterns. Proc Natl Acad Sci USA 119(17):e2200858119
Browning SR, Browning BL (2007) Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 81(5):1084–1097
Campbell CD, Chong JX, Malig M, Ko A, Dumont BL, Han L et al. (2012) Estimating the human mutation rate using autozygosity in a founder population. Nat Genet 44(11):1277–1281
Campbell CR, Tiley GP, Poelstra JW, Hunnicutt KE, Larsen PA, Lee HJ et al. (2021) Pedigree-based and phylogenetic methods support surprising patterns of mutation rate and spectrum in the gray mouse lemur. Heredity 127(2):233–244
Conrad DF, Keebler JE, DePristo MA, Lindsay SJ, Zhang Y, Casals F et al. (2011) Variation in genome-wide mutation rates within and between human families. Nat Genet 43(7):712–714
Fan S, Hansen ME, Lo Y, Tishkoff SA (2016) Going global by adapting local: a review of recent human adaptation. Science 354(6308):54–59
Feng C, Pettersson M, Lamichhaney S, Rubin CJ, Rafati N, Casini M et al. (2017) Moderate nucleotide diversity in the Atlantic herring is associated with a low mutation rate. Elife 6:e23907
Francioli LC, Polak PP, Koren A, Menelaou A, Chun S, Renkens I et al. (2015) Genome-wide patterns and properties of de novo mutations in humans. Nat Genet 47(7):822–826
Garrison E, Kronenberg ZN, Dawson ET, Pedersen BS, Prins P (2022) A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar. PLoS Comput Biol 18(5):e1009123
Goldmann JM, Wong WS, Pinelli M, Farrah T, Bodian D, Stittrich AB et al. (2016) Parent-of-origin-specific signatures of de novo mutations. Nat Genet 48(8):935–939
Harris RB, Irwin K, Jones MR, Laurent S, Barrett RDH, Nachman MW et al. (2020) The population genetics of crypsis in vertebrates: recent insights from mice, hares, and lizards. Heredity 124(1):1–14
Holtgrewe M (2010) Mason: a read simulator for second generation sequencing data. Dissertation, Freie Universität Berlin.
Hwang DG, Green P (2004) Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc Natl Acad Sci USA 101(39):13994–14001
Jennewein DM, Lee J, Kurtz C, Dizon W, Shaeffer I, Chapman A et al. (2023) The Sol Supercomputer at Arizona State University. Practice and experience in advanced research computing, 296–301.
Jiang YH, Yuen RK, Jin X, Wang M, Chen N, Wu X et al. (2013) Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. Am J Hum Genet 93(2):249–263
Jónsson H, Sulem P, Kehr B, Kristmundsdottir S, Zink F, Hjartarson E et al. (2017) Parental influence on human germline de novo mutations in 1548 trios from Iceland. Nature 549(7673):519–522
Keightley PD, Ness RW, Halligan DL, Haddrill PR (2014) Estimation of the spontaneous mutation rate per nucleotide site in a Drosophila melanogaster full-sib family. Genetics 196(1):313–320
Keightley PD, Pinharanda A, Ness RW, Simpson F, Dasmahapatra KK, Mallet J et al. (2015) Estimation of the spontaneous mutation rate in Heliconius melpomene. Mol Biol Evol 32(1):239–243
Kessler MD, Loesch DP, Perry JA, Heard-Costa NL, Taliun D, Cade BE et al. (2020) De novo mutations across 1,465 diverse genomes reveal mutational insights and reductions in the Amish founder population. Proc Natl Acad Sci USA 117(5):2560–2569
Koch E, Schweizer RM, Schweizer TM, Stahler DR, Smith DW, Wayne RK et al. (2019) De novo mutation rate estimation in wolves of known pedigree. Mol Biol Evol 36(11):2536–2547
Kong A, Frigge ML, Masson G, Besenbacher S, Sulem P, Magnusson G et al. (2012) Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488(7412):471–475
Kronenberg ZN, Fiddes IT, Gordon D, Murali S, Cantsilieris S, Meyerson OS et al. (2018) High-resolution comparative analysis of great ape genomes. Science 360(6393):eaar6343
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics 25(14):1754–1760
Lindsay SJ, Rahbari R, Kaplanis J, Keane T, Hurles ME (2019) Similarities and differences in patterns of germline mutation between mice and humans. Nat Commun 10(1):4053
Liu H, Jia Y, Sun X, Tian D, Hurst LD, Yang S (2017) Direct determination of the mutation rate in the bumblebee reveals evidence for weak recombination-associated mutation and an approximate rate constancy in insects. Mol Biol Evol 34(1):119–130
Maretty L, Jensen JM, Petersen B, Sibbesen JA, Liu S, Villesen P et al. (2017) Sequencing and de novo assembly of 150 genomes from Denmark as a population reference. Nature 548(7665):87–91
Martin HC, Batty EM, Hussin J, Westall P, Daish T, Kolomyjec S et al. (2018) Insights into platypus population structure and history from whole-genome sequencing. Mol Biol Evol 35(5):1238–1252
Michaelson JJ, Shi Y, Gujral M, Zheng H, Malhotra D, Jin X et al. (2012) Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 151(7):1431–1442
Milhaven M, Pfeifer SP (2023) Performance comparison of six popular short-read simulators. Heredity 130(2):55–63
Milholland B, Dong X, Zhang L, Hao X, Suh Y, Vijg J (2017) Differences between germline and somatic mutation rates in humans and mice. Nat Commun 8:15183
Pfeifer SP (2017a) Direct estimate of the spontaneous germ line mutation rate in African green monkeys. Evolution 71(12):2858–2870
Pfeifer SP (2017b) From next-generation resequencing reads to a high quality variant data set. Heredity 118(2):111–124
Pfeifer SP (2020) Spontaneous mutation rates. In Ho SYW (ed) The Molecular Evolutionary Clock. Theory and Practice. Springer Nature, pp. 35–44
Pfeifer SP (2021) Studying mutation rate evolution in primates – the effects of computational pipeline and parameter choices. GigaScience 10(10):giab069
Prado-Martinez J, Sudmant PH, Kidd JM, Li H, Kelley JL, Lorente-Galdos B et al. (2013) Great ape genetic diversity and population history. Nature 499(7459):471–475
Rahbari R, Wuster A, Lindsay SJ, Hardwick RJ, Alexandrov LB, Turki SA et al. (2016) Timing, rates and spectra of human germline mutation. Nat Genet 48(2):126–133
Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, Shannon PT et al. (2010) Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328(5978):636–639
Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G et al. (2011) Integrative genomics viewer. Nat Biotechnol 29(1):24–26
Sasani TA, Pedersen BS, Gao Z, Baird L, Przeworski M, Jorde LB et al. (2019) Large, three-generation human families reveal post-zygotic mosaicism and variability in germline mutation accumulation. Elife 8:e46922
Smeds L, Qvarnström A, Ellegren H (2016) Direct estimate of the rate of germline mutation in a bird. Genome Res 26(9):1211–1218
Tatsumoto S, Go Y, Fukuta K, Noguchi H, Hayakawa T, Tomonaga M et al. (2017) Direct estimation of de novo mutation rates in a chimpanzee parent-offspring trio by ultra-deep whole genome sequencing. Sci Rep 7(1):13561
Thomas GWC, Wang RJ, Puri A, Harris RA, Raveendran M, Hughes DST et al. (2018) Reproductive longevity predicts mutation rates in primates. Curr Biol 28(19):3193–3197
Turner TN, Coe BP, Dickel DE, Hoekzema K, Nelson BJ, Zody MC et al. (2017) Genomic patterns of de novo mutation in simplex autism. Cell 171(3):710–722
van der Auwera GA, O’Connor BD (2020) Genomics in the cloud: using Docker, GATK, and WDL in Terra (1st Edition). O’Reilly Media.
Venn O, Turner I, Mathieson I, de Groot N, Bontrop R, McVean G (2014) Strong male bias drives germline mutation in chimpanzees. Science 344(6189):1272–1275
Versoza CJ, Ehmke E, Jensen JD, Pfeifer SP (2024) Characterizing the Rates and Patterns of De Novo Germline Mutations in the Aye-Aye (Daubentonia madagascariensis). Mol Biol Evol 42(3):msaf034. https://doi.org/10.1093/molbev/msaf034
Wang RJ, Thomas GWC, Raveendran M, Harris RA, Doddapaneni H, Muzny DM et al. (2020) Paternal age in rhesus macaques is positively associated with germline mutation accumulation but not with measures of offspring sociability. Genome Res 30(6):826–834
Wang RJ, Peña-Garcia Y, Bibby MG, Raveendran M, Harris RA, Jansen HT et al. (2022a) Examining the effects of hibernation on germline mutation rates in grizzly bears. Genome Biol Evol 14(10):evac148
Wang RJ, Raveendran M, Harris RA, Murphy WJ, Lyons LA, Rogers J et al. (2022b) De novo mutations in domestic cat are consistent with an effect of reproductive longevity on both the rate and spectrum of mutations. Mol Biol Evol 39(7):msac127
Wong WS, Solomon BD, Bodian DL, Kothiyal P, Eley G, Huddleston KC et al. (2016) New observations on maternal age effect on germline de novo mutations. Nat Commun 7:10486
Wu FL, Strand AI, Cox LA, Ober C, Wall JD, Moorjani P et al. (2020) A comparison of humans and baboons suggests germline mutation rates do not track cell divisions. PLoS Biol 18(8):e3000838
Yang C, Zhou Y, Marcus S, Formenti G, Bergeron LA, Song Z et al. (2021) Evolutionary and biomedical insights from a marmoset diploid genome assembly. Nature 594(7862):227–233
Yuen RK, Thiruvahindrapuram B, Merico D, Walker S, Tammimies K, Hoang N et al. (2015) Whole-genome sequencing of quartet families with autism spectrum disorder. Nat Med 21(2):185–191
Acknowledgements
The authors would like to thank Colin Brand for sharing their previously generated variant catalogue of Western chimpanzees and members of the Pfeifer Lab for their help with the visual curation of IGV screenshots. Computations were performed on the Sol Supercomputer at Arizona State University (Jennewein et al. 2023).
Funding
This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM151008 to SPP. MM, AG, and CJV were supported by the National Science Foundation CAREER Award DEB-2045343 to SPP. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the National Science Foundation.
Author information
Contributions
SPP conceived and designed the study. MM, AG, and CJV conducted read simulations and analyzed the data. MM and SPP wrote the manuscript with input from all authors. SPP obtained research funding.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Associate editor: Louise Johnson.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Milhaven, M., Garg, A., Versoza, C.J. et al. Quantifying the effects of computational filter criteria on the accurate identification of de novo mutations at varying levels of sequencing coverage. Heredity (2025). https://doi.org/10.1038/s41437-025-00754-0
DOI: https://doi.org/10.1038/s41437-025-00754-0