Benchmarking the performance of Pool-seq SNP callers using simulated and real sequencing data

Guirao-Rico, Sara; González Pérez, Josefa

Please use this identifier to cite or link to this item: http://hdl.handle.net/10261/251033

Title:	Benchmarking the performance of Pool-seq SNP callers using simulated and real sequencing data
Authors:	Guirao-Rico, Sara; González Pérez, Josefa
Keywords:	Bayesian; High-throughput sequencing; Low-frequency variants; Maximum likelihood; Population genomics; Site frequency spectrum
Publication date:	May 2021
Publisher:	John Wiley & Sons
Citation:	Molecular Ecology Resources 21(4): 1216-1229 (2021)
Abstract:	Population genomics is a fast-developing discipline with promising applications in a growing number of life sciences fields. Advances in sequencing technologies and bioinformatics tools allow population genomics to exploit genome-wide information to identify the molecular variants underlying traits of interest and the evolutionary forces that modulate these variants through space and time. However, the cost of genomic analyses of multiple populations is still too high to address them through individual genome sequencing. Pooling individuals for sequencing can be a more effective strategy in Single Nucleotide Polymorphism (SNP) detection and allele frequency estimation because of a higher total coverage. However, compared to individual sequencing, SNP calling from pools has the additional difficulty of distinguishing rare variants from sequencing errors, which is often avoided by establishing a minimum threshold allele frequency for the analysis. Finding an optimal balance between minimizing information loss and reducing sequencing costs is essential to ensure the success of population genomics studies. Here, we have benchmarked the performance of SNP callers for Pool-seq data, based on different approaches, under different conditions, and using computer simulations and real data. We found that SNP callers' performance varied for allele frequencies up to 0.35. We also found that SNP callers based on Bayesian (SNAPE-pooled) or maximum likelihood (MAPGD) approaches outperform the two heuristic callers tested (VarScan and PoolSNP), in terms of the balance between sensitivity and false discovery rate (FDR) both in simulated and sequencing data. Our results will help inform the selection of the most appropriate SNP caller not only for large-scale population studies but also in cases where the Pool-seq strategy is the only option, such as in metagenomic or polyploid studies.
Publisher's version:	https://doi.org/10.1111/1755-0998.13343
URI:	http://hdl.handle.net/10261/251033
DOI:	10.1111/1755-0998.13343
ISSN:	1755-098X
E-ISSN:	1755-0998
Appears in collections:	(IBE) Artículos

Files in this item:

File	Description	Size	Format
Pool-seq_SNP.pdf		663.4 kB	Adobe PDF

PubMed Central citations:	8 (checked on 18 Mar 2024)
Scopus citations:	11 (checked on 13 Apr 2024)
Web of Science citations:	12 (checked on 27 Feb 2024)
Page views:	90 (checked on 18 Apr 2024)
Downloads:	231 (checked on 18 Apr 2024)