Please use this identifier to cite or link to this item: http://hdl.handle.net/10261/162254
Title: Unmasking new intra-species diversity through K-mer count analysis

Authors: Pérez Cantalapiedra, Carlos ; Contreras-Moreira, Bruno ; Casas Cendoya, Ana María ; Igartua Arregui, Ernesto
Keywords: Copy Number Variation
Gene Families
Genotyping
Barley
Presence-Absence Variation
Sequencing Plant Genomics
NBS-LRR
K-mer Analysis
Pentatricopeptide
Pangenomics
Exome Capture
Publication date: Mar-2018
Citation: EUCARPIA Cereal Section / IWW2 Meetings (Polydome, Clermont-Ferrand, France, 19-22 March 2018)
Abstract: High-throughput sequencing is often used to examine intra-species diversity. Most studies focus on calling and genotyping SNPs. Other kinds of genomic variation, such as copy-number variation (CNV), are exploited more rarely, despite literature reports linking them to phenotypic differences. For some loci, it is difficult to identify reliable SNPs. For instance, reads from closely related sequences (e.g. paralogous genes) will often pile up at the same location if some of those loci are absent from the reference sequence. Such piled-up mappings produce abundant spurious heterozygous SNPs, and have therefore been called apparent heterozygous mappings (AHMs). To avoid wrong conclusions from false positive calls, SNPs from AHMs are often discarded, either early (e.g. in samples expected to be homozygous) or in downstream steps of the analysis (e.g. when incoherent haplotype blocks are identified). This leads to information loss at certain loci. AHMs can be seen as a kind of CNV that is specific to non-identical copies. Unmasking such variation could help to i) assess the completeness of a genome or pan-genome reference, ii) confirm results from other CNV genotyping methods, when the copies originate in non-identical loci, iii) provide hints about the history and behavior of duplicating DNA loci, and iv) reveal novel intra-species genetic diversity. Here we present a software pipeline, kmeleon, available at https://github.com/eead-csic-compbio/kmeleon, designed to identify regions harboring AHMs. kmeleon is based on mappings, and thus it can be used for both homozygous and heterozygous samples. First, the different k-mers (sequences of length k) mapping to a single locus are identified and counted. Then, loci are classified based on the presence or absence of AHMs. From the resulting intervals, it is straightforward to compare genotypes, or to transfer existing annotation to the regions with AHMs.
We used exome capture data to detect AHMs in a set of barley accessions. We included the cultivar Morex, the genotype of the genome reference, as a control sample. As expected, it had the lowest number of AHMs, although some were still detectable. For all accessions, AHMs were found in both intergenic and intragenic loci. Enrichment analysis showed that NBS-LRR proteins were overrepresented at AHMs, whereas PPR proteins were depleted. We will also show that AHMs can be used to infer phylogenetic trees that are congruent with those produced by SNP-based approaches, supporting the value of this hidden variability for describing genetic relationships.
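The two steps named in the abstract (counting the distinct k-mers that map to each locus, then classifying loci by the presence or absence of AHMs) can be illustrated with a minimal Python sketch. This is not kmeleon's code and does not reproduce its interface. The function names, the input format (gapless `(ref_start, read_sequence)` pairs) and the thresholds `max_expected` and `min_depth` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def kmer_counts_per_position(alignments, k):
    """Count the k-mers that mapped reads place at each reference position.

    `alignments` is an iterable of (ref_start, read_sequence) pairs, assumed
    here to be 0-based gapless alignments (no indels, no clipping).
    """
    counts = defaultdict(Counter)
    for ref_start, seq in alignments:
        for offset in range(len(seq) - k + 1):
            counts[ref_start + offset][seq[offset:offset + k]] += 1
    return counts


def classify_positions(counts, max_expected=1, min_depth=2):
    """Flag positions carrying more well-supported k-mers than expected.

    Hypothetical thresholds: a k-mer counts as supported when seen at least
    `min_depth` times; `max_expected` would be 1 for a homozygous sample
    and 2 for a heterozygous diploid.
    """
    flagged = {}
    for pos, kmers in counts.items():
        supported = [kmer for kmer, n in kmers.items() if n >= min_depth]
        flagged[pos] = len(supported) > max_expected
    return flagged


def merge_intervals(flagged):
    """Merge consecutive flagged positions into half-open (start, end) intervals."""
    intervals = []
    for pos in sorted(p for p, is_ahm in flagged.items() if is_ahm):
        if intervals and intervals[-1][1] == pos:
            intervals[-1][1] = pos + 1
        else:
            intervals.append([pos, pos + 1])
    return [tuple(iv) for iv in intervals]


# Toy example: two read variants stacked on the same locus of a sample
# that is expected to be homozygous.
alignments = [(0, "ACGTAC"), (0, "ACGTAC"), (0, "ACGAAC"), (0, "ACGAAC")]
counts = kmer_counts_per_position(alignments, k=3)
intervals = merge_intervals(classify_positions(counts))
print(intervals)  # [(1, 4)]
```

In the toy example, position 0 carries a single k-mer (ACG), whereas positions 1 to 3 each carry two supported k-mers, so they merge into one candidate AHM interval. Intervals of this kind could then be compared between genotypes or intersected with an existing annotation.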
Description: 1 .pdf copy (3 Figs.) of the authors' original poster. Creative Commons License Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
URI: http://hdl.handle.net/10261/162254
Appears in collections: (EEAD) Comunicaciones congresos
Files in this item:
File | Size | Format
CantalapiedraCP_EUCARPIA-Post_2018.pdf | 1.5 MB | Adobe PDF
 


NOTE: Items in Digital.CSIC are protected by copyright, with all rights reserved, unless otherwise indicated.