Consistency of metagenomic assignment programs in simulated and real data

García-Etxebarria, Koldo; Garcia-Garcerà, Marc; Calafell, Francesc

Please use this identifier to cite or link to this item: http://hdl.handle.net/10261/95685

Title:	Consistency of metagenomic assignment programs in simulated and real data
Authors:	García-Etxebarria, Koldo; Garcia-Garcerà, Marc; Calafell, Francesc
Publication date:	28-Mar-2014
Publisher:	BioMed Central
Citation:	BMC Bioinformatics 15(1): 90 (2014)
Abstract:	[Background] Metagenomics is the genomic study of uncultured environmental samples, which has been greatly facilitated by the advent of shotgun-sequencing technologies. One of the main focuses of metagenomics is the discovery of previously uncultured microorganisms, which makes the assignment of sequences to a particular taxon a challenge and a crucial step. Recently, several methods have been developed to perform this task, based on different methodologies such as sequence composition or sequence similarity. The sequence composition methods have the ability to completely assign the whole dataset. However, their use in metagenomics and the study of their performance with real data is limited. In this work, we assess the consistency of three different methods (BLAST + Lowest Common Ancestor, Phymm, and Naïve Bayesian Classifier) in assigning real and simulated sequence reads. [Results] Both in real and in simulated data, BLAST + Lowest Common Ancestor (BLAST + LCA), Phymm, and Naïve Bayesian Classifier consistently assign a larger number of reads in higher taxonomic levels than in lower levels. However, discrepancies increase at lower taxonomic levels. In simulated data, consistent assignments between all three methods showed greater precision than assignments based on Phymm or Bayesian Classifier alone, since the BLAST + LCA algorithm performed best. In addition, assignment consistency in real data increased with sequence read length, in agreement with previously published simulation results. [Conclusions] The use and combination of different approaches is advisable to assign metagenomic reads. Although the sensitivity could be reduced, the reliability can be increased by using the reads consistently assigned to the same taxa by, at least, two methods, and by training the programs using all available information.
Publisher's version:	http://dx.doi.org/10.1186/1471-2105-15-90
URI:	http://hdl.handle.net/10261/95685
DOI:	10.1186/1471-2105-15-90
E-ISSN:	1471-2105
Appears in collections:	(IBE) Artículos
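
The abstract's conclusion, keeping only reads that at least two methods assign to the same taxon, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `consensus_assignments`, the input layout (one dictionary of read ID to taxon per method), the method keys, and the example taxa are all assumptions made for illustration, and real BLAST + LCA, Phymm, or NBC output would first need to be parsed into this form.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical input layout: method name -> {read ID -> taxon, or None if unassigned}.
Calls = Dict[str, Dict[str, Optional[str]]]


def consensus_assignments(calls_by_method: Calls, min_agree: int = 2) -> Dict[str, str]:
    """Keep reads that at least `min_agree` methods assign to the same taxon.

    All calls are assumed to be at the same taxonomic level (e.g. genus).
    With three methods and min_agree=2, at most one taxon can qualify per read.
    """
    read_ids = set().union(*(calls.keys() for calls in calls_by_method.values()))
    consensus: Dict[str, str] = {}
    for read_id in sorted(read_ids):
        # Count how many methods voted for each taxon, ignoring missing calls.
        votes = Counter(
            calls[read_id]
            for calls in calls_by_method.values()
            if calls.get(read_id) is not None
        )
        if not votes:
            continue
        taxon, count = votes.most_common(1)[0]
        if count >= min_agree:
            consensus[read_id] = taxon
    return consensus


# Illustrative calls only; these are not data from the paper.
example_calls: Calls = {
    "blast_lca": {"r1": "Escherichia", "r2": "Bacillus", "r3": None},
    "phymm": {"r1": "Escherichia", "r2": "Clostridium", "r3": "Vibrio"},
    "nbc": {"r1": "Salmonella", "r2": "Listeria", "r3": "Vibrio"},
}
agreed = consensus_assignments(example_calls)
```

In this toy input, read `r2` is dropped because the three methods disagree. That is the trade-off the abstract describes: fewer reads are assigned (lower sensitivity) in exchange for more reliable assignments.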

Files in this item:

File	Size	Format
1471-2105-15-90.xml	99.9 kB	XML
1471-2105-15-90-S4.ZIP	2.23 MB	Unknown
1471-2105-15-90.pdf	442.13 kB	Adobe PDF
1471-2105-15-90-S5.ZIP	10.59 MB	Unknown
1471-2105-15-90-S3.PDF	669.28 kB	Adobe PDF
1471-2105-15-90-S2.PDF	16.11 kB	Adobe PDF
1471-2105-15-90-S1.DOCX	234.26 kB	Microsoft Word XML

PubMed Central citations:	12 (checked on 21-Apr-2024)
Scopus citations:	15 (checked on 15-Apr-2024)
Web of Science citations:	14 (checked on 26-Feb-2024)
Page views:	340 (checked on 21-Apr-2024)
Downloads:	666 (checked on 21-Apr-2024)
