Please use this identifier to cite or link to this item: http://hdl.handle.net/10261/231577

Title: Fasta sequences for the Drosophila melanogaster Manually Curated Transposable Elements (MCTE) library

Authors: Rech, Gabriel E.
Keywords: Drosophila melanogaster; Transposable elements; Library; Consensuses; Transposable Elements Drosophila Melanogaster Rech 2021
Issue date: 2-Mar-2021
Publisher: DIGITAL.CSIC
Citation: Rech, Gabriel E.; 2021; Fasta sequences for the Drosophila melanogaster Manually Curated Transposable Elements (MCTE) library [Dataset]; DIGITAL.CSIC; http://dx.doi.org/10.20350/digitalCSIC/13765
Abstract: Manually Curated Transposable Elements (MCTE) library for Drosophila melanogaster. De novo TE reconstruction was performed with the TEdenovo pipeline from the REPET package (v.2.5) (Flutre et al. 2011; Hoede et al. 2014; Quesneville et al. 2005). The de novo genome assemblies were built from long-read sequencing data (Oxford Nanopore Technologies or Pacific Biosciences).
Description: The compressed file contains two files: MCTE.fasta, with the consensus TE sequences in FASTA format, and MCTE_info.xlsx, an Excel file with information on every sequence in MCTE.fasta.

We used the REPET package (v.2.5) (Flutre et al. 2011; Hoede et al. 2014; Quesneville et al. 2005) to annotate TEs with a manually curated TE (MCTE) library of consensus sequences. REPET is composed of two main pipelines: TEdenovo, dedicated to the de novo detection of TE families, and TEannot, for the annotation and analysis of TEs in genomic sequences. To create the MCTE library, we first ran the REPET TEdenovo pipeline with default parameters on the genomes of 13 natural Drosophila melanogaster strains.

Manual curation of the identified consensuses involved three main procedures: removal of redundant sequences, manual identification of potentially artifactual sequences, and classification of consensuses into families. Redundant sequences (consensus sequences present in more than one genome) were identified by first running the PASTEClassifier module from PASTEC with default options (Hoede et al. 2014). We also performed similarity clustering, built multiple sequence alignments (MSA) of the clusters, and generated a consensus sequence for each MSA in order to obtain one consensus representative of all the genomes. We manually explored the consensus sequences and their copies using the plotCoverage tool from REPET, and discarded consensuses that mainly showed a high number of small copies. Consensus sequences were assigned to families using BLAT (Kent 2002) against the curated canonical sequences of Drosophila TEs from the Berkeley Drosophila Genome Project (BDGP) v.9.4.1 (https://fruitfly.org/p_disrupt/TE.html). When no matches were found, we used RepeatMasker (v.4) (Smit 2015) with the RepBase (Bao et al. 2015) release RepBaseRepeatMaskerEdition-20181026.
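The family-assignment step can be illustrated with a short Python sketch that reads BLAT output in its default PSL format and keeps, for each consensus, the best-scoring hit against the canonical sequences; consensuses left without an accepted hit are the ones that would then go to RepeatMasker. This is not the author's code: the score (matches plus repMatches, minus mismatches and gap openings), the 80% query-coverage cutoff, and the function names `best_family_hits` and `needs_repeatmasker` are hypothetical choices made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def best_family_hits(psl_lines: Iterable[str], min_query_cov: float = 0.8) -> Dict[str, str]:
    """Map each consensus (PSL qName) to its best-scoring target (PSL tName).

    Hypothetical choices, not taken from the dataset: the score below and the
    minimum fraction of the consensus that a hit must cover.
    """
    best: Dict[str, Tuple[int, str]] = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the psLayout header, the dashed separator and blank lines.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches, mismatches, rep_matches = int(fields[0]), int(fields[1]), int(fields[2])
        q_gap_count, t_gap_count = int(fields[4]), int(fields[6])
        q_name, q_size = fields[9], int(fields[10])
        q_start, q_end = int(fields[11]), int(fields[12])
        t_name = fields[13]
        # Ignore hits that align only a small part of the consensus.
        if q_size == 0 or (q_end - q_start) / q_size < min_query_cov:
            continue
        # Simple score: aligned bases minus mismatches and gap openings.
        score = matches + rep_matches - mismatches - q_gap_count - t_gap_count
        if q_name not in best or score > best[q_name][0]:
            best[q_name] = (score, t_name)
    return {q_name: t_name for q_name, (_, t_name) in best.items()}


def needs_repeatmasker(consensus_names: Iterable[str], hits: Dict[str, str]) -> List[str]:
    """Consensuses without an accepted BLAT hit (candidates for RepeatMasker/RepBase)."""
    return [name for name in consensus_names if name not in hits]
```

Because an open file is an iterable of lines, a PSL file produced by BLAT can be passed directly, e.g. `best_family_hits(open("mcte_vs_bdgp.psl"))`, where the file name is hypothetical.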
URI: http://hdl.handle.net/10261/231577
DOI: http://dx.doi.org/10.20350/digitalCSIC/13765
Appears in collections: (IBE) Datasets
Files in this item:
File | Size | Format
MCTE.rar | 239.26 kB | Unknown
readme.txt | 2.87 kB | Text
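MCTE.fasta is plain FASTA, so once MCTE.rar has been extracted with an external RAR tool (the Python standard library cannot read RAR archives), it can be parsed without extra dependencies. A minimal sketch follows; the helper name `read_fasta` is hypothetical, and taking the first whitespace-delimited token of each header as the sequence identifier is an assumption about how the headers are written.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the identifier is the first whitespace-delimited token.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            chunks = []
        elif name is not None:
            # Sequence lines are concatenated unchanged (case is preserved);
            # any lines before the first header are ignored.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)
```

For example, `{name: len(seq) for name, seq in read_fasta(open("MCTE.fasta"))}` would give the length of every consensus sequence in the library.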


WARNING: Items in Digital.CSIC are protected by copyright, with all rights reserved, unless otherwise indicated.