Please use this identifier to cite or link to this item: http://hdl.handle.net/10261/351661
Title

Statistical Analysis and Tokenization of Epitopes to Construct Artificial Neoepitope Libraries

Authors: López-Martínez, Elena; Manteca, Aitor; Ferruz, Noelia; Cortajarena, Aitziber L.
Keywords: Epitope analysis; Library design; Tokenization; Natural language processing; Byte pair encoding
Publication date: 2023
Publisher: American Chemical Society
Citation: ACS Synthetic Biology 12(10): 2812-2818 (2023)
Abstract: Epitopes are specific regions on an antigen’s surface that the immune system recognizes. Epitopes are usually protein regions on foreign immune-stimulating entities such as viruses and bacteria, although in some cases endogenous proteins may act as antigens. Identifying epitopes is crucial for accelerating the development of vaccines and immunotherapies. However, mapping epitopes in pathogen proteomes is challenging with conventional methods. Screening artificial neoepitope libraries against antibodies can overcome this issue. Here, we applied conventional sequence analysis and methods inspired by natural language processing to reveal specific sequence patterns in the linear epitopes deposited in the Immune Epitope Database (www.iedb.org) that can serve as building blocks for the design of universal epitope libraries. Our results reveal that amino acid frequencies in annotated linear epitopes differ from those in the human proteome. Aromatic residues are overrepresented, while cysteines are practically absent from epitopes. Byte pair encoding tokenization shows high frequencies of tryptophan in tokens of 5, 6, and 7 amino acids, corroborating the findings of the conventional sequence analysis. These results can be applied to reduce the diversity of linear epitope libraries by orders of magnitude.
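The abstract names two analyses: comparing residue frequencies in epitopes against a reference proteome, and byte pair encoding (BPE) tokenization of epitope sequences. This record does not describe the authors' implementation, so the following is only a minimal, generic Python sketch of both ideas, assuming sequences are plain one-letter amino acid strings. The function names `residue_frequencies`, `enrichment`, and `learn_bpe`, and the toy peptides in the usage example, are hypothetical illustrations, not IEDB data or the paper's code.

```python
from collections import Counter


def residue_frequencies(seqs):
    """Relative frequency of each one-letter residue across all sequences."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def enrichment(epitope_freqs, reference_freqs):
    """Ratio of epitope frequency to reference frequency per residue.

    A ratio above 1 means the residue is overrepresented in epitopes;
    a residue absent from the epitopes gets 0.0.
    """
    return {
        aa: epitope_freqs.get(aa, 0.0) / ref
        for aa, ref in reference_freqs.items()
        if ref > 0
    }


def learn_bpe(seqs, num_merges):
    """Learn BPE merges over amino acid sequences (simplified sketch).

    Every sequence starts as a list of single-residue tokens. At each step
    the most frequent adjacent token pair is merged into one new token;
    ties go to the pair seen first. Returns the merges and the final tokens.
    """
    corpus = [list(s) for s in seqs]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for toks in corpus:
            for a, b in zip(toks, toks[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break  # every sequence is already a single token
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        new_corpus = []
        for toks in corpus:
            out, i = [], 0
            # Greedy left-to-right replacement of the chosen pair
            while i < len(toks):
                if i + 1 < len(toks) and (toks[i], toks[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(toks[i])
                    i += 1
            new_corpus.append(out)
        corpus = new_corpus
    return merges, corpus


if __name__ == "__main__":
    toy_epitopes = ["WWYKW", "AWWYG", "KWWYA"]  # invented toy peptides, not IEDB data
    merges, tokens = learn_bpe(toy_epitopes, num_merges=2)
    print(merges)  # [('W', 'W'), ('WW', 'Y')]
    print(tokens)  # [['WWY', 'K', 'W'], ['A', 'WWY', 'G'], ['K', 'WWY', 'A']]
```

Production BPE tokenizers add vocabulary-size limits, special tokens, and faster pair bookkeeping; this sketch keeps only the core merge loop so that the idea of recurring multi-residue tokens, such as tryptophan-rich ones, is easy to follow.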
Publisher's version: https://doi.org/10.1021/acssynbio.3c00201
URI: http://hdl.handle.net/10261/351661
DOI: 10.1021/acssynbio.3c00201
E-ISSN: 2161-5063
Appears in collections: (IBMB) Articles




Files in this item:
StatisticalAnalysisand-Tokenization_López_Art_2023.pdf (2.88 MB, Adobe PDF)

Scopus citations: 1 (checked on 24 Apr 2024)
Page views: 7 (checked on 27 Apr 2024)
Downloads: 1 (checked on 27 Apr 2024)



This item is licensed under a Creative Commons License.