Please use this identifier to cite or link to this item:
http://hdl.handle.net/10261/351661
Title: | Statistical Analysis and Tokenization of Epitopes to Construct Artificial Neoepitope Libraries |
Authors: | López-Martínez, Elena; Manteca, Aitor; Ferruz, Noelia; Cortajarena, Aitziber L. |
Keywords: | Epitope analysis; Library design; Tokenization; Natural language processing; Byte pair encoding |
Publication date: | 2023 |
Publisher: | American Chemical Society |
Citation: | ACS Synthetic Biology 12(10): 2812-2818 (2023) |
Abstract: | Epitopes are specific regions on an antigen’s surface that the immune system recognizes. Epitopes are usually protein regions on foreign immune-stimulating entities such as viruses and bacteria, and in some cases, endogenous proteins may act as antigens. Identifying epitopes is crucial for accelerating the development of vaccines and immunotherapies. However, mapping epitopes in pathogen proteomes is challenging using conventional methods. Screening artificial neoepitope libraries against antibodies can overcome this issue. Here, we applied conventional sequence analysis and methods inspired by natural language processing to reveal specific sequence patterns in the linear epitopes deposited in the Immune Epitope Database (www.iedb.org) that can serve as building blocks for the design of universal epitope libraries. Our results reveal that amino acid frequency in annotated linear epitopes differs from that in the human proteome. Aromatic residues are overrepresented, while the presence of cysteines is practically null in epitopes. Byte pair encoding tokenization shows high frequencies of tryptophan in tokens of 5, 6, and 7 amino acids, corroborating the findings of the conventional sequence analysis. These results can be applied to reduce the diversity of linear epitope libraries by orders of magnitude. |
Publisher version: | https://doi.org/10.1021/acssynbio.3c00201 |
URI: | http://hdl.handle.net/10261/351661 |
DOI: | 10.1021/acssynbio.3c00201 |
E-ISSN: | 2161-5063 |
Appears in collections: | (IBMB) Artículos |
Files in this item:
File | Description | Size | Format |
---|---|---|---|
StatisticalAnalysisand-Tokenization_López_Art_2023.pdf | | 2.88 MB | Adobe PDF |
This item is licensed under a Creative Commons License