Please use this identifier to cite or link to this item: http://hdl.handle.net/10261/270429
Title

Medical Lexicon for Spanish (MedLexSp) [DATASET]

Author: Campillos-Llanos, Leonardo (CSIC)
Keywords: Medical Lexicon; Biomedical natural language processing
UNESCO Thesaurus: Linguistics; Medicine
Publication date: 25 May 2022
Publisher: DIGITAL.CSIC
Citation: Campillos-Llanos, Leonardo; 2022; Medical Lexicon for Spanish (MedLexSp) [Dataset]; DIGITAL.CSIC; https://doi.org/10.20350/digitalCSIC/14656
Abstract: MedLexSp is a unified medical lexicon for Medical Natural Language Processing in Spanish. It includes terms and inflected word forms with part-of-speech information and Unified Medical Language System (UMLS) semantic types, groups and Concept Unique Identifiers (CUIs). To create it, we used Natural Language Processing techniques and domain corpora (e.g. MedlinePlus). We also collected terms from the Dictionary of Medical Terms from the Spanish Royal Academy of Medicine, the Medical Subject Headings (MeSH), the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED-CT), the Medical Dictionary for Regulatory Activities Terminology (MedDRA), the International Classification of Diseases, 10th revision (ICD-10), the Anatomical Therapeutic Chemical (ATC) Classification, the National Cancer Institute (NCI) Dictionary, the Online Mendelian Inheritance in Man (OMIM) and OrphaData. Terms related to COVID-19 were assembled by applying a similarity-based approach with word embeddings trained on a large corpus. This dataset was collected during the NLPMedTerm and CLARA-MeD projects, with the goal of creating a lexical resource for medical text processing in Spanish.
MedLexSp includes 100 887 lemmas, 302 543 inflected forms (conjugated verbs and number/gender variants), and 42 958 Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs).
Geographic coverage: Spain, Latin America and the United States of America (data from MedlinePlus Spanish and the Spanish version of the National Cancer Institute Dictionary of Medical Terms).
Description:
- MedLexSp.dsv: a delimiter-separated values file with six data fields: field 1, the UMLS CUI of the entity; field 2, the lemma; field 3, the variant forms; field 4, the part of speech; field 5, the semantic type(s); and field 6, the semantic group.
- MedLexSp.xml: an XML-encoded version using the Lexical Markup Framework (LMF), which includes the morphological data (number, gender, verb tense and person) and affix/abbreviation information. The Document Type Definition file is also provided (lmf.dtd).
- Lexical Record files (in subfolder "LR/"):
  · LR_abr.dsv: list of equivalences between acronyms/abbreviations and full forms.
  · LR_affix.dsv: list of equivalences between affixes/roots and their meanings.
  · LR_n_v.dsv: list of deverbal nouns.
  · LR_adj_n.dsv: list of adjectives derived from nouns.
- spaCy lemmatizer (in subfolder "spacy_lemmatizer/"): lemmatizer.py
- Stanza lemmatizer (in subfolder "stanza_lemmatizer/"): ancora-medlexsp.pt
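The six-field layout of MedLexSp.dsv described above can be read with a few lines of Python. The following is only a minimal sketch, not the author's code: the record does not state which delimiter the file uses, so the `|` field delimiter and the `;` separator inside multi-valued fields are assumptions exposed as parameters, and the names `LexiconEntry`, `parse_medlexsp` and `build_form_index` are hypothetical. The sample rows are invented for illustration (placeholder CUIs and tags) and are not taken from the dataset.

```python
import csv
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Set


@dataclass
class LexiconEntry:
    """One row of MedLexSp.dsv, following the six fields in the description."""
    cui: str                   # field 1: UMLS CUI
    lemma: str                 # field 2: lemma
    variants: List[str]        # field 3: variant forms
    pos: str                   # field 4: part of speech
    semantic_types: List[str]  # field 5: semantic type(s)
    semantic_group: str        # field 6: semantic group


def _split(value: str, sep: str) -> List[str]:
    """Split a multi-valued field and drop empty items."""
    return [item.strip() for item in value.split(sep) if item.strip()]


def parse_medlexsp(
    lines: Iterable[str],
    delimiter: str = "|",     # ASSUMPTION: field delimiter is not given in the record
    subdelimiter: str = ";",  # ASSUMPTION: separator inside fields 3 and 5
) -> Iterator[LexiconEntry]:
    """Yield one LexiconEntry per well-formed six-field row."""
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 6:
            continue  # skip blank or malformed rows
        cui, lemma, variants, pos, types, group = (field.strip() for field in row)
        yield LexiconEntry(
            cui, lemma, _split(variants, subdelimiter), pos,
            _split(types, subdelimiter), group,
        )


def build_form_index(entries: Iterable[LexiconEntry]) -> Dict[str, Set[str]]:
    """Map each lower-cased variant form to the set of lemmas it belongs to."""
    index: Dict[str, Set[str]] = {}
    for entry in entries:
        for form in entry.variants:
            index.setdefault(form.lower(), set()).add(entry.lemma)
    return index


# Invented rows for illustration only (placeholder CUIs, not real dataset content).
SAMPLE = """\
C0000000|fiebre|fiebre;fiebres|N|sosy|DISO
C0000001|febril|febril;febriles|ADJ|sosy|DISO
"""

if __name__ == "__main__":
    entries = list(parse_medlexsp(SAMPLE.splitlines()))
    index = build_form_index(entries)
    print(index["fiebres"])
```

For the real file, pass an open file handle instead of `SAMPLE.splitlines()` and adjust `delimiter` and `subdelimiter` after checking README.txt. The form-to-lemma index is one simple way to use the lexicon for dictionary-based lemmatization; a form may map to more than one lemma, which is why the values are sets.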
File list:
1) MedLexSp.dsv
2) MedLexSp.xml and lmf.dtd (Document Type Definition)
3) Lexical Record files, in subfolder "LR/":
   3.1) LR_abr.dsv
   3.2) LR_affix.dsv
   3.3) LR_n_v.dsv
   3.4) LR_adj_n.dsv
4) spaCy lemmatizer (in subfolder "spacy_lemmatizer/"): lemmatizer.py
5) Stanza lemmatizer (in subfolder "stanza_lemmatizer/"): ancora-medlexsp.pt
See more information about the format below. Companion code and files can be found in the GitHub repository: https://github.com/lcampillos/MedLexSp
URI: http://hdl.handle.net/10261/270429
DOI: https://doi.org/10.20350/digitalCSIC/14656
Appears in collections: (CCHS-ILLA) Datasets




Files in this item:
File                                   Description     Size        Format
Medical_Lexicon_Spanish_MedLexSp.pdf   Instructions    3.2 kB      Adobe PDF
MedLexSp_License_2022.pdf              Usage license   142.28 kB   Adobe PDF
README.txt                                             10.86 kB    Text




NOTE: Items in Digital.CSIC are protected by copyright, with all rights reserved, unless otherwise indicated.