Please use this identifier to cite or link to this item:
http://hdl.handle.net/10261/270429
Title: | Medical Lexicon for Spanish (MedLexSp) [DATASET] |
Author: | Campillos-Llanos, Leonardo |
Keywords: | Medical lexicon; Biomedical natural language processing |
UNESCO Thesaurus: | Linguistics; Medicine |
Publication date: | 25-May-2022 |
Publisher: | DIGITAL.CSIC |
Citation: | Campillos-Llanos, Leonardo; 2022; Medical Lexicon for Spanish (MedLexSp) [Dataset]; DIGITAL.CSIC; https://doi.org/10.20350/digitalCSIC/14656 |
Abstract: | MedLexSp is a unified medical lexicon for medical natural language processing in Spanish. It includes 100 887 lemmas, 302 543 inflected forms (conjugated verbs and number/gender variants) and 42 958 Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs), together with part-of-speech information and UMLS semantic types and groups. To create it, we used natural language processing techniques and domain corpora (e.g. MedlinePlus). We also collected terms from the Dictionary of Medical Terms of the Spanish Royal Academy of Medicine, the Medical Subject Headings (MeSH), the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT), the Medical Dictionary for Regulatory Activities Terminology (MedDRA), the International Classification of Diseases, 10th revision (ICD-10), the Anatomical Therapeutic Chemical (ATC) Classification, the National Cancer Institute (NCI) Dictionary, Online Mendelian Inheritance in Man (OMIM) and OrphaData. Terms related to COVID-19 were assembled with a similarity-based approach using word embeddings trained on a large corpus. The lexicon covers terminology from Spain, Latin America and the United States of America (data from MedlinePlus Spanish and the Spanish version of the National Cancer Institute Dictionary of Medical Terms). This dataset was collected during the NLPMedTerm and CLARA-MeD projects, with the goal of creating a lexical resource for medical text processing in Spanish. |
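The abstract says COVID-19 terms were gathered with a similarity-based approach over word embeddings, but it does not name the embedding model, the similarity measure or any cutoff. The Python sketch below is only a generic illustration of that kind of term expansion, assuming cosine similarity, a hypothetical `expand_terms` helper, an arbitrary threshold of 0.8 and invented three-dimensional toy vectors; it is not the procedure used to build MedLexSp.

```python
import math

# Toy 3-dimensional vectors, invented for this example; real embeddings
# would come from a model trained on a large corpus.
embeddings = {
    "covid-19": [0.9, 0.1, 0.0],
    "sars-cov-2": [0.85, 0.15, 0.05],
    "coronavirus": [0.8, 0.2, 0.1],
    "fémur": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_terms(seed, vectors, threshold=0.8, top_k=5):
    """Rank all other terms by similarity to `seed`; keep the top_k at or above threshold.

    Hypothetical helper: the threshold and top_k values are illustrative only.
    """
    seed_vec = vectors[seed]
    scored = [(term, cosine(seed_vec, vec)) for term, vec in vectors.items() if term != seed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(term, score) for term, score in scored[:top_k] if score >= threshold]


for term, score in expand_terms("covid-19", embeddings):
    print(f"{term}\t{score:.3f}")
```

With these toy vectors, "sars-cov-2" and "coronavirus" pass the cutoff while "fémur" does not; candidates found this way would still need checking before being added to a lexicon.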
Description: | The dataset contains the following files:
1) MedLexSp.dsv: a delimiter-separated values file with six fields: (1) the UMLS CUI of the entity; (2) the lemma; (3) the variant forms; (4) the part of speech; (5) the semantic type(s); and (6) the semantic group.
2) MedLexSp.xml: an XML-encoded version using the Lexical Markup Framework (LMF), which includes morphological data (number, gender, verb tense and person, and affix/abbreviation information). The Document Type Definition file is also provided (lmf.dtd).
3) Lexical Record files, in subfolder "LR/":
3.1) LR_abr.dsv: equivalences between acronyms/abbreviations and their full forms.
3.2) LR_affix.dsv: equivalences between affixes/roots and their meanings.
3.3) LR_n_v.dsv: a list of deverbal nouns.
3.4) LR_adj_n.dsv: a list of adjectives derived from nouns.
4) spaCy lemmatizer, in subfolder "spacy_lemmatizer/": lemmatizer.py
5) Stanza lemmatizer, in subfolder "stanza_lemmatizer/": ancora-medlexsp.pt
More information about the format is given in the files listed below. Companion code and files can be found in the GitHub repository: https://github.com/lcampillos/MedLexSp |
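As a sketch of how the six-field MedLexSp.dsv layout described above might be loaded and used as a form-to-lemma lookup, the Python below assumes that fields are separated by "|" and that multi-valued fields (variant forms, semantic types) use ";" inside a field. Neither separator is stated on this page, so check them against the README before relying on this. The helper names (`read_medlexsp`, `build_lemma_index`) and the two sample rows, with placeholder CUIs, are hypothetical.

```python
import csv
from dataclasses import dataclass

# ASSUMPTION: the separators are not documented on this page; adjust them
# to match the distributed MedLexSp.dsv.
FIELD_SEP = "|"  # between the six fields
VALUE_SEP = ";"  # between values inside a multi-valued field


@dataclass
class Entry:
    cui: str                   # field 1: UMLS CUI
    lemma: str                 # field 2: lemma
    variants: list[str]        # field 3: variant (inflected) forms
    pos: str                   # field 4: part of speech
    semantic_types: list[str]  # field 5: UMLS semantic type(s)
    semantic_group: str        # field 6: UMLS semantic group


def split_values(field: str, sep: str) -> list[str]:
    """Split a multi-valued field, dropping empty items."""
    return [v.strip() for v in field.split(sep) if v.strip()]


def read_medlexsp(lines, field_sep: str = FIELD_SEP, value_sep: str = VALUE_SEP) -> list[Entry]:
    """Parse an iterable of DSV lines into Entry records."""
    # QUOTE_NONE: treat quote characters inside terms as ordinary text.
    reader = csv.reader(lines, delimiter=field_sep, quoting=csv.QUOTE_NONE)
    entries = []
    for row in reader:
        if len(row) != 6:
            continue  # skip blank or malformed lines
        cui, lemma, variants, pos, sem_types, sem_group = (f.strip() for f in row)
        entries.append(Entry(cui, lemma, split_values(variants, value_sep), pos,
                             split_values(sem_types, value_sep), sem_group))
    return entries


def build_lemma_index(entries: list[Entry]) -> dict[str, set[str]]:
    """Map each lower-cased form (lemma or variant) to the lemma(s) it belongs to."""
    index: dict[str, set[str]] = {}
    for entry in entries:
        for form in [entry.lemma, *entry.variants]:
            index.setdefault(form.lower(), set()).add(entry.lemma)
    return index


# Hypothetical rows for illustration only; the CUIs are placeholders.
sample = [
    "C_EXAMPLE_1|diabetes|diabetes|NOUN|Disease or Syndrome|DISO",
    "C_EXAMPLE_2|inflamar|inflama;inflamó;inflamado|VERB|Pathologic Function|DISO",
]
entries = read_medlexsp(sample)
index = build_lemma_index(entries)
print(index["inflamó"])  # {'inflamar'}
```

To read the real file, pass an open handle instead, e.g. `with open("MedLexSp.dsv", encoding="utf-8") as fh: entries = read_medlexsp(fh)`, adjusting the separators (and the UTF-8 encoding, also assumed) to match the distributed data. The index maps to a set because the same surface form can belong to more than one lemma.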
URI: | http://hdl.handle.net/10261/270429 |
DOI: | https://doi.org/10.20350/digitalCSIC/14656 |
Appears in collections: | (CCHS-ILLA) Datasets |
Files in this item:
File | Description | Size | Format |
---|---|---|---|
Medical_Lexicon_Spanish_MedLexSp.pdf | Instructions | 3.2 kB | Adobe PDF |
MedLexSp_License_2022.pdf | Usage license | 142.28 kB | Adobe PDF |
README.txt | | 10.86 kB | Text |
NOTE: Items in Digital.CSIC are protected by copyright, with all rights reserved, unless otherwise indicated.