Please use this identifier to cite or link to this item: http://hdl.handle.net/10261/130239
Title

An information retrieval approach to document sanitization

Authors: Nettleton, David F.; Abril, Daniel
Keywords: Wikileaks cables
Disclosure risk
Information loss
Queries
Search engine
Information retrieval
Document sanitization
Privacy
Publication date: 2015
Publisher: Springer
Citation: Studies in Computational Intelligence 567: 151-166 (2015)
Abstract: In this paper we use information retrieval metrics to evaluate the effect of a document sanitization process, measuring information loss and risk of disclosure. To sanitize the documents, we have developed a semi-automatic anonymization process that follows the guidelines of Executive Order 13526 (2009) of the US Administration. It consists of two main, independent steps: (i) identifying and anonymizing specific person names and data, and (ii) concept generalization based on WordNet categories, in order to identify words categorized as classified. Finally, we manually review the text in context and, where necessary, remove entire sentences, paragraphs and sections. For the empirical tests, we use a subset of the Wikileaks cables consisting of documents related to five key news stories that the cables revealed. © Springer International Publishing Switzerland 2015
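The abstract does not give the metric formulas, so the following is only a minimal sketch of the general idea of using retrieval to score a sanitized corpus, under stated assumptions: boolean AND term matching as the "search engine", one set of hypothetical "utility" queries whose lost results count as information loss, and one set of hypothetical "sensitive" queries whose surviving results count as disclosure risk. The helper names (`mask_terms`, `retained_recall`, `information_loss`, `disclosure_risk`) are illustrative and are not the authors' code; `mask_terms` stands in for step (i) only, and the WordNet generalization step is not shown.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(corpus: dict[str, str], query: str) -> set[str]:
    """Boolean AND retrieval: ids of documents containing every query term."""
    terms = tokenize(query)
    return {doc_id for doc_id, text in corpus.items() if terms <= tokenize(text)}


def retained_recall(original: dict[str, str], sanitized: dict[str, str],
                    queries: list[str]) -> float:
    """Mean fraction of each query's original hits still retrieved after
    sanitization. Queries with no hits on the original corpus are skipped."""
    scores = []
    for q in queries:
        before = retrieve(original, q)
        if not before:
            continue
        after = retrieve(sanitized, q)
        scores.append(len(before & after) / len(before))
    if not scores:
        raise ValueError("no query retrieves anything from the original corpus")
    return sum(scores) / len(scores)


def information_loss(original, sanitized, utility_queries) -> float:
    """Hypothetical loss score: share of useful results that sanitization removed."""
    return 1.0 - retained_recall(original, sanitized, utility_queries)


def disclosure_risk(original, sanitized, sensitive_queries) -> float:
    """Hypothetical risk score: share of sensitive results still retrievable."""
    return retained_recall(original, sanitized, sensitive_queries)


def mask_terms(text: str, terms: list[str], placeholder: str = "[REDACTED]") -> str:
    """Hypothetical stand-in for step (i): replace listed names as whole words."""
    for term in terms:
        text = re.sub(rf"\b{re.escape(term)}\b", placeholder, text, flags=re.IGNORECASE)
    return text


original = {
    "d1": "Ambassador Smith discussed the pipeline contract with the minister.",
    "d2": "The minister announced new energy policy.",
}
sanitized = {k: mask_terms(v, ["Smith"]) for k, v in original.items()}

utility_queries = ["pipeline contract", "energy policy", "Smith contract"]
sensitive_queries = ["Smith", "ambassador smith"]

print(round(information_loss(original, sanitized, utility_queries), 3))  # 0.333
print(disclosure_risk(original, sanitized, sensitive_queries))           # 0.0
```

In this toy corpus, masking the name removes all sensitive hits (risk 0.0) but also breaks the utility query that mentions the name, so one of three utility queries loses its result. That trade-off between the two scores is the kind of effect the paper sets out to measure, although its actual metrics may differ.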
URI: http://hdl.handle.net/10261/130239
DOI: 10.1007/978-3-319-09885-2_9
Identifiers: doi: 10.1007/978-3-319-09885-2_9
issn: 1860-949X
Appears in collections: (IIIA) Books and book parts
NOTE: Items in Digital.CSIC are protected by copyright, with all rights reserved, unless otherwise indicated.