Please use this identifier to cite or link to this item: http://hdl.handle.net/10261/167040

DC Field | Value | Language
dc.contributor.author | Colomé, Adrià | -
dc.contributor.author | Torras, Carme | -
dc.date.accessioned | 2018-06-27T06:19:28Z | -
dc.date.available | 2018-06-27T06:19:28Z | -
dc.date.issued | 2017 | -
dc.identifier | doi: 10.1109/TRO.2017.2679202 | -
dc.identifier | issn: 1552-3098 | -
dc.identifier | e-issn: 1941-0468 | -
dc.identifier.citation | IEEE Transactions on Robotics 33(4): 978-985 (2017) | -
dc.identifier.uri | http://hdl.handle.net/10261/167040 | -
dc.description.abstract | Policy search (PS) algorithms are widely used for their simplicity and effectiveness in finding solutions for robotic problems. However, most current PS algorithms derive policies by statistically fitting the data from the best experiments only. This means that experiments yielding a poor performance are usually discarded or given too little influence on the policy update. In this paper, we propose a generalization of the relative entropy policy search (REPS) algorithm that takes bad experiences into consideration when computing a policy. The proposed approach, named dual REPS (DREPS) following the philosophical interpretation of the duality between good and bad, finds clusters of experimental data yielding a poor behavior and adds them to the optimization problem as a repulsive constraint. Thus, considering that there is a duality between good and bad data samples, both are taken into account in the stochastic search for a policy. Additionally, a cluster with the best samples may be included as an attractor to enforce faster convergence to a single optimal solution in multimodal problems. We first tested our proposed approach in a simulated reinforcement learning setting and found that DREPS considerably speeds up the learning process, especially during the early optimization steps and in cases where other approaches get trapped in between several alternative maxima. Further experiments in which a real robot had to learn a task with a multimodal reward function confirm the advantages of our proposed approach with respect to REPS. | -
dc.description.sponsorship | This work is partially funded by Spanish project RoboInstruct (TIN2014-58178-R) and CSIC project MANIPlus (201350E102). Adrià Colomé is also supported by the Spanish Ministry of Education, Culture and Sport via an FPU doctoral grant (AP2010-1989). | -
dc.publisher | Institute of Electrical and Electronics Engineers | -
dc.relation | info:eu-repo/grantAgreement/MINECO/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2014-58178-R | -
dc.relation.isversionof | Postprint | -
dc.rights | openAccess | -
dc.title | Dual REPS: A generalization of relative entropy policy search exploiting bad experiences | -
dc.type | article | -
dc.identifier.doi | 10.1109/TRO.2017.2679202 | -
dc.relation.publisherversion | https://doi.org/10.1109/TRO.2017.2679202 | -
dc.date.updated | 2018-06-27T06:19:29Z | -
dc.description.version | Peer Reviewed | -
dc.language.rfc3066 | eng | -
dc.contributor.funder | Ministerio de Educación, Cultura y Deporte (España) | -
dc.contributor.funder | Consejo Superior de Investigaciones Científicas (España) | -
dc.contributor.funder | Ministerio de Economía y Competitividad (España) | -
dc.relation.csic |  | -
dc.identifier.funder | http://dx.doi.org/10.13039/501100003176 | es_ES
dc.identifier.funder | http://dx.doi.org/10.13039/501100003339 | es_ES
dc.identifier.funder | http://dx.doi.org/10.13039/501100003329 | es_ES
dc.type.coar | http://purl.org/coar/resource_type/c_6501 | es_ES
item.fulltext | With Fulltext | -
item.grantfulltext | open | -
item.cerifentitytype | Publications | -
item.openairetype | article | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
Appears in collections: (IRII) Artículos
Files in this item:
File | Description | Size | Format
DualREPS.pdf |  | 1.77 MB | Adobe PDF

Scopus citations: 5 (checked on 24 Apr 2024)
Web of Science citations: 4 (checked on 25 Feb 2024)
Page views: 228 (checked on 24 Apr 2024)
Downloads: 199 (checked on 24 Apr 2024)

NOTE: Items in Digital.CSIC are protected by copyright, with all rights reserved, unless otherwise indicated.