Please use this identifier to cite or link to this item: http://hdl.handle.net/10261/167040
Title

Dual REPS: A generalization of relative entropy policy search exploiting bad experiences

Author: Colomé, Adrià; Torras, Carme
Publication date: 2017
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Robotics 33(4): 978-985 (2017)
Abstract: Policy search (PS) algorithms are widely used for their simplicity and effectiveness in finding solutions for robotic problems. However, most current PS algorithms derive policies by statistically fitting the data from the best experiments only. This means that experiments yielding a poor performance are usually discarded or given too little influence on the policy update. In this paper, we propose a generalization of the relative entropy policy search (REPS) algorithm that takes bad experiences into consideration when computing a policy. The proposed approach, named dual REPS (DREPS) following the philosophical interpretation of the duality between good and bad, finds clusters of experimental data yielding a poor behavior and adds them to the optimization problem as a repulsive constraint. Thus, considering that there is a duality between good and bad data samples, both are taken into account in the stochastic search for a policy. Additionally, a cluster with the best samples may be included as an attractor to enforce faster convergence to a single optimal solution in multimodal problems. We first tested our proposed approach in a simulated reinforcement learning setting and found that DREPS considerably speeds up the learning process, especially during the early optimization steps and in cases where other approaches get trapped in between several alternative maxima. Further experiments in which a real robot had to learn a task with a multimodal reward function confirm the advantages of our proposed approach with respect to REPS.
Publisher's version: https://doi.org/10.1109/TRO.2017.2679202
URI: http://hdl.handle.net/10261/167040
Identifiers: doi: 10.1109/TRO.2017.2679202
issn: 1552-3098
e-issn: 1941-0468
Appears in collections: (IRII) Articles
Files in this item:
File: DualREPS.pdf; Size: 1.77 MB; Format: Adobe PDF


NOTE: Items in Digital.CSIC are protected by copyright, with all rights reserved, unless otherwise indicated.