ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search
In this paper, we propose an actor ensemble algorithm, named ACE, for
continuous control with a deterministic policy in reinforcement learning. In
ACE, we use an actor ensemble (i.e., multiple actors) to search for the global maxima
of the critic. Besides the ensemble perspective, we also formulate ACE in the
option framework by extending the option-critic architecture with deterministic
intra-option policies, revealing a relationship between ensembles and options.
Furthermore, we perform a look-ahead tree search with those actors and a
learned value prediction model, resulting in a refined value estimate. We
demonstrate that ACE significantly outperforms DDPG and its variants
in challenging physical robot simulators.
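To make the two mechanisms concrete, here is a minimal sketch, not the paper's implementation. It assumes that every actor is a deterministic function from state to action, that the critic is a function Q(s, a), and that a hypothetical learned model returns a (reward, next_state) pair. Action selection keeps whichever actor's proposal the critic scores highest. The look-ahead expands one child per actor at each node and backs up r + gamma * max over children, with the critic evaluating the leaves. The names `select_action`, `lookahead_value`, and the `toy_*` functions are illustrative, and the expansion rule is an assumption.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Action = float
Actor = Callable[[State], Action]
Critic = Callable[[State, Action], float]
# Hypothetical learned model: maps (state, action) to (predicted reward, predicted next state).
Model = Callable[[State, Action], Tuple[float, State]]


def select_action(state: State, actors: List[Actor], critic: Critic) -> Action:
    """Ensemble step: each deterministic actor proposes an action; keep the one the critic rates highest."""
    candidates = [actor(state) for actor in actors]
    return max(candidates, key=lambda a: critic(state, a))


def lookahead_value(state: State, action: Action, actors: List[Actor], critic: Critic,
                    model: Model, depth: int, gamma: float = 0.99) -> float:
    """Depth-limited tree estimate of Q(state, action); each node branches once per actor (assumed rule)."""
    if depth == 0:
        # Leaf: fall back to the critic's own estimate.
        return critic(state, action)
    reward, next_state = model(state, action)
    child_values = [
        lookahead_value(next_state, actor(next_state), actors, critic, model, depth - 1, gamma)
        for actor in actors
    ]
    # Back up the best child through the predicted reward and discount.
    return reward + gamma * max(child_values)


# Toy 1-D problem (hypothetical): the critic prefers actions close to the state value.
def toy_critic(s: State, a: Action) -> float:
    return -(a - s[0]) ** 2


def toy_model(s: State, a: Action) -> Tuple[float, State]:
    return -abs(a), (s[0] * 0.5,)


toy_actors: List[Actor] = [lambda s: 0.0, lambda s: 1.0, lambda s: s[0]]

if __name__ == "__main__":
    print(select_action((0.7,), toy_actors, toy_critic))
    print(lookahead_value((0.7,), 0.7, toy_actors, toy_critic, toy_model, depth=1, gamma=0.9))
```

On the toy problem, the third actor's proposal (0.7) wins because the critic scores it 0, versus -0.49 and -0.09 for the other two. A depth-1 look-ahead from that action gives -0.7 + 0.9 * 0 = -0.7. Using the maximum over actors at every level mirrors the stated goal of searching for the critic's global maxima with several actors rather than trusting a single one.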