Learning to Teach in Cooperative Multiagent Reinforcement Learning
We present a framework and algorithm for peer-to-peer teaching in cooperative
multiagent reinforcement learning. Our algorithm, Learning to Coordinate and
Teach Reinforcement (LeCTR), trains advising policies by using students'
learning progress as a teaching reward. Agents using LeCTR learn to assume the
role of a teacher or student at the appropriate moments, exchanging action
advice to accelerate the entire learning process. Our algorithm supports
teaching heterogeneous teammates and advising under communication constraints,
and it learns both what and when to advise. LeCTR outperforms prior teaching
methods in both final performance and rate of learning on multiple benchmark
domains. To our knowledge, this is the first approach for learning to
teach in a multiagent setting.