Traditional reinforcement learning algorithms require extensive online interaction with the environment and cannot adapt effectively to changes in the task environment, which makes them difficult to apply to real-world problems. Offline meta-reinforcement learning provides an effective way to adapt quickly to a new task by learning policies offline from replay datasets of multiple tasks. Applying offline meta-reinforcement learning to complex tasks faces two challenges. First, reinforcement learning algorithms overestimate the values of state-action pairs not contained in the dataset and thus select non-optimal actions, resulting in poor performance. Second, meta-reinforcement learning algorithms must not only learn the policy but also perform robust and efficient task inference. To address these problems, this paper proposes an offline meta-reinforcement learning algorithm based on contrastive prediction. To cope with the overestimation of value functions, the proposed algorithm uses behavior cloning to encourage the policy to prefer actions included in the dataset. To improve the task inference capability of meta-learning, the algorithm uses recurrent neural networks to infer tasks from the agents' context trajectories, and uses contrastive learning and prediction networks to analyze and distinguish the latent structures of different task trajectories. Experimental results show that agents trained by the proposed algorithm score more than 25 percentage points higher on unseen tasks, and that the algorithm achieves higher meta-training efficiency and better generalization than existing methods.
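To make the two mechanisms concrete, the sketch below shows one plausible PyTorch realization of the ideas summarized in the abstract: a GRU context encoder trained with an InfoNCE-style contrastive loss that separates embeddings of different tasks, and a behavior-cloning term added to the actor objective so the policy prefers actions present in the offline dataset. The module names, dimensions, and the `bc_weight` coefficient are illustrative assumptions rather than the paper's exact design, and the prediction networks mentioned in the abstract are omitted for brevity.

```python
# Minimal sketch (assumed architecture, not the paper's implementation):
# GRU task inference + InfoNCE contrastive loss + BC-regularized actor loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextEncoder(nn.Module):
    """Encodes a context trajectory of (s, a, r) transitions into a task embedding z."""

    def __init__(self, transition_dim, hidden_dim=128, z_dim=16):
        super().__init__()
        self.gru = nn.GRU(transition_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, z_dim)

    def forward(self, context):                  # context: (B, T, transition_dim)
        _, h = self.gru(context)                 # h: (1, B, hidden_dim), last hidden state
        return self.head(h.squeeze(0))           # z: (B, z_dim)


def contrastive_task_loss(z_query, z_key, temperature=0.1):
    """InfoNCE loss: two context windows from the same task are a positive pair;
    embeddings of the other tasks in the batch act as negatives."""
    z_query = F.normalize(z_query, dim=-1)
    z_key = F.normalize(z_key, dim=-1)
    logits = z_query @ z_key.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(z_query.size(0), device=z_query.device)
    return F.cross_entropy(logits, labels)


def bc_regularized_actor_loss(policy, critic, states, dataset_actions, z, bc_weight=2.5):
    """Actor loss = -Q(s, pi(s, z)) + bc_weight * MSE(pi(s, z), dataset action).
    The behavior-cloning term keeps the policy close to actions in the offline
    data, mitigating value overestimation on out-of-dataset actions."""
    policy_actions = policy(states, z)
    q_value = critic(states, policy_actions, z).mean()
    bc_loss = F.mse_loss(policy_actions, dataset_actions)
    return -q_value + bc_weight * bc_loss
```

In a training loop, `z_query` and `z_key` would be computed from two disjoint context windows sampled from the same task, so each task forms its own positive pair against the other tasks in the batch; the resulting embedding `z` then conditions both the policy and the critic.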
@article{HWjfcst23,
author = {Xu Han and Feng Wu},
doi = {10.3778/j.issn.1673-9418.2203074},
journal = {Journal of Frontiers of Computer Science and Technology (JFCST)},
number = {8},
pages = {1917--1927},
title = {Offline Meta-Reinforcement Learning with Contrastive Prediction},
volume = {17},
year = {2023}
}