People who observe a multi-agent team can often provide valuable information to the agents, drawing on their superior cognitive ability to interpret sequences of observations and assess the overall situation. The knowledge they possess is often difficult to fully represent using a formal model such as a DEC-POMDP. To address this, we propose an extension of the DEC-POMDP that allows states to be partially specified and to benefit from expert knowledge, while preserving the partial observability and decentralized operation of the agents. In particular, we present an algorithm for computing policies based on history samples that include human-labeled data in the form of reward reshaping. We also consider ways to minimize the burden on human experts during the labeling phase. This work offers the first approach to incorporating human knowledge in such complex multi-agent settings. We demonstrate the benefits of our approach in a disaster recovery scenario, comparing it to several baseline approaches.
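The paper's algorithm is not reproduced in the abstract; purely as an illustrative sketch of the general idea, the following Python toy folds sparse, simulated expert labels into a reshaped return used to score decentralized policies over sampled histories. All names and the two-agent environment are invented for this example and are not from the paper.

```python
import random

# Toy illustration only -- not the paper's algorithm. It shows the general
# idea of reward reshaping from human labels: sampled joint histories are
# scored by the environment return plus a weighted bonus supplied by a
# (here simulated) human expert, and candidate decentralized policies are
# compared on that reshaped return.

def sample_history(policy, horizon, rng):
    """Roll out one episode; return the joint history and environment return."""
    history, env_return = [], 0.0
    for _ in range(horizon):
        state = rng.choice(["safe", "danger"])
        # Partial observability: each agent sees only a noisy local observation.
        obs = tuple(state if rng.random() < 0.8 else "unknown" for _ in range(2))
        acts = tuple(policy(i, obs[i]) for i in range(2))
        # The environment rewards only a fully coordinated rescue.
        env_return += 1.0 if state == "danger" and acts == ("rescue", "rescue") else 0.0
        history.append((state, acts))
    return history, env_return

def human_label(history):
    """Simulated expert feedback: credit any rescue attempt in a danger state."""
    return sum(1.0 for state, acts in history if state == "danger" and "rescue" in acts)

def evaluate(policy, episodes=400, horizon=5, weight=0.5, seed=0):
    """Average reshaped return; only a quarter of the histories receive expert
    labels, mimicking the goal of limiting the human labeling burden."""
    rng = random.Random(seed)
    total = 0.0
    for ep in range(episodes):
        history, env_return = sample_history(policy, horizon, rng)
        bonus = weight * human_label(history) if ep % 4 == 0 else 0.0
        total += env_return + bonus
    return total / episodes

# Two candidate decentralized policies; each agent acts on its own observation.
rescue_if_seen = lambda i, obs: "rescue" if obs == "danger" else "wait"
always_wait = lambda i, obs: "wait"

if __name__ == "__main__":
    for name, pi in [("rescue_if_seen", rescue_if_seen), ("always_wait", always_wait)]:
        print(f"{name}: {evaluate(pi):.3f}")
```

The reshaped score lets the sparse expert signal break ties between policies that look similar under the raw environment reward, which is one way human labels on sampled histories can guide policy selection.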
@inproceedings{WZJdai19,
address = {Beijing, China},
author = {Feng Wu and Shlomo Zilberstein and Nicholas R. Jennings},
booktitle = {Proceedings of the 1st International Conference on Distributed Artificial Intelligence (DAI)},
doi = {10.1145/3356464.3357699},
month = {October},
pages = {1--8},
title = {Stochastic Multi-Agent Planning with Partial State Models},
year = {2019}
}