Reinforcement Learning of Multi-Party Trading Dialog Policies
by Takuya Hiraoka, Kallirroi Georgila, Elnaz Nouri, David Traum, Satoshi Nakamura
Abstract:
Trading dialogs are a kind of negotiation in which an exchange of ownership of items is discussed, and these kinds of dialogs are pervasive in many situations. Recently, there has been an increasing amount of research on applying reinforcement learning (RL) to negotiation dialog domains. However, in previous research, the focus was on negotiation dialog between two participants only, ignoring cases where negotiation takes place between more than two interlocutors. In this paper, as a first study on multi-party negotiation, we apply RL to a multi-party trading scenario where the dialog system (learner) trades with one, two, or three other agents. We experiment with different RL algorithms and reward functions. We use Q-learning with linear function approximation, least-squares policy iteration, and neural fitted Q iteration. In addition, to make the learning process more efficient, we introduce an incremental reward function. The negotiation strategy of the learner is learned through simulated dialog with trader simulators. In our experiments, we evaluate how the performance of the learner varies depending on the RL algorithm used and the number of traders. Furthermore, we compare the learned dialog policies with two strong hand-crafted baseline dialog policies. Our results show that (1) even in simple multi-party trading dialog tasks, learning an effective negotiation policy is not a straightforward task and requires a lot of experimentation; and (2) the use of neural fitted Q iteration combined with an incremental reward function produces negotiation policies as effective as, or even better than, the policies of the two strong hand-crafted baselines.
Reference:
Reinforcement Learning of Multi-Party Trading Dialog Policies (Takuya Hiraoka, Kallirroi Georgila, Elnaz Nouri, David Traum, Satoshi Nakamura), In Transactions of the Japanese Society for Artificial Intelligence, volume 31, 2016.
Bibtex Entry:
@article{hiraoka_reinforcement_2016,
	title = {Reinforcement {Learning} of {Multi}-{Party} {Trading} {Dialog} {Policies}},
	volume = {31},
	issn = {1346-8030},
	url = {https://www.jstage.jst.go.jp/article/tjsai/31/4/31_B-FC1/_pdf},
	abstract = {Trading dialogs are a kind of negotiation in which an exchange of ownership of items is discussed, and these kinds of dialogs are pervasive in many situations. Recently, there has been an increasing amount of research on applying reinforcement learning (RL) to negotiation dialog domains. However, in previous research, the focus was on negotiation dialog between two participants only, ignoring cases where negotiation takes place between more than two interlocutors. In this paper, as a first study on multi-party negotiation, we apply RL to a multi-party trading scenario where the dialog system (learner) trades with one, two, or three other agents. We experiment with different RL algorithms and reward functions. We use Q-learning with linear function approximation, least-squares policy iteration, and neural fitted Q iteration. In addition, to make the learning process more efficient, we introduce an incremental reward function. The negotiation strategy of the learner is learned through simulated dialog with trader simulators. In our experiments, we evaluate how the performance of the learner varies depending on the RL algorithm used and the number of traders. Furthermore, we compare the learned dialog policies with two strong hand-crafted baseline dialog policies. Our results show that (1) even in simple multi-party trading dialog tasks, learning an effective negotiation policy is not a straightforward task and requires a lot of experimentation; and (2) the use of neural fitted Q iteration combined with an incremental reward function produces negotiation policies as effective as, or even better than, the policies of the two strong hand-crafted baselines.},
	journal = {Transactions of the Japanese Society for Artificial Intelligence},
	author = {Hiraoka, Takuya and Georgila, Kallirroi and Nouri, Elnaz and Traum, David and Nakamura, Satoshi},
	month = sep,
	year = {2016},
	keywords = {UARC, Virtual Humans}
}