Lixing Huang, Louis-Philippe Morency, Jonathan Gratch: “A Multimodal End-of-Turn Prediction Model: Learning from Parasocial Consensus Sampling”

May 7, 2011 | Toronto, Ontario

Speakers: Lixing Huang, Louis-Philippe Morency, Jonathan Gratch
Host: 10th International Conference on Autonomous Agents and Multiagent Systems

Virtual humans, with realistic behaviors and increasingly human-like social skills, evoke in users a range of social behaviors normally seen only in human face-to-face interactions. One of the key challenges in creating such virtual humans is giving them human-like conversational skills. Traditional conversational virtual humans usually make turn-taking decisions based on explicit cues from the human users, such as “press-to-talk” buttons. In contrast, people decide when to take turns by observing their conversational partner’s behavior. In this paper, we present a multimodal end-of-turn prediction model. Instead of recording face-to-face conversations, we collect turn-taking data using the Parasocial Consensus Sampling (PCS) framework, in which participants are guided to interact parasocially with a media representation of another person. We then use the consensus data to analyze the relationship between verbal and nonverbal features and turn-taking behavior, and show how these features influence how long people wait before taking a turn. Finally, we present a probabilistic multimodal end-of-turn prediction model learned from the consensus data, which enables virtual humans to make real-time turn-taking predictions. The evaluation results show that our model achieves high accuracy and takes human-like pauses, in terms of length, before taking its turns. Our work demonstrates the validity of Parasocial Consensus Sampling and generalizes this framework to model turn-taking behavior.
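To make the idea of a probabilistic multimodal end-of-turn predictor concrete, here is a minimal sketch of how verbal and nonverbal cues might be combined into a real-time turn-taking decision. The feature names, weights, and logistic form below are assumptions for illustration only; they are not the model or feature set from the paper.

```python
import math

# Hypothetical sketch: combine multimodal cues into P(end of turn).
# All feature names and weight values are invented for illustration.

def end_of_turn_probability(features, weights, bias):
    """Logistic combination of cue values into a probability in (0, 1)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Invented weights: a longer pause and listener-directed gaze raise the
# probability that the speaker is done; ongoing speech lowers it.
WEIGHTS = {"pause_sec": 2.0, "gaze_at_listener": 1.5, "speaking": -3.0}
BIAS = -2.0

def predict_turn_end(features, threshold=0.5):
    """Return True if the virtual human should take its turn now."""
    return end_of_turn_probability(features, WEIGHTS, BIAS) >= threshold

# A long silent pause with mutual gaze yields a turn-taking decision;
# continued speech does not.
take = predict_turn_end({"pause_sec": 1.2, "gaze_at_listener": 1.0, "speaking": 0.0})
hold = predict_turn_end({"pause_sec": 0.1, "gaze_at_listener": 0.0, "speaking": 1.0})
```

In a real-time system such a score would be recomputed every frame as features update, and the threshold tuned so that the resulting pauses match human pause lengths, as the evaluation in the paper measures.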