Multimodal sentiment intensity analysis in videos: Facial gestures and verbal messages
by Amir Zadeh, Rowan Zellers, Eli Pincus, Louis-Philippe Morency
Abstract:
People share their opinions, stories, and reviews through online video sharing websites every day. The automatic analysis of these online opinion videos is bringing new or understudied research challenges to the field of computational linguistics and multimodal analysis. Among these challenges is the fundamental question of exploiting the dynamics between visual gestures and verbal messages to be able to better model sentiment. This article addresses this question in four ways: introducing the first multimodal dataset with opinion-level sentiment intensity annotations; studying the prototypical interaction patterns between facial gestures and spoken words when inferring sentiment intensity; proposing a new computational representation, called multimodal dictionary, based on a language-gesture study; and evaluating the authors' proposed approach in a speaker-independent paradigm for sentiment intensity prediction. The authors' study identifies four interaction types between facial gestures and verbal content: neutral, emphasizer, positive, and negative interactions. Experiments show statistically significant improvement when using multimodal dictionary representation over the conventional early fusion representation (that is, feature concatenation).
Reference:
Multimodal sentiment intensity analysis in videos: Facial gestures and verbal messages (Amir Zadeh, Rowan Zellers, Eli Pincus, Louis-Philippe Morency), In IEEE Intelligent Systems, volume 31, number 6, pp. 82--88, November 2016.
Bibtex Entry:
@article{zadeh_multimodal_2016,
	title = {Multimodal sentiment intensity analysis in videos: {Facial} gestures and verbal messages},
	volume = {31},
	issn = {1541-1672},
	url = {http://ieeexplore.ieee.org/abstract/document/7742221/},
	doi = {10.1109/MIS.2016.94},
	abstract = {People share their opinions, stories, and reviews through online video sharing websites every day. The automatic analysis of these online opinion videos is bringing new or understudied research challenges to the field of computational linguistics and multimodal analysis. Among these challenges is the fundamental question of exploiting the dynamics between visual gestures and verbal messages to be able to better model sentiment. This article addresses this question in four ways: introducing the first multimodal dataset with opinion-level sentiment intensity annotations; studying the prototypical interaction patterns between facial gestures and spoken words when inferring sentiment intensity; proposing a new computational representation, called multimodal dictionary, based on a language-gesture study; and evaluating the authors' proposed approach in a speaker-independent paradigm for sentiment intensity prediction. The authors' study identifies four interaction types between facial gestures and verbal content: neutral, emphasizer, positive, and negative interactions. Experiments show statistically significant improvement when using multimodal dictionary representation over the conventional early fusion representation (that is, feature concatenation).},
	number = {6},
	journal = {IEEE Intelligent Systems},
	author = {Zadeh, Amir and Zellers, Rowan and Pincus, Eli and Morency, Louis-Philippe},
	month = nov,
	year = {2016},
	keywords = {Virtual Humans},
	pages = {82--88}
}