Angeliki Metallinou, Sungbok Lee, Shrikanth Narayanan: “Decision Level Combination of Multiple Modalities for Recognition and Analysis of Emotional Expression”

March 14, 2010 | Dallas, TX

Speakers: Angeliki Metallinou, Sungbok Lee, Shrikanth Narayanan
Host: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing

Emotion is expressed and perceived through multiple modalities. In this work, we model facial, vocal, and head-movement cues for emotion recognition and fuse the individual classifiers within a Bayesian framework. The facial classifier performs best, followed by the voice and head classifiers, and the modalities appear to carry complementary information, especially for happiness. Decision-level fusion significantly increases the average total unweighted accuracy, from 55% to about 62%. Overall, on a large multi-speaker, multimodal database, we achieve average accuracies on the order of 65-75% for emotional states and 30-40% for the neutral state. A performance analysis for anger and neutrality suggests a positive correlation between the number of classifiers that perform well and the perceptual salience of the expressed emotion.

Index Terms: Multimodal Emotion Recognition, Hidden Markov Model, Bayesian Information Fusion, Perceptual Salience
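To illustrate the kind of decision-level combination the abstract describes, the sketch below fuses per-modality class posteriors under a naive conditional-independence assumption. The emotion labels, the toy posterior values, and the uniform prior are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of Bayesian decision-level fusion: combine posteriors
# P(class | modality) from independent classifiers into one fused
# posterior. All numbers below are toy values, not results from the paper.

EMOTIONS = ["anger", "happiness", "sadness", "neutral"]

def fuse_posteriors(modality_posteriors, prior=None):
    """Fuse per-modality posteriors assuming the modalities are
    conditionally independent given the class:
        P(c | x_1..x_K)  ∝  prior(c)^(1-K) * Π_k P(c | x_k)."""
    n = len(EMOTIONS)
    k = len(modality_posteriors)
    if prior is None:
        prior = [1.0 / n] * n  # uniform class prior (assumption)
    fused = []
    for i in range(n):
        score = prior[i] ** (1 - k)
        for posterior in modality_posteriors:
            score *= posterior[i]
        fused.append(score)
    total = sum(fused)
    return [s / total for s in fused]  # renormalize to sum to 1

# Toy posteriors from hypothetical face, voice, and head classifiers.
face = [0.10, 0.70, 0.10, 0.10]
voice = [0.20, 0.50, 0.15, 0.15]
head = [0.25, 0.40, 0.20, 0.15]

fused = fuse_posteriors([face, voice, head])
best = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
print(best)  # the fused decision; here all three modalities favor happiness
```

Because each modality's evidence multiplies in, a class favored by several classifiers (here, happiness) is reinforced in the fused posterior, which mirrors the complementarity the paper observes across modalities.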