Dongrui Wu, Thomas Parsons, Emily Mower, Shrikanth Narayanan

July 19, 2010 | Singapore

Speaker: Dongrui Wu, Thomas Parsons, Emily Mower, Shrikanth Narayanan
Host: IEEE International Conference on Multimedia & Expo

Speech processing is an important aspect of affective computing. Most research in this direction has focused on classifying emotions into a small number of categories. However, numerical representations of emotions in a multi-dimensional space can be more appropriate to reflect the gradient nature of emotion expressions, and can be more convenient in the sense of dealing with a small set of emotion primitives. This paper presents three approaches (robust regression, support vector regression, and locally linear reconstruction) for emotion primitives estimation in 3D space (valence/activation/dominance), and two approaches (average fusion and locally weighted fusion) to fuse the three elementary estimators for better overall recognition accuracy. The three elementary estimators are diverse and complementary because they cover both linear and nonlinear models, and both global and local models. These five approaches are compared with the state-of-the-art estimator on the same spontaneously elicited emotion dataset. Our results show that all of our three elementary estimators are suitable for speech emotion estimation. Moreover, it is possible to boost the estimation performance by fusing them properly since they appear to leverage complementary speech features.
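The two fusion strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact weighting rule, so everything below is an assumption made for illustration: the function names `average_fusion` and `locally_weighted_fusion`, the parameters `k` and `eps`, and the choice to weight each estimator by the inverse of its mean absolute error on the `k` nearest training samples. Each prediction is a 3-vector of (valence, activation, dominance).

```python
import numpy as np


def average_fusion(preds):
    """Average the estimators' outputs.

    preds: array of shape (n_estimators, n_samples, 3).
    Returns an array of shape (n_samples, 3).
    """
    return np.mean(preds, axis=0)


def locally_weighted_fusion(x, preds_x, train_X, train_Y, train_preds,
                            k=5, eps=1e-8):
    """Fuse the estimators' outputs for a single query feature vector x.

    Hypothetical weighting rule (not taken from the paper): each estimator
    is weighted, per primitive, by the inverse of its mean absolute error
    on the k training samples closest to x.

    x:           (n_features,) query feature vector
    preds_x:     (n_estimators, 3) each estimator's output for x
    train_X:     (n_train, n_features) training features
    train_Y:     (n_train, 3) training targets
    train_preds: (n_estimators, n_train, 3) estimator outputs on train_X
    """
    # Indices of the k nearest training samples (Euclidean distance).
    dists = np.linalg.norm(train_X - x, axis=1)
    nn = np.argsort(dists)[:k]

    # Local error of each estimator, per primitive: shape (n_estimators, 3).
    local_err = np.mean(np.abs(train_preds[:, nn, :] - train_Y[nn, :]), axis=1)

    # Inverse-error weights, normalised to sum to one across estimators.
    weights = 1.0 / (local_err + eps)
    weights /= weights.sum(axis=0, keepdims=True)

    # Weighted combination of the estimators' outputs for x.
    return np.sum(weights * preds_x, axis=0)
```

Average fusion treats the three estimators as equally reliable everywhere, while the locally weighted variant lets whichever estimator performed best in the neighbourhood of the query dominate. In this sketch, `train_preds` would ideally come from held-out or cross-validated predictions so that the local errors are not optimistically biased.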