Audio-Visual Emotion Recognition using Gaussian Mixture Models for Face and Voice (bibtex)
by Metallinou, Angeliki, Lee, Sungbok and Narayanan, Shrikanth
Abstract:
Emotion expression associated with human communication is known to be a multimodal process. In this work, we investigate the way that emotional information is conveyed by facial and vocal modalities, and how these modalities can be effectively combined to achieve improved emotion recognition accuracy. In particular, the behaviors of different facial regions are studied in detail. We analyze an emotion database recorded from ten speakers (five female, five male), which contains speech and facial marker data. Each individual modality is modeled by Gaussian Mixture Models (GMMs). Multiple modalities are combined using two different methods: a Bayesian classifier weighting scheme and support vector machines that use post-classification accuracies as features. Individual modality recognition performances indicate that anger and sadness have comparable accuracies for facial and vocal modalities, while happiness seems to be more accurately transmitted by facial expressions than voice. The neutral state has the lowest performance, possibly due to the vague definition of neutrality. Cheek regions achieve better emotion recognition accuracy compared to other facial regions. Moreover, classifier combination leads to significantly higher performance, which confirms that training detailed single modality classifiers and combining them at a later stage is an effective approach.
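As a rough illustration of the pipeline the abstract describes, the sketch below fits one GMM per emotion class and per modality, classifies by log-likelihood, and fuses the two modalities. The emotion list, feature shapes, mixture size, the fixed linear weight, and all function names are illustrative assumptions; the paper's actual Bayesian weighting scheme and SVM stage are more elaborate than this stand-in.

# Hypothetical sketch of per-modality GMM emotion classification with
# score-level fusion; not the authors' implementation.
import numpy as np
from sklearn.mixture import GaussianMixture

EMOTIONS = ["anger", "happiness", "sadness", "neutral"]

def train_gmms(features_by_emotion, n_components=8):
    """Fit one GMM per emotion class for a single modality (face or voice).
    features_by_emotion maps each emotion to an (n_frames, n_dims) array."""
    return {emo: GaussianMixture(n_components=n_components).fit(X)
            for emo, X in features_by_emotion.items()}

def class_loglik(gmms, X):
    """Average per-frame log-likelihood of an utterance under each class GMM."""
    return np.array([gmms[emo].score(X) for emo in EMOTIONS])

def fused_decision(face_gmms, voice_gmms, X_face, X_voice, w_face=0.5):
    """Linearly weight the two modalities' scores (a simple stand-in for
    the paper's Bayesian classifier weighting scheme)."""
    scores = (w_face * class_loglik(face_gmms, X_face)
              + (1.0 - w_face) * class_loglik(voice_gmms, X_voice))
    return EMOTIONS[int(np.argmax(scores))]

def svm_fusion_features(face_gmms, voice_gmms, X_face, X_voice):
    """Stack both modalities' per-class scores into one feature vector,
    loosely mirroring the paper's use of post-classification accuracies
    as inputs to a second-stage SVM (e.g. sklearn.svm.SVC)."""
    return np.concatenate([class_loglik(face_gmms, X_face),
                           class_loglik(voice_gmms, X_voice)])

Under these assumptions, the second fusion method would train an SVM on the stacked score vectors of held-out utterances, letting the classifier learn which modality to trust per emotion rather than fixing a single weight.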
Reference:
Audio-Visual Emotion Recognition using Gaussian Mixture Models for Face and Voice (Metallinou, Angeliki, Lee, Sungbok and Narayanan, Shrikanth), In Proceedings of the IEEE International Symposium on Multimedia, 2008.
Bibtex Entry:
@inproceedings{metallinou_audio-visual_2008,
	address = {Berkeley, CA},
	title = {Audio-{Visual} {Emotion} {Recognition} using {Gaussian} {Mixture} {Models} for {Face} and {Voice}},
	url = {http://ict.usc.edu/pubs/Audio-Visual%20Emotion%20Recognition%20using%20Gaussian%20Mixture%20Models%20for%20Face%20and%20Voice.pdf},
	abstract = {Emotion expression associated with human communication is known to be a multimodal process. In this work, we investigate the way that emotional information is conveyed by facial and vocal modalities, and how these modalities can be effectively combined to achieve improved emotion recognition accuracy. In particular, the behaviors of different facial regions are studied in detail. We analyze an emotion database recorded from ten speakers (five female, five male), which contains speech and facial marker data. Each individual modality is modeled by Gaussian Mixture Models (GMMs). Multiple modalities are combined using two different methods: a Bayesian classifier weighting scheme and support vector machines that use post-classification accuracies as features. Individual modality recognition performances indicate that anger and sadness have comparable accuracies for facial and vocal modalities, while happiness seems to be more accurately transmitted by facial expressions than voice. The neutral state has the lowest performance, possibly due to the vague definition of neutrality. Cheek regions achieve better emotion recognition accuracy compared to other facial regions. Moreover, classifier combination leads to significantly higher performance, which confirms that training detailed single modality classifiers and combining them at a later stage is an effective approach.},
	booktitle = {Proceedings of the {IEEE} {International} {Symposium} on {Multimedia}},
	author = {Metallinou, Angeliki and Lee, Sungbok and Narayanan, Shrikanth},
	month = dec,
	year = {2008},
	pages = {250--257}
}