The Effect of Fuzzy Training Targets on Voice Quality Classification
by Stefan Scherer, John Kane, Christer Gobl and Friedhelm Schwenker
Abstract:
The dynamic use of voice qualities in spoken language can reveal useful information on a speaker’s attitude, mood and affective states. This information may be desirable for a range of speech technology applications. However, annotation of voice quality may frequently be inconsistent across raters. But whom should one trust or is the truth somewhere in between? The current study looks first to describe a voice quality feature set that is suitable for differentiating voice qualities on a tense to breathy dimension. These features are used as inputs to a fuzzy-input fuzzy-output support vector machine (F2 SVM) algorithm, to automatically classify the voice qualities. The F2 SVM is compared to standard approaches and shows promising results. Performances for cross validation, leave one speaker out, and cross corpus experiments of around 90% are achieved.
Reference:
The Effect of Fuzzy Training Targets on Voice Quality Classification (Scherer, Stefan, Kane, John, Gobl, Christer and Schwenker, Friedhelm), In Workshop on Multimodal Pattern Recognition of Social Signals in Human Computer Interaction, 2012.
Bibtex Entry:
@inproceedings{scherer_effect_2012,
	address = {Tsukuba Science City, Japan},
	title = {The {Effect} of {Fuzzy} {Training} {Targets} on {Voice} {Quality} {Classification}},
	url = {http://ict.usc.edu/pubs/The%20Effect%20of%20Fuzzy%20Training%20Targets%20on%20Voice%20Quality%20Classification.pdf},
	abstract = {The dynamic use of voice qualities in spoken language can reveal useful information on a speaker’s attitude, mood and affective states. This information may be desirable for a range of speech technology applications. However, annotation of voice quality may frequently be inconsistent across raters. But whom should one trust or is the truth somewhere in between? The current study looks first to describe a voice quality feature set that is suitable for differentiating voice qualities on a tense to breathy dimension. These features are used as inputs to a fuzzy-input fuzzy-output support vector machine (F2 SVM) algorithm, to automatically classify the voice qualities. The F2 SVM is compared to standard approaches and shows promising results. Performances for cross validation, leave one speaker out, and cross corpus experiments of around 90\% are achieved.},
	booktitle = {Workshop on {Multimodal} {Pattern} {Recognition} of {Social} {Signals} in {Human} {Computer} {Interaction}},
	author = {Scherer, Stefan and Kane, John and Gobl, Christer and Schwenker, Friedhelm},
	month = nov,
	year = {2012},
	keywords = {Virtual Humans}
}