William Yang Wang, Kallirroi Georgila: “Automatic Detection of Unnatural Word-Level Segments in Unit-Selection Speech Synthesis”

December 12, 2011 | Waikoloa, Hawaii

Speakers: William Yang Wang, Kallirroi Georgila
Host: IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2011

We investigate the problem of automatically detecting unnatural word-level segments in unit-selection speech synthesis. We use a large set of features, namely target and join costs, language models, prosodic cues, energy and spectrum, and Delta Term Frequency-Inverse Document Frequency (Delta TF-IDF), and we report comparative results for the different feature types and their combinations. We also compare three modeling methods based on Support Vector Machines (SVMs), Random Forests, and Conditional Random Fields (CRFs). We then discuss our results and present a comprehensive error analysis.
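For readers unfamiliar with Delta TF-IDF, the following is a minimal sketch of the general idea, not the authors' implementation. It weights each term by its count in a segment times the log ratio of its smoothed document frequencies in two classes, so that terms typical of one class get large positive or negative scores. The function names, the toy tokens, the add-one smoothing, and the sign convention (positive means more typical of the "unnatural" class) are all assumptions made for illustration; the abstract does not specify how the feature was computed.

```python
import math
from collections import Counter


def delta_idf(natural_docs, unnatural_docs):
    """Per-term weight: log2 ratio of smoothed document frequencies.

    Positive weights mark terms more typical of the unnatural class
    (hypothetical sign convention; add-one smoothing is an assumption).
    """
    n_nat, n_unn = len(natural_docs), len(unnatural_docs)
    # Document frequency: number of documents in each class containing the term.
    df_nat = Counter(t for doc in natural_docs for t in set(doc))
    df_unn = Counter(t for doc in unnatural_docs for t in set(doc))
    weights = {}
    for term in set(df_nat) | set(df_unn):
        p_unn = (df_unn[term] + 1) / (n_unn + 1)
        p_nat = (df_nat[term] + 1) / (n_nat + 1)
        weights[term] = math.log2(p_unn / p_nat)
    return weights


def delta_tfidf(doc, weights):
    """Scale each term's count in `doc` by its class-contrast weight."""
    counts = Counter(doc)
    # Terms unseen during weight estimation contribute 0.
    return {term: count * weights.get(term, 0.0) for term, count in counts.items()}


# Toy data: each "document" is a list of tokens (hypothetical examples).
natural = [["the", "cat"], ["the", "dog"]]
unnatural = [["the", "zzz"], ["zzz", "cat"]]

weights = delta_idf(natural, unnatural)
features = delta_tfidf(["zzz", "zzz", "cat"], weights)
```

In this toy setup, `zzz` appears only in the unnatural class and receives a positive weight, while `cat` appears equally often in both classes and receives a weight of zero. The resulting per-term scores could then be passed, alongside other features, to a classifier such as an SVM or Random Forest.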