University of Southern California

Programmer/Research Assistant Intern 0930

January 08, 2009

Project Name

Conversational Speech Synthesizer


Project Description

Current state-of-the-art speech synthesizers are understandable but cannot engage in conversational speech. They tend to sound like read speech or pre-planned monologues rather than someone deciding what to say while speaking. They are also not well suited to highly interactive speech, in which speech quality is influenced by monitoring the listener(s). In this project we will address some of these issues, in particular by creating “vocal gestures” such as laughs, breath sounds, and filled pauses (“um” and “ah”) and integrating them into a speech synthesizer to produce speech that sounds more like someone in a spontaneous conversation.


Job Description

The intern will work with the ICT dialogue team to help specify an initial set of target behaviors to examine and an API for conversational speech, so that a dialogue agent can specify when to trigger vocal gestures and conversational behaviors. The intern will work independently to modify a speech synthesizer, inserting the behaviors and the ability to trigger them according to the API. The intern will help evaluate the resulting synthesis by comparing two versions of a conversational character: one that uses the unmodified synthesizer and one that uses the modified synthesizer.

Skill set:
• Familiarity with a current speech synthesizer (e.g., Festival, Cepstral, CereVoice, …) and the ability to add conversational behaviors to it
• Experience with analysis of spoken language
• Programming ability sufficient to help develop APIs to call the new speech capabilities from the dialogue components in ICT virtual humans

