A Practical and Configurable Lip Sync Method for Games (bibtex)
by Yuyu Xu, Andrew W. Feng, Stacy C. Marsella, and Ari Shapiro
Abstract:
We demonstrate a lip animation (lip sync) algorithm for real-time applications that can be used to generate synchronized facial movements with audio generated from natural speech or a text-to-speech engine. Our method requires an animator to construct animations using a canonical set of visemes for all pairwise combinations of a reduced phoneme set (phone bigrams). These animations are then stitched together to construct the final animation, adding velocity and lip-pose constraints. This method can be applied to any character that uses the same small set of visemes. Our method can operate efficiently in multiple languages by reusing phone bigram animations that are shared among languages, and specific word sounds can be identified and changed on a per-character basis. Our method uses no machine learning, which offers two advantages over techniques that do: 1) data can be generated for non-human characters whose faces cannot be easily retargeted from a human speaker’s face, and 2) the specific facial poses or shapes used for animation can be specified during the setup and rigging stage, before the lip animation stage, thus making it suitable for game pipelines or circumstances where the speech target poses are predetermined, such as after acquisition from an online 3D marketplace.
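The core idea of the abstract, pre-authored animation clips indexed by phone bigrams that are stitched into a final track, can be illustrated with a minimal sketch. All names and data structures here (the reduced phoneme subset, the clip representation, the stitch function) are hypothetical illustrations, not the authors' actual implementation; the real method also time-warps clips to phoneme durations and enforces velocity and lip-pose constraints at the seams.

```python
# Hypothetical sketch of phone-bigram lookup and stitching; illustrative
# names and placeholder data, not the paper's implementation.

# A reduced phoneme set (illustrative subset only).
REDUCED_PHONEMES = ["sil", "aa", "f", "m", "oo"]

# One pre-authored clip per phone bigram. Here each "clip" is just a list
# of labeled frames standing in for viseme-weight curves.
bigram_clips = {
    (a, b): [f"{a}-{b}:frame{i}" for i in range(3)]
    for a in REDUCED_PHONEMES
    for b in REDUCED_PHONEMES
}

def stitch(phonemes):
    """Concatenate per-bigram clips for a phoneme sequence.

    A full system would retime each clip to the aligned phoneme durations
    and blend seams under velocity and lip-pose constraints; this sketch
    shows only the bigram lookup and concatenation step.
    """
    animation = []
    for a, b in zip(phonemes, phonemes[1:]):
        animation.extend(bigram_clips[(a, b)])
    return animation

frames = stitch(["sil", "m", "aa", "sil"])
print(len(frames))  # 3 bigrams x 3 frames each = 9
```

Because clips are keyed only by phoneme pairs, any character rigged with the same small viseme set can reuse the same clip table, which matches the reuse-across-characters and across-languages point in the abstract.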
Reference:
A Practical and Configurable Lip Sync Method for Games (Yuyu Xu, Andrew W. Feng, Stacy C. Marsella, and Ari Shapiro), In ACM SIGGRAPH Motion in Games, 2013.
Bibtex Entry:
@inproceedings{xu_practical_2013,
	address = {Dublin, Ireland},
	title = {A {Practical} and {Configurable} {Lip} {Sync} {Method} for {Games}},
	url = {http://ict.usc.edu/pubs/A%20Practical%20and%20Configurable%20Lip%20Sync%20Method%20for%20Games.pdf},
	abstract = {We demonstrate a lip animation (lip sync) algorithm for real-time applications that can be used to generate synchronized facial movements with audio generated from natural speech or a text-to-speech engine. Our method requires an animator to construct animations using a canonical set of visemes for all pairwise combinations of a reduced phoneme set (phone bigrams). These animations are then stitched together to construct the final animation, adding velocity and lip-pose constraints. This method can be applied to any character that uses the same small set of visemes. Our method can operate efficiently in multiple languages by reusing phone bigram animations that are shared among languages, and specific word sounds can be identified and changed on a per-character basis. Our method uses no machine learning, which offers two advantages over techniques that do: 1) data can be generated for non-human characters whose faces cannot be easily retargeted from a human speaker’s face, and 2) the specific facial poses or shapes used for animation can be specified during the setup and rigging stage, before the lip animation stage, thus making it suitable for game pipelines or circumstances where the speech target poses are predetermined, such as after acquisition from an online 3D marketplace.},
	booktitle = {{ACM} {SIGGRAPH} {Motion} in {Games}},
	author = {Xu, Yuyu and Feng, Andrew W. and Marsella, Stacy C. and Shapiro, Ari},
	month = nov,
	year = {2013},
	keywords = {Virtual Humans, UARC}
}