An Unsupervised Approach to Glottal Inverse Filtering
by Sayan Ghosh, Eugene Laksana, Louis-Philippe Morency, Stefen Scherer
Abstract:
The extraction of the glottal volume velocity waveform from voiced speech is a well-known example of a sparse signal recovery problem. Prior approaches have mostly used well-engineered speech processing or convex L1-optimization methods to solve the inverse filtering problem. In this paper, we describe a novel approach to modeling the human vocal tract using an unsupervised dictionary learning framework. We make the assumption of an all-pole model of the vocal tract, and derive an L1 regularized least squares loss function for the all-pole approximation. To evaluate the quality of the extracted glottal volume velocity waveform, we conduct experiments on real-life speech datasets, which include vowels and multi-speaker phonetically balanced utterances. We find that the unsupervised model learns meaningful dictionaries of vocal tracts, and the proposed data-driven unsupervised framework achieves a performance comparable to the IAIF (Iterative Adaptive Inverse Filtering) glottal flow extraction approach.
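As a rough illustration of the two ingredients named in the abstract (an all-pole model and an L1 regularized least squares loss), the sketch below fits linear prediction coefficients by minimizing 0.5 * ||y - X a||^2 + lam * ||a||_1 with plain ISTA, then inverse filters the signal to obtain a residual. This is not the paper's dictionary learning algorithm. The function names, the choice of ISTA, and the placement of the L1 penalty on the coefficients are all assumptions made for illustration only.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lagged_matrix(s, order):
    """Build the all-pole regression problem s[n] ~ sum_k a_k * s[n-k].

    Row i corresponds to time n = order + i and holds s[n-1], ..., s[n-order].
    """
    n = len(s)
    X = np.column_stack([s[order - k: n - k] for k in range(1, order + 1)])
    y = s[order:]
    return X, y


def ista_lasso(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - X a||^2 + lam * ||a||_1 by ISTA (hypothetical helper)."""
    a = np.zeros(X.shape[1])
    # Lipschitz constant of the gradient of the quadratic term.
    lipschitz = np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ a - y)
        a = soft_threshold(a - grad / lipschitz, lam / lipschitz)
    return a


def inverse_filter(s, a):
    """Residual e[n] = s[n] - sum_k a_k * s[n-k] for n >= len(a)."""
    X, y = lagged_matrix(s, len(a))
    return y - X @ a
```

Under the all-pole assumption the residual returned by `inverse_filter` approximates the excitation driving the filter. Recovering a glottal flow estimate from real speech would need further steps, such as compensating for lip radiation, that this sketch leaves out.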
Reference:
An Unsupervised Approach to Glottal Inverse Filtering (Sayan Ghosh, Eugene Laksana, Louis-Philippe Morency, Stefen Scherer), In Proceedings of the 2016 24th European Signal Processing Conference (EUSIPCO), 2016.
Bibtex Entry:
@inproceedings{ghosh_unsupervised_2016,
	address = {Budapest, Hungary},
	title = {An {Unsupervised} {Approach} to {Glottal} {Inverse} {Filtering}},
	url = {http://www.eurasip.org/Proceedings/Eusipco/Eusipco2016/papers/1570252319.pdf},
	abstract = {The extraction of the glottal volume velocity waveform from voiced speech is a well-known example of a sparse signal recovery problem. Prior approaches have mostly used well-engineered speech processing or convex L1-optimization methods to solve the inverse filtering problem. In this paper, we describe a novel approach to modeling the human vocal tract using an unsupervised dictionary learning framework. We make the assumption of an all-pole model of the vocal tract, and derive an L1 regularized least squares loss function for the all-pole approximation. To evaluate the quality of the extracted glottal volume velocity waveform, we conduct experiments on real-life speech datasets, which include vowels and multi-speaker phonetically balanced utterances. We find that the unsupervised model learns meaningful dictionaries of vocal tracts, and the proposed data-driven unsupervised framework achieves a performance comparable to the IAIF (Iterative Adaptive Inverse Filtering) glottal flow extraction approach.},
	booktitle = {Proceedings of the 2016 24th {European} {Signal} {Processing} {Conference} ({EUSIPCO})},
	author = {Ghosh, Sayan and Laksana, Eugene and Morency, Louis-Philippe and Scherer, Stefen},
	month = sep,
	year = {2016},
	keywords = {Virtual Humans, UARC}
}