Use of Model Transformations for Distributed Speech Recognition
by Srinivasamurthy, Naveen, Narayanan, Shrikanth and Ortega, Antonio
Abstract:
Due to bandwidth limitations, the speech recognizer in distributed speech recognition (DSR) applications has to use encoded speech, either traditional speech encoding or speech encoding optimized for recognition. The penalty incurred in reducing the bitrate is degradation in speech recognition performance. The diversity of the applications using DSR implies that a variety of speech encoders can be used to compress speech. By treating the encoder variability as a mismatch, we propose using model transformation to reduce the speech recognition performance degradation. The advantage of using model transformation is that only a single model set needs to be trained at the server, which can then be adapted on the fly to the input speech data. We were able to reduce the word error rate by 61.9%, 63.3% and 56.3% for MELP-, GSM- and MFCC-encoded data, respectively, by using MAP adaptation, which shows the generality of our proposed scheme.
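Note: the MAP adaptation mentioned in the abstract is, in its standard form, a posterior-weighted interpolation between the server-side (prior) model parameters and statistics collected from the encoded adaptation data. As a sketch, using common textbook notation rather than the paper's own, the MAP update of a Gaussian mean \mu_{jm} for mixture component m of HMM state j is

    \hat{\mu}_{jm} = \frac{\tau\,\mu_{jm} + \sum_{t} \gamma_{jm}(t)\,\mathbf{o}_t}{\tau + \sum_{t} \gamma_{jm}(t)},

where \mathbf{o}_t are feature vectors derived from the encoded speech, \gamma_{jm}(t) is the occupation probability of component (j, m) at time t, and \tau controls how strongly the prior mean is retained. With little adaptation data the sum over \gamma_{jm}(t) is small and the server-trained mean dominates; with more data the estimate moves toward the encoder-specific statistics.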
Reference:
Use of Model Transformations for Distributed Speech Recognition (Srinivasamurthy, Naveen, Narayanan, Shrikanth and Ortega, Antonio), In 4th ISCA Tutorial and Research Workshop on Speech Synthesis, 2001.
Bibtex Entry:
@inproceedings{srinivasamurthy_use_2001,
	address = {Sophia Antipolis, France},
	title = {Use of {Model} {Transformations} for {Distributed} {Speech} {Recognition}},
	url = {http://ict.usc.edu/pubs/Use%20of%20Model%20Transformations%20for%20Distributed%20Speech%20Recognition.pdf},
	abstract = {Due to bandwidth limitations, the speech recognizer in distributed speech recognition (DSR) applications has to use encoded speech - either traditional speech encoding or speech encoding optimized for recognition. The penalty incurred in reducing the bitrate is degradation in speech recognition performance. The diversity of the applications using DSR implies that a variety of speech encoders can be used to compress speech. By treating the encoder variability as a mismatch we propose using model transformation to reduce the speech recognition performance degradation. The advantage of using model transformation is that only a single model set needs to be trained at the server, which can be adapted on the fly to the input speech data. We were able to reduce the word error rate by 61.9\%, 63.3\% and 56.3\% for MELP, GSM and MFCC-encoded data, respectively, by using MAP adaptation, which shows the generality of our proposed scheme.},
	booktitle = {4th {ISCA} {Tutorial} and {Research} {Workshop} on {Speech} {Synthesis}},
	author = {Srinivasamurthy, Naveen and Narayanan, Shrikanth and Ortega, Antonio},
	year = {2001},
	pages = {113--116}
}