Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

1980 | DAVIS | IEEE | MERMELSTEIN | PARDO

Several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system. The vocabulary included many phonetically similar monosyllabic words; therefore, the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations. For each parameter set (based on a mel-frequency cepstrum, a linear frequency cepstrum, a linear prediction cepstrum, a linear prediction spectrum, or a set of reflection coefficients), word templates were generated using an efficient dynamic warping method, and test data were time registered with the templates. A set of ten mel-frequency cepstrum coefficients computed every 6.4 ms yielded the best performance: 96.5 percent and 95.0 percent recognition for the two speakers, respectively. The superior performance of the mel-frequency cepstrum coefficients may be attributed to the fact that they better represent the perceptually relevant aspects of the short-term speech spectrum.
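The template-matching step in the abstract can be illustrated with a generic dynamic time warping (DTW) sketch. This is not the authors' "efficient dynamic warping method", whose details are not given here; it is a textbook DTW, and the symmetric step pattern, Euclidean frame distance, and normalization by n + m are all assumptions. The names `dtw_distance` and `recognize` are hypothetical. Each word template and each test utterance is a sequence of parameter vectors (for example, frames of cepstral coefficients), and the test is assigned to the template with the smallest warped distance.

```python
import math


def euclidean(a, b):
    """Distance between two parameter vectors (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(test, template, dist=euclidean):
    """Textbook DTW with steps (i-1, j), (i, j-1), (i-1, j-1).

    Returns the accumulated cost of the best alignment, divided by
    len(test) + len(template) so that short and long words are comparable.
    """
    n, m = len(test), len(template)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(test[i - 1], template[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)


def recognize(test, templates):
    """Pick the word whose template is closest after time registration.

    templates: dict mapping word -> list of parameter vectors.
    """
    return min(templates, key=lambda word: dtw_distance(test, templates[word]))
```

A test utterance that is a time-stretched copy of a template aligns with zero cost, which is the point of warping: duration differences alone should not count as acoustic mismatch. Other step patterns and slope constraints are common and would change which alignments are allowed.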

Notes/Comments by José Manuel Pardo:
The authors successfully propose and explain the main parameter-extraction method for speech recognition: the mel-frequency cepstrum (MFCC) parameters.
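To make the mel-frequency cepstrum parameters concrete, the following plain-Python sketch computes ten coefficients every 6.4 ms. It is a minimal illustration under stated assumptions, not the paper's exact procedure: the 10 kHz sampling rate, 25.6 ms Hamming window, 20 triangular filters, and the common approximation mel(f) = 2595 log10(1 + f/700) are all assumptions, and the function names (`mfcc`, `mfcc_frame`, `mel_filterbank`, `power_spectrum`) are hypothetical. The last step is a cosine transform of the log filter-bank energies, c_i = sum over k of X_k cos(i (k - 1/2) pi / K).

```python
import math


def hz_to_mel(f):
    # Common mel approximation; an assumption, not necessarily the paper's filter spacing
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    # Filter edge positions expressed in (fractional) DFT bins
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))  # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame, n_fft):
    """Naive O(N^2) DFT power spectrum for bins 0..n_fft//2."""
    x = list(frame) + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        spec.append(re * re + im * im)
    return spec


def mfcc_frame(frame, bank, n_ceps):
    n = len(frame)
    # Hamming window reduces spectral leakage at the frame edges
    windowed = [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, v in enumerate(frame)]
    spec = power_spectrum(windowed, n)
    # Log energy in each filter, floored to avoid log(0)
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10)) for row in bank]
    K = len(log_e)
    # Cosine transform of the log energies, keeping coefficients 1..n_ceps
    return [sum(x * math.cos(i * (k - 0.5) * math.pi / K)
                for k, x in enumerate(log_e, start=1))
            for i in range(1, n_ceps + 1)]


def mfcc(signal, sample_rate=10000, frame_ms=25.6, hop_ms=6.4, n_filters=20, n_ceps=10):
    """Return one n_ceps-dimensional vector every hop_ms milliseconds."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    return [mfcc_frame(signal[s:s + frame_len], bank, n_ceps)
            for s in range(0, len(signal) - frame_len + 1, hop)]
```

For a silent input every log filter energy equals the same floor value, so coefficients 1 through 10 come out essentially zero: the cosine transform of a flat log spectrum has no energy above the zeroth term. The mel spacing of the filters is what makes these coefficients emphasize the perceptually relevant low-frequency detail.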

Specifications

Author(s): S. Davis; P. Mermelstein.
Date: 1980-08
Published in: IEEE Transactions on Acoustics, Speech, and Signal Processing (Volume: 28, Issue: 4, Aug 1980, Pages: 357-366).
Language: English
Format: PDF
Contribution: José Manuel Pardo Muñoz.
Keywords: Computational and artificial intelligence, Computers and information processing, Signal processing
