Class acoustic models in automatic speech recognition

Word pronunciation variability is one of the basic problems in automatic speech recognition (ASR). The performance of a system trained on all types of data without distinction degrades significantly when the test signal is poorly matched by the speaker-independent (SI) acoustic model (AM). We explore a class recognition system with minimum phone error (MPE) training. Class models improve recognition quality compared with SI models: for mel-frequency cepstral coefficients (MFCC), phone accuracy increases by 2-5% depending on the number of classes; for the split context posterior estimator (left and right context, LCRC), it increases by 1-3%.

Index Terms: class acoustic models, speaker adaptation.