This dissertation proposes robust acoustic model adaptation methods for automatic speech recognition (ASR) systems that use continuous density hidden Markov models (CDHMMs) in unknown environments. We focus on environmental adaptation for practical applications using linear spectral transformation (LST). Environmental noise is difficult to handle in the cepstral domain; LST instead deals with such noise in the linear spectral domain and can operate with a relatively small number of transformation parameters. Hence, we can develop rapid adaptation approaches that need only a small amount of adaptation data: maximum likelihood (ML)-LST mean transformation, ML-LST mean and variance transformation, and maximum mutual information (MMI)-LST. However, these approaches require iterative procedures to obtain the transformation parameters. To reduce the computational complexity and achieve real-time adaptation for practical applications, we exploit an approximate ML objective function and propose a closed-form solution of ML-LST (CML-LST). To avoid adapting on erroneous transcriptions, we analyze a lattice-based confidence measure and apply it to CML-LST for unsupervised adaptation. In addition, we propose an incremental CML-LST adaptation technique that accumulates data statistics and achieves consistently improved performance over repeated adaptation attempts. Finally, we combine the proposed methods and obtain promising results on the TIMIT/FFMTIMIT and Aurora4 evaluations.
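As an illustration of the domain change that LST relies on, the following minimal sketch maps a cepstral mean vector to the linear spectral domain, applies a linear transformation there, and maps the result back. It is not the dissertation's implementation. The function names, the use of a full-dimensional orthonormal DCT with no cepstral truncation, and the parameter names `A` (spectral transformation matrix) and `b` (additive bias) are all assumptions made for illustration, and the estimation of `A` and `b` by ML, MMI, or the closed-form approximation is not shown.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); its inverse is its transpose."""
    k = np.arange(n)[:, None]   # cepstral (output) index
    m = np.arange(n)[None, :]   # spectral (input) index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)     # scale first row so that C @ C.T == I
    return C

def lst_transform_mean(cep_mean, A, b):
    """Hypothetical helper: adapt a cepstral mean via the linear spectral domain.

    Assumes the cepstrum is the orthonormal DCT of the log spectrum and that
    A @ spectrum + b stays strictly positive so the logarithm is defined.
    """
    C = dct_matrix(cep_mean.shape[0])
    linear_spec = np.exp(C.T @ cep_mean)   # cepstrum -> log spectrum -> linear spectrum
    adapted_spec = A @ linear_spec + b     # linear transformation in the spectral domain
    return C @ np.log(adapted_spec)        # back to the cepstral domain

cep_mean = np.array([1.0, -0.5, 0.2, 0.1])
unchanged = lst_transform_mean(cep_mean, np.eye(4), np.zeros(4))
noisy = lst_transform_mean(cep_mean, np.eye(4), np.full(4, 0.5))
```

With an identity matrix and zero bias the mean is returned unchanged, while a positive bias, standing in for additive noise, raises the zeroth cepstral coefficient. In this simplified picture additive noise is a plain vector addition in the spectral domain, whereas its effect on the cepstrum passes through the logarithm and is nonlinear.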