In this paper, acoustic modeling and out-of-vocabulary (OOV) rejection methods were studied for a Korean vocabulary-independent speech recognizer.
To model phonemes accurately, triphones were used, and a state-tying method was introduced for robust modeling with a limited speech corpus. The problem of unseen models, that is, triphones that appear in the recognition phase but not in the training phase, was solved with tree-based clustering (TBC), a top-down clustering method. Several phonetic question sets were constructed for TBC, and the best recognition result was obtained with the set that includes versatile phonetic questions and excludes monophone-based questions. Therefore, a phonetic question set for TBC should cover a wide range of phonetic phenomena and need not include monophone-based questions.
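The question-driven splitting in TBC can be illustrated with a small Python sketch. It assumes HTK-style `l-c+r` triphone names, one diagonal Gaussian of sufficient statistics per triphone state, and a likelihood-gain splitting criterion with hypothetical stopping thresholds (`min_gain`, `min_count`). The question set and data are toy values, not the question sets used in this study, and all function names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

# (label, context side "L" or "R", set of phones); a toy, hypothetical question set
Question = Tuple[str, str, Set[str]]


@dataclass
class StateStats:
    """Sufficient statistics of one triphone state, modeled by a diagonal Gaussian."""
    name: str          # HTK-style "left-center+right" triphone name (assumed notation)
    count: float       # state occupancy accumulated during training
    mean: List[float]
    var: List[float]


def context(triphone: str, side: str) -> str:
    """Return the left or right context phone of an 'l-c+r' triphone name."""
    left, rest = triphone.split("-", 1)
    _, right = rest.split("+", 1)
    return left if side == "L" else right


def pooled_log_likelihood(states: Sequence[StateStats]) -> float:
    """Log-likelihood of the states' training data under one shared diagonal Gaussian."""
    total = sum(s.count for s in states)
    dim = len(states[0].mean)
    log_det = 0.0
    for d in range(dim):
        mu = sum(s.count * s.mean[d] for s in states) / total
        second = sum(s.count * (s.var[d] + s.mean[d] ** 2) for s in states) / total
        log_det += math.log(max(second - mu * mu, 1e-8))
    return -0.5 * total * (dim * (1.0 + math.log(2.0 * math.pi)) + log_det)


def best_split(states, questions: Sequence[Question], min_count: float):
    """Pick the question whose yes/no split gives the largest likelihood gain."""
    parent = pooled_log_likelihood(states)
    best = None
    for question in questions:
        _, side, phones = question
        yes = [s for s in states if context(s.name, side) in phones]
        no = [s for s in states if context(s.name, side) not in phones]
        if sum(s.count for s in yes) < min_count or sum(s.count for s in no) < min_count:
            continue  # skip splits that leave a cluster with too little data
        gain = pooled_log_likelihood(yes) + pooled_log_likelihood(no) - parent
        if best is None or gain > best[1]:
            best = (question, gain, yes, no)
    return best


def build_tree(states, questions, min_gain: float, min_count: float) -> dict:
    """Top-down clustering: split greedily until the best gain falls below min_gain."""
    split = best_split(states, questions, min_count)
    if split is None or split[1] < min_gain:
        return {"leaf": [s.name for s in states]}  # these states share one tied state
    question, _, yes, no = split
    return {
        "question": question,
        "yes": build_tree(yes, questions, min_gain, min_count),
        "no": build_tree(no, questions, min_gain, min_count),
    }


def find_leaf(tree: dict, triphone: str) -> List[str]:
    """Route any triphone, seen or unseen, to a tied-state cluster via its contexts."""
    while "leaf" not in tree:
        _, side, phones = tree["question"]
        tree = tree["yes"] if context(triphone, side) in phones else tree["no"]
    return tree["leaf"]


# Toy data: center phone "a", one feature dimension; left nasals shift the mean.
states = [
    StateStats("m-a+t", 100, [1.0], [0.1]),
    StateStats("n-a+k", 100, [1.1], [0.1]),
    StateStats("s-a+t", 100, [-1.0], [0.1]),
    StateStats("k-a+n", 100, [-1.1], [0.1]),
]
questions: List[Question] = [
    ("L_Nasal", "L", {"m", "n", "ng"}),
    ("R_Nasal", "R", {"m", "n", "ng"}),
    ("L_Stop", "L", {"k", "t", "p"}),
]
tree = build_tree(states, questions, min_gain=10.0, min_count=50.0)
print(find_leaf(tree, "ng-a+s"))  # unseen triphone -> ['m-a+t', 'n-a+k']
```

Because every question asks only about phonetic context, a triphone never seen in training can still answer them, which is how the tree supplies a tied state for an unseen model.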
OOV rejection experiments were conducted by measuring the confidence of the recognition result. Two methods were compared: one based on the utterance-level log-likelihood ratio (LLR) and the other based on the frame-level LLR. For utterance-level OOV rejection, the LLR was computed from the best and second-best recognition results, and normalizing it by the utterance length improved performance. Frame-level OOV rejection outperformed utterance-level rejection. In frame-level OOV rejection, a filler model built from context-independent (CI) models served as the alternative hypothesis, and the number of clusters constituting the filler model was varied. With a filler model composed of two clusters, an equal error rate (EER) of 0.5% was achieved; that is, about one in every 200 in-vocabulary (IV) words is falsely rejected and one in every 200 OOV words is falsely accepted.
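The two confidence measures and the EER operating point can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the recognizer used in this study: log-likelihoods are assumed to come from a decoder, the filler model is assumed to be a set of diagonal Gaussians (one per cluster) scored per frame by its best-matching cluster, and all function names are hypothetical. The toy scores only show what a 0.5% EER means with 200 IV and 200 OOV words.

```python
import math
from typing import List, Sequence, Tuple

Gaussian = Tuple[List[float], List[float]]  # (mean, variance), diagonal covariance


def gauss_log_density(x: Sequence[float], g: Gaussian) -> float:
    """Log density of frame x under a diagonal Gaussian."""
    mean, var = g
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def utterance_llr(best_ll: float, second_ll: float, num_frames: int) -> float:
    """Utterance-level confidence: best minus 2nd-best log-likelihood, per frame."""
    return (best_ll - second_ll) / num_frames


def frame_llr(
    target_ll: Sequence[float],
    frames: Sequence[Sequence[float]],
    filler: Sequence[Gaussian],
) -> float:
    """Frame-level confidence: mean of log p(x_t | target) - log p(x_t | filler).

    Assumption: a frame's filler score is that of its best-matching filler
    cluster, so the number of clusters controls how sharp a competitor it is.
    """
    diffs = [
        t - max(gauss_log_density(x, g) for g in filler)
        for t, x in zip(target_ll, frames)
    ]
    return sum(diffs) / len(diffs)


def equal_error_rate(
    iv_scores: Sequence[float], oov_scores: Sequence[float]
) -> Tuple[float, float]:
    """Sweep the accept threshold; return (threshold, EER) where FRR and FAR meet."""
    best = None
    for theta in sorted(set(iv_scores) | set(oov_scores)):
        frr = sum(s < theta for s in iv_scores) / len(iv_scores)     # IV words rejected
        far = sum(s >= theta for s in oov_scores) / len(oov_scores)  # OOV words accepted
        if best is None or abs(frr - far) < best[0]:
            best = (abs(frr - far), theta, (frr + far) / 2.0)
    return best[1], best[2]


# Toy scores (hypothetical): 200 IV and 200 OOV words, one error on each side.
iv = [1.0] * 199 + [-1.0]
oov = [-2.0] * 199 + [0.5]
theta, eer = equal_error_rate(iv, oov)
print(theta, eer)  # 0.5 0.005: one IV word rejected, one OOV word accepted per 200
```

Dividing by the number of frames keeps long and short utterances on the same scale, which is why a single threshold can serve both.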
In future work, novel methods for improving acoustic modeling should be investigated. For OOV rejection, anti-models based on discriminative training should also be tried. Finally, real-field applications will also require noise processing and keyword spotting.