In automatic speech recognition, decision trees are widely used in an unsupervised fashion to cluster and classify the states of HMMs (hidden Markov models). Decision tree-based clustering draws on prior knowledge of a language for its attribute set, which is called a phonetic attribute set or question set. The phonetic attribute set, which contains questions about the context of the current speech sound, is usually provided by linguistic or phonetic experts. A knowledge-based phonetic attribute set, however, has drawbacks in terms of quality and may degrade the homogeneity of the clusters generated from it. It is also inconvenient, because human experts must be consulted every time the language or the PLU (phone-like unit) of a recognition system is changed. This thesis therefore analyzes the problems of the knowledge-based attribute set and proposes a novel approach to producing a data-driven phonetic attribute set. Because the proposed method generates the attributes using backed-off HMMs, it may enhance the quality of the attributes and remove the manual labor involved. In large vocabulary speech recognition experiments, the proposed algorithms reduced the error rate by 4.0% on the TIMIT English corpus and by 14.3% on the Korean corpus.
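To make the role of a question set concrete, the following is a minimal sketch of how a single decision-tree node might pick a phonetic question to split a pool of context-dependent HMM states. It illustrates conventional knowledge-based clustering only, not the data-driven method proposed here. The question names, phone classes, toy one-dimensional statistics, and the functions `log_likelihood` and `best_split` are all assumptions made for illustration.

```python
import math

# Hypothetical knowledge-based question set (an assumption for illustration):
# each question tests whether the left ("L") or right ("R") context phone
# belongs to a hand-written phone class.
QUESTIONS = {
    "L_Nasal": ("L", {"m", "n", "ng"}),
    "R_Nasal": ("R", {"m", "n", "ng"}),
    "L_Vowel": ("L", {"aa", "iy", "uw"}),
    "R_Vowel": ("R", {"aa", "iy", "uw"}),
}

# Toy states: (left context, right context, frame count, mean, variance).
# Real systems use multivariate Gaussian statistics; 1-D keeps the sketch short.
STATES = [
    ("m", "t", 50, 1.0, 0.2),
    ("n", "k", 40, 1.1, 0.2),
    ("aa", "t", 60, 3.0, 0.2),
    ("iy", "m", 30, 3.1, 0.2),
    ("t", "aa", 20, 3.0, 0.2),
]


def log_likelihood(states):
    """Log-likelihood of the pooled data under one maximum-likelihood Gaussian."""
    n = sum(s[2] for s in states)
    if n == 0:
        return 0.0
    mean = sum(s[2] * s[3] for s in states) / n
    second_moment = sum(s[2] * (s[4] + s[3] ** 2) for s in states) / n
    var = max(second_moment - mean ** 2, 1e-6)  # floor to avoid log(0)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def best_split(states):
    """Return (question name, likelihood gain) for the best single split."""
    base = log_likelihood(states)
    best_name, best_gain = None, 0.0
    for name, (side, phones) in QUESTIONS.items():
        idx = 0 if side == "L" else 1
        yes = [s for s in states if s[idx] in phones]
        no = [s for s in states if s[idx] not in phones]
        if not yes or not no:
            continue  # the question does not divide this pool
        gain = log_likelihood(yes) + log_likelihood(no) - base
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


name, gain = best_split(STATES)
print(name, round(gain, 2))
```

In this toy pool the states with nasal left contexts have means near 1 and the rest have means near 3, so the nasal left-context question gives the largest gain. The sketch also shows the dependence discussed above: the tree can only find splits that some question in the set happens to express.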