Korean Abstract

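This paper proposes a hybrid speech recognition method that combines MSVQ (Multi-Section Vector Quantization) with a time-delay recurrent neural network (TDRNN). MSVQ generates a codebook in which the length of each utterance is normalized to a fixed number of sections, and the TDRNN recognizes speech using this codebook. The TDRNN is structured so that it can learn the time-series context of speech well. Perceptual linear prediction (PLP) coefficients were used as speech features. In speech recognition experiments, the MSVQ/TDRNN recognizer achieved a speaker-independent recognition rate of 97.9 %.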
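The length normalization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `normalize_length` and `section_codebook` are hypothetical, and it keeps only one mean vector per section, whereas a real MSVQ codebook would typically hold several codewords per section. It also assumes every utterance has at least as many frames as there are sections.

```python
import numpy as np

def normalize_length(frames, num_sections):
    """Map a (T, D) feature sequence to (num_sections, D).

    The frames are split into nearly equal time sections and each
    section is replaced by its mean vector, so utterances of any
    length T >= num_sections end up with the same shape.
    """
    chunks = np.array_split(frames, num_sections, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])

def section_codebook(utterances, num_sections):
    """Average the normalized utterances of one word, section by section.

    Simplification (assumption): one codeword per section.
    """
    normalized = [normalize_length(u, num_sections) for u in utterances]
    return np.mean(normalized, axis=0)
```

With this sketch, two utterances of 37 and 52 frames both reduce to the same number of sections, which is what lets a fixed-size classifier consume them.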

Multilingual Abstract

This paper presents a speech recognition method using multi-section vector quantization (MSVQ) and a time-delay recurrent neural network (TDRNN). The MSVQ generates a codebook by dividing the voice signal into a normalized number of uniform sections, and the TDRNN performs speech recognition using the MSVQ codebook. The TDRNN is a classifier with two different representations of dynamic context: the time-delayed input nodes represent local dynamic context, while the recursive nodes can represent the long-term dynamic context of the voice signal. Cepstral PLP coefficients were used as speech features. In the speech recognition experiments, the MSVQ/TDRNN speech recognizer achieved a 97.9 % word recognition rate for speaker-independent recognition.
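The two kinds of context mentioned above can be sketched as a forward pass. This is a sketch under stated assumptions, not the network from the paper: the function `tdrnn_forward`, the weight names, the single tanh hidden layer, and the softmax output are all hypothetical choices, since the abstract does not specify the topology.

```python
import numpy as np

def tdrnn_forward(x, W_in, W_rec, W_out, delay):
    """Classify a (T, D) sequence; returns class probabilities of shape (C,).

    W_in:  (H, (delay + 1) * D)  weights on the time-delayed input window
    W_rec: (H, H)                recurrent weights on the previous hidden state
    W_out: (C, H)                output weights
    """
    T, D = x.shape
    H = W_rec.shape[0]
    # Zero history before the first frame so every step sees a full window.
    padded = np.vstack([np.zeros((delay, D)), x])
    h = np.zeros(H)
    for t in range(T):
        # Local context: frames t - delay .. t, flattened into one vector.
        window = padded[t:t + delay + 1].ravel()
        # Long-term context: the previous hidden state fed back in.
        h = np.tanh(W_in @ window + W_rec @ h)
    logits = W_out @ h
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()
```

Here the delayed window supplies the short-range dynamics at each step, while the recurrent term carries information across the whole sequence.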

References

1 S. S. Kim, "Time-delay recurrent neural network for temporal correlations and prediction" 20 : 253-263, 1998

2 R. P. Lippmann, "Review of neural networks for speech recognition" 1 : 1-38, 1989

3 H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech" 87 : 1738-1752, 1990

4 D. E. Rumelhart, "Parallel Distributed Processing 1" MIT Press 318-362, 1986

5 A. Waibel, "Modularity and scaling in large phoneme neural networks" 37 : 1188-1197, 1989

6 X. D. Huang, "Hidden Markov Models for Speech Recognition" Edinburgh University Press 1990

7 H. Bourlard, "Connectionist Speech Recognition - A Hybrid Approach" Kluwer 185-200, 1994

8 S. S. Kim, "Automatic recognition of pitch movements using multi-layer perceptron and time-delay recursive neural network" 11 : 645-648, 2004

9 Z. Rong, "An improved multisection vector quantization model with application to Chinese digits recognition" 1 : 749-752, 1996

10 T. Robinson, "An application of recurrent nets to phone probability estimation" 5 : 298-305, 1994