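Performance Improvement of Fast Speaker Adaptation Based on Dimensional Eigenvoice and Adaptation Mode Selection
Song Hwa Jeon, Lee Yun Keun, Kim Hyung Soon. The Acoustical Society of Korea, 2003. The Journal of the Acoustical Society of Korea, Vol.22 No.1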
The eigenvoice method is known to be well suited to fast speaker adaptation, but it shows little additional improvement in recognition performance as the number of adaptation utterances increases. To address this problem, this paper proposes a modified method that estimates the eigenvoice weights separately for each dimension of the speech feature vector, together with an adaptation mode selection scheme that chooses the adaptation method with the higher recognition rate according to the amount of adaptation data. The POW (Phonetically Optimized Words) database was used to build the speaker-independent model and the eigenvoices. From the PBW (Phonetically Balanced Words) 452-word database, between 1 and 50 utterances were used for adaptation in supervised mode, and 400 of the remaining utterances were used for evaluation. As the amount of adaptation data increased, the proposed dimensional eigenvoice method outperformed both the conventional eigenvoice method and MLLR. Adaptation mode selection between the eigenvoice and dimensional eigenvoice methods reduced the word error rate by up to 26% compared with the conventional eigenvoice method.
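The contrast between a single set of eigenvoice weights and per-dimension weights can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the adaptation data have already been reduced to a rough estimate of every Gaussian mean (`obs`), and it fits the weights by ordinary least squares rather than the maximum-likelihood estimation normally used with eigenvoices. All function names, the toy data, and the `switch_at` threshold are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ls_weights(basis, target):
    """Weights w minimizing ||target - sum_k w[k] * basis[k]||^2."""
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    gram = [[dot(bi, bj) for bj in basis] for bi in basis]
    rhs = [dot(bi, target) for bi in basis]
    return solve(gram, rhs)


def combine(mean0, eigs, weights):
    """mean[g][d] = mean0[g][d] + sum_k weights[d][k] * eigs[k][g][d]."""
    return [[m + sum(wk * e[g][d] for wk, e in zip(weights[d], eigs))
             for d, m in enumerate(row)]
            for g, row in enumerate(mean0)]


def eigenvoice_adapt(mean0, eigs, obs):
    """Conventional eigenvoice: one weight per eigenvoice for all dimensions."""
    flat = lambda m: [v for row in m for v in row]
    target = [o - m for o, m in zip(flat(obs), flat(mean0))]
    w = ls_weights([flat(e) for e in eigs], target)
    return combine(mean0, eigs, [w] * len(mean0[0]))


def dimensional_eigenvoice_adapt(mean0, eigs, obs):
    """Dimensional eigenvoice: a separate weight vector per feature dimension."""
    weights = []
    for d in range(len(mean0[0])):
        col = lambda m: [row[d] for row in m]
        target = [o - m for o, m in zip(col(obs), col(mean0))]
        weights.append(ls_weights([col(e) for e in eigs], target))
    return combine(mean0, eigs, weights)


def select_mode(n_utterances, switch_at=10):
    """Choose a mode from the amount of data (switch_at is a made-up value)."""
    return "eigenvoice" if n_utterances < switch_at else "dimensional eigenvoice"


# Toy data: 3 Gaussians, 2 feature dimensions, K = 1 eigenvoice.
mean0 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
eigs = [[[1.0, 1.0], [2.0, 1.0], [0.0, 1.0]]]
# The "speaker" scales dimension 0 by 2 and dimension 1 by -1.
obs = [[2.0, -1.0], [4.0, -1.0], [0.0, -1.0]]
conventional = eigenvoice_adapt(mean0, eigs, obs)
dimensional = dimensional_eigenvoice_adapt(mean0, eigs, obs)
```

In this toy case the per-dimension weights reproduce `obs`, while the shared weight cannot. The dimensional variant also has more free parameters to estimate, which is one plausible reason to fall back to the conventional method when only a few utterances are available.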
Song, H.J.; Kim, H.W.; Chung, E.; Oh, S.; Lee, J.W.; Kang, D.; Jung, J.Y.; Lee, Y.K. Electronics and Telecommunications Research Institute (ETRI), 2019. Electronics and Telecommunications Trends, Vol.34 No.4
Currently, most artificial intelligence relies on securing big data; however, such data is concentrated in a few major companies. Therefore, automatic data augmentation and efficient learning algorithms for small-scale data will become key elements of future competitiveness in artificial intelligence. In addition, it is necessary to develop a technique that learns the meanings, correlations, and time-related associations of complex modal knowledge in a manner similar to humans, and that expands and transfers semantic prediction and knowledge inference to unknown data. To this end, a neural memory model that imitates how knowledge is processed in the human brain needs to be developed to enable knowledge expansion through modality cooperative learning. Moreover, the declarative and procedural knowledge in the memory model must be able to develop on its own through human interaction. In this paper, we review this essential methodology and briefly describe the achievements made so far.
Robust Speech Recognition Using the Eigen-Environment Noise Compensation Method
Song Hwa Jeon, Kim Hyung Soon. The Korean Society of Phonetic Sciences and Speech Technology, 2004. Malsori, Vol.52 No.-
In this paper, a new noise compensation method based on the eigenvoice framework in the feature space is proposed to reduce the mismatch between training and testing environments. The difference between clean and noisy environments is represented as a linear combination of K eigenvectors that capture the variation among environments. In the proposed method, the performance improvement of the speech recognition system depends largely on how the noisy models and the bias vector set are constructed. Two methods are proposed for constructing the noisy models: one based on MAP adaptation and the other using a stereo DB. In experiments on the Aurora 2 DB, the eigen-environment method achieved a 44.86% relative improvement over the baseline system. In particular, in the clean-condition training mode, the proposed method yielded a 66.74% relative improvement, which is better than several methods previously proposed in the Aurora project.
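A Study on Improving Korean Connected Digit Recognition Performance Using Various Discriminant Analyses
Song Hwa Jeon, Kim Hyung Soon. The Korean Society of Phonetic Sciences and Speech Technology, 2002. Malsori, Vol.44 No.-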
In Korean, each digit is a monosyllable, and some digit pairs are known to be highly confusable, which degrades the performance of connected digit recognition systems. To improve performance, this paper employs various discriminant analyses (DA), including Linear DA (LDA), Weighted Pairwise Scatter LDA (WPS-LDA), Heteroscedastic Discriminant Analysis (HDA), and Maximum Likelihood Linear Transformation (MLLT). We also examine several combinations of these DA methods for additional performance improvement. Experimental results show that applying any of the above DA methods improves string accuracy, but the amount of improvement from each method varies with the model complexity, that is, the number of mixtures per state. In particular, applying MLLT after WPS-LDA reduces the string error by more than 20% compared with the baseline system when the class level of DA is defined as a tied state and 1 mixture per state is used.
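As a point of reference for the simplest of these transforms, the following is a minimal sketch of plain two-class Fisher LDA on 2-D features. It is not WPS-LDA, HDA, or MLLT, and it is not the paper's setup: the function names and the toy data are hypothetical, and the 2x2 matrix inverse is written out by hand to keep the example self-contained.

```python
def mean(vs):
    """Component-wise mean of a list of 2-D vectors."""
    n = len(vs)
    return [sum(v[i] for v in vs) / n for i in range(2)]


def scatter(vs, mu):
    """2x2 scatter matrix: sum over samples of (v - mu)(v - mu)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vs:
        d = [v[0] - mu[0], v[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_lda_direction(class_a, class_b):
    """Projection direction w = Sw^{-1} (mu_a - mu_b)."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


# Toy classes: wide spread along x, but separable only along y.
class_a = [[0.0, 0.0], [4.0, 0.5], [0.0, 0.5], [4.0, 0.0]]
class_b = [[0.0, 2.0], [4.0, 2.5], [0.0, 2.5], [4.0, 2.0]]
w = fisher_lda_direction(class_a, class_b)
project = lambda v: w[0] * v[0] + w[1] * v[1]
proj_a = [project(v) for v in class_a]
proj_b = [project(v) for v in class_b]
```

The direction found ignores the high-variance x axis and projects onto y, where the two classes do not overlap. The variants named in the abstract change how the scatter statistics are weighted or relax the shared-covariance assumption.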
Fast Speaker Adaptation Using Sub-Stream Based Eigenvoice
Song, Hwa-Jeon; Lee, Jong-Seok; Kim, Hyung-Soon. The Korean Society of Phonetic Sciences and Speech Technology, 2005. Malsori, Vol.55 No.-
In this paper, a sub-stream based eigenvoice method is proposed to overcome the weak points of the conventional eigenvoice and dimensional eigenvoice methods. In the proposed method, sub-streams are constructed automatically by a statistical clustering analysis that uses the correlation information between dimensions. To obtain a reliable distance matrix from the covariance matrix for dividing the dimensions into optimal sub-streams, a MAP adaptation technique is applied to the covariance matrix of the training data and the sample covariance of the adaptation data. According to our experiments, the proposed method shows a 41% error rate reduction when the number of adaptation data is 50.
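One way to picture the sub-stream construction is sketched below. It rests on several assumptions that are not taken from the paper: the MAP step is approximated by a count-weighted interpolation with a hypothetical prior weight `tau`, the distance between two dimensions is taken as 1 - |correlation|, and the clustering is a single-linkage grouping cut at a hypothetical threshold `max_dist`. The covariance values are toy data.

```python
import math


def map_covariance(prior_cov, sample_cov, n_adapt, tau=10.0):
    """Interpolate the training covariance with the adaptation sample covariance.

    The more adaptation data (n_adapt), the more the sample covariance counts.
    """
    a = n_adapt / (n_adapt + tau)
    return [[(1 - a) * p + a * s for p, s in zip(prow, srow)]
            for prow, srow in zip(prior_cov, sample_cov)]


def correlation_distance(cov):
    """d[i][j] = 1 - |corr(i, j)|, so strongly correlated dimensions are close."""
    n = len(cov)
    return [[1.0 - abs(cov[i][j]) / math.sqrt(cov[i][i] * cov[j][j])
             for j in range(n)] for i in range(n)]


def sub_streams(dist, max_dist):
    """Single-linkage grouping: join dimensions linked by a distance <= max_dist."""
    parent = list(range(len(dist)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(dist)):
        for j in range(i + 1, len(dist)):
            if dist[i][j] <= max_dist:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(dist)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy 4-dimensional covariances: dims (0, 1) and (2, 3) are correlated.
prior_cov = [[1.0, 0.9, 0.1, 0.0], [0.9, 1.0, 0.0, 0.1],
             [0.1, 0.0, 1.0, 0.8], [0.0, 0.1, 0.8, 1.0]]
sample_cov = [[1.0, 0.6, 0.1, 0.0], [0.6, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.5], [0.0, 0.1, 0.5, 1.0]]
cov = map_covariance(prior_cov, sample_cov, n_adapt=5)
streams = sub_streams(correlation_distance(cov), max_dist=0.5)
```

With these values the four dimensions split into two sub-streams, `[0, 1]` and `[2, 3]`. Eigenvoice weights would then be estimated once per sub-stream, which sits between one weight set for all dimensions and one per dimension.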
Probabilistic Bilinear Transformation Space-Based Joint Maximum A Posteriori Adaptation
Song Hwa Jeon, Lee Yun Keun, Kim Hyung Soon. Electronics and Telecommunications Research Institute (ETRI), 2012. ETRI Journal, Vol.34 No.5
This letter proposes a more advanced joint maximum a posteriori (MAP) adaptation that uses a prior model based on a probabilistic scheme built on the bilinear transformation (BIT) concept. The proposed method not only has scalable parameters but is also based on a single prior distribution, without the heuristic parameters of the previous joint BIT-MAP method. Experimental results show that, irrespective of the amount of adaptation data, the proposed method yields a consistent improvement over the previous method.
Fast Speaker Adaptation and Environment Compensation Based on Eigenspace-based MLLR
Song Hwa-Jeon, Kim Hyung-Soon. The Korean Society of Phonetic Sciences and Speech Technology, 2006. Malsori, Vol.58 No.-
Maximum likelihood linear regression (MLLR) adaptation suffers severe performance degradation when only a very small amount of adaptation data is available. Eigenspace-based MLLR, an alternative to MLLR for fast speaker adaptation, has its own weak point: it cannot deal with the mismatch between training and testing environments. In this paper, we propose simultaneous fast speaker and environment adaptation based on eigenspace-based MLLR. We also extend sub-stream based eigenspace-based MLLR to generalize eigenspace-based MLLR with bias compensation. A vocabulary-independent word recognition experiment shows that the proposed algorithm is superior to eigenspace-based MLLR regardless of the amount of adaptation data in diverse noisy environments. In particular, the proposed sub-stream eigenspace-based MLLR with bias compensation yields a 67% relative improvement over conventional eigenspace-based MLLR with 10 adaptation words in a 10 dB SNR environment.