The p-Norm of Likelihood Difference Estimation Algorithm for Hidden Markov Models
Sungrack Yun, Chang D. Yoo. The Institute of Electronics Engineers of Korea, 2007 ITC-CSCC: International Technical Conference on Circuits/Systems, Computers and Communications, Vol.2007 No.7
This paper proposes a discriminative training algorithm which estimates the continuous-density hidden Markov model (CDHMM) parameters by minimizing the p-norm of the log-likelihood difference (PLD) between the utterance log-likelihood given the correct transcription and that given the most competitive transcription.
Automatic Floating-point to Fixed-point Conversion for Speech Recognition in Fixed-point DSP
Sungrack Yun, Chang D. Yoo. The Institute of Electronics Engineers of Korea, 2007 ITC-CSCC: International Technical Conference on Circuits/Systems, Computers and Communications, Vol.2007 No.7
This paper proposes a simple automatic conversion method from floating-point operations to fixed-point operations for implementing an automatic speech recognition (ASR) algorithm on a fixed-point digital signal processor (DSP). The speech recognition algorithm involves operations on very small floating-point values, which may cause underflow. The proposed method uses two integers to represent very small floating-point values, so that underflow is prevented and real-time processing is achieved at low cost.
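The underflow-avoidance idea in this abstract can be illustrated with a small sketch. This is not the paper's exact scheme: the mantissa width, normalization range, and rounding below are assumptions chosen for clarity. The core point is that an integer mantissa plus an integer exponent can represent probability products far below the smallest value a raw float can hold.

```python
# Hypothetical sketch of the "two integers" idea: a very small probability is
# kept as an integer mantissa and an integer base-2 exponent, so repeated
# multiplications never underflow the way a chained float product would.

FRAC_BITS = 15  # fixed-point fractional bits (assumed, not from the paper)

def to_fixed(x):
    """Encode 0 < x <= 1 as (mantissa, exponent), mantissa kept near [2^14, 2^15]."""
    exp = 0
    while x < 0.5:        # normalize so the mantissa uses its full precision
        x *= 2.0
        exp -= 1
    return int(round(x * (1 << FRAC_BITS))), exp

def fx_mul(a, b):
    """Multiply two (mantissa, exponent) values, renormalizing the result."""
    (ma, ea), (mb, eb) = a, b
    m = (ma * mb) >> FRAC_BITS          # integer multiply, then rescale
    e = ea + eb
    while m < (1 << (FRAC_BITS - 1)):   # keep the mantissa normalized
        m <<= 1
        e -= 1
    return m, e

def to_float(v):
    m, e = v
    return (m / (1 << FRAC_BITS)) * (2.0 ** e)

# Chaining many small likelihoods: a raw float product of 200 values of 1e-3
# would be about 1e-600 and underflow a double; the two-integer form survives
# because the magnitude lives in the integer exponent.
acc = to_fixed(1.0)
p = to_fixed(1e-3)
for _ in range(200):
    acc = fx_mul(acc, p)
```

The exponent integer carries the magnitude, so only the mantissa's precision (not its range) limits the computation; this is essentially a block floating-point representation.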
Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification
Sungrack Yun, Chang D. Yoo. IEEE, 2012, IEEE Transactions on Audio, Speech, and Language Processing, Vol.20 No.2
This paper considers a learning framework for speech emotion classification using a discriminant function based on Gaussian mixture models (GMMs). The GMM parameter set is estimated by margin scaling with a loss function to reduce the risk of predicting emotions with high loss. Here, the loss function is defined as a function of a distance metric based on Watson and Tellegen's emotion model. Margin scaling is known to have good generalization ability and can be considered appropriate for emotion modeling, where the parameter set is likely to be over-fitted to a training data set whose characteristics may differ from those of the testing data set. Our learning framework is formulated as a constrained optimization problem which is solved using semi-definite programming. Three tasks were evaluated: acted emotion classification, natural emotion classification, and cross-database emotion classification. In each task, four loss functions were evaluated. In all experiments, results consistently show that margin scaling improves classification accuracy over other learning frameworks based on maximum likelihood, maximum mutual information, and the max-margin framework without margin scaling. Experimental results also show that margin scaling substantially reduces the overall loss compared to the max-margin framework without margin scaling.
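The margin-scaling constraint described above can be made concrete with a toy sketch. This is not the paper's GMM discriminant or its semi-definite program; it only shows, with invented scores and a hypothetical distance-based loss table, how the required margin between the correct class and each competitor grows with the loss of confusing them.

```python
# Margin scaling in miniature: the correct class y must beat every other
# class c by at least loss_row[c]; the hinge penalty is the worst shortfall.

def margin_scaled_hinge(scores, y, loss_row):
    """scores[c]: discriminant value of class c for one utterance;
    loss_row[c]: loss of predicting c when the truth is y (loss_row[y] == 0)."""
    worst = 0.0
    for c, (s, l) in enumerate(zip(scores, loss_row)):
        if c == y:
            continue
        shortfall = l - (scores[y] - s)   # required gap minus actual gap
        if shortfall > worst:
            worst = shortfall
    return worst

# Toy classes: 0 = sad, 1 = neutral, 2 = angry (labels and numbers invented).
# Confusing "sad" with "angry" costs more than with "neutral", so the same
# score gap can satisfy one constraint while violating the other.
scores = [2.0, 1.6, 1.2]       # discriminant values for one utterance
loss_row = [0.0, 0.5, 1.0]     # hypothetical distance-based loss
penalty = margin_scaled_hinge(scores, y=0, loss_row=loss_row)
```

Here the gap to "neutral" (0.4) nearly covers its required margin (0.5), but the gap to "angry" (0.8) falls 0.2 short of the required 1.0, so the high-loss confusion dominates the penalty; this is the mechanism by which margin scaling steers training away from costly errors.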
A Hidden Markov Model Parameter Estimation Algorithm Based on the p-Norm of the Log-Likelihood Difference
Sungrack Yun, Chang D. Yoo. The Institute of Electronics Engineers of Korea, 2007 IEEK Conference, Vol.2007 No.7
This paper proposes a discriminative training algorithm for estimating hidden Markov model (HMM) parameters. The proposed algorithm estimates the parameters by minimizing the p-norm of the log-likelihood difference (PLD) between the utterance log-likelihood given the correct transcription and that given the most competitive transcription.
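A hedged sketch of the PLD criterion as described: for each utterance, take the difference between the log-likelihoods of the most competitive and the correct transcriptions, and aggregate the differences over the training set with a p-norm. The sign convention and the toy log-likelihood values below are illustrative assumptions, not the paper's exact definition.

```python
# PLD objective sketch: d_u = (log-likelihood of the most competitive wrong
# transcription) - (log-likelihood of the correct one), so d_u > 0 marks a
# misrecognized utterance; training minimizes the p-norm of the d_u values.

def pld(correct_ll, competitor_ll, p=2.0):
    diffs = [comp - corr for corr, comp in zip(correct_ll, competitor_ll)]
    return sum(abs(d) ** p for d in diffs) ** (1.0 / p)

# Toy per-utterance log-likelihoods (invented numbers):
corr = [-120.0, -95.5, -210.2]
comp = [-121.5, -94.0, -214.0]
l1 = pld(corr, comp, p=1.0)   # sum of |d_u| over all utterances
l4 = pld(corr, comp, p=4.0)   # dominated by the worst-recognized utterance
```

The choice of p trades off how strongly the criterion concentrates on the utterances with the largest likelihood gap: p = 1 weights all utterances equally, while large p approaches the max over utterances.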
Speech Emotion Recognition Using Max-Margin Training of Hidden Markov Models
Sungrack Yun, Donghoon Lee, Seungryul Baek, Sanghyuk Park, Dalwon Jang, Chang D. Yoo. The Institute of Electronics Engineers of Korea, 2010 IEEK Conference, Vol.2010 No.6
In this paper, we propose a max-margin learning algorithm for hidden Markov models applied to speech emotion recognition. Max-margin learning leads to good generalization ability on testing data even with a small number of training samples, which could otherwise lead to over-fitting. In the experiments, we observed that the proposed learning algorithm outperforms learning criteria such as maximum likelihood and maximum mutual information.
A Floating-Point to Fixed-Point Operation Conversion Technique for Implementing Speech Recognition Algorithms on Embedded Devices
Sungrack Yun, Chang D. Yoo. The Institute of Electronics Engineers of Korea, 2007 IEEK Conference, Vol.2007 No.7
This paper proposes an automatic conversion method from floating-point computations to fixed-point computations for implementing automatic speech recognition (ASR) algorithms on embedded devices.
Large Margin Discriminative Semi-Markov Model for Phonetic Recognition
Sungwoong Kim, Sungrack Yun, Chang D. Yoo. IEEE, 2011, IEEE Transactions on Audio, Speech, and Language Processing, Vol.19 No.7
This paper considers a large margin discriminative semi-Markov model (LMSMM) for phonetic recognition. The hidden Markov model (HMM) framework that is often used for phonetic recognition assumes only local statistical dependencies between adjacent observations, and it is used to predict a label for each observation without explicit phone segmentation. On the other hand, the semi-Markov model (SMM) framework allows simultaneous segmentation and labeling of sequential data based on a segment-based Markovian structure that assumes statistical dependencies among all the observations within a phone segment. For phonetic recognition, which is inherently a joint segmentation and labeling problem, the SMM framework has the potential to perform better than the HMM framework at the expense of a slight increase in computational complexity. The SMM framework considered in this paper is based on a non-probabilistic discriminant function that is linear in the joint feature map, which attempts to capture long-range statistical dependencies among observations. The parameters of the discriminant function are estimated by a large margin learning framework for structured prediction. The parameter estimation problem at hand leads to an optimization problem with many margin constraints, and this constrained optimization problem is solved using a stochastic gradient descent algorithm. The proposed LMSMM outperformed the large margin discriminative HMM in the TIMIT phonetic recognition task.
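The joint segmentation-and-labeling search that distinguishes the SMM from the HMM can be sketched as a segment-level Viterbi recursion. The `seg_score` function below is a stand-in for the paper's joint feature map and learned weights, and the per-segment penalty and toy frame data are invented for the example.

```python
# Segment-level Viterbi: instead of labeling one frame at a time, the search
# jointly chooses segment boundaries and labels, scoring whole segments.

def smm_decode(T, labels, seg_score, max_dur):
    """Best labeled segmentation of frames [0, T).
    seg_score(s, t, y): score of one segment covering frames s..t-1 with label y."""
    best = [float("-inf")] * (T + 1)
    best[0] = 0.0
    back = [None] * (T + 1)
    for t in range(1, T + 1):
        for d in range(1, min(max_dur, t) + 1):   # candidate segment durations
            s = t - d
            for y in labels:
                cand = best[s] + seg_score(s, t, y)
                if cand > best[t]:
                    best[t] = cand
                    back[t] = (s, y)
    segs, t = [], T            # trace back the chosen segments
    while t > 0:
        s, y = back[t]
        segs.append((s, t, y))
        t = s
    return best[T], segs[::-1]

# Toy data: frames 0-2 look like "a", frames 3-5 like "b" (invented).
frames = ["a", "a", "a", "b", "b", "b"]
PEN = 0.25  # per-segment penalty so the coarsest consistent segmentation wins

def seg_score(s, t, y):
    return sum(1.0 if frames[i] == y else -1.0 for i in range(s, t)) - PEN

score, segs = smm_decode(len(frames), ["a", "b"], seg_score, max_dur=6)
```

The recursion costs a factor of `max_dur` more than frame-level Viterbi, which matches the abstract's "slight increase in computational complexity" for the benefit of segment-wide features.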
The Korean Large Vocabulary Continuous Speech Recognition Platform
Oh Wook Kwon, Sukbong Kwon, Sungrack Yun, Gyucheol Jang, Yong-Rae Kim, Bong-Wan Kim, Hoirin Kim, Changdong Yoo, Yong-Ju Lee. 한국어정보학회, 2008, 한국어정보학, Vol.10 No.1
For educational and research purposes, we design and evaluate a Korean speech recognition platform with which a decoder can be built. The platform has an object-oriented architecture so that researchers can easily modify it and evaluate the performance of a recognition algorithm of interest. The platform provides the following functionalities: noise reduction, speech detection, feature extraction, hidden Markov model (HMM)-based acoustic modeling, cross-word modeling, n-gram language modeling, n-best search, word graph generation, and Korean-specific language processing. The platform can handle both lexical search trees for large vocabulary speech recognition and finite-state networks for small-to-medium vocabulary speech recognition. It performs the word-dependent n-best search algorithm with a bigram language model in the first forward search stage, then extracts a word lattice, and finally rescores the lattice with a trigram language model in the second backward search stage. In a large vocabulary continuous speech recognition task, we compare the performance of the platform with HTK and Julius.
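The two-pass strategy described above can be illustrated with a deliberately simplified sketch: re-ranking an n-best list (rather than a full word lattice, as the platform actually does) by swapping the first-pass bigram LM score for a trigram one. All tables, scores, and the unseen-n-gram floor below are invented for the example.

```python
import math

# Second-pass rescoring sketch: hypotheses produced by a (not shown) bigram
# first pass are re-scored with a trigram LM and re-ranked.

def lm_score(words, table, order):
    """Log LM score with a crude fixed floor for unseen n-grams (assumption)."""
    hist = ["<s>"] * (order - 1)
    total = 0.0
    for w in words:
        key = tuple(hist[-(order - 1):]) + (w,)
        total += math.log(table.get(key, 1e-4))
        hist.append(w)
    return total

def rescore(nbest, acoustic, trigram, lm_weight=1.0):
    """Combine each hypothesis' acoustic score with its trigram LM score."""
    scored = [(acoustic[h] + lm_weight * lm_score(h.split(), trigram, 3), h)
              for h in nbest]
    return max(scored)[1]

# Invented trigram table: "sea the" is a very unlikely history.
trigram = {("<s>", "<s>", "see"): 0.3, ("<s>", "see", "the"): 0.4,
           ("see", "the", "cat"): 0.5, ("<s>", "<s>", "sea"): 0.3,
           ("<s>", "sea", "the"): 0.01}
acoustic = {"see the cat": -10.0, "sea the cat": -9.5}  # first pass liked "sea"
best = rescore(["see the cat", "sea the cat"], acoustic, trigram)
```

The longer trigram context overturns the first-pass ranking: the acoustically preferred "sea the cat" is penalized once three-word history is taken into account, which is exactly why a cheap bigram pass followed by trigram rescoring is a common trade-off.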
Kwon Oh-Wook, Kwon Sukbong, Jang Gyucheol, Yun Sungrack, Kim Yong-Rae, Jang Kwang-Dong, Kim Hoi-Rin, Yoo Changdong, Kim Bong-Wan, Lee Yong-Ju. The Acoustical Society of Korea, 2005, The Journal of the Acoustical Society of Korea (韓國音響學會誌), Vol.24 No.8
We introduce ECHOS, a Korean speech recognition platform developed for education and research purposes. ECHOS lowers the entry barrier to speech recognition research and can be used as a reference engine by providing elementary speech recognition modules. It has a simple object-oriented architecture, implemented in the C++ language with the standard template library (STL). The input of ECHOS is digital speech data sampled at 8 or 16 kHz; its outputs are the 1-best recognition result, N-best recognition results, and a word graph. The recognition engine is composed of MFCC/PLP feature extraction, HMM-based acoustic modeling, n-gram language modeling, and finite-state network (FSN)- and lexical-tree-based search algorithms. It can handle various tasks from isolated word recognition to large vocabulary continuous speech recognition. We compare the performance of ECHOS and the hidden Markov model toolkit (HTK) for validation. In an FSN-based task, ECHOS shows similar word accuracy, while the recognition time is doubled because of the object-oriented implementation. For an 8000-word continuous speech recognition task, using a lexical tree search algorithm different from the one used in HTK, ECHOS increases the word error rate by 40% relatively but reduces the recognition time by half.