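A Technique for Converting Floating-Point Computations to Fixed-Point Computations for Implementing Speech Recognition Algorithms on Embedded Devices
Sungrack Yun, Chang D. Yoo. Institute of Electronics Engineers of Korea (IEEK), 2007. IEEK Conference, Vol. 2007, No. 7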
This paper proposes a method for automatically converting floating-point computations to fixed-point computations, for implementing automatic speech recognition (ASR) algorithms on embedded devices.
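The abstract does not describe how the conversion is carried out, so the following is only a generic sketch of what a fixed-point representation involves, not the paper's automatic method. It assumes a signed 16-bit Q15 format (15 fractional bits) with saturation; the function names `to_fixed`, `to_float`, and `fixed_mul` are hypothetical.

```python
def _saturate(value: int, word_bits: int) -> int:
    """Clamp an integer to the range of a signed word of `word_bits` bits."""
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, value))


def to_fixed(x: float, frac_bits: int = 15, word_bits: int = 16) -> int:
    """Quantize a float to a signed fixed-point integer (assumed Q15 by default)."""
    return _saturate(round(x * (1 << frac_bits)), word_bits)


def to_float(q: int, frac_bits: int = 15) -> float:
    """Convert a fixed-point integer back to a float for inspection."""
    return q / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int = 15, word_bits: int = 16) -> int:
    """Multiply two fixed-point values and rescale to the same format.

    The raw product carries 2 * frac_bits fractional bits, so it is shifted
    right by frac_bits (rounding toward negative infinity) and then saturated.
    """
    return _saturate((a * b) >> frac_bits, word_bits)


half = to_fixed(0.5)
quarter = fixed_mul(half, half)
print(half, quarter, to_float(quarter))
```

Saturating instead of wrapping on overflow is a common choice in signal-processing code because a clipped value usually degrades results less than a sign flip would.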
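A Hidden Markov Parameter Estimation Algorithm Based on the p-Norm of the Log-Likelihood Difference
Sungrack Yun, Chang D. Yoo. Institute of Electronics Engineers of Korea (IEEK), 2007. IEEK Conference, Vol. 2007, No. 7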
This paper proposes a discriminative training algorithm for estimating hidden Markov model (HMM) parameters. The proposed algorithm estimates the parameters by minimizing the p-norm of the log-likelihood difference (PLD) between the utterance probability given the correct transcription and that given the most competitive transcription.
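One plausible reading of this criterion can be written out as follows. The notation is assumed rather than taken from the paper: $X_n$ is the $n$-th training utterance, $W_n$ its correct transcription, $\theta$ the HMM parameters, and $N$ the number of utterances. The sign convention of the difference and any thresholding the authors may apply are not given in the abstract.

```latex
\begin{align}
  % Most competitive (highest-scoring incorrect) transcription; assumed definition
  \hat{W}_n &= \arg\max_{W \neq W_n} \log p(X_n \mid W; \theta) \\
  % Log-likelihood difference for utterance n; sign convention is an assumption
  d_n(\theta) &= \log p(X_n \mid \hat{W}_n; \theta) - \log p(X_n \mid W_n; \theta) \\
  % p-norm of the differences over the training set, minimized over \theta
  \theta^{*} &= \arg\min_{\theta} \left( \sum_{n=1}^{N} \lvert d_n(\theta) \rvert^{p} \right)^{1/p}
\end{align}
```

Under this reading, a larger $p$ makes the objective more sensitive to the utterances with the largest differences, while $p = 1$ weights all utterances equally.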
Speech Emotion Recognition Using Max-Margin Training of Hidden Markov Models
Sungrack Yun, Donghoon Lee, Seungryul Baek, Sanghyuk Park, Dalwon Jang, Chang D. Yoo. Institute of Electronics Engineers of Korea (IEEK), 2010. IEEK Conference, Vol. 2010, No. 6
In this paper, we propose a max-margin learning algorithm for hidden Markov models for speech emotion recognition. Max-margin learning generalizes well to test data even when only a small amount of training data is available, a setting that may otherwise lead to over-fitting. In our experiments, the proposed learning algorithm outperformed learning criteria such as maximum likelihood and maximum mutual information.
Kwon, Suk-Bong; Yun, Sung-Rack; Jang, Gyu-Cheol; Kim, Yong-Rae; Kim, Bong-Wan; Kim, Hoi-Rin; Yoo, Chang-Dong; Lee, Yong-Ju; Kwon, Oh-Wook. 대한음성학회 (Korean Society of Phonetic Sciences and Speech Technology), 2006. 말소리 (Malsori), Vol. 59, No. -
We report evaluation results for ECHOS, a Korean speech recognition platform. The platform has an object-oriented, reusable architecture so that researchers can easily evaluate their own algorithms. It includes all the modules needed to build a large-vocabulary speech recognizer: noise reduction, end-point detection, feature extraction, hidden Markov model (HMM)-based acoustic modeling, cross-word modeling, n-gram language modeling, n-best search, word graph generation, and Korean-specific language processing. The platform supports both lexical search trees and finite-state networks. It performs a word-dependent n-best search with a bigram in the forward search stage and rescores the lattice with a trigram in the backward stage. In an 8000-word continuous speech recognition task, the platform with a lexical tree increases word errors by 40% but reduces recognition time by 50% compared to the HTK platform with a flat lexicon. ECHOS reduces recognition errors by 40% by incorporating cross-word modeling. With the number of Gaussian mixtures increased to 16, it yields word accuracy comparable to that of Julius, an earlier lexical tree-based platform.
Kwon Oh-Wook, Kwon Sukbong, Jang Gyucheol, Yun Sungrack, Kim Yong-Rae, Jang Kwang-Dong, Kim Hoi-Rin, Yoo Changdong, Kim Bong-Wan, Lee Yong-Ju. The Acoustical Society of Korea, 2005. The Journal of the Acoustical Society of Korea (韓國音響學會誌), Vol. 24, No. 8
We introduce ECHOS, a Korean speech recognition platform developed for education and research purposes. By providing elementary speech recognition modules, ECHOS lowers the entry barrier to speech recognition research and can be used as a reference engine. It has a simple, easy-to-understand object-oriented architecture and is implemented in C++ with the standard template library (STL). The input to ECHOS is digital speech data sampled at 8 or 16 kHz. Its output is the 1-best recognition result, N-best recognition results, and a word graph. The recognition engine is composed of MFCC/PLP feature extraction, HMM-based acoustic modeling, n-gram language modeling, and finite-state network (FSN)- and lexical tree-based search algorithms. It can handle a range of tasks from isolated word recognition to large-vocabulary continuous speech recognition. To validate the platform, we compare the performance of ECHOS with that of the hidden Markov model toolkit (HTK). In an FSN-based command recognition task, ECHOS shows nearly the same word accuracy as HTK, while the recognition time is roughly doubled because of the object-oriented implementation. In an 8000-word continuous speech recognition task, ECHOS uses a lexical tree search algorithm, unlike HTK; this increases the word error rate by 40% in relative terms but cuts the recognition time in half.
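The distinction between a relative and an absolute change in word error rate can be illustrated with a short calculation. The baseline value below is hypothetical and chosen only for the arithmetic; the abstract reports the relative changes but not the underlying error rates or times. The helper names are also hypothetical.

```python
def relative_change(baseline: float, new: float) -> float:
    """Return the change from `baseline` to `new` as a fraction of `baseline`."""
    return (new - baseline) / baseline


def apply_relative_change(baseline: float, change: float) -> float:
    """Return the value obtained by changing `baseline` by the fraction `change`."""
    return baseline * (1.0 + change)


# Hypothetical baseline word error rate of 10% (not a figure from the paper).
baseline_wer = 0.10

# A 40% relative increase multiplies the baseline by 1.4.
new_wer = apply_relative_change(baseline_wer, 0.40)

# The absolute increase is much smaller than 40 percentage points.
absolute_increase = new_wer - baseline_wer

print(f"new WER: {new_wer:.3f}, absolute increase: {absolute_increase:.3f}")
print(f"relative change: {relative_change(baseline_wer, new_wer):.2f}")
```

Read this way, a 40% relative increase on a low baseline error rate is a modest absolute change, which is why it can be an acceptable trade for halving the recognition time.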