Noise-Robust Voice Activity Detection Using the Coefficient of Variation of the Spectrum
김영민,한민수,Kim Youngmin,Hahn Minsoo 대한음성학회 2003 말소리 Vol.48 No.-
This paper presents a new parameter for voice detection, a task used in many areas of speech engineering such as speech synthesis, speech recognition, and speech coding. The CV (Coefficient of Variation) of the speech spectrum is used together with other feature parameters to detect speech. The CV is calculated only within a specific range of the speech spectrum. Average magnitude and spectral magnitude are also employed to improve the detector's performance. In the experiments, the proposed voice detector outperformed the conventional energy-based detector in terms of error measurements.
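As an illustration of the feature described above, the following is a minimal sketch of a band-limited spectral CV (standard deviation divided by mean of the magnitude spectrum). The abstract does not state the frequency range or any decision threshold, so the `band` default, the function names, and the naive DFT are all assumptions made for this example, not the authors' implementation.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short demo frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_cv(frame, sample_rate, band=(300.0, 3400.0)):
    """Coefficient of variation (std / mean) of the magnitudes inside `band`.

    The band limits are placeholders: the abstract only says that a
    "specific range" of the spectrum is used.
    """
    n = len(frame)
    mags = magnitude_spectrum(frame)
    in_band = [m for k, m in enumerate(mags) if band[0] <= k * sample_rate / n <= band[1]]
    if not in_band:
        raise ValueError("no DFT bins fall inside the requested band")
    mean = sum(in_band) / len(in_band)
    if mean == 0.0:
        return 0.0
    variance = sum((m - mean) ** 2 for m in in_band) / len(in_band)
    return math.sqrt(variance) / mean
```

A frame whose in-band spectrum is peaky (for example a single tone) yields a large CV, while a frame with a flat in-band spectrum yields a CV near zero. A detector could compare the CV against a tuned threshold alongside the other features the abstract mentions.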
박태선,한민수,Park Taesun,Hahn Minsoo 대한음성학회 2003 말소리 Vol.47 No.-
This paper describes the use of pitch information for speaker identification. The recognition system is GMM-based and uses a speech database of 4 connected Korean digits. The mean of the pitch period in voiced sections of speech is shown to be useful for discriminating between speakers. Using this feature with a Gaussian mixture model in the speaker identification system gave a marked improvement, up to 6% over the baseline Gaussian mixture model.
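The pitch feature above can be sketched as follows. The abstract does not say how pitch was extracted, so this example assumes a plain autocorrelation estimator with a simple voicing check. The function names, the lag range, and the `voicing_threshold` value are hypothetical, and the GMM stage is not shown.

```python
def autocorr(frame, lag):
    """Biased autocorrelation of `frame` at the given lag."""
    return sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))


def pitch_period(frame, min_lag, max_lag, voicing_threshold=0.3):
    """Return the pitch period in samples, or None if the frame looks unvoiced.

    Requires max_lag < len(frame). The threshold is an assumed value.
    """
    energy = autocorr(frame, 0)
    if energy <= 0.0:
        return None
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: autocorr(frame, lag))
    if autocorr(frame, best_lag) / energy < voicing_threshold:
        return None
    return best_lag


def mean_pitch_period(frames, min_lag, max_lag):
    """Mean pitch period over the frames judged voiced, or None if there are none."""
    periods = [p for p in (pitch_period(f, min_lag, max_lag) for f in frames) if p is not None]
    return sum(periods) / len(periods) if periods else None
```

The resulting mean could be appended to each speaker's spectral feature vector before model training. How the feature was actually combined with the GMM is not described in the abstract.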
A Spectral Smoothing Algorithm at Synthesis Unit Boundaries for Corpus-Based Speech Synthesizers
김상진,장경애,한민수,Kim Sang-Jin,Jang Kyung Ae,Hahn Minsoo 대한음성학회 2005 말소리 Vol.56 No.-
Speech unit concatenation with a large database is presently the most popular method for speech synthesis. In this approach, mismatches at the unit boundaries are unavoidable and are one of the causes of quality degradation. This paper proposes an algorithm to reduce undesired discontinuities between successive units. Optimal matching points are calculated in two steps. First, the Kullback-Leibler distance measure is used for spectral matching; then unit sliding and overlap windowing are used for waveform matching. The proposed algorithm is implemented in a corpus-based, unit-concatenating Korean text-to-speech system that has an automatically labeled database. Experimental results show that the algorithm performs better than raw concatenation or the overlap smoothing method.
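The two-step matching described above can be sketched roughly as follows. This example assumes a symmetric form of the Kullback-Leibler distance on normalized magnitude spectra and a linear cross-fade for the overlap windowing. The abstract specifies neither, and the unit sliding step is omitted, so the function names and details here are illustrative only.

```python
import math


def normalize(spectrum, eps=1e-12):
    """Floor and scale a non-negative spectrum so that it sums to one."""
    floored = [max(v, eps) for v in spectrum]
    total = sum(floored)
    return [v / total for v in floored]


def symmetric_kl(spec_a, spec_b):
    """Symmetric Kullback-Leibler distance: D(p||q) + D(q||p)."""
    p, q = normalize(spec_a), normalize(spec_b)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def best_boundary_pair(left_frames, right_frames):
    """Return (i, j, distance) for the spectrally closest pair of frames."""
    best = None
    for i, a in enumerate(left_frames):
        for j, b in enumerate(right_frames):
            d = symmetric_kl(a, b)
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


def crossfade(left, right, overlap):
    """Join two waveforms with a linear cross-fade over `overlap` samples."""
    if overlap <= 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must be between 1 and the shorter waveform length")
    mixed = []
    for n in range(overlap):
        w = (n + 1) / (overlap + 1)  # weight of the right-hand unit
        mixed.append((1.0 - w) * left[len(left) - overlap + n] + w * right[n])
    return list(left[:-overlap]) + mixed + list(right[overlap:])
```

Here `left_frames` would hold candidate spectra near the end of the preceding unit and `right_frames` those near the start of the following unit. The selected pair marks where the two waveforms are cut before cross-fading.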
손성용,서정일,한민수,Son Sung Young,Seo Joung Il,Hahn Minsoo 대한음성학회 2003 말소리 Vol.47 No.-
It is difficult to implement a sound field effect in real time using linear convolution in the time domain because linear convolution requires many multiplication operations. This paper introduces three ways to reduce the number of multiplications. Firstly, linear convolution in the time domain is replaced with circular convolution in the frequency domain, so that convolution is carried out as multiplication. Secondly, one frame is divided into several frames, which reduces the multiplications needed to transform the time domain into the frequency domain. Finally, the QFT is used in place of the FFT. Together, the three methods greatly reduce the number of multiplication operations, which makes real-time implementation possible.
Real-Time Sound Field Effect Implementation Using Block Filtering and QFT
손성용,서정일,한민수,Sohn Sung-Yong,Seo Jeongil,Hahn Minsoo 대한음성학회 2004 말소리 Vol.51 No.-
It is almost impossible to generate the sound field effect in real time with time-domain linear convolution because of the large number of multiplication operations it requires. To solve this, this paper introduces three methods to reduce the number of multiplication operations. Firstly, the time-domain linear convolution is replaced with frequency-domain circular convolution. In other words, the linear convolution result can be derived from that of the circular convolution. This technique reduces the number of multiplication operations remarkably. Secondly, a subframe concept is introduced, i.e., one original frame is divided into several subframes. The FFT is then executed for each subframe and, as a result, the number of multiplication operations can be reduced. Finally, the QFT is used instead of the FFT. By combining all three methods into our final SFE generation algorithm, the number of computations is reduced sufficiently that real-time SFE generation becomes possible on a general PC.
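The first two ideas above can be illustrated with a standard overlap-add scheme: each block is zero-padded so that circular convolution in the frequency domain reproduces linear convolution, and the block results are summed at their offsets. This is a minimal sketch under that assumption. The paper's exact subframe scheme may differ, the QFT is not shown, and all function names are made up for the example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fast_convolve(a, b):
    """Linear convolution via zero-padded circular convolution in the frequency domain."""
    size = len(a) + len(b) - 1
    n = 1
    while n < size:  # pad to a power of two no shorter than the linear result
        n *= 2
    fa = fft(list(a) + [0.0] * (n - len(a)))
    fb = fft(list(b) + [0.0] * (n - len(b)))
    y = fft([u * v for u, v in zip(fa, fb)], inverse=True)
    return [v.real / n for v in y[:size]]


def overlap_add(signal, impulse_response, block):
    """Filter `signal` block by block and overlap-add the partial results."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        for i, v in enumerate(fast_convolve(chunk, impulse_response)):
            out[start + i] += v
    return out


def direct_convolve(a, b):
    """Reference time-domain linear convolution, O(len(a) * len(b)) multiplications."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out
```

For any block size, `overlap_add(x, h, block)` should agree with `direct_convolve(x, h)` up to rounding error. A real-time version would also compute the transform of the impulse response once rather than once per block.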
김상훈,이영직,한민수,Kim Sanghun,Lee Youngjik,Hahn Minsoo 대한음성학회 2003 말소리 Vol.47 No.-
This paper presents the speech database standardization activities at ETRI. Recently, with government support, ETRI and SiTEC have been collecting a large speech corpus for domestic speech-related companies. First, because knowledge of speech database specifications has not been shared, the distributed speech databases have different formats. A common format is therefore needed as soon as possible, and ETRI and SiTEC are trying to find a better representation format for speech databases. Second, we introduce a new method for describing the annotation information of a speech database. As one structured description method, an XML-based description will be applied to represent the metadata of the speech database. It will be revised continuously through the speech technology standard forum during this year.
Practical Application of Design Dimensions for Ambient Information Systems in Public Spaces: Interactive LED Illumination
김문정(Moonjung Kim),한민수(Minsoo Hahn) 한국HCI학회 2009 한국HCI학회 학술대회 Vol.2009 No.2
An Ambient Information System extends the concept of Ambient Displays, media that present information within a space through subtle changes in the light, sound, and movement around us. Such displays are aesthetically pleasing and sit on the periphery of a user's attention. The term describes a large set of applications that publish information in a highly non-intrusive manner [4]. Recently, as the importance of media that affect individual users' senses and experiences has grown, the role of Ambient Information Systems in public places has grown as well. This paper therefore focuses on finding the most suitable form of system for information awareness in public places in light of several design dimensions, and on putting it to use. Building on previous research, we extracted 13 design dimensions that should be considered in the practical application of an Ambient Information System. To validate them, we installed an interactive LED illumination named "Followingflow" and carried out an experiment. The results suggest that, when defining abstract patterns for information, a "common mental model" should be considered so that anyone can understand them. Information judged suitable for public places was either fact that may be openly known, such as the weather, or information belonging to a small group, provided that passers-by cannot interpret it.
An MVF Estimation Method Based on Harmonic Selection for Improving the Performance of HMM-Based Speech Synthesizers Using the TBE Model
박지훈(Park, Jihoon),한민수(Hahn, Minsoo) 한국음성학회 2012 말소리와 음성과학 Vol.4 No.4
In the two-band excitation (TBE) model, maximum voiced frequency (MVF) is the most important feature of the excitation parameter because the synthetic speech quality depends on MVF. Thus, this paper proposes an enhanced MVF estimation scheme based on the peak picking method. In the proposed scheme, the local peak and the peak lobe are picked from the spectrum of a linear predictive residual signal. The normalized distance between neighboring peak lobes is calculated and utilized as a feature to estimate MVF. Experimental results of both objective and subjective tests show that the proposed scheme improves synthetic speech quality compared with that of the conventional one.
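One way to read the peak-picking idea above is sketched below. It assumes that the magnitude spectrum of the linear predictive residual and the fundamental frequency are already available, treats each local maximum as a peak, and takes the MVF as the last peak reached before the spacing between neighboring peaks, normalized by F0, drifts away from 1. The peak-lobe handling of the actual scheme is not reproduced, and the tolerance `tol` and all function names are assumptions.

```python
def local_peaks(mags):
    """Indices of bins that are larger than the previous bin and not smaller than the next."""
    return [k for k in range(1, len(mags) - 1) if mags[k - 1] < mags[k] >= mags[k + 1]]


def estimate_mvf(mags, bin_hz, f0_hz, tol=0.2):
    """Estimate the maximum voiced frequency in Hz from a residual magnitude spectrum.

    Peaks are followed upward in frequency while the normalized distance
    (peak spacing divided by F0) stays within `tol` of 1.0.
    """
    peaks = local_peaks(mags)
    if not peaks:
        return 0.0
    mvf = peaks[0] * bin_hz
    for prev, cur in zip(peaks, peaks[1:]):
        normalized_distance = (cur - prev) * bin_hz / f0_hz
        if abs(normalized_distance - 1.0) > tol:
            break  # spacing is no longer harmonic: treat the rest as unvoiced band
        mvf = cur * bin_hz
    return mvf
```

In the two-band model, the band below the returned frequency would be synthesized with periodic excitation and the band above it with noise.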