RISS Academic Research Information Service

      • KCI-indexed

        A Korean Linguistic Study of Korean Dialect Speech Recognition

        김아름 국어문학회 2023 국어문학 Vol.82 No.-

        This study aims to provide a linguistic analysis of the problem of recognizing Korean dialects in speech recognition, by examining the characteristics of dialects that cause errors in speech recognition experiments using dialectal speech data. To accomplish this, we classified dialectal features that differ from standard Korean into two categories: substitution and deletion. Our analysis of the results of speech recognition experiments revealed that dialectal features such as the merging of phonemes, substitution between vowels that are not discriminable in some environments, and deletion of phonemes that exist only in dialect forms or that apply optionally, act as obstacles to speech recognition. We also describe how grammatical morphemes used in dialects that are homonymous with those in standard Korean, as well as word-level pronunciation differences between dialects and standard Korean, lead to a significant number of recognition errors. This study sheds light on the challenges of recognizing Korean dialects in speech recognition and provides insights for developing more accurate and efficient speech recognition systems.
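
        As a concrete illustration of how such an error taxonomy can be tallied, the sketch below (an editorial addition, not from the paper) aligns a reference transcript with a recognizer hypothesis by Levenshtein distance and counts substitutions, deletions, and insertions; the token lists may hold words or phonemes.

            def error_counts(ref, hyp):
                """Tally substitutions, deletions, and insertions between a
                reference token list and a recognizer hypothesis."""
                m, n = len(ref), len(hyp)
                dp = [[0] * (n + 1) for _ in range(m + 1)]
                for i in range(1, m + 1):
                    dp[i][0] = i
                for j in range(1, n + 1):
                    dp[0][j] = j
                for i in range(1, m + 1):
                    for j in range(1, n + 1):
                        dp[i][j] = min(dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                                       dp[i - 1][j] + 1,   # deletion
                                       dp[i][j - 1] + 1)   # insertion
                subs = dels = ins = 0
                i, j = m, n
                while i > 0 or j > 0:   # backtrace the cheapest edit path
                    if (i > 0 and j > 0
                            and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
                        subs += ref[i - 1] != hyp[j - 1]
                        i, j = i - 1, j - 1
                    elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
                        dels += 1
                        i -= 1
                    else:
                        ins += 1
                        j -= 1
                return subs, dels, ins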

      • Speech and emotion recognition using a multi-output model

        Min Dong Jin, Jongho Won, Deok-Hwan Kim 한국차세대컴퓨팅학회 2022 한국차세대컴퓨팅학회 학술대회 Vol.2022 No.10

        Voice language, the primary means of human communication, delivers not only verbal information but also emotional information through characteristics such as intonation, pitch, and the surrounding environment. Many current studies focus on recognizing emotion and speech from voice for human-computer interaction, developing deep neural network models that extract various frequency characteristics of speech. Representative speech-based deep learning algorithms include speech recognition, namely speech-to-text or automatic speech recognition, and speech emotion recognition. These two algorithms have each been developed for a long time, but multi-output algorithms that process them in parallel at the same time are rare. This paper introduces a multi-output model that recognizes speech and emotion from one voice, on the premise that simultaneously understanding language and emotion, the most critical information in a human voice, will significantly help human-computer interaction. Compared with training the language and emotion recognition models separately, the model showed no significant difference, achieving a word error rate of 6.59% in the speech recognition part and an average accuracy of 79.67% in the emotion recognition part.
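
        The paper's architecture details are not given here, so the following is only a minimal PyTorch sketch of the multi-output idea: a shared acoustic encoder feeding a per-frame ASR head (suitable for a CTC-style loss) and a pooled utterance-level emotion head. All layer sizes and names are illustrative assumptions.

            import torch
            import torch.nn as nn

            class MultiOutputSpeechModel(nn.Module):
                """Shared encoder with two task heads: per-frame token logits
                for ASR and utterance-level logits for emotion."""
                def __init__(self, n_mels=80, hidden=256, vocab_size=1000, n_emotions=7):
                    super().__init__()
                    self.encoder = nn.LSTM(n_mels, hidden, num_layers=2,
                                           batch_first=True, bidirectional=True)
                    self.asr_head = nn.Linear(2 * hidden, vocab_size)      # per frame
                    self.emotion_head = nn.Linear(2 * hidden, n_emotions)  # per utterance

                def forward(self, feats):                  # feats: (batch, time, n_mels)
                    enc, _ = self.encoder(feats)           # (batch, time, 2*hidden)
                    asr_logits = self.asr_head(enc)        # (batch, time, vocab)
                    emo_logits = self.emotion_head(enc.mean(dim=1))  # mean-pooled
                    return asr_logits, emo_logits

        Training both heads against one shared encoder is what lets the model process language and emotion in parallel rather than as two separate systems.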

      • KCI-indexed

        Vector-Quantization-Based Speech Recognition Performance Improvement Using Maximum Log Likelihood in a Gaussian Distribution

        정경용, 오상엽 한국디지털정책학회 2018 디지털융복합연구 Vol.16 No.11

        Commercial speech recognition systems that show accurate recognition rates use learning models trained from speaker-dependent isolated data. However, their recognition performance degrades with the quantity of data in noisy environments. In this paper, we propose a vector-quantization-based method that improves speech recognition performance using the maximum log likelihood under a Gaussian distribution. The proposed method constructs an optimal learning model that increases recognition accuracy for similar speech by combining vector quantization and maximum log likelihood with a speech feature extraction method. Speech features are extracted based on a hidden Markov model (HMM). Because the proposed method improves the accuracy of inaccurate speech models produced by existing systems, it can build a model that is robust for speech recognition. The proposed method shows improved recognition accuracy in a speech recognition system.
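
        A minimal sketch of the scoring idea, assuming diagonal-covariance Gaussian codewords (the paper's exact model configuration is not reproduced here): each frame is scored against its best codeword, the per-frame maxima are summed, and recognition picks the word model with the maximum total log-likelihood.

            import numpy as np

            def diag_gauss_loglik(x, mean, var):
                """Log-likelihood of one feature frame under a diagonal Gaussian."""
                return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

            def model_score(frames, codebook):
                """Total log-likelihood of an utterance against a VQ codebook:
                each frame is scored by its best-matching Gaussian codeword."""
                return sum(max(diag_gauss_loglik(x, m, v) for m, v in codebook)
                           for x in frames)

            # Recognition: pick the word model maximizing the total log-likelihood.
            # best_word = max(models, key=lambda w: model_score(frames, models[w]))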

      • KCI-indexed

        Distributed Speech Recognition Robust to Noisy Environments Using Multi-Band Spectral Subtraction and Entropy Harmonics

        최갑근(Gab-Keun Choi), 김순협(Soon-Hyob Kim) 大韓電子工學會 2011 電子工學會論文誌-CI (Computer and Information) Vol.48 No.1

        Background noises and channel distortions are major factors that disturb the practical use of speech recognition. Usually, noise reduces the performance of speech recognition systems, and DSR (Distributed Speech Recognition)-based speech recognition has difficulty improving performance for the same reason. Therefore, to improve DSR-based speech recognition in noisy environments, this paper proposes a method that detects accurate speech regions in order to extract noise-robust features. The proposed method distinguishes speech from noise by using entropy and the harmonics of speech, and removes noise by multi-band spectral subtraction. Speech detection based on the spectral energy of speech performs well at relatively high SNR (15 dB), but when the noise environment varies, the threshold between speech and noise also varies, and detection performance degrades at low SNR (0 dB). The proposed method uses the spectral entropy and harmonics of speech for better speech detection even at 0 dB, and the performance of the AFE is increased by precise speech detection. Experimental results show that the proposed method achieves improved recognition performance in noisy environments.
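
        The two ingredients the paper combines can be sketched as follows, assuming magnitude-squared STFT frames and a running noise estimate (both hypothetical inputs here): a spectral-entropy measure for speech detection, and per-band over-subtraction with a spectral floor.

            import numpy as np

            def spectral_entropy(power_spec):
                """Entropy of the normalized power spectrum of one frame; speech
                frames tend to show lower entropy than broadband noise."""
                p = power_spec / (power_spec.sum() + 1e-12)
                return -np.sum(p * np.log(p + 1e-12))

            def multiband_subtract(power_spec, noise_est, bands, alphas, beta=0.01):
                """Multi-band spectral subtraction: subtract a per-band
                over-estimate of the noise power, then floor the result to a
                fraction of the noisy spectrum to limit musical noise."""
                out = np.copy(power_spec)
                for (lo, hi), a in zip(bands, alphas):
                    cleaned = power_spec[lo:hi] - a * noise_est[lo:hi]
                    out[lo:hi] = np.maximum(cleaned, beta * power_spec[lo:hi])
                return out

        Frames whose entropy stays above a noise-tracked threshold would be treated as non-speech and used to update noise_est; band boundaries and the per-band factors alphas are tuning choices, not values from the paper.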

      • KCI-indexed

        Error Correction for Korean Speech Recognition Using an LSTM-Based Sequence-to-Sequence Model

        Hye-won Jin(진혜원), A-Hyeon Lee(이아현), Ye-Jin Chae(채예진), Su-Hyun Park(박수현), Yu-Jin Kang(강유진), Soowon Lee(이수원) 한국컴퓨터정보학회 2021 韓國컴퓨터情報學會論文誌 Vol.26 No.10

        Most research on correcting speech recognition errors has been based on English, so research on Korean speech recognition is insufficient. Compared to English, however, Korean speech recognition produces relatively many errors due to linguistic characteristics of Korean such as fortis (tensed consonants) and liaison, so research on Korean speech recognition is needed. Furthermore, earlier works primarily relied on edit-distance algorithms and syllable restoration rules, making it difficult to correct the error types caused by fortis and liaison. In this paper, we propose a context-sensitive post-processing model for speech recognition that combines an LSTM-based sequence-to-sequence model with the Bahdanau attention mechanism to correct Korean speech recognition errors caused by pronunciation. Experiments showed that with this model, speech recognition performance improved from 64% to 77% for fortis, from 74% to 90% for liaison, and from 69% to 84% on average. Based on these results, the proposed model appears applicable to real-world applications based on speech recognition.
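
        For reference, Bahdanau (additive) attention scores each encoder output against the current decoder state with a small feed-forward network; a minimal PyTorch sketch follows (dimensions are illustrative, not the paper's).

            import torch
            import torch.nn as nn

            class BahdanauAttention(nn.Module):
                """Additive attention: score(s, h_j) = v^T tanh(W_s s + W_h h_j)."""
                def __init__(self, dec_dim, enc_dim, attn_dim):
                    super().__init__()
                    self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)
                    self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)
                    self.v = nn.Linear(attn_dim, 1, bias=False)

                def forward(self, dec_state, enc_outputs):
                    # dec_state: (batch, dec_dim); enc_outputs: (batch, time, enc_dim)
                    scores = self.v(torch.tanh(
                        self.W_s(dec_state).unsqueeze(1) + self.W_h(enc_outputs)))
                    weights = torch.softmax(scores.squeeze(-1), dim=1)  # (batch, time)
                    context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
                    return context, weights

        In the post-processing setup the abstract describes, the encoder would read the raw recognizer output and the decoder would emit the corrected sentence, attending over the input at each step.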

      • KCI-indexed

        Google Speech Recognition of an English Paragraph Produced by College Students in Clear and Casual Speech Styles

        양병곤(Yang, Byunggon) 한국음성학회 2017 말소리와 음성과학 Vol.9 No.4

        These days, the voice models of speech recognition software are sophisticated enough to process the natural speech of people without any prior training. However, not much research has been reported on the use of speech recognition tools in the field of pronunciation education. This paper examined Google speech recognition of a short English paragraph produced by Korean college students in clear and casual speech styles in order to diagnose and resolve students' pronunciation problems. Thirty-three Korean college students participated in the recording of the English paragraph. The Google soundwriter was employed to collect data on the word recognition rates of the paragraph. Results showed that the total word recognition rate was 73% with a standard deviation of 11.5%. The word recognition rate of clear speech was around 77.3%, while that of casual speech amounted to 68.7%. The low recognition rate of casual speech was attributed both to individual pronunciation errors and to the software itself, as shown in its fricative recognition. Various distributions of unrecognized words were observed depending on the participant and proficiency group. From these results, the author concludes that speech recognition software is useful for diagnosing an individual's or group's pronunciation problems. Further studies on progressive improvement of learners' erroneous pronunciations would be desirable.
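
        The paper does not reproduce its exact scoring procedure here; one simple way to compute a word recognition rate is a bag-of-words match between the reference paragraph and the recognized text, as in this sketch.

            from collections import Counter

            def word_recognition_rate(reference, recognized):
                """Share of reference words returned by the recognizer, counting
                duplicates (a bag-of-words match rather than a strict alignment)."""
                ref = Counter(reference.lower().split())
                hyp = Counter(recognized.lower().split())
                matched = sum(min(c, hyp[w]) for w, c in ref.items())
                return matched / sum(ref.values())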

      • KCI-indexed

        Recognition of Emotion and Emotional Speech Based on Prosodic Processing

        Kim, Sung-Ill The Acoustical Society of Korea 2004 韓國音響學會誌 Vol.23 No.e3

        This paper presents two new approaches, one concerned with the recognition of emotional speech expressing anger, happiness, normality, sadness, or surprise, and the other with the recognition of emotion in speech. For the proposed recognition system handling human speech with emotional states, a total of nine kinds of prosodic features were first extracted and then given to a prosodic identifier. In evaluation, the recognition results on emotional speech showed that the rates achieved with the proposed method increased more than those of the existing speech recognizer. For emotion recognition, on the other hand, four kinds of prosodic parameters, namely pitch, energy, and their derivatives, were proposed and then trained by discrete-duration continuous hidden Markov models (DDCHMMs) for recognition. In this approach, the emotion models were adapted to a specific speaker's speech using maximum a posteriori (MAP) estimation. In evaluation, the recognition results on emotional states showed that the rates on the vocal emotions gradually increased with an increasing number of adaptation samples.
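
        A rough sketch of extracting the four prosodic parameters named above (pitch, energy, and their time derivatives), using librosa's YIN pitch tracker and RMS energy; the frame parameters and pitch range are assumptions, and the DDCHMM training itself is out of scope.

            import numpy as np
            import librosa

            def prosodic_features(y, sr):
                """Frame-level pitch, energy, and their derivatives, stacked
                into a (frames, 4) prosodic feature matrix."""
                f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)   # pitch track
                rms = librosa.feature.rms(y=y)[0]               # energy track
                n = min(len(f0), len(rms))                      # align lengths
                f0, rms = f0[:n], rms[:n]
                d_f0 = np.gradient(f0)                          # delta pitch
                d_rms = np.gradient(rms)                        # delta energy
                return np.stack([f0, rms, d_f0, d_rms], axis=1)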

      • KCI-indexed

        Automatic Speech Style Recognition Using a Sentence Sequencing Method for Speaker Recognition in Two-Party Conversations

        강가람(Garam Kang), 권오병(Ohbyung Kwon) 한국지능정보시스템학회 2021 지능정보연구 Vol.27 No.2

        Speaker recognition is generally divided into speaker identification and speaker verification. It plays an important role in automatic voice systems, and its importance is becoming more prominent as portable devices, voice technology, and audio content continue to expand. Previous speaker recognition studies have aimed to automatically determine who the speaker is based on voice files and to improve accuracy. Speech style is an important sociolinguistic subject: it contains very useful information that reveals the speaker's attitude, conversational intention, and personality, and this can be an important clue for speaker recognition. The sentence-final ending used in a speaker's speech determines the type of sentence and carries information such as the speaker's intention, psychological attitude, or relationship to the listener. The use of sentence-final endings varies with the characteristics of the speaker, so the type and distribution of the endings of a specific unidentified speaker can help in recognizing that speaker. However, few studies have considered speech style in existing text-based speaker recognition, and if speech-style information is added to speech-signal-based speaker recognition techniques, the accuracy of speaker recognition can be further improved. Hence, this paper proposes a novel method that uses speech style, expressed as sentence-final endings, to improve the accuracy of Korean speaker recognition. To this end, a method called sentence sequencing is proposed that generates vector values from the type and frequency of the sentence-final endings appearing in a specific person's utterances. To evaluate the performance of the proposed method, training and performance evaluation were conducted with an actual drama script. The proposed method can be used as a means of improving the performance of Korean speech recognition services.
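
        One way to realize the proposed sentence sequencing is a normalized frequency vector over an inventory of sentence-final endings; the inventory below is hypothetical, not the paper's.

            from collections import Counter

            # Hypothetical inventory of Korean sentence-final endings; the
            # paper's actual inventory is not reproduced here.
            ENDINGS = ["다", "요", "어", "아", "지", "네", "까", "죠"]

            def sentence_sequencing_vector(sentences):
                """Normalized frequency vector over sentence-final endings
                found in one speaker's utterances."""
                counts = Counter()
                for s in sentences:
                    s = s.strip().rstrip(".?!")        # drop end punctuation
                    for e in ENDINGS:
                        if s.endswith(e):
                            counts[e] += 1
                            break
                total = sum(counts.values()) or 1
                return [counts[e] / total for e in ENDINGS]

        Such vectors, computed per speaker, could then feed any standard classifier alongside acoustic speaker embeddings.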

      • KCI-indexing candidate

        Noisy Speech Recognition Based on Noise-Adapted HMMs Using Speech Feature Compensation

        Chung, Yong-Joo The Korea Institute of Convergence Signal Processing 2014 융합신호처리학회 논문지 (JISPS) Vol.15 No.2

        The vector Taylor series (VTS) based method usually employs clean speech hidden Markov models (HMMs) when compensating speech feature vectors or adapting the parameters of trained HMMs. It is well known that noisy speech HMMs trained by multi-condition training (MTR) and the multi-model-based speech recognition framework (MMSR) perform better than clean speech HMMs in noisy speech recognition. In this paper, we propose a method to use noise-adapted HMMs in the VTS-based speech feature compensation method. We derive a novel mathematical relation between the training and test noisy speech feature vectors in the log-spectrum domain, and the VTS is used to estimate the statistics of the test noisy speech. An iterative EM algorithm is used to estimate the training noisy speech from the test noisy speech along with the noise parameters. The proposed method was applied to the noise-adapted HMMs trained by MTR and MMSR and could reduce the relative word error rate significantly in noisy speech recognition experiments on the Aurora 2 database.
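
        The paper's train/test relation is not reproduced here, but the standard VTS building block it extends is the log-spectral mismatch function y = x + log(1 + exp(n - x)) and its derivative, sketched below.

            import numpy as np

            def vts_mismatch(mu_x, mu_n):
                """Log-spectral mismatch: with y = log(exp(x) + exp(n)), the
                noisy mean is approximated as mu_y = mu_x + g(mu_x, mu_n)."""
                g = np.log1p(np.exp(mu_n - mu_x))   # g(x, n) = log(1 + exp(n - x))
                return mu_x + g

            def vts_jacobian(mu_x, mu_n):
                """dY/dX at the expansion point, used to propagate covariances
                in the first-order VTS expansion."""
                return 1.0 / (1.0 + np.exp(mu_n - mu_x))

        In an EM loop, these two quantities would be re-evaluated at each updated estimate of the noise mean to refine the compensated statistics.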

      • Design of an MFCC-Based Speech Coder for Server-Based Speech Recognition in Network Environments

        이길호(Lee, Gil-Ho), 윤재삼(Yoon, Jae-Sam), 오유리(Oh, Yoo-Rhee), 김홍국(Kim, Hong-Kook) 대한음성학회 2005 말소리 Vol.54 No.-

        Existing standard speech coders can provide high-quality speech communication, but they degrade the performance of speech recognition systems that use the speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for speech recognition performance. For example, the mel-frequency cepstral coefficient (MFCC) is generally known to provide better speech recognition performance than the linear prediction coefficient (LPC), a typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. The main challenge in using MFCC, however, is developing an efficient MFCC quantization at a low bit rate. First, we exploit the interframe correlation of MFCCs, which results in predictive quantization of MFCC. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed coder has speech quality comparable to 8 kbps G.729, while speech recognition using the proposed coder performs better than with G.729.
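
        A minimal sketch of safety-net predictive quantization, the scheme the abstract describes: each frame's MFCC vector is coded either as a prediction residual or absolutely, whichever reconstructs better. The prediction coefficient and codebooks are assumptions, and bit allocation and the CELP excitation coding are out of scope.

            import numpy as np

            def quantize(v, codebook):
                """Nearest-neighbour vector quantization over a (K, D) codebook;
                returns (index, codeword)."""
                idx = int(np.argmin(np.sum((codebook - v) ** 2, axis=1)))
                return idx, codebook[idx]

            def encode_frame(mfcc, prev_decoded, pred_codebook, abs_codebook, a=0.8):
                """Safety-net predictive quantization: code the prediction
                residual, but fall back to absolute quantization when the
                interframe prediction fails (e.g. after a channel error)."""
                residual = mfcc - a * prev_decoded          # first-order prediction
                i_p, c_p = quantize(residual, pred_codebook)
                i_a, c_a = quantize(mfcc, abs_codebook)
                pred_rec = a * prev_decoded + c_p           # predictive reconstruction
                if np.sum((mfcc - pred_rec) ** 2) <= np.sum((mfcc - c_a) ** 2):
                    return ("pred", i_p, pred_rec)
                return ("abs", i_a, c_a)

        The absolute ("safety-net") branch bounds error propagation: a corrupted frame stops affecting later frames as soon as one frame is coded absolutely.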
