RISS Academic Research Information Service


      Robust voice activity detection using formant frequencies


      https://www.riss.kr/link?id=T13838878


      Multilingual Abstract

      Voice activity detection (VAD) distinguishes human speech from other sounds. Various applications, including speech coding and speech recognition, can benefit from VAD. To detect voice activity accurately, an algorithm must take into account the characteristic features of human speech, of background noise, or of both. In many real-life applications, noise occurs unexpectedly, which makes its characteristics difficult to determine accurately. As a result, robust VAD algorithms that depend less on correct noise estimates are more desirable for real-life use. Formants are the major spectral peaks of the human voice and are highly useful for distinguishing human vowel sounds. Because they are spectral peaks, formants are likely to survive in a signal even after severe corruption by noise, which makes them attractive features for voice activity detection under low signal-to-noise ratio (SNR) conditions. However, irrelevant spectral peaks from background noise make it difficult to extract formants accurately from noisy signals. In this paper, a simple formant-based VAD algorithm is proposed that overcomes the problem of formant detection under severe noise. The proposed method processes audio much faster than standard VAD algorithms and outperforms them under various noise conditions. Its robustness against various types of noise and its light computational load make it suitable for a wide range of applications.
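The abstract's central observation, that narrow spectral peaks tend to stay visible above broadband noise, can be illustrated with a rough sketch. This is not the thesis's PVD or PND algorithm, whose details are not given on this page. The function name `peak_contrast_db`, the 200 to 4000 Hz band, the neighbor width, and the three-tone stand-in for a vowel are all illustrative assumptions.

```python
import numpy as np


def peak_contrast_db(frame, sample_rate, n_peaks=3, band=(200.0, 4000.0), neighbor=4):
    """Mean dB gap between the strongest in-band spectral peaks and nearby bins.

    Hypothetical feature for illustration only; all parameters are assumptions.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2 + 1e-12  # floor avoids log(0)
    log_power = 10.0 * np.log10(power)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Local maxima of the log spectrum, restricted to the chosen band.
    interior = np.arange(1, len(log_power) - 1)
    is_peak = (log_power[interior] > log_power[interior - 1]) & (
        log_power[interior] >= log_power[interior + 1]
    )
    in_band = (freqs[interior] >= band[0]) & (freqs[interior] <= band[1])
    candidates = interior[is_peak & in_band]
    if candidates.size == 0:
        return 0.0

    # Keep the n_peaks strongest candidates and compare each with its neighbors.
    strongest = candidates[np.argsort(log_power[candidates])[-n_peaks:]]
    gaps = []
    for k in strongest:
        lo, hi = max(k - neighbor, 0), min(k + neighbor + 1, len(log_power))
        neighbors = np.concatenate([log_power[lo:k], log_power[k + 1:hi]])
        gaps.append(log_power[k] - neighbors.mean())
    return float(np.mean(gaps))


# Synthetic check: three fixed tones (a crude vowel stand-in) buried in white
# noise versus white noise alone, averaged over 20 random frames each.
rng = np.random.default_rng(0)
sample_rate = 8000
t = np.arange(512) / sample_rate
tones = sum(np.sin(2 * np.pi * f * t) for f in (700.0, 1200.0, 2600.0))
noise_only_score = np.mean(
    [peak_contrast_db(rng.normal(size=t.size), sample_rate) for _ in range(20)]
)
speech_like_score = np.mean(
    [peak_contrast_db(tones + rng.normal(size=t.size), sample_rate) for _ in range(20)]
)
```

In this toy setup the tonal frames should score a larger peak-to-neighbor gap than the noise-only frames, so thresholding such a score gives a crude voice/non-voice decision. Real speech has moving formants and harmonic structure, and the thesis additionally addresses spurious peaks contributed by the noise itself, which this sketch does not attempt.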

      Table of Contents

      • CHAPTER 1 INTRODUCTION
      • CHAPTER 2 RELATED WORKS
      • 2.1 Speech-Related Features
      • 2.1.1 Energy and zero-crossing rate (ZCR)
      • 2.1.2 Spectral entropy
      • 2.1.3 Band-partitioned spectral entropy
      • 2.2 Statistical Methods
      • 2.2.1 Likelihood ratio test (LRT)-based method
      • 2.2.2 Distributional modeling of speech signals
      • 2.2.3 Parametric representation of speech signals
      • 2.3 G.729 Annex B Algorithm
      • 2.3.1 Feature extraction
      • 2.3.2 Background noise parameter estimation
      • 2.3.3 Multiboundary VAD decision
      • 2.3.4 VAD decision smoothing
      • 2.4 ETSI AMR Option 1 Algorithm
      • 2.4.1 Feature extraction
      • 2.4.2 Background noise parameter estimation
      • 2.4.3 Initial VAD decision
      • 2.4.4 Hang-over addition
      • 2.5 ETSI AMR Option 2 Algorithm
      • 2.5.1 Feature extraction
      • 2.5.2 Background noise parameter estimation
      • 2.5.3 VAD decision
      • 2.5.4 Hang-over addition
      • 2.6 Summary
      • CHAPTER 3 IN-DEPTH ANALYSIS OF SIGNAL CORRUPTIONS BY NOISES
      • 3.1 Analysis of Spectral Peaks
      • 3.2 Vector Distance Metrics
      • 3.2.1 Unnormalized vector distance metric
      • 3.2.2 Normalized vector distance metric by total energies
      • 3.2.3 Normalized vector distance metric by maximum energies
      • 3.3 Spectral Peak-Based Metric
      • 3.3.1 Direct comparison of spectral peak bands
      • 3.3.2 Peak extraction-based approach
      • 3.4 Summary
      • CHAPTER 4 DIRECT SIMILARITY COMPUTATION BETWEEN PEAK SIGNATURE AND CORRUPTED SPECTRUM
      • 4.1 Peak Valley Difference (PVD)
      • 4.1.1 Analysis of differences in average energy
      • 4.1.2 VAD using average energy differences
      • 4.1.3 Remarks on PVD algorithm
      • 4.2 Peak-Neighbor Difference (PND)
      • 4.2.1 VAD using formant frequencies
      • 4.2.2 Band-limited computation for increased robustness against noises
      • 4.2.3 Threshold calculation and post processing
      • CHAPTER 5 EXPERIMENTS
      • 5.1 Experimental Conditions
      • 5.1.1 Data preparation
      • 5.1.2 Evaluation metrics
      • 5.1.3 Noise mixing using FaNT
      • 5.1.4 Baseline systems
      • 5.1.5 Test sets
      • 5.2 Aurora-2 Results
      • 5.2.1 Averaged accuracy by noise type
      • 5.2.2 Averaged accuracy by SNR level
      • 5.3 NOISEX-92 Results
      • 5.3.1 Averaged accuracy by noise type
      • 5.3.2 Averaged accuracy by SNR level
      • 5.4 Music Results
      • 5.4.1 Averaged accuracy by noise type
      • 5.4.2 Averaged accuracy by SNR level
      • 5.5 Contours of VAD algorithms
      • 5.6 Computational overheads
      • CHAPTER 6 CONCLUSION